PARALLEL EFFICIENCY OF MAP-REDUCE

It is worth trying to figure out the parallel efficiency of MapReduce. Assume that the data produced after the map phase is σ times the original data size D, i.e., σD. Further, assume there are P processors, each of which performs map and reduce operations depending on the phase it is used in, so no processors are wasted. Also assume that the algorithm itself does wD useful work; w is not necessarily a constant (the total work could be D^2, in which case w = D), but the point is that some amount of useful work, wD, gets done even on a single processor.

Now consider the overheads of performing the computation wD using MapReduce. After the map operation, instead of D data items we have σD data items split into P pieces, and each mapper writes its piece to its local disk, so there is an overhead associated with writing this data. Next, this data has to be read by each reducer before it can begin the reduce phase, which incurs a matching read overhead.
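The excerpt breaks off here, so what follows is a sketch of how the accounting can be completed, not the post's own derivation. Assume a cost c (not defined in the excerpt) to read or write a single data item to disk. Each mapper writes σD/P items and each reducer reads σD/P items, so each processor incurs 2cσD/P of I/O overhead on top of its wD/P share of useful work:

    T(1) = wD                                    (single processor, no intermediate I/O)
    T(P) = wD/P + cσD/P + cσD/P                  (compute + write after map + read before reduce)
         = (w + 2cσ) D / P
    efficiency = T(1) / (P × T(P)) = w / (w + 2cσ) = 1 / (1 + 2cσ/w)

Under these assumptions the efficiency is independent of P: MapReduce scales out, but it pays a fixed relative penalty of 2cσ/w, which hurts most when the map phase inflates the data (large σ) or the useful work per item (w) is small compared to the cost of disk I/O.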
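To make the formula concrete, here is a minimal Python sketch (mapreduce_efficiency is a hypothetical helper name, not something from the post) that evaluates it for illustrative parameter values:

    # Evaluate the parallel-efficiency formula sketched above.
    # sigma: blow-up factor of the map output relative to the input size D
    # c:     assumed cost of reading or writing one data item to disk
    # w:     useful work performed per input data item
    def mapreduce_efficiency(sigma: float, c: float, w: float) -> float:
        return 1.0 / (1.0 + 2.0 * c * sigma / w)

    # Example: map output equals the input size (sigma = 1) and disk I/O per
    # item costs a tenth of the useful work per item (c/w = 0.1).
    print(mapreduce_efficiency(sigma=1.0, c=0.1, w=1.0))  # -> 0.8333...

Note that in this model, shrinking σ (for example by combining map outputs locally before writing them) directly raises the efficiency, which is one intuition for why MapReduce combiners help.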