CSC352 MapReduce/Hadoop Class Notes

* '''Each map task runs the user-defined map function for each record of a split''' (a minimal Java sketch of such map and reduce functions appears right after this list).

* Hadoop does its best to run the map task on the node where the split resides, '''but this is not always the case'''.

* The '''sorted''' map outputs are transferred across the network to where the reduce task is running. These '''sorted''' outputs are '''merged''' and fed to the user-defined '''reduce function'''.

* The '''output''' of the '''reduce task''' is stored in '''HDFS'''.

* When there are many reducers, the map tasks '''partition''' their output into '''partitions'''. There is '''one''' partition per '''reduce task'''.

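The sketch below is not part of the original notes; it is a minimal illustration, assuming the Hadoop 0.20 Java API (<code>org.apache.hadoop.mapreduce</code>) and a word-count job, of where the user-defined '''map''' and '''reduce''' functions fit into this flow. The framework calls <code>map()</code> once per record of a split, sorts and merges the intermediate pairs by key, and calls <code>reduce()</code> once per key on the reducer's node. The class names are illustrative.

<pre>
import java.io.IOException;
import java.util.StringTokenizer;

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;

// Each public class would normally live in its own .java file.

// The map function is called once per record (here, one line) of the input split.
public class WordCountMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
    private static final IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        StringTokenizer tokens = new StringTokenizer(value.toString());
        while (tokens.hasMoreTokens()) {
            word.set(tokens.nextToken());
            context.write(word, ONE);   // emit an intermediate (key, value) pair
        }
    }
}

// The reduce function receives each key together with all of its values,
// after the sorted map outputs have been merged on the reducer's node.
class WordCountReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int sum = 0;
        for (IntWritable v : values) {
            sum += v.get();
        }
        context.write(key, new IntWritable(sum));  // the reduce output goes to HDFS
    }
}
</pre>
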
=== Examples of Data Flows ===

<center>
[[Image:MapReduceDataFlowOneReduce.png]]
</center>


<center>
[[Image:MapReduceDataFlowTwoReduces.png]]
</center>


<center>
[[Image:MapReduceDataFlowNoReduce.png]]
</center>

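The three figures above differ only in the number of reduce tasks. The hypothetical driver sketch below (again assuming the Hadoop 0.20 API; the class name <code>WordCountDriver</code> and the command-line arguments for the input and output paths are illustrative, and the mapper and reducer are the ones sketched earlier) selects among these data flows with <code>Job.setNumReduceTasks()</code>. With several reducers, the default <code>HashPartitioner</code> decides which partition each intermediate key goes to.

<pre>
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

// Hypothetical driver: the number of reduce tasks selects which of the
// three data flows pictured above the job will follow.
public class WordCountDriver {
    public static void main(String[] args) throws Exception {
        Job job = new Job(new Configuration(), "word count");
        job.setJarByClass(WordCountDriver.class);

        job.setMapperClass(WordCountMapper.class);
        job.setReducerClass(WordCountReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        // One reduce task (first figure): all sorted map outputs go to a single reducer.
        job.setNumReduceTasks(1);

        // Two reduce tasks (second figure): each map task partitions its sorted
        // output into two partitions, one per reducer (HashPartitioner by default).
        // job.setNumReduceTasks(2);

        // Zero reduce tasks (third figure): map output is written directly to HDFS.
        // job.setNumReduceTasks(0);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
</pre>
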