Difference between revisions of "CSC352 MapReduce/Hadoop Class Notes"

Revision as of 17:36, 31 March 2010


@@ Line 93: / Line 93: @@
 Taken from <ref name="hadoopGuide">[http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/0596521979 Hadoop: The Definitive Guide], Tom White, O'Reilly Media, June 2009, ISBN 0596521979. The Web site for the book is http://www.hadoopbook.com/</ref>
-* A '''Map Reduce job''' is a unit of work submitted by a client
+* A '''MapReduce job''' is a unit of work submitted by a client.
+* A '''job''' consists of
+** the input data
+** a MapReduce program
+** configuration information
@@ Line 100: / Line 109: @@
 * There are two types of '''nodes''': a '''jobTracker''' node, which oversees the execution of a '''job''', and '''taskTracker''' nodes, which execute '''tasks'''.
+* Hadoop divides the '''input''' into '''splits'''.
+* '''Hadoop creates one map task for each split.'''
+* '''Each map task runs the user-defined map function on each record of its split.'''
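
The relationship between a job, its splits, and its map tasks can be sketched in a few lines of plain Python. This is only an illustration and does not use the Hadoop API: the names <code>Job</code>, <code>make_splits</code>, <code>run_map_task</code>, <code>run_map_phase</code>, and the <code>split_size</code> configuration key are all hypothetical. It also assumes a split is a fixed number of records, whereas real Hadoop splits are portions of the input files.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

Record = str
KeyValue = Tuple[str, int]


@dataclass
class Job:
    """Hypothetical stand-in for a job: input data, a program, a configuration."""
    data: List[Record]
    map_function: Callable[[Record], Iterable[KeyValue]]
    config: Dict[str, int] = field(default_factory=dict)


def make_splits(records: List[Record], split_size: int) -> List[List[Record]]:
    """Divide the input into splits of at most split_size records (an assumption;
    Hadoop itself splits files by size, not by record count)."""
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]


def run_map_task(split: List[Record],
                 map_function: Callable[[Record], Iterable[KeyValue]]) -> List[KeyValue]:
    """One map task: apply the user-defined map function to every record of one split."""
    output: List[KeyValue] = []
    for record in split:
        output.extend(map_function(record))
    return output


def run_map_phase(job: Job) -> List[List[KeyValue]]:
    """Create one map task per split and collect each task's output separately."""
    splits = make_splits(job.data, job.config.get("split_size", 2))
    return [run_map_task(split, job.map_function) for split in splits]


def word_count_map(record: Record) -> Iterable[KeyValue]:
    """User-defined map function: emit (word, 1) for each word in a line of text."""
    for word in record.split():
        yield (word.lower(), 1)


if __name__ == "__main__":
    job = Job(data=["a b", "b c", "c"],
              map_function=word_count_map,
              config={"split_size": 2})
    for task_number, task_output in enumerate(run_map_phase(job)):
        print(task_number, task_output)
```

With three input records and a split size of 2, the sketch produces two splits and therefore two map tasks. In a real cluster the tasks would run in parallel on different taskTracker nodes; here they simply run one after the other.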