Difference between revisions of "CSC352 MapReduce/Hadoop Class Notes"
Line 93: | Line 93: | ||
Taken from <ref name="hadoopGuide">[http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/0596521979 Hadoop: The Definitive Guide], Tom White, O'Reilly Media, June 2009, ISBN 0596521979. The Web site for the book is http://www.hadoopbook.com/</ref> | Taken from <ref name="hadoopGuide">[http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/0596521979 Hadoop: The Definitive Guide], Tom White, O'Reilly Media, June 2009, ISBN 0596521979. The Web site for the book is http://www.hadoopbook.com/</ref> | ||
− | * A '''Map Reduce job''' is a unit of work submitted by a client | + | * A '''Map Reduce job''' is a unit of work submitted by a client. |
+ | |||
+ | |||
+ | * A '''job''' contains: | ||
+ | |||
+ | ** the input data | ||
+ | |||
+ | ** a MapReduce program | ||
+ | |||
+ | ** a configuration | ||
Line 100: | Line 109: | ||
* There are two types of '''nodes''': a '''jobTracker''' node, which oversees the execution of a '''job''', and '''taskTracker''' nodes that execute '''tasks'''. | * There are two types of '''nodes''': a '''jobTracker''' node, which oversees the execution of a '''job''', and '''taskTracker''' nodes that execute '''tasks'''. | ||
+ | |||
+ | |||
+ | * Hadoop divides the '''input''' into '''splits'''. | ||
+ | |||
+ | |||
+ | * '''Hadoop creates one map task for each split'''. | ||
+ | |||
+ | |||
+ | * '''Each map task runs the user-defined map function for each record of a split'''. | ||
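The points above can be illustrated with a small sketch. It is a single-process Python toy, not Hadoop code: the names <code>Job</code>, <code>make_splits</code>, <code>run_map_task</code>, <code>run_map_phase</code>, <code>word_count_map</code> and the <code>split_size</code> setting are all hypothetical. In particular, it assumes a split is a fixed number of records, whereas Hadoop sizes its splits in bytes.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

Record = str
KeyValue = Tuple[str, int]


@dataclass
class Job:
    """A job bundles the input data, a map function, and a configuration."""
    data: List[Record]
    map_function: Callable[[Record], Iterable[KeyValue]]
    # Hypothetical setting: number of records per split (an assumption of
    # this sketch; Hadoop itself sizes splits in bytes).
    config: Dict[str, int] = field(default_factory=lambda: {"split_size": 2})


def make_splits(records: List[Record], split_size: int) -> List[List[Record]]:
    """Divide the input into consecutive splits of at most split_size records."""
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]


def run_map_task(split: List[Record],
                 map_function: Callable[[Record], Iterable[KeyValue]]) -> List[KeyValue]:
    """One map task: apply the user-defined map function to each record of a split."""
    output: List[KeyValue] = []
    for record in split:
        output.extend(map_function(record))
    return output


def run_map_phase(job: Job) -> List[List[KeyValue]]:
    """Create one map task per split and collect each task's output separately."""
    splits = make_splits(job.data, job.config["split_size"])
    return [run_map_task(split, job.map_function) for split in splits]


def word_count_map(record: Record) -> List[KeyValue]:
    """Example user-defined map function: emit (word, 1) for every word in a line."""
    return [(word, 1) for word in record.split()]
```

Calling <code>run_map_phase(Job(data=lines, map_function=word_count_map))</code> returns one list of key/value pairs per split, which mirrors the rule of one map task per split. In this sketch the tasks run one after another in a single process; on a cluster they would be handed to taskTracker nodes.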
Revision as of 17:36, 31 March 2010