Difference between revisions of "CSC352 MapReduce/Hadoop Class Notes"

From dftwiki3
 
Taken from <ref name="hadoopGuide">[http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/0596521979 Hadoop: The Definitive Guide], Tom White, O'Reilly Media, June 2009, ISBN 0596521979. The Web site for the book is http://www.hadoopbook.com/</ref>
 
  
* A '''MapReduce job''' is a unit of work submitted by a client.
* A '''job''' contains
** the data
** a MapReduce program
** a configuration
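
As a rough illustration of the three parts listed above, a job can be pictured as a simple record. This is only a sketch in Python, not Hadoop's API: the class name <code>Job</code>, its fields, the word-count functions, and the configuration key <code>num_reducers</code> are all hypothetical names chosen for this example.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Iterable, List, Tuple

# Hypothetical type aliases for this sketch (these are not Hadoop types).
MapFn = Callable[[str], Iterable[Tuple[str, int]]]
ReduceFn = Callable[[str, List[int]], Tuple[str, int]]


@dataclass
class Job:
    """Toy model of a job: data + MapReduce program + configuration."""
    input_records: List[str]                               # the data
    map_fn: MapFn                                          # program: map part
    reduce_fn: ReduceFn                                    # program: reduce part
    config: Dict[str, str] = field(default_factory=dict)   # the configuration


def word_map(line: str) -> Iterable[Tuple[str, int]]:
    """Emit a (word, 1) pair for every word in one input record."""
    return [(word, 1) for word in line.split()]


def word_reduce(key: str, values: List[int]) -> Tuple[str, int]:
    """Sum all the counts emitted for one word."""
    return (key, sum(values))


# A client would build a job like this and submit it for execution.
job = Job(
    input_records=["a b a", "b c"],
    map_fn=word_map,
    reduce_fn=word_reduce,
    config={"num_reducers": "1"},  # assumed key, for illustration only
)
```
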
  
  
  
 
* There are two types of '''nodes''': a '''jobTracker''' node, which oversees the execution of a '''job''', and '''taskTracker''' nodes, which execute '''tasks'''.
* Hadoop divides the '''input''' into '''splits'''.
* '''Hadoop creates one map task for each split.'''
* '''Each map task runs the user-defined map function for each record of a split.'''
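
The relationship between input, splits, map tasks, and the map function can be sketched in a few lines of Python. This is a toy, single-process illustration and not how Hadoop is implemented: the helper names <code>make_splits</code> and <code>run_map_task</code>, the word-count map function, and the split size of two records are assumptions made for the example.

```python
from typing import Callable, List, Tuple

Pair = Tuple[str, int]


def make_splits(records: List[str], split_size: int) -> List[List[str]]:
    """Divide the input into consecutive splits of at most split_size records."""
    return [records[i:i + split_size] for i in range(0, len(records), split_size)]


def run_map_task(split: List[str], map_fn: Callable[[str], List[Pair]]) -> List[Pair]:
    """One map task: call the user-defined map function on each record of its split."""
    output: List[Pair] = []
    for record in split:
        output.extend(map_fn(record))
    return output


def word_count_map(line: str) -> List[Pair]:
    """User-defined map function: emit (word, 1) for every word in the record."""
    return [(word, 1) for word in line.split()]


records = ["the cat", "the dog", "a cat"]
splits = make_splits(records, split_size=2)

# One map task is created per split; here they simply run one after another.
map_outputs = [run_map_task(split, word_count_map) for split in splits]
```

With three records and a split size of two, the sketch produces two splits and therefore two map tasks. On a cluster these tasks would be handed to different taskTracker nodes instead of running in a loop.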
  
  

Revision as of 17:36, 31 March 2010


This section is only visible to computers located at Smith College