Difference between revisions of "CSC352 MapReduce/Hadoop Class Notes"

From dftwiki3
 
(One intermediate revision by the same user not shown)
Line 394: Line 394:
 
* This is illustrated and explained in Section 4 of  [[Hadoop_Tutorial_1_--_Running_WordCount#Running_Your_Own_Version_of_WordCount.java Tutorial #1 | Tutorial #1: Compiling your own version of WordCount.java]]  
 
* This is illustrated and explained in Section 4 of  [[Hadoop_Tutorial_1_--_Running_WordCount#Running_Your_Own_Version_of_WordCount.java Tutorial #1 | Tutorial #1: Compiling your own version of WordCount.java]]  
  
=How does hadoop compare on Ulysses.txt versus a single Linux machine?=
+
=How does hadoop on 6 compare to Linux on 1?=
  
 
* This is very interesting!   
 
* This is very interesting!   
Line 401: Line 401:
 
[[Image:ComputerLogo.png|100px|right]]
 
[[Image:ComputerLogo.png|100px|right]]
 
;Lab Experiment 2
 
;Lab Experiment 2
:Jump to the Section 5 of the  [[Hadoop_Tutorial_1_--_Running_WordCount | Hadoop Lab #1]] and see how Hadoop compares with basic Linux.
+
:Jump to Section 5 of [[Hadoop_Tutorial_1_--_Running_WordCount | Hadoop Lab #1]] and see how Hadoop compares with basic Linux for Ulysses, and for Ulysses plus 5 other books.
  
  
Line 408: Line 408:
 
<br />
 
<br />
  
 +
;Question 1
 +
: Comment on the timings you observe for 1 book and for 6 books.
  
 +
;Question 2
 +
: There are 4 large files in the HDFS, in '''wikipages/block/'''.  Each is approximately 180 MByte in size.  Run another experiment and compare the execution time of Hadoop on the 4 files (~3/4 GByte) with that of one of the Linux boxes processing the same 4 files using Linux commands.  Compare the execution times again.
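
One possible way to collect the timings for these questions is to wrap each run in a small timing helper. The sketch below is only an illustration: the function name <code>time_command</code> is hypothetical, and the actual Hadoop and Linux commands are not given here, so they must be taken from the tutorial.

```python
import subprocess
import time


def time_command(command):
    """Run a shell command and return its wall-clock duration in seconds.

    `command` is a shell string. The Hadoop job and the Linux pipeline
    to be compared are not shown here; use the ones from the tutorial.
    """
    start = time.perf_counter()
    # check=True raises CalledProcessError if the command fails, so a
    # failed run is never mistaken for a fast one.
    subprocess.run(command, shell=True, check=True)
    return time.perf_counter() - start
```

Calling the helper once with the Hadoop command and once with the Linux command gives two wall-clock times that can be compared directly. Running each command several times and keeping all the measurements makes the comparison less sensitive to load on the cluster.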
 
=Generating Task Timelines=
 
=Generating Task Timelines=
  
Line 421: Line 425:
 
<br />
 
<br />
 
<br />
 
<br />
 +
 +
=Debugging/Testing using Counters=
 +
 +
Section 6 of [[Hadoop_Tutorial_1_--_Running_WordCount#Counters | Tutorial #1]] shows how to create counters.  Hadoop counters are special variables whose values are gathered from each task, accumulated across the whole job, and reported both during the computation and at its end.  They are useful for counting quantities such as the amount of data processed or the number of tasks executed.
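
Tutorial #1 creates counters in Java. As a rough analogue for Python programs run through Hadoop streaming, a task can update a counter by writing a line of the form <code>reporter:counter:group,counter,amount</code> to standard error. The sketch below assumes a streaming mapper; the group name <code>WordCount</code>, the counter names, and the function names are hypothetical and chosen only for illustration.

```python
import sys


def report_counter(group, counter, amount, stream):
    """Ask Hadoop streaming to add `amount` to the counter `group`/`counter`.

    Streaming tasks update counters by writing specially formatted
    lines to standard error.
    """
    stream.write("reporter:counter:%s,%s,%d\n" % (group, counter, amount))


def map_lines(lines, out, err):
    """Word-count mapper that also maintains three illustrative counters."""
    # Incremented once per mapper process, so the job-wide total
    # approximates the number of Map tasks (counter name is hypothetical).
    report_counter("WordCount", "MapTasks", 1, err)
    for line in lines:
        words = line.split()
        for word in words:
            # Emit one tab-separated (word, 1) pair per word.
            out.write("%s\t1\n" % word)
        report_counter("WordCount", "LinesRead", 1, err)
        report_counter("WordCount", "WordsEmitted", len(words), err)

# In an actual streaming mapper script, the entry point would be:
#     map_lines(sys.stdin, sys.stdout, sys.stderr)
```

Hadoop sums the reported amounts over all tasks, so the per-task increments above turn into job-wide totals in the job report. A matching counter incremented once in the reducer script would count Reduce tasks in the same way.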
 +
 +
<br />
 +
<br />
 +
<greenbox>
 +
[[Image:ComputerLogo.png|right |100px]]
 +
;Lab Experiment #4:
 +
: [[Hadoop_Tutorial_1_--_Running_WordCount#Counters | Tutorial 1 on Counters]].  Create counters in your Java version of WordCount and count the number of Map tasks and the number of Reduce tasks.
 +
 +
</greenbox>
 +
<br />
 +
<br />
 +
  
 
=Running WordCount in Python=
 
=Running WordCount in Python=
Line 428: Line 448:
 
<greenbox>
 
<greenbox>
 
[[Image:ComputerLogo.png|right |100px]]
 
[[Image:ComputerLogo.png|right |100px]]
;Lab Experiment #4:  
+
;Lab Experiment #5:  
 
: [[Hadoop Tutorial 2 -- Running WordCount in Python | Tutorial 2]] on running Python programs with MapReduce/Hadoop.
 
: [[Hadoop Tutorial 2 -- Running WordCount in Python | Tutorial 2]] on running Python programs with MapReduce/Hadoop.
  

Latest revision as of 08:16, 6 April 2010

