Difference between revisions of "CSC352 MapReduce/Hadoop Class Notes"
(6 intermediate revisions by the same user not shown) | |||
Line 197: | Line 197: | ||
[[Image:ComputerLogo.png|100px|right]] | [[Image:ComputerLogo.png|100px|right]] | ||
;Lab Experiment 1 | ;Lab Experiment 1 | ||
− | :Jump to the first Hadoop/MapReduce [[Hadoop_Tutorial_1_--_Running_WordCount | Lab #1]]! | + | :Jump to the first Hadoop/MapReduce [[Hadoop_Tutorial_1_--_Running_WordCount | Lab #1]]! Run all the sections up to, but not including, Section 4. |
</greenbox> | </greenbox> | ||
Line 389: | Line 389: | ||
</pre></code> | </pre></code> | ||
|} | |} | ||
+ | |||
+ | =Compiling Your Own Version of the WordCount Program= | ||
+ | |||
+ | * This is illustrated and explained in Section 4 of [[Hadoop_Tutorial_1_--_Running_WordCount#Running_Your_Own_Version_of_WordCount.java | Tutorial #1: Compiling your own version of WordCount.java]] | ||
+ | |||
+ | =How does Hadoop on 6 compare to Linux on 1?= | ||
+ | |||
+ | * This is very interesting! | ||
+ | |||
+ | <greenbox> | ||
+ | [[Image:ComputerLogo.png|100px|right]] | ||
+ | ;Lab Experiment 2 | ||
+ | :Jump to Section 5 of the [[Hadoop_Tutorial_1_--_Running_WordCount | Hadoop Lab #1]] and see how Hadoop compares with basic Linux for Ulysses, and for Ulysses plus 5 other books. | ||
+ | |||
+ | |||
+ | </greenbox> | ||
+ | <br /> | ||
+ | <br /> | ||
+ | |||
+ | ;Question 1 | ||
+ | : Comment on the timings you observe for 1 book and for 6 books. | ||
+ | |||
+ | ;Question 2 | ||
+ | : There are 4 large files in the HDFS, in '''wikipages/block/'''. Each is approximately 180 MByte in size. Run another experiment and compare the execution time of Hadoop on the 4 files (~3/4 GByte) with that of one of the Linux boxes on the same 4 files using Linux commands. Compare the execution times again. | ||
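For a single-machine reference point, the sketch below times a plain word count over a list of local files. It is only an illustration and is not taken from the tutorial: the question asks for Linux commands, and this Python stand-in simply gives a second local baseline. The function names <code>count_words</code> and <code>timed</code> are hypothetical, and the files are assumed to have been copied out of HDFS to the local disk first.

```python
import sys
import time
from collections import Counter


def count_words(paths):
    """Count whitespace-separated words across all the given local files."""
    counts = Counter()
    for path in paths:
        # errors="replace" keeps the count going if a file has odd bytes.
        with open(path, encoding="utf-8", errors="replace") as f:
            for line in f:
                counts.update(line.split())
    return counts


def timed(func, *args):
    """Run func(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start


if __name__ == "__main__":
    # Usage (file names are placeholders): python local_wordcount.py file1 file2 ...
    counts, seconds = timed(count_words, sys.argv[1:])
    print("%d distinct words in %.2f s" % (len(counts), seconds))
```

Wall-clock time is measured here because that is what you observe when waiting for a Hadoop job to finish; the two numbers are only roughly comparable, since the Hadoop figure also includes job start-up overhead.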
+ | =Generating Task Timelines= | ||
+ | |||
+ | <br /> | ||
+ | <br /> | ||
+ | <greenbox> | ||
+ | [[Image:ComputerLogo.png|right |100px]] | ||
+ | ;Lab Experiment #3: | ||
+ | : [[Hadoop Tutorial 1.1 -- Generating Task Timelines | Tutorial 1.1]] on generating '''Timelines'''. | ||
+ | |||
+ | </greenbox> | ||
+ | <br /> | ||
+ | <br /> | ||
+ | |||
+ | =Debugging/Testing using Counters= | ||
+ | |||
+ | Section 6 of [[Hadoop_Tutorial_1_--_Running_WordCount#Counters | Tutorial #1]] shows how to create counters. Hadoop counters are special variables whose values are gathered from each task, accumulated, and reported both during the computation and at its end. They are useful for counting quantities such as the amount of data processed or the number of tasks executed. | ||
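The tutorial creates its counters in Java. As a rough illustration of the same idea, the sketch below uses the Hadoop Streaming convention instead, in which a script updates a counter by writing a line of the form <code>reporter:counter:&lt;group&gt;,&lt;counter&gt;,&lt;amount&gt;</code> to standard error. The group name <code>WordCountSketch</code> and the counter names are hypothetical, chosen only for this example.

```python
import sys


def report_counter(group, name, amount, stream):
    """Write one Hadoop Streaming counter update to the given stream."""
    stream.write("reporter:counter:%s,%s,%d\n" % (group, name, amount))


def map_lines(lines, out=sys.stdout, err=sys.stderr):
    """WordCount mapper: emit 'word<TAB>1' pairs and update counters."""
    # One increment per mapper process; Hadoop sums these across tasks.
    report_counter("WordCountSketch", "MAP_TASKS", 1, err)
    for line in lines:
        words = line.split()
        report_counter("WordCountSketch", "INPUT_LINES", 1, err)
        if not words:
            report_counter("WordCountSketch", "EMPTY_LINES", 1, err)
        for word in words:
            out.write("%s\t1\n" % word)


if __name__ == "__main__":
    map_lines(sys.stdin)
```

Because Hadoop adds up the increments reported by every task, a counter that each mapper bumps exactly once ends up approximating the number of Map tasks, which is the quantity Lab Experiment #4 asks for.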
+ | |||
+ | <br /> | ||
+ | <br /> | ||
+ | <greenbox> | ||
+ | [[Image:ComputerLogo.png|right |100px]] | ||
+ | ;Lab Experiment #4: | ||
+ | : [[Hadoop_Tutorial_1_--_Running_WordCount#Counters | Tutorial 1 on Counters]]. Create counters in your Java version of WordCount and count the number of Map tasks and the number of Reduce tasks. | ||
+ | |||
+ | </greenbox> | ||
+ | <br /> | ||
+ | <br /> | ||
+ | |||
+ | |||
+ | =Running WordCount in Python= | ||
+ | |||
+ | <br /> | ||
+ | <br /> | ||
+ | <greenbox> | ||
+ | [[Image:ComputerLogo.png|right |100px]] | ||
+ | ;Lab Experiment #5: | ||
+ | : [[Hadoop Tutorial 2 -- Running WordCount in Python | Tutorial 2]] on running Python programs with MapReduce/Hadoop. | ||
+ | |||
+ | </greenbox> | ||
+ | <br /> | ||
+ | <br /> | ||
+ | |||
=References= | =References= | ||
Line 408: | Line 474: | ||
<br /> | <br /> | ||
<br /> | <br /> | ||
− | [[Category:CSC352]][[Category:MapReduce]][[Category:Hadoop]] | + | [[Category:CSC352]][[Category:Class Notes]][[Category:MapReduce]][[Category:Hadoop]] |
Latest revision as of 08:16, 6 April 2010