CSC352 Homework 4
This homework is due 4/20/10, at midnight.
Problem #1
Below is the timeline of the WordCount program running on 4 XML files of 180 MB each, on a cluster of 6 single-core Linux servers (data available here). The files are stored in HDFS on our cluster, in wikipages/block.
- Question
- Explain the camel-back shape of the timeline, in particular the dip in the middle of the Map tasks.