CSC352 Problem of the Day
Latest revision as of 09:20, 27 April 2010
Homework #4, Problem #2
- Conditions:
  - Processing of wiki pages with Hadoop on a 6-PC cluster
  - The same Mapper and the same Reducer program process two different input folders
Number of files | Number of wiki pages | Number of categories | Execution Time (seconds) |
---|---|---|---|
589 | 589 | 832 | 388 |
1 | 117,617 | 51,120 | 30.7 |
Ratio=589/1 | Ratio=1/199.7 | Ratio=1/61.4 | Ratio=12.6/1 |
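
The ratios in the last row can be checked with a few lines of Python. The sketch below only reuses the four figures from each row of the table. The dictionary keys and helper names are made up for illustration. It also derives the throughput of each run in pages per second, which is not in the table but makes the gap easier to see.

```python
# Figures copied from the two data rows of the table.
many_small = {"files": 589, "pages": 589, "categories": 832, "seconds": 388.0}
one_large = {"files": 1, "pages": 117_617, "categories": 51_120, "seconds": 30.7}


def ratio(key):
    """Ratio of the many-small-files run to the single-large-file run."""
    return many_small[key] / one_large[key]


def pages_per_second(run):
    """Throughput of one run, in wiki pages processed per second."""
    return run["pages"] / run["seconds"]


time_ratio = ratio("seconds")            # about 12.6, as in the table
page_ratio = 1 / ratio("pages")          # about 199.7, i.e. 1/199.7
category_ratio = 1 / ratio("categories") # about 61.4, i.e. 1/61.4

# How many times more pages per second the single-file run processes.
throughput_gap = pages_per_second(one_large) / pages_per_second(many_small)

print(f"time ratio      {time_ratio:.1f}/1")
print(f"page ratio      1/{page_ratio:.1f}")
print(f"category ratio  1/{category_ratio:.1f}")
print(f"throughput gap  {throughput_gap:.0f}x")
```

Taken together, the run on 589 small files handles about 200 times fewer pages yet takes over 12 times longer, so its per-page cost is higher by a factor in the low thousands.
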
- Discuss these results
- If you were to add another column to this table, what quantity would you add?
- Identify the parties responsible for this surprising difference
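I would add a column showing the number of input splits. With Hadoop's default file-based input handling, each of the 589 files yields at least one split, and therefore at least one map task with its own startup cost. I think the main culprit is HDFS, together with the fact that a lot of data has to flow through a single Ethernet switch that is old and slow.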
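
To illustrate why the number of splits is a useful extra column, here is a minimal sketch that estimates it from file sizes. It is a simplification of Hadoop's actual split computation: it assumes every non-empty file is cut into block-sized pieces and that a split never spans two files. The 64 MB block size and all file sizes below are assumptions chosen for illustration; the source does not give them.

```python
import math

# Assumed HDFS block size (64 MB); the source does not state it.
BLOCK_SIZE = 64 * 1024 * 1024


def estimate_splits(file_sizes, block_size=BLOCK_SIZE):
    """Simplified estimate: each non-empty file yields ceil(size / block_size) splits."""
    return sum(math.ceil(size / block_size) for size in file_sizes if size > 0)


# Hypothetical file sizes, for illustration only.
small_files = [20 * 1024] * 589       # 589 files of 20 KB each
large_file = [300 * 1024 * 1024]      # one 300 MB file

small_splits = estimate_splits(small_files)
large_splits = estimate_splits(large_file)

print("splits for 589 small files:", small_splits)
print("splits for 1 large file:  ", large_splits)
```

Under these assumptions, the folder of small files needs one split per file regardless of how small each file is, while the single file needs only as many splits as it has blocks.
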