Difference between revisions of "CSC352 Problem of the Day"

[[Category:CSC352]][[Category:Hadoop]][[Category:MapReduce]]

Latest revision as of 09:20, 27 April 2010

Homework #4, Problem #2

  • Conditions:
    • Processing of wiki pages with Hadoop on a 6-PC cluster
    • The same Mapper and Reducer programs are used to process two different input folders



Number of files   Number of wiki pages   Number of categories   Execution time (seconds)
589               589                    832                    388
1                 117,617                51,120                 30.7
Ratio = 589/1     Ratio = 1/199.7        Ratio = 1/61.4         Ratio = 12.6/1
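
To make the comparison concrete, the ratios can be recomputed from the table, along with a per-page processing time that the table does not list. This is only a sketch: the dictionary and function names are illustrative, and the only inputs are the figures in the table above.

```python
# Figures copied from the table above; the names are illustrative only.
many_files = {"files": 589, "pages": 589, "categories": 832, "seconds": 388.0}
one_file = {"files": 1, "pages": 117_617, "categories": 51_120, "seconds": 30.7}


def seconds_per_page(run):
    """Average wall-clock time spent per wiki page in one run."""
    return run["seconds"] / run["pages"]


# How much longer the 589-file run took overall.
time_ratio = many_files["seconds"] / one_file["seconds"]

# How many more pages the single-file run processed.
page_ratio = one_file["pages"] / many_files["pages"]

# Combining both: how much more time each page cost in the 589-file run.
per_page_slowdown = seconds_per_page(many_files) / seconds_per_page(one_file)

print(f"execution time ratio: {time_ratio:.1f} to 1")
print(f"page count ratio:     1 to {page_ratio:.1f}")
print(f"per-page slowdown:    about {per_page_slowdown:.0f}x")
```

The per-page figure is what makes the result surprising: the run with far fewer pages is slower in total, so each page costs several orders of magnitude more time.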



  • Discuss these results
  • If you were to add another column to this table, what quantity would you add?
  • Identify the parties responsible for this surprising difference

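I would add a column showing the number of splits. With file-based input, Hadoop creates at least one split, and therefore at least one map task, per input file, so the 589-file folder requires at least 589 map tasks while the single large file needs only a few. I think the main culprit is HDFS, together with the fact that a lot of data has to flow through a single Ethernet switch that is fairly old and slow.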
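
The split count suggested above can be estimated with a simplified model of file-based input splitting: every non-empty file contributes at least one split, and larger files contribute one split per block-sized chunk. The sketch below rests on assumptions not stated on this page: a 64 MB split size, and hypothetical file sizes (20 KB per small file, 300 MB for the single file). It also ignores the slack Hadoop allows on the last split of a file.

```python
import math

# Assumption: splits are the size of one 64 MB HDFS block.
SPLIT_SIZE = 64 * 1024 * 1024


def estimated_splits(file_sizes, split_size=SPLIT_SIZE):
    """Estimate the number of input splits (one map task each).

    Simplified model: a split never spans two files, so each non-empty
    file yields ceil(size / split_size) splits.
    """
    return sum(math.ceil(size / split_size) for size in file_sizes if size > 0)


# Hypothetical sizes, chosen only to illustrate the effect.
small_files = [20 * 1024] * 589       # 589 files of 20 KB each
single_file = [300 * 1024 * 1024]     # one 300 MB file

print("589 small files:", estimated_splits(small_files), "splits")
print("1 large file:   ", estimated_splits(single_file), "splits")
```

Under these assumptions the small-file folder produces one split per file regardless of how little data each file holds, so the fixed cost of starting a map task is paid hundreds of times.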