CSC352 Problem of the Day

Revision as of 09:17, 27 April 2010 by Thiebaut (talk | contribs) (Homework #4, Problem #2)

Homework #4, Problem #2

  • Conditions:
    • Wiki pages are processed with Hadoop on a 6-PC cluster
    • The same Mapper and the same Reducer program process two different input folders



Number of files   Number of wiki pages   Number of categories   Execution time (seconds)
589               589                    832                    388
1                 117,617                51,120                 30.7
Ratio=589/1       Ratio=1/199            Ratio=1/61.4           Ratio=12.6/1
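
The ratio row can be rechecked with a few lines of Python. This is only a sketch: the dictionary keys and the `ratio` helper are illustrative names, not part of the homework code, and the numbers are copied from the table. Computed this way, the wiki-page ratio comes out at roughly 1/199.7, which the table shows as 1/199.

```python
# Values copied from the table; the names below are illustrative assumptions.
many_files = {"files": 589, "pages": 589, "categories": 832, "seconds": 388.0}
one_file = {"files": 1, "pages": 117_617, "categories": 51_120, "seconds": 30.7}


def ratio(key):
    """Return the first-row value divided by the second-row value for one column."""
    return many_files[key] / one_file[key]


for key in many_files:
    r = ratio(key)
    # Ratios below 1 are printed in the table's "1/x" form, the others as "x/1".
    text = f"{r:.1f}/1" if r >= 1 else f"1/{1 / r:.1f}"
    print(f"{key:<10} Ratio={text}")
```

The first run handles about 200 times fewer pages yet takes about 12.6 times longer, which is the discrepancy the questions below ask about.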



  • Discuss these results
  • If you were to add another column to this table, what quantity would you add?
  • Identify the parties responsible for this surprising difference in execution time