CSC352 Homework 3

Revision as of 15:34, 9 March 2010 by Thiebaut

Programming the XGrid

The class decided on the contents of this homework, and its due date: March 30th.

Problem Statement

Process N wiki pages. For each page:

  • keep track of the categories the page belongs to
  • find its 5 most frequent words (not including stop words)
  • add those words to each of the page's categories, so that every category accumulates the most frequent words associated with it over the N pages processed

Once all N pages have been processed:

  • output the result (or a sample of it)
  • measure the execution time of the program
  • write a summary of your work, following the guidelines presented in class (3/9, 3/11).
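
The per-page steps above can be sketched in Python as follows. This is only a minimal, single-machine illustration, not the required solution. It assumes that each page is available as a string of wiki markup in which categories appear as [[Category:Name]] tags, and that "associating" words with a category means counting how many pages of that category had the word among their top 5. The stop-word list and the names page_categories, top_words, and process_pages are hypothetical, and fetching pages and distributing the work over the XGrid are left out.

```python
import re
import time
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list (the assignment does not fix one).
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it", "for", "on", "with"}

# Assumption: categories appear in the page text as [[Category:Name]] or [[Category:Name|key]].
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")
WORD_RE = re.compile(r"[a-z]+")


def page_categories(text):
    """Return the category names found in one page."""
    return [name.strip() for name in CATEGORY_RE.findall(text)]


def top_words(text, n=5):
    """Return the n most frequent non-stop words of one page."""
    body = CATEGORY_RE.sub(" ", text)  # do not count the category tags as words
    words = [w for w in WORD_RE.findall(body.lower()) if w not in STOP_WORDS]
    return [word for word, _ in Counter(words).most_common(n)]


def process_pages(pages):
    """Map each category to a Counter of the top words of its pages."""
    by_category = defaultdict(Counter)
    for text in pages:
        words = top_words(text)
        for category in page_categories(text):
            # Each page contributes one count per top word to each of its categories.
            by_category[category].update(words)
    return by_category


if __name__ == "__main__":
    # Made-up sample pages, for illustration only.
    sample_pages = [
        "Grid grid grid computing computing the node [[Category:XGrid]]",
        "grid grid cluster [[Category:XGrid]] [[Category:Hardware|hw]]",
    ]
    start = time.perf_counter()
    result = process_pages(sample_pages)
    elapsed = time.perf_counter() - start
    for category, counts in sorted(result.items()):
        print(category, counts.most_common(5))
    print(f"elapsed: {elapsed:.6f} s")
```

Timing only the call to process_pages keeps the measurement separate from printing. In a grid version, a time that also covers job submission and result collection would likely be the more meaningful figure to report.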

Details

The details of how to obtain the IDs of wiki pages and how to fetch the pages themselves are presented in XGrid Lab 2.