CSC352 Homework 3
Programming the XGrid
The class decided on the contents of this homework and on its due date: March 30th.
Problem Statement
Process N wiki pages and:
- for each page, keep track of the categories it contains
- for each page, find the 5 most frequent words (not including stop words)
- associate with each category the most frequent words that have been associated with it over the N pages processed
- output the result (or a sample of it)
- measure the execution time of the program
- write a summary of it as illustrated in the guidelines presented in class (3/9, 3/11).
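The per-page and per-category steps above can be sketched in plain Python. This is a minimal, sequential sketch, not the required XGrid solution, and it rests on several assumptions that are not part of the assignment. Pages are assumed to be already fetched as raw MediaWiki wikitext (fetching is covered in XGrid Lab 2). Categories are assumed to appear as [[Category:Name]] or [[Category:Name|sort key]] links. The stop-word list is a small, hypothetical one. "Most frequent words associated with a category" is read here as the number of pages in that category whose top-5 list contains the word, which is only one possible interpretation. The helper names categories_of, top_words and process_pages are illustrative.

```python
import re
import time
from collections import Counter, defaultdict

# Hypothetical, deliberately tiny stop-word list; a real run would use a fuller one.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "it",
              "for", "on", "was", "as", "by", "with"}

# Assumed MediaWiki syntax: [[Category:Name]] or [[Category:Name|sort key]].
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]", re.IGNORECASE)
WORD_RE = re.compile(r"[a-z]+")


def categories_of(wikitext):
    """Return the set of category names linked from one page."""
    return {name.strip() for name in CATEGORY_RE.findall(wikitext)}


def top_words(wikitext, k=5):
    """Return up to k of the most frequent non-stop words in one page."""
    # Remove category links so their names are not counted as body words.
    body = CATEGORY_RE.sub(" ", wikitext).lower()
    counts = Counter(w for w in WORD_RE.findall(body) if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(k)]


def process_pages(pages, k=5):
    """Map each category to a Counter: word -> number of pages in that
    category whose top-k word list contains the word."""
    by_category = defaultdict(Counter)
    for wikitext in pages:
        words = top_words(wikitext, k)
        for category in categories_of(wikitext):
            by_category[category].update(words)
    return by_category


if __name__ == "__main__":
    # Two made-up pages standing in for fetched wiki pages.
    pages = [
        "Grids run jobs. Grids share jobs. [[Category:Computing]]",
        "Nodes in a grid run many jobs. [[Category:Computing]] [[Category:Clusters|grid]]",
    ]
    start = time.perf_counter()  # measure execution time of the processing step
    result = process_pages(pages)
    elapsed = time.perf_counter() - start
    for category, counter in sorted(result.items()):
        print(category, counter.most_common(5))
    print(f"elapsed: {elapsed:.6f} s")
```

The sketch leaves out distributing pages over XGrid agents. If each agent returned its own per-category Counters, they could be merged on the client with Counter.update before printing the final result.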
Details
The details of how to obtain the IDs of wiki pages and how to fetch the pages are presented in XGrid Lab 2.