Difference between revisions of "CSC352 Project 2"
<onlysmith>
__TOC__

<bluebox>
This project is an extension of [[CSC352_Homework_3 | Homework #3]], which is itself built on top of [[XGrid Tutorial Part 2: Processing Wikipedia Pages | XGrid Lab 2]].
</bluebox>

=Assignment=

* Process N wiki pages. For each page, keep track of the categories it contains and find the 5 most frequent words in it (excluding stop words).
* Associate with each category the most frequent words that have been associated with it over the N pages processed.
* Output the result (or a sample of it).
* Measure the execution time of the program.
* Write a summary of it following the guidelines presented in class (3/9, 3/11).
* For this project, build on top of the homework, concentrate on the formatting of the project, and include graphs and an analysis of your results.
* Submit a PDF with your presentation, graphs, and analysis. Submit your programs, even if they are the same as the files you submitted for the homework.

 submit project2 file1
 submit project2 file2
 ...

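The per-page and per-category steps in the assignment can be sketched on a single machine in Python. This is a minimal sketch under stated assumptions, not the homework's code: it assumes each page is available as a string of raw wikitext, that categories appear as [[Category:Name]] links, and that "associating" words with a category means counting how often each word lands in the top-5 list of a page carrying that category. The stop-word list and the function names categories_of(), top_words() and process_pages() are hypothetical, and the sketch does not cover distributing the work over XGrid.

```python
import re
import time
from collections import Counter, defaultdict

# Hypothetical, abbreviated stop-word list; a real run would use a fuller list.
STOP_WORDS = {"the", "a", "an", "and", "or", "of", "to", "in", "is", "it",
              "for", "on", "with", "as", "by", "was", "that", "this", "are"}

# Matches a whole category link: [[Category:Name]] or [[Category:Name|sort key]].
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")
WORD_RE = re.compile(r"[a-z]+")


def categories_of(wikitext):
    """Return the category names declared in a page's wikitext."""
    return [name.strip() for name in CATEGORY_RE.findall(wikitext)]


def top_words(text, k=5):
    """Return the k most frequent non-stop words in text (ties keep first-seen order)."""
    words = (w for w in WORD_RE.findall(text.lower()) if w not in STOP_WORDS)
    return [w for w, _ in Counter(words).most_common(k)]


def process_pages(pages):
    """Map each category to a Counter of how often each word was a top word
    of a page carrying that category (an assumed reading of the assignment)."""
    category_words = defaultdict(Counter)
    for wikitext in pages:
        cats = categories_of(wikitext)
        # Strip the category links so their text is not counted as page words.
        words = top_words(CATEGORY_RE.sub(" ", wikitext))
        for cat in cats:
            category_words[cat].update(words)
    return category_words


if __name__ == "__main__":
    # Two made-up pages standing in for the N Wikipedia pages.
    pages = [
        "Python is a language. Python code is readable. [[Category:Programming]]",
        "Java is a language. Java code runs on the JVM. "
        "[[Category:Programming]] [[Category:Java]]",
    ]
    start = time.perf_counter()
    result = process_pages(pages)
    elapsed = time.perf_counter() - start
    for cat, counts in sorted(result.items()):
        print(cat, counts.most_common(5))
    print(f"processed {len(pages)} pages in {elapsed:.6f} s")
```

Wrapping process_pages() with time.perf_counter() gives one execution-time measurement; repeating the run for several values of N would produce the data points needed for the graphs.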
=Project Details=
==Accessing Wiki Pages==
Revision as of 07:42, 23 March 2010
This project is currently under construction...