Difference between revisions of "CSC352 Project 2"

<onlysmith>
 
__TOC__

<bluebox>
This project is an extension of [[CSC352_Homework_3 | Homework #3]], which is built on top of the [[XGrid Tutorial Part 2: Processing Wikipedia Pages | XGrid Lab 2]].
</bluebox>

=Assignment=

* Process N wiki pages. For each page, keep track of the categories it contains and find its 5 most frequent words (not including stop words).
* Associate with each category the most frequent words that have been associated with it over the N pages processed.
* Output the result (or a sample of it).
* Measure the execution time of the program.
* Write a summary of your work, following the guidelines presented in class (3/9, 3/11).
* For this project, build on top of the homework and concentrate on the formatting of the project. Include graphs and an analysis of your results.
* Submit a pdf with your presentation, graphs, and analysis. Submit your programs, even if they are the same as the files you submitted for the homework.

    submit project2 file1
    submit project2 file2
    ...

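The processing steps listed in the assignment can be sketched in Python as follows. This is only an illustration under several assumptions, not the required implementation: the stop-word list is a tiny placeholder, categories are assumed to appear as standard MediaWiki category links in the raw page text, and "most frequent words associated with a category" is interpreted as the words that most often land in the top 5 of the pages carrying that category. The function names (<code>get_categories</code>, <code>top_words</code>, <code>merge_pages</code>, <code>timed</code>) are hypothetical.

```python
import re
import time
from collections import Counter, defaultdict

# Placeholder stop-word list (an assumption); substitute the list you actually use.
STOP_WORDS = {"the", "a", "an", "of", "and", "in", "to", "is", "it", "for"}

# Matches a whole category link; group 1 is the category name without any sort key.
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)[^\]]*\]\]")


def get_categories(wikitext):
    """Return the category names found in the page's category links."""
    return [name.strip() for name in CATEGORY_RE.findall(wikitext)]


def top_words(wikitext, n=5):
    """Return the n most frequent non-stop words of one page."""
    text = CATEGORY_RE.sub(" ", wikitext)        # do not count the links themselves
    words = re.findall(r"[a-z]+", text.lower())  # crude tokenizer: runs of letters
    counts = Counter(w for w in words if w not in STOP_WORDS)
    return [word for word, _ in counts.most_common(n)]


def merge_pages(pages, n=5):
    """Associate each category with the words most often seen in its pages' top n."""
    per_category = defaultdict(Counter)
    for wikitext in pages:
        words = top_words(wikitext, n)
        for category in get_categories(wikitext):
            per_category[category].update(words)
    return {
        category: [word for word, _ in counts.most_common(n)]
        for category, counts in per_category.items()
    }


def timed(func, *args):
    """Run func(*args) and return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = func(*args)
    return result, time.perf_counter() - start
```

Calling <code>timed(merge_pages, pages)</code> on a list of page texts returns both the category-to-words dictionary and the elapsed time, which can feed the graphs and analysis. Timing the whole run from outside the program (for example with the shell's <code>time</code> command) is an equally reasonable choice.
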
=Project Details=
 
==Accessing Wiki Pages==
 

Revision as of 08:42, 23 March 2010

This project is currently under construction...


This section is only visible to computers located at Smith College