Difference between revisions of "CSC352 DT's Class Notes 2013"
Line 43: | Line 43: | ||
* test with delay: http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?delay=20&search='local%20government' | * test with delay: http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?delay=20&search='local%20government' | ||
(Here, delay is the number of tenths of a second to wait. It is an upper bound: the true delay is random between 0.1 sec and the specified integer times 0.1 sec.) | (Here, delay is the number of tenths of a second to wait. It is an upper bound: the true delay is random between 0.1 sec and the specified integer times 0.1 sec.) | ||
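As a rough illustration of the delay parameter, the following Python sketch builds a test URL of the form shown above and computes the upper bound on the wait. The helper names build_test_url and max_delay_seconds are hypothetical, chosen only for this example; the URL layout is copied from the test link and nothing else about the server is assumed.

```python
from urllib.parse import quote

# Base address taken from the test link in these notes.
BASE_URL = "http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php"


def build_test_url(phrase, delay):
    """Build a search URL with a delay (hypothetical helper).

    The phrase is wrapped in single quotes, as in the test link,
    and spaces are percent-encoded.
    """
    quoted_phrase = quote("'" + phrase + "'", safe="'")
    return f"{BASE_URL}?delay={delay}&search={quoted_phrase}"


def max_delay_seconds(delay):
    """Upper bound on the wait: delay counts tenths of a second."""
    return delay / 10


if __name__ == "__main__":
    print(build_test_url("local government", 20))
    print("server waits at most", max_delay_seconds(20), "seconds")
```

With delay=20, the server may wait anywhere from 0.1 sec up to 2 sec before answering, so a client that issues its requests one at a time can spend most of its time idle.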
+ | |||
+ | ==Project== | ||
+ | |||
+ | ;Project 1: | ||
+ | :Threading in Python: given two lists of keywords, List1 and List2, retrieve documents from a site (xgridmac.dyndns.org, Yahoo, Google) that match List1. Filter the documents received and keep only those that contain most of the words in List2.
+ | |||
+ | ;Project 2: | ||
+ | :XGrid: process a gzipped XML dump of Wikipedia and break it up into individual pages (9 million or so of them)!
+ | |||
+ | ;Project 3: | ||
+ | :Map-Reduce: process Wikipedia pages and create an index of words and their associated categories.
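
For Project 1, the filtering step could be sketched as below. This is only one possible reading of the assignment: "most of the words in List2" is assumed to mean strictly more than half, and the names contains_most and filter_docs, along with the fetch argument (any function that maps a URL to its text), are hypothetical and not part of the project handout.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def contains_most(text, keywords, threshold=0.5):
    """Return True if more than `threshold` of the keywords occur in text.

    Assumption: "most" means strictly more than half of the keywords.
    """
    words = set(re.findall(r"\w+", text.lower()))
    hits = sum(1 for k in keywords if k.lower() in words)
    return hits > threshold * len(keywords)


def filter_docs(urls, fetch, list2, max_workers=8):
    """Fetch all URLs with a pool of threads, then keep the matching ones.

    `fetch` is supplied by the caller, so the same code works with a
    real HTTP client or with a stub during testing.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        texts = list(pool.map(fetch, urls))  # results keep the order of urls
    return [url for url, text in zip(urls, texts) if contains_most(text, list2)]


if __name__ == "__main__":
    # Stub documents standing in for pages returned by a List1 search.
    fake_site = {
        "doc1": "Local government budget report.",
        "doc2": "Cooking recipes for the weekend.",
        "doc3": "Government tax changes announced.",
    }
    kept = filter_docs(list(fake_site), fake_site.get,
                       ["government", "budget", "tax"])
    print(kept)
```

Threads suit this task because each fetch spends most of its time waiting on the network, so several requests can overlap even when the server adds a delay to each one.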
==Papers== | ==Papers== |