CSC352 2017 DT's Notes
--D. Thiebaut (talk) 13:25, 14 November 2016 (EST)
<onlydft>
2013
Threads
Programs
- NQueens.py
- threadedNQueens.py
- classExample1.py
- classExample2.py
- classExample3.py
- serialPing.py
- threadedPing.py
- searchKeywordsRetrieveEtexts.py
- threadedSearchKeywordRetrieveEtexts.py
Setting up documents and swish-e
(Note: there are two other alternatives, Sphinx and Zend Lucene. Sphinx requires the data in XML form or in a MySQL database.)
- all operations performed on xgridmac
- downloaded & installed swish-e. Install dir is ~thiebaut/research/swish-e/
- downloaded a mirror of www.etext.org and installed it in http://xgridmac.dyndns.org/~thiebaut/www_etext_org/
- swish-e index stored in ~thiebaut/Site/swish-e
- added swishe.php in ~thiebaut/Site/swish-e/
- test:
cd
cd Site/swish-e
php swishe.php search=love
...
<br>
<br>rank: 20
<br>score: 809
<br>url: http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html
<br>link: <a href="http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html">link</a>
<br>file: Keys2LovingUnity.html
<br>offset: 47813
<br>
- test on the Web at URL http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?search=love
- test with delay: http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?delay=20&search='local%20government'
Here, delay is the number of tenths of a second to wait. It is only an upper bound: the actual delay is random, between 0.1 s and delay/10 s (e.g., at most 2 s for delay=20).
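A minimal Python sketch of a client for the test page, assuming swishe.php takes exactly the two query parameters shown above (search, plus an optional delay in tenths of a second). The helper names build_query_url, delay_bounds and fetch are hypothetical, and the host may no longer be reachable, so nothing is downloaded unless fetch is called. The original example wraps the phrase in single quotes; include them in search if the script expects them.

```python
from urllib.parse import urlencode, quote
from urllib.request import urlopen

# Test page URL from the notes above.
BASE_URL = "http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php"


def build_query_url(search, delay=None, base=BASE_URL):
    """Build a swishe.php query URL; `delay` is in tenths of a second."""
    params = {}
    if delay is not None:
        params["delay"] = int(delay)
    params["search"] = search
    # quote_via=quote encodes spaces as %20 (as in the notes) instead of '+'.
    return base + "?" + urlencode(params, quote_via=quote)


def delay_bounds(delay):
    """Return the (min, max) server-side wait in seconds for a delay value."""
    return 0.1, delay / 10


def fetch(url, timeout=30):
    """Download the result page (HTML with <br> separators) as text."""
    with urlopen(url, timeout=timeout) as resp:
        return resp.read().decode("utf-8", errors="replace")


if __name__ == "__main__":
    print(build_query_url("local government", delay=20))
```

When timing threaded clients against this page, delay_bounds gives the range of per-request waits to expect.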
Project
- Project 1
- Threading in Python: given two lists of keywords, List1 and List2, retrieve the documents that match List1 from a site (xgridmac.dyndns.org, Yahoo, Google). Filter the documents received and keep only those that contain most of the words in List2.
- Project 2
- XGrid: process a gzipped XML dump of Wikipedia and break it up into individual pages (9 million or so of them)!
- Project 3
- Map-Reduce: process Wikipedia pages and create an index of words and their associated categories
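Project 1 can be sketched in Python as follows. This is an illustration under stated assumptions, not the assignment's reference solution: fetch_matching_docs is a hypothetical stand-in that returns canned text (a real version would query xgridmac.dyndns.org or a search engine), "most of the words in List2" is assumed to mean strictly more than half, and a concurrent.futures thread pool is used in place of hand-managed threading.Thread objects.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical canned "search results", so the sketch runs offline.
FAKE_SITE = {
    "love": ["love and unity in the community", "a love letter"],
    "government": ["local government and community budgets"],
}


def fetch_matching_docs(keyword):
    """Hypothetical stand-in for one search request for a List1 keyword.

    A real version would issue an HTTP query and return the documents."""
    return FAKE_SITE.get(keyword, [])


def contains_most(doc, keywords, threshold=0.5):
    """True if strictly more than `threshold` of `keywords` occur in `doc`."""
    words = set(doc.lower().split())
    hits = sum(1 for k in keywords if k.lower() in words)
    return hits > threshold * len(keywords)


def retrieve_and_filter(list1, list2, max_workers=8):
    """Fetch docs for every List1 keyword in parallel, then filter by List2."""
    # One task per keyword; the threads overlap the (network) waiting time.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        batches = list(pool.map(fetch_matching_docs, list1))
    # Merge the batches, dropping documents returned for several keywords.
    unique_docs = list(dict.fromkeys(doc for batch in batches for doc in batch))
    return [doc for doc in unique_docs if contains_most(doc, list2)]


if __name__ == "__main__":
    print(retrieve_and_filter(["love", "government"],
                              ["community", "unity", "love"]))
```

Because the work is dominated by waiting on the network, threads pay off here even under CPython's global interpreter lock.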
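For Project 2, the splitting step alone can be sketched with the standard library, assuming the dump follows the usual MediaWiki export layout (`<page>` elements holding a `<title>` and a `<revision>/<text>`, all in an XML namespace). Distributing the work over XGrid agents is not shown, and the function names are hypothetical.

```python
import gzip
import io
import xml.etree.ElementTree as ET


def local_name(tag):
    """Drop the namespace from a tag like '{http://www.mediawiki.org/...}page'."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(source):
    """Yield (title, text) for each <page> of a gzipped MediaWiki XML dump.

    `source` is a file name or a binary file object holding gzip data."""
    with gzip.open(source, "rb") as stream:
        for _event, elem in ET.iterparse(stream, events=("end",)):
            if local_name(elem.tag) != "page":
                continue
            title, text = "", ""
            for child in elem.iter():
                name = local_name(child.tag)
                if name == "title":
                    title = child.text or ""
                elif name == "text":
                    text = child.text or ""
            yield title, text
            # Free the page's contents; the full dump would not fit in RAM.
            elem.clear()


def iter_pages_from_bytes(gz_bytes):
    """Same as iter_pages, for a small gzipped dump already in memory."""
    return iter_pages(io.BytesIO(gz_bytes))
```

Streaming with ET.iterparse and clearing each page keeps memory roughly constant, which matters with millions of pages. Each (title, text) pair could then be written to its own file or handed to a grid job.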
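The Map-Reduce idea behind Project 3 can be simulated in plain Python. A map function emits (word, category) pairs for each page, and a reduce step groups the categories under each word. This only mimics the two phases in memory and does not use Hadoop or any Map-Reduce framework. It assumes categories come from `[[Category:...]]` links in the wikitext and treats words as lowercase ASCII letter runs. All names are hypothetical.

```python
import re
from collections import defaultdict

# Category links look like [[Category:Name]] or [[Category:Name|sortkey]].
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")
WORD_RE = re.compile(r"[a-z]+")


def map_page(text):
    """Map phase: emit (word, category) pairs for one page's wikitext."""
    categories = [c.strip() for c in CATEGORY_RE.findall(text)]
    # Remove the category links so their markup is not indexed as words.
    body = CATEGORY_RE.sub(" ", text).lower()
    for word in set(WORD_RE.findall(body)):
        for cat in categories:
            yield word, cat


def reduce_pairs(pairs):
    """Shuffle + reduce phase: collect the categories seen for each word."""
    index = defaultdict(set)
    for word, cat in pairs:
        index[word].add(cat)
    return {word: sorted(cats) for word, cats in index.items()}


def build_index(pages):
    """Run the map phase over every page, then reduce all emitted pairs."""
    pairs = (pair for text in pages for pair in map_page(text))
    return reduce_pairs(pairs)
```

On a real cluster, the map calls would run in parallel over page splits and the framework would perform the grouping that reduce_pairs does here.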
Papers
Notes on the "View from Berkeley" paper
2017
Ideas
- LaTeX still important
- Last 2 weeks: presentations, 10 minutes each, on a subject we didn't cover. Deliverables: a 10-minute presentation plus a 2-page prospectus.
- GPU
- Deep learning
- TOP500: why, what, what do we learn, lessons?
- CUDA
- OpenMP
- Debugging parallel programs
- TensorFlow
- Vampir: trace analyzer tool
- TotalView: debugger
- C/C++ tutorial (still good)
- Optimization options in C: -O, -O2, -O3
Resources
Spark
- Apache Spark: A Unified Engine for Big Data Processing, Matei Zaharia et al. (PDF)
</onlydft>