--D. Thiebaut 13:25, 14 November 2016 (EST)

2013

Threads

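Good example with multiple ping processes: [1]
Multiple cores are not used by Python threads: in CPython, the Global Interpreter Lock lets only one thread execute Python bytecode at a time [2]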
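A minimal sketch of the pattern referenced above: several threads, each waiting on its own external process. The child processes can run on separate cores even though the Python threads themselves take turns under the GIL, so the threads pay off when they mostly wait. To keep the sketch runnable without a network, each "ping" is replaced by a child Python process that sleeps 0.5 s; a real version would run something like `["ping", "-c", "3", host]` instead. The host names and the functions `run_worker` and `ping_all` are hypothetical.

```python
import subprocess
import sys
import threading
import time


def run_worker(host, results):
    """Stand-in for 'ping host': run one child process and record its exit code.

    Hypothetical: a sleeping Python child replaces ["ping", "-c", "3", host]
    so that the sketch runs without network access.
    """
    cmd = [sys.executable, "-c", "import time; time.sleep(0.5)"]
    completed = subprocess.run(cmd, capture_output=True, text=True)
    results[host] = completed.returncode  # each thread writes only its own key


def ping_all(hosts):
    """Start one thread per host, wait for all of them, return (results, seconds)."""
    results = {}
    threads = [threading.Thread(target=run_worker, args=(h, results)) for h in hosts]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()  # block until every child process has finished
    return results, time.perf_counter() - start


if __name__ == "__main__":
    hosts = ["host1", "host2", "host3", "host4"]  # hypothetical host names
    results, elapsed = ping_all(hosts)
    print(results, f"{elapsed:.2f} s")
```

With four hosts, the total time should be close to one child's run time rather than four times it. A CPU-bound loop in the threads themselves would not show this speedup, which is the point of note [2].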

Programs

Setting up documents and swish-e

(Note: there are two other alternatives: Sphinx and zend-lucene. Sphinx requires the data in XML form or in a MySQL database.)

All operations were done on xgridmac.
Downloaded and installed swish-e. The install directory is ~thiebaut/research/swish-e/
Downloaded www.etext.org and installed it in http://xgridmac.dyndns.org/~thiebaut/www_etext_org/
The swish-e index is stored in ~thiebaut/Site/swish-e
Added swishe.php in ~thiebaut/Site/swish-e/
Test:

   cd 
   cd Site/swish-e
   php swishe.php search=love
  ...
 <br>
 <br>rank:   20
 <br>score:  809
 <br>url:    http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html
 <br>link:   <a href="http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html">link</a>
 <br>file:   Keys2LovingUnity.html
 <br>offset: 47813
 <br>
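The script prints one block of `<br>field: value` lines per hit. A client (for instance in Project 1) has to turn that text back into records; a minimal Python sketch follows, assuming that each hit starts with a `rank:` line and uses the field names shown in the output above. The function `parse_hits` is hypothetical.

```python
import re

# One "<br>field: value" line, e.g. " <br>score:  809"
FIELD_RE = re.compile(r"<br>\s*(\w+):\s*(.*?)\s*$")
INT_FIELDS = {"rank", "score", "offset"}


def parse_hits(output):
    """Parse swishe.php text output into a list of dicts, one per hit.

    Assumption: every hit begins with a 'rank:' line. Bare '<br>' separator
    lines and '...' lines do not match FIELD_RE and are skipped.
    """
    hits = []
    current = None
    for line in output.splitlines():
        m = FIELD_RE.search(line)
        if not m:
            continue
        key, value = m.group(1), m.group(2)
        if key == "rank":
            current = {}
            hits.append(current)
        if current is not None:
            current[key] = int(value) if key in INT_FIELDS else value
    return hits
```

The `url` field of each record is what a threaded fetcher would download next.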

Test on the Web at http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?search=love
Test with a delay: http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?delay=20&search='local%20government'

Here delay is the number of tenths of a second to wait. It is only an upper bound: the actual delay is random, between 0.1 s and delay × 0.1 s (at most 2 s for delay=20).
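The PHP source of swishe.php is not shown in these notes, so the following is only a Python model of the two URL parameters as described: building a query URL with `delay` and `search`, and drawing a delay between 0.1 s and `delay` × 0.1 s. Drawing whole tenths uniformly is an assumption; the real script may use a continuous range. `build_query_url` and `simulated_delay` are hypothetical names.

```python
import random
from urllib.parse import quote, urlencode

BASE_URL = "http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php"


def build_query_url(search, delay=None):
    """Build a swishe.php query URL; spaces are encoded as %20, as in the notes."""
    params = {}
    if delay is not None:
        params["delay"] = delay
    params["search"] = search
    return BASE_URL + "?" + urlencode(params, quote_via=quote)


def simulated_delay(delay_tenths, rng=random):
    """Model of the server-side wait, in seconds.

    Assumption: a whole number of tenths drawn uniformly from 1..delay_tenths,
    so the result lies between 0.1 s and delay_tenths * 0.1 s.
    """
    return rng.randint(1, max(1, delay_tenths)) / 10
```

The `'local government'` query from the notes becomes `...?delay=20&search=%27local%20government%27`; the server decodes `%27` back to the quote characters.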

Project

Project 1: Threading in Python: given two lists of keywords, List1 and List2, retrieve documents matching List1 from a site (xgridmac.dyndns.org, Yahoo, Google). Filter the documents received and keep only those that contain most of the words in List2.
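A minimal sketch of Project 1's shape, under stated assumptions: `search` and `fetch` are hypothetical callables supplied by the caller (for example a wrapper around the swishe.php URL and an HTTP GET), and "most of the words in List2" is read as "more than half". The sketch uses `concurrent.futures.ThreadPoolExecutor`, a thread pool built on top of Python threads, rather than managing `threading.Thread` objects by hand.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def words_of(text):
    """Lower-cased set of words in a document."""
    return set(re.findall(r"[a-z']+", text.lower()))


def matches_most(text, list2, threshold=0.5):
    """True if more than `threshold` of the List2 keywords occur in text (assumed meaning of 'most')."""
    if not list2:
        return True
    present = words_of(text)
    found = sum(1 for w in list2 if w.lower() in present)
    return found / len(list2) > threshold


def retrieve_and_filter(list1, list2, search, fetch, max_workers=8):
    """Find URLs matching List1, fetch them in parallel threads, keep those matching List2.

    `search(list1) -> list of URLs` and `fetch(url) -> text` are hypothetical callables.
    """
    urls = search(list1)
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        docs = list(pool.map(fetch, urls))  # downloads overlap; results keep URL order
    return [url for url, text in zip(urls, docs) if matches_most(text, list2)]
```

Because fetching is I/O-bound, the threads spend most of their time waiting on the network, which is where Python threads help despite the GIL.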

Project 2: XGrid: process a gzipped XML dump of Wikipedia and break it up into individual pages (9 million or so of them)!
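The splitting step of Project 2 can be sketched in Python by streaming the compressed dump instead of loading it whole. This assumes the usual MediaWiki export layout (`<page>` elements holding a `<title>` and a `<revision>` with a `<text>`, in an XML namespace); `iter_pages` is a hypothetical name, and the XGrid job distribution itself is not shown.

```python
import gzip
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the '{namespace}' prefix that ElementTree puts on tag names."""
    return tag.rsplit("}", 1)[-1]


def iter_pages(path):
    """Yield (title, text) for each <page> in a gzipped MediaWiki XML dump.

    Streams with iterparse and clears each page after use so memory stays
    roughly constant instead of growing with the whole dump.
    """
    with gzip.open(path, "rb") as f:
        for _event, elem in ET.iterparse(f, events=("end",)):
            if local_name(elem.tag) != "page":
                continue
            title, text = None, ""
            for child in elem.iter():
                name = local_name(child.tag)
                if name == "title":
                    title = child.text
                elif name == "text":
                    text = child.text or ""
            yield title, text
            elem.clear()  # drop this page's subtree
```

Each yielded page could then be written to its own file or handed to a worker. For a dump with millions of pages, one would also detach the cleared `<page>` elements from the root so the empty shells do not accumulate.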

Project 3: Map-Reduce: process Wikipedia pages and create an index of words and their associated categories.
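A minimal in-process sketch of the map, shuffle, and reduce steps for Project 3, assuming categories are read from MediaWiki `[[Category:Name]]` links (optionally with a `|sort key`) in the page text. In a real Hadoop run the mapper and reducer would be separate programs and the framework would do the shuffle; here `shuffle` simulates it with a dictionary. All function names are hypothetical.

```python
import re
from collections import defaultdict

# [[Category:Name]] or [[Category:Name|sort key]] in MediaWiki markup
CATEGORY_RE = re.compile(r"\[\[Category:([^\]|]+)(?:\|[^\]]*)?\]\]")
WORD_RE = re.compile(r"[a-z]+")


def map_page(text):
    """Map: emit (word, category) for every distinct word and category of one page."""
    categories = {c.strip() for c in CATEGORY_RE.findall(text)}
    body = CATEGORY_RE.sub(" ", text)  # do not index the category links themselves
    for word in set(WORD_RE.findall(body.lower())):
        for category in categories:
            yield word, category


def shuffle(pairs):
    """Group mapper output by key, as the framework would between map and reduce."""
    groups = defaultdict(list)
    for word, category in pairs:
        groups[word].append(category)
    return groups


def reduce_word(word, categories):
    """Reduce: one sorted, de-duplicated category list per word."""
    return word, sorted(set(categories))


def build_index(pages):
    pairs = (pair for text in pages for pair in map_page(text))
    return dict(reduce_word(w, cats) for w, cats in shuffle(pairs).items())
```

Pages without any category link contribute nothing to this index, which matches the goal of mapping words to categories.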

Papers

Notes on the "View from Berkeley" paper

2017

Ideas

LaTeX still important.
Last two weeks: presentations, 10 minutes each, on a subject that we didn't cover. Deliverables: a 10-minute presentation plus a 2-page prospectus.
- GPU
- Deep learning
- TOP500: why, what, what do we learn, lessons?
- CUDA
- OpenMP
- Debugging parallel programs
- TensorFlow
- Vampir: trace analyzer tool
- TotalView: debugger

C/C++ tutorial: still good.
Optimization options in C: -O, -O2, -O3

Resources

Spark

Apache Spark: A Unified Engine for Big Data Processing, Matei Zaharia et al. (PDF)

CSC352 2017 DT's Notes
