CSC352 2017 DT's Notes

From dftwiki3
Revision as of 15:02, 7 December 2016 by Thiebaut (talk | contribs)

--D. Thiebaut (talk) 13:25, 14 November 2016 (EST)


<onlydft>

2013

Threads

  • good example with multiple ping processes: [1]
  • Python threads do not take advantage of multiple cores [2]

Programs

Setting up documents and swish-e

(Note: there are two alternatives to swish-e: Sphinx and zend-lucene. Sphinx requires the data to be in XML form or in a MySQL database.)

   cd 
   cd Site/swish-e
   php swishe.php search=love
  ...
 <br>
 <br>rank:   20
 <br>score:  809
 <br>url:    http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html
 <br>link:   <a href="http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html">link</a>
 <br>file:   Keys2LovingUnity.html
 <br>offset: 47813
 <br>

Here, delay is the number of tenths of a second to wait. It is an upper bound: the true delay is random, between 0.1 second and delay × 0.1 seconds.
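The delay rule above can be sketched in Python as follows. This is only an illustration under stated assumptions: the names pick_delay_seconds and wait_random are hypothetical, and the uniform distribution is a guess, since the notes only say the delay is random within the two bounds.

```python
import random
import time


def pick_delay_seconds(delay):
    """Return a random wait, in seconds, between 0.1 s and delay * 0.1 s.

    `delay` is an integer count of tenths of a second (the upper bound).
    Assumption: the draw is uniform; the notes do not name a distribution.
    """
    upper = max(1, int(delay)) * 0.1   # never go below the 0.1 s floor
    return random.uniform(0.1, upper)


def wait_random(delay):
    """Sleep for a randomly chosen time and return the number of seconds slept."""
    seconds = pick_delay_seconds(delay)
    time.sleep(seconds)
    return seconds
```

For instance, wait_random(5) would pause for somewhere between 0.1 and 0.5 seconds.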

Project

Project 1
Threading in Python: given two lists of keywords, List1 and List2, retrieve the docs from a site (xgridmac.dyndns.org, yahoo, google) that match List1. Filter the docs received and keep only those that contain most of the words in List2.
Project 2
XGrid: process a gzipped XML dump of Wikipedia and break it up into individual pages (9 million or so of them)!
Project 3
Map-Reduce: process Wikipedia pages and create an index of words and their associated categories.
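As a rough illustration of Project 1, the sketch below runs one thread per List1 keyword to collect matching docs, then one thread per doc to apply the List2 filter. Everything in it is an assumption made for the example: FAKE_SITE is a hypothetical in-memory stand-in for the remote site, search and fetch are placeholder helpers rather than a real search API, and "most of the words" is read as "more than half".

```python
import threading

# Hypothetical in-memory stand-in for the remote site: doc id -> text.
FAKE_SITE = {
    "doc1": "parallel threads share memory and need locks",
    "doc2": "cooking pasta with tomato sauce",
    "doc3": "threads and processes in parallel programs",
}


def search(keyword):
    """Placeholder query: ids of the docs that contain the keyword."""
    return [doc for doc, text in FAKE_SITE.items() if keyword in text.split()]


def fetch(doc):
    """Placeholder download: return the text of one doc."""
    return FAKE_SITE[doc]


def has_most_words(text, list2):
    """True if more than half of the List2 words occur in the text (assumed rule)."""
    words = set(text.lower().split())
    hits = sum(1 for w in list2 if w.lower() in words)
    return hits > len(list2) / 2


def run_in_threads(target, items):
    """Start one thread per item and wait for all of them to finish."""
    threads = [threading.Thread(target=target, args=(item,)) for item in items]
    for t in threads:
        t.start()
    for t in threads:
        t.join()


def find_docs(list1, list2):
    """Return the sorted ids of docs matching List1 that also pass the List2 filter."""
    lock = threading.Lock()
    matched, kept = set(), []

    def query(keyword):
        found = search(keyword)
        with lock:                      # protect the shared set
            matched.update(found)

    def check(doc):
        if has_most_words(fetch(doc), list2):
            with lock:                  # protect the shared list
                kept.append(doc)

    run_in_threads(query, list1)
    run_in_threads(check, sorted(matched))
    return sorted(kept)
```

Calling find_docs(["threads", "pasta"], ["parallel", "locks", "memory"]) retrieves all three stand-in docs and keeps only those containing at least two of the three List2 words. Threads fit this project because the work is dominated by waiting on the network, not by computation.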

Papers

Notes on a View from Berkeley paper


2017

Ideas

  • LaTeX is still important
  • Last 2 weeks: presentations, 10 minutes each, on a subject that we didn't cover. Each requires a 10-minute presentation plus a 2-page prospectus.
    • GPU
    • Deep learning
    • Top 500. Why, what, what do we learn, lessons?
    • CUDA
    • OpenMP
    • Debugging parallel programming
    • TensorFlow
    • Vampir: Trace Analyzer tool
    • TotalView: Debugger
  • C/C++ tutorial: still good
  • Optimization options in C: -O, -O2, -O3

Resources

Spark

  • Apache Spark: A Unified Engine for Big Data Processing, Matei Zaharia et al. (pdf)





<onlydft>