Difference between revisions of "CSC352 2017 DT's Notes"

From dftwiki3
Line 2: Line 2:
 
----
 
----
 
<onlydft>
 
<onlydft>
=Ideas=
+
=2013=
 +
==Threads==
 +
* good example with multiple ping processes: [http://www.wellho.net/solutions/python-python-threads-a-first-example.html]
 +
* multiple cores are not used by Python threads because of the GIL [http://smoothspan.wordpress.com/2007/09/14/guido-is-right-to-leave-the-gil-in-python-not-for-multicore-but-for-utility-computing/]
 +
 
 +
===Programs===
 +
* [[NQueens.py | NQueens.py]]
 +
* [[threadedNQueens.py | threadedNQueens.py ]]
 +
* [[classExample1.py | classExample1.py ]]
 +
* [[classExample2.py | classExample2.py ]]
 +
* [[classExample3.py | classExample3.py ]]
 +
* [[serialPing.py | serialPing.py ]]
 +
* [[threadedPing.py | threadedPing.py ]]
 +
* [[searchKeywordsRetrieveEtexts.py | searchKeywordsRetrieveEtexts.py ]]
 +
* [[threadedSearchKeywordRetrieveEtexts.py | threadedSearchKeywordRetrieveEtexts.py ]]
 +
 
 +
===Setting up documents and swish-e===
 +
(Note: there are two alternatives to swish-e: Sphinx and zend-lucene. Sphinx requires the data to be in XML form or in a MySQL database.)
 +
 
 +
* all operations on xgridmac
 +
* downloaded &amp; installed swish-e.  Install dir is ~thiebaut/research/swish-e/
 +
* downloaded &amp; installed www.etext.org in http://xgridmac.dyndns.org/~thiebaut/www_etext_org/
 +
* swish-e index stored in ~thiebaut/Site/swish-e
 +
* added [[www.etext.org_swish-e.php | swishe.php]] in ~thiebaut/Site/swish-e/
 +
* test:
 +
<code><pre>
 +
  cd
 +
  cd Site/swish-e
 +
  php swishe.php search=love
 +
  ...
 +
<br>
 +
<br>rank:  20
 +
<br>score:  809
 +
<br>url:    http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html
 +
<br>link:  <a href="http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html">link</a>
 +
<br>file:  Keys2LovingUnity.html
 +
<br>offset: 47813
 +
<br>
 +
</pre></code>
 +
 
 +
* test on Web at url http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?search=love
 +
* test with delay: http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?delay=20&search='local%20government'
 +
where delay is the number of tenths of a second to wait. This is an upper bound: the true delay is random, between 0.1 s and the specified integer times 0.1 s.
 +
 
 +
==Project==
 +
 
 +
;Project 1:
 +
:Threading in Python: given two lists of keywords, List1 and List2, retrieve the documents that match List1 from a site (xgridmac.dyndns.org, Yahoo, Google). Filter the documents received and keep only those that contain most of the words in List2.
 +
 
 +
;Project 2:
 +
:XGrid: process a gzipped XML dump of Wikipedia and break it up into individual pages (9 million or so of them)!
 +
 
 +
;Project 3:
 +
:Map-Reduce: process Wikipedia pages and create an index of words and their associated categories.
 +
 
 +
==Papers==
 +
 
 +
[[CSC352 Notes on A View From Berkeley| Notes]] on the "A View From Berkeley" paper
 +
 
 +
 
 +
<!-- ================================================================ -->
 +
=2017=
 +
==Ideas==
 
* LaTeX still important
 
* Last 2 weeks: presentations, 10 minutes each, on a subject that we didn't cover. Need a 10-minute presentation plus a 2-page prospectus.
Line 18: Line 80:
 
* Optimization options in C: -O, -O2, -O3
  
=Resources=
+
==Resources==
==Spark==
+
===Spark===
 
* <videoflash type="vimeo">185645796</videoflash>
 
 
* Apache Spark: A Unified Engine for Big Data Processing, Matei Zaharia et al [[Media:ApacheSparkUnifiedEngineBigData.pdf|(pdf)]]
 

Revision as of 15:02, 7 December 2016

--D. Thiebaut (talk) 13:25, 14 November 2016 (EST)


<onlydft>

2013

Threads

  • good example with multiple ping processes: [1]
  • multiple cores are not used by Python threads because of the GIL [2]
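
The two points above can be illustrated with a minimal sketch. It assumes each "ping" is simulated by a short time.sleep, and the names fake_ping, serial, threaded and timed are hypothetical (they are not taken from serialPing.py or threadedPing.py). Because sleeping, like waiting on the network, releases the GIL, the threaded version should finish in roughly the time of a single ping, even though Python threads do not execute bytecode on several cores at once.

```python
import threading
import time


def fake_ping(host, results, delay=0.2):
    """Stand-in for a real ping: wait as if on the network, then record a result."""
    time.sleep(delay)          # sleeping releases the GIL, as blocking I/O does
    results[host] = "up"


def serial(hosts, delay=0.2):
    """Ping the hosts one after the other."""
    results = {}
    for host in hosts:
        fake_ping(host, results, delay)
    return results


def threaded(hosts, delay=0.2):
    """Ping all hosts at once, one thread per host."""
    results = {}
    threads = [
        threading.Thread(target=fake_ping, args=(host, results, delay))
        for host in hosts
    ]
    for t in threads:
        t.start()
    for t in threads:
        t.join()               # wait for every ping to finish
    return results


def timed(fn, *args):
    """Return (result, elapsed seconds) for one call of fn."""
    start = time.perf_counter()
    out = fn(*args)
    return out, time.perf_counter() - start


if __name__ == "__main__":
    hosts = ["host%d" % i for i in range(8)]   # made-up host names
    _, t_serial = timed(serial, hosts)
    _, t_threaded = timed(threaded, hosts)
    print("serial:   %.2f s" % t_serial)
    print("threaded: %.2f s" % t_threaded)
```

A CPU-bound task (such as the N-Queens search) would not be expected to speed up the same way under threads, since only one thread runs Python bytecode at a time.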

Programs

Setting up documents and swish-e

(Note: there are two alternatives to swish-e: Sphinx and zend-lucene. Sphinx requires the data to be in XML form or in a MySQL database.)

   cd 
   cd Site/swish-e
   php swishe.php search=love
  ...
 <br>
 <br>rank:   20
 <br>score:  809
 <br>url:    http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html
 <br>link:   <a href="http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html">link</a>
 <br>file:   Keys2LovingUnity.html
 <br>offset: 47813
 <br>

where delay is the number of tenths of a second to wait. This is an upper bound: the true delay is random, between 0.1 s and the specified integer times 0.1 s.
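
As a rough illustration of the delay parameter, the sketch below builds a swishe.php query URL and models the wait it describes. The helper names build_query_url and simulated_delay are hypothetical, and the uniform distribution is an assumption: the notes only say that the true delay is random between 0.1 s and delay tenths of a second.

```python
import random
from urllib.parse import quote, urlencode

# Base URL of the test script mentioned in these notes.
BASE_URL = "http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php"


def build_query_url(search, delay=None, base=BASE_URL):
    """Build a query URL; delay is expressed in tenths of a second."""
    params = {}
    if delay is not None:
        params["delay"] = int(delay)
    params["search"] = search
    # quote_via=quote encodes spaces as %20 rather than '+'
    return base + "?" + urlencode(params, quote_via=quote)


def simulated_delay(delay, rng=random):
    """Model of the wait, in seconds (assumed uniform, not confirmed)."""
    upper = max(int(delay), 1) * 0.1   # delay tenths of a second, at least 0.1 s
    return rng.uniform(0.1, upper)


if __name__ == "__main__":
    print(build_query_url("local government", delay=20))
    print("modeled wait: %.2f s" % simulated_delay(20))
```

With delay=20 the modeled wait falls anywhere between 0.1 s and 2 s, which is enough variation to make serial and threaded clients behave visibly differently.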

Project

Project 1
Threading in Python: given two lists of keywords, List1 and List2, retrieve the documents that match List1 from a site (xgridmac.dyndns.org, Yahoo, Google). Filter the documents received and keep only those that contain most of the words in List2.
Project 2
XGrid: process a gzipped XML dump of Wikipedia and break it up into individual pages (9 million or so of them)!
Project 3
Map-Reduce: process Wikipedia pages and create an index of words and their associated categories.
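
The word/category index of Project 3 could be sketched in plain Python as below. This is only an in-memory illustration of the map, shuffle and reduce steps, not a Hadoop or XGrid job, and the page layout (a dict with "text" and "categories" keys) and the function names are assumptions made for the example.

```python
from collections import defaultdict


def map_page(page):
    """Map step: emit one (word, category) pair per distinct word and category."""
    words = set(page["text"].lower().split())
    for word in words:
        for category in page["categories"]:
            yield word, category


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_word(word, categories):
    """Reduce step: collapse the categories seen for one word into a sorted list."""
    return word, sorted(set(categories))


def build_index(pages):
    """Run map, shuffle and reduce over an iterable of pages."""
    pairs = (pair for page in pages for pair in map_page(page))
    return dict(reduce_word(w, cats) for w, cats in shuffle(pairs).items())


if __name__ == "__main__":
    # Two made-up pages, for illustration only.
    pages = [
        {"text": "parallel threads", "categories": ["Computing"]},
        {"text": "parallel lines", "categories": ["Geometry"]},
    ]
    for word, cats in sorted(build_index(pages).items()):
        print(word, cats)
```

In a real Map-Reduce framework the shuffle step is done by the framework itself, so only the map and reduce functions would need to be written.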

Papers

Notes on the "A View From Berkeley" paper


2017

Ideas

  • LaTeX still important
  • Last 2 weeks: presentations, 10 minutes each, on a subject that we didn't cover. Need a 10-minute presentation plus a 2-page prospectus.
    • GPU
    • Deep learning
    • Top 500. Why, what, what do we learn, lessons?
    • CUDA
    • OpenMP
    • Debugging parallel programs
    • TensorFlow
    • Vampir: Trace Analyzer tool
    • TotalView: Debugger
  • C/C++ tutorial. Still good
  • Optimization options in C: -O, -O2, -O3

Resources

Spark

  • Apache Spark: A Unified Engine for Big Data Processing, Matei Zaharia et al (pdf)





<onlydft>