Difference between revisions of "CSC352 2017 DT's Notes"
(Created page with "--~~~~ ---- <onlydft> =Ideas= * Latex still important * last 2 weeks presentations: 10 minutes each, on a subject that we didn't cover. Need 10-min presentation plus 2 page p...") |
Line 2: | Line 2: | ||
---- | ---- | ||
<onlydft> | <onlydft> | ||
− | =Ideas= | + | =2013= |
+ | ==Page with Public & Private Class Notes== | ||
+ | <br /> | ||
+ | * Go to [[CSC352_Notes_2013 | this page]] for the 2013 Class Notes | ||
+ | |||
+ | TOC: | ||
+ | :1 Resources 2013 | ||
+ | :1.1 Rocco's Presentation 10/10/13 | ||
+ | ::1.2 Hadoop | ||
+ | ::1.3 On-Line | ||
+ | ::1.4 Papers | ||
+ | ::1.5 Art | ||
+ | ::1.6 Some good references | ||
+ | :2 Misc. Topics | ||
+ | :3 XSEDE.ORG | ||
+ | :4 Update 2015: Downloading images to Hadoop0 | ||
+ | :5 Downloading All Wikipedia Images | ||
+ | :6 Download the page statistics | ||
+ | ::6.1 Links of Interest | ||
+ | :7 Resources 2010 | ||
+ | :8 Map-Reduce/Hadoop | ||
+ | ::8.1 Options for Setup | ||
+ | :::8.1.1 Xen Live CD | ||
+ | :::8.1.2 Setting up Hadoop using VmWare | ||
+ | ::8.2 Setting Up Hadoop and Eclipse on the Mac | ||
+ | :::8.2.1 Install Hadoop | ||
+ | :::8.2.2 Verify configuration of Hadoop | ||
+ | ::8.3 Setting up Eclipse for Hadoop | ||
+ | :::8.3.1 Map-Reduce Locations | ||
+ | :::8.3.2 DFS Locations | ||
+ | ::8.4 Create a new project with Eclipse | ||
+ | :::8.4.1 Project | ||
+ | ::8.5 Map/Reduce driver class | ||
+ | :::8.5.1 Running the Project | ||
+ | :9 WordCount Example on Eclipse on Mac | ||
+ | ::9.1 Mapper | ||
+ | ::9.2 Reducer | ||
+ | ::9.3 Driver | ||
+ | ::9.4 Run WordCount Project | ||
+ | :10 Notes on doing example in Yahoo Tutorial, Module 2 | ||
+ | <br /> | ||
+ | |||
+ | ---- | ||
+ | ==2013 Private Notes Page== | ||
+ | ==Threads== | ||
+ | * good example with multiple ping processes: [http://www.wellho.net/solutions/python-python-threads-a-first-example.html] | ||
+ | * multi-core not used by python [http://smoothspan.wordpress.com/2007/09/14/guido-is-right-to-leave-the-gil-in-python-not-for-multicore-but-for-utility-computing/] | ||
+ | |||
+ | ===Programs=== | ||
+ | * [[NQueens.py | NQueens.py]] | ||
+ | * [[threadedNQueens.py | threadedNQueens.py ]] | ||
+ | * [[classExample1.py | classExample1.py ]] | ||
+ | * [[classExample2.py | classExample2.py ]] | ||
+ | * [[classExample3.py | classExample3.py ]] | ||
+ | * [[serialPing.py | serialPing.py ]] | ||
+ | * [[threadedPing.py | threadedPing.py ]] | ||
+ | * [[searchKeywordsRetrieveEtexts.py | searchKeywordsRetrieveEtexts.py ]] | ||
+ | * [[threadedSearchKeywordRetrieveEtexts.py | threadedSearchKeywordRetrieveEtexts.py ]] | ||
+ | |||
+ | ===Setting up documents and swish-e=== | ||
+ | (Note: there are two other alternatives: sphinx and zend-lucene. Sphinx requires the data to be in XML form or in a MySQL database.) | ||
+ | |||
+ | * all operations on xgridmac | ||
+ | * downloaded & installed swish-e. Install dir is ~thiebaut/research/swish-e/ | ||
+ | * downloaded & installed www.etext.org in http://xgridmac.dyndns.org/~thiebaut/www_etext_org/ | ||
+ | * swish-e index stored in ~thiebaut/Site/swish-e | ||
+ | * added [[www.etext.org_swish-e.php | swishe.php]] in ~thiebaut/Site/swish-e/ | ||
+ | * test: | ||
+ | <code><pre> | ||
+ | cd | ||
+ | cd Site/swish-e | ||
+ | php swishe.php search=love | ||
+ | ... | ||
+ | <br> | ||
+ | <br>rank: 20 | ||
+ | <br>score: 809 | ||
+ | <br>url: http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html | ||
+ | <br>link: <a href="http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html">link</a> | ||
+ | <br>file: Keys2LovingUnity.html | ||
+ | <br>offset: 47813 | ||
+ | <br> | ||
+ | </pre></code> | ||
+ | |||
+ | * test on Web at url http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?search=love | ||
+ | * test with delay: http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?delay=20&search='local%20government' | ||
+ | (Here delay is the maximum wait, in tenths of a second. It is an upper bound: the true delay is random, between 0.1 sec and delay times 0.1 sec.) | ||
+ | |||
+ | ==Project== | ||
+ | |||
+ | ;Project 1: | ||
+ | :Threading in Python: given two lists of keywords, List1 and List2, retrieve docs from a site (xgridmac.dyndns.org, yahoo, google) that match List1. Filter the docs received and keep only those that contain most of the words in List2. | ||
+ | |||
+ | ;Project 2: | ||
+ | :XGrid: process a gzipped XML dump of Wikipedia and break it up into individual pages (9 million or so of them)! | ||
+ | |||
+ | ;Project 3: | ||
+ | :Map-Reduce: process Wikipedia pages and create an index of words and their associated categories. | ||
+ | |||
+ | ==Papers== | ||
+ | |||
+ | [[CSC352 Notes on A View From Berkeley| Notes]] on a View from Berkeley paper | ||
+ | |||
+ | |||
+ | <!-- ================================================================ --> | ||
+ | =2017= | ||
+ | ==Papers== | ||
+ | * Skip GPU paper (throughput oriented) for next time. | ||
+ | |||
+ | ==Ideas== | ||
* Latex still important | * Latex still important | ||
* last 2 weeks presentations: 10 minutes each, on a subject that we didn't cover. Need 10-min presentation plus 2 page prospectus. | * last 2 weeks presentations: 10 minutes each, on a subject that we didn't cover. Need 10-min presentation plus 2 page prospectus. | ||
Line 12: | Line 120: | ||
** Debugging parallel programming | ** Debugging parallel programming | ||
** Tensorflow | ** Tensorflow | ||
+ | ** Vampir: Trace Analyzer tool | ||
+ | ** TotalView: Debugger | ||
+ | * C/C++ tutorial. Still good | ||
+ | * Optimization options in C: -O, -O2, -O3 | ||
+ | ==Resources== | ||
+ | ===Spark=== | ||
+ | * <videoflash type="vimeo">185645796</videoflash> | ||
+ | * Apache Spark: A Unified Engine for Big Data Processing, Matei Zaharia et al [[Media:ApacheSparkUnifiedEngineBigData.pdf|(pdf)]] | ||
+ | |||
+ | ==Papers== | ||
+ | * [https://www.wired.com/2016/09/microsoft-bets-future-chip-reprogram-fly/ Microsoft bets future chips reprogram fly], Wired.com. ([[Media:MicrosoftBetsFutureChipReprogramFly.pdf|pdf]]) | ||
+ | * [[Comments on Burger's FPGA Paper| Comments on Microsoft bets future chips reprogram fly, Bob Burger, FPGA]] | ||
+ | ==Keynotes== | ||
+ | * [[CSC352_Keynote_Presentations_2013| CSC352 Keynote Presentations 2013]] | ||
− | + | ==Hadoop== | |
+ | * WordCount tutorial that works on driftwood.smith.edu: [https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v1.0 https://hadoop.apache.org/.../MapReduceTutorial.html#Example:_WordCount_v1.0] | ||
Latest revision as of 12:52, 16 February 2017
--D. Thiebaut (talk) 13:25, 14 November 2016 (EST)
<onlydft>
2013
Page with Public & Private Class Notes
- Go to this page for the 2013 Class Notes
TOC:
- 1 Resources 2013
- 1.1 Rocco's Presentation 10/10/13
- 1.2 Hadoop
- 1.3 On-Line
- 1.4 Papers
- 1.5 Art
- 1.6 Some good references
- 2 Misc. Topics
- 3 XSEDE.ORG
- 4 Update 2015: Downloading images to Hadoop0
- 5 Downloading All Wikipedia Images
- 6 Download the page statistics
- 6.1 Links of Interest
- 7 Resources 2010
- 8 Map-Reduce/Hadoop
- 8.1 Options for Setup
- 8.1.1 Xen Live CD
- 8.1.2 Setting up Hadoop using VmWare
- 8.2 Setting Up Hadoop and Eclipse on the Mac
- 8.2.1 Install Hadoop
- 8.2.2 Verify configuration of Hadoop
- 8.3 Setting up Eclipse for Hadoop
- 8.3.1 Map-Reduce Locations
- 8.3.2 DFS Locations
- 8.4 Create a new project with Eclipse
- 8.4.1 Project
- 8.5 Map/Reduce driver class
- 8.5.1 Running the Project
- 9 WordCount Example on Eclipse on Mac
- 9.1 Mapper
- 9.2 Reducer
- 9.3 Driver
- 9.4 Run WordCount Project
- 10 Notes on doing example in Yahoo Tutorial, Module 2
2013 Private Notes Page
Threads
Programs
- NQueens.py
- threadedNQueens.py
- classExample1.py
- classExample2.py
- classExample3.py
- serialPing.py
- threadedPing.py
- searchKeywordsRetrieveEtexts.py
- threadedSearchKeywordRetrieveEtexts.py
Setting up documents and swish-e
(Note: there are two other alternatives: sphinx and zend-lucene. Sphinx requires the data to be in XML form or in a MySQL database.)
- all operations on xgridmac
- downloaded & installed swish-e. Install dir is ~thiebaut/research/swish-e/
- downloaded & installed www.etext.org in http://xgridmac.dyndns.org/~thiebaut/www_etext_org/
- swish-e index stored in ~thiebaut/Site/swish-e
- added swishe.php in ~thiebaut/Site/swish-e/
- test:
cd
cd Site/swish-e
php swishe.php search=love
...
<br>
<br>rank: 20
<br>score: 809
<br>url: http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html
<br>link: <a href="http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html">link</a>
<br>file: Keys2LovingUnity.html
<br>offset: 47813
<br>
- test on Web at url http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?search=love
- test with delay: http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?delay=20&search='local%20government'
(Here delay is the maximum wait, in tenths of a second. It is an upper bound: the true delay is random, between 0.1 sec and delay times 0.1 sec.)
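The delay rule can be sketched in a few lines of Python. This is only an illustration of the rule as stated in these notes, not the code in swishe.php (which is PHP and is not shown here). The function name `pick_delay_seconds` is hypothetical, and the sketch assumes the random delay is a whole number of tenths of a second.

```python
import random

def pick_delay_seconds(delay_tenths, rng=random):
    """Return a random wait, in seconds, between 0.1 and delay_tenths * 0.1.

    delay_tenths plays the role of the delay= URL parameter: an upper
    bound expressed in tenths of a second.
    """
    if delay_tenths < 1:
        # Assumption: a missing or zero delay means "do not wait".
        return 0.0
    # randint is inclusive at both ends, so k ranges over 1..delay_tenths.
    return rng.randint(1, delay_tenths) / 10.0

if __name__ == "__main__":
    # delay=20 in the URL would give a wait somewhere in [0.1, 2.0] seconds.
    print(pick_delay_seconds(20))
```

A server that sleeps for this long before answering makes it easy to see the benefit of issuing several requests from separate threads.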
Project
- Project 1
- Threading in Python: given two lists of keywords, List1 and List2, retrieve docs from a site (xgridmac.dyndns.org, yahoo, google) that match List1. Filter the docs received and keep only those that contain most of the words in List2.
- Project 2
- XGrid: process a gzipped XML dump of Wikipedia and break it up into individual pages (9 million or so of them)!
- Project 3
- Map-Reduce: process Wikipedia pages and create an index of words and their associated categories.
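A minimal sketch of the fetch-and-filter half of Project 1, under stated assumptions: the URLs are taken as already returned by a search on List1 (that step is not shown), `fetch` is a hypothetical caller-supplied function that returns the text of a document, and "most of the words" is read as "more than half". None of these names come from the course programs listed above.

```python
import re
from concurrent.futures import ThreadPoolExecutor

def contains_most(text, keywords, threshold=0.5):
    """True if more than `threshold` of the keywords occur in text."""
    if not keywords:
        return True
    # Case-insensitive match on whole words.
    words = set(re.findall(r"[a-z0-9']+", text.lower()))
    hits = sum(1 for k in keywords if k.lower() in words)
    return hits / len(keywords) > threshold

def retrieve_and_filter(urls, fetch, list2, max_workers=8):
    """Fetch all URLs with a pool of threads, keep those matching most of list2."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map returns results in the same order as the input URLs.
        texts = list(pool.map(fetch, urls))
    return [u for u, t in zip(urls, texts) if contains_most(t, list2)]

if __name__ == "__main__":
    # Stand-in for real network access: a dictionary of fake documents.
    docs = {
        "a": "Local government and city taxes",
        "b": "love poems",
        "c": "government taxes only",
    }
    print(retrieve_and_filter(["a", "b", "c"], docs.get,
                              ["local", "government", "taxes"]))
```

Threads are a reasonable fit here because the work is dominated by waiting on the network, so Python's global interpreter lock is not the bottleneck. In a real run, `fetch` might wrap `urllib.request.urlopen`.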
Papers
Notes on a View from Berkeley paper
2017
Papers
- Skip GPU paper (throughput oriented) for next time.
Ideas
- Latex still important
- last 2 weeks presentations: 10 minutes each, on a subject that we didn't cover. Need 10-min presentation plus 2 page prospectus.
- GPU
- Deeplearning
- Top 500. Why, what, what do we learn, lessons?
- CUDA
- OpenMP
- Debugging parallel programming
- Tensorflow
- Vampir: Trace Analyzer tool
- TotalView: Debugger
- C/C++ tutorial. Still good
- Optimization options in C: -O, -O2, -O3
Resources
Spark
- Apache Spark: A Unified Engine for Big Data Processing, Matei Zaharia et al (pdf)
Papers
- Microsoft bets future chips reprogram fly, Wired.com. (pdf)
- Comments on Microsoft bets future chips reprogram fly, Bob Burger, FPGA
Keynotes
Hadoop
- WordCount tutorial that works on driftwood.smith.edu: https://hadoop.apache.org/.../MapReduceTutorial.html#Example:_WordCount_v1.0
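For quick reference, the map, shuffle, and reduce steps behind WordCount can be mimicked in plain Python. This is a sketch of the idea only: the linked tutorial uses the Java Hadoop API, and nothing below calls Hadoop or runs on a cluster. The function names are hypothetical.

```python
from collections import defaultdict

def map_phase(line):
    """Map: emit a (word, 1) pair for every word in one line of input."""
    for word in line.split():
        yield (word, 1)

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(word, counts):
    """Reduce: sum the counts collected for one word."""
    return (word, sum(counts))

def word_count(lines):
    """Run the three phases in sequence on an iterable of text lines."""
    pairs = (pair for line in lines for pair in map_phase(line))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())

if __name__ == "__main__":
    print(word_count(["Hello World Bye World", "Hello Hadoop Goodbye Hadoop"]))
```

In Hadoop the map and reduce functions run on different machines and the framework performs the shuffle; here everything happens in one process, which is enough to check the logic before moving to the cluster.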
<onlydft>