Latest revision as of 12:52, 16 February 2017

--D. Thiebaut (talk) 13:25, 14 November 2016 (EST)

2013

Page with Public & Private Class Notes

Go to this page for the 2013 Class Notes

TOC:

1 Resources 2013

1.1 Rocco's Presentation 10/10/13

1.2 Hadoop

1.3 On-Line

1.4 Papers

1.5 Art

1.6 Some good references

2 Misc. Topics

3 XSEDE.ORG

4 Update 2015: Downloading images to Hadoop0

5 Downloading All Wikipedia Images

6 Download the page statistics

6.1 Links of Interest

7 Resources 2010

8 Map-Reduce/Hadoop

8.1 Options for Setup

8.1.1 Xen Live CD

8.1.2 Setting up Hadoop using VmWare

8.2 Setting Up Hadoop and Eclipse on the Mac

8.2.1 Install Hadoop

8.2.2 Verify configuration of Hadoop

8.3 Setting up Eclipse for Hadoop

8.3.1 Map-Reduce Locations

8.3.2 DFS Locations

8.4 Create a new project with Eclipse

8.4.1 Project

8.5 Map/Reduce driver class

8.5.1 Running the Project

9 WordCount Example on Eclipse on Mac

9.1 Mapper

9.2 Reducer

9.3 Driver

9.4 Run WordCount Project

10 Notes on doing example in Yahoo Tutorial, Module 2

2013 Private Notes Page

Threads

Good first example of threads running multiple ping processes: [1]
Python threads do not use multiple cores because of the GIL: [2]
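
A minimal sketch of the point made by the two links above, under stated assumptions: the ping is simulated with `time.sleep` instead of a real network call, and the names `fake_ping` and `ping_all` are hypothetical, not taken from the programs listed on this page. It shows that threads can overlap waiting time even though the GIL keeps them from running Python bytecode on several cores at once.

```python
import threading
import time

def fake_ping(host, results, wait=0.2):
    """Stand-in for a ping: wait (as I/O would), then record a result."""
    time.sleep(wait)              # sleeping releases the GIL, like real I/O
    results[host] = "ok"

def ping_all(hosts):
    """Start one thread per host and wait for all of them to finish."""
    results = {}
    threads = [threading.Thread(target=fake_ping, args=(h, results))
               for h in hosts]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    hosts = ["host%d" % i for i in range(4)]   # hypothetical host names
    start = time.time()
    print(ping_all(hosts))
    print("elapsed: %.2f s" % (time.time() - start))
```

With four simulated hosts the waits overlap, so the elapsed time should be close to one wait and not four. CPU-bound work would not see the same benefit from threads.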

Programs

Setting up documents and swish-e

(Note: there are 2 other alternatives: Sphinx and zend-lucene. Sphinx requires the data in XML form or in a MySQL database.)

all operations on xgridmac
downloaded & installed swish-e. Install dir is ~thiebaut/research/swish-e/
downloaded & installed www.etext.org in http://xgridmac.dyndns.org/~thiebaut/www_etext_org/
swish-e index stored in ~thiebaut/Site/swish-e
added swishe.php in ~thiebaut/Site/swish-e/
test:

   cd 
   cd Site/swish-e
   php swishe.php search=love
  ...
 <br>
 <br>rank:   20
 <br>score:  809
 <br>url:    http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html
 <br>link:   <a href="http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html">link</a>
 <br>file:   Keys2LovingUnity.html
 <br>offset: 47813
 <br>

test on Web at url http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?search=love
test with delay: http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?delay=20&search='local%20government'

Here delay is the number of tenths of a second to wait. It is an upper bound: the true delay is random, between 0.1 sec and delay × 0.1 sec.
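
A small sketch of the two test URLs and of the delay rule described above. The helper names `build_query_url` and `pick_delay_seconds` are hypothetical, and the uniform distribution is an assumption: the notes only say the delay is random within the bound, and the actual swishe.php code is not shown here.

```python
import random
from urllib.parse import urlencode

BASE_URL = "http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php"

def build_query_url(search, delay=None):
    """Build a swishe.php query URL; delay is in tenths of a second."""
    params = {}
    if delay is not None:
        params["delay"] = delay
    params["search"] = search
    return BASE_URL + "?" + urlencode(params)

def pick_delay_seconds(delay_tenths, rng=random):
    """Assumed server behavior: random delay between 0.1 s and
    delay_tenths * 0.1 s (uniform is an assumption)."""
    upper = max(0.1, delay_tenths * 0.1)
    return rng.uniform(0.1, upper)

if __name__ == "__main__":
    print(build_query_url("love"))
    print(build_query_url("love", delay=20))
    print("%.2f s" % pick_delay_seconds(20))
```

With delay=20 the drawn value stays between 0.1 and 2 seconds, which is useful for making some requests noticeably slower than others when testing a threaded client.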

Project

Project 1: Threading in Python: given two lists of keywords, List1 and List2, retrieve docs from a site (xgridmac.dyndns.org, yahoo, google) that match List1. Filter the docs received and keep only those that contain most of the words in List2.

Project 2: XGrid: process a gzipped XML dump of Wikipedia and break it up into individual pages (9 million or so of them)!

Project 3: Map-Reduce: process Wikipedia pages and create an index of words and their associated categories.
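
The idea behind Project 3 can be sketched in plain Python, without Hadoop. Everything here is an assumption made for illustration: the page format (a dict with `text` and `categories` keys), the function names, and the crude word splitting are hypothetical, and a real Wikipedia dump would need proper parsing first.

```python
import re
from collections import defaultdict

def map_page(page):
    """Map step: emit one (word, category) pair for each distinct word
    of the page and each category the page belongs to."""
    words = set(re.findall(r"[a-z]+", page["text"].lower()))
    for word in words:
        for category in page["categories"]:
            yield word, category

def reduce_pairs(pairs):
    """Reduce step: gather the categories seen for each word."""
    index = defaultdict(set)
    for word, category in pairs:
        index[word].add(category)
    return {word: sorted(cats) for word, cats in index.items()}

def build_index(pages):
    """Run the map step over all pages, then the reduce step."""
    pairs = (pair for page in pages for pair in map_page(page))
    return reduce_pairs(pairs)

if __name__ == "__main__":
    # Two made-up pages, not real Wikipedia data.
    pages = [
        {"text": "Parallel computing uses threads", "categories": ["Computing"]},
        {"text": "Threads of cotton", "categories": ["Textiles"]},
    ]
    index = build_index(pages)
    print(index["threads"])   # categories of every page containing "threads"
```

On a cluster the map step would run on many pages in parallel and the framework would group the pairs by word before the reduce step. Here a single dictionary plays that role.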

Papers

Notes on the "A View from Berkeley" paper

2017

Papers

Skip GPU paper (throughput oriented) for next time.

Ideas

LaTeX still important
Last 2 weeks: presentations, 10 minutes each, on a subject that we didn't cover. Need a 10-min presentation plus a 2-page prospectus.
- GPU
- Deep learning
- Top 500. Why, what, what do we learn, lessons?
- CUDA
- OpenMP
- Debugging parallel programming
- TensorFlow
- Vampir: Trace Analyzer tool
- TotalView: Debugger

C/C++ tutorial: still good
Optimization options in C: -O, -O2, -O3

Resources

Spark

Apache Spark: A Unified Engine for Big Data Processing, Matei Zaharia et al. (pdf)

Papers

Keynotes

CSC352 Keynote Presentations 2013

Hadoop

WordCount tutorial that works on driftwood.smith.edu: https://hadoop.apache.org/.../MapReduceTutorial.html#Example:_WordCount_v1.0

@@ Line 2: / Line 2: @@
 ----
 <onlydft>
-=Ideas=
+=2013=
+==Page with Public & Private Class Notes==
+<br />
+* Go to [[CSC352_Notes_2013 | this page]] for the 2013 Class Notes
+TOC:
+:1 Resources 2013
+:1.1 Rocco's Presentation 10/10/13
+::1.2 Hadoop
+::1.3 On-Line
+::1.4 Papers
+::1.5 Art
+::1.6 Some good references
+:2 Misc. Topics
+:3 XSEDE.ORG
+:4 Update 2015: Downloading images to Hadoop0
+:5 Downloading All Wikipedia Images
+:6 Download the page statistics
+::6.1 Links of Interest
+:7 Resources 2010
+:8 Map-Reduce/Hadoop
+::8.1 Options for Setup
+:::8.1.1 Xen Live CD
+:::8.1.2 Setting up Hadoop using VmWare
+::8.2 Setting Up Hadoop and Eclipse on the Mac
+:::8.2.1 Install Hadoop
+:::8.2.2 Verify configuration of Hadoop
+::8.3 Setting up Eclipse for Hadoop
+:::8.3.1 Map-Reduce Locations
+:::8.3.2 DFS Locations
+::8.4 Create a new project with Eclipse
+:::8.4.1 Project
+::8.5 Map/Reduce driver class
+:::8.5.1 Running the Project
+:9 WordCount Example on Eclipse on Mac
+::9.1 Mapper
+::9.2 Reducer
+::9.3 Driver
+::9.4 Run WordCount Project
+:10 Notes on doing example in Yahoo Tutorial, Module 2
+<br />
+----
+==2013 Private Notes Page==
+==Threads==
+* good example with multiple ping processes: [http://www.wellho.net/solutions/python-python-threads-a-first-example.html]
+* multi-core not used by python [http://smoothspan.wordpress.com/2007/09/14/guido-is-right-to-leave-the-gil-in-python-not-for-multicore-but-for-utility-computing/]
+===Programs===
+* [[NQueens.py | NQueens.py]]
+* [[threadedNQueens.py | threadedNQueens.py ]]
+* [[classExample1.py | classExample1.py ]]
+* [[classExample2.py | classExample2.py ]]
+* [[classExample3.py | classExample3.py ]]
+* [[serialPing.py | serialPing.py ]]
+* [[threadedPing.py | threadedPing.py ]]
+* [[searchKeywordsRetrieveEtexts.py | searchKeywordsRetrieveEtexts.py ]]
+* [[threadedSearchKeywordRetrieveEtexts.py | threadedSearchKeywordRetrieveEtexts.py ]]
+===Setting up documents and swish-e===
+(Note: there are 2 other alternatives: sphinx and zend-lucene.  Sphinx requires data in xml form or in mysql database)
+* all operations on xgridmac
+* downloaded &amp; installed swish-e.  Install dir is ~thiebaut/research/swish-e/
+* downloaded &amp; installed www.etext.org in http://xgridmac.dyndns.org/~thiebaut/www_etext_org/
+* swish-e index stored in ~thiebaut/Site/swish-e
+* added [[www.etext.org_swish-e.php | swishe.php]] in ~thiebaut/Site/swish-e/
+* test:
+<code><pre>
+   cd
+   cd Site/swish-e
+   php swishe.php search=love
+  ...
+ <br>
+ <br>rank:   20
+ <br>score:  809
+ <br>url:    http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html
+ <br>link:   <a href="http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html">link</a>
+ <br>file:   Keys2LovingUnity.html
+ <br>offset: 47813
+ <br>
+</pre></code>
+* test on Web at url http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?search=love
+* test with delay: http://xgridmac.dyndns.org/~thiebaut/swish-e/swishe.php?delay=20&search='local%20government'
+Where delay is number of 1/10s of a second to wait.  This is a bound as the true delay is random between 0.1 sec and the integer specified times 1/10 seconds.)
+==Project==
+;Project 1:
+:Threading in Python: given two lists of keywords, List1 and List2, retrieve docs from a site (xgridmac.dyndns.org, yahoo, google) that respond/match List1.  Filter the docs received and keep only those that contain most of the words in List2.
+;Project 2:
+:XGrid: process a gzip xml dump of wikipedia and break it up into individual pages (9 million or so of them)!
+;Project 3:
+:Map-Reduce: process wikipedia pages and create an index of words and their associated categories
+==Papers==
+[[CSC352 Notes on A View From Berkeley| Notes]] on a View from Berkeley paper
+<!-- ================================================================ -->
+=2017=
+==Papers==
+* Skip GPU paper (throughput oriented) for next time.
+==Ideas==
 * Latex still important
 * last 2 weeks presentations: 10 minutes each, on a subject that we didn't cover.  Need 10-min presentation plus 2 page prospectus.
@@ Line 12: / Line 120: @@
 ** Debugging parallel programming
 ** Tensorflow
+** Vampir: Trace Analyzer tool
+** TotalView: Debugger
+* C/C++ tutorial.  Still good
+* Optimization options  in C  -O, -O2, -O3
+==Resources==
+===Spark===
+* <videoflash type="vimeo">185645796</videoflash>
+* Apache Spark: A Unified Engine for Big Data Processing, Matei Zaharia et al [[Media:ApacheSparkUnifiedEngineBigData.pdf|(pdf)]]
+==Papers==
+* [https://www.wired.com/2016/09/microsoft-bets-future-chip-reprogram-fly/ Microsoft bets future chips reprogram fly], Wired.com. ([[Media:MicrosoftBetsFutureChipReprogramFly.pdf|pdf]])
+* [[Comments on Burger's FPGA Paper| Comments on Microsoft bets future chips reprogram fly, Bob Burger, FPGA]]
+==Keynotes==
+* [[CSC352_Keynote_Presentations_2013| CSC352 Keynote Presentations 2013]]
+==Hadoop==
+* WordCount tutorial that works on driftwood.smith.edu: [https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v1.0 https://hadoop.apache.org/.../MapReduceTutorial.html#Example:_WordCount_v1.0]

Difference between revisions of "CSC352 2017 DT's Notes"

