Difference between revisions of "CSC352 2017 DT's Notes"

From dftwiki3
 
(4 intermediate revisions by the same user not shown)
Line 107: Line 107:
 
<!-- ================================================================ -->
 
<!-- ================================================================ -->
 
=2017=
 
=2017=
 +
==Papers==
 +
* Skip GPU paper (throughput oriented) for next time.
 +
 
==Ideas==
 
==Ideas==
 
* Latex still important
 
* Latex still important
Line 129: Line 132:
 
   
 
   
 
==Papers==
 
==Papers==
* [[Comments on Burger's FPGA Paper| Comments on Burger's FPGA Paper]]
+
* [https://www.wired.com/2016/09/microsoft-bets-future-chip-reprogram-fly/ Microsoft bets future chips reprogram fly], Wired.com. ([[Media:MicrosoftBetsFutureChipReprogramFly.pdf|pdf]])
 
+
* [[Comments on Burger's FPGA Paper| Comments on Microsoft bets future chips reprogram fly, Doug Burger, FPGA]]
  
 +
==Keynotes==
  
 +
* [[CSC352_Keynote_Presentations_2013| CSC352 Keynote Presentations 2013]]
  
 +
==Hadoop==
 +
* WordCount tutorial that works on driftwood.smith.edu: [https://hadoop.apache.org/docs/stable/hadoop-mapreduce-client/hadoop-mapreduce-client-core/MapReduceTutorial.html#Example:_WordCount_v1.0 https://hadoop.apache.org/.../MapReduceTutorial.html#Example:_WordCount_v1.0]
  
  

Latest revision as of 12:52, 16 February 2017

--D. Thiebaut (talk) 13:25, 14 November 2016 (EST)


<onlydft>

2013

Page with Public & Private Class Notes


TOC:

1 Resources 2013
1.1 Rocco's Presentation 10/10/13
1.2 Hadoop
1.3 On-Line
1.4 Papers
1.5 Art
1.6 Some good references
2 Misc. Topics
3 XSEDE.ORG
4 Update 2015: Downloading images to Hadoop0
5 Downloading All Wikipedia Images
6 Download the page statistics
6.1 Links of Interest
7 Resources 2010
8 Map-Reduce/Hadoop
8.1 Options for Setup
8.1.1 Xen Live CD
8.1.2 Setting up Hadoop using VmWare
8.2 Setting Up Hadoop and Eclipse on the Mac
8.2.1 Install Hadoop
8.2.2 Verify configuration of Hadoop
8.3 Setting up Eclipse for Hadoop
8.3.1 Map-Reduce Locations
8.3.2 DFS Locations
8.4 Create a new project with Eclipse
8.4.1 Project
8.5 Map/Reduce driver class
8.5.1 Running the Project
9 WordCount Example on Eclipse on Mac
9.1 Mapper
9.2 Reducer
9.3 Driver
9.4 Run WordCount Project
10 Notes on doing example in Yahoo Tutorial, Module 2



2013 Private Notes Page

Threads

  • good example with multiple ping processes: [1]
  • Python threads do not use multiple cores [2]
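
As a rough illustration of the first note, the sketch below runs several simulated ping-like waits in a thread pool. The `fake_ping` function, the host names, and the 0.1-second wait are assumptions standing in for a real `ping` subprocess; none of it is taken from the linked example. Because each task only waits, the threads overlap, even though CPU-bound code in standard CPython would not be spread over several cores this way.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fake_ping(host, wait=0.1):
    """Hypothetical stand-in for a ping: sleeps, then reports the host."""
    time.sleep(wait)  # simulates waiting on the network
    return host, wait


def ping_all(hosts, wait=0.1):
    """Run one fake_ping per host in its own thread.

    Returns (results in the order of hosts, elapsed seconds).
    """
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=len(hosts)) as pool:
        results = list(pool.map(lambda h: fake_ping(h, wait), hosts))
    return results, time.perf_counter() - start


if __name__ == "__main__":
    hosts = ["host-a", "host-b", "host-c", "host-d"]  # made-up names
    results, elapsed = ping_all(hosts)
    print(results)
    print(f"elapsed: {elapsed:.2f} s for {len(hosts)} waits of 0.1 s each")
```

Run serially, the four waits would take about 0.4 s; with one thread per host the elapsed time should be close to a single wait.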

Programs

Setting up documents and swish-e

(Note: there are two other alternatives: Sphinx and Zend Lucene. Sphinx requires the data in XML form or in a MySQL database.)

   cd 
   cd Site/swish-e
   php swishe.php search=love
  ...
 <br>
 <br>rank:   20
 <br>score:  809
 <br>url:    http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html
 <br>link:   <a href="http://xgridmac.dyndns.org/~thiebaut/www_etext_org/Religious_357/Polyamory/Keys2LovingUnity.html">link</a>
 <br>file:   Keys2LovingUnity.html
 <br>offset: 47813
 <br>

Where delay is the number of tenths of a second to wait. This is an upper bound: the true delay is random, between 0.1 sec and the specified integer times 0.1 sec.
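
One possible reading of this delay rule, as a minimal sketch: the function name `actual_delay` and the choice of a uniform distribution are assumptions, since the notes only say the delay is random between the two bounds.

```python
import random


def actual_delay(delay, rng=random):
    """Return a wait in seconds for a bound of `delay` tenths of a second.

    Assumption: the wait is drawn uniformly between 0.1 s and delay * 0.1 s.
    """
    if delay < 1:
        raise ValueError("delay must be at least 1 (one tenth of a second)")
    return rng.uniform(0.1, delay * 0.1)


if __name__ == "__main__":
    # With delay=5 the wait falls somewhere between 0.1 s and 0.5 s.
    print(actual_delay(5))
```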

Project

Project 1
Threading in Python: given two lists of keywords, List1 and List2, retrieve docs from a site (xgridmac.dyndns.org, Yahoo, Google) that match List1. Filter the docs received and keep only those that contain most of the words in List2.
Project 2
XGrid: process a gzipped XML dump of Wikipedia and break it up into individual pages (9 million or so of them)!
Project 3
Map-Reduce: process Wikipedia pages and create an index of words and their associated categories.
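
The idea behind Project 3 can be sketched in memory, without Hadoop. In this sketch a page is assumed to be a `(text, categories)` pair, and `map_page`, `reduce_pairs`, and `build_index` are hypothetical names chosen for illustration; a real job would read pages from the Wikipedia dump and run the two steps as Hadoop map and reduce tasks.

```python
import re
from collections import defaultdict


def map_page(text, categories):
    """Map step: emit one (word, category) pair per distinct word and category."""
    words = set(re.findall(r"[a-z]+", text.lower()))
    for word in words:
        for category in categories:
            yield word, category


def reduce_pairs(pairs):
    """Reduce step: group the categories seen for each word."""
    index = defaultdict(set)
    for word, category in pairs:
        index[word].add(category)
    return {word: sorted(cats) for word, cats in index.items()}


def build_index(pages):
    """Chain the map and reduce steps over an iterable of (text, categories)."""
    pairs = (pair for text, cats in pages for pair in map_page(text, cats))
    return reduce_pairs(pairs)


if __name__ == "__main__":
    # Made-up pages and categories, for illustration only.
    pages = [
        ("Hadoop runs map and reduce tasks", ["Software"]),
        ("A map shows roads", ["Cartography"]),
    ]
    print(build_index(pages)["map"])  # ['Cartography', 'Software']
```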

Papers

Notes on a View from Berkeley paper


2017

Papers

  • Skip GPU paper (throughput oriented) for next time.

Ideas

  • LaTeX still important
  • Last 2 weeks: presentations, 10 minutes each, on a subject that we didn't cover. Each needs a 10-min presentation plus a 2-page prospectus.
    • GPU
    • Deep learning
    • Top 500. Why, what, what do we learn, lessons?
    • CUDA
    • OpenMP
    • Debugging parallel programming
    • TensorFlow
    • Vampir: Trace Analyzer tool
    • TotalView: Debugger
  • C/C++ tutorial. Still good
  • Optimization options in C: -O, -O2, -O3

Resources

Spark

  • Apache Spark: A Unified Engine for Big Data Processing, Matei Zaharia et al. (pdf)

Papers

Keynotes

Hadoop



<onlydft>