Difference between revisions of "CSC352 Class Page 2010"

From dftwiki3
m (Thiebaut moved page CSC352 Class Page to CSC352 Class Page 2010)
 
(187 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
__TOC__
 +
 +
<br />
 +
<br />
 +
<center>[[CSC352 | Main Page]] | [[CSC352_Syllabus | Syllabus]] | [[CSC352_Class_Page | Schedule]] |
 +
[[CSC352 Resources | Links &amp; Resources]]</center>
 +
<br />
 +
<br />
 +
 +
=Hadoop-Related=
 +
 +
* [http://cs.smith.edu/classwiki/index.php/CSC352_Hadoop_Howto_%26_FAQ Hadoop FAQ Page]
 +
* [http://maven.smith.edu/~thiebaut/showhadoopip.php Hadoop IPs]
 +
 +
=Projects=
 +
 +
* [[CSC352 Project 1 | Project 1]]: Started Feb 16, due March 2nd. A [[CSC352 Project1 Solution|good example of a solution]].
 +
* [[CSC352 Project 2 | Project 2]]: Deals with the Xgrid.  A [[CSC352 Project 2 Solution | collection of good proposed solutions]].
 +
* [[CSC352 Project 3 | Project 3]]: Deals with processing Wikipedia pages on Hadoop/MapReduce.
 +
 +
* '''Project 4 officially started on 1/26/10 and is now over.'''  Find its shared wiki page [http://cs.smith.edu/classwiki/index.php/CSC352_Page#Projects here]!  Congratulations to the class for setting up the individual nodes of the first Smith Cloud Cluster!
 +
 +
[[File:SmilingPython.png | 100px | right]]
 +
 
=Python Threads=
 
=Python Threads=
 +
{| style="width:100%" border="1"
 +
|- style="background:#ffdead;"
 +
|'''Week''' || '''Topics''' || '''Reading'''
 +
 +
<!-- ================================================================== -->
 +
|-valign="top"
 +
|width="15%"| Week 1 <br /> 1/25
 +
|width="60%"|
 +
* '''Tuesday'''
 +
** Introduce Syllabus
 +
** Interrupts
 +
[[Image:pogoplug.jpg | right]]
 +
* '''Thursday'''
 +
** The [[PogoPlug]]...
 +
** The [http://www.apple.com/ipad/ iPad]...
 +
** Python Review
 +
*** [[NQueens.py | NQueens.py]]: a Python program to find the first solution for ''N'' queens on an ''N''x''N'' board.
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
;Read:
 +
* What is a Thread? [http://en.wikipedia.org/wiki/Thread_(computer_science)]
 +
* What is a Process? [http://en.wikipedia.org/wiki/Process_(computing)]
 +
;Reference material for when we start programming:
 +
* [http://www.python.org/doc/2.5.2/lib/thread-objects.html Python.org's page] on Thread Objects.  It's the reference, but is not always crystal clear...
 +
* [http://linuxgazette.net/107/pai.html Linux Gazette Article] on Python Threads.  Good overall introduction.
 +
* [http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf Norm Matloff's] introduction to Python Threads.  More details, less easy to follow.
 +
;For Discussion next Tuesday
 +
* Read the paper by Asanovic K. ''et al.'', The Landscape of Parallel Computing Research: A View from Berkeley.
 +
 +
<!-- ================================================================== -->
 +
|- style="background:#eeeeff" valign="top"
 +
| Week 2 <br /> 2/1<br />
 +
||
 +
* '''Tuesday'''
 +
** discussion (prepare a 1-page summary) of the paper by  Asanovic K. ''et al'', [http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf The Landscape of Parallel Computing Research: A View from Berkeley], Dec. 2006. ([[media:LandscapeParallelProcessingBerkeley1206.pdf|cached copy]])
 +
 +
* '''Thursday'''
 +
** Python
 +
*** [[classExample1.py | classExample1.py ]]
 +
*** [[classExample2.py | classExample2.py ]]
 +
*** [[classExample3.py | classExample3.py ]]
 +
*** [[serialPing.py | serialPing.py ]]
 +
** Python Threads
 +
*** [[threadedPing.py | threadedPing.py ]]: a threaded solution to the serial ping program
 +
*** [[ThreadedNQueens.py | ThreadedNQueens.py ]]: a threaded solution to the N-Queens program
 +
 +
----
 +
* [[CSC352 Homework 1 | Homework #1]]
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
&nbsp;
 +
* Make sure to read Norman Matloff's tutorial on threads in the [[CSC352_Resources#Documentation_on_Python_Threads|
 +
section on Python]] on the [[CSC352_Resources | Resource]] Page.
 +
* Also, don't hesitate to check [http://python.org Python.org] for information on threads.
 +
<!-- ================================================================== -->
 +
|- style="background:#ffffff" valign="top"
 +
| Week 3 <br /> 2/8 <br />
 +
||
 +
* '''Tuesday'''
 +
** An experiment with Control-C...
 +
** Sharing in shared memory
 +
*** the problem
 +
*** atomicity of operations
 +
*** semaphores
 +
**** fair
 +
**** safe
 +
**** live
 +
*** Python locks
 +
*** examples:
 +
**** [[CS352_threadedpingWithLocks.py | threadedPingWithLocks.py]]
 +
**** [[CS352_threadedpingWithSemaphores.py | threadedPingWithSemaphores.py]]
 +
* '''Thursday'''
 +
** Discussion: "Why Von Neumann?"
 +
** Continuation of Shared Memory and Sharing
 +
*** Python Queues
 +
**** [[CS352_threadedpingWithQueues.py | threadedPingWithQueues.py]]
 +
** Some [[CSC352 Notes on the Python GIL| notes]] on Python and threads
 +
** An alternative to Threading: [http://docs.python.org/library/multiprocessing.html Multiprocessing]
 +
*** Available with Python 2.6 and up only
 +
*** [[CSC352_multiprocessingNQueens.py | multiprocessingNQueens.py]]
 +
*** [[CSC352 Comparison threading to multiprocessing| Comparison]] of the two methods
 +
** Performance measures
 +
*** Speedup
 +
*** Execution time
 +
*** Processor utilization
 +
*** Processor efficiency
 +
*** Processor efficacy
 +
** Guidelines for presenting research papers
 +
** Deadlocks
 +
 +
----
 +
* [[CSC352 Homework 2 | Homework #2]], [[CSC352 Homework 2 Solution 1 | Solution 1]] and [[CSC352 Homework 2 Solution 2 | Solution 2 ]]
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
* [http://maven.smith.edu/~thiebaut/transputer/chapter8/chap8-1.html Measuring Performance]
 +
* [http://insidehpc.com/2010/01/05/sun-video-tutorial-optimizing-performance-in-parallel-processing/ Video] from Sun Microsystems on parallelizing applications and  improving their performance (19 minutes)
 +
* [http://en.wikipedia.org/wiki/Deadlock Deadlocks]
 +
* [http://www.python.org/dev/peps/pep-0371/ PEP 371] ([[Media:Python_PEP0371.pdf|cached copy]])
 +
<!-- ================================================================== -->
 +
|- style="background:#eeeeff" valign="top"
 +
| Week 4 <br /> 2/15 <br />
 +
||
 +
* '''Tuesday'''
 +
** Discussion of [[media:XenAndTheArtOfVirtualization_3.pdf | Xen and the Art of Virtualization]] (Presentation by '''Le''')
 +
*** One person presents
 +
*** Everybody else submits a 1-page summary in 3 parts.
 +
** Review of HW #1
 +
** A visit to the XGrid system ([http://www.facebook.com/pages/Northampton-MA/Computer-Science-Smith-College/264041891883?ref=ts Photos])
 +
* '''Thursday'''
 +
** Python Q&A
 +
** Amdahl's Law
 +
** Deadlocks: the main rule
 +
** [[How to Read Technical Papers]]
 +
** Summarize [[Media:TechnologyBitsNYT02162010.pdf| this!]]
 +
** Good background information: [http://en.wikipedia.org/wiki/Message_Passing_Interface#Example_program MPI]
 +
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
* Additional information on Xen can be found in  Mauer, R., [http://www.linuxjournal.com/article/8812 Xen Virtualization and Linux Clustering], [http://www.linuxjournal.com Linux Journal] January 12th, 2006.
 +
* A [http://blip.tv/file/2232410 video] on the Python GIL discovered by Diana.
 +
* A [http://oblong.com/ video] by Oblong Industries (in reference to the NYT article to summarize)
 +
 +
 +
|}
 +
 +
[[File:XgridLogo.png| right | 100px | link=http://rocinante.smith.edu/ganglia/]]
  
 
=XGrid Programming=
 
=XGrid Programming=
 +
 +
{| style="width:100%" border="1"
 +
|- style="background:#ffdead;"
 +
|'''Week''' || '''Topics''' || '''Reading'''
 +
 +
<!-- ================================================================== -->
 +
|-valign="top"
 +
|width="15%"| Week 5 <br /> 
 +
|width="60%"|
 +
* '''Tuesday'''
 +
** [[CSC352 Introduction to the Projects | Introduction to the next projects]]
 +
** [http://data.scl.utah.edu/fmi/xsl/stream/details.xsl?-recid=104&a::v=2212a4Eaya A Video presentation of the XGrid] (watch first 10 minutes)
 +
** [[XGrid Tutorial Part 1: Monte Carlo | Tutorial/Lab #1: Monte Carlo on the XGrid]]
 +
* '''Thursday''':
 +
** Presentation and discussion of [http://www.cs.wisc.edu/multifacet/papers/ieeecomputer08_amdahl_multicore.pdf Amdahl's Law in the Multicore Era], Mark Hill and Michael Marty, IEEE Computer, July 2008, and accompanying [http://www.cs.wisc.edu/multifacet/amdahl/ dynamic graph]. ([[Media:ieeecomputer08_amdahl_multicore.pdf |cached copy]])
 +
** [[XGrid Tutorial Part 1: Monte Carlo | Continuation of the Xgrid Monte-Carlo tutorial]]
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
* Apple's [http://developer.apple.com/mac/library/documentation/MacOSXServer/Conceptual/Xgrid_Programming_Guide/Introduction/Introduction.html XGrid Introduction]
 +
 +
<!-- ================================================================== -->
 +
|- style="background:#eeeeff" valign="top"
 +
| Week 6 <br /> <br />
 +
||
 +
* '''Tuesday'''
 +
** Presentation of Projects #2 and #3
 +
*** [[CSC352_Resources#Videos:_Big_Data_and_Analytics | Videos]]
 +
*** Presentation by DT
 +
**** [http://jonudell.net/udell/gems/umlaut/umlaut.html Jon Udell's capture of the page ''Heavy Metal Umlaut'']
 +
**** [http://meta.wikimedia.org/wiki/Research Wikipedia Research Projects]
 +
*** [[CSC352_Project_2 | Intro to Projects #2 and #3]] ([http://cs.smith.edu/~thiebaut/freevideos/BigData2.swf  presentation])
 +
* '''Thursday'''
 +
** Canceled by DT
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
 +
||
 +
* Read  Hughes, B., [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.60.7248&rep=rep1&type=pdf Building Computational Grids with Apple's XGrid Middleware],  ''ACM International Conference Proceeding Series'', Vol. 167, Hobart, Tasmania, Australia, 2006. For next Tuesday.
 +
<!-- ================================================================== -->
 +
|- style="background:#ffffff" valign="top"
 +
| Week 7 <br />  <br />
 +
||
 +
* '''Tuesday'''
 +
** Presentation and discussion of  [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.60.7248&rep=rep1&type=pdf Building Computational Grids with Apple's XGrid Middleware], by Hughes, B. (''ACM International Conference Proceeding Series'', Vol. 167, Hobart, Tasmania, Australia, 2006.)  ([[media:buildingComputationalGrids.pdf|cached copy]])
 +
** Define Homework #3
 +
** Guidelines for projects. ([http://cs.smith.edu/~thiebaut/freevideos/WhatsInAProject.swf Presentation]) (We stopped on the bell-curve).
 +
* '''Thursday'''
 +
** Guidelines for projects. ([http://cs.smith.edu/~thiebaut/freevideos/WhatsInAProject.swf Presentation])
 +
** [[XGrid Tutorial Part 2: Processing Wikipedia Pages | XGrid Lab 2]]
 +
** Scheduling
 +
*** Processor Scheduling (OS)
 +
*** Multiprocessor Scheduling
 +
*** XGrid Scheduling
 +
 +
----
 +
* [[CSC352 Homework 3 | Homework #3 ]] and its [[CSC352 Homework 3 Solution | solution programs]].
 +
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
&nbsp;
 +
<!-- ================================================================== -->
 +
|- style="background:#eeeeff" valign="top"
 +
| &nbsp; <br />  <br />
 +
||
 +
<center>[[Image:SpringBreak.gif |  150px]]</center>
 +
||
 +
&nbsp;
 +
<!-- ================================================================== -->
 +
|- style="background:#ffffff" valign="top"
 +
| Week 8 <br />  <br />
 +
||
 +
* '''Tuesday'''
 +
** Sign-up for paper presentations [http://cs.smith.edu/classwiki/index.php/CSC352_Sign-Up_Sheet_for_Paper_Presentations here]!
 +
** Class participation on the decomposition of Homework 3/Project 2 (the serial part)
 +
* '''Thursday'''
 +
** Continuation of decomposition of Homework 3/Project 2 (the parallel part)
 +
** [[XGrid Tutorial Part 3: Monte Carlo on the Science Center XGrid | XGrid Lab 3]]: running jobs on the Science Center XGrid.
 +
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
&nbsp;
 +
|}
 +
 +
<!--
 +
        _                      _
 +
  ___| |  ___  _  _  __| |
 +
/ __|  |/ _ \ | |  |  |/ _` |
 +
| (__|  | (_)  | |_|  | (_| |
 +
\___|_|\___/\__,_|\__,_|
 +
                         
 +
-->
 +
 +
[[File:HadoopCartoon.png|right|100px]]
  
 
=Cloud Computing=
 
=Cloud Computing=
 +
{| style="width:100%" border="1"
 +
|- style="background:#ffdead;"
 +
|'''Week''' || '''Topics''' || '''Reading'''
 +
 +
<!-- ================================================================== -->
 +
|-valign="top"
 +
|width="15%"| Week 9 <br /> 
 +
|width="60%"|
 +
* '''Tuesday'''
 +
** Presentation of [http://labs.google.com/papers/mapreduce-osdi04.pdf MapReduce: Simplified Data Processing on Large Clusters] (Yang)
 +
** [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] (We stopped at Section 4).
 +
* '''Thursday'''
 +
** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] and [[Hadoop_Tutorial_1_--_Running_WordCount | Tutorial #1]], [[Hadoop_Tutorial_2_--_Running_WordCount_in_Python | Tutorial #2]]
  
=Resources: References &amp; Bibliography=
+
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
* [http://cs.smith.edu/classwiki/index.php/CSC352_Hadoop_Howto_%26_FAQ  Hadoop Howtos and FAQs]
 +
||
 +
[[File:HadoopOReilly.jpg | 70px | right]]
 +
* [http://hadoop.apache.org/common/docs/current/mapred_tutorial.html Map-Reduce tutorial] from apache.org: a must-read!
 +
* [http://developer.yahoo.com/hadoop/tutorial/module4.html Map-Reduce Basics] from Yahoo.com: another must-read!
  
+
* Section 6 in Tom White's ''Hadoop, the Definitive Guide'', available on [http://books.google.com/books?id=bKPEwR-Pt6EC&printsec=frontcover&dq=hadoop+definitive+guide&source=bl&ots=kOdw-xf9Gg&sig=GyHDzyATSbMVcPysVbSAQKuhv58&hl=en&ei=YJ-0S6HSOIS0lQfm3u1q&sa=X&oi=book_result&ct=result&resnum=6&ved=0CB0Q6AEwBQ#v=onepage&q=&f=false Google Books].
<onlysmith>
+
<!-- ================================================================== -->
==Parallel Processing/Good background information==
+
|- style="background:#eeeeff" valign="top"
* Asanovic K. ''et al'', [http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf The Landscape of Parallel Computing Research: A View from Berkeley], Dec. 2006. ([[media:LandscapeParallelProcessingBerkeley1206.pdf|cached copy]])
+
| Week 10 <br /> <br />
* Xen
+
||
** Mauer, R., [http://www.linuxjournal.com/article/8812 Xen Virtualization and Linux Clustering], [http://www.linuxjournal.com Linux Journal] January 12th, 2006
+
[[Image:CSC352HadoopPerformanceMachineLearning.png| right|150px]]
** Barham P., ''et al.'', [[media:XenAndTheArtOfVirtualization_3.pdf | Xen and the Art of Virtualization]], University of Cambridge Computer Laboratory 15 JJ Thomson Avenue, Cambridge, UK, CB3 0FD
+
* '''Tuesday'''
* AMD News
+
** Presentation of [http://www.icsi.berkeley.edu/~arlo/publications/gillick_cs262a_proj.pdf MapReduce: Distributed Computing for Machine Learning]
** Hardwidge, B., [http://www.bit-tech.net/custompc/news/605374/amd-plans-supercomputer-with-1000-gpus.html AMD plans supercomputer with 1,000 GPUs], Jan. 2009, [http://www.bit-tech.net bit-tech.net] (or graphics goes to the clouds!)
+
** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]]  
** Halfacree G., [http://www.bit-tech.net/news/hardware/2009/11/17/amd-supercomputer-tops-top500-list/1 AMD supercomputer tops TOP500 list], November 2009, [http://www.bit-tech.net bit-tech.net] (or Intel gets a black eye!)
+
*** [[Hadoop_Tutorial_1_--_Running_WordCount | Tutorial #1]],
* Google University Code
+
*** [[Hadoop_Tutorial_1_--_Running_WordCount#Analyzing_the_Hadoop_Logs | Tutorial #1: output logs]]
** [http://code.google.com/edu/submissions/rutgers/index.html Lecture Notes] by  Paul Krzyzanowski for a course on Distributed Computing at Rutgers.  Quite complete, and covering the basics of parallelism, RPC, synchronization, fault tolerance, security, and distributed file systems.
+
***  [[Hadoop_Tutorial_1.1_--_Generating_Task_Timelines | Tutorial #1.1: Task Timelines]]  
 +
[[Image:WrongTaskTimeline.png| 200px|right]]
 +
* '''Thursday'''
 +
** Question of the day: What's wrong with this picture?
 +
** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]]  
 +
** Lab for today:
 +
*** Compare WordCount on 1 vs WordCount on 6 [[Hadoop_Tutorial_1_--_Running_WordCount#Moment_of_Truth:_Compare_5-PC_Hadoop_cluster_to_1_Linux_PC | Section 5 of Tutorial 1]]
 +
*** Create your own version of the Java WordCount program [[Hadoop_Tutorial_1_--_Running_WordCount#Running_Your_Own_Version_of_WordCount.java | Section 4 of Tutorial 1]]
 +
*** Create your own Counters [[Hadoop_Tutorial_1_--_Running_WordCount#Counters | Section 6 of Tutorial 1]]: count Buck!
 +
*** Generate Timelines [[Hadoop_Tutorial_1.1_--_Generating_Task_Timelines | Tutorial 1.1]]
 +
*** Counting words in Python [[Hadoop_Tutorial_2_--_Running_WordCount_in_Python | Tutorial 2]]
 +
----
 +
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
* [[CSC352 Homework 4 | Homework #4]] and [[CSC352 Homework 4 Solution | a solution]].
 +
||
 +
&nbsp;
  
==Python Threads==
+
<!-- ================================================================== -->
<greenbox>
+
|- style="background:#ffffff" valign="top"
[[Image:smilingPython.png| right| 100px]]
+
| Week 11 <br />  <br />
* [http://python.org/ The main Python reference]
+
||
* [http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf Norman Matloff and Francis Hsu's Tutorial] on Python Threads (University of California, Davis) ([[media:matlof_PythonTutorial.pdf|cached copy]])
+
* '''Tuesday'''
* [http://linuxgazette.net/107/pai.html Understanding Threading in Python], Krishna G Pai, Linux Gazette, Oct. 2004
+
** Presentation of [http://portal.acm.org/ft_gateway.cfm?id=1629198&type=pdf&coll=GUIDE&dl=GUIDE&CFID=82739837&CFTOKEN=94683258 MapReduce: A Flexible Data Processing Tool] ([[media:MapReduceFlexibleDataProcessingTool.pdf | cached copy]])
* [http://www.python.org/doc/2.3.5/lib/thread-objects.html Thread Objects] from [http://www.python.org Python.Org]
+
** Compare with Pavlo, Paulson, Rasin, Abadi, DeWitt, Madden, and Stonebraker's paper [[Media:ComparisonOfApproachesToLargeScaleDataAnalysis.pdf |A Comparison of Approaches to Large-Scale Data Analysis]], SIGMOD '09, June 2009.
</greenbox>
+
* '''Thursday'''
 +
** [http://en.wikipedia.org/wiki/Hypertable Hypertable] is an open-source project parallel to Google's BigTable...
 +
** Art vs Science...
 +
** Some preliminary thinking about the final project...
 +
** [[Hadoop_Tutorial_2.1_--_Streaming_XML_Files | Streaming whole files]]
 +
** [[Hadoop Tutorial 2.2 -- Running C++ Programs on Hadoop | WordCount in C++]]
 +
**  Visualizations of Hadoop Data Transfers, from the U. of Nebraska ([http://www.google.com/search?q=university+of+Nebraska+hadoop+visualization&hl=en&safe=off&tbs=vid:1&tbo=u&ei=oKO4S6GMCoH7lwfq88SXCg&sa=X&oi=video_result_group&ct=title&resnum=1&ved=0CBEQqwQwAA more videos])
 +
<br /><br /><center><videoflash>qoBoEzOkeDQ</videoflash></center><br /><br />
 +
** Monitoring a Cluster of Computers as a school of fish (U. Nebraska)
 +
<br /><br /><center><videoflash>LM1j_8sWSEk</videoflash></center><br /><br />
 +
** The evolution of Hadoop (Code-Swarm)
 +
<br /><br /><center><videoflash type="vimeo">2513321</videoflash></center><br /><br />
  
==XGrid==
+
----
<bluebox>
+
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
__NOTOC__
+
* A [[CSC352 Project 2 Solution | solution]] for Project 2 has been posted!
[[Image:xgridLogo.png | right|100px]]
+
||
* [http://tango.csc.smith.edu/classwiki/index.php/Xgrid_Programming Programming Examples, Setup, and References]
+
* [http://developer.yahoo.com/hadoop/tutorial/index.html Hadoop Tutorial from Yahoo Developer Network (YDN)]
 +
&nbsp;
  
* What's an XGrid system?
+
<!-- ================================================================== -->
** [http://data.scl.utah.edu/fmi/xsl/stream/details.xsl?-recid=104&a::v=2212a4Eaya A Video] presentation of the XGrid (click on movie reel icon to start).
+
|- style="background:#eeeeff" valign="top"
** A very good overview of the XGrid from [http://www.macdevcenter.com/pub/a/mac/2005/08/23/xgrid.html?page=1 macdevcenter.com]
+
| Week 12 <br />  <br />
</bluebox>
+
||
 +
* '''Tuesday'''
 +
** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part I
 +
** [[Hadoop_Tutorial_3_--_Hadoop_on_Amazon_AWS | Signing on to Amazon AWS]]
 +
** [[Hadoop_Tutorial_3.1_--_Using_Amazon%27s_WordCount_program | Uploading data to AWS and counting words]]
 +
** [[Hadoop_Tutorial_3.2_--_Using_Your_Own_WordCount_program | Word-counting using Streaming Python on AWS]]
 +
** [[Hadoop_Tutorial_3.3_--_How_Much%3F | Costs of maintaining a Hadoop cluster on AWS]]
 +
* '''Thursday'''
 +
** Continuation of the AWS labs
 +
** [[Hadoop_Tutorial_4:_Start_an_EC2_Instance | Starting an EC2 instance on AWS]]
 +
----
 +
[http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
&nbsp;
 +
<!-- ================================================================== -->
 +
|- style="background:#ffffff" valign="top"
 +
| Week 13 <br />  <br />
 +
||
 +
* '''Tuesday'''
 +
** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part II
 +
** [[CSC352 Problem of the Day| Problem of the day]]: discussion
 +
** Work on projects
  
==Cloud Computing==
+
* '''Thursday'''
<blockquote>"''Failure is the defining difference between distributed and local programming''" <br>
+
** Presentation of [http://www.hulu.com/watch/116372/cnbc-originals-inside-the-mind-of-google Inside the Mind of Google]
Ken Arnold, CORBA Designer
+
** wrap up
</blockquote>
+
----
<tanbox>
+
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
__NOTOC__
+
||
===Literature===
+
&nbsp;
* [[Image:hadoopOReilly.jpg | right |100px]] [http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/0596521979  Hadoop: The Definitive Guide], Tom White, O'Reilly Media, June 2009, ISBN 0596521979.  The Web site for the book is http://www.hadoopbook.com/ (with the data used as examples in the book).
 
* Dean, J., and S. Ghemawat, [http://labs.google.com/papers/mapreduce-osdi04.pdf MapReduce: Simplified Data Processing on Large Clusters], Dec. 2004,  ([[media:MapReduce1204.pdf|cached copy]])
 
*  Czajkowski G., [http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html  Sorting 1 PB with MapReduce], Nov. 2008, ([[media:Sorting1PBWithMapReduce.pdf|cached copy]])
 
* Armbrust M. ''et al.'', [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Tech. Rep. UCB/EECS-2009-28, Feb. 2009 ([[media:AboveTheCloudsBerkeley.pdf|cached copy]])
 
* Olston C. ''et al.'', [[Media:pigLatinNotSoForeignLanguage.pdf |Pig Latin: A Not-So-Foreign Language for Data Processing]], SIGMOD’08, June 9–12, 2008, Vancouver, BC, Canada.
 
* Ghemawat S., H. Gobioff, and S.T. Leung, [http://labs.google.com/papers/gfs-sosp2003.pdf The Google File System], SOSP’03, October 19–22, 2003, Bolton Landing, New York, USA.
 
* [http://research.microsoft.com/en-us/collaboration/fourthparadigm/ The Fourth Paradigm: Data-Intensive Scientific Discovery], Microsoft Research, 2009. [http://research.microsoft.com/en-us/collaboration/fourthparadigm/contents.aspx Table of Contents],  ([http://tango.csc.smith.edu/dftwiki/images/4thParadigmMicrosoftGray09.pdf Low-res cached copy]). 
 
  
===Media Reports===
+
|}
* Markoff, J., [[media:DelugeOfDataShapesNewEraInComputing.pdf | A Deluge of Data Shapes a New Era in Computing]], ''New York Times'', 12/15/09
 
  
===Class Material on the Web===
+
=Selected Solutions for papers, homework, or projects=
* [http://code.google.com/edu/submissions/mapreduce-minilecture/listing.html Google]'s series of 4 lectures on map-reduce, distributed file-system, and clustering algorithms.
 
* '''University of Washington''': [http://code.google.com/edu/submissions/uwspr2007_clustercourse/listing.html  Problem Solving on Large Scale Clusters]
 
* '''Brandeis University''': [http://www.cs.brandeis.edu/~cs147a/ Distributed Systems Course]
 
** [http://www.cs.brandeis.edu/~cs147a/lab/hadoop-intro/ Introduction to Hadoop Lab]
 
** [http://www.cs.brandeis.edu/~cs147a/lab/hadoop-singlenode/ Single Node setup Lab]
 
** [http://www.cs.brandeis.edu/~cs147a/lab/hadoop-example/ Hadoop Example Program Lab]
 
** [http://www.cs.brandeis.edu/~cs147a/lab/hadoop-cluster/ Hadoop Cluster Setup Lab]
 
* '''Google''': [http://code.google.com/edu/parallel/mapreduce-tutorial.html Introduction to Parallel Programming and MapReduce]
 
* '''U. C. Berkeley''': [http://code.google.com/edu/submissions/ucberkeley-parallelism/index.html Intro to Parallel Programming and Threading]
 
* '''California PolyTech''': [http://code.google.com/edu/submissions/capolytech-parallel-programming/ A lab on the NetFlix data set]
 
  
===Software/Web Links===
+
* [[Media:Amdahls_Law_in_the_Multicore_Era.pdf | Amdahl's Law in the Multicore Era]] summary
[[Image:HadoopCartoon.png | 100px | right]]
+
* [[CSC352 Project1 Solution | Selected Solutions for Project 1]]
*[http://hadoop.apache.org/common/ Apache's Documentation on Hadoop Common]
 
**[http://hadoop.apache.org/common/docs/current/mapred_tutorial.html The Hadoop Tutorial] from Apache.  A "Must-Do!"
 
**[http://hadoop.apache.org/common/docs/current/streaming.html#Hadoop+Streaming Hadoop Streaming], i.e. using Hadoop with  Python, for example.
 
* [http://developer.yahoo.com/hadoop/tutorial/ A Yahoo Tutorial] on Hadoop.  Another "Must-Do!"
 
* [http://v-lad.org/Tutorials/Hadoop/00%20-%20Intro.html A Hadoop-On-Eclipse] tutorial.  Written for the Windows platform, but works for Macs as well.  Best way to set up Eclipse!  You will need Eclipse 3.3.2 and Hadoop 0.19.1.
 
*[http://www.hadoopbook.com/ The Hadoop-Book] Web site.
 
*[http://wiki.apache.org/hadoop/FrontPage The Hadoop Wiki], the authoritative source on working with Hadoop. <font color="purple">Many examples in Java and Python</font>
 
** [http://wiki.apache.org/hadoop/WordCount WordCount]
 
** [http://wiki.apache.org/hadoop/PythonWordCount Python WordCount]
 
** [http://wiki.apache.org/hadoop/C%2B%2BWordCount C++ WordCount]
 
** [http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample How to read and write to HDFS]
 
*[http://code.google.com/edu/parallel/tools/hadoopvm/index.html  Hadoop at Google]: A preconfigured single node instance available at Google.
 
* [http://www.michael-noll.com/wiki/Writing_An_Hadoop_MapReduce_Program_In_Python Writing the WordCount] in Python
 
*[http://code.google.com/edu/parallel/tools/hadoopvm/index.html Guide for setting up IBM's Eclipse Tools for Hadoop] (go to bottom of page)
 
:The IBM MapReduce Tools for Eclipse Plug-in is a robust plug-in that brings Hadoop support to the Eclipse platform. Features include server configuration, support for launching MapReduce jobs and browsing the distributed file system. This setup assumes that you are running Eclipse (version 3.3 or above) on your computer.
 
* [http://www.infosci.cornell.edu/hadoop/mac.html Guide] from Cornell for setting up Hadoop on a Mac.
 
*[http://www.cloudera.com/blog/2009/04/20/configuring-eclipse-for-hadoop-development-a-screencast/ Configuring Eclipse for Hadoop] A video from Cloudera on setting up Hadoop... not easy to follow...
 
* [https://trac.declarativity.net/browser/hadoop-0.19.1-bfs/src/examples/org/apache/hadoop/examples The source code for the examples] that come with the Hadoop 0.19.1 distribution.  Includes WordCount, WordCountAggregate, WordCountHistogram, PiEstimator, Join, and Grep, among others.
 
 
===Videos===
 
* [http://code.google.com/edu/submissions/mapreduce-minilecture/listing.html Google]'s series of 4 lectures on map-reduce, distributed file-system, and clustering algorithms.
 
* [http://jez.blip.tv/file/245701/ A video of Tom White], author of O'Reilly's Hadoop guide, on BlipTV. White outlines the suite of projects centered around Hadoop (an open-source MapReduce project).
 
* [http://www.cloudera.com/hadoop-training-basic Cloudera]'s collection of videos. 
 
** [http://www.cloudera.com/hadoop-training-basic Thinking At Scale] <-- Start here!
 
** [http://www.cloudera.com/hadoop-training-basic MapReduce and HDFS]
 
** [http://www.cloudera.com/hadoop-training-basic Hadoop Ecosystem Tour]
 
** [http://www.cloudera.com/hadoop-training-basic Programming with Hadoop]
 
** [http://www.cloudera.com/hadoop-training-basic Introduction to Hive]
 
** [http://www.cloudera.com/hadoop-training-basic Introduction to Pig]
 
** [http://www.cloudera.com/hadoop-training-basic MapReduce Algorithms]
 
** [http://www.cloudera.com/hadoop-training-basic Training Exercises and Tutorials]
 
** [http://www.cloudera.com/hadoop-training-basic Getting Started with Hadoop]
 
** [http://www.cloudera.com/hadoop-training-basic Writing MapReduce Programs]
 
** [http://www.cloudera.com/hadoop-training-basic Hive Tutorial]
 
** [http://www.cloudera.com/hadoop-training-basic Pig Tutorial]
 
* [http://www.hulu.com/watch/116372/cnbc-originals-inside-the-mind-of-google CNBC's report: Inside the Mind of Google]. "The best way to watch “Inside the Mind of Google,” Maria Bartiromo’s report on the Internet giant Thursday on CNBC, is to not watch the first quarter of it." (from Neil Genzlinger's 12/02/09 [http://www.nytimes.com/2009/12/03/arts/television/03mind.html NYT review])
 
</tanbox>
 
  
[[CSC352_Notes | <font color="white">Notes</font>]]
+
<br />
</onlysmith>
+
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
[[Category:CSC352]][[Category:Class]][[Category:Schedule]]

Latest revision as of 10:22, 9 August 2013






Hadoop-Related

Projects

  • Project 4 officially started on 1/26/10 and is now over. Find its shared wiki page here! Congratulations to the class for setting up the individual nodes of the first Smith Cloud Cluster!

Python Threads

Week Topics Reading
Week 1
1/25
  • Tuesday
    • Introduce Syllabus
    • Interrupts
  • Thursday
    • The PogoPlug...
    • The iPad...
    • Python Review
      • NQueens.py: a Python program to find the first solution for N queens on an NxN board.

Read
  • What is a Thread? [1]
  • What is a Process? [2]
Reference material for when we start programming
For Discussion next Tuesday
  • Read the paper by Asanovic K. et al., The Landscape of Parallel Computing Research: A View from Berkeley.
Week 2
2/1

 

Week 3
2/8

Week 4
2/15



XGrid Programming

Week Topics Reading
Week 5

Week 6


Week 7


 

 

Spring Break

 

Week 8

  • Tuesday
    • Sign-up for paper presentations here!
    • Class participation on the decomposition of Homework 3/Project 2 (the serial part)
  • Thursday
    • Continuation of decomposition of Homework 3/Project 2 (the parallel part)
    • XGrid Lab 3: running jobs on the Science Center XGrid.

 



Cloud Computing

Week Topics Reading
Week 9

  • Section 6 in Tom White's Hadoop, the Definitive Guide, available on Google Books.
Week 10


 

Week 11





    • Monitoring a Cluster of Computers as a school of fish (U. Nebraska)




    • The evolution of Hadoop (Code-Swarm)





 

Week 12


 

Week 13


 

Selected Solutions for papers, homework, or projects