Difference between revisions of "CSC352 Class Page 2010"

From dftwiki3
(Resources: References & Bibliography)
m (Thiebaut moved page CSC352 Class Page to CSC352 Class Page 2010)
 
(206 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
__TOC__
 +
 +
<br />
 +
<br />
 +
<center>[[CSC352 | Main Page]] | [[CSC352_Syllabus | Syllabus]] | [[CSC352_Class_Page | Schedule]] |
 +
[[CSC352 Resources | Links &amp; Resources]]</center>
 +
<br />
 +
<br />
 +
 +
=Hadoop-Related=
 +
 +
* [http://cs.smith.edu/classwiki/index.php/CSC352_Hadoop_Howto_%26_FAQ Hadoop FAQ Page]
 +
* [http://maven.smith.edu/~thiebaut/showhadoopip.php Hadoop IPs]
 +
 +
=Projects=
 +
 +
* [[CSC352 Project 1 | Project 1]]: Started Feb 16, due March 2nd. A [[CSC352 Project1 Solution|good example of a solution]].
 +
* [[CSC352 Project 2 | Project 2]]: Deals with the Xgrid.  A [[CSC352 Project 2 Solution | collection of good proposed solutions]].
 +
* [[CSC352 Project 3 | Project 3]] Deals with processing Wikipedia pages on Hadoop/MapReduce.
 +
 +
* '''Project 4 officially started on 1/26/10 and is now over.'''  Find its shared wiki page [http://cs.smith.edu/classwiki/index.php/CSC352_Page#Projects here]!  Congratulations to the class for setting up the individual nodes of the first Smith Cloud Cluster!
 +
 +
[[File:SmilingPython.png | 100px | right]]
 +
 
 
=Python Threads=
 +
{| style="width:100%" border="1"
 +
|- style="background:#ffdead;"
 +
|'''Week''' || '''Topics''' || '''Reading'''
 +
 +
<!-- ================================================================== -->
 +
|-valign="top"
 +
|width="15%"| Week 1 <br /> 1/25
 +
|width="60%"|
 +
* '''Tuesday'''
 +
** Introduce Syllabus
 +
** Interrupts
 +
[[Image:pogoplug.jpg | right]]
 +
* '''Thursday'''
 +
** The [[PogoPlug]]...
 +
** The [http://www.apple.com/ipad/ iPad]...
 +
** Python Review
 +
*** [[NQueens.py | NQueens.py]]: a Python program to find the first solution for ''N'' queens on an ''N''x''N'' board.
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
;Read:
 +
* What is a Thread? [http://en.wikipedia.org/wiki/Thread_(computer_science)]
 +
* What is a Process? [http://en.wikipedia.org/wiki/Process_(computing)]
 +
;Reference material for when we start programming:
 +
* [http://www.python.org/doc/2.5.2/lib/thread-objects.html Python.org's page] on Thread Objects.  It's the reference, but is not always crystal clear...
 +
* [http://linuxgazette.net/107/pai.html Linux Gazette Article] on Python Threads.  Good overall introduction.
 +
* [http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf Norm Matloff's] introduction to Python Threads.  More details, less easy to follow.
 +
;For Discussion next Tuesday
 +
* Read the paper by Asanovic K. ''et al.'', The Landscape of Parallel Computing Research: A View from Berkeley.
 +
 +
<!-- ================================================================== -->
 +
|- style="background:#eeeeff" valign="top"
 +
| Week 2 <br /> 2/1<br />
 +
||
 +
* '''Tuesday'''
 +
** discussion (prepare a 1-page summary) of the paper by  Asanovic K. ''et al'', [http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf The Landscape of Parallel Computing Research: A View from Berkeley], Dec. 2006. ([[media:LandscapeParallelProcessingBerkeley1206.pdf|cached copy]])
 +
 +
* '''Thursday'''
 +
** Python
 +
*** [[classExample1.py | classExample1.py ]]
 +
*** [[classExample2.py | classExample2.py ]]
 +
*** [[classExample3.py | classExample3.py ]]
 +
*** [[serialPing.py | serialPing.py ]]
 +
** Python Threads
 +
*** [[threadedPing.py | threadedPing.py ]]: a threaded solution to the serial ping program
 +
*** [[ThreadedNQueens.py | ThreadedNQueens.py ]]: a threaded solution to the N-Queens program
 +
 +
----
 +
* [[CSC352 Homework 1 | Homework #1]]
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
&nbsp;
 +
* Make sure to read Norman Matloff's tutorial on threads in the [[CSC352_Resources#Documentation_on_Python_Threads|
 +
section on Python]] on the [[CSC352_Resources | Resource]] Page.
 +
* Also, don't hesitate to check [http://python.org Python.org] for information on threads.
 +
<!-- ================================================================== -->
 +
|- style="background:#ffffff" valign="top"
 +
| Week 3 <br /> 2/8 <br />
 +
||
 +
* '''Tuesday'''
 +
** An experiment with Control-C...
 +
** Sharing in shared memory
 +
*** the problem
 +
*** atomicity of operations
 +
*** semaphores
 +
**** fair
 +
**** safe
 +
**** live
 +
*** Python locks
 +
*** examples:
 +
**** [[CS352_threadedpingWithLocks.py | threadedPingWithLocks.py]]
 +
**** [[CS352_threadedpingWithSemaphores.py | threadedPingWithSemaphores.py]]
 +
* '''Thursday'''
 +
** Discussion: "Why Von Neumann?"
 +
** Continuation of Shared Memory and Sharing
 +
*** Python Queues
 +
**** [[CS352_threadedpingWithQueues.py | threadedPingWithQueues.py]]
 +
** Some [[CSC352 Notes on the Python GIL| notes]] on Python and threads
 +
** An alternative to Threading: [http://docs.python.org/library/multiprocessing.html Multiprocessing]
 +
*** Available with Python 2.6 and up only
 +
*** [[CSC352_multiprocessingNQueens.py | multiprocessingNQueens.py]]
 +
*** [[CSC352 Comparison threading to multiprocessing| Comparison]] of the two methods
 +
** Performance measures
 +
*** Speedup
 +
*** Execution time
 +
*** Processor utilization
 +
*** Processor efficiency
 +
*** Processor efficacy
 +
** Guidelines for presenting research papers
 +
** Deadlocks
 +
 +
----
 +
* [[CSC352 Homework 2 | Homework #2]], [[CSC352 Homework 2 Solution 1 | Solution 1]] and [[CSC352 Homework 2 Solution 2 | Solution 2 ]]
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
* [http://maven.smith.edu/~thiebaut/transputer/chapter8/chap8-1.html Measuring Performance]
 +
* [http://insidehpc.com/2010/01/05/sun-video-tutorial-optimizing-performance-in-parallel-processing/ Video] from Sun Microsystems on parallelizing applications and  improving their performance (19 minutes)
 +
* [http://en.wikipedia.org/wiki/Deadlock Deadlocks]
 +
* [http://www.python.org/dev/peps/pep-0371/ PEP 371] ([[Media:Python_PEP0371.pdf|cached copy]])
 +
<!-- ================================================================== -->
 +
|- style="background:#eeeeff" valign="top"
 +
| Week 4 <br /> 2/15 <br />
 +
||
 +
* '''Tuesday'''
 +
** Discussion of [[media:XenAndTheArtOfVirtualization_3.pdf | Xen and the Art of Virtualization]] (Presentation by '''Le''')
 +
*** One person presents
 +
*** Everybody else submits a 1-page summary in 3 parts.
 +
** Review of HW #1
 +
** A visit to the XGrid system ([http://www.facebook.com/pages/Northampton-MA/Computer-Science-Smith-College/264041891883?ref=ts Photos])
 +
* '''Thursday'''
 +
** Python Q&A
 +
** Amdahl's Law
 +
** Deadlocks: the main rule
 +
** [[How to Read Technical Papers]]
 +
** Summarize [[Media:TechnologyBitsNYT02162010.pdf| this!]]
 +
** Good background information: [http://en.wikipedia.org/wiki/Message_Passing_Interface#Example_program MPI]
 +
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
* Additional information on Xen can be found in  Mauer, R., [http://www.linuxjournal.com/article/8812 Xen Virtualization and Linux Clustering], [http://www.linuxjournal.com Linux Journal] January 12th, 2006.
 +
* A [http://blip.tv/file/2232410 video] on the Python GIL discovered by Diana.
 +
* A [http://oblong.com/ video] by Oblong Industries (in reference to the NYT article to summarize)
 +
 +
 +
|}
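
The Week 3 topics in the table above (atomicity of operations, Python locks, Python Queues) can be illustrated with a minimal sketch. This is not one of the class programs: the function names <code>count_with_lock</code> and <code>squares_with_queue</code> are hypothetical, and the code assumes Python 3, whereas the reference links in the table point at Python 2.x documentation.

```python
import queue
import threading


def count_with_lock(num_threads=4, increments=10000):
    """Hypothetical example: threads update one shared counter under a Lock."""
    total = 0
    lock = threading.Lock()

    def worker():
        nonlocal total
        for _ in range(increments):
            # "total += 1" is a read-modify-write sequence, not an atomic
            # operation; holding the lock makes each update indivisible.
            with lock:
                total += 1

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return total


def squares_with_queue(values):
    """Hypothetical example: workers return results through a thread-safe Queue."""
    results = queue.Queue()

    def worker(v):
        # Queue.put does its own locking, so no explicit Lock is needed here.
        results.put((v, v * v))

    threads = [threading.Thread(target=worker, args=(v,)) for v in values]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    # One result per input value has been queued once every thread has joined.
    return dict(results.get() for _ in values)


if __name__ == "__main__":
    print(count_with_lock())
    print(squares_with_queue([1, 2, 3]))
```

The lock version shares one variable and serializes access to it; the queue version avoids the shared variable entirely, which is usually the easier design to reason about.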
 +
 +
[[File:XgridLogo.png| right | 100px | link=http://rocinante.smith.edu/ganglia/]]
  
 
 
=XGrid Programming=
 +
 +
{| style="width:100%" border="1"
 +
|- style="background:#ffdead;"
 +
|'''Week''' || '''Topics''' || '''Reading'''
 +
 +
<!-- ================================================================== -->
 +
|-valign="top"
 +
|width="15%"| Week 5 <br /> 
 +
|width="60%"|
 +
* '''Tuesday'''
 +
** [[CSC352 Introduction to the Projects | Introduction to the next projects]]
 +
** [http://data.scl.utah.edu/fmi/xsl/stream/details.xsl?-recid=104&a::v=2212a4Eaya A Video presentation of the XGrid] (watch first 10 minutes)
 +
** [[XGrid Tutorial Part 1: Monte Carlo | Tutorial/Lab #1: Monte Carlo on the XGrid]]
 +
* '''Thursday''':
 +
** Presentation and discussion of [http://www.cs.wisc.edu/multifacet/papers/ieeecomputer08_amdahl_multicore.pdf Amdahl's Law in the Multicore Era], Mark Hill and Michael Marty, IEEE Computer, July 2008, and accompanying [http://www.cs.wisc.edu/multifacet/amdahl/ dynamic graph]. ([[Media:ieeecomputer08_amdahl_multicore.pdf |cached copy]])
 +
** [[XGrid Tutorial Part 1: Monte Carlo | Continuation of the Xgrid Monte-Carlo tutorial]]
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
* Apple's [http://developer.apple.com/mac/library/documentation/MacOSXServer/Conceptual/Xgrid_Programming_Guide/Introduction/Introduction.html XGrid Introduction]
 +
 +
<!-- ================================================================== -->
 +
|- style="background:#eeeeff" valign="top"
 +
| Week 6 <br /> <br />
 +
||
 +
* '''Tuesday'''
 +
** Presentation of Projects #2 and #3
 +
*** [[CSC352_Resources#Videos:_Big_Data_and_Analytics | Videos]]
 +
*** Pres by DT
 +
**** [http://jonudell.net/udell/gems/umlaut/umlaut.html Jon Udell's capture of the page ''Heavy Metal Umlaut'']
 +
**** [http://meta.wikimedia.org/wiki/Research Wikipedia Research Projects]
 +
*** [[CSC352_Project_2 | Intro to Projects #2 and #3]] ([http://cs.smith.edu/~thiebaut/freevideos/BigData2.swf  presentation])
 +
* '''Thursday'''
 +
** Canceled by DT
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
 +
||
 +
* Read  Hughes, B., [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.60.7248&rep=rep1&type=pdf Building Computational Grids with Apple's XGrid Middleware],  ''ACM International Conference Proceeding Series'', Vol. 167, Hobart, Tasmania, Australia, 2006. For next Tuesday.
 +
<!-- ================================================================== -->
 +
|- style="background:#ffffff" valign="top"
 +
| Week 7 <br />  <br />
 +
||
 +
* '''Tuesday'''
 +
** Presentation and discussion of  [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.60.7248&rep=rep1&type=pdf Building Computational Grids with Apple's XGrid Middleware], by Hughes, B. (''ACM International Conference Proceeding Series'', Vol. 167, Hobart, Tasmania, Australia, 2006.)  ([[media:buildingComputationalGrids.pdf|cached copy]])
 +
** Define Homework #3
 +
** Guidelines for projects. ([http://cs.smith.edu/~thiebaut/freevideos/WhatsInAProject.swf Presentation]) (We stopped on the bell-curve).
 +
* '''Thursday'''
 +
** Guidelines for projects. ([http://cs.smith.edu/~thiebaut/freevideos/WhatsInAProject.swf Presentation])
 +
** [[XGrid Tutorial Part 2: Processing Wikipedia Pages | XGrid Lab 2]]
 +
** Scheduling
 +
*** Processor Scheduling (OS)
 +
*** Multiprocessor Scheduling
 +
*** XGrid Scheduling
 +
 +
----
 +
* [[CSC352 Homework 3 | Homework #3 ]] and its [[CSC352 Homework 3 Solution | solution programs]].
 +
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
&nbsp;
 +
<!-- ================================================================== -->
 +
|- style="background:#eeeeff" valign="top"
 +
| &nbsp; <br />  <br />
 +
||
 +
<center>[[Image:SpringBreak.gif |  150px]]</center>
 +
||
 +
&nbsp;
 +
<!-- ================================================================== -->
 +
|- style="background:#ffffff" valign="top"
 +
| Week 8 <br />  <br />
 +
||
 +
* '''Tuesday'''
 +
** Sign-up for paper presentations [http://cs.smith.edu/classwiki/index.php/CSC352_Sign-Up_Sheet_for_Paper_Presentations here]!
 +
** Class participation on the decomposition of Homework 3/Project 2 (the serial part)
 +
* '''Thursday'''
 +
** Continuation of decomposition of Homework 3/Project 2 (the parallel part)
 +
** [[XGrid Tutorial Part 3: Monte Carlo on the Science Center XGrid | XGrid Lab 3]]: running jobs on the Science Center XGrid.
 +
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
&nbsp;
 +
|}
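
The performance measures listed in Week 3 (speedup, processor efficiency) and Amdahl's Law (Weeks 4 and 5) can be made concrete with a short sketch. It uses the classic form of Amdahl's Law, speedup = 1 / ((1 - p) + p / n), not the multicore extension from the Hill and Marty paper; the function names are hypothetical and chosen for illustration only.

```python
def amdahl_speedup(parallel_fraction, processors):
    """Classic Amdahl's Law: speedup when a fraction p of the work
    runs perfectly in parallel on n processors."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


def efficiency(parallel_fraction, processors):
    """Processor efficiency: speedup divided by the number of processors."""
    return amdahl_speedup(parallel_fraction, processors) / processors


if __name__ == "__main__":
    # Assumed workload for illustration: 90% of the run time is parallelizable.
    for n in (1, 2, 4, 8, 16):
        print(n, amdahl_speedup(0.9, n), efficiency(0.9, n))
```

As the processor count grows, the speedup approaches 1 / (1 - p), so the serial part of a job bounds what any grid or cluster can deliver.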
 +
 +
<!--
 +
        _                      _
 +
  ___| |  ___  _  _  __| |
 +
/ __|  |/ _ \ | |  |  |/ _` |
 +
| (__|  | (_)  | |_|  | (_| |
 +
\___|_|\___/\__,_|\__,_|
 +
                         
 +
-->
 +
 +
[[File:HadoopCartoon.png|right|100px]]
  
 
 
=Cloud Computing=
 +
{| style="width:100%" border="1"
 +
|- style="background:#ffdead;"
 +
|'''Week''' || '''Topics''' || '''Reading'''
  
=Resources: References &amp; Bibliography=
+
<!-- ================================================================== -->
 +
|-valign="top"
 +
|width="15%"| Week 9 <br /> 
 +
|width="60%"|
 +
* '''Tuesday'''
 +
** Presentation of [http://labs.google.com/papers/mapreduce-osdi04.pdf MapReduce: Simplified Data Processing on Large Clusters] (Yang)
 +
** [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] (We stopped at Section 4).
 +
* '''Thursday'''
 +
** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] and [[Hadoop_Tutorial_1_--_Running_WordCount | Tutorial #1]], [[Hadoop_Tutorial_2_--_Running_WordCount_in_Python | Tutorial #2]]
  
+
----
<onlysmith>
+
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
==Parallel Processing/Good background information==
+
* [http://cs.smith.edu/classwiki/index.php/CSC352_Hadoop_Howto_%26_FAQ  Hadoop Howtos and FAQs]
* Asanovic K. ''et al'', [http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf The Landscape of Parallel Computing Research: A View from Berkeley], Dec. 2006. ([[media:LandscapeParallelProcessingBerkeley1206.pdf|cached copy]])
+
||
* Xen
+
[[File:HadoopOReilly.jpg | 70px | right]]
** Mauer, R., [http://www.linuxjournal.com/article/8812 Xen Virtualization and Linux Clustering], [http://www.linuxjournal.com Linux Journal] January 12th, 2006
+
* [http://hadoop.apache.org/common/docs/current/mapred_tutorial.html Map-Reduce tutorial] from apache.org: a must-read!
** Barham P., ''et al.'', [[media:XenAndTheArtOfVirtualization_3.pdf | Xen and the Art of Virtualization]], University of Cambridge Computer Laboratory 15 JJ Thomson Avenue, Cambridge, UK, CB3 0FD
+
* [http://developer.yahoo.com/hadoop/tutorial/module4.html Map-Reduce Basics] from Yahoo.com: another must-read!
* AMD News
 
**  Hardwidge, B., [http://www.bit-tech.net/custompc/news/605374/amd-plans-supercomputer-with-1000-gpus.html AMD plans supercomputer with 1,000 GPUs], Jan. 2009, [http://www.bit-tech.net bit-tech.net] (or graphics goes to the clouds!)
 
** Halfacree G., [http://www.bit-tech.net/news/hardware/2009/11/17/amd-supercomputer-tops-top500-list/1 AMD supercomputer tops TOP500 list], November 2009, [http://www.bit-tech.net bit-tech.net] (or Intel gets a black eye!)
 
  
==Python==
+
* Section 6 in Tom White's ''Hadoop, the Definitive Guide'', available on [http://books.google.com/books?id=bKPEwR-Pt6EC&printsec=frontcover&dq=hadoop+definitive+guide&source=bl&ots=kOdw-xf9Gg&sig=GyHDzyATSbMVcPysVbSAQKuhv58&hl=en&ei=YJ-0S6HSOIS0lQfm3u1q&sa=X&oi=book_result&ct=result&resnum=6&ved=0CB0Q6AEwBQ#v=onepage&q=&f=false Google Books].
<greenbox>
+
<!-- ================================================================== -->
[[Image:smilingPython.png| right| 100px]]
+
|- style="background:#eeeeff" valign="top"
* [http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf Norman Matloff and Francis Hsu's Tutorial] on Python Threads (University of California, Davis) ([[media:matlof_PythonTutorial.pdf|cached copy]])
+
| Week 10 <br /> <br />
* [http://linuxgazette.net/107/pai.html Understanding Threading in Python], Krishna G Pai, Linux Gazette, Oct. 2004
+
||
* [http://www.python.org/doc/2.3.5/lib/thread-objects.html Thread Objects] from [http://www.python.org Python.Org]
+
[[Image:CSC352HadoopPerformanceMachineLearning.png| right|150px]]
</greenbox>
+
* '''Tuesday'''
 +
** Presentation of [http://www.icsi.berkeley.edu/~arlo/publications/gillick_cs262a_proj.pdf MapReduce: Distributed Computing for Machine Learning]
 +
** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]]
 +
*** [[Hadoop_Tutorial_1_--_Running_WordCount | Tutorial #1]],
 +
*** [[Hadoop_Tutorial_1_--_Running_WordCount#Analyzing_the_Hadoop_Logs | Tutorial #1: output logs]]
 +
***  [[Hadoop_Tutorial_1.1_--_Generating_Task_Timelines | Tutorial #1.1: Task Timelines]]
 +
[[Image:WrongTaskTimeline.png| 200px|right]]
 +
* '''Thursday'''
 +
** Question of the day: What's wrong with this picture?
 +
** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]]
 +
** Lab for today:
 +
*** Compare WordCount on 1 vs WordCount on 6 [[Hadoop_Tutorial_1_--_Running_WordCount#Moment_of_Truth:_Compare_5-PC_Hadoop_cluster_to_1_Linux_PC | Section 5 of Tutorial 1]]
 +
*** Create your own version of the Java WordCount program [[Hadoop_Tutorial_1_--_Running_WordCount#Running_Your_Own_Version_of_WordCount.java | Section 4 of Tutorial 1]]
 +
*** Create your own Counters [[Hadoop_Tutorial_1_--_Running_WordCount#Counters | Section 6 of Tutorial 1]]: count Buck!
 +
*** Generate Timelines [[Hadoop_Tutorial_1.1_--_Generating_Task_Timelines | Tutorial 1.1]]
 +
*** Counting words in Python [[Hadoop_Tutorial_2_--_Running_WordCount_in_Python | Tutorial 2]]
 +
----
 +
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
* [[CSC352 Homework 4 | Homework #4]] and [[CSC352 Homework 4 Solution | a solution]].
 +
||
 +
&nbsp;
  
==XGrid==
+
<!-- ================================================================== -->
<bluebox>
+
|- style="background:#ffffff" valign="top"
__NOTOC__
+
| Week 11 <br />  <br />
[[Image:xgridLogo.png | right|100px]]
+
||
* [http://tango.csc.smith.edu/classwiki/index.php/Xgrid_Programming Programming Examples, Setup, and References]
+
* '''Tuesday'''
 +
** Presentation of [http://portal.acm.org/ft_gateway.cfm?id=1629198&type=pdf&coll=GUIDE&dl=GUIDE&CFID=82739837&CFTOKEN=94683258 MapReduce: A Flexible Data Processing Tool] ([[media:MapReduceFlexibleDataProcessingTool.pdf | cached copy]])
 +
** Compare with Pavlo, Paulson, Rasin, Abadi, DeWitt, Madden, and Stonebraker's paper [[Media:ComparisonOfApproachesToLargeScaleDataAnalysis.pdf |A Comparison of Approaches to Large-Scale Data Analysis]], SIGMOD-09, June 2009.
 +
* '''Thursday'''
 +
** [http://en.wikipedia.org/wiki/Hypertable Hypertable] is an open-source project parallel to Google's BigTable...
 +
** Art vs Science...
 +
** Some preliminary thinking about the final project...
 +
** [[Hadoop_Tutorial_2.1_--_Streaming_XML_Files | Streaming whole files]]
 +
** [[Hadoop Tutorial 2.2 -- Running C++ Programs on Hadoop | WordCount in C++]]
 +
**  Visualizations of Hadoop Data Transfers, from the U. of Nebraska ([http://www.google.com/search?q=university+of+Nebraska+hadoop+visualization&hl=en&safe=off&tbs=vid:1&tbo=u&ei=oKO4S6GMCoH7lwfq88SXCg&sa=X&oi=video_result_group&ct=title&resnum=1&ved=0CBEQqwQwAA more videos])
 +
<br /><br /><center><videoflash>qoBoEzOkeDQ</videoflash></center><br /><br />
 +
** Monitoring a Cluster of Computers as a school of fish (U. Nebraska)
 +
<br /><br /><center><videoflash>LM1j_8sWSEk</videoflash></center><br /><br />
 +
** The evolution of Hadoop (Code-Swarm)
 +
<br /><br /><center><videoflash type="vimeo">2513321</videoflash></center><br /><br />
  
* What's an XGrid system?
+
----
** [http://data.scl.utah.edu/fmi/xsl/stream/details.xsl?-recid=104&a::v=2212a4Eaya A Video] presentation of the XGrid (click on movie reel icon to start).
+
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
** A very good overview of the XGrid from [http://www.macdevcenter.com/pub/a/mac/2005/08/23/xgrid.html?page=1 macdevcenter.com]
+
* A [[CSC352 Project 2 Solution | solution]] for Project 2 has been posted!
</bluebox>
+
||
 +
* [http://developer.yahoo.com/hadoop/tutorial/index.html Hadoop Tutorial from Yahoo Developer Network (YDN)]
 +
&nbsp;
  
==Cloud Computing==
+
<!-- ================================================================== -->
<blockquote>"''Failure is the defining difference between distributed and local programming''" <br>
+
|- style="background:#eeeeff" valign="top"
Ken Arnold, CORBA Designer
+
| Week 12 <br /> <br />
</blockquote>
+
||
<tanbox>
+
* '''Tuesday'''
__NOTOC__
+
** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part I
===Literature===
+
** [[Hadoop_Tutorial_3_--_Hadoop_on_Amazon_AWS | Signing on to Amazon AWS]]
* [[Image:hadoopOReilly.jpg | right |100px]] [http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/0596521979  Hadoop, the definitive guide], Tom White, O'Reilly Media, June 2009, ISBN 0596521979. The Web site for the book is http://www.hadoopbook.com/ (with the data used as examples in the book)
+
** [[Hadoop_Tutorial_3.1_--_Using_Amazon%27s_WordCount_program | Uploading data to AWS and counting words]]
* Dean, J., and S. Ghemawat, [http://labs.google.com/papers/mapreduce-osdi04.pdf MapReduce: Simplified Data Processing on Large Clusters], Dec. 2004,  ([[media:MapReduce1204.pdf|cached copy]])
+
** [[Hadoop_Tutorial_3.2_--_Using_Your_Own_WordCount_program | Word-counting using Streaming Python on AWS]]
* Czajkowski G., [http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html Sorting 1 PB with MapReduce], Nov. 2008, ([[media:Sorting1PBWithMapReduce.pdf|cached copy]])
+
** [[Hadoop_Tutorial_3.3_--_How_Much%3F | Costs of maintaining a Hadoop cluster on AWS]]
* Armbrust M., ''et al.'', [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Tech. Rep. UCB/EECS-2009-28, Feb. 2009 ([[media:AboveTheCloudsBerkeley.pdf|cached copy]])
+
* '''Thursday'''
* Olston C. ''et al.'', [[Media:pigLatinNotSoForeignLanguage.pdf |Pig  Latin: A Not-So-Foreign Language for Data Processing]], SIGMOD’08, June 9–12, 2008, Vancouver, BC, Canada.
+
** Continuation of the AWS labs
* Ghemawat S., H. Gobioff, and S.T. Leung, [http://labs.google.com/papers/gfs-sosp2003.pdf The Google File System], SOSP’03, October 19–22, 2003, Bolton Landing, New York, USA.
+
** [[Hadoop_Tutorial_4:_Start_an_EC2_Instance | Starting an EC2 instance on AWS]]
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
&nbsp;
 +
<!-- ================================================================== -->
 +
|- style="background:#ffffff" valign="top"
 +
| Week 13 <br /> <br />
 +
||
 +
* '''Tuesday'''
 +
** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part II
 +
** [[CSC352 Problem of the Day| Problem of the day]]: discussion
 +
** Work on projects
  
===Class Material on the Web===
+
* '''Thursday'''
* [http://code.google.com/edu/submissions/uwspr2007_clustercourse/listing.html University of Washington: Problem Solving on Large Scale Clusters]:
+
** Presentation of [http://www.hulu.com/watch/116372/cnbc-originals-inside-the-mind-of-google Inside the Mind of Google]
: The University of Washington ran an upper-division course on Distributed Computing with MapReduce in Spring 2007. Below you'll find the materials that were used for the class: five lectures in PowerPoint format, as well as four lab exercises that students completed over the duration of the course, using a cluster running Hadoop.
+
** wrap up
* [http://www.cs.brandeis.edu/~cs147a/ Distributed Systems Course] at Brandeis U.
+
----
** [http://www.cs.brandeis.edu/~cs147a/lab/hadoop-intro/ Introduction to Hadoop Lab]
+
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
** [http://www.cs.brandeis.edu/~cs147a/lab/hadoop-singlenode/ Single Node setup Lab]
+
||
** [http://www.cs.brandeis.edu/~cs147a/lab/hadoop-example/ Hadoop Example Program Lab]
+
&nbsp;
** [http://www.cs.brandeis.edu/~cs147a/lab/hadoop-cluster/ Hadoop Cluster Setup Lab]
 
* [http://code.google.com/edu/parallel/mapreduce-tutorial.html Google's Introduction to Parallel Programming and MapReduce]
 
* [http://code.google.com/edu/submissions/ucberkeley-parallelism/index.html Intro to Parallel Programming and Threading] from U. C. Berkeley
 
  
===Software/Web Links===
+
|}
[[Image:HadoopCartoon.png | 100px | right]]
 
*[http://hadoop.apache.org/common/ Apache's Documentation on Hadoop Common]
 
**[http://hadoop.apache.org/common/docs/current/mapred_tutorial.html The Hadoop Tutorial] from Apache.  A "Must-Do!"
 
**[http://hadoop.apache.org/common/docs/current/streaming.html#Hadoop+Streaming Hadoop Streaming], i.e. using Hadoop with  Python, for example.
 
* [http://developer.yahoo.com/hadoop/tutorial/ A Yahoo Tutorial] on Hadoop.  Another "Must-Do!"
 
*[http://www.hadoopbook.com/ The Hadoop-Book] Web site.
 
*[http://wiki.apache.org/hadoop/FrontPage The Hadoop Wiki], the authoritative source on working with Hadoop. <font color="purple">Many examples in Java and Python</font>
 
** [http://wiki.apache.org/hadoop/WordCount WordCount]
 
** [http://wiki.apache.org/hadoop/PythonWordCount Python WordCount]
 
** [http://wiki.apache.org/hadoop/C%2B%2BWordCount C++ WordCount]
 
** [http://wiki.apache.org/hadoop/HadoopDfsReadWriteExample How to read and write to HDFS]
 
*[http://code.google.com/edu/parallel/tools/hadoopvm/index.html  Hadoop at Google]: A preconfigured single node instance available at Google.
 
*[http://code.google.com/edu/parallel/tools/hadoopvm/index.html Guide for setting up IBM's Eclipse Tools for Hadoop] (go to bottom of page)
 
:The IBM MapReduce Tools for Eclipse Plug-in is a robust plug-in that brings Hadoop support to the Eclipse platform. Features include server configuration, support for launching MapReduce jobs and browsing the distributed file system. This setup assumes that you are running Eclipse (version 3.3 or above) on your computer.
 
* [http://www.infosci.cornell.edu/hadoop/mac.html Guide] from Cornell for setting up Hadoop on a Mac.
 
*[http://www.cloudera.com/blog/2009/04/20/configuring-eclipse-for-hadoop-development-a-screencast/ Configuring Eclipse for Hadoop] A video from Cloudera on setting up Hadoop... not easy to follow...
 
  
===Videos===
+
=Selected Solutions for papers, homework, or projects=
* [http://jez.blip.tv/file/245701/ A video of Tom White], author of O'Reilly's Hadoop guide, on BlipTV. White outlines the suite of projects centered around Hadoop (an open-source MapReduce project).
 
* [http://www.cloudera.com/hadoop-training-basic Cloudera]'s collection of videos. 
 
** [http://www.cloudera.com/hadoop-training-basic Thinking At Scale]  <-- Start here!
 
** [http://www.cloudera.com/hadoop-training-basic MapReduce and HDFS]
 
** [http://www.cloudera.com/hadoop-training-basic Hadoop Ecosystem Tour]
 
** [http://www.cloudera.com/hadoop-training-basic Programming with Hadoop]
 
** [http://www.cloudera.com/hadoop-training-basic Introduction to Hive]
 
** [http://www.cloudera.com/hadoop-training-basic Introduction to Pig]
 
** [http://www.cloudera.com/hadoop-training-basic MapReduce Algorithms]
 
** [http://www.cloudera.com/hadoop-training-basic Training Exercises and Tutorials]
 
** [http://www.cloudera.com/hadoop-training-basic Getting Started with Hadoop]
 
** [http://www.cloudera.com/hadoop-training-basic Writing MapReduce Programs]
 
** [http://www.cloudera.com/hadoop-training-basic Hive Tutorial]
 
** [http://www.cloudera.com/hadoop-training-basic Pig Tutorial]
 
  
</tanbox>
+
* [[Media:Amdahls_Law_in_the_Multicore_Era.pdf | Amdahl's Law in the Multicore Era]] summary
 +
* [[CSC352 Project1 Solution | Selected Solutions for Project 1]]
  
[[CSC352_Notes | <font color="white">Notes</font>]]
+
<br />
</onlysmith>
+
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
[[Category:CSC352]][[Category:Class]][[Category:Schedule]]

Latest revision as of 10:22, 9 August 2013



Main Page | Syllabus | Schedule | Links & Resources



Hadoop-Related

Projects

  • Project 4 officially started on 1/26/10 and is now over. Find its shared wiki page here! Congratulations to the class for setting up the individual nodes of the first Smith Cloud Cluster!
SmilingPython.png

Python Threads

Week Topics Reading
Week 1
1/25
  • Tuesday
    • Introduce Syllabus
    • Interrupts
Pogoplug.jpg
  • Thursday
    • The PogoPlug...
    • The iPad...
    • Python Review
      • NQueens.py: a Python program to find the first solution for N queens on an NxN board.

Read
  • What is a Thread? [1]
  • What is a Process? [2]
Reference material for when we start programming
For Discussion next Tuesday
  • read the paper by Asanovic K. et al, The Landscape of Parallel Computing Research: A View from Berkeley.
Week 2
2/1

 

Week 3
2/8

Week 4
2/15


XgridLogo.png

XGrid Programming

Week Topics Reading
Week 5

Week 6


Week 7


 

 

SpringBreak.gif

 

Week 8

  • Tuesday
    • Sign-up for paper presentations here!
    • Class participation on the decomposition of Homework 3/Project 2 (the serial part)
  • Thursday
    • Continuation of decomposition of Homework 3/Project 2 (the parallel part)
    • XGrid Lab 3: running jobs on the Science Center XGrid.

 


HadoopCartoon.png

Cloud Computing

Week Topics Reading
Week 9

HadoopOReilly.jpg
  • Section 6 in Tom White's Hadoop, the Definitive Guide, available on Google Books.
Week 10

CSC352HadoopPerformanceMachineLearning.png
WrongTaskTimeline.png

 

Week 11





    • Monitoring a Cluster of Computers as a school of fish (U. Nebraska)




    • The evolution of Hadoop (Code-Swarm)





 

Week 12


 

Week 13


 

Selected Solutions for papers, homework, or projects