Difference between revisions of "CSC352 Class Page 2010"

(Cloud Computing)
m (Thiebaut moved page CSC352 Class Page to CSC352 Class Page 2010)
 
(241 intermediate revisions by the same user not shown)
Line 1: Line 1:
 +
__TOC__
 +
 +
<br />
 +
<br />
 +
<center>[[CSC352 | Main Page]] | [[CSC352_Syllabus | Syllabus]] | [[CSC352_Class_Page | Schedule]] |
 +
[[CSC352 Resources | Links &amp; Resources]]</center>
 +
<br />
 +
<br />
 +
 +
=Hadoop-Related=
 +
 +
* [http://cs.smith.edu/classwiki/index.php/CSC352_Hadoop_Howto_%26_FAQ Hadoop FAQ Page]
 +
* [http://maven.smith.edu/~thiebaut/showhadoopip.php Hadoop IPs]
 +
 +
=Projects=
 +
 +
* [[CSC352 Project 1 | Project 1]]: Started Feb 16, due March 2nd. A [[CSC352 Project1 Solution|good example of a solution]].
 +
* [[CSC352 Project 2 | Project 2]]: Deals with the Xgrid.  A [[CSC352 Project 2 Solution | collection of good proposed solutions]].
 +
* [[CSC352 Project 3 | Project 3]]: Deals with processing Wikipedia pages on Hadoop/MapReduce.
 +
 +
* '''Project 4 officially started on 1/26/10 and is now over.'''  Its shared wiki page is [http://cs.smith.edu/classwiki/index.php/CSC352_Page#Projects here].  Congratulations to the class for setting up the individual nodes of the first Smith Cloud Cluster!
 +
 +
[[File:SmilingPython.png | 100px | right]]
 +
 
=Python Threads=
 
 +
{| style="width:100%" border="1"
 +
|- style="background:#ffdead;"
 +
|'''Week''' || '''Topics''' || '''Reading'''
 +
 +
<!-- ================================================================== -->
 +
|-valign="top"
 +
|width="15%"| Week 1 <br /> 1/25
 +
|width="60%"|
 +
* '''Tuesday'''
 +
** Introduce Syllabus
 +
** Interrupts
 +
[[Image:pogoplug.jpg | right]]
 +
* '''Thursday'''
 +
** The [[PogoPlug]]...
 +
** The [http://www.apple.com/ipad/ iPad]...
 +
** Python Review
 +
*** [[NQueens.py | NQueens.py]]: a Python program to find the first solution for ''N'' queens on an ''N''x''N'' board.
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
;Read:
 +
* What is a Thread? [http://en.wikipedia.org/wiki/Thread_(computer_science)]
 +
* What is a Process? [http://en.wikipedia.org/wiki/Process_(computing)]
 +
;Reference material for when we start programming:
 +
* [http://www.python.org/doc/2.5.2/lib/thread-objects.html Python.org's page] on Thread Objects.  It's the reference, but is not always crystal clear...
 +
* [http://linuxgazette.net/107/pai.html Linux Gazette Article] on Python Threads.  Good overall introduction.
 +
* [http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf Norm Matloff's] introduction to Python Threads.  More details, less easy to follow.
 +
;For Discussion next Tuesday
 +
* Read the paper by Asanovic K. ''et al.'', The Landscape of Parallel Computing Research: A View from Berkeley.
 +
 +
<!-- ================================================================== -->
 +
|- style="background:#eeeeff" valign="top"
 +
| Week 2 <br /> 2/1<br />
 +
||
 +
* '''Tuesday'''
 +
** Discussion (prepare a 1-page summary) of the paper by Asanovic K. ''et al.'', [http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf The Landscape of Parallel Computing Research: A View from Berkeley], Dec. 2006. ([[media:LandscapeParallelProcessingBerkeley1206.pdf|cached copy]])
 +
 +
* '''Thursday'''
 +
** Python
 +
*** [[classExample1.py | classExample1.py ]]
 +
*** [[classExample2.py | classExample2.py ]]
 +
*** [[classExample3.py | classExample3.py ]]
 +
*** [[serialPing.py | serialPing.py ]]
 +
** Python Threads
 +
*** [[threadedPing.py | threadedPing.py ]]: a threaded solution to the serial ping program
 +
*** [[ThreadedNQueens.py | ThreadedNQueens.py ]]: a threaded solution to the N-Queens program
 +
 +
----
 +
* [[CSC352 Homework 1 | Homework #1]]
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
&nbsp;
 +
* Make sure to read Norman Matloff's tutorial on threads in the [[CSC352_Resources#Documentation_on_Python_Threads|
 +
section on Python]] on the [[CSC352_Resources | Resource]] Page.
 +
* Also, don't hesitate to check [http://python.org Python.org] for information on threads.
 +
<!-- ================================================================== -->
 +
|- style="background:#ffffff" valign="top"
 +
| Week 3 <br /> 2/8 <br />
 +
||
 +
* '''Tuesday'''
 +
** An experiment with Control-C...
 +
** Sharing in shared memory
 +
*** the problem
 +
*** atomicity of operations
 +
*** semaphores
 +
**** fair
 +
**** safe
 +
**** live
 +
*** Python locks
 +
*** examples:
 +
**** [[CS352_threadedpingWithLocks.py | threadedPingWithLocks.py]]
 +
**** [[CS352_threadedpingWithSemaphores.py | threadedPingWithSemaphores.py]]
 +
* '''Thursday'''
 +
** Discussion: "Why Von Neumann?"
 +
** Continuation of Shared Memory and Sharing
 +
*** Python Queues
 +
**** [[CS352_threadedpingWithQueues.py | threadedPingWithQueues.py]]
 +
** Some [[CSC352 Notes on the Python GIL| notes]] on Python and threads
 +
** An alternative to Threading: [http://docs.python.org/library/multiprocessing.html Multiprocessing]
 +
*** Available with Python 2.6 and up only
 +
*** [[CSC352_multiprocessingNQueens.py | multiprocessingNQueens.py]]
 +
*** [[CSC352 Comparison threading to multiprocessing| Comparison]] of the two methods
 +
** Performance measures
 +
*** Speedup
 +
*** Execution time
 +
*** Processor utilization
 +
*** Processor efficiency
 +
*** Processor efficacy
 +
** Guidelines for presenting research papers
 +
** Deadlocks
 +
 +
----
 +
* [[CSC352 Homework 2 | Homework #2]], [[CSC352 Homework 2 Solution 1 | Solution 1]] and [[CSC352 Homework 2 Solution 2 | Solution 2 ]]
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
* [http://maven.smith.edu/~thiebaut/transputer/chapter8/chap8-1.html Measuring Performance]
 +
* [http://insidehpc.com/2010/01/05/sun-video-tutorial-optimizing-performance-in-parallel-processing/ Video] from Sun Microsystems on parallelizing applications and  improving their performance (19 minutes)
 +
* [http://en.wikipedia.org/wiki/Deadlock Deadlocks]
 +
* [http://www.python.org/dev/peps/pep-0371/ PEP 371] ([[Media:Python_PEP0371.pdf|cached copy]])
 +
<!-- ================================================================== -->
 +
|- style="background:#eeeeff" valign="top"
 +
| Week 4 <br /> 2/15 <br />
 +
||
 +
* '''Tuesday'''
 +
** Discussion of [[media:XenAndTheArtOfVirtualization_3.pdf | Xen and the Art of Virtualization]] (Presentation by '''Le''')
 +
*** One person presents
 +
*** Everybody else submits a 1-page summary in 3 parts.
 +
** Review of HW #1
 +
** A visit to the XGrid system ([http://www.facebook.com/pages/Northampton-MA/Computer-Science-Smith-College/264041891883?ref=ts Photos])
 +
* '''Thursday'''
 +
** Python Q&A
 +
** Amdahl's Law
 +
** Deadlocks: the main rule
 +
** [[How to Read Technical Papers]]
 +
** Summarize [[Media:TechnologyBitsNYT02162010.pdf| this!]]
 +
** Good background information: [http://en.wikipedia.org/wiki/Message_Passing_Interface#Example_program MPI]
 +
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
* Additional information on Xen can be found in  Mauer, R., [http://www.linuxjournal.com/article/8812 Xen Virtualization and Linux Clustering], [http://www.linuxjournal.com Linux Journal] January 12th, 2006.
 +
* A [http://blip.tv/file/2232410 video] on the Python GIL discovered by Diana.
 +
* A [http://oblong.com/ video] by Oblong Industries (in reference to the NYT article to summarize)
 +
 +
 +
|}
 +
 +
[[File:XgridLogo.png| right | 100px | link=http://rocinante.smith.edu/ganglia/]]
  
 
=XGrid Programming=
 
 +
 +
{| style="width:100%" border="1"
 +
|- style="background:#ffdead;"
 +
|'''Week''' || '''Topics''' || '''Reading'''
 +
 +
<!-- ================================================================== -->
 +
|-valign="top"
 +
|width="15%"| Week 5 <br /> 
 +
|width="60%"|
 +
* '''Tuesday'''
 +
** [[CSC352 Introduction to the Projects | Introduction to the next projects]]
 +
** [http://data.scl.utah.edu/fmi/xsl/stream/details.xsl?-recid=104&a::v=2212a4Eaya A Video presentation of the XGrid] (watch first 10 minutes)
 +
** [[XGrid Tutorial Part 1: Monte Carlo | Tutorial/Lab #1: Monte Carlo on the XGrid]]
 +
* '''Thursday''':
 +
** Presentation and discussion of [http://www.cs.wisc.edu/multifacet/papers/ieeecomputer08_amdahl_multicore.pdf Amdahl's Law in the Multicore Era], Mark Hill and Michael Marty, IEEE Computer, July 2008, and accompanying [http://www.cs.wisc.edu/multifacet/amdahl/ dynamic graph]. ([[Media:ieeecomputer08_amdahl_multicore.pdf |cached copy]])
 +
** [[XGrid Tutorial Part 1: Monte Carlo | Continuation of the Xgrid Monte-Carlo tutorial]]
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
* Apple's [http://developer.apple.com/mac/library/documentation/MacOSXServer/Conceptual/Xgrid_Programming_Guide/Introduction/Introduction.html XGrid Introduction]
 +
 +
<!-- ================================================================== -->
 +
|- style="background:#eeeeff" valign="top"
 +
| Week 6 <br /> <br />
 +
||
 +
* '''Tuesday'''
 +
** Presentation of Projects #2 and #3
 +
*** [[CSC352_Resources#Videos:_Big_Data_and_Analytics | Videos]]
 +
*** Presentation by DT
 +
**** [http://jonudell.net/udell/gems/umlaut/umlaut.html Jon Udell's capture of the page ''Heavy Metal Umlaut'']
 +
**** [http://meta.wikimedia.org/wiki/Research Wikipedia Research Projects]
 +
*** [[CSC352_Project_2 | Intro to Projects #2 and #3]] ([http://cs.smith.edu/~thiebaut/freevideos/BigData2.swf  presentation])
 +
* '''Thursday'''
 +
** Canceled by DT
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
 +
||
 +
* Read  Hughes, B., [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.60.7248&rep=rep1&type=pdf Building Computational Grids with Apple's XGrid Middleware],  ''ACM International Conference Proceeding Series'', Vol. 167, Hobart, Tasmania, Australia, 2006. For next Tuesday.
 +
<!-- ================================================================== -->
 +
|- style="background:#ffffff" valign="top"
 +
| Week 7 <br />  <br />
 +
||
 +
* '''Tuesday'''
 +
** Presentation and discussion of  [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.60.7248&rep=rep1&type=pdf Building Computational Grids with Apple's XGrid Middleware], by Hughes, B. (''ACM International Conference Proceeding Series'', Vol. 167, Hobart, Tasmania, Australia, 2006.)  ([[media:buildingComputationalGrids.pdf|cached copy]])
 +
** Define Homework #3
 +
** Guidelines for projects. ([http://cs.smith.edu/~thiebaut/freevideos/WhatsInAProject.swf Presentation]) (We stopped on the bell-curve).
 +
* '''Thursday'''
 +
** Guidelines for projects. ([http://cs.smith.edu/~thiebaut/freevideos/WhatsInAProject.swf Presentation])
 +
** [[XGrid Tutorial Part 2: Processing Wikipedia Pages | XGrid Lab 2]]
 +
** Scheduling
 +
*** Processor Scheduling (OS)
 +
*** Multiprocessor Scheduling
 +
*** XGrid Scheduling
 +
 +
----
 +
* [[CSC352 Homework 3 | Homework #3 ]] and its [[CSC352 Homework 3 Solution | solution programs]].
 +
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
&nbsp;
 +
<!-- ================================================================== -->
 +
|- style="background:#eeeeff" valign="top"
 +
| &nbsp; <br />  <br />
 +
||
 +
<center>[[Image:SpringBreak.gif |  150px]]</center>
 +
||
 +
&nbsp;
 +
<!-- ================================================================== -->
 +
|- style="background:#ffffff" valign="top"
 +
| Week 8 <br />  <br />
 +
||
 +
* '''Tuesday'''
 +
** Sign-up for paper presentations [http://cs.smith.edu/classwiki/index.php/CSC352_Sign-Up_Sheet_for_Paper_Presentations here]!
 +
** Class participation on the decomposition of Homework 3/Project 2 (the serial part)
 +
* '''Thursday'''
 +
** Continuation of decomposition of Homework 3/Project 2 (the parallel part)
 +
** [[XGrid Tutorial Part 3: Monte Carlo on the Science Center XGrid | XGrid Lab 3]]: running jobs on the Science Center XGrid.
 +
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
&nbsp;
 +
|}
 +
 +
<!--
 +
        _                      _
 +
  ___| |  ___  _  _  __| |
 +
/ __|  |/ _ \ | |  |  |/ _` |
 +
| (__|  | (_)  | |_|  | (_| |
 +
\___|_|\___/\__,_|\__,_|
 +
                         
 +
-->
 +
 +
[[File:HadoopCartoon.png|right|100px]]
  
 
=Cloud Computing=
 
 +
{| style="width:100%" border="1"
 +
|- style="background:#ffdead;"
 +
|'''Week''' || '''Topics''' || '''Reading'''
 +
 +
<!-- ================================================================== -->
 +
|-valign="top"
 +
|width="15%"| Week 9 <br /> 
 +
|width="60%"|
 +
* '''Tuesday'''
 +
** Presentation of [http://labs.google.com/papers/mapreduce-osdi04.pdf MapReduce: Simplified Data Processing on Large Clusters] (Yang)
 +
** [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] (We stopped at Section 4).
 +
* '''Thursday'''
 +
** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] and [[Hadoop_Tutorial_1_--_Running_WordCount | Tutorial #1]], [[Hadoop_Tutorial_2_--_Running_WordCount_in_Python | Tutorial #2]]
 +
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
* [http://cs.smith.edu/classwiki/index.php/CSC352_Hadoop_Howto_%26_FAQ  Hadoop Howtos and FAQs]
 +
||
 +
[[File:HadoopOReilly.jpg | 70px | right]]
 +
* [http://hadoop.apache.org/common/docs/current/mapred_tutorial.html Map-Reduce tutorial] from apache.org: a must-read!
 +
* [http://developer.yahoo.com/hadoop/tutorial/module4.html Map-Reduce Basics] from Yahoo.com: another must-read!
  
=References &amp; Bibliography=
+
* Section 6 in Tom White's ''Hadoop, the Definitive Guide'', available on [http://books.google.com/books?id=bKPEwR-Pt6EC&printsec=frontcover&dq=hadoop+definitive+guide&source=bl&ots=kOdw-xf9Gg&sig=GyHDzyATSbMVcPysVbSAQKuhv58&hl=en&ei=YJ-0S6HSOIS0lQfm3u1q&sa=X&oi=book_result&ct=result&resnum=6&ved=0CB0Q6AEwBQ#v=onepage&q=&f=false Google Books].
 +
<!-- ================================================================== -->
 +
|- style="background:#eeeeff" valign="top"
 +
| Week 10 <br /> <br />
 +
||
 +
[[Image:CSC352HadoopPerformanceMachineLearning.png| right|150px]]
 +
* '''Tuesday'''
 +
** Presentation of [http://www.icsi.berkeley.edu/~arlo/publications/gillick_cs262a_proj.pdf MapReduce: Distributed Computing for Machine Learning]
 +
** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]]
 +
*** [[Hadoop_Tutorial_1_--_Running_WordCount | Tutorial #1]],
 +
*** [[Hadoop_Tutorial_1_--_Running_WordCount#Analyzing_the_Hadoop_Logs | Tutorial #1: output logs]]
 +
***  [[Hadoop_Tutorial_1.1_--_Generating_Task_Timelines | Tutorial #1.1: Task Timelines]]
 +
[[Image:WrongTaskTimeline.png| 200px|right]]
 +
* '''Thursday'''
 +
** Question of the day: What's wrong with this picture?
 +
** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]]
 +
** Lab for today:
 +
*** Compare WordCount on 1 vs WordCount on 6 [[Hadoop_Tutorial_1_--_Running_WordCount#Moment_of_Truth:_Compare_5-PC_Hadoop_cluster_to_1_Linux_PC | Section 5 of Tutorial 1]]
 +
*** Create your own version of the Java WordCount program [[Hadoop_Tutorial_1_--_Running_WordCount#Running_Your_Own_Version_of_WordCount.java | Section 4 of Tutorial 1]]
 +
*** Create your own Counters [[Hadoop_Tutorial_1_--_Running_WordCount#Counters | Section 6 of Tutorial 1]]: count Buck!
 +
*** Generate Timelines [[Hadoop_Tutorial_1.1_--_Generating_Task_Timelines | Tutorial 1.1]]
 +
*** Counting words in Python [[Hadoop_Tutorial_2_--_Running_WordCount_in_Python | Tutorial 2]]
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
* [[CSC352 Homework 4 | Homework #4]] and [[CSC352 Homework 4 Solution | a solution]].
 +
||
 +
&nbsp;
  
==Parallel Processing/Good background information==
+
<!-- ================================================================== -->
* Asanovic K. ''et al'', [http://www.eecs.berkeley.edu/Pubs/TechRpts/2006/EECS-2006-183.pdf The Landscape of Parallel Computing Research: A View from Berkeley], Dec. 2006. ([[media:LandscapeParallelProcessingBerkeley1206.pdf|cached copy]])
+
|- style="background:#ffffff" valign="top"
* Mauer, R., [http://www.linuxjournal.com/article/8812 Xen Virtualization and Linux Clustering], [http://www.linuxjournal.com Linux Journal] January 12th, 2006
+
| Week 11 <br />  <br />
* Barham P., ''et al.'', [[media:XenAndTheArtOfVirtualization_3.pdf | Xen and the Art of Virtualization]], University of Cambridge Computer Laboratory 15 JJ Thomson Avenue, Cambridge, UK, CB3 0FD
+
||
* AMD News
+
* '''Tuesday'''
** Hardwidge, B., [http://www.bit-tech.net/custompc/news/605374/amd-plans-supercomputer-with-1000-gpus.html AMD plans supercomputer with 1,000 GPUs], Jan. 2009, [http://www.bit-tech.net bit-tech.net] (or graphics goes to the clouds!)
+
** Presentation of [http://portal.acm.org/ft_gateway.cfm?id=1629198&type=pdf&coll=GUIDE&dl=GUIDE&CFID=82739837&CFTOKEN=94683258 MapReduce: A Flexible Data Processing Tool] ([[media:MapReduceFlexibleDataProcessingTool.pdf | cached copy]])
** Halfacree G., [http://www.bit-tech.net/news/hardware/2009/11/17/amd-supercomputer-tops-top500-list/1 AMD supercomputer tops TOP500 list], November 2009, [http://www.bit-tech.net bit-tech.net] (or Intel gets a black eye!)
+
** Compare with Pavlo, Paulson, Rasin, Abadi, DeWitt, Madden, and Stonebraker's paper [[Media:ComparisonOfApproachesToLargeScaleDataAnalysis.pdf |A Comparison of Approaches to Large-Scale Data Analysis]], SIGMOD-09, June 2009.
 +
* '''Thursday'''
 +
** [http://en.wikipedia.org/wiki/Hypertable Hypertable] is an open-source project parallel to Google's BigTable...
 +
** Art vs Science...
 +
** Some preliminary thinking about the final project...
 +
** [[Hadoop_Tutorial_2.1_--_Streaming_XML_Files | Streaming whole files]]
 +
** [[Hadoop Tutorial 2.2 -- Running C++ Programs on Hadoop | WordCount in C++]]
 +
**  Visualizations of Hadoop Data Transfers, from the U. of Nebraska ([http://www.google.com/search?q=university+of+Nebraska+hadoop+visualization&hl=en&safe=off&tbs=vid:1&tbo=u&ei=oKO4S6GMCoH7lwfq88SXCg&sa=X&oi=video_result_group&ct=title&resnum=1&ved=0CBEQqwQwAA more videos])
 +
<br /><br /><center><videoflash>qoBoEzOkeDQ</videoflash></center><br /><br />
 +
** Monitoring a Cluster of Computers as a school of fish (U. Nebraska)
 +
<br /><br /><center><videoflash>LM1j_8sWSEk</videoflash></center><br /><br />
 +
** The evolution of Hadoop (Code-Swarm)
 +
<br /><br /><center><videoflash type="vimeo">2513321</videoflash></center><br /><br />
  
==Python==
+
----
<bluebox>
+
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
* [http://heather.cs.ucdavis.edu/~matloff/Python/PyThreads.pdf Norman Matloff and Francis Hsu's Tutorial] on Python Threads (University of California, Davis) ([[media:matlof_PythonTutorial.pdf|cached copy]])
+
* A [[CSC352 Project 2 Solution | solution]] for Project 2 has been posted!
* [http://linuxgazette.net/107/pai.html Understanding Threading in Python], Krishna G Pai, Linux Gazette, Oct. 2004
+
||
* [http://www.python.org/doc/2.3.5/lib/thread-objects.html Thread Objects] from [http://www.python.org Python.Org]
+
* [http://developer.yahoo.com/hadoop/tutorial/index.html Hadoop Tutorial from Yahoo Developer Network (YDN)]
</bluebox>
+
&nbsp;
  
==XGrid==
+
<!-- ================================================================== -->
 +
|- style="background:#eeeeff" valign="top"
 +
| Week 12 <br />  <br />
 +
||
 +
* '''Tuesday'''
 +
** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part I
 +
** [[Hadoop_Tutorial_3_--_Hadoop_on_Amazon_AWS | Signing on to Amazon AWS]]
 +
** [[Hadoop_Tutorial_3.1_--_Using_Amazon%27s_WordCount_program | Uploading data to AWS and counting words]]
 +
** [[Hadoop_Tutorial_3.2_--_Using_Your_Own_WordCount_program | Word-counting using Streaming Python on AWS]]
 +
** [[Hadoop_Tutorial_3.3_--_How_Much%3F | Costs of maintaining a Hadoop cluster on AWS]]
 +
* '''Thursday'''
 +
** Continuation of the AWS labs
 +
** [[Hadoop_Tutorial_4:_Start_an_EC2_Instance | Starting an EC2 instance on AWS]]
 +
----
 +
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 +
||
 +
&nbsp;
 +
<!-- ================================================================== -->
 +
|- style="background:#ffffff" valign="top"
 +
| Week 13 <br />  <br />
 +
||
 +
* '''Tuesday'''
 +
** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part II
 +
** [[CSC352 Problem of the Day| Problem of the day]]: discussion
 +
** Work on projects
  
==Cloud Computing==
+
* '''Thursday'''
<tanbox>
+
** Presentation of [http://www.hulu.com/watch/116372/cnbc-originals-inside-the-mind-of-google Inside the Mind of Google]
__NOTOC__
+
** wrap up
===Literature===
+
----
* [[Image:hadoopOReilly.jpg | right |100px]] [http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/0596521979  Hadoop: The Definitive Guide], Tom White, O'Reilly Media, June 2009, ISBN 0596521979.  The Web site for the book is http://www.hadoopbook.com/ (with the data used as examples in the book).
+
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
* Dean, J., and S. Ghemawat, [http://labs.google.com/papers/mapreduce-osdi04.pdf MapReduce: Simplified Data Processing on Large Clusters], Dec. 2004,  ([[media:MapReduce1204.pdf|cached copy]])
+
||
*  Czajkowski G., [http://googleblog.blogspot.com/2008/11/sorting-1pb-with-mapreduce.html  Sorting 1 PB with MapReduce], Nov. 2008, ([[media:Sorting1PBWithMapReduce.pdf|cached copy]])
+
&nbsp;
* Armbrust M, ''et al'', [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Tech Rep. CB/EECS-2009-28, Feb. 2009 ([[media:AboveTheCloudsBerkeley.pdf|cached copy]])
 
  
===Class Material===
+
|}
* [http://code.google.com/edu/submissions/uwspr2007_clustercourse/listing.html University of Washington: Problem Solving on Large Scale Clusters]:
 
: The University of Washington ran an upper-division course on Distributed Computing with MapReduce in Spring 2007. Below you'll find the materials that were used for the class: five lectures in PowerPoint format, as well as four lab exercises that students completed over the duration of the course, using a cluster running Hadoop.
 
  
===Software/Web Links===
+
=Selected Solutions for papers, homework, or projects=
*[http://www.hadoopbook.com/ The HadoopBook] Web site.
 
*[http://wiki.apache.org/hadoop/FrontPage The Hadoop Wiki], the authoritative source on working with Hadoop
 
*[http://code.google.com/edu/parallel/tools/hadoopvm/index.html  Hadoop at Google]:
 
: Setting up a Hadoop cluster can be an all-day job. However, if you want to experiment with the platform right now, [Google] has created a virtual machine image with a preconfigured single-node instance of Hadoop.
 
*[http://code.google.com/edu/parallel/tools/hadoopvm/index.html Guide for setting up IBM's Eclipse Tools for Hadoop] (go to bottom of page)
 
:The IBM MapReduce Tools for Eclipse Plug-in is a robust plug-in that brings Hadoop support to the Eclipse platform. Features include server configuration, support for launching MapReduce jobs and browsing the distributed file system. This setup assumes that you are running Eclipse (version 3.3 or above) on your computer.
 
*[http://www.cloudera.com/blog/2009/04/20/configuring-eclipse-for-hadoop-development-a-screencast/ Configuring Eclipse for Hadoop]
 
:A video from Cloudera on setting up Hadoop... not easy to follow...
 
  
===Videos===
+
* [[Media:Amdahls_Law_in_the_Multicore_Era.pdf | Amdahl's Law in the Multicore Era]] summary
* [http://jez.blip.tv/file/245701/ A video of Tom White], author of O'Reilly's Hadoop guide, on BlipTV. Tom outlines the suite of projects centered around Hadoop (an open-source MapReduce project).
+
* [[CSC352 Project1 Solution | Selected Solutions for Project 1]]
  
</tanbox>
+
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
[[Category:CSC352]][[Category:Class]][[Category:Schedule]]

Latest revision as of 11:22, 9 August 2013



Main Page | Syllabus | Schedule | Links & Resources



Hadoop-Related

Projects

  • Project 4 officially started on 1/26/10 and is now over. Its shared wiki page is here. Congratulations to the class for setting up the individual nodes of the first Smith Cloud Cluster!

Python Threads

Week Topics Reading
Week 1
1/25
  • Tuesday
    • Introduce Syllabus
    • Interrupts
  • Thursday
    • The PogoPlug...
    • The iPad...
    • Python Review
      • NQueens.py: a Python program to find the first solution for N queens on an NxN board.

Read
  • What is a Thread? [1]
  • What is a Process? [2]
Reference material for when we start programming
For Discussion next Tuesday
  • Read the paper by Asanovic K. et al., The Landscape of Parallel Computing Research: A View from Berkeley.
Week 2
2/1

 

Week 3
2/8

Week 4
2/15



XGrid Programming

Week Topics Reading
Week 5

Week 6


Week 7


 

 

Spring Break

 

Week 8

  • Tuesday
    • Sign-up for paper presentations here!
    • Class participation on the decomposition of Homework 3/Project 2 (the serial part)
  • Thursday
    • Continuation of decomposition of Homework 3/Project 2 (the parallel part)
    • XGrid Lab 3: running jobs on the Science Center XGrid.

 



Cloud Computing

Week Topics Reading
Week 9

  • Section 6 in Tom White's Hadoop, the Definitive Guide, available on Google Books.
Week 10


 

Week 11





    • Monitoring a Cluster of Computers as a school of fish (U. Nebraska)




    • The evolution of Hadoop (Code-Swarm)





 

Week 12


 

Week 13


 

Selected Solutions for papers, homework, or projects