Difference between revisions of "CSC352 Class Page 2010"

From dftwiki3
(Cloud Computing)
m (Thiebaut moved page CSC352 Class Page to CSC352 Class Page 2010)
 
(49 intermediate revisions by the same user not shown)
Line 7: Line 7:
 
<br />
 
<br />
 
<br />
 
<br />
 +
 +
=Hadoop-Related=
 +
 +
* [http://cs.smith.edu/classwiki/index.php/CSC352_Hadoop_Howto_%26_FAQ Hadoop FAQ Page]
 +
* [http://maven.smith.edu/~thiebaut/showhadoopip.php Hadoop IPs]
  
 
=Projects=
 
  
 
* [[CSC352 Project 1 | Project 1]]: Started Feb 16, due March 2nd. A [[CSC352 Project1 Solution|good example of a solution]].
 
* [[CSC352 Project 2 | Project 2]]: Deals with the Xgrid
+
* [[CSC352 Project 2 | Project 2]]: Deals with the Xgrid.  A [[CSC352 Project 2 Solution | collection of good proposed solutions]].
* Project 3: Will deal with Hadoop
+
* [[CSC352 Project 3 | Project 3]]: Deals with processing Wikipedia pages on Hadoop/MapReduce.
 +
 
 
* '''Project 4 officially started on 1/26/10 and is now over.'''  Find its shared wiki page [http://cs.smith.edu/classwiki/index.php/CSC352_Page#Projects here]!  Congratulations to the class for setting up the individual nodes of the first Smith Cloud Cluster!
  
Line 204: Line 210:
  
 
----
 
* [[CSC352 Homework 3 | Homework #3 ]]
+
* [[CSC352 Homework 3 | Homework #3 ]] and its [[CSC352 Homework 3 Solution | solution programs]].
  
 
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 
Line 232: Line 238:
 
&nbsp;  
 
 
|}
 
|}
 +
 +
<!--
 +
        _                      _
 +
  ___| |  ___  _  _  __| |
 +
/ __|  |/ _ \ | |  |  |/ _` |
 +
| (__|  | (_)  | |_|  | (_| |
 +
\___|_|\___/\__,_|\__,_|
 +
                         
 +
-->
 +
 +
[[File:HadoopCartoon.png|right|100px]]
  
 
=Cloud Computing=
 
Line 242: Line 259:
 
|width="15%"| Week 9 <br />   
 
 
|width="60%"|
 
* '''Tuesday'''
+
* '''Tuesday'''  
 +
** Presentation of [http://labs.google.com/papers/mapreduce-osdi04.pdf MapReduce: Simplified Data Processing on Large Clusters] (Yang)
 +
** [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] (We stopped at Section 4).
 
* '''Thursday'''
 
 +
** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] and [[Hadoop_Tutorial_1_--_Running_WordCount | Tutorial #1]], [[Hadoop_Tutorial_2_--_Running_WordCount_in_Python | Tutorial #2]]
 +
 
----
 
 
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 
 +
* [http://cs.smith.edu/classwiki/index.php/CSC352_Hadoop_Howto_%26_FAQ  Hadoop Howtos and FAQs]
 
||
 
 +
[[File:HadoopOReilly.jpg | 70px | right]]
 
* [http://hadoop.apache.org/common/docs/current/mapred_tutorial.html Map-Reduce tutorial] from apache.org: a must-read!
 
 +
* [http://developer.yahoo.com/hadoop/tutorial/module4.html Map-Reduce Basics] from Yahoo.com: another must-read!
  
 +
* Section 6 in Tom White's ''Hadoop: The Definitive Guide'', available on [http://books.google.com/books?id=bKPEwR-Pt6EC&printsec=frontcover&dq=hadoop+definitive+guide&source=bl&ots=kOdw-xf9Gg&sig=GyHDzyATSbMVcPysVbSAQKuhv58&hl=en&ei=YJ-0S6HSOIS0lQfm3u1q&sa=X&oi=book_result&ct=result&resnum=6&ved=0CB0Q6AEwBQ#v=onepage&q=&f=false Google Books].
 
<!-- ================================================================== -->
 
<!-- ================================================================== -->
 
|- style="background:#eeeeff" valign="top"
 
 
| Week 10 <br /> <br />
 
 
||
 
 +
[[Image:CSC352HadoopPerformanceMachineLearning.png| right|150px]]
 
* '''Tuesday'''
 
 +
** Presentation of [http://www.icsi.berkeley.edu/~arlo/publications/gillick_cs262a_proj.pdf MapReduce: Distributed Computing for Machine Learning]
 +
** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]]
 +
*** [[Hadoop_Tutorial_1_--_Running_WordCount | Tutorial #1]],
 +
*** [[Hadoop_Tutorial_1_--_Running_WordCount#Analyzing_the_Hadoop_Logs | Tutorial #1: output logs]]
 +
***  [[Hadoop_Tutorial_1.1_--_Generating_Task_Timelines | Tutorial #1.1: Task Timelines]]
 +
[[Image:WrongTaskTimeline.png| 200px|right]]
 
* '''Thursday'''
 
 +
** Question of the day: What's wrong with this picture?
 +
** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]]
 +
** Lab for today:
 +
*** Compare WordCount on 1 vs WordCount on 6 [[Hadoop_Tutorial_1_--_Running_WordCount#Moment_of_Truth:_Compare_5-PC_Hadoop_cluster_to_1_Linux_PC | Section 5 of Tutorial 1]]
 +
*** Create your own version of the Java WordCount program [[Hadoop_Tutorial_1_--_Running_WordCount#Running_Your_Own_Version_of_WordCount.java | Section 4 of Tutorial 1]]
 +
*** Create your own Counters [[Hadoop_Tutorial_1_--_Running_WordCount#Counters | Section 6 of Tutorial 1]]: count Buck!
 +
*** Generate Timelines [[Hadoop_Tutorial_1.1_--_Generating_Task_Timelines | Tutorial 1.1]]
 +
*** Counting words in Python [[Hadoop_Tutorial_2_--_Running_WordCount_in_Python | Tutorial 2]]
 
----
 
 
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 
 
+
* [[CSC352 Homework 4 | Homework #4]] and [[CSC352 Homework 4 Solution | a solution]].
 
||
 
 
&nbsp;
 
Line 266: Line 306:
 
||
 
 
* '''Tuesday'''
 
 +
** Presentation of [http://portal.acm.org/ft_gateway.cfm?id=1629198&type=pdf&coll=GUIDE&dl=GUIDE&CFID=82739837&CFTOKEN=94683258 MapReduce: A Flexible Data Processing Tool] ([[media:MapReduceFlexibleDataProcessingTool.pdf | cached copy]])
 +
** Compare with Pavlo, Paulson, Rasin, Abadi, DeWitt, Madden, and Stonebraker's paper [[Media:ComparisonOfApproachesToLargeScaleDataAnalysis.pdf |A Comparison of Approaches to Large-Scale Data Analysis]], SIGMOD '09, June 2009.
 
* '''Thursday'''
 
 +
** [http://en.wikipedia.org/wiki/Hypertable Hypertable] is an open-source project modeled on Google's Bigtable...
 +
** Art vs Science...
 +
** Some preliminary thinking about the final project...
 +
** [[Hadoop_Tutorial_2.1_--_Streaming_XML_Files | Streaming whole files]]
 +
** [[Hadoop Tutorial 2.2 -- Running C++ Programs on Hadoop | WordCount in C++]]
 +
**  Visualizations of Hadoop Data Transfers, from the U. of Nebraska ([http://www.google.com/search?q=university+of+Nebraska+hadoop+visualization&hl=en&safe=off&tbs=vid:1&tbo=u&ei=oKO4S6GMCoH7lwfq88SXCg&sa=X&oi=video_result_group&ct=title&resnum=1&ved=0CBEQqwQwAA more videos])
 +
<br /><br /><center><videoflash>qoBoEzOkeDQ</videoflash></center><br /><br />
 +
** Monitoring a Cluster of Computers as a school of fish (U. Nebraska)
 +
<br /><br /><center><videoflash>LM1j_8sWSEk</videoflash></center><br /><br />
 +
** The evolution of Hadoop (Code-Swarm)
 +
<br /><br /><center><videoflash type="vimeo">2513321</videoflash></center><br /><br />
 +
 
----
 
 
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 
 
+
* A [[CSC352 Project 2 Solution | solution]] for Project 2 has been posted!
 
||
 
 +
* [http://developer.yahoo.com/hadoop/tutorial/index.html Hadoop Tutorial from Yahoo Developer Network (YDN)]
 
&nbsp;  
 
 +
 
<!-- ================================================================== -->
 
<!-- ================================================================== -->
 
|- style="background:#eeeeff" valign="top"
 
Line 277: Line 333:
 
||
 
 
* '''Tuesday'''
 
 +
** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part I
 +
** [[Hadoop_Tutorial_3_--_Hadoop_on_Amazon_AWS | Signing on to Amazon AWS]]
 +
** [[Hadoop_Tutorial_3.1_--_Using_Amazon%27s_WordCount_program | Uploading data to AWS and counting words]]
 +
** [[Hadoop_Tutorial_3.2_--_Using_Your_Own_WordCount_program | Word-counting using Streaming Python on AWS]]
 +
** [[Hadoop_Tutorial_3.3_--_How_Much%3F | Costs of maintaining a Hadoop cluster on AWS]]
 
* '''Thursday'''
 
 +
** Continuation of the AWS labs
 +
** [[Hadoop_Tutorial_4:_Start_an_EC2_Instance | Starting an EC2 instance on AWS]]
 
----
 
 
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 
Line 287: Line 350:
 
||
 
 
* '''Tuesday'''
 
 +
** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part II
 +
** [[CSC352 Problem of the Day| Problem of the day]]: discussion
 +
** Work on projects
 +
 
* '''Thursday'''
 
 +
** Presentation of [http://www.hulu.com/watch/116372/cnbc-originals-inside-the-mind-of-google Inside the Mind of Google]
 +
** Wrap-up
 
----
 
 
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 

Latest revision as of 10:22, 9 August 2013



Main Page | Syllabus | Schedule | Links & Resources



Hadoop-Related

Projects

  • Project 4 officially started on 1/26/10 and is now over. Find its shared wiki page here! Congratulations to the class for setting up the individual nodes of the first Smith Cloud Cluster!

Python Threads

Week | Topics | Reading
Week 1
1/25
  • Tuesday
    • Introduce Syllabus
    • Interrupts
  • Thursday
    • The PogoPlug...
    • The iPad...
    • Python Review
      • NQueens.py: a Python program to find the first solution for N queens on an NxN board.

Read
  • What is a Thread? [1]
  • What is a Process? [2]
Reference material for when we start programming
For Discussion next Tuesday
  • Read the paper by K. Asanovic et al., The Landscape of Parallel Computing Research: A View from Berkeley.
Week 2
2/1

 

Week 3
2/8

Week 4
2/15



XGrid Programming

Week | Topics | Reading
Week 5

Week 6


Week 7


 

 


 

Week 8

  • Tuesday
    • Sign up for paper presentations here!
    • Class participation on the decomposition of Homework 3/Project 2 (the serial part)
  • Thursday
    • Continuation of decomposition of Homework 3/Project 2 (the parallel part)
    • XGrid Lab 3: running jobs on the Science Center XGrid.

 



Cloud Computing

Week | Topics | Reading
Week 9

  • Section 6 in Tom White's Hadoop: The Definitive Guide, available on Google Books.
Week 10


 

Week 11





    • Monitoring a Cluster of Computers as a school of fish (U. Nebraska)




    • The evolution of Hadoop (Code-Swarm)





 

Week 12


 

Week 13


 

Selected Solutions for papers, homework, or projects