Difference between revisions of "CSC352 Class Page 2010"

From dftwiki3
(Cloud Computing)
m (Thiebaut moved page CSC352 Class Page to CSC352 Class Page 2010)
 
(22 intermediate revisions by the same user not shown)
Line 7: Line 7:
 
<br />
 
<br />
 
<br />
 
<br />
 +
 +
=Hadoop-Related=
 +
 +
* [http://cs.smith.edu/classwiki/index.php/CSC352_Hadoop_Howto_%26_FAQ Hadoop FAQ Page]
 +
* [http://maven.smith.edu/~thiebaut/showhadoopip.php Hadoop IPs]
  
 
=Projects=
 
=Projects=
  
 
* [[CSC352 Project 1 | Project 1]]: Started Feb 16, due March 2nd. A [[CSC352 Project1 Solution|good example of a solution]].
 
* [[CSC352 Project 1 | Project 1]]: Started Feb 16, due March 2nd. A [[CSC352 Project1 Solution|good example of a solution]].
* [[CSC352 Project 2 | Project 2]]: Deals with the Xgrid
+
* [[CSC352 Project 2 | Project 2]]: Deals with the Xgrid.  A [[CSC352 Project 2 Solution | collection of good proposed solutions]].
* Project 3: Will deal with Hadoop
+
* [[CSC352 Project 3 | Project 3]] Deals with processing Wikipedia pages on Hadoop/MapReduce.
 +
 
 
* '''Project 4 has officially started on 1/26/10! and is now Over.'''  Find its shared wiki page [http://cs.smith.edu/classwiki/index.php/CSC352_Page#Projects here]!  Congratulations to the class for setting up the individual nodes of the first Smith Cloud Cluster!
 
* '''Project 4 has officially started on 1/26/10! and is now Over.'''  Find its shared wiki page [http://cs.smith.edu/classwiki/index.php/CSC352_Page#Projects here]!  Congratulations to the class for setting up the individual nodes of the first Smith Cloud Cluster!
  
Line 272: Line 278:
 
| Week 10 <br /> <br />
 
| Week 10 <br /> <br />
 
||
 
||
 +
[[Image:CSC352HadoopPerformanceMachineLearning.png| right|150px]]
 
* '''Tuesday'''
 
* '''Tuesday'''
 
** Presentation of [http://www.icsi.berkeley.edu/~arlo/publications/gillick_cs262a_proj.pdf MapReduce: Distributed Computing for Machine Learning]
 
** Presentation of [http://www.icsi.berkeley.edu/~arlo/publications/gillick_cs262a_proj.pdf MapReduce: Distributed Computing for Machine Learning]
Line 278: Line 285:
 
*** [[Hadoop_Tutorial_1_--_Running_WordCount#Analyzing_the_Hadoop_Logs | Tutorial #1: output logs]]
 
*** [[Hadoop_Tutorial_1_--_Running_WordCount#Analyzing_the_Hadoop_Logs | Tutorial #1: output logs]]
 
***  [[Hadoop_Tutorial_1.1_--_Generating_Task_Timelines | Tutorial #1.1: Task Timelines]]  
 
***  [[Hadoop_Tutorial_1.1_--_Generating_Task_Timelines | Tutorial #1.1: Task Timelines]]  
*** [[Hadoop_Tutorial_2_--_Running_WordCount_in_Python | Tutorial #2: Running Python on Hadoop]]
 
 
[[Image:WrongTaskTimeline.png| 200px|right]]
 
[[Image:WrongTaskTimeline.png| 200px|right]]
 
* '''Thursday'''
 
* '''Thursday'''
** Art vs Science...
 
 
** Question of the day: What's wrong with this picture?  
 
** Question of the day: What's wrong with this picture?  
 
** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]]  
 
** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]]  
Line 292: Line 297:
 
----
 
----
 
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 
+
* [[CSC352 Homework 4 | Homework #4]] and [[CSC352 Homework 4 Solution | a solution]].
 
||
 
||
 
&nbsp;
 
&nbsp;
Line 302: Line 307:
 
* '''Tuesday'''
 
* '''Tuesday'''
 
** Presentation of [http://portal.acm.org/ft_gateway.cfm?id=1629198&type=pdf&coll=GUIDE&dl=GUIDE&CFID=82739837&CFTOKEN=94683258 MapReduce: A Flexible Data Processing Tool] ([[media:MapReduceFlexibleDataProcessingTool.pdf | cached copy]])
 
** Presentation of [http://portal.acm.org/ft_gateway.cfm?id=1629198&type=pdf&coll=GUIDE&dl=GUIDE&CFID=82739837&CFTOKEN=94683258 MapReduce: A Flexible Data Processing Tool] ([[media:MapReduceFlexibleDataProcessingTool.pdf | cached copy]])
 +
** Compare with Pavlo, Paulson, Rasin, Abadi, DeWitt, Madden, and Stonebraker's paper [[Media:ComparisonOfApproachesToLargeScaleDataAnalysis.pdf | A Comparison of Approaches to Large-Scale Data Analysis]], SIGMOD '09, June 2009.
 
* '''Thursday'''
 
* '''Thursday'''
 +
** [http://en.wikipedia.org/wiki/Hypertable Hypertable] is an open-source project parallel to Google's BigTable...
 +
** Art vs Science...
 +
** Some preliminary thinking about the final project...
 +
** [[Hadoop_Tutorial_2.1_--_Streaming_XML_Files | Streaming whole files]]
 +
** [[Hadoop Tutorial 2.2 -- Running C++ Programs on Hadoop | WordCount in C++]]
 +
**  Visualizations of Hadoop Data Transfers, from the U. of Nebraska ([http://www.google.com/search?q=university+of+Nebraska+hadoop+visualization&hl=en&safe=off&tbs=vid:1&tbo=u&ei=oKO4S6GMCoH7lwfq88SXCg&sa=X&oi=video_result_group&ct=title&resnum=1&ved=0CBEQqwQwAA more videos])
 +
<br /><br /><center><videoflash>qoBoEzOkeDQ</videoflash></center><br /><br />
 +
** Monitoring a Cluster of Computers as a school of fish (U. Nebraska)
 +
<br /><br /><center><videoflash>LM1j_8sWSEk</videoflash></center><br /><br />
 +
** The evolution of Hadoop (Code-Swarm)
 +
<br /><br /><center><videoflash type="vimeo">2513321</videoflash></center><br /><br />
 +
 
----
 
----
 
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 
+
* A [[CSC352 Project 2 Solution | solution]] for Project 2 has been posted!
 
||
 
||
 +
* [http://developer.yahoo.com/hadoop/tutorial/index.html Hadoop Tutorial from Yahoo Developer Network (YDN)]
 
&nbsp;  
 
&nbsp;  
 +
 
<!-- ================================================================== -->
 
<!-- ================================================================== -->
 
|- style="background:#eeeeff" valign="top"
 
|- style="background:#eeeeff" valign="top"
Line 314: Line 334:
 
* '''Tuesday'''
 
* '''Tuesday'''
 
** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part I
 
** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part I
 +
** [[Hadoop_Tutorial_3_--_Hadoop_on_Amazon_AWS | Signing on to Amazon AWS]]
 +
** [[Hadoop_Tutorial_3.1_--_Using_Amazon%27s_WordCount_program | Uploading data to AWS and counting words]]
 +
** [[Hadoop_Tutorial_3.2_--_Using_Your_Own_WordCount_program | Word-counting using Streaming Python on AWS]]
 +
** [[Hadoop_Tutorial_3.3_--_How_Much%3F | Costs of maintaining a Hadoop cluster on AWS]]
 
* '''Thursday'''
 
* '''Thursday'''
 +
** Continuation of the AWS labs
 +
** [[Hadoop_Tutorial_4:_Start_an_EC2_Instance | Starting an EC2 instance on AWS]]
 
----
 
----
 
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
Line 325: Line 351:
 
* '''Tuesday'''
 
* '''Tuesday'''
 
** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part II
 
** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part II
 +
** [[CSC352 Problem of the Day| Problem of the day]]: discussion
 +
** Work on projects
  
 
* '''Thursday'''
 
* '''Thursday'''
 
** Presentation of [http://www.hulu.com/watch/116372/cnbc-originals-inside-the-mind-of-google Inside the Mind of Google]
 
** Presentation of [http://www.hulu.com/watch/116372/cnbc-originals-inside-the-mind-of-google Inside the Mind of Google]
 +
** Wrap-up
 
----
 
----
 
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]
 
*  [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes]

Latest revision as of 11:22, 9 August 2013






Hadoop-Related

Projects

  • Project 4 officially started on 1/26/10 and is now over. Find its shared wiki page here! Congratulations to the class for setting up the individual nodes of the first Smith Cloud Cluster!

Python Threads

Week Topics Reading
Week 1
1/25
  • Tuesday
    • Introduce Syllabus
    • Interrupts
  • Thursday
    • The PogoPlug...
    • The iPad...
    • Python Review
      • NQueens.py: a Python program that finds the first solution for N queens on an NxN board.
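
NQueens.py itself is not reproduced on this page. As a rough illustration of what such a program might look like, here is a minimal sketch that assumes a plain row-by-row backtracking search; the function name first_solution and its return format are hypothetical and are not taken from the course file.

```python
def first_solution(n):
    """Return the first N-queens placement found, or None if there is none.

    The result is a list where entry r holds the column of the queen in row r.
    (Hypothetical helper for illustration; not the course's NQueens.py.)
    """
    cols = []  # cols[r] = column of the queen already placed in row r

    def safe(row, col):
        # A new queen at (row, col) is safe if no earlier queen shares
        # its column or one of its diagonals.
        for r, c in enumerate(cols):
            if c == col or abs(c - col) == row - r:
                return False
        return True

    def place(row):
        # All rows filled: a complete solution has been found.
        if row == n:
            return True
        for col in range(n):
            if safe(row, col):
                cols.append(col)
                if place(row + 1):
                    return True
                cols.pop()  # backtrack and try the next column
        return False

    return cols if place(0) else None


if __name__ == "__main__":
    print(first_solution(4))
```

Columns are tried left to right, so the search stops at the first placement it reaches in that order; boards with no solution (N = 2 or N = 3) yield None.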

Read
  • What is a Thread? [1]
  • What is a Process? [2]
Reference material for when we start programming
For Discussion next Tuesday
  • Read the paper by K. Asanovic et al., The Landscape of Parallel Computing Research: A View from Berkeley.
Week 2
2/1

 

Week 3
2/8

Week 4
2/15



XGrid Programming

Week Topics Reading
Week 5

Week 6


Week 7


 

 

Spring Break

 

Week 8

  • Tuesday
    • Sign up for paper presentations here!
    • Class participation on the decomposition of Homework 3/Project 2 (the serial part)
  • Thursday
    • Continuation of decomposition of Homework 3/Project 2 (the parallel part)
    • XGrid Lab 3: running jobs on the Science Center XGrid.

 



Cloud Computing

Week Topics Reading
Week 9

  • Section 6 in Tom White's Hadoop: The Definitive Guide, available on Google Books.
Week 10


 

Week 11





    • Monitoring a Cluster of Computers as a school of fish (U. Nebraska)




    • The evolution of Hadoop (Code-Swarm)





 

Week 12


 

Week 13


 

Selected Solutions for papers, homework, or projects