Difference between revisions of "CSC352 Class Page 2010"
(→XGrid Programming) |
m (Thiebaut moved page CSC352 Class Page to CSC352 Class Page 2010) |
Line 7: | Line 7: | ||
<br /> | <br /> | ||
<br /> | <br /> | ||
+ | |||
+ | =Hadoop-Related= | ||
+ | |||
+ | * [http://cs.smith.edu/classwiki/index.php/CSC352_Hadoop_Howto_%26_FAQ Hadoop FAQ Page] | ||
+ | * [http://maven.smith.edu/~thiebaut/showhadoopip.php Hadoop IPs] | ||
=Projects= | =Projects= | ||
− | * [[CSC352 Project 1 | Project 1]]: Started Feb 16, due March 2nd. | + | * [[CSC352 Project 1 | Project 1]]: Started Feb 16, due March 2nd. A [[CSC352 Project1 Solution|good example of a solution]]. |
− | * [[CSC352 Project 2 | Project 2]]: | + | * [[CSC352 Project 2 | Project 2]]: Deals with the XGrid. A [[CSC352 Project 2 Solution | collection of good proposed solutions]]. |
− | * Project 3 | + | * [[CSC352 Project 3 | Project 3]]: Deals with processing Wikipedia pages on Hadoop/MapReduce. |
− | * '''Project 4 has officially started on 1/26/10!''' Find its shared wiki page [http://cs.smith.edu/classwiki/index.php/CSC352_Page#Projects here]! | + | |
+ | * '''Project 4 officially started on 1/26/10 and is now over.''' Find its shared wiki page [http://cs.smith.edu/classwiki/index.php/CSC352_Page#Projects here]! Congratulations to the class for setting up the individual nodes of the first Smith Cloud Cluster! | ||
[[File:SmilingPython.png | 100px | right]] | [[File:SmilingPython.png | 100px | right]] | ||
Line 179: | Line 185: | ||
**** [http://jonudell.net/udell/gems/umlaut/umlaut.html Jon Udell's capture of the page ''Heavy Metal Umlaut''] | **** [http://jonudell.net/udell/gems/umlaut/umlaut.html Jon Udell's capture of the page ''Heavy Metal Umlaut''] | ||
**** [http://meta.wikimedia.org/wiki/Research Wikipedia Research Projects] | **** [http://meta.wikimedia.org/wiki/Research Wikipedia Research Projects] | ||
− | *** [[CSC352_Project_2 | Intro to | + | *** [[CSC352_Project_2 | Intro to Projects #2 and #3]] ([http://cs.smith.edu/~thiebaut/freevideos/BigData2.swf presentation]) |
− | |||
− | |||
− | |||
− | |||
− | |||
* '''Thursday''' | * '''Thursday''' | ||
+ | ** Canceled by DT | ||
---- | ---- | ||
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | * [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | ||
Line 197: | Line 199: | ||
* '''Tuesday''' | * '''Tuesday''' | ||
** Presentation and discussion of [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.60.7248&rep=rep1&type=pdf Building Computational Grids with Apple's XGrid Middleware], by Hughes, B. (''ACM International Conference Proceeding Series'', Vol. 167, Hobart, Tasmania, Australia, 2006.) ([[media:buildingComputationalGrids.pdf|cached copy]]) | ** Presentation and discussion of [http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.60.7248&rep=rep1&type=pdf Building Computational Grids with Apple's XGrid Middleware], by Hughes, B. (''ACM International Conference Proceeding Series'', Vol. 167, Hobart, Tasmania, Australia, 2006.) ([[media:buildingComputationalGrids.pdf|cached copy]]) | ||
+ | ** Define Homework #3 | ||
+ | ** Guidelines for projects. ([http://cs.smith.edu/~thiebaut/freevideos/WhatsInAProject.swf Presentation]) (We stopped on the bell-curve). | ||
* '''Thursday''' | * '''Thursday''' | ||
+ | ** Guidelines for projects. ([http://cs.smith.edu/~thiebaut/freevideos/WhatsInAProject.swf Presentation]) | ||
+ | ** [[XGrid Tutorial Part 2: Processing Wikipedia Pages | XGrid Lab 2]] | ||
+ | ** Scheduling | ||
+ | *** Processor Scheduling (OS) | ||
+ | *** Multiprocessor Scheduling | ||
+ | *** XGrid Scheduling | ||
+ | |||
---- | ---- | ||
+ | * [[CSC352 Homework 3 | Homework #3 ]] and its [[CSC352 Homework 3 Solution | solution programs]]. | ||
+ | |||
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | * [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | ||
|| | || | ||
− | | + | |
<!-- ================================================================== --> | <!-- ================================================================== --> | ||
|- style="background:#eeeeff" valign="top" | |- style="background:#eeeeff" valign="top" | ||
+ | | <br /> <br /> | ||
+ | || | ||
+ | <center>[[Image:SpringBreak.gif | 150px]]</center> | ||
+ | || | ||
+ | | ||
+ | <!-- ================================================================== --> | ||
+ | |- style="background:#ffffff" valign="top" | ||
| Week 8 <br /> <br /> | | Week 8 <br /> <br /> | ||
|| | || | ||
* '''Tuesday''' | * '''Tuesday''' | ||
+ | ** Sign-up for paper presentations [http://cs.smith.edu/classwiki/index.php/CSC352_Sign-Up_Sheet_for_Paper_Presentations here]! | ||
+ | ** Class participation on the decomposition of Homework 3/Project 2 (the serial part) | ||
* '''Thursday''' | * '''Thursday''' | ||
+ | ** Continuation of decomposition of Homework 3/Project 2 (the parallel part) | ||
+ | ** [[XGrid Tutorial Part 3: Monte Carlo on the Science Center XGrid | XGrid Lab 3]]: running jobs on the Science Center XGrid. | ||
+ | |||
---- | ---- | ||
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | * [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | ||
Line 213: | Line 238: | ||
| | ||
|} | |} | ||
+ | |||
+ | <!-- | ||
+ | _ _ | ||
+ | ___| | ___ _ _ __| | | ||
+ | / __| |/ _ \ | | | |/ _` | | ||
+ | | (__| | (_) | |_| | (_| | | ||
+ | \___|_|\___/\__,_|\__,_| | ||
+ | |||
+ | --> | ||
+ | |||
+ | [[File:HadoopCartoon.png|right|100px]] | ||
=Cloud Computing= | =Cloud Computing= | ||
Line 221: | Line 257: | ||
<!-- ================================================================== --> | <!-- ================================================================== --> | ||
|-valign="top" | |-valign="top" | ||
− | |width="15%"| Week | + | |width="15%"| Week 9 <br /> |
|width="60%"| | |width="60%"| | ||
− | * '''Tuesday''' | + | * '''Tuesday''' |
+ | ** Presentation of [http://labs.google.com/papers/mapreduce-osdi04.pdf MapReduce: Simplified Data Processing on Large Clusters] (Yang) | ||
+ | ** [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] (We stopped at Section 4). | ||
* '''Thursday''' | * '''Thursday''' | ||
+ | ** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] and [[Hadoop_Tutorial_1_--_Running_WordCount | Tutorial #1]], [[Hadoop_Tutorial_2_--_Running_WordCount_in_Python | Tutorial #2]] | ||
+ | |||
---- | ---- | ||
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | * [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | ||
+ | * [http://cs.smith.edu/classwiki/index.php/CSC352_Hadoop_Howto_%26_FAQ Hadoop Howtos and FAQs] | ||
|| | || | ||
− | & | + | [[File:HadoopOReilly.jpg | 70px | right]] |
+ | * [http://hadoop.apache.org/common/docs/current/mapred_tutorial.html Map-Reduce tutorial] from apache.org: a must-read! | ||
+ | * [http://developer.yahoo.com/hadoop/tutorial/module4.html Map-Reduce Basics] from Yahoo.com: another must-read! | ||
+ | |||
+ | * Section 6 in Tom White's ''Hadoop: The Definitive Guide'', available on [http://books.google.com/books?id=bKPEwR-Pt6EC&printsec=frontcover&dq=hadoop+definitive+guide&source=bl&ots=kOdw-xf9Gg&sig=GyHDzyATSbMVcPysVbSAQKuhv58&hl=en&ei=YJ-0S6HSOIS0lQfm3u1q&sa=X&oi=book_result&ct=result&resnum=6&ved=0CB0Q6AEwBQ#v=onepage&q=&f=false Google Books]. | ||
<!-- ================================================================== --> | <!-- ================================================================== --> | ||
|- style="background:#eeeeff" valign="top" | |- style="background:#eeeeff" valign="top" | ||
− | | Week | + | | Week 10 <br /> <br /> |
|| | || | ||
+ | [[Image:CSC352HadoopPerformanceMachineLearning.png| right|150px]] | ||
* '''Tuesday''' | * '''Tuesday''' | ||
+ | ** Presentation of [http://www.icsi.berkeley.edu/~arlo/publications/gillick_cs262a_proj.pdf MapReduce: Distributed Computing for Machine Learning] | ||
+ | ** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] | ||
+ | *** [[Hadoop_Tutorial_1_--_Running_WordCount | Tutorial #1]], | ||
+ | *** [[Hadoop_Tutorial_1_--_Running_WordCount#Analyzing_the_Hadoop_Logs | Tutorial #1: output logs]] | ||
+ | *** [[Hadoop_Tutorial_1.1_--_Generating_Task_Timelines | Tutorial #1.1: Task Timelines]] | ||
+ | [[Image:WrongTaskTimeline.png| 200px|right]] | ||
* '''Thursday''' | * '''Thursday''' | ||
+ | ** Question of the day: What's wrong with this picture? | ||
+ | ** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] | ||
+ | ** Lab for today: | ||
+ | *** Compare WordCount on 1 vs WordCount on 6 [[Hadoop_Tutorial_1_--_Running_WordCount#Moment_of_Truth:_Compare_5-PC_Hadoop_cluster_to_1_Linux_PC | Section 5 of Tutorial 1]] | ||
+ | *** Create your own version of the Java WordCount program [[Hadoop_Tutorial_1_--_Running_WordCount#Running_Your_Own_Version_of_WordCount.java | Section 4 of Tutorial 1]] | ||
+ | *** Create your own Counters [[Hadoop_Tutorial_1_--_Running_WordCount#Counters | Section 6 of Tutorial 1]]: count Buck! | ||
+ | *** Generate Timelines [[Hadoop_Tutorial_1.1_--_Generating_Task_Timelines | Tutorial 1.1]] | ||
+ | *** Counting words in Python [[Hadoop_Tutorial_2_--_Running_WordCount_in_Python | Tutorial 2]] | ||
---- | ---- | ||
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | * [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | ||
− | + | * [[CSC352 Homework 4 | Homework #4]] and [[CSC352 Homework 4 Solution | a solution]]. | |
|| | || | ||
| | ||
Line 243: | Line 303: | ||
<!-- ================================================================== --> | <!-- ================================================================== --> | ||
|- style="background:#ffffff" valign="top" | |- style="background:#ffffff" valign="top" | ||
− | | Week | + | | Week 11 <br /> <br /> |
|| | || | ||
* '''Tuesday''' | * '''Tuesday''' | ||
+ | ** Presentation of [http://portal.acm.org/ft_gateway.cfm?id=1629198&type=pdf&coll=GUIDE&dl=GUIDE&CFID=82739837&CFTOKEN=94683258 MapReduce: A Flexible Data Processing Tool] ([[media:MapReduceFlexibleDataProcessingTool.pdf | cached copy]]) | ||
+ | ** Compare with the paper by Pavlo, Paulson, Rasin, Abadi, DeWitt, Madden, and Stonebraker, [[Media:ComparisonOfApproachesToLargeScaleDataAnalysis.pdf |A Comparison of Approaches to Large-Scale Data Analysis]], SIGMOD-09, June 2009. | ||
* '''Thursday''' | * '''Thursday''' | ||
+ | ** [http://en.wikipedia.org/wiki/Hypertable Hypertable] is an open-source project modeled on Google's Bigtable... | ||
+ | ** Art vs Science... | ||
+ | ** Some preliminary thinking about the final project... | ||
+ | ** [[Hadoop_Tutorial_2.1_--_Streaming_XML_Files | Streaming whole files]] | ||
+ | ** [[Hadoop Tutorial 2.2 -- Running C++ Programs on Hadoop | WordCount in C++]] | ||
+ | ** Visualizations of Hadoop Data Transfers, from the U. of Nebraska ([http://www.google.com/search?q=university+of+Nebraska+hadoop+visualization&hl=en&safe=off&tbs=vid:1&tbo=u&ei=oKO4S6GMCoH7lwfq88SXCg&sa=X&oi=video_result_group&ct=title&resnum=1&ved=0CBEQqwQwAA more videos]) | ||
+ | <br /><br /><center><videoflash>qoBoEzOkeDQ</videoflash></center><br /><br /> | ||
+ | ** Monitoring a Cluster of Computers as a school of fish (U. Nebraska) | ||
+ | <br /><br /><center><videoflash>LM1j_8sWSEk</videoflash></center><br /><br /> | ||
+ | ** The evolution of Hadoop (Code-Swarm) | ||
+ | <br /><br /><center><videoflash type="vimeo">2513321</videoflash></center><br /><br /> | ||
+ | |||
---- | ---- | ||
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | * [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | ||
− | + | * A [[CSC352 Project 2 Solution | solution]] for Project 2 has been posted! | |
|| | || | ||
+ | * [http://developer.yahoo.com/hadoop/tutorial/index.html Hadoop Tutorial from Yahoo Developer Network (YDN)] | ||
| | ||
+ | |||
<!-- ================================================================== --> | <!-- ================================================================== --> | ||
|- style="background:#eeeeff" valign="top" | |- style="background:#eeeeff" valign="top" | ||
− | | Week | + | | Week 12 <br /> <br /> |
|| | || | ||
* '''Tuesday''' | * '''Tuesday''' | ||
+ | ** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part I | ||
+ | ** [[Hadoop_Tutorial_3_--_Hadoop_on_Amazon_AWS | Signing on to Amazon AWS]] | ||
+ | ** [[Hadoop_Tutorial_3.1_--_Using_Amazon%27s_WordCount_program | Uploading data to AWS and counting words]] | ||
+ | ** [[Hadoop_Tutorial_3.2_--_Using_Your_Own_WordCount_program | Word-counting using Streaming Python on AWS]] | ||
+ | ** [[Hadoop_Tutorial_3.3_--_How_Much%3F | Costs of maintaining a Hadoop cluster on AWS]] | ||
* '''Thursday''' | * '''Thursday''' | ||
+ | ** Continuation of the AWS labs | ||
+ | ** [[Hadoop_Tutorial_4:_Start_an_EC2_Instance | Starting an EC2 instance on AWS]] | ||
---- | ---- | ||
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | * [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | ||
Line 264: | Line 347: | ||
<!-- ================================================================== --> | <!-- ================================================================== --> | ||
|- style="background:#ffffff" valign="top" | |- style="background:#ffffff" valign="top" | ||
− | | Week | + | | Week 13 <br /> <br /> |
|| | || | ||
* '''Tuesday''' | * '''Tuesday''' | ||
+ | ** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part II | ||
+ | ** [[CSC352 Problem of the Day| Problem of the day]]: discussion | ||
+ | ** Work on projects | ||
+ | |||
* '''Thursday''' | * '''Thursday''' | ||
+ | ** Presentation of [http://www.hulu.com/watch/116372/cnbc-originals-inside-the-mind-of-google Inside the Mind of Google] | ||
+ | ** Wrap-up | ||
---- | ---- | ||
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | * [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | ||
Line 275: | Line 364: | ||
|} | |} | ||
+ | =Selected Solutions for papers, homework, or projects= | ||
− | + | * [[Media:Amdahls_Law_in_the_Multicore_Era.pdf | Amdahl's Law in the Multicore Era]] summary | |
+ | * [[CSC352 Project1 Solution | Selected Solutions for Project 1]] | ||
<br /> | <br /> |
Latest revision as of 10:22, 9 August 2013
Hadoop-Related
Projects
- Project 1: Started Feb 16, due March 2nd. A good example of a solution.
- Project 2: Deals with the XGrid. A collection of good proposed solutions.
- Project 3: Deals with processing Wikipedia pages on Hadoop/MapReduce.
- Project 4 officially started on 1/26/10 and is now over. Find its shared wiki page here! Congratulations to the class for setting up the individual nodes of the first Smith Cloud Cluster!
Python Threads
Week | Topics | Reading |
Week 1 (1/25) | | |
Week 2 (2/1) | | |
Week 3 (2/8) | | |
Week 4 (2/15) | | |
XGrid Programming
Week | Topics | Reading |
Week 5 | | |
Week 6 | | |
Week 7 | | |
| (Spring Break) | |
Week 8 | | |
Cloud Computing
Week | Topics | Reading |
Week 9 | | |
Week 10 | | |
Week 11 | | |
Week 12 | | |
Week 13 | | |
Selected Solutions for papers, homework, or projects