Difference between revisions of "CSC352 Class Page 2010"
(→Cloud Computing) |
m (Thiebaut moved page CSC352 Class Page to CSC352 Class Page 2010) |
||
(44 intermediate revisions by the same user not shown) | |||
Line 7: | Line 7: | ||
<br /> | <br /> | ||
<br /> | <br /> | ||
+ | |||
+ | =Hadoop-Related= | ||
+ | |||
+ | * [http://cs.smith.edu/classwiki/index.php/CSC352_Hadoop_Howto_%26_FAQ Hadoop FAQ Page] | ||
+ | * [http://maven.smith.edu/~thiebaut/showhadoopip.php Hadoop IPs] | ||
=Projects= | =Projects= | ||
* [[CSC352 Project 1 | Project 1]]: Started Feb 16, due March 2nd. A [[CSC352 Project1 Solution|good example of a solution]]. | * [[CSC352 Project 1 | Project 1]]: Started Feb 16, due March 2nd. A [[CSC352 Project1 Solution|good example of a solution]]. | ||
− | * [[CSC352 Project 2 | Project 2]]: Deals with the Xgrid | + | * [[CSC352 Project 2 | Project 2]]: Deals with the Xgrid. A [[CSC352 Project 2 Solution | collection of good proposed solutions]]. |
− | * Project 3 | + | * [[CSC352 Project 3 | Project 3]]: Deals with processing Wikipedia pages on Hadoop/MapReduce. |
+ | |||
* '''Project 4 officially started on 1/26/10 and is now over.''' Find its shared wiki page [http://cs.smith.edu/classwiki/index.php/CSC352_Page#Projects here]! Congratulations to the class for setting up the individual nodes of the first Smith Cloud Cluster! | * '''Project 4 officially started on 1/26/10 and is now over.''' Find its shared wiki page [http://cs.smith.edu/classwiki/index.php/CSC352_Page#Projects here]! Congratulations to the class for setting up the individual nodes of the first Smith Cloud Cluster! | ||
Line 204: | Line 210: | ||
---- | ---- | ||
− | * [[CSC352 Homework 3 | Homework #3 ]] | + | * [[CSC352 Homework 3 | Homework #3 ]] and its [[CSC352 Homework 3 Solution | solution programs]]. |
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | * [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | ||
Line 232: | Line 238: | ||
| | ||
|} | |} | ||
+ | |||
+ | <!-- | ||
+ | _ _ | ||
+ | ___| | ___ _ _ __| | | ||
+ | / __| |/ _ \ | | | |/ _` | | ||
+ | | (__| | (_) | |_| | (_| | | ||
+ | \___|_|\___/\__,_|\__,_| | ||
+ | |||
+ | --> | ||
+ | |||
+ | [[File:HadoopCartoon.png|right|100px]] | ||
=Cloud Computing= | =Cloud Computing= | ||
Line 243: | Line 260: | ||
|width="60%"| | |width="60%"| | ||
* '''Tuesday''' | * '''Tuesday''' | ||
− | ** Presentation of [http://labs.google.com/papers/mapreduce-osdi04.pdf MapReduce: Simplified Data Processing on Large Clusters] | + | ** Presentation of [http://labs.google.com/papers/mapreduce-osdi04.pdf MapReduce: Simplified Data Processing on Large Clusters] (Yang) |
− | ** [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] | + | ** [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] (We stopped at Section 4). |
* '''Thursday''' | * '''Thursday''' | ||
+ | ** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] and [[Hadoop_Tutorial_1_--_Running_WordCount | Tutorial #1]], [[Hadoop_Tutorial_2_--_Running_WordCount_in_Python | Tutorial #2]] | ||
+ | |||
---- | ---- | ||
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | * [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | ||
+ | * [http://cs.smith.edu/classwiki/index.php/CSC352_Hadoop_Howto_%26_FAQ Hadoop Howtos and FAQs] | ||
|| | || | ||
+ | [[File:HadoopOReilly.jpg | 70px | right]] | ||
* [http://hadoop.apache.org/common/docs/current/mapred_tutorial.html Map-Reduce tutorial] from apache.org: a must-read! | * [http://hadoop.apache.org/common/docs/current/mapred_tutorial.html Map-Reduce tutorial] from apache.org: a must-read! | ||
* [http://developer.yahoo.com/hadoop/tutorial/module4.html Map-Reduce Basics] from Yahoo.com: another must-read! | * [http://developer.yahoo.com/hadoop/tutorial/module4.html Map-Reduce Basics] from Yahoo.com: another must-read! | ||
+ | * Section 6 in Tom White's ''Hadoop: The Definitive Guide'', available on [http://books.google.com/books?id=bKPEwR-Pt6EC&printsec=frontcover&dq=hadoop+definitive+guide&source=bl&ots=kOdw-xf9Gg&sig=GyHDzyATSbMVcPysVbSAQKuhv58&hl=en&ei=YJ-0S6HSOIS0lQfm3u1q&sa=X&oi=book_result&ct=result&resnum=6&ved=0CB0Q6AEwBQ#v=onepage&q=&f=false Google Books]. | ||
<!-- ================================================================== --> | <!-- ================================================================== --> | ||
|- style="background:#eeeeff" valign="top" | |- style="background:#eeeeff" valign="top" | ||
| Week 10 <br /> <br /> | | Week 10 <br /> <br /> | ||
|| | || | ||
+ | [[Image:CSC352HadoopPerformanceMachineLearning.png| right|150px]] | ||
* '''Tuesday''' | * '''Tuesday''' | ||
− | ** Presentation of [http://www.icsi.berkeley.edu/~arlo/publications/gillick_cs262a_proj.pdf | + | ** Presentation of [http://www.icsi.berkeley.edu/~arlo/publications/gillick_cs262a_proj.pdf MapReduce: Distributed Computing for Machine Learning] |
+ | ** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] | ||
+ | *** [[Hadoop_Tutorial_1_--_Running_WordCount | Tutorial #1]], | ||
+ | *** [[Hadoop_Tutorial_1_--_Running_WordCount#Analyzing_the_Hadoop_Logs | Tutorial #1: output logs]] | ||
+ | *** [[Hadoop_Tutorial_1.1_--_Generating_Task_Timelines | Tutorial #1.1: Task Timelines]] | ||
+ | [[Image:WrongTaskTimeline.png| 200px|right]] | ||
* '''Thursday''' | * '''Thursday''' | ||
+ | ** Question of the day: What's wrong with this picture? | ||
+ | ** Continuation of [[CSC352 MapReduce/Hadoop Class Notes | Introduction to MapReduce/Hadoop]] | ||
+ | ** Lab for today: | ||
+ | *** Compare WordCount on 1 vs WordCount on 6 [[Hadoop_Tutorial_1_--_Running_WordCount#Moment_of_Truth:_Compare_5-PC_Hadoop_cluster_to_1_Linux_PC | Section 5 of Tutorial 1]] | ||
+ | *** Create your own version of the Java WordCount program [[Hadoop_Tutorial_1_--_Running_WordCount#Running_Your_Own_Version_of_WordCount.java | Section 4 of Tutorial 1]] | ||
+ | *** Create your own Counters [[Hadoop_Tutorial_1_--_Running_WordCount#Counters | Section 6 of Tutorial 1]]: count Buck! | ||
+ | *** Generate Timelines [[Hadoop_Tutorial_1.1_--_Generating_Task_Timelines | Tutorial 1.1]] | ||
+ | *** Counting words in Python [[Hadoop_Tutorial_2_--_Running_WordCount_in_Python | Tutorial 2]] | ||
---- | ---- | ||
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | * [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | ||
− | + | * [[CSC352 Homework 4 | Homework #4]] and [[CSC352 Homework 4 Solution | a solution]]. | |
|| | || | ||
| | ||
Line 270: | Line 306: | ||
|| | || | ||
* '''Tuesday''' | * '''Tuesday''' | ||
− | ** Presentation of [http:// | + | ** Presentation of [http://portal.acm.org/ft_gateway.cfm?id=1629198&type=pdf&coll=GUIDE&dl=GUIDE&CFID=82739837&CFTOKEN=94683258 MapReduce: A Flexible Data Processing Tool] ([[media:MapReduceFlexibleDataProcessingTool.pdf | cached copy]]) |
+ | ** Compare with the paper by Pavlo, Paulson, Rasin, Abadi, DeWitt, Madden, and Stonebraker, [[Media:ComparisonOfApproachesToLargeScaleDataAnalysis.pdf |A Comparison of Approaches to Large-Scale Data Analysis]], SIGMOD '09, June 2009. | ||
* '''Thursday''' | * '''Thursday''' | ||
+ | ** [http://en.wikipedia.org/wiki/Hypertable Hypertable] is an open-source project parallel to Google's BigTable... | ||
+ | ** Art vs Science... | ||
+ | ** Some preliminary thinking about the final project... | ||
+ | ** [[Hadoop_Tutorial_2.1_--_Streaming_XML_Files | Streaming whole files]] | ||
+ | ** [[Hadoop Tutorial 2.2 -- Running C++ Programs on Hadoop | WordCount in C++]] | ||
+ | ** Visualizations of Hadoop Data Transfers, from the U. of Nebraska ([http://www.google.com/search?q=university+of+Nebraska+hadoop+visualization&hl=en&safe=off&tbs=vid:1&tbo=u&ei=oKO4S6GMCoH7lwfq88SXCg&sa=X&oi=video_result_group&ct=title&resnum=1&ved=0CBEQqwQwAA more videos]) | ||
+ | <br /><br /><center><videoflash>qoBoEzOkeDQ</videoflash></center><br /><br /> | ||
+ | ** Monitoring a Cluster of Computers as a school of fish (U. Nebraska) | ||
+ | <br /><br /><center><videoflash>LM1j_8sWSEk</videoflash></center><br /><br /> | ||
+ | ** The evolution of Hadoop (Code-Swarm) | ||
+ | <br /><br /><center><videoflash type="vimeo">2513321</videoflash></center><br /><br /> | ||
+ | |||
---- | ---- | ||
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | * [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | ||
− | + | * A [[CSC352 Project 2 Solution | solution]] for Project 2 has been posted! | |
|| | || | ||
+ | * [http://developer.yahoo.com/hadoop/tutorial/index.html Hadoop Tutorial from Yahoo Developer Network (YDN)] | ||
| | ||
+ | |||
<!-- ================================================================== --> | <!-- ================================================================== --> | ||
|- style="background:#eeeeff" valign="top" | |- style="background:#eeeeff" valign="top" | ||
Line 283: | Line 334: | ||
* '''Tuesday''' | * '''Tuesday''' | ||
** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part I | ** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part I | ||
+ | ** [[Hadoop_Tutorial_3_--_Hadoop_on_Amazon_AWS | Signing on to Amazon AWS]] | ||
+ | ** [[Hadoop_Tutorial_3.1_--_Using_Amazon%27s_WordCount_program | Uploading data to AWS and counting words]] | ||
+ | ** [[Hadoop_Tutorial_3.2_--_Using_Your_Own_WordCount_program | Word-counting using Streaming Python on AWS]] | ||
+ | ** [[Hadoop_Tutorial_3.3_--_How_Much%3F | Costs of maintaining a Hadoop cluster on AWS]] | ||
* '''Thursday''' | * '''Thursday''' | ||
+ | ** Continuation of the AWS labs | ||
+ | ** [[Hadoop_Tutorial_4:_Start_an_EC2_Instance | Starting an EC2 instance on AWS]] | ||
---- | ---- | ||
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | * [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | ||
Line 294: | Line 351: | ||
* '''Tuesday''' | * '''Tuesday''' | ||
** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part II | ** Presentation of [http://www.eecs.berkeley.edu/Pubs/TechRpts/2009/EECS-2009-28.pdf Above the Clouds: A Berkeley View of Cloud Computing], Part II | ||
+ | ** [[CSC352 Problem of the Day| Problem of the day]]: discussion | ||
+ | ** Work on projects | ||
* '''Thursday''' | * '''Thursday''' | ||
** Presentation of [http://www.hulu.com/watch/116372/cnbc-originals-inside-the-mind-of-google Inside the Mind of Google] | ** Presentation of [http://www.hulu.com/watch/116372/cnbc-originals-inside-the-mind-of-google Inside the Mind of Google] | ||
+ | ** Wrap-up | ||
---- | ---- | ||
* [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] | * [http://cs.smith.edu/classwiki/index.php/CSC352_Page Lecture Notes] |
Latest revision as of 11:22, 9 August 2013
Hadoop-Related
Projects
- Project 1: Started Feb 16, due March 2nd. A good example of a solution.
- Project 2: Deals with the Xgrid. A collection of good proposed solutions.
- Project 3: Deals with processing Wikipedia pages on Hadoop/MapReduce.
- Project 4 officially started on 1/26/10 and is now over. Find its shared wiki page here! Congratulations to the class for setting up the individual nodes of the first Smith Cloud Cluster!
Python Threads
Week | Topics | Reading |
Week 1 (1/25) | | |
Week 2 (2/1) | | |
Week 3 (2/8) | | |
Week 4 (2/15) | | |
XGrid Programming
Week | Topics | Reading |
Week 5 | | |
Week 6 | | |
Week 7 | | |
Week 8 | | |
Cloud Computing
Week | Topics | Reading |
Week 9 | | |
Week 10 | | |
Week 11 | | |
Week 12 | | |
Week 13 | | |
Selected Solutions for papers, homework, or projects