Difference between revisions of "Hadoop/MapReduce Tutorials"
Latest revision as of 15:54, 18 April 2010
--D. Thiebaut 16:01, 18 April 2010 (UTC)
These tutorials target the Hadoop/MapReduce Cluster in the CS Dept. at Smith College, as well as Amazon's EC2 and S3.
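{|
! Tutorial
! Description
|-
| [[Hadoop Tutorial 1 -- Running WordCount | Tutorial #1]]
| Running WordCount written in Java on the Smith College Hadoop/MapReduce Cluster
|-
| [[Hadoop Tutorial 1.1 -- Generating Task Timelines | Tutorial #1.1]]
| Creating timelines of task execution during the run of a MapReduce program
|-
| [[Hadoop Tutorial 2 -- Running WordCount in Python | Tutorial #2]]
| Running WordCount in Python on the Smith College Hadoop/MapReduce Cluster
|-
| [[Hadoop_Tutorial_2.1_--_Streaming_XML_Files | Tutorial #2.1]]
| Running a streaming Python MapReduce program on XML files
|-
| [http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_2.2_--_Running_C%2B%2B_Programs_on_Hadoop Tutorial #2.2]
| Running C++ programs under Hadoop Pipes
|-
| [[Hadoop Tutorial 3 -- Hadoop on Amazon AWS | Tutorial #3]]
| Running Hadoop jobs on Amazon AWS
|-
| [[Hadoop_Tutorial_3.1_--_Using_Amazon's_WordCount_program | Tutorial #3.1]]
| Uploading text to S3 and running Amazon's WordCount Java program on our own data
|-
| [[Hadoop_Tutorial_3.2_--_Using_Your_Own_WordCount_program | Tutorial #3.2]]
| Uploading and running your own streaming version of the WordCount program on AWS
|-
| [[Hadoop Tutorial 3.3 -- How Much? | Tutorial #3.3]]
| Computing the cost of maintaining a cluster of 6 MapReduce instances on Amazon's AWS
|-
| [[Hadoop Tutorial 4: Start an EC2 Instance | Tutorial #4]]
| Starting a server on Amazon's EC2 infrastructure
|}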