Difference between revisions of "Hadoop/MapReduce Tutorials"


Latest revision as of 15:54, 18 April 2010

--D. Thiebaut 16:01, 18 April 2010 (UTC)




These tutorials target the Hadoop/MapReduce Cluster in the CS Dept. at Smith College, as well as Amazon's EC2 and S3.






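{|
! Tutorial
! Description
|- style="background:#eeeeff" valign="top"
| width="30%" |
[[Hadoop Tutorial 1 -- Running WordCount | Tutorial #1]]
|
Running WordCount written in Java on the Smith College Hadoop/MapReduce Cluster.
|- style="background:#ffffff" valign="top"
|
[[Hadoop Tutorial 1.1 -- Generating Task Timelines | Tutorial #1.1]]
|
Creating timelines of task execution during the run of a MapReduce program.
|- style="background:#eeeeff" valign="top"
|
[[Hadoop Tutorial 2 -- Running WordCount in Python | Tutorial #2]]
|
Running WordCount in Python on the Smith College Hadoop/MapReduce Cluster.
|- style="background:#ffffff" valign="top"
|
[[Hadoop_Tutorial_2.1_--_Streaming_XML_Files | Tutorial #2.1]]
|
Running a streaming Python MapReduce program on XML files.
|- style="background:#eeeeff" valign="top"
|
[http://cs.smith.edu/dftwiki/index.php/Hadoop_Tutorial_2.2_--_Running_C%2B%2B_Programs_on_Hadoop Tutorial #2.2]
|
Running C++ programs under Hadoop Pipes.
|- style="background:#ffffff" valign="top"
|
[[Hadoop Tutorial 3 -- Hadoop on Amazon AWS | Tutorial #3]]
|
Running Hadoop jobs on Amazon AWS.
|- style="background:#eeeeff" valign="top"
|
[[Hadoop_Tutorial_3.1_--_Using_Amazon's_WordCount_program | Tutorial #3.1]]
|
Uploading text to S3 and running Amazon's WordCount Java program on our own data.
|- style="background:#ffffff" valign="top"
|
[[Hadoop_Tutorial_3.2_--_Using_Your_Own_WordCount_program | Tutorial #3.2]]
|
Uploading and running your own streaming version of the WordCount program on AWS.
|- style="background:#eeeeff" valign="top"
|
[[Hadoop Tutorial 3.3 -- How Much? | Tutorial #3.3]]
|
Computing the cost of maintaining a cluster of 6 MapReduce instances on Amazon's AWS.
|- style="background:#ffffff" valign="top"
|
[[Hadoop Tutorial 4: Start an EC2 Instance | Tutorial #4]]
|
Starting a server on Amazon's EC2 infrastructure.
|}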