Hadoop/MapReduce Tutorials


Latest revision as of 15:54, 18 April 2010

--D. Thiebaut 16:01, 18 April 2010 (UTC)




These tutorials target the Hadoop/MapReduce Cluster in the CS Dept. at Smith College, as well as Amazon's EC2 and S3.






Tutorial        Description

Tutorial #1     Running WordCount written in Java on the Smith College Hadoop/MapReduce Cluster
Tutorial #1.1   Creating timelines of task execution during the run of a MapReduce program
Tutorial #2     Running WordCount in Python on the Smith College Hadoop/MapReduce Cluster
Tutorial #2.1   Running a streaming Python MapReduce program on XML files
Tutorial #2.2   Running C++ programs under Hadoop Pipes
Tutorial #3     Running Hadoop jobs on Amazon AWS
Tutorial #3.1   Uploading text to S3 and running Amazon's WordCount Java program on our own data
Tutorial #3.2   Uploading and running your own streaming version of the WordCount program on AWS
Tutorial #3.3   Computing the cost of maintaining a cluster of 6 MapReduce instances on Amazon's AWS
Tutorial #4     Starting a server on Amazon's EC2 infrastructure