Hadoop/MapReduce Tutorials
Revision as of 22:50, 12 April 2010
These tutorials target the Hadoop/MapReduce Cluster in the CS Dept. at Smith College, as well as Amazon's EC2 and S3.
Tutorials and what each one covers:
Running WordCount written in Java on the Smith College Hadoop/MapReduce Cluster.
Creating timelines of how tasks execute while a MapReduce program runs.
Running WordCount in Python on the Smith College Hadoop/MapReduce Cluster.
Tutorial #2.1: Running a streaming Python MapReduce program on XML files.
Tutorial #3: Running Hadoop jobs on Amazon AWS.
Tutorial #3.1: Uploading text to S3 and running Amazon's WordCount Java program on our own data.
Tutorial #3.2: Compiling our own version of the Java WordCount program and uploading it to AWS.
Tutorial #4: Starting an EC2 instance (a server) on Amazon's infrastructure.
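Several of these tutorials use Python with Hadoop streaming. The following is a minimal sketch of what a streaming WordCount does, and it is not the tutorials' own code. It assumes the usual streaming convention (the mapper and the reducer read lines on stdin and write tab-separated key/value lines on stdout) and that Hadoop sorts the mapper output by key before the reducer sees it. Splitting words on whitespace is also an assumption. The names `mapper`, `reducer`, `run_locally` and the `map`/`reduce` command-line arguments are hypothetical.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop-streaming WordCount sketch (not the tutorials' own code)."""
import sys
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit one 'word<TAB>1' record per whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield f"{word}\t1"


def reducer(lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts for each word.

    Assumes the input is sorted by key, which Hadoop streaming does
    between the map and reduce phases, so equal words arrive adjacent.
    """
    # Skip blank lines, then split each record into [word, count].
    pairs = (line.rstrip("\n").split("\t", 1) for line in lines if line.strip())
    for word, group in groupby(pairs, key=lambda kv: kv[0]):
        total = sum(int(count) for _, count in group)
        yield f"{word}\t{total}"


def run_locally(text: str) -> List[str]:
    """Simulate map -> sort (standing in for the shuffle) -> reduce in one process."""
    mapped = sorted(mapper(text.splitlines()))
    return list(reducer(mapped))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Hypothetical streaming-task entry point: 'map' selects the mapper,
    # anything else the reducer; records flow from stdin to stdout.
    phase = mapper if sys.argv[1] == "map" else reducer
    for record in phase(sys.stdin):
        print(record)
```

For a quick local check, `run_locally("the cat\nthe dog the end")` returns `['cat\t1', 'dog\t1', 'end\t1', 'the\t3']`. On a cluster, the same script would be passed to the Hadoop streaming jar through its `-mapper` and `-reducer` options, once with `map` and once with `reduce`. The location of the jar depends on the installation. Keeping the map and reduce logic in plain generator functions means they can be tested without Hadoop. The framework's sort is the only part the local run has to imitate.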