Hadoop Tutorial 2.1 -- Streaming XML Files

From dftwiki3
Revision as of 21:13, 12 April 2010 by Thiebaut




This tutorial continues Hadoop Tutorial 2 -- Running WordCount in Python. It uses Hadoop streaming to process XML files as whole units: in this setup, each Map task receives an entire XML file and breaks it down into tuples.
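The following is a minimal Python sketch of such a mapper. It assumes the streaming job is configured so that each map task's standard input holds exactly one complete XML document. That configuration is not shown here. The function names `xml_to_tuples` and `run_mapper`, and the choice to emit one `tag<TAB>text` pair per element, are hypothetical illustrations, not the tutorial's actual program.

```python
#!/usr/bin/env python3
"""Hypothetical Hadoop streaming mapper for whole XML files.

Assumes the job delivers one complete XML document on stdin per map task,
and emits one tab-separated (tag, text) tuple per element that has text.
"""
import sys
import xml.etree.ElementTree as ET


def xml_to_tuples(xml_text):
    """Parse a complete XML document and yield (tag, text) pairs.

    Only each element's leading text is used. Tail text that follows
    child elements is ignored in this sketch.
    """
    root = ET.fromstring(xml_text)
    for elem in root.iter():  # document order, root element first
        # Collapse tabs and newlines so each tuple fits on one output line.
        text = " ".join((elem.text or "").split())
        if text:
            yield elem.tag, text


def run_mapper(instream, outstream):
    """Read the whole input stream as one XML document and write tuples."""
    xml_text = instream.read()
    if not xml_text.strip():
        return  # an empty input file produces no tuples
    for tag, text in xml_to_tuples(xml_text):
        # Streaming convention: key, a tab, then the value.
        outstream.write("%s\t%s\n" % (tag, text))


if __name__ == "__main__":
    run_mapper(sys.stdin, sys.stdout)
```

The mapper collapses whitespace inside element text so that every tuple stays on a single line. By default, Hadoop streaming treats everything before the first tab as the key and the rest of the line as the value. To try the script locally, pipe a file into it, for example `python3 xml_mapper.py < sample.xml`. Both file names in that command are hypothetical.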


The Setup



[Figure: streaming XML files in Hadoop (StreamingXMLInHadoop.png)]