Hadoop Tutorial 1.1 -- Generating Task Timelines

From dftwiki3
Revision as of 14:20, 4 April 2010 by Thiebaut

This Hadoop tutorial shows how to generate Task Timelines similar to those Yahoo published in its report on the TeraSort experiment.

In May 2009, Yahoo announced that it could sort a petabyte of data in 16.25 hours and a terabyte of data in 62 seconds using Hadoop, running on 3658 processors in the first case and 1460 in the second [1]. Their report includes compelling diagrams that show the progress of the computation as a timeline of map, shuffle, sort, and reduce tasks over time; an example is shown below.



[Figure: MapReduceTaskTimeLine.png (MapReduce task timeline)]
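A timeline like this one boils down to counting, at each instant, how many tasks of each kind are running. The Python sketch below assumes the start and finish times of every task have already been extracted (for example, from the job's history logs). The `TaskRecord` format, the phase names, and the `task_timeline` function are hypothetical illustrations, not part of Hadoop's API.

```python
from collections import defaultdict
from typing import Dict, List, NamedTuple


class TaskRecord(NamedTuple):
    """One task (hypothetical format, e.g. parsed from job history logs)."""
    phase: str    # assumed labels: "map", "shuffle", "sort", or "reduce"
    start: float  # start time, in seconds since the job was submitted
    end: float    # finish time, in seconds since the job was submitted


def task_timeline(tasks: List[TaskRecord], step: float = 1.0) -> Dict[str, List[int]]:
    """Return, for each phase, the number of tasks running at t = 0, step, 2*step, ...

    A task counts as running at time `now` when start <= now < end.
    """
    if step <= 0:
        raise ValueError("step must be positive")
    if not tasks:
        return {}
    horizon = max(t.end for t in tasks)
    n_steps = int(horizon // step) + 1  # samples from 0 up to the last finish time
    timeline: Dict[str, List[int]] = defaultdict(lambda: [0] * n_steps)
    for t in tasks:
        counts = timeline[t.phase]
        for i in range(n_steps):
            now = i * step
            if t.start <= now < t.end:
                counts[i] += 1
    return dict(timeline)


if __name__ == "__main__":
    # Made-up task times, only to exercise the function.
    tasks = [
        TaskRecord("map", 0, 3),
        TaskRecord("map", 1, 4),
        TaskRecord("shuffle", 2, 5),
        TaskRecord("reduce", 5, 7),
    ]
    for phase, counts in sorted(task_timeline(tasks).items()):
        print(phase, counts)
    # map [1, 2, 2, 1, 0, 0, 0, 0]
```

Each list can then be plotted against `i * step` to draw one curve per phase. The nested loop costs O(tasks × samples), which is fine for a sketch; a sweep over sorted start and end events would scale better to jobs with thousands of tasks.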


References