Hadoop Tutorial 1 -- Running Wordcount (rev 2)
--D. Thiebaut 14:41, 16 December 2010 (UTC)
This page is deprecated. Go to the updated page here (Rev 3)
Before You Start
- You should be familiar with Hadoop. There are several on-line pages and tutorials that have excellent information. Make sure you browse them first!
- MapReduce Tutorial at apache.org. A must-read!
- Hadoop Tutorial at Yahoo!. Also very good!
- Section 6 in Tom White's Hadoop, the Definitive Guide is also good reading material.
Setup
Cluster IPs
- You can get the up-to-date IPs of the different nodes by running this Python program pingHadoop.py.
- Once it is created, make it executable
chmod +x pingHadoop.py
- and run it
./pingHadoop.py
Thu Dec 16 09:39:20 2010
Status from hadoop101.dyndns.org   is Alive
Status from hadoop102.dyndns.org   is Alive
Status from hadoop103.dyndns.org   is Alive
Status from hadoop104.dyndns.org   is Alive
Status from hadoop105.dyndns.org   is Alive
Status from hadoop106.dyndns.org   is Alive
Status from hadoop107.dyndns.org   is Alive
Status from hadoop108.dyndns.org   is Alive
Status from hadoop109.dyndns.org   is Alive
Status from hadoop110.dyndns.org   is Alive
Status of the Cluster
You can get information on the health of the cluster with Ganglia. Ganglia is a monitoring system for computer clusters.
Main Ganglia Window
Node Windows
Starting Cluster Operation
- Verify that all machines in FH342 are running Fedora.
- The cluster should be ready for hadoop operations by default.
Restarting Hadoop
- In case Hadoop is not running, you may have to restart it:
- Ssh to hadoop110 as user hadoop
- Probe for hadoop processes/daemons running on hadoop110 with the Java Virtual Machine Process Status Tool (jps):
hadoop@hadoop110:~$ jps
16404 NameNode
16775 Jps
16576 SecondaryNameNode
16648 JobTracker
- If you don't see any of the processes above, the cluster is down. In this case, bring it up with start-all.sh
hadoop@hadoop110:~$ start-all.sh
starting namenode, logging to /mnt/win/data/logs/hadoop-hadoop-namenode-hadoop110.out
hadoop110: starting datanode, logging to /mnt/win/data/logs/hadoop-hadoop-datanode-hadoop110.out
hadoop101: starting datanode, logging to /mnt/win/data/logs/hadoop-hadoop-datanode-hadoop101.out
hadoop106: starting datanode, logging to /mnt/win/data/logs/hadoop-hadoop-datanode-hadoop106.out
hadoop104: starting datanode, logging to /mnt/win/data/logs/hadoop-hadoop-datanode-hadoop104.out
hadoop102: starting datanode, logging to /mnt/win/data/logs/hadoop-hadoop-datanode-hadoop102.out
hadoop105: starting datanode, logging to /mnt/win/data/logs/hadoop-hadoop-datanode-hadoop105.out
hadoop109: starting datanode, logging to /mnt/win/data/logs/hadoop-hadoop-datanode-hadoop109.out
hadoop103: starting datanode, logging to /mnt/win/data/logs/hadoop-hadoop-datanode-hadoop103.out
hadoop108: starting datanode, logging to /mnt/win/data/logs/hadoop-hadoop-datanode-hadoop108.out
hadoop107: starting datanode, logging to /mnt/win/data/logs/hadoop-hadoop-datanode-hadoop107.out
hadoop110: starting secondarynamenode, logging to /mnt/win/data/logs/hadoop-hadoop-secondarynamenode-hadoop110.out
starting jobtracker, logging to /mnt/win/data/logs/hadoop-hadoop-jobtracker-hadoop110.out
hadoop103: starting tasktracker, logging to /mnt/win/data/logs/hadoop-hadoop-tasktracker-hadoop103.out
hadoop109: starting tasktracker, logging to /mnt/win/data/logs/hadoop-hadoop-tasktracker-hadoop109.out
hadoop106: starting tasktracker, logging to /mnt/win/data/logs/hadoop-hadoop-tasktracker-hadoop106.out
hadoop110: starting tasktracker, logging to /mnt/win/data/logs/hadoop-hadoop-tasktracker-hadoop110.out
hadoop104: starting tasktracker, logging to /mnt/win/data/logs/hadoop-hadoop-tasktracker-hadoop104.out
hadoop107: starting tasktracker, logging to /mnt/win/data/logs/hadoop-hadoop-tasktracker-hadoop107.out
hadoop108: starting tasktracker, logging to /mnt/win/data/logs/hadoop-hadoop-tasktracker-hadoop108.out
hadoop105: starting tasktracker, logging to /mnt/win/data/logs/hadoop-hadoop-tasktracker-hadoop105.out
hadoop102: starting tasktracker, logging to /mnt/win/data/logs/hadoop-hadoop-tasktracker-hadoop102.out
hadoop101: starting tasktracker, logging to /mnt/win/data/logs/hadoop-hadoop-tasktracker-hadoop101.out
- For completeness, you should know that the command for taking the cluster down is stop-all.sh, but very likely you will never have to use it.
- Just to make sure, connect to hadoop102 to verify that it, too, is running some hadoop processes:
hadoop@hadoop110:~$ ssh hadoop102
hadoop@hadoop102:~$ jps
18571 TaskTracker
18749 Jps
18447 DataNode
Basic Hadoop Admin Commands
(Taken from Hadoop Wiki's Getting Started with Hadoop):
The ~/hadoop/bin directory contains some scripts used to launch Hadoop DFS and Hadoop Map/Reduce daemons. These are:
- start-all.sh - Starts all Hadoop daemons, the namenode, datanodes, the jobtracker and tasktrackers.
- stop-all.sh - Stops all Hadoop daemons.
- start-mapred.sh - Starts the Hadoop Map/Reduce daemons, the jobtracker and tasktrackers.
- stop-mapred.sh - Stops the Hadoop Map/Reduce daemons.
- start-dfs.sh - Starts the Hadoop DFS daemons, the namenode and datanodes.
- stop-dfs.sh - Stops the Hadoop DFS daemons.
Running the Map-Reduce WordCount Program
- We'll take the example directly from Michael Noll's Tutorial (1-node cluster tutorial), and count the frequency of words occurring in James Joyce's Ulysses.
Creating a working directory for your data
- If you haven't done so, ssh to hadoop10x (any of the hadoop machines) as user hadoop and create a directory for yourself. We'll use dft as an example in this tutorial.
hadoop@hadoop102:~$ cd
hadoop@hadoop102:~$ cd 352
hadoop@hadoop102:~/352$ mkdir dft        (replace dft by your favorite identifier)
hadoop@hadoop102:~/352$ cd dft
Creating/Downloading Data Locally
In order to process a text file with hadoop, you first need to download the file to a personal directory in the hadoop account, then copy it to the Hadoop File System (HDFS) so that the hadoop namenode and datanodes can share it.
Creating a local copy for User Hadoop
- Download a copy of James Joyce's Ulysses:
hadoop@hadoop102:~/352/dft$ wget http://www.gutenberg.org/files/4300/4300-0.txt
hadoop@hadoop102:~/352/dft$ mv 4300-0.txt 4300.txt
hadoop@hadoop102:~/352/dft$ head -50 4300.txt
- Verify that you read:
"Stately, plump Buck Mulligan came from the stairhead, bearing a bowl of lather on which a mirror and a razor lay crossed."
Copy Data File to HDFS
- Create a dft (or whatever your identifier is) directory in the Hadoop File System (HDFS) and copy the data file 4300.txt to it:
hadoop@hadoop102:~/352/dft$ cd ..
hadoop@hadoop102:~/352/dft$ hadoop fs -mkdir dft        (makes a dft directory in the cloud)
hadoop@hadoop102:~/352/dft$ hadoop fs -copyFromLocal 4300.txt dft
hadoop@hadoop102:~/352/dft$ hadoop fs -ls
hadoop@hadoop102:~/352/dft$ hadoop fs -ls dft
- Verify that your directory is now in the Hadoop File System, as indicated above, and that it contains the 4300.txt file.
WordCount.java Map-Reduce Program
- Hadoop comes with a set of demonstration programs. They are located in ~/hadoop/src/examples/org/apache/hadoop/examples/. One of them is WordCount.java which will automatically compute the word frequency of all text files found in the HDFS directory you ask it to process.
- The program has several sections:
The map section
public static class MapClass extends MapReduceBase
implements Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value,
OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException {
String line = value.toString();
StringTokenizer itr = new StringTokenizer(line);
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
output.collect(word, one);
}
}
}
The Map class takes the lines of text that are fed to it (the text files are automatically broken down into lines by Hadoop, so there is no need for us to do it) and breaks them into words. For each word it outputs a datagram, a (String, int) tuple of the form ("some-word", 1); the count is 1 because each tuple corresponds to a single occurrence of the word. For example, the input line "to be or not to be" produces ("to", 1), ("be", 1), ("or", 1), ("not", 1), ("to", 1), ("be", 1).
The reduce section
public static class Reduce extends MapReduceBase
implements Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException {
int sum = 0;
while (values.hasNext()) {
sum += values.next().get();
}
output.collect(key, new IntWritable(sum));
}
}
The reduce section gets collections of datagrams of the form [(word, n1), (word, n2), ...] where all the words are the same but the counts may differ. These collections are the result of a sorting process that is integral to Hadoop and which gathers all the datagrams with the same word together. The reduce process sums the counts inside a datanode, and also combines the partial counts coming from the different datanodes, producing a final collection of datagrams in which every word is unique and carries its total frequency (number of occurrences). For the word "the", for example, the reducer might receive the partial counts [("the", 3), ("the", 5), ("the", 2)] and emit ("the", 10).
The map-reduce organization section
conf.setMapperClass(MapClass.class);
conf.setCombinerClass(Reduce.class);
conf.setReducerClass(Reduce.class);
Here we see that the combining stage and the reduce stage are implemented by the same Reduce class, which makes sense: the total number of occurrences of a word is simply the sum of the partial counts produced on the individual datanodes, so the same summing code can be reused as a combiner.
The datagram definitions
// the keys are words (strings)
conf.setOutputKeyClass(Text.class);
// the values are counts (ints)
conf.setOutputValueClass(IntWritable.class);
As the documentation indicates, the datagrams are of the form (String, int).
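For completeness, here is a sketch of how these fragments fit together in the driver. It follows the run() method of the WordCount example that ships with Hadoop 0.19 (old mapred API); the actual file differs in small details such as command-line option parsing, so treat this as a sketch rather than an exact copy of the file.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapred.FileInputFormat;
import org.apache.hadoop.mapred.FileOutputFormat;
import org.apache.hadoop.mapred.JobClient;
import org.apache.hadoop.mapred.JobConf;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

public class WordCount extends Configured implements Tool {

  // ... MapClass and Reduce as shown above ...

  public int run(String[] args) throws Exception {
    JobConf conf = new JobConf(getConf(), WordCount.class);
    conf.setJobName("wordcount");

    // the keys are words (strings), the values are counts (ints)
    conf.setOutputKeyClass(Text.class);
    conf.setOutputValueClass(IntWritable.class);

    // map, combine and reduce classes
    conf.setMapperClass(MapClass.class);
    conf.setCombinerClass(Reduce.class);
    conf.setReducerClass(Reduce.class);

    // input and output HDFS directories, e.g. dft and dft-output
    FileInputFormat.setInputPaths(conf, new Path(args[0]));
    FileOutputFormat.setOutputPath(conf, new Path(args[1]));

    JobClient.runJob(conf);
    return 0;
  }

  public static void main(String[] args) throws Exception {
    int res = ToolRunner.run(new Configuration(), new WordCount(), args);
    System.exit(res);
  }
}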
Running WordCount
Run the wordcount java program from the example directory in hadoop:
hadoop@hadoop102:~/352/dft$ hadoop jar /Users/hadoop/hadoop/hadoop-0.19.2-examples.jar wordcount dft dft-output
The program takes about 40 seconds to execute on the cluster. The output generated will look something like this:
10/12/16 11:03:52 INFO mapred.FileInputFormat: Total input paths to process : 1
10/12/16 11:03:52 INFO mapred.JobClient: Running job: job_201012161018_0003
10/12/16 11:03:53 INFO mapred.JobClient: map 0% reduce 0%
10/12/16 11:03:57 INFO mapred.JobClient: map 1% reduce 0%
10/12/16 11:04:02 INFO mapred.JobClient: map 10% reduce 0%
10/12/16 11:04:07 INFO mapred.JobClient: map 21% reduce 0%
10/12/16 11:04:11 INFO mapred.JobClient: map 31% reduce 0%
10/12/16 11:04:15 INFO mapred.JobClient: map 41% reduce 0%
10/12/16 11:04:19 INFO mapred.JobClient: map 52% reduce 0%
10/12/16 11:04:24 INFO mapred.JobClient: map 65% reduce 0%
10/12/16 11:04:29 INFO mapred.JobClient: map 78% reduce 0%
10/12/16 11:04:33 INFO mapred.JobClient: map 89% reduce 0%
10/12/16 11:04:38 INFO mapred.JobClient: map 100% reduce 0%
10/12/16 11:04:39 INFO mapred.JobClient: Job complete: job_201012161018_0003
10/12/16 11:04:39 INFO mapred.JobClient: Counters: 8
10/12/16 11:04:39 INFO mapred.JobClient: File Systems
10/12/16 11:04:39 INFO mapred.JobClient: HDFS bytes read=3529994
10/12/16 11:04:39 INFO mapred.JobClient: HDFS bytes written=887496
10/12/16 11:04:39 INFO mapred.JobClient: Job Counters
10/12/16 11:04:39 INFO mapred.JobClient: Rack-local map tasks=685
10/12/16 11:04:39 INFO mapred.JobClient: Launched map tasks=863
10/12/16 11:04:39 INFO mapred.JobClient: Data-local map tasks=178
10/12/16 11:04:39 INFO mapred.JobClient: Map-Reduce Framework
10/12/16 11:04:39 INFO mapred.JobClient: Map input records=33055
10/12/16 11:04:39 INFO mapred.JobClient: Map input bytes=1573044
10/12/16 11:04:39 INFO mapred.JobClient: Map output records=267975
If you need to kill your job...
- If for any reason your job is not completing correctly (maybe some "Too many fetch failure" errors?), locate your job Id (job_201012161018_0003 in our case), and kill it:
hadoop job -kill job_201012161018_0003
Cluster Performance
- Ganglia offers an interesting first view of how the cluster reacts to this load:
Getting the Output
- Let's take a look at the output of the program:
hadoop@hadoop102:~/352/dft$ hadoop dfs -ls
Found x items
drwxr-xr-x   - hadoop supergroup          0 2010-03-16 11:36 /user/hadoop/dft
drwxr-xr-x   - hadoop supergroup          0 2010-03-16 11:41 /user/hadoop/dft-output
- Verify that a new directory with -output at the end of your identifier has been created.
- Look at the contents of this output directory:
hadoop@hadoop102:~/352/dft$ hadoop dfs -ls dft-output
Found 2 items
drwxr-xr-x   - hadoop supergroup          0 2010-03-16 11:40 /user/hadoop/dft-output/_logs
-rw-r--r--   2 hadoop supergroup     527522 2010-03-16 11:41 /user/hadoop/dft-output/part-00000
- Important Note: depending on the settings stored in your hadoop-site.xml file, you may have more than one output file, and they may be compressed (gzipped).
- Finally, let's take a look at the output
hadoop@hadoop102:~/352/dft$ hadoop dfs -cat dft-output/part-00000 | less
- And we get
"Come 1 "Defects," 1 "I 1 "Information 1 "J" 1 "Plain 2 "Project 5 . . . zest. 1 zigzag 2 zigzagging 1 zigzags, 1 zivio, 1 zmellz 1 zodiac 1 zodiac. 1 zodiacal 2 zoe)_ 1 zones: 1 zoo. 1 zoological 1 zouave's 1 zrads, 2 zrads. 1
- If we want to copy the output file to our local storage (remember, the output is created in HDFS, and we have to copy the data from there to our own file system to work on it):
hadoop@hadoop102:~$ cd ~/352/dft
hadoop@hadoop102:~/352/dft$ hadoop dfs -copyToLocal dft-output/part-00000 .
hadoop@hadoop102:~/352/dft$ ls
4300.txt  part-00000
- To remove the output directory (recursively going through directories if necessary):
hadoop@hadoop102:~/352/dft$ hadoop dfs -rmr dft-output
- Note that the hadoop WordCount program will not run a second time if the output directory already exists. It always wants to create a new one, so we will have to remove the output directory regularly after saving the output of each job (or have the driver do it for us, as in the sketch below).
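If you prefer to have the driver clean up the old output directory itself rather than typing hadoop dfs -rmr by hand every time, a few lines using the HDFS FileSystem API at the start of run() will do it. This is an optional sketch, not part of the original WordCount; it assumes args[1] holds the output directory name (dft-output in our runs).
// Optional sketch: delete the output directory (recursively) if it already exists.
// Requires: import org.apache.hadoop.fs.FileSystem; and import org.apache.hadoop.fs.Path;
Path outputDir = new Path(args[1]);        // e.g. "dft-output"
FileSystem fs = FileSystem.get(conf);      // conf is the job's JobConf
if (fs.exists(outputDir)) {
    fs.delete(outputDir, true);            // true = delete recursively
}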
Analyzing the Hadoop Logs
Hadoop keeps track of several logs of the execution of your programs. They are located in the logs sub-directory in the hadoop directory. Some of the same logs are also available from the hadoop Web GUI: http://hadoop110.dyndns.org:50030/jobtracker.jsp
Accessing Logs through the Command Line
Here is an example of the logs in ~/hadoop/logs (on this cluster the log directory is /mnt/win/data/logs):
cd
cd /mnt/win/data/logs/
ls -ltr
-rwxrwxrwx 1 root root  15862 Jan  6 09:58 job_201012161155_0004_conf.xml
drwxrwxrwx 1 root root   4096 Jan  6 09:58 history
drwxrwxrwx 1 root root    368 Jan  6 09:58 userlogs

cd history
ls -ltr
-rwxrwxrwx 1 root root  15862 Jan  6 09:58 hadoop110_1292518522985_job_201012161155_0004_conf.xml
-rwxrwxrwx 1 root root 102324 Jan  6 10:02 hadoop110_1292518522985_job_201012161155_0004_hadoop_wordcount
The last log listed in the history directory is interesting. It contains the start and end time of all the tasks that ran during the execution of our Hadoop program.
It contains several different types of lines:
- Lines starting with "Job", that indicate that refer to the job, listing information about the job (priority, submit time, configuration, number of map tasks, number of reduce tasks, etc...
Job JOBID="job_201004011119_0025" LAUNCH_TIME="1270509980407" TOTAL_MAPS="12" TOTAL_REDUCES="1" JOB_STATUS="PREP"
- Lines starting with "Task" referring to the creation or completion of Map or Reduce tasks, indicating which host they start on, and which split they work on. On completion, all the counters associated with the task are listed.
Task TASKID="task_201012161155_0004_m_000000" TASK_TYPE="MAP" START_TIME="1294325917422" \
     SPLITS="/default-rack/hadoop103,/default-rack/hadoop109,/default-rack/hadoop102"

MapAttempt TASK_TYPE="MAP" TASKID="task_201012161155_0004_m_000000" \
     TASK_ATTEMPT_ID="attempt_201012161155_0004_m_000000_0" TASK_STATUS="SUCCESS" FINISH_TIME="1294325918358" \
     HOSTNAME="/default-rack/hadoop110" ... \
     [(MAP_OUTPUT_BYTES)(Map output bytes)(66441)][(MAP_INPUT_BYTES)(Map input bytes)(39285)] \
     [(COMBINE_INPUT_RECORDS)(Combine input records)(7022)][(MAP_OUTPUT_RECORDS)(Map output records)(7022)]}"
- Lines starting with "MapAttempt", reporting mostly status update, except if they contain the keywords SUCCESS and/or FINISH_TIME, indicating that the task has completed. The final time when the task finished is included in this line.
- Lines starting with "ReduceAttempt", similar to the MapAttempt tasks, report on the intermediary status of the tasks, and when the keyword SUCCESS is included, the finish time of the sort and shuffle phases will also be included.
ReduceAttempt TASK_TYPE="REDUCE" TASKID="task_201012161155_0004_r_000005" TASK_ATTEMPT_ID="attempt_201012161155_0004_r_000005_0" \
     START_TIME="1294325924281" TRACKER_NAME="tracker_hadoop102:localhost/127\.0\.0\.1:40971" HTTP_PORT="50060"
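To illustrate how this format can be exploited, here is a minimal, self-contained sketch (not part of the original tutorial) that scans a history file and prints the attempt id and finish time of every successful map or reduce attempt. It relies only on the KEY="value" layout shown above; the history file name is passed on the command line.
import java.io.BufferedReader;
import java.io.FileReader;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Minimal sketch: list successful attempts and their finish times
 *  from a Hadoop job-history file (pass the file name as argument). */
public class HistoryScan {
    private static final Pattern ID     = Pattern.compile("TASK_ATTEMPT_ID=\"([^\"]+)\"");
    private static final Pattern FINISH = Pattern.compile("FINISH_TIME=\"(\\d+)\"");

    public static void main(String[] args) throws Exception {
        BufferedReader in = new BufferedReader(new FileReader(args[0]));
        String line;
        while ((line = in.readLine()) != null) {
            // only attempt lines that report success carry a useful finish time
            if (!(line.startsWith("MapAttempt") || line.startsWith("ReduceAttempt"))) continue;
            if (!line.contains("TASK_STATUS=\"SUCCESS\"")) continue;
            Matcher id  = ID.matcher(line);
            Matcher fin = FINISH.matcher(line);
            if (id.find() && fin.find())
                System.out.println(id.group(1) + "  finished at " + fin.group(1));
        }
        in.close();
    }
}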
Generating Task Timelines
- Below is an example of a Task Timeline:
- See Generating Task Timelines for a series of steps that will allow you to generate Task Timelines.
Running Your Own Version of WordCount.java
In this section you will get a copy of the wordcount program in your directory, modify it, compile it, jar it, and run it on the Hadoop Cluster.
- Get a copy of the example WordCount.java program that comes with Hadoop:
cd
cd 352/dft        (use your own directory)
cp /Users/hadoop/hadoop/src/examples/org/apache/hadoop/examples/WordCount.java .
- Create a directory where to store the java classes:
mkdir wordcount_classes
- Edit WordCount.java and change the package name to package org.myorg; That will be the extent of our modification this time. The top of the file will then look like the sketch below.
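To make the change concrete, the beginning of the modified file should look roughly like this (the import list shown here is indicative; keep whatever imports your copy of the file already has):
package org.myorg;                 // was: package org.apache.hadoop.examples;

import java.io.IOException;
import java.util.*;

import org.apache.hadoop.conf.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount extends Configured implements Tool {
    ...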
- Compile the new program:
javac -classpath /Users/hadoop/hadoop/hadoop-0.19.2-core.jar -d wordcount_classes WordCount.java
- Create a Java archive (jar) file containing the compiled classes:
jar -cvf wordcount.jar -C wordcount_classes/ .
- Remove the output directory from the last run:
hadoop dfs -rmr dft-output
- Run your program on Hadoop:
hadoop jar /Users/hadoop/352/dft/wordcount.jar org.myorg.WordCount dft dft-output
- Check the results
hadoop dfs -ls dft-output
hadoop dfs -cat dft-output/part-00000

"              34
"'A             1
"'About         1
"'Absolute      1
"'Ah!'          2
"'Ah,           2
...
Moment of Truth: Compare 5-PC Hadoop cluster to 1 Linux PC
- The moment of truth has arrived. How does Hadoop fare against a regular PC running Linux and computing the word frequencies of the contents of Ulysses?
- Step 1: time the execution of WordCount.java on hadoop.
hadoop dfs -rmr dft-output
time hadoop jar /home/hadoop/hadoop/hadoop-0.19.2-examples.jar wordcount dft dft-output
- Observe and record the total execution time (real)
- To compute the word frequency of a text with Linux, we can use Linux commands and pipes, as follows:
cat 4300.txt | tr ' ' '[ret]
' | sort | uniq -c [ret]
- where [ret] indicates that you should press the return/enter key, so that the second argument of tr is a newline character (tr ' ' '\n' is equivalent). The explanation of what is going on is nicely presented at http://dsl.org, in the Text Concordance recipe.
- Try the command and verify that you get the word frequency of 4300.txt:
   2457
      1 _
      1 _........................
      7 -
      3 --
      2 --_...
      5 --...
      1 --............
      3 --
      1 ?...
      6 ...
      1 ...?
. . .
      1 Zouave
      1 zouave's
      2 zrads,
      1 zrads.
      1 Zrads
      1 Zulu
      1 Zulus
      1 _Zut!
- Observe the real execution time.
Instead of map-reducing just one text file, make the cluster work on several books at once. Michael Noll points to several downloadable books in his tutorial on setting up Hadoop on an Ubuntu cluster:
Download them all and compare the execution time of hadoop on all these books against that of a single Linux PC.
Counters
- Counters are a nice way of keeping track of events taking place during a MapReduce job. Hadoop merges the values of all the counters generated by the different tasks and displays the totals at the end of the job.
- The program sections below illustrate how we can create two counters to count the number of times the map function is called, and the number of times the reduce function is called.
- All we need to do is to create a new enum set in the MapReduce class, and to ask the reporter to increment the counters.
public class WordCount extends Configured implements Tool {
/**
* define my own counters
*/
enum MyCounters {
MAPFUNCTIONCALLS,
REDUCEFUNCTIONCALLS
}
/**
* Counts the words in each line.
* For each line of input, break the line into words and emit them as
* (<b>word</b>, <b>1</b>).
*/
public static class MapClass extends MapReduceBase
implements Mapper<LongWritable, Text, Text, IntWritable> {
private final static IntWritable one = new IntWritable(1);
private Text word = new Text();
public void map(LongWritable key, Text value,
OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException {
// increment task counter
reporter.incrCounter( MyCounters.MAPFUNCTIONCALLS, 1 );
String line = value.toString();
StringTokenizer itr = new StringTokenizer(line);
while (itr.hasMoreTokens()) {
word.set(itr.nextToken());
output.collect(word, one);
}
}
}
/**
* A reducer class that just emits the sum of the input values.
*/
public static class Reduce extends MapReduceBase
implements Reducer<Text, IntWritable, Text, IntWritable> {
public void reduce(Text key, Iterator<IntWritable> values,
OutputCollector<Text, IntWritable> output,
Reporter reporter) throws IOException {
int sum = 0;
// increment reduce counter
reporter.incrCounter( MyCounters.REDUCEFUNCTIONCALLS, 1 );
while (values.hasNext()) {
sum += values.next().get();
}
output.collect(key, new IntWritable(sum));
}
}
...
- When we run the program we get, for example:
10/03/31 20:55:24 INFO mapred.FileInputFormat: Total input paths to process : 6
10/03/31 20:55:24 INFO mapred.JobClient: Running job: job_201003312045_0006
10/03/31 20:55:25 INFO mapred.JobClient: map 0% reduce 0%
10/03/31 20:55:28 INFO mapred.JobClient: map 7% reduce 0%
10/03/31 20:55:29 INFO mapred.JobClient: map 14% reduce 0%
10/03/31 20:55:31 INFO mapred.JobClient: map 42% reduce 0%
10/03/31 20:55:32 INFO mapred.JobClient: map 57% reduce 0%
10/03/31 20:55:33 INFO mapred.JobClient: map 85% reduce 0%
10/03/31 20:55:34 INFO mapred.JobClient: map 100% reduce 0%
10/03/31 20:55:43 INFO mapred.JobClient: map 100% reduce 100%
10/03/31 20:55:44 INFO mapred.JobClient: Job complete: job_201003312045_0006
10/03/31 20:55:44 INFO mapred.JobClient: Counters: 19
10/03/31 20:55:44 INFO mapred.JobClient: File Systems
10/03/31 20:55:44 INFO mapred.JobClient: HDFS bytes read=5536289
10/03/31 20:55:44 INFO mapred.JobClient: HDFS bytes written=1217928
10/03/31 20:55:44 INFO mapred.JobClient: Local bytes read=2950830
10/03/31 20:55:44 INFO mapred.JobClient: Local bytes written=5902130
10/03/31 20:55:44 INFO mapred.JobClient: Job Counters
10/03/31 20:55:44 INFO mapred.JobClient: Launched reduce tasks=1
10/03/31 20:55:44 INFO mapred.JobClient: Rack-local map tasks=2
10/03/31 20:55:44 INFO mapred.JobClient: Launched map tasks=14
10/03/31 20:55:44 INFO mapred.JobClient: Data-local map tasks=12
10/03/31 20:55:44 INFO mapred.JobClient: org.myorg.WordCount$MyCounters
10/03/31 20:55:44 INFO mapred.JobClient: REDUCEFUNCTIONCALLS=315743
10/03/31 20:55:44 INFO mapred.JobClient: MAPFUNCTIONCALLS=104996
10/03/31 20:55:44 INFO mapred.JobClient: Map-Reduce Framework
10/03/31 20:55:44 INFO mapred.JobClient: Reduce input groups=110260
10/03/31 20:55:44 INFO mapred.JobClient: Combine output records=205483
10/03/31 20:55:44 INFO mapred.JobClient: Map input records=104996
10/03/31 20:55:44 INFO mapred.JobClient: Reduce output records=110260
10/03/31 20:55:44 INFO mapred.JobClient: Map output bytes=9041295
10/03/31 20:55:44 INFO mapred.JobClient: Map input bytes=5521325
10/03/31 20:55:44 INFO mapred.JobClient: Combine input records=923139
10/03/31 20:55:44 INFO mapred.JobClient: Map output records=923139
10/03/31 20:55:44 INFO mapred.JobClient: Reduce input records=205483
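If you also want to read the counter totals back inside the driver, rather than only seeing them in the job output, the old API exposes them through the RunningJob object returned by JobClient.runJob(). A minimal sketch, assuming the MyCounters enum above and a JobConf named conf:
// Sketch: retrieve the custom counters programmatically once the job has completed.
// Requires: import org.apache.hadoop.mapred.RunningJob; and import org.apache.hadoop.mapred.Counters;
RunningJob job = JobClient.runJob(conf);
Counters counters = job.getCounters();
long mapCalls    = counters.getCounter(MyCounters.MAPFUNCTIONCALLS);
long reduceCalls = counters.getCounter(MyCounters.REDUCEFUNCTIONCALLS);
System.out.println("map() calls:    " + mapCalls);
System.out.println("reduce() calls: " + reduceCalls);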