CSC352 Notes 2013

From dftwiki3
Revision as of 20:15, 15 December 2009 by Thiebaut (Setting up Eclipse for Hadoop)

Verify configuration of Hadoop

cd 
cd hadoop/conf
cat hadoop-site.xml

Yields

~/hadoop/conf$: cat hadoop-site.xml

<?xml version="1.0"?>
<?xml-stylesheet type="text/xsl" href="configuration.xsl"?>

<!-- Put site-specific property overrides in this file. -->

<configuration>

  <property>
    <name>fs.default.name</name>
    <value>hdfs://localhost:9100</value>
  </property>

  <property>
    <name>mapred.job.tracker</name>
    <value>localhost:9101</value>
  </property>

  <property>
    <name>dfs.data.dir</name>
    <value>/Users/thiebaut/hdfs/data</value>
  </property>

  <property>
    <name>dfs.name.dir</name>
    <value>/Users/thiebaut/hdfs/name</value>
  </property>

  <property>
    <name>dfs.replication</name>
    <value>1</value>
  </property> 

</configuration>
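
To double-check these values without reading the XML by eye, a small stand-alone Java program can list every property in the file. This is only a sketch under a few assumptions: the class name HadoopSiteCheck and the method parseProperties are made up for these notes, the code uses the JDK's built-in XML parser rather than Hadoop's own configuration classes, and it expects every property element to contain exactly one name and one value.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

// Hypothetical helper for these notes; not part of Hadoop.
class HadoopSiteCheck {

    // Collect the <property> entries of a Hadoop configuration document
    // into a name -> value map, preserving the order used in the file.
    static Map<String, String> parseProperties(String xml) {
        try {
            DocumentBuilder builder =
                DocumentBuilderFactory.newInstance().newDocumentBuilder();
            Document doc = builder.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            Map<String, String> props = new LinkedHashMap<>();
            NodeList nodes = doc.getElementsByTagName("property");
            for (int i = 0; i < nodes.getLength(); i++) {
                Element p = (Element) nodes.item(i);
                // Assumption: each property has one <name> and one <value>.
                String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
                String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
                props.put(name, value);
            }
            return props;
        } catch (Exception e) {
            throw new IllegalStateException("cannot parse configuration", e);
        }
    }

    public static void main(String[] args) throws Exception {
        // Default to hadoop-site.xml in the current directory.
        Path path = Paths.get(args.length > 0 ? args[0] : "hadoop-site.xml");
        if (!Files.exists(path)) {
            System.out.println("No such file: " + path);
            return;
        }
        String xml = new String(Files.readAllBytes(path), StandardCharsets.UTF_8);
        for (Map.Entry<String, String> e : parseProperties(xml).entrySet()) {
            System.out.println(e.getKey() + " = " + e.getValue());
        }
    }
}
```

Compiled with javac HadoopSiteCheck.java and run from ~/hadoop/conf as java HadoopSiteCheck hadoop-site.xml, it prints one "name = value" line per property, in the order they appear in the file.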

Setting up Eclipse for Hadoop

Start the Hadoop daemons before configuring the plug-in:

 start-all.sh

Map-Reduce Locations

  • Set up the Hadoop location in Eclipse as described in
http://v-lad.org/Tutorials/Hadoop/17%20-%20set%20up%20hadoop%20location%20in%20the%20eclipse.html
    • Location name: localhost
    • Map/Reduce Master: localhost, 9101
    • DFS Master: Use M/R Master host, localhost, 9100
    • User name: hadoop-thiebaut
    • SOCKS proxy: (not checked) host, 1080
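
If Eclipse cannot connect to the location, it can help to confirm that something is actually listening on the two ports from hadoop-site.xml (9100 for the DFS master, 9101 for the Map/Reduce master). The following is a rough sketch, not a Hadoop tool: the class name PortCheck and the method isListening are invented for these notes, and a successful TCP connection only shows that a process accepts connections on the port, not that it is a healthy Hadoop daemon.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Hypothetical helper for these notes; not part of Hadoop.
class PortCheck {

    // Return true if a TCP connection to host:port succeeds within the timeout.
    static boolean isListening(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return true;
        } catch (IOException e) {
            // Connection refused, timed out, or host not resolvable.
            return false;
        }
    }

    public static void main(String[] args) {
        // Ports taken from hadoop-site.xml:
        // 9100 = fs.default.name (HDFS), 9101 = mapred.job.tracker.
        int[] ports = {9100, 9101};
        for (int port : ports) {
            boolean up = isListening("localhost", port, 2000);
            System.out.println("localhost:" + port
                + (up ? " is accepting connections" : " is not reachable"));
        }
    }
}
```

If either port is reported as not reachable, rerun start-all.sh and check the Hadoop logs before changing the Eclipse settings.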

DFS Locations

  • Open DFS Locations
    • localhost
      • (2)
        • tmp(1)
          • hadoop-thiebaut (1)
            • mapred (1)
              • system (0)
        • user(1)
          • thiebaut (2)
            • hello.txt
            • readme.txt
  • Create the In directory in HDFS (the driver below reads its input from it):
hadoop fs -mkdir In

Create a new project with Eclipse

Create a project as explained in http://v-lad.org/Tutorials/Hadoop/23%20-%20create%20the%20project.html

Project

  • Right-click on the blank space in the Project Explorer window and select New -> Project... to create a new project.
  • Select Map/Reduce Project from the list of project types.
  • Press the Next button.
  • Project Name: HadoopTest
  • Use default location.
  • Click on configure Hadoop location, browse, and select /Users/thiebaut/hadoop-0.19.1 (or wherever your Hadoop installation lives).
  • Click OK.
  • Click Finish.

Map/Reduce driver class

  • Right-click on the newly created Hadoop project in the Project Explorer tab and select New -> Other from the context menu.
  • Go to the Map/Reduce folder, select MapReduceDriver, then press the Next button.
  • When the MapReduce Driver wizard appears, enter TestDriver in the Name field and press the Finish button. This will create the skeleton code for the MapReduce Driver.
  • Unfortunately the Hadoop plug-in for Eclipse is slightly out of step with the recent Hadoop API, so we need to edit the driver code a bit.
Find the following two lines in the source code and comment them out:
    conf.setInputPath(new Path("src"));
    conf.setOutputPath(new Path("out"));
Enter the following code immediately after the two lines you just commented out:
    conf.setInputFormat(TextInputFormat.class);
    conf.setOutputFormat(TextOutputFormat.class);

    FileInputFormat.setInputPaths(conf, new Path("In"));
    FileOutputFormat.setOutputPath(conf, new Path("Out"));
  • After you have changed the code, Eclipse will mark the new lines as errors. Click on the error icon for each line and select Eclipse's suggestion to import the missing class. You need to import the following classes, all from the org.apache.hadoop.mapred package: TextInputFormat, TextOutputFormat, FileInputFormat, FileOutputFormat.
  • After the missing classes are imported you are ready to run the project.

Notes on doing example in Yahoo Tutorial, Module 2

http://developer.yahoo.com/hadoop/tutorial/module2.html

   cd ../hadoop/examples/
   cat > HDFSHelloWorld.java        # paste the source from the tutorial, end with Ctrl-D
   mkdir hello_classes
   javac -classpath /Users/thiebaut/hadoop/hadoop-0.19.2-core.jar -d hello_classes HDFSHelloWorld.java
   emacs HDFSHelloWorld.java -nw    # edit the source if needed, then recompile
   javac -classpath /Users/thiebaut/hadoop/hadoop-0.19.2-core.jar -d hello_classes HDFSHelloWorld.java
   ls
   ls hello_classes/
   jar -cvf helloworld.jar -C hello_classes/ .
   ls
   ls -ltr
   find . -name "*" -print
   hadoop jar helloworld.jar HDFSHelloWorld

Yields

   Hello, world!