Tutorial: Running MPI Programs on Hadoop Cluster
Revision as of 12:05, 15 March 2017

--D. Thiebaut (talk) 13:57, 15 October 2013 (EDT)
Revised: --D. Thiebaut (talk) 12:02, 15 March 2017 (EDT)




Setup Password-Less ssh to the Hadoop cluster



This section is only visible to computers located at Smith College


Setup Aliases


  • This section is not required, but will save you a lot of typing.
  • Edit your .bashrc file
 emacs -nw ~/.bashrc 

  • and add these 3 lines at the end, replacing yourusername with your actual user name:
alias hadoop02='ssh -Y yourusername@hadoop02.dyndns.org'
alias hadoop03='ssh -Y yourusername@hadoop03.dyndns.org'
alias hadoop04='ssh -Y yourusername@hadoop04.dyndns.org'

  • Then tell bash to re-read the .bashrc file, since we just modified it. This way bash will pick up the 3 new aliases we've defined.
source ~/.bashrc

  • Now you should be able to connect to the servers using their name only. For example:
hadoop02      
this should connect you to hadoop02 directly.
  • Exit from hadoop02, and try the same thing for hadoop03, and hadoop04.
  • Note: if you prefer even shorter commands, you can edit the .bashrc file and name the aliases h2, h3, and h4 instead. Up to you.
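For example, the shorter variant suggested above would look like this (a sketch; the names h2, h3, h4 and the placeholder yourusername are the ones suggested in the bullet, not fixed requirements):

```shell
# Shorter aliases for the three Hadoop servers (replace yourusername):
alias h2='ssh -Y yourusername@hadoop02.dyndns.org'
alias h3='ssh -Y yourusername@hadoop03.dyndns.org'
alias h4='ssh -Y yourusername@hadoop04.dyndns.org'
```

After another `source ~/.bashrc`, typing `h2` at the prompt opens the ssh session to hadoop02.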


Test MPI


MPI should already be installed, and your account ready to access it. To verify this, create a simple MPI "Hello World!" program, compile it, and run it.

// hello.c
// A simple hello world MPI program that can be run 
// on any number of computers
#include <mpi.h>
#include <stdio.h>

int main( int argc, char *argv[] ) {
  int rank, size, nameLen;
  char hostName[MPI_MAX_PROCESSOR_NAME];  /* buffer size required by the MPI standard */

  MPI_Init( &argc, &argv);   /* start MPI */
  MPI_Comm_rank( MPI_COMM_WORLD, &rank );   /* get current process Id */
  MPI_Comm_size( MPI_COMM_WORLD, &size );   /* get # of processes */
  MPI_Get_processor_name( hostName, &nameLen);

  printf( "Hello from Process %d of %d on Host %s\n", rank, size, hostName );
  MPI_Finalize();
  return 0;
}


  • Compile and Run


mpicc -o hello hello.c
mpirun -np 2 ./hello
Hello from Process 1 of 2 on Host Hadoop01
Hello from Process 0 of 2 on Host Hadoop01

  • If you see the two lines starting with "Hello from Process" on your screen, MPI was successfully installed on your system!
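As a preview of the multi-server configuration covered next, the same binary can be spread across several nodes by listing them in a hostfile. This is a sketch assuming Open MPI; the host names and slot counts below are illustrative, and the real ones depend on your cluster:

```shell
# Create a hostfile listing the machines and how many processes each may run
# (host names here are hypothetical examples):
cat > hosts <<EOF
hadoop02.dyndns.org slots=2
hadoop03.dyndns.org slots=2
EOF

# Launch 4 processes distributed over the machines listed in the hostfile:
mpirun -np 4 --hostfile hosts ./hello
```

With password-less ssh set up as above, each process then reports a different host name in its "Hello from Process" line.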


Configuration for Running On Multiple Servers


This section is only visible to computers located at Smith College