Tutorial: Running MPI Programs on the Hadoop Cluster
--D. Thiebaut 13:57, 15 October 2013 (EDT)
Revised: --D. Thiebaut 12:02, 15 March 2017 (EDT)
Change your Password
- Log in to hadoop01 with the account provided to you, and change your temporary password:
passwd
- You should now be all set.
Set Up Password-Less ssh to the Hadoop Cluster
Set Up Aliases
- This section is optional, but it will save you a lot of typing.
- Edit your .bashrc file
emacs -nw ~/.bashrc
- Add these 3 lines at the end, replacing yourusername with your actual user name:
alias hadoop02='ssh -Y yourusername@hadoop02.dyndns.org'
alias hadoop03='ssh -Y yourusername@hadoop03.dyndns.org'
alias hadoop04='ssh -Y yourusername@hadoop04.dyndns.org'
- Then tell bash to re-read the .bashrc file, since we just modified it, so that it learns the 3 new aliases:
source ~/.bashrc
- Now you should be able to connect to the servers using their name only. For example:
hadoop02
- this should connect you to hadoop02 directly.
- Exit from hadoop02 and try the same for hadoop03 and hadoop04.
- Note: if you prefer even shorter commands, you could edit the .bashrc file and name the aliases h2, h3, and h4 instead. Up to you.
Create HelloWorld Program & Test MPI
MPI should already be installed, and your account should be ready to use it. To verify this, you will create a directory called mpi, write a simple MPI "Hello World!" program in it, compile it, and run it as an MPI application.
- First, create a directory called mpi in your home directory:
cd
mkdir mpi
- Then cd to this directory and create the following C program:
cd mpi
emacs -nw helloWorld.c
- Here's the code:
// helloWorld.c
// A simple hello world MPI program that can be run
// on any number of computers
#include <mpi.h>
#include <stdio.h>

int main( int argc, char *argv[] ) {
  int rank, size, nameLen;
  char hostName[MPI_MAX_PROCESSOR_NAME];        /* large enough for any name MPI returns */

  MPI_Init( &argc, &argv );                     /* start MPI */
  MPI_Comm_rank( MPI_COMM_WORLD, &rank );       /* get current process Id */
  MPI_Comm_size( MPI_COMM_WORLD, &size );       /* get # of processes */
  MPI_Get_processor_name( hostName, &nameLen ); /* get name of the host running this process */

  printf( "Hello from Process %d of %d on Host %s\n", rank, size, hostName );

  MPI_Finalize();                               /* shut down MPI */
  return 0;
}
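The program above only prints its rank and the number of processes. In a real MPI program these two values are typically used to give each process its own share of the work. Below is a minimal sketch of one common scheme, block decomposition, written in plain C so it compiles without MPI. The helper name block_range is hypothetical and is not part of the MPI library:

```c
#include <assert.h>

/* Hypothetical helper (not part of MPI): compute the half-open range
 * [*lo, *hi) of item indices that process `rank` (out of `size`
 * processes) should handle when n items are split as evenly as possible.
 * The first n % size ranks each receive one extra item. */
void block_range(int rank, int size, int n, int *lo, int *hi) {
    assert(size > 0 && rank >= 0 && rank < size && n >= 0); /* preconditions */

    int base  = n / size;   /* items every rank receives */
    int extra = n % size;   /* leftover items, one each for the lowest ranks */

    /* skip the items owned by lower ranks: rank * base, plus one extra
     * item for each lower rank that received a leftover */
    *lo = rank * base + (rank < extra ? rank : extra);
    *hi = *lo + base + (rank < extra ? 1 : 0);
}
```

For example, calling block_range(rank, size, 10, &lo, &hi) right after MPI_Comm_size, with 3 processes, gives rank 0 items 0 to 3, rank 1 items 4 to 6, and rank 2 items 7 to 9.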
Compile & Run on 1 Server
- Compile and run the program:
mpicc -o hello helloWorld.c
mpirun -np 2 ./hello
Hello from Process 1 of 2 on Host Hadoop01
Hello from Process 0 of 2 on Host Hadoop01
- If you see two lines starting with "Hello from Process" on your screen (possibly in a different order, since the processes run concurrently), MPI is correctly installed on your system!
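With more processes, the output becomes harder to check by eye, and the lines can appear in any order. Below is a minimal sketch of an automated check in plain C. It assumes you have already collected the output lines, for instance by redirecting mpirun's output to a file. The function name check_hello_lines is hypothetical:

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical checker (not part of MPI): given the n output lines of
 * `mpirun -np n ./hello`, return 1 if every rank 0..n-1 printed exactly
 * one "Hello from Process R of N on Host H" line with N == n, else 0.
 * The lines may be in any order. */
int check_hello_lines(const char *lines[], int n) {
    if (n <= 0) return 0;
    assert(lines != NULL);                /* caller must pass n lines */

    char *seen = calloc((size_t)n, 1);    /* seen[r] becomes 1 once rank r reports */
    if (seen == NULL) return 0;

    int ok = 1;
    for (int i = 0; i < n && ok; i++) {
        int rank, size;
        char host[256];
        /* each line must match the format printed by helloWorld.c */
        if (sscanf(lines[i], "Hello from Process %d of %d on Host %255s",
                   &rank, &size, host) != 3)
            ok = 0;
        else if (size != n || rank < 0 || rank >= n || seen[rank])
            ok = 0;                       /* wrong count, bad rank, or duplicate */
        else
            seen[rank] = 1;
    }
    free(seen);
    return ok;   /* n distinct ranks in [0, n) means none is missing */
}
```

It accepts the two lines shown above in either order, and it rejects output in which a rank is missing or repeated.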
Configuration for Running on Multiple Servers