Tutorial: Running MPI Programs on the Hadoop Cluster
--D. Thiebaut (talk) 13:57, 15 October 2013 (EDT)
Revised: --D. Thiebaut (talk) 12:02, 15 March 2017 (EDT)
Set Up Password-Less ssh to the Hadoop Cluster
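Below is a minimal sketch of one common way to do this. It assumes that OpenSSH's ssh-keygen and ssh-copy-id are installed on the machine you connect from, and that password logins to the servers still work. The function name setup_passwordless_ssh is hypothetical. The host names follow the hadoop0N.dyndns.org pattern used in the aliases section.

```shell
# Hypothetical helper: create an SSH key pair (only if none exists yet) and
# install the public key on each Hadoop node, so later ssh logins skip the
# password. Source this file, then call:
#   setup_passwordless_ssh yourusername hadoop02 hadoop03 hadoop04
setup_passwordless_ssh() {
  local user="$1"
  shift
  local key="$HOME/.ssh/id_rsa"

  # Generate an RSA key with an empty passphrase (-N ""), unless one is already there.
  if [ ! -f "$key" ]; then
    mkdir -p "$HOME/.ssh"
    chmod 700 "$HOME/.ssh"
    ssh-keygen -t rsa -N "" -f "$key"
  fi

  # Copy the public key to every host; each one asks for your password once.
  local host
  for host in "$@"; do
    ssh-copy-id -i "$key.pub" "$user@$host.dyndns.org"
  done
}
```

After running it once for hadoop02, hadoop03, and hadoop04, `ssh yourusername@hadoop02.dyndns.org` should log you in without a password prompt. The key has no passphrase. This is convenient for MPI, but anyone who obtains the private key file can log in as you.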
Set Up Aliases
- This section is not required, but will save you a lot of typing.
- Edit your .bashrc file
emacs -nw ~/.bashrc
- and add these 3 lines at the end, replacing yourusername with your actual user name.
alias hadoop02='ssh -Y yourusername@hadoop02.dyndns.org'
alias hadoop03='ssh -Y yourusername@hadoop03.dyndns.org'
alias hadoop04='ssh -Y yourusername@hadoop04.dyndns.org'
- Then tell bash to re-read the .bashrc file so that it picks up the 3 new aliases:
source ~/.bashrc
- Now you should be able to connect to the servers using their name only. For example:
hadoop02
- this should connect you to hadoop02 directly.
- Exit from hadoop02, and try the same thing for hadoop03, and hadoop04.
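If you prefer not to edit .bashrc by hand, a short script can append the three alias lines instead. This is a minimal sketch that assumes bash and the same hadoop02 to hadoop04 host names. The function name add_hadoop_aliases is hypothetical. The function skips any alias line that is already present, so running it twice does not create duplicates.

```shell
# Hypothetical helper: append one "alias hostN='ssh -Y user@hostN.dyndns.org'"
# line per host to the given file (default ~/.bashrc), skipping lines already there.
add_hadoop_aliases() {
  local user="$1"
  local bashrc="${2:-$HOME/.bashrc}"
  local host line
  touch "$bashrc"
  for host in hadoop02 hadoop03 hadoop04; do
    line="alias ${host}='ssh -Y ${user}@${host}.dyndns.org'"
    # grep -q: quiet, -x: match the whole line, -F: fixed string (no regex)
    grep -qxF "$line" "$bashrc" || echo "$line" >> "$bashrc"
  done
}
```

For example, running `add_hadoop_aliases yourusername ~/.bashrc` and then `source ~/.bashrc` produces the same result as the manual edit.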
Test MPI
MPI should already be installed, and your account should be ready to use it. To verify this, create a simple MPI "Hello World!" program, save it as helloWorld.c, then compile and run it.
/* C Example: MPI "Hello World" */
#include <stdio.h>
#include <mpi.h>

int main(int argc, char *argv[])
{
    int rank, size;

    MPI_Init(&argc, &argv);                  /* start MPI */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);    /* get the rank (id) of this process */
    MPI_Comm_size(MPI_COMM_WORLD, &size);    /* get the total number of processes */
    printf("Hello world from process %d of %d\n", rank, size);
    MPI_Finalize();                          /* shut down MPI */
    return 0;
}
- Compile and Run
mpicc -o hello helloWorld.c
mpirun -np 2 ./hello
Hello world from process 0 of 2
Hello world from process 1 of 2
- If you see the two lines starting with "Hello world" on your screen (they may appear in either order), MPI is installed and working on your system!
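You can also script this check so that it works for any number of processes. This is a minimal sketch that assumes mpicc and mpirun are on your PATH and that helloWorld.c is in the current directory. The function names count_greetings and check_mpi_hello are hypothetical. The script counts the greeting lines instead of comparing exact text, because the processes do not print in a fixed order.

```shell
# Hypothetical helpers for checking the MPI "Hello World" program.

# Read program output on stdin and print how many greeting lines it contains.
# grep -c exits with status 1 when nothing matches, so "|| true" keeps it quiet.
count_greetings() {
  grep -c '^Hello world from process' || true
}

# Compile helloWorld.c, run it on N processes (default 2), and check that
# exactly N greetings come back.
check_mpi_hello() {
  local np="${1:-2}" count
  mpicc -o hello helloWorld.c || return 1
  count=$(mpirun -np "$np" ./hello | count_greetings)
  if [ "$count" -eq "$np" ]; then
    echo "MPI OK: $count of $np processes said hello"
  else
    echo "MPI problem: expected $np greetings, got $count" >&2
    return 1
  fi
}
```

To use it, source the file that contains these functions and run, for example, `check_mpi_hello 2`. Depending on the MPI implementation, mpirun may refuse a value of N larger than the number of cores on the server.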
Configuration for Running on Multiple Servers