Tutorial: Computing Pi on an AWS MPI-Cluster

From dftwiki3
Revision as of 11:08, 16 March 2017 by Thiebaut

--D. Thiebaut 22:55, 28 October 2013 (EDT)


This tutorial is the continuation of the tutorial Creating an MPI-Cluster on AWS. Make sure you go through it first. The tutorial also assumes that you have credentials to access the Amazon AWS system.


Reference Material


We have used

for inspiration and adapted them in the present page to set up a cluster for the CSC352 class. Our setup uses MIT's StarCluster package.


An MPI C-Program for Approximating Pi


The program below works for any number of processes on an MPI cluster.
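The program relies on the identity below. Since the antiderivative of 4/(1+x²) is 4·arctan(x), its integral over [0, 1] equals 4·(π/4) = π. The code approximates that integral with a left Riemann sum of n rectangles of width Δx = 1/n (deltaX in the code), and splits the index range among the P = noProcs processes the same way doManager() does:

```latex
\pi = \int_0^1 \frac{4}{1+x^2}\,dx \approx \Delta x \sum_{i=0}^{n-1} f(i\,\Delta x),
\qquad f(x) = \frac{4}{1+x^2}, \qquad \Delta x = \frac{1}{n}

\sum_{i=0}^{n-1} f(i\,\Delta x) = \sum_{p=0}^{P-1} \; \sum_{i=b_p}^{e_p-1} f(i\,\Delta x),
\qquad b_p = \left\lfloor \frac{p\,n}{P} \right\rfloor, \quad
e_p = \left\lfloor \frac{(p+1)\,n}{P} \right\rfloor
```

Because f decreases on [0, 1], the left sum slightly overestimates π; the error is roughly 1/n.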

// piN.c
// D. Thiebaut
// Computes Pi using N processes under MPI
// 
// To compile and run:
// mpicc -o piN piN.c
// time mpirun -np 2 ./piN 100000000
//
// Output
// Process 1 of 2 started on beowulf2.  N= 50000000
// Process 0 of 2 started on beowulf2.  N= 50000000
//  50000000 iterations: Pi = 3.14159
//
//  real  0m1.251s
//  user  0m1.240s
//  sys   0m0.000s
//

#include "mpi.h"
#include <stdio.h>
#include <stdlib.h>

#define MANAGER 0


//--------------------------------------------------------------------
//                         P R O T O T Y P E S
//--------------------------------------------------------------------
void doManager( int, int );
void doWorker( );

//--------------------------------------------------------------------
//                           M  A  I  N
//--------------------------------------------------------------------
int main(int argc, char *argv[]) {
  int myId, noProcs, nameLen;
  char procName[MPI_MAX_PROCESSOR_NAME];
  int n;

  if ( argc<2 ) {
    printf( "Syntax: mpirun -np noProcs piN n\n" );
    return 1;
  }

  // get the number of samples to generate
  n = atoi( argv[1] );
  
  //--- start MPI ---
  MPI_Init( &argc, &argv);
  MPI_Comm_rank( MPI_COMM_WORLD, &myId );
  MPI_Comm_size( MPI_COMM_WORLD, &noProcs );
  MPI_Get_processor_name( procName, &nameLen );
  
  //--- display which process we are, and how many there are ---
  printf( "Process %d of %d started on %s. n = %d\n", 
          myId,  noProcs, procName, n );

  //--- farm out the work: 1 manager, several workers ---
  if ( myId == MANAGER ) 
    doManager( n, noProcs );
  else
    doWorker( );
  
  //--- close up MPI ---
  MPI_Finalize();
  
  return 0;
}

//--------------------------------------------------------------------
// The function to be evaluated
//--------------------------------------------------------------------
double f( double x ) {
  return 4.0 / ( 1.0 + x*x );
}

//--------------------------------------------------------------------
// The manager's main work function.  Note that the function
// can and should be made more efficient (and faster) by sending
// an array of 3 ints rather than 3 separate ints to each worker.  
// However the current method is explicit and better highlights the
// communication pattern between Manager and Workers.
//--------------------------------------------------------------------
void doManager( int n, int noProcs ) {
  double sum0 = 0, sum1;
  double deltaX = 1.0/n;
  int i, begin, end;

  MPI_Status status;

  //--- first send n and bounds of series to all workers ---
  end = n/noProcs;
  for ( i=1; i<noProcs; i++ ) {
    begin = end;
    // compute in 64-bit arithmetic: (i+1)*n overflows an int for large n
    end   = (int) ( (long long) (i+1) * n / noProcs );
    
    MPI_Send( &begin, 1, MPI_INT, i /*node i*/, 0, MPI_COMM_WORLD );
    MPI_Send( &end, 1, MPI_INT, i /*node i*/, 0, MPI_COMM_WORLD );
    MPI_Send( &n, 1, MPI_INT, i /*node i*/, 0, MPI_COMM_WORLD );
  }

  //--- perform summation over 1st interval of the series ---
  begin = 0;
  end   = n/noProcs;

  for ( i = begin; i < end; i++ )
    sum0 += f( i * deltaX );

  //--- collect the partial sums of all the workers ---
  for ( i=1; i<noProcs; i++ ) {
    MPI_Recv( &sum1, 1, MPI_DOUBLE, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status );
    sum0 += sum1;
  }

  //--- output result ---
  printf( "%d iterations: Pi = %f\n", n, sum0 *deltaX );
}

//--------------------------------------------------------------------
// The worker's main work function.  Same comment as for the
// Manager.  The 3 ints would benefit from being sent in an array.
//--------------------------------------------------------------------
void doWorker( ) {
  int begin, end, n, i;

  //--- get n and bounds for summation from manager ---
  MPI_Status status;
  MPI_Recv( &begin, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status );
  MPI_Recv( &end, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status );
  MPI_Recv( &n, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &status );

  //--- sum over boundaries received ---
  double sum = 0;
  double deltaX = 1.0/n;

  for ( i=begin; i< end; i++ )
    sum += f( i * deltaX );

  //-- send result to manager ---
  MPI_Send( &sum, 1, MPI_DOUBLE, MANAGER, 0, MPI_COMM_WORLD );
}


Creating a 10-node cluster


To create a 10-node cluster, first make sure your current cluster is terminated:

 starcluster terminate myclusterABC

or

 starcluster terminate -f myclusterABC

if the first command is not successful.

Edit the starcluster configuration file on your laptop:

 cd 
 cd .starcluster
 emacs -nw config             (or use your favorite text editor)

Locate the following line, and set the number of nodes to 10:

 # number of ec2 instances to launch                                                                                
 CLUSTER_SIZE = 10

Save the config file.

Start up the cluster again:

 starcluster start myclusterABC
 StarCluster - (http://star.mit.edu/cluster) (v. 0.95.6)
 Software Tools for Academics and Researchers (STAR)
 Please submit bug reports to starcluster@mit.edu
 
 *** WARNING - Setting 'EC2_PRIVATE_KEY' from environment...
 *** WARNING - Setting 'EC2_CERT' from environment...
 >>> Using default cluster template: smallcluster
   ...
 You can activate a 'stopped' cluster by passing the -x
 option to the 'start' command:
 
     $ starcluster start -x mpicluster
 
 This will start all 'stopped' nodes and reconfigure the
 cluster.


SSH to the Master node


SSH to the master node of your cluster and verify that you have an mpi folder there, and if not, create one.

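 starcluster sshmaster myclusterABC
 ...
 root@master:~# su - sgeadmin
 sgeadmin@master:~$ mkdir mpi
 sgeadmin@master:~$ 
  • Return to your Mac command prompt (the first exit leaves the sgeadmin account, the second one closes the connection to the master node):
 sgeadmin@master:~$ exit
 root@master:~# exit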


Upload the piN.c program to the Cluster


Assuming that you have the version of the piN.c program listed at the top of this page somewhere on your laptop, cd to the directory containing the file:

 cd
 cd   pathToWhereYourFileIsLocated 
 

You can now upload the local file to the cluster using the starcluster utility:

 starcluster  put  myclusterABC  piN.c  /data/mpi/



Create a host file


By default, mpirun.mpich2 launches all the processes you start on the node on which you run the command, as if the cluster contained only one node. To force it to launch processes on the other nodes of the cluster, you need to put the names of all the nodes in a file called the host file, and pass that file to mpirun.mpich2 with the -f option.

StarCluster uses a simple scheme to name the nodes: the master node is called master, and the workers are called node001, node002, etc.

  • You can use emacs to create a file called hosts in the /home/mpi directory containing the names of your nodes, or you can use bash and do it with a one-line command (the 10 in tail -n 10 matches the 10 nodes of the cluster; adjust it if your cluster has a different size):
 sgeadmin@master:~$ cat /etc/hosts | tail -n 10 | cut -d' ' -f 2 > hosts
  • Verify that the hosts file contains the required information:
 sgeadmin@master:~$ cat hosts
 master
 node001
 node002
 node003
 node004
 node005
 node006
 node007
 node008
 node009

mpirun.mpich2 uses this list to decide which nodes to distribute the processes to.

Compile and Run the piN.c program on the 10-node cluster


 sgeadmin@master:~$ mpicc -o piN piN.c


 sgeadmin@master:~$ mpirun.mpich2 -n 10 -f hosts ./piN 10000000
 Process 0 of 10 started on master. n = 10000000
 Process 7 of 10 started on node007. n = 10000000
 Process 3 of 10 started on node003. n = 10000000
 Process 6 of 10 started on node006. n = 10000000
 Process 2 of 10 started on node002. n = 10000000
 Process 8 of 10 started on node008. n = 10000000
 Process 9 of 10 started on node009. n = 10000000
 Process 5 of 10 started on node005. n = 10000000
 Process 4 of 10 started on node004. n = 10000000
 Process 1 of 10 started on node001. n = 10000000
 10000000 iterations: Pi = 3.141593

The processes report in an arbitrary order because they run concurrently on different nodes.


Timing the Execution of your MPI Program


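 sgeadmin@master:~$ time mpirun.mpich2 -n 10 -f hosts ./piN 1000000000
 Process 0 of 10 started on master. n = 1000000000
 Process 3 of 10 started on node003. n = 1000000000
 Process 6 of 10 started on node006. n = 1000000000
 Process 2 of 10 started on node002. n = 1000000000
 Process 4 of 10 started on node004. n = 1000000000
 Process 7 of 10 started on node007. n = 1000000000
 Process 9 of 10 started on node009. n = 1000000000
 Process 1 of 10 started on node001. n = 1000000000
 Process 5 of 10 started on node005. n = 1000000000
 Process 8 of 10 started on node008. n = 1000000000
 1000000000 iterations: Pi = 3.171246
 
 real	0m4.911s
 user	0m4.072s
 sys	0m0.156s

Note that the value printed for Pi above (3.171246) is noticeably off. This is consistent with an integer overflow in doManager(): with n = 1000000000 and 10 processes, the expression (i+1) * n reaches 10,000,000,000, which does not fit in a 32-bit int, so some workers receive incorrect bounds. Computing the bound in 64-bit arithmetic, for example end = (int) ( (long long) (i+1) * n / noProcs );, avoids the problem.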


AWS Instance comparison chart



Challenge 1



Looking at the charts below, figure out how much it has just cost you to run your MPI application on AWS. Note that, at the time of writing, Amazon charged by the hour: if you use your cluster for 5 minutes (between the start and stop operations), you are charged for a full hour of computer time, for each instance in the cluster.








Types of AMIs


For information, below is the comparison chart of the different EC2 instance types available to populate a cluster (captured in Oct. 2013; check the up-to-date table on Amazon for current offerings). Strictly speaking, an AMI (Amazon Machine Image) is the disk image an instance boots from, while the instance type, such as m1.small, determines its hardware. The default instance type used by StarCluster is m1.small, but many others are available. Of course, the more powerful machines come at a higher price, but since some of them have many cores, their higher cost is often worth the potential boost in performance.

AWSInstanceComparisonChart1.png


Pricing


The hourly prices for renting the small instances are given below (more can be found on this Amazon page).

AWSInstancePriceChart.png