XGrid Running a C Program

From dftwiki3
Revision as of 22:59, 17 February 2010 by Thiebaut (talk | contribs) (Case Study)
(diff) ← Older revision | Latest revision (diff) | Newer revision → (diff)
Jump to: navigation, search

--Thiebaut 17:33, 27 September 2008 (UTC)


Running a C program in parallel on the XGrid controller


The 1-job approach

I decided to try to run the N-Queens problem on the XGrid. The problem is to find, given an NxN chessboard, the number of different ways one can you put queens on each row and each column of the board without them taking each other out.

This program is a great way to test the raw CPU power of a machine, since it uses mostly CPU, and very little memory and practically no I/O.

The C program is availble here.

To compile, run, and measure the execution time of the program on the XGrid controller:

gcc -o queensall queensall.c 
time xgrid -job run ./queensall 14
N=14  365596 solutions  
real	0m5.504s
user	0m0.018s
sys	0m0.009s

(Note: I have setup the address of the XGrid controller and the password in my environment variables, so I don't have to provide them when I call xgrid.)

The executable, queensall, takes 5.504 seconds to find the 365,596 solutions on a 14x14 board on the XGrid controller, and 0.018 seconds of user time on the local machine (of course, since all the computation is done on the controller!)

This is fine, but the program is not parallel, and runs only on one of the XGrid computers... no big deal... The next step is to make it run in parallel.

A good reference for command line control of the Xgrid can be found at Wooster U. http://www.wooster.edu/physics/cowcpp/Xgrid/XgridIntroduction.htm (skip to Section 2.3).

Running on the grid

Getting the program ready for parallelism

To make the program run in parallel, one simply needs to modify the program so that we can force it to put the first queen in a specific column of the first row, and let it find all the solutions for this starting point.

The new program is QueensAll_SingleColumn.c.

Its syntax is simple:

./queensall_singlecolumn 1 14
First Queen in column 1 of 14x14 board:  16488 solutions found

Here it reports that it has found 16,488 solutions when the first queen is located in Column 1 or Row 0 of the 14x14 board.

Running in parallel

The parallelism will come from launching the program N times and for each instance give it a column number in [0..N-1].

We can use a shell file to automatically launch the program N times. Below is runQueensAll.sh for this purpose:

#! /bin/bash
# runQueensAll.sh
# D. Thiebaut
# Run the queensall_singlecolumn C program as multiple instances
# on the XGrid.
# To run the script type:
# 
#     ./runQueensAll.sh  nnn
# 
# where nnn is an integer representing the dimension of the 
# chessboard.  The script will automatically launch N jobs on the XGrid,
# Job i  putting its first queen in Column i of its first row.
#
if [ "$#" -eq "0" ]
  then
   echo "Syntax: "
   echo "runQueensAll.sh nnn"
   echo "where nnn is an integer representing the dimension of "
   echo "the chessboard."
   echo " "
   exit 1
fi

n=0
while [ $n -lt $1 ]; do
   #xgrid -job run ./queensall_singlecolumn $n $1
   xgrid -job submit ./queensall_singlecolumn $n $1
   n=$[$n + 1]
done

Make the script executable before running it:

 chmod +x runQueensAll.sh

Run the script:

 ./runQueensAll.sh 15

Check with the XGrid Admin GUI that the jobs are submitted and running:

XGrid XGridAdmin NQueens.png

Understanding the output of the script

When running the shell script we get:

./runQueensAll.sh 15
{
    jobIdentifier = 627;
}
{
    jobIdentifier = 628;
}
{
    jobIdentifier = 629;
}
.
.
.
{
    jobIdentifier = 638;
}
{
    jobIdentifier = 639;
}
{
    jobIdentifier = 640;
}
{
    jobIdentifier = 641;
}

This is the syntax used by the XGrid system to report information.

To get the actual output of a job, say Job 638, use the following command:

xgrid -job results -id 638
First Queen in column 11 of 15x15 board:  157034 solutions found

That's it! Job 638 computed that 157,034 solutions existed for a 15x15 board when the first queen was put in Column 11.

Comparing the execution times

Let's compare the N-Queens program for a 15x15 board when run on one node of the XGrid, and when we fork 15 different jobs.

Below are the results for the N-Queens program computing all the solutions and running on one node only:

time xgrid -job run ./queensall 15 
N=15  2279184 solutions  
real	0m34.816s
user	0m0.018s
sys	0m0.008s

34 seconds!

The N-Queens program as 15 different jobs:

time ./runQueensAll.sh 15
{
    jobIdentifier = 658;
}
{
    jobIdentifier = 659;
}
.
.
.
{
    jobIdentifier = 671;
}
{
    jobIdentifier = 672;
}

real	0m0.764s
user	0m0.229s
sys	0m0.103s

0.7 seconds! Actually, this is the time it takes the script to submit the jobs, not the time it takes for the jobs to finish.

The Xgrid Admin shows this plainly. It will show graphically each job finishing one after the other. We get parallelism, indeed!!!

Python to the rescue!

Getting the stats and the output of the processes individually on the command line is cumbersome. I created a Python program to do just that. Just pipe the output of the xgrid job submit command to the python program, and it will gather all the process Ids, get their respective outputs, and will display some statistics about the execution times.

The Python code getXGridOutput.py is available here.

Make sure you make it executable before using it:

 chmod +x getXGridOutput.py

Then run the parallel N-Queens programs as follows:

runQueensAll.sh 15 | getXGridOutput.py 

Job 1250 stopped: Execution time: 2.000000 seconds
First Queen in column 0 of 15x15 board:  69516 solutions found

Job 1251 stopped: Execution time: 2.000000 seconds
First Queen in column 1 of 15x15 board:  98156 solutions found

Job 1252 stopped: Execution time: 2.000000 seconds
First Queen in column 2 of 15x15 board:  122763 solutions found

Job 1253 stopped: Execution time: 2.000000 seconds
First Queen in column 3 of 15x15 board:  157034 solutions found

Job 1254 stopped: Execution time: 2.000000 seconds
First Queen in column 4 of 15x15 board:  175296 solutions found

Job 1255 stopped: Execution time: 2.000000 seconds
First Queen in column 5 of 15x15 board:  201164 solutions found

Job 1256 stopped: Execution time: 2.000000 seconds
First Queen in column 6 of 15x15 board:  206294 solutions found

Job 1257 stopped: Execution time: 3.000000 seconds
First Queen in column 7 of 15x15 board:  218738 solutions found

Job 1258 stopped: Execution time: 2.000000 seconds
First Queen in column 8 of 15x15 board:  206294 solutions found

Job 1259 stopped: Execution time: 2.000000 seconds
First Queen in column 9 of 15x15 board:  201164 solutions found

Job 1260 stopped: Execution time: 2.000000 seconds
First Queen in column 10 of 15x15 board:  175296 solutions found

Job 1261 stopped: Execution time: 2.000000 seconds
First Queen in column 11 of 15x15 board:  157034 solutions found

Job 1262 stopped: Execution time: 2.000000 seconds
First Queen in column 12 of 15x15 board:  122763 solutions found

Job 1263 stopped: Execution time: 2.000000 seconds
First Queen in column 13 of 15x15 board:  98156 solutions found

Job 1264 stopped: Execution time: 2.000000 seconds
First Queen in column 14 of 15x15 board:  69516 solutions found


Total execution time: 6.000000 seconds

Case Study

Let's compare solving the N-queens problem for a 17x17 board serially and in parallel:

Serial Code Parallel Code
runQueensAll.sh 17 | getXGridOutput.py 
Job 1280 stopped: Execution time: 79.000000 seconds
First Queen in column 0 of 17x17 board:  2729772 solutions found
.
.
.
Job 1288 stopped: Execution time: 108.000000 seconds
First Queen in column 8 of 17x17 board:  8026228 solutions found

Total execution time: 109.000000 seconds
time xgrid -job run ./queensall 17
N=17  95815104 solutions  
real	28m34.690s
user	0m0.018s
sys	0m0.008s






Interesting results! The speedup is close to 15.7 on a 10-node, 8-processor per node machine.

When the serial version of the program runs on the controller, it cannot but use only one of the 8 processors. The reason is that the serial code is not written for parallelism, and the gcc compiler used to compile it does not know how to extract parallelism from the code. So the program really uses 1 of the 8 processors on the controller.

(Note: it would be interesting to write a code that uses threads to compute the different solutions and see how the XGrid treats the threads. I would expect the OS to start threads on different processors of the same machine.)

When the program runs in parallel however, all 8 processors of all 10 machines can be used, and since we start 1 process per dimension of the chessboard, we have 17 processes running concurrently. The maximum speedup one can expect in this case is 17. The 15.7 speedup obtained is very reasonnable and accounts very likely to start-up and communication conditions.

The main observation, therefore, is that it takes the program about 2 minutes to run in parallel on the XGrid, and about half an hour to run serially on the XGrid controller.