Difference between revisions of "XGrid Running a C Program"
(Created page with ' --Thiebaut 17:33, 27 September 2008 (UTC) ---- =Running a C program in parallel on the XGrid controller= __TOC__ ==The 1-job approach== I decided to try to …') |
(→Case Study) |
||
Line 287: | Line 287: | ||
The main observation, therefore, is that''' it takes the program about 2 minutes to run in parallel on the XGrid, and about half an hour to run serially on the XGrid controller.''' | The main observation, therefore, is that''' it takes the program about 2 minutes to run in parallel on the XGrid, and about half an hour to run serially on the XGrid controller.''' | ||
+ | |||
+ | <br /> | ||
+ | <br /> | ||
+ | <br /> | ||
+ | <br /> | ||
+ | <br /> | ||
+ | <br /> | ||
+ | [[Category:XGrid]][[Category:Tutorials]] |
Latest revision as of 21:59, 17 February 2010
--Thiebaut 17:33, 27 September 2008 (UTC)
Running a C program in parallel on the XGrid controller
Contents
The 1-job approach
I decided to try to run the N-Queens problem on the XGrid. The problem is to find, given an NxN chessboard, the number of different ways one can you put queens on each row and each column of the board without them taking each other out.
This program is a great way to test the raw CPU power of a machine, since it uses mostly CPU, and very little memory and practically no I/O.
The C program is availble here.
To compile, run, and measure the execution time of the program on the XGrid controller:
gcc -o queensall queensall.c time xgrid -job run ./queensall 14 N=14 365596 solutions real 0m5.504s user 0m0.018s sys 0m0.009s
(Note: I have setup the address of the XGrid controller and the password in my environment variables, so I don't have to provide them when I call xgrid.)
The executable, queensall, takes 5.504 seconds to find the 365,596 solutions on a 14x14 board on the XGrid controller, and 0.018 seconds of user time on the local machine (of course, since all the computation is done on the controller!)
This is fine, but the program is not parallel, and runs only on one of the XGrid computers... no big deal... The next step is to make it run in parallel.
A good reference for command line control of the Xgrid can be found at Wooster U. http://www.wooster.edu/physics/cowcpp/Xgrid/XgridIntroduction.htm (skip to Section 2.3).
Running on the grid
Getting the program ready for parallelism
To make the program run in parallel, one simply needs to modify the program so that we can force it to put the first queen in a specific column of the first row, and let it find all the solutions for this starting point.
The new program is QueensAll_SingleColumn.c.
Its syntax is simple:
./queensall_singlecolumn 1 14 First Queen in column 1 of 14x14 board: 16488 solutions found
Here it reports that it has found 16,488 solutions when the first queen is located in Column 1 or Row 0 of the 14x14 board.
Running in parallel
The parallelism will come from launching the program N times and for each instance give it a column number in [0..N-1].
We can use a shell file to automatically launch the program N times. Below is runQueensAll.sh for this purpose:
#! /bin/bash
# runQueensAll.sh
# D. Thiebaut
# Run the queensall_singlecolumn C program as multiple instances
# on the XGrid.
# To run the script type:
#
# ./runQueensAll.sh nnn
#
# where nnn is an integer representing the dimension of the
# chessboard. The script will automatically launch N jobs on the XGrid,
# Job i putting its first queen in Column i of its first row.
#
if [ "$#" -eq "0" ]
then
echo "Syntax: "
echo "runQueensAll.sh nnn"
echo "where nnn is an integer representing the dimension of "
echo "the chessboard."
echo " "
exit 1
fi
n=0
while [ $n -lt $1 ]; do
#xgrid -job run ./queensall_singlecolumn $n $1
xgrid -job submit ./queensall_singlecolumn $n $1
n=$[$n + 1]
done
Make the script executable before running it:
chmod +x runQueensAll.sh
Run the script:
./runQueensAll.sh 15
Check with the XGrid Admin GUI that the jobs are submitted and running:
Understanding the output of the script
When running the shell script we get:
./runQueensAll.sh 15
{
jobIdentifier = 627;
}
{
jobIdentifier = 628;
}
{
jobIdentifier = 629;
}
.
.
.
{
jobIdentifier = 638;
}
{
jobIdentifier = 639;
}
{
jobIdentifier = 640;
}
{
jobIdentifier = 641;
}
This is the syntax used by the XGrid system to report information.
To get the actual output of a job, say Job 638, use the following command:
xgrid -job results -id 638 First Queen in column 11 of 15x15 board: 157034 solutions found
That's it! Job 638 computed that 157,034 solutions existed for a 15x15 board when the first queen was put in Column 11.
Comparing the execution times
Let's compare the N-Queens program for a 15x15 board when run on one node of the XGrid, and when we fork 15 different jobs.
Below are the results for the N-Queens program computing all the solutions and running on one node only:
time xgrid -job run ./queensall 15 N=15 2279184 solutions real 0m34.816s user 0m0.018s sys 0m0.008s
34 seconds!
The N-Queens program as 15 different jobs:
time ./runQueensAll.sh 15
{
jobIdentifier = 658;
}
{
jobIdentifier = 659;
}
.
.
.
{
jobIdentifier = 671;
}
{
jobIdentifier = 672;
}
real 0m0.764s
user 0m0.229s
sys 0m0.103s
0.7 seconds! Actually, this is the time it takes the script to submit the jobs, not the time it takes for the jobs to finish.
The Xgrid Admin shows this plainly. It will show graphically each job finishing one after the other. We get parallelism, indeed!!!
Python to the rescue!
Getting the stats and the output of the processes individually on the command line is cumbersome. I created a Python program to do just that. Just pipe the output of the xgrid job submit command to the python program, and it will gather all the process Ids, get their respective outputs, and will display some statistics about the execution times.
The Python code getXGridOutput.py is available here.
Make sure you make it executable before using it:
chmod +x getXGridOutput.py
Then run the parallel N-Queens programs as follows:
runQueensAll.sh 15 | getXGridOutput.py
Job 1250 stopped: Execution time: 2.000000 seconds
First Queen in column 0 of 15x15 board: 69516 solutions found
Job 1251 stopped: Execution time: 2.000000 seconds
First Queen in column 1 of 15x15 board: 98156 solutions found
Job 1252 stopped: Execution time: 2.000000 seconds
First Queen in column 2 of 15x15 board: 122763 solutions found
Job 1253 stopped: Execution time: 2.000000 seconds
First Queen in column 3 of 15x15 board: 157034 solutions found
Job 1254 stopped: Execution time: 2.000000 seconds
First Queen in column 4 of 15x15 board: 175296 solutions found
Job 1255 stopped: Execution time: 2.000000 seconds
First Queen in column 5 of 15x15 board: 201164 solutions found
Job 1256 stopped: Execution time: 2.000000 seconds
First Queen in column 6 of 15x15 board: 206294 solutions found
Job 1257 stopped: Execution time: 3.000000 seconds
First Queen in column 7 of 15x15 board: 218738 solutions found
Job 1258 stopped: Execution time: 2.000000 seconds
First Queen in column 8 of 15x15 board: 206294 solutions found
Job 1259 stopped: Execution time: 2.000000 seconds
First Queen in column 9 of 15x15 board: 201164 solutions found
Job 1260 stopped: Execution time: 2.000000 seconds
First Queen in column 10 of 15x15 board: 175296 solutions found
Job 1261 stopped: Execution time: 2.000000 seconds
First Queen in column 11 of 15x15 board: 157034 solutions found
Job 1262 stopped: Execution time: 2.000000 seconds
First Queen in column 12 of 15x15 board: 122763 solutions found
Job 1263 stopped: Execution time: 2.000000 seconds
First Queen in column 13 of 15x15 board: 98156 solutions found
Job 1264 stopped: Execution time: 2.000000 seconds
First Queen in column 14 of 15x15 board: 69516 solutions found
Total execution time: 6.000000 seconds
Case Study
Let's compare solving the N-queens problem for a 17x17 board serially and in parallel:
Serial Code | Parallel Code |
---|---|
|
|
Interesting results! The speedup is close to 15.7 on a 10-node, 8-processor per node machine.
When the serial version of the program runs on the controller, it cannot but use only one of the 8 processors. The reason is that the serial code is not written for parallelism, and the gcc compiler used to compile it does not know how to extract parallelism from the code. So the program really uses 1 of the 8 processors on the controller.
(Note: it would be interesting to write a code that uses threads to compute the different solutions and see how the XGrid treats the threads. I would expect the OS to start threads on different processors of the same machine.)
When the program runs in parallel however, all 8 processors of all 10 machines can be used, and since we start 1 process per dimension of the chessboard, we have 17 processes running concurrently. The maximum speedup one can expect in this case is 17. The 15.7 speedup obtained is very reasonnable and accounts very likely to start-up and communication conditions.
The main observation, therefore, is that it takes the program about 2 minutes to run in parallel on the XGrid, and about half an hour to run serially on the XGrid controller.