CSC352 Homework 5 2013

From dftwiki3

Revision as of 22:27, 4 November 2013

--D. Thiebaut (talk) 20:06, 4 November 2013 (EST)


This assignment is due on 11/14 at 11:59 p.m.


Assignment


Run an MPI program on Amazon AWS that finds the geometry of image files. Entering the image geometry in a database will be skipped for this assignment. We are interested in optimizing a master-workers protocol on an MPI cluster of N nodes.

Implementation Details


Program


You can use the program we saw in class and covered in this tutorial. You will need to remove the code that stores information in the MySQL database.

Images


A sample (150,000 or so) of the 3 million images has been transferred to an EBS drive in our AWS environment. You need to attach it to your cluster in order for your program to access the files. You should also create a 1-GByte EBS for yourself, where you can keep your program files. Follow the directions (slightly modified since we did the lab on AWS) from this section and the section that follows to attach your personal data EBS volume to your cluster, as well as the enwiki volume.

Go ahead and follow the tutorial on creating your own data EBS and come back to this point when you're done.

Edit your starcluster config file to add the EBS with the 150,000 images and to specify that it should be mounted automatically.
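For reference, the relevant part of the starcluster config file looks roughly like this. The volume ids (vol-xxxxxxxx, vol-yyyyyyyy) and the cluster name are placeholders; substitute the id of your own data volume and of the enwiki volume:

```
[volume data]
VOLUME_ID = vol-xxxxxxxx
MOUNT_PATH = /data

[volume enwiki]
VOLUME_ID = vol-yyyyyyyy
MOUNT_PATH = /enwiki

[cluster mycluster]
# ... keypair, node type, etc. ...
VOLUMES = data, enwiki
```

With the volumes listed in VOLUMES, starcluster attaches and mounts them on every node when the cluster starts.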


This section is only visible to computers located at Smith College


When you next start the cluster, two directories, /data and /enwiki, will appear in the root directory. All nodes will have access to both of them. Approximately 150,000 images have already been uploaded to the directory /data/enwiki/, in three subdirectories, 0, 1, and 2.

To get a sense of where the images are, start your cluster with just 1 node (no need to create a large cluster just to explore the system), and ssh to the master:

starcluster start mycluster
starcluster sshmaster mycluster
ls /enwiki
ls /enwiki/0
ls /enwiki/0/01
etc...



ImageMagick and Identify


Identify is a utility that is part of ImageMagick. Unfortunately, ImageMagick is not installed by default on our clusters; image processing is apparently not something regularly performed by mpi programs. But installing it is easy:

On the master node, type

apt-get update
apt-get install imagemagick

And identify will be installed on the master. Unfortunately, you'll have to install it on all the workers as well. If you stop your cluster rather than terminate it, the installation will remain until the next time you restart the cluster. If you terminate your cluster, however, you'll have to reinstall ImageMagick the next time you start it.


Measurements


Run the program on a cluster of 10







Misc. Information

In case you wanted to have the MPI program store the image geometry in your database, you'd have to follow the process described in this tutorial. However, if you were to create the program mysqlTest.c on your AWS cluster, you'd find that the command mysql_config is not installed on the default AMI used by starcluster to create the MPI cluster.

To install the mysql_config utility, run the following commands on the master node of your cluster as root:

apt-get update
apt-get build-dep python-mysqldb

Edit the constants in mysqlTest.c that define the address of the database server (hadoop0), as well as the credentials of your account on the mysql server.

You can then compile and run the program:

gcc -o mysqlTest $(mysql_config --cflags) mysqlTest.c $(mysql_config --libs)
./mysqlTest
MySQL Tables in mysql database:
images 
images2  
pics1