CSC352 Homework 5 2013

Revision as of 21:09, 4 November 2013

--D. Thiebaut (talk) 20:06, 4 November 2013 (EST)


This assignment is due on 11/14 at 11:59 p.m.


Assignment


Run an MPI program on Amazon AWS that finds the geometry of image files. Entering the image geometry in a database will be skipped for this assignment. We are interested in optimizing a master-workers protocol on an MPI cluster of N nodes.

Implementation Details


Program


You can use the program we saw in class, which is covered in this tutorial. Remove the code that stores information in the MySQL database.

Images


A sample (200,000 or so) of the 3 million images has been transferred to an EBS volume in our AWS environment. You need to attach it to your cluster so that your program can access the files. You should also create an EBS volume for yourself, where you can keep your program files. Follow the directions (slightly modified since we did the lab on AWS) from this section and the section that follows to attach your personal data EBS volume to your cluster, as well as the enwiki volume.

Go ahead and follow the tutorial on creating your own data EBS volume, and come back to this point when you're done.

Edit your starcluster config file so that it declares the EBS volume holding the 200,000 images and mounts it automatically.
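As a rough sketch, the volume-related parts of the config file might look like the fragment below. The volume names enwiki and dataABC, the mount paths, and the enwiki volume Id are the ones used for this assignment. The value vol-xxxxxxxx is only a placeholder for the Id of the data volume you created, and the VOLUMES line is assumed to belong inside the cluster template section you already have.

```ini
# Inside your existing cluster template section (its name is not shown here):
VOLUMES = enwiki, dataABC

# Shared volume holding the image sample
[volume enwiki]
VOLUME_ID = vol-f60093b5
MOUNT_PATH = /enwiki

# Your personal volume; replace the placeholder with your own volume Id
[volume dataABC]
VOLUME_ID = vol-xxxxxxxx
MOUNT_PATH = /data
```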


This section is only visible to computers located at Smith College


When you next start the cluster, two directories will appear in the root directory: /data and /enwiki. All nodes will have access to both of them. Approximately 150,000 images have already been uploaded to the directory /enwiki/, in three subdirectories, 0, 1, and 2.

To get a sense of where the images are, start your cluster with just 1 node (no need to create a large cluster just to explore the system), and ssh to the master:

starcluster start mycluster
starcluster sshmaster mycluster
ls /enwiki
ls /enwiki/0
ls /enwiki/0/01
etc...
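
If you would rather get a quick count than browse by hand, a small shell function along these lines may help. It is only a sketch: count_images is a hypothetical helper name (not part of starcluster), and it assumes the image volume is mounted at /enwiki as described above.

```shell
# count_images ROOT
# Print "<subdirectory> <number of files>" for each top-level
# subdirectory of ROOT, counting files at any depth below it.
count_images() {
    root=$1
    for dir in "$root"/*/; do
        # Skip the unexpanded pattern when ROOT has no subdirectories
        [ -d "$dir" ] || continue
        n=$(find "$dir" -type f | wc -l | tr -d ' ')
        printf '%s %s\n' "$(basename "$dir")" "$n"
    done
}

# On the master node you would call it as:
#   count_images /enwiki
```

Walking the whole volume with find can take a while on a large sample, so you may prefer to point it at a single subdirectory first, for example count_images /enwiki/0.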








Misc. Information

If you want the MPI program to store the image geometry in your database, follow the process described in this tutorial. However, if you create the program mysqlTest.c on your AWS cluster, you will find that the command mysql_config is not installed on the default AMI used by starcluster to create the MPI cluster.

To install the mysql_config utility, run the following commands on the master node of your cluster as root:

apt-get update
apt-get build-dep python-mysqldb

Edit the constants in mysqlTest.c that define the address of the database server (hadoop0), as well as the credentials of your account on the mysql server.

You can then compile and run the program:

gcc -o mysqlTest $(mysql_config --cflags) mysqlTest.c $(mysql_config --libs)
./mysqlTest
MySQL Tables in mysql database:
images 
images2  
pics1