Difference between revisions of "CSC352 Project 3"

From dftwiki3
 
(7 intermediate revisions by the same user not shown)
Line 7: Line 7:
 
<onlysmith>
 
<onlysmith>
 
=The Big Picture=
 +
{|
 +
|
 
<tanbox>
 
<tanbox>
[[Image:cherries.jpg|right|50px]]
 
 
Your project should present your answers to the following three questions:
 
* How should one attempt to process 5 million Wikipedia pages with MapReduce/Hadoop?  Which parameters control the execution time, and what is your best guess for the values they should be set to?
Line 14: Line 15:
 
* How does this compare to the execution time for the same 5 million pages on an XGrid system?
 
</tanbox>
 
</tanbox>
 +
|
 +
[[Image:cherriesXparent.gif|right|100px]]
 +
|}
 +
<br />
  
 
=Assignment (same as for the XGrid Project)=
Line 117: Line 122:
  
 
</pre></code>
 
</pre></code>
 +
 +
You are free to put additional wiki pages from the local disk of Hadoop6 into HDFS, but if you do so, do it in the '''wikipages''' directory, and update the README_dft.txt file in the HDFS wikipages directory with information about what you have added and how to access it.  Thanks!
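As an illustration of the step above, here is a minimal sketch that builds (but does not run) the Hadoop shell commands for copying extra pages into the '''wikipages''' directory and refreshing README_dft.txt. The helper name and the local file paths are hypothetical; only the directory name, the README file name, and the standard <code>hadoop fs -put/-get/-rm</code> subcommands are taken as given. It assumes the README must be removed from HDFS before an edited copy is uploaded, since a plain <code>-put</code> does not overwrite an existing file.

```python
import shlex


def hdfs_put_commands(local_files, hdfs_dir="wikipages", readme="README_dft.txt"):
    """Return the shell commands (as strings) for adding pages to HDFS.

    Nothing is executed here: the caller can review the commands first.
    The local paths passed in are placeholders chosen by the caller.
    """
    commands = []

    # Copy each local file into the shared HDFS directory.
    for path in local_files:
        commands.append(["hadoop", "fs", "-put", path, hdfs_dir + "/"])

    # Fetch the README so it can be edited locally to describe the additions.
    commands.append(["hadoop", "fs", "-get", hdfs_dir + "/" + readme, readme])

    # Replace the HDFS copy of the README with the edited local one.
    commands.append(["hadoop", "fs", "-rm", hdfs_dir + "/" + readme])
    commands.append(["hadoop", "fs", "-put", readme, hdfs_dir + "/" + readme])

    # Quote every argument so paths containing spaces survive the shell.
    return [" ".join(shlex.quote(arg) for arg in cmd) for cmd in commands]


if __name__ == "__main__":
    # "extra/page1.xml" is a made-up path used only for illustration.
    for line in hdfs_put_commands(["extra/page1.xml"]):
        print(line)
```

Edit the local README_dft.txt between the <code>-get</code> and the final <code>-put</code> so that it describes what you added and how to access it.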
  
 
===Web Server===
Line 123: Line 130:
  
  
==Submission==
+
=Submission=
  
 
Submit a PDF (and additional files if needed) as follows:
Line 141: Line 148:
 
     submit project3  ''yourFirstNameProject3.tgz''
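
One possible way to build the archive named in the submit command is sketched below. The function name and the example file names are hypothetical; the only thing taken from the instructions is the ''yourFirstNameProject3.tgz'' naming pattern. Files are stored flat (without their directory prefix), which is an assumption about what the grader expects.

```python
import tarfile
from pathlib import Path


def make_submission_archive(first_name, files, out_dir="."):
    """Bundle the given files into <first_name>Project3.tgz and return its path."""
    archive = Path(out_dir) / (first_name + "Project3.tgz")

    # "w:gz" writes a gzip-compressed tar file, i.e. a .tgz archive.
    with tarfile.open(archive, "w:gz") as tar:
        for f in files:
            # Store each file under its base name only (assumed flat layout).
            tar.add(f, arcname=Path(f).name)

    return archive


if __name__ == "__main__":
    # "report.pdf" is a placeholder; list your own PDF and extra files here.
    print(make_submission_archive("yourFirstName", ["report.pdf"]))
```

The resulting file can then be passed to the submit command shown above.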
  
 +
=Extra Credits=
 +
 +
Extra credit will be given for work done on AWS.  This could cover the whole project, sections of it, or just a comparison on some of the input sets.
 
</onlysmith>
 
</onlysmith>
  
Line 150: Line 160:
 
<br />
 
<br />
 
<br />
 
<br />
[[Category:CSC352]][[Category:Projects]][[Category:MapReduce]][[Category:XGrid]]
+
[[Category:CSC352]][[Category:Project]][[Category:MapReduce]][[Category:XGrid]]

Latest revision as of 12:07, 18 November 2010


This project extends Project #2, which is built on top of the Hadoop/MapReduce tutorials. It is due on the last day of exams, at 4:00 p.m.


This section is only visible to computers located at Smith College