Difference between revisions of "CSC352 Resources"

From dftwiki3
(Documentation on Cloud Computing, Map-Reduce, & Hadoop)
Line 131:
 
* Matthews, S., & Williams, T. [http://www.biomedcentral.com/1471-2105/11/S1/S15 MrsRF: an efficient MapReduce algorithm for analyzing large collections of evolutionary trees BMC Bioinformatics], 11, 2010 (Suppl 1) <font color=magenta>(authors show that speedups of close to 18 on 32 cores can be reached for treating 20,000 trees of 150 taxa each and 33,306 trees of 567 taxa each.)</font>
 
 
* Chris K Wensel, [http://www.manamplified.org/archives/2008/11/hadoop-is-about-scalability.html Hadoop Is About Scalability, Not Performance], www.manamplified.org, November 12, 2008.
 
* Timeline Graphs and Performance
** Owen O'Malley and Arun Murthy, [http://developer.yahoo.net/blogs/hadoop/2009/05/hadoop_sorts_a_petabyte_in_162.html Hadoop Sorts a Petabyte in 16.25 Hours and a Terabyte in 62 Seconds], http://developer.yahoo.net, May 2009.
** Joseph Gebis, [http://blogs.sun.com/jgebis/entry/understanding_hadoop_task_timelines Understanding Hadoop Task Timelines], http://blogs.sun.com, June 2009. (<font color="magenta">A good description of the ''Task Timelines'' used to quantify Hadoop performance</font>)
** Joseph Gebis, [http://blogs.sun.com/jgebis/entry/hadoop_resource_utilization_monitoring_scripts Hadoop Resource Utilization Monitoring -- scripts], http://blogs.sun.com, June 2009.
** Joseph Gebis, [http://blogs.sun.com/jgebis/entry/hadoop_resource_utilization_and_performance Hadoop resource utilization and performance analysis], http://blogs.sun.com, June 2009.
 
===Tutorials===
 
 
* Tom White, [http://developer.amazonwebservices.com/connect/entry.jspa?externalID=873 Running Hadoop MapReduce on Amazon EC2 and S3], Amazon Web Services Articles and Tutorials, 2007.
 

Revision as of 11:59, 4 April 2010




Resources: References & Bibliography for CSC352

General Knowledge Papers

Papers, Articles and University Courses on Parallel & Distributed Processing

Videos: Big Data and Analytics


A video by LinkedIn's Chief Scientist DJ Patil. As a mathematician specializing in dynamical systems and chaos theory, DJ began his career as a weather forecaster working for the federal government. DJ shares his observations on how analytics has changed in recent years, especially as Big Data becomes increasingly common.

Roger Magoulas, from O'Reilly Radar, discusses "big data" (10 minutes).

Jeff Veen: Designing for "Big Data", April 2009.

Documentation on Python Threads


Documentation on XGrid


General References

Applications

Documentation on Cloud Computing, Map-Reduce, & Hadoop

"Failure is the defining difference between distributed and local programming"

Ken Arnold, CORBA Designer

Literature

Tutorials

Media Reports

News Feed

Class Material on the Web

Software/Web Links

The IBM MapReduce Tools for Eclipse Plug-in is a robust plug-in that brings Hadoop support to the Eclipse platform. Features include server configuration, support for launching MapReduce jobs and browsing the distributed file system. This setup assumes that you are running Eclipse (version 3.3 or above) on your computer.

Videos

Visualizations

  • Visualizations of Hadoop Data Transfers, from the U. of Nebraska (more videos)




  • Monitoring a Cluster of Computers as a School of Fish (U. Nebraska)





Notes

Cloud Cluster @ Smith















class notes