Difference between revisions of "CSC352 Resources"

 
===Literature===
 
 
* [[Media:ApacheChapterOnStreaming.pdf | Apache's chapter on Hadoop Streaming]], Apache.org.
 
* [http://answers.oreilly.com/topic/460-how-to-benchmark-a-hadoop-cluster/ How to Benchmark a Hadoop Cluster], by Tom White, [http://answers.oreilly.com O'Reilly Answers], Oct. 2009.
 
* [[Image:hadoopOReilly.jpg | right |100px]] [http://www.amazon.com/Hadoop-Definitive-Guide-Tom-White/dp/0596521979 Hadoop: The Definitive Guide], Tom White, O'Reilly Media, June 2009, ISBN 0596521979. The Web site for the book is http://www.hadoopbook.com/ (with the data used as examples in the book).
 
 
* Dan Sullivan, [http://nexus.realtimepublishers.com/dgcc.php The Definitive Guide to Cloud Computing], IBM, 2010, ''in production'' (but can be downloaded in parts).
 

Revision as of 12:15, 31 July 2010




Resources: References & Bibliography for CSC352

General Knowledge Papers

Papers, Articles and University Courses on Parallel & Distributed Processing

Videos: Big Data and Analytics


A video by LinkedIn's Chief Scientist DJ Patil. As a mathematician specializing in dynamical systems and chaos theory, DJ began his career as a weather forecaster working for the federal government. DJ shares his observations on how analytics has changed in recent years, especially as Big Data becomes increasingly common.

Roger Magoulas, from O'Reilly Radar, discusses "big data" (10 minutes).

Jeff Veen: Designing for "Big Data", April 2009.

Documentation on Python Threads


Documentation on XGrid


General References

Applications

Documentation on Cloud Computing, Map-Reduce, & Hadoop

"Failure is the defining difference between distributed and local programming"

Ken Arnold, CORBA Designer

Literature

Collections of Hadoop Papers and/or Algorithms

Presentations

Tutorials

Installation Tutorials

Media Reports

News Feed

Class Material on the Web

Software/Web Links

The IBM MapReduce Tools for Eclipse Plug-in is a robust plug-in that brings Hadoop support to the Eclipse platform. Features include server configuration, support for launching MapReduce jobs and browsing the distributed file system. This setup assumes that you are running Eclipse (version 3.3 or above) on your computer.

Videos

Visualizations

  • Visualizations of Hadoop Data Transfers, from the U. of Nebraska (more videos)




  • Monitoring a Cluster of Computers as a school of fish (U. Nebraska).
In this video, researchers at the U. of Nebraska use fish swimming in a tank to display what is going on in a cluster of many computers working on a large problem. All the computers are involved in a common computation. Each fish (as far as we can tell, given the lack of better information) represents a computer or a program running on a computer. As the user zooms in on a fish, a blue window pops up to give some vital information about that system's health. Fish change color and size to indicate a change in status. One could imagine that green fish represent computers not doing much work, while orange fish represent computers loaded with work. It is interesting to see researchers use a school of fish to show the state of a cluster, relying on human beings' ability to recognize visual cues quickly and accurately. This is certainly better than having the same human beings read tons of log files containing the date and time of many different events occurring in the cluster.




  • The evolution of Hadoop (Code-Swarm)






Notes

Cloud Cluster @ Smith

class notes