Difference between revisions of "CSC352 Resources"
Revision as of 08:19, 20 March 2010
Resources: References & Bibliography for CSC352
General Knowledge Papers
- Von Neumann J., First Draft of a Report on the EDVAC, Moore School of Electrical Engineering, University of Pennsylvania, June 30, 1945. (Especially interesting are the first 5 pages)
- Rob Weir's 4Z Method for reviewing papers.
Papers, Articles and University Courses on Parallel & Distributed Processing
- Parallelism in general
- Asanovic K. et al, The Landscape of Parallel Computing Research: A View from Berkeley, Dec. 2006. (cached copy)
- Performance Evaluation
- Lei Hu, and I. Gorton, Performance Evaluation for Parallel Systems: A Survey, University of NSW, Technical Report UNSW-CSE-TR-9707, October 1997 (cached copy)
- Video Tutorial: Optimizing Performance in Parallel Processing, Jan. 2010 (19 minutes)
- Amdahl's Law in the Multicore Era, Mark Hill and Michael Marty, IEEE Computer, July 2008, and accompanying dynamic graph. (cached copy)
- Xen
- Mauer, R., Xen Virtualization and Linux Clustering, Linux Journal January 12th, 2006
- Barham P., et al., Xen and the Art of Virtualization, University of Cambridge Computer Laboratory 15 JJ Thomson Avenue, Cambridge, UK, CB3 0FD
- AMD News
- Hardwidge, B., AMD plans supercomputer with 1,000 GPUs, Jan. 2009, bit-tech.net (or graphics goes to the clouds!)
- Halfacree G., AMD supercomputer tops TOP500 list, November 2009, bit-tech.net (or Intel gets a black eye!)
- Google University Code
- Lecture Notes by Paul Krzyzanowski for a course on Distributed Computing at Rutgers. Quite complete, and covering the basics of parallelism, RPC, synchronization, fault tolerance, security, and distributed file systems.
- The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, 2009. Table of Contents. A superb collection of essays on different topics (Low-res cached copy). The main chapters are:
- Part 1: Earth and Environment
- Part 2: Health and Wellbeing
- Part 3: Scientific Infrastructure
- Part 4: Scholarly Communication
- Final Thoughts
- Threading
- D. Tullsen, S. Eggers, and H. M. Levy, Simultaneous Multithreading: Maximizing On-Chip Parallelism, Proc. ISCA, Santa Margherita Ligure, Italy, 1995 (cached copy)
- Xgrid
- Hughes, B., Building Computational Grids with Apple's XGrid Middleware, ACM International Conference Proceeding Series, Vol. 167, Hobart, Tasmania, Australia, 2006. (cached copy)
- Tsouloupas G, and M. Dikaiakos, Characterization of Computational Grid Resources Using Low-Level Benchmarks, Second IEEE International Conference on e-Science and Grid Computing, Amsterdam, Netherlands, 2006 (cached copy)
Videos: Big Data and Analytics
- A video by LinkedIn's Chief Scientist DJ Patil. As a mathematician specializing in dynamical systems and chaos theory, DJ began his career as a weather forecaster working for the Federal government. DJ shares his observations on how analytics has changed in recent years, especially as Big Data becomes increasingly common.
- Roger Magoulas, from O'Reilly Radar, discusses "big data" (10 minutes).
- Jeff Veen: Designing for "Big Data", April 2009.
Documentation on Python Threads
Documentation on XGrid
- Introduction: What's an XGrid system?
- XGrid Overview from Apple
- Videos
- A Video presentation of the XGrid (click on movie reel icon to start).
- A YouTube short video showing the XGrid running the Mandelbrot demo.
- A very good overview of the XGrid from macdevcenter.com
- Programming Examples, Setup, and References relating to the XGrid system at Smith College.
- Tutorial #1: Monte Carlo
General References
- XGrid Admin and High Performance Computing document (PDF)
- Apple Xgrid
- Apple Xgrid FAQ
- MacDevCenter
- MacResearch
- Stanford Xgrid
- Utah Xgrid
Applications
- XGrid Programming Guide
- An Introduction to R
- POVray on the XGrid
- Stanford Xgrid: One of the largest XGrid systems around.
- Utah Xgrid: Lots of good stuff.
- Using the Mathematica Kernel.
Documentation on Cloud Computing, Map-Reduce, & Hadoop
"Failure is the defining difference between distributed and local programming"
Ken Arnold, CORBA Designer
Literature
- Hadoop: The Definitive Guide, Tom White, O'Reilly Media, June 2009, ISBN 0596521979. The Web site for the book is http://www.hadoopbook.com/ (with the data used as examples in the book)
- Dean, J., and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Dec. 2004, (cached copy)
- Czajkowski G., Sorting 1 PB with MapReduce, Nov. 2008, (cached copy)
- Armbrust M., et al., Above the Clouds: A Berkeley View of Cloud Computing, Tech Rep. UCB/EECS-2009-28, Feb. 2009 (cached copy)
- Olston C. et al., Pig Latin: A Not-So-Foreign Language for Data Processing, SIGMOD’08, June 9–12, 2008, Vancouver, BC, Canada.
- Ghemawat S., H. Gobioff, and S.T. Leung, The Google File System, SOSP’03, October 19–22, 2003, Bolton Landing, New York, USA.
- The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, 2009. Table of Contents, (Low-res cached copy).
- Multicore Computing and Scientific Discovery, by Larus and Gannon
- Parallelism and the Cloud, by Gannon and Reed
- Visualization and Data-Intensive Science by Hansen, Johnson, Pascucci, and Silva.
- Talbot D., Security in the Ether, Technology Review, Jan/Feb 2010. (cached copy)
- HadoopWiki, Partitioning your job into Maps and Reduces, 2009.
Tutorials
- Tom White, Running Hadoop MapReduce on Amazon EC2 and S3, Amazon Web Services Articles and Tutorials, 2007.
- Robert Sosinski, Starting Amazon EC2 with Mac OS X, www.robertsosinski.com, 2008.
- Introduction to Java for AWS developers, Amazon Web Services, 2007.
Media Reports
- Markoff, J., A Deluge of Data Shapes a New Era in Computing, New York Times, 12/15/09
News Feed
- cloud-computing.alltop.com: aggregated news about the cloud
Class Material on the Web
- Google's series of 4 lectures on map-reduce, distributed file-system, and clustering algorithms.
- University of Washington: Problem Solving on Large Scale Clusters
- Brandeis University: Distributed Systems Course
- Google: Introduction to Parallel Programming and MapReduce
- U. C. Berkeley: Intro to Parallel Programming and Threading
- California PolyTech: A lab on the NetFlix data set
- New Mexico Tech: syllabus (pdf)
- U. Maryland: Syllabus, and Jimmy Lin's Cloud 9 page.
Software/Web Links
- Apache's Documentation on Hadoop Common
- The Hadoop Tutorial from Apache. A "Must-Do!"
- Hadoop Streaming, i.e. using Hadoop with Python, for example.
- A Yahoo Tutorial on Hadoop. Another "Must-Do!"
- A Hadoop-on-Eclipse tutorial. Written for the Windows platform, but it works for Macs as well. Best way to set up Eclipse! You will need Eclipse 3.3.2 and Hadoop 0.19.1.
- The Hadoop-Book Web site.
- The Hadoop Wiki, the authoritative source on working with Hadoop. Many examples in Java and Python.
- Hadoop at Google: A preconfigured single node instance available at Google.
- Writing the WordCount in Python
- Guide for setting up IBM's Eclipse Tools for Hadoop (go to bottom of page)
- The IBM MapReduce Tools for Eclipse Plug-in is a robust plug-in that brings Hadoop support to the Eclipse platform. Features include server configuration, support for launching MapReduce jobs and browsing the distributed file system. This setup assumes that you are running Eclipse (version 3.3 or above) on your computer.
- Guide from Cornell for setting up Hadoop on a Mac.
- Configuring Eclipse for Hadoop: a video from Cloudera on setting up Hadoop. Not easy to follow.
- The source code for the examples that come with the Hadoop 0.19.1 distribution. Includes WordCount, WordCountAggregate, WordCountHistogram, PiEstimator, Join, and Grep, among others.
Videos
- Google's series of 4 lectures on map-reduce, distributed file-system, and clustering algorithms.
- Berkeley lecture on Map-Reduce (CS 61A Lecture 34)
- A video of Tom White, author of O'Reilly's Hadoop guide, on BlipTV. White outlines the suite of projects centered around Hadoop (an open-source MapReduce project).
- Cloudera's collection of videos.
- CNBC's report: Inside the Mind of Google. "The best way to watch “Inside the Mind of Google,” Maria Bartiromo’s report on the Internet giant Thursday on CNBC, is to not watch the first quarter of it." (from Neil Genzlinger's 12/02/09 NYT review)
- Short video by a consultant at http://www.stratoslearning.com (5 min). Outlines a course on Cloud Computing.
- Part I: cloud fundamentals
- Part II: technology and barriers
- Part III: security
- Part IV: what options? players?
- Part V: Application, hands on
- Uses Amazon as test platform.