CSC352 Class Page 2010
Python Threads
Week | Topics | Reading |
Week 1 1/25 | | |
Week 2 2/1 | | |
Week 3 2/8 | | |
XGrid Programming
Cloud Computing
Resources: References & Bibliography
Parallel Processing/Good background information
- Asanovic K., et al., The Landscape of Parallel Computing Research: A View from Berkeley, Dec. 2006. (cached copy)
- Xen
- Mauer R., Xen Virtualization and Linux Clustering, Linux Journal, January 12, 2006.
- Barham P., et al., Xen and the Art of Virtualization, University of Cambridge Computer Laboratory, 15 JJ Thomson Avenue, Cambridge, UK, CB3 0FD.
- AMD News
- Hardwidge, B., AMD plans supercomputer with 1,000 GPUs, Jan. 2009, bit-tech.net (or graphics goes to the clouds!)
- Halfacree G., AMD supercomputer tops TOP500 list, November 2009, bit-tech.net (or Intel gets a black eye!)
- Google University Code
- Lecture Notes by Paul Krzyzanowski for a course on Distributed Computing at Rutgers. Quite complete, and covering the basics of parallelism, RPC, synchronization, fault tolerance, security, and distributed file systems.
- The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, 2009. Table of Contents. A superb collection of essays on different topics (Low-res cached copy). The main chapters are:
- Part 1: Earth and Environment
- Part 2: Health and Wellbeing
- Part 3: Scientific Infrastructure
- Part 4: Scholarly Communication
- Final Thoughts
Python Threads
XGrid
- What's an XGrid system?
- XGrid Overview from Apple
- A video presentation of the XGrid (click on the movie reel icon to start).
- A very good overview of the XGrid from macdevcenter.com
- Programming Examples, Setup, and References
General References
- XGrid Admin and High Performance Computing document (PDF)
- Apple Xgrid
- Apple Xgrid FAQ
- MacDevCenter
- MacResearch
- Stanford Xgrid
- Utah Xgrid
Applications
- XGrid Programming Guide
- An Introduction to R
- POVray on the XGrid
- Stanford Xgrid: One of the largest XGrid systems around.
- Utah Xgrid: Lots of good stuff.
- Using the Mathematica Kernel.
Cloud Computing
"Failure is the defining difference between distributed and local programming"
Ken Arnold, CORBA Designer
Literature
- Hadoop: The Definitive Guide, Tom White, O'Reilly Media, June 2009, ISBN 0596521979. The Web site for the book is http://www.hadoopbook.com/ (it includes the data used as examples in the book).
- Dean, J., and S. Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, Dec. 2004, (cached copy)
- Czajkowski G., Sorting 1 PB with MapReduce, Nov. 2008, (cached copy)
- Armbrust M., et al., Above the Clouds: A Berkeley View of Cloud Computing, Tech. Rep. UCB/EECS-2009-28, Feb. 2009 (cached copy)
- Olston C., et al., Pig Latin: A Not-So-Foreign Language for Data Processing, SIGMOD’08, June 9–12, 2008, Vancouver, BC, Canada.
- Ghemawat S., H. Gobioff, and S.T. Leung, The Google File System, SOSP’03, October 19–22, 2003, Bolton Landing, New York, USA.
- The Fourth Paradigm: Data-Intensive Scientific Discovery, Microsoft Research, 2009. Table of Contents, (Low-res cached copy).
- Multicore Computing and Scientific Discovery, by Larus and Gannon
- Parallelism and the Cloud, by Gannon and Reed
- Visualization and Data-Intensive Science by Hansen, Johnson, Pascucci, and Silva.
Media Reports
- Markoff, J., A Deluge of Data Shapes a New Era in Computing, New York Times, 12/15/09
Class Material on the Web
- Google's series of 4 lectures on map-reduce, distributed file-system, and clustering algorithms.
- University of Washington: Problem Solving on Large Scale Clusters
- Brandeis University: Distributed Systems Course
- Google: Introduction to Parallel Programming and MapReduce
- U. C. Berkeley: Intro to Parallel Programming and Threading
- California Polytechnic (Cal Poly): A lab on the Netflix data set
Software/Web Links
- Apache's Documentation on Hadoop Common
- The Hadoop Tutorial from Apache. A "Must-Do!"
- Hadoop Streaming, which lets you write Hadoop jobs in languages other than Java, such as Python.
- A Yahoo Tutorial on Hadoop. Another "Must-Do!"
- A Hadoop-on-Eclipse tutorial. Written for the Windows platform, but it works for Macs as well. Best way to set up Eclipse! You will need Eclipse 3.3.2 and Hadoop 0.19.1.
- The Hadoop-Book Web site.
- The Hadoop Wiki, the authoritative source on working with Hadoop. Many examples in Java and Python.
- Hadoop at Google: A preconfigured single node instance available at Google.
- Writing the WordCount in Python
- Guide for setting up IBM's Eclipse Tools for Hadoop (go to bottom of page)
- The IBM MapReduce Tools for Eclipse Plug-in is a robust plug-in that brings Hadoop support to the Eclipse platform. Features include server configuration, support for launching MapReduce jobs and browsing the distributed file system. This setup assumes that you are running Eclipse (version 3.3 or above) on your computer.
- Guide from Cornell for setting up Hadoop on a Mac.
- Configuring Eclipse for Hadoop: a video from Cloudera on setting up Hadoop. Not easy to follow.
- The source code for the examples that come with the Hadoop 0.19.1 distribution. Includes WordCount, WordCountAggregate, WordCountHistogram, PiEstimator, Join, and Grep, among others.
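As a rough illustration of the WordCount example that several of the links above work through, here is a minimal sketch in plain Python. It does not call any Hadoop API: the names `mapper`, `reducer`, and `word_count` are hypothetical, and a local `sorted()` call stands in for the shuffle/sort phase that Hadoop performs between the map and reduce steps. In an actual Hadoop Streaming job, the mapper and reducer would be separate scripts that read lines from standard input and write tab-separated key/value lines to standard output.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map step: emit a (word, 1) pair for every whitespace-separated word."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1


def reducer(sorted_pairs):
    """Reduce step: sum the counts of each run of identical keys.

    Like a streaming reducer, this assumes its input is already sorted by key.
    """
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, a local stand-in for the shuffle/sort phase, then reduce."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    # Made-up sample input, used only to exercise the sketch.
    sample = ["the quick brown fox", "the lazy dog", "The end"]
    for word, count in sorted(word_count(sample).items()):
        print(f"{word}\t{count}")
```

Sorting by key before reducing is what lets the reducer process one word at a time without holding every count in memory, which is the property the real framework relies on.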
Videos
- Google's series of 4 lectures on map-reduce, distributed file-system, and clustering algorithms.
- A video of Tom White, author of O'Reilly's Hadoop guide, on BlipTV. White outlines the suite of projects centered around Hadoop (an open-source MapReduce project).
- Cloudera's collection of videos.
- CNBC's report: Inside the Mind of Google. "The best way to watch “Inside the Mind of Google,” Maria Bartiromo’s report on the Internet giant Thursday on CNBC, is to not watch the first quarter of it." (from Neil Genzlinger's 12/02/09 NYT review)