CSC352 MapReduce/Hadoop Class Notes
Outline
- History
- Infrastructure
- Submitting a Job
- Smith Cluster
- Example 1: Java
- Example 2: Python
- Useful Commands
References
- Apache Hadoop Tutorial: http://hadoop.apache.org/common/docs/current/mapred_tutorial.html
- Wikipedia: http://en.wikipedia.org/wiki/MapReduce
History
- Introduced in 2004
- MapReduce is a patented software framework introduced by Google to support distributed computing on large data sets on clusters of computers. [wikipedia]
- 2010: first conference, the First International Workshop on MapReduce and its Applications (MAPREDUCE'10) (http://graal.ens-lyon.fr/mapreduce/). Interesting tidbit: nobody from Google was on the planning committee; it was mostly INRIA.