Difference between revisions of "CSC352 Syllabus"

From dftwiki3
(Presentations)
(Grading)
 
(50 intermediate revisions by the same user not shown)
Line 1: Line 1:
----
+
[[Image:couldComputing.png | right |300px]]
 
__TOC__
 
__TOC__
<br />
+
<center>[[CSC352 | Main Page]] | [[CSC352_Syllabus | Syllabus]] | [[CSC352_Class_Page | Schedule]] |
<br />
+
[[CSC352 Resources | Links &amp; Resources]]</center><br />
 
 
<center> [[CSC352|Main Page]] | [[CSC352_Syllabus|Syllabus]] | [[CSC352_Class_Page | Schedule]] </center>
 
<br />
 
 
<br />
 
<br />
  
Line 14: Line 11:
 
Telephone: 3854<br />
 
Telephone: 3854<br />
 
Office hours TBA and by appointments
 
Office hours TBA and by appointments
|}
 
  
 
=Introduction=
 
=Introduction=
  
Parallel and Distributed Processing (formally Parallel Processing) is a seminar mixing theory and programming that explores the issues facing today's programmers in need to process data existing in either a large volume, or distributed over the Internet.
+
[[Image:SmilingPython.png | right | 75px]]Parallel and Distributed Processing (formally Parallel Processing) is a seminar mixing theory and programming that explores the issues facing today's programmers in need to process data existing in either a large volume, or distributed over the Internet.
  
 
The class mixes lectures, the reading and presentation of research papers, and programming assignments/projects.
 
The class mixes lectures, the reading and presentation of research papers, and programming assignments/projects.
  
We start at the micro level of parallelism, revisiting processor interrupts and their functionality, observing once again (see your notes on assembly language and operating systems) that they are the main agent of parallelism in a computer.  After a quick review of interrupts, we move to threading with Python, using this platform to study how performance is assessed in a parallel environment, and how to recognize problems associated with sharing resources, including deadlocks, deadlock detection, and deadlock prevention.  A first project caps the unit on Python threads.
+
[[Image:XgridLogo.png | right | 75px]] We start at the micro level of parallelism, revisiting processor interrupts and their functionality, observing once again (see your notes on assembly language and operating systems) that they are the main agent of parallelism in a computer.  After a quick review of interrupts, we move to threading with Python, using this platform to study how performance is assessed in a parallel environment, and how to recognize problems associated with sharing resources, including deadlocks, deadlock detection, and deadlock prevention.  A first project caps the unit on Python threads.
  
 
We next switch scale and work with distributed processing and explore ''grid computing'' with Apple's XGrid environment on Smith College's 88-processor XGrid cluster, and a project caps this unit.
 
We next switch scale and work with distributed processing and explore ''grid computing'' with Apple's XGrid environment on Smith College's 88-processor XGrid cluster, and a project caps this unit.
  
The final paradigm we visit is parallel on a grand scale: Google's Map-Reduce programming solution for processing large amount of textual data.  We will explore Hadoop, the open-source version of Map-Reduce on a local cluster of computers which will be built from scratch during the beginning of the semester.  A project will cap this unit as well.  
+
[[Image:HadoopCartoon.png | right | 75px]] The final paradigm we visit is parallel on a grand scale: Google's Map-Reduce programming solution for processing large amount of textual data.  We will explore Hadoop, the open-source version of Map-Reduce on a local cluster of computers which will be built from scratch during the beginning of the semester.  A project will cap this unit as well.
  
 
=Class Notes=
 
=Class Notes=
  
Everybody is responsible for transcribing the notes for the class and posting them on the wiki, in a rotation pattern (roughly once a month for each person in the class).
+
Everybody will be responsible for transcribing the notes for the class and posting them on the wiki, in a rotation pattern (roughly once a month for each person in the class).
   
+
 
 +
=Homework assignments/Projects=
 +
There will be homework assignments and 3 projects.  The homework assignments will be used to create various solutions that will be included in the projects.  
 +
 
 +
There will be 3 projects, roughly one month apart, and capping the material covered in each section.  More details will be available as we go along.  The current project ideas are the following:
 +
;Project 1:
 +
:Threading in Python: given two lists of keywords, List1 and List2, retrieve docs from a site (xgridmac.dyndns.org, yahoo, google) that respond/match List1.  Filter the docs received and keep only those that contain most of the words in List2.
 +
 
 +
;Project 2:
 +
:XGrid: process a gzip xml dump of wikipedia and break it up into individual pages (9 million or so of them)!
 +
 
 +
;Project 3:
 +
:Map-Reduce: process wikipedia pages and create an index of words and their associated categories
 +
 
 +
;Project 4:
 +
:Setup of Cloud Cluster.  Self-scheduled, lasting until Spring break.  Teams of two students will setup a PC with Ubuntu and Hadoop and contribute to documentation ([http://cs.smith.edu/classwiki/index.php/CSC352_Hadoop_Cluster_Howto#Workstation_Setup Wiki Setup Page])
 +
 
 
=Smith Cloud=
 
=Smith Cloud=
  
Line 38: Line 50:
 
=Presentations=
 
=Presentations=
  
We'll read, present and discuss papers during the semester.  Most papers are already posted on the [[CSC352_Schedule | schedule page]].  More information will be available as we proceed through the semester.
+
We'll read, present and discuss papers during the semester.  Most papers are already posted on the [[CSC352 Resources | Links &amp; Resources]] page.  More information will be available as we proceed through the semester.
 +
 
 +
<!--For the presenters, the following [http://www.cs.swarthmore.edu/~newhall/presentation.html page] from Prof. Tia Newall of Swarthmore College for good advice on preparing a presentation.-->
 +
 
 +
Whenever a paper is scheduled for presentation or discussion, everybody not presenting the paper is responsible for handing out at the beginning of the class a one-page (possibly two pages) with a summary of the paper, in 3 parts:
 +
* a one-sentence summary of the paper
 +
* a one-paragraph summary of the paper
 +
* a half-page summary of the paper.
  
 
=Prerequisites=
 
=Prerequisites=
  
Algorithms CSC252, or permission of the instructor.
+
Algorithms CSC252, or permission of the instructor.  A good knowledge of C and Java is important.
  
 
=Schedule=
 
=Schedule=
Line 50: Line 69:
 
=Textbook=
 
=Textbook=
  
There are no textbooks for this course.  The Web has a rich collection of documents we'll be using and which are catalogued in the [[CSC352_Schedule| schedule page].
+
There are no textbooks for this course.  The Web has a rich collection of documents we'll be using and which are catalogued in the[[CSC352 Resources | Links &amp; Resources]] page.
  
 
=Other Sources of Material=
 
=Other Sources of Material=
Line 58: Line 77:
 
=Lateness Policy=
 
=Lateness Policy=
  
No late assignments/projects will be accepted (except in case of documented illness or personal difficulties).  
+
No late assignment/paper summariy/project will be accepted (except in case of documented illness or personal difficulties).
 +
Do your work on time!
 +
 
 +
<font color="red">You can, however, drop any one homework assignment and any one reading assignment without penalty.</font>  If you do not drop any assignment and do not drop any assigned reading, I will remove the ones with the lowest grade automatically.
  
 
=Grading=
 
=Grading=
 +
 +
You can pick between 3 options for the final grade.  If you do not make your choice known <font color="red">before the last day of class</font>, Option 1, the original grading option, will be used.
 +
 +
==Option 1==
  
 
{|
 
{|
|  
+
|
Class participation <br />
+
Class participation (summaries, class notes, discussion) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br />
Projects <br />
+
Homework <br />
Presentation <br />
+
Projects (equal weight for all 3)<br />
|-
+
Paper presentations &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
20% <br />
+
|
 +
10% <br />
 +
15% <br />
 
60% <br />
 
60% <br />
 +
15%
 +
|}
 +
 +
==Option 2==
 +
{|
 +
|
 +
Class participation (summaries, class notes, discussion) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br />
 +
Homework <br />
 +
Projects (equal weight for all 3)<br />
 +
Paper presentations &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
 +
|
 +
15% <br />
 
20% <br />
 
20% <br />
 +
50% <br />
 +
15%
 +
|}
 +
 +
==Option 3 ==
 +
Option 3 is the same as Option 1 but with more weight for Project 3.
 +
 +
{|
 +
|
 +
Class participation (summaries, class notes, discussion) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br />
 +
Homework <br />
 +
Project 1 <br />
 +
Project 2 <br />
 +
Project 3 <br />
 +
Paper presentations &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
 +
|
 +
10% <br />
 +
15% <br />
 +
10% <br />
 +
10% <br />
 +
40% <br />
 +
15%
 
|}
 
|}
  
Line 76: Line 138:
  
 
No TA for this class.
 
No TA for this class.
 +
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
<br />
 +
[[Category:CSC352]][[Category:Class]][[Category:Syllabus]]
 +
<br />

Latest revision as of 23:38, 9 May 2010



Prof. Dominique Thiébaut
Dept. Computer Science
Ford Hall 356
Telephone: 3854
Office hours TBA and by appointments

Introduction

Parallel and Distributed Processing (formerly Parallel Processing) is a seminar mixing theory and programming. It explores the issues facing today's programmers who need to process data that either exists in large volume or is distributed over the Internet.

The class mixes lectures, the reading and presentation of research papers, and programming assignments/projects.

We start at the micro level of parallelism, revisiting processor interrupts and their functionality, observing once again (see your notes on assembly language and operating systems) that they are the main agent of parallelism in a computer. After a quick review of interrupts, we move to threading with Python, using this platform to study how performance is assessed in a parallel environment, and how to recognize problems associated with sharing resources, including deadlocks, deadlock detection, and deadlock prevention. A first project caps the unit on Python threads.
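
As an illustration of deadlock prevention with Python threads, here is a minimal sketch, not course material. It assumes a hypothetical `Account` class in which each object carries its own lock, and it prevents deadlock by always acquiring the two locks in the same global order (here, by `id()`), regardless of the direction of the transfer.

```python
import threading


class Account:
    """Hypothetical shared resource guarded by its own lock."""

    def __init__(self, name, balance):
        self.name = name
        self.balance = balance
        self.lock = threading.Lock()


def transfer(src, dst, amount):
    """Move `amount` from src to dst without risking a deadlock."""
    if src is dst:
        return  # nothing to do, and locking the same Lock twice would hang
    # Deadlock prevention: both threads take the locks in one global order,
    # so a circular wait between two transfers cannot form.
    first, second = sorted((src, dst), key=id)
    with first.lock:
        with second.lock:
            src.balance -= amount
            dst.balance += amount


def run_demo(rounds=1000):
    """Run two threads transferring in opposite directions."""
    a = Account("a", 100)
    b = Account("b", 100)

    def a_to_b():
        for _ in range(rounds):
            transfer(a, b, 1)

    def b_to_a():
        for _ in range(rounds):
            transfer(b, a, 1)

    threads = [threading.Thread(target=a_to_b), threading.Thread(target=b_to_a)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return a.balance, b.balance
```

If each thread instead locked its own source account first, the two threads could each hold one lock while waiting for the other, which is the classic circular-wait deadlock.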

We next switch scale to distributed processing and explore grid computing with Apple's XGrid environment on Smith College's 88-processor XGrid cluster. A project caps this unit.

The final paradigm we visit is parallelism on a grand scale: Google's Map-Reduce programming solution for processing large amounts of textual data. We will explore Hadoop, the open-source version of Map-Reduce, on a local cluster of computers that will be built from scratch at the beginning of the semester. A project will cap this unit as well.
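
To give a feel for the Map-Reduce model before touching Hadoop, the following is a minimal single-process sketch in plain Python. It does not use the Hadoop API; the function names `map_phase`, `shuffle`, and `reduce_phase` are illustrative assumptions, and a real cluster would run the map and reduce steps on many machines.

```python
from collections import defaultdict


def map_phase(doc_id, text):
    """Map step: emit a (word, 1) pair for every word in one document."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Shuffle step: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word, counts):
    """Reduce step: combine the values collected for one key."""
    return word, sum(counts)


def word_count(documents):
    """Run map, shuffle, and reduce over a dict of {doc_id: text}."""
    pairs = []
    for doc_id, text in documents.items():
        pairs.extend(map_phase(doc_id, text))
    return dict(reduce_phase(w, c) for w, c in shuffle(pairs).items())
```

Calling `word_count({"d1": "the cat", "d2": "The dog"})` counts "the" twice because the map step lowercases its input.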

Class Notes

Everybody will be responsible for transcribing the notes for the class and posting them on the wiki, in a rotation pattern (roughly once a month for each person in the class).

Homework assignments/Projects

There will be homework assignments and 3 projects. The homework assignments will be used to create various solutions that will be included in the projects.

There will be 3 projects, roughly one month apart, each capping the material covered in one section. More details will be available as we go along. The current project ideas are the following:

Project 1
Threading in Python: given two lists of keywords, List1 and List2, retrieve documents from a site (xgridmac.dyndns.org, yahoo, google) that match List1. Filter the documents received and keep only those that contain most of the words in List2.
Project 2
XGrid: process a gzipped XML dump of Wikipedia and break it up into individual pages (9 million or so of them)!
Project 3
Map-Reduce: process Wikipedia pages and create an index of words and their associated categories.
Project 4
Setup of Cloud Cluster. Self-scheduled, lasting until Spring break. Teams of two students will set up a PC with Ubuntu and Hadoop and contribute to documentation (Wiki Setup Page).
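
The filtering half of the Project 1 idea can be sketched as follows. This is only an illustration under stated assumptions: the `fetch` argument is a hypothetical callable that returns the text of a document for a given URL (the real project would query a site with List1), and "most of the words" is interpreted here as more than half of List2.

```python
from concurrent.futures import ThreadPoolExecutor


def contains_most(text, keywords, threshold=0.5):
    """True if more than `threshold` of the keywords occur in the text."""
    words = set(text.lower().split())
    hits = sum(1 for k in keywords if k.lower() in words)
    return hits > threshold * len(keywords)


def fetch_and_filter(urls, fetch, list2, workers=4):
    """Fetch all URLs with a pool of threads, then keep the matching ones.

    `fetch` is an assumed callable mapping a URL to document text.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        texts = list(pool.map(fetch, urls))  # results keep the order of urls
    return [u for u, t in zip(urls, texts) if contains_most(t, list2)]
```

Threads suit this task because each worker spends most of its time waiting on the network, so several downloads can overlap even in a single Python process.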

Smith Cloud

Six PCs recovered from Burton Basement are waiting to be reincarnated as a networked cluster of Ubuntu machines running the Hadoop software. Once initialized and connected together, they will form Smith's first cloud computing platform. One of the required projects for the class is for students to pair up in teams, with each team setting up one of the computers and documenting the process in the class wiki.

Presentations

We'll read, present and discuss papers during the semester. Most papers are already posted on the Links & Resources page. More information will be available as we proceed through the semester.


Whenever a paper is scheduled for presentation or discussion, everybody not presenting the paper is responsible for handing out, at the beginning of the class, a one-page (possibly two-page) summary of the paper, in 3 parts:

  • a one-sentence summary of the paper
  • a one-paragraph summary of the paper
  • a half-page summary of the paper.

Prerequisites

Algorithms CSC252, or permission of the instructor. A good knowledge of C and Java is important.

Schedule

The class meets twice a week, on Tuesdays and Thursdays, 10:30 am - 11:50 am, in Ford Hall 342.

Textbook

There are no textbooks for this course. The Web has a rich collection of documents that we'll be using, which are catalogued on the Links & Resources page.

Other Sources of Material

The science library has a good collection of books on parallel processing and algorithms that you might find useful for supplementing the material presented and covered in class. "Parallel algorithm", "Parallel Programming," or "Grid Computing" are good keywords to start a search on.

Lateness Policy

No late assignment, paper summary, or project will be accepted (except in case of documented illness or personal difficulties). Do your work on time!

You can, however, drop any one homework assignment and any one reading assignment without penalty. If you do not drop any assignment and do not drop any assigned reading, I will remove the ones with the lowest grade automatically.

Grading

You can pick between 3 options for the final grade. If you do not make your choice known before the last day of class, Option 1, the original grading option, will be used.

Option 1

Class participation (summaries, class notes, discussion): 10%
Homework: 15%
Projects (equal weight for all 3): 60%
Paper presentations: 15%

Option 2

Class participation (summaries, class notes, discussion): 15%
Homework: 20%
Projects (equal weight for all 3): 50%
Paper presentations: 15%

Option 3

Option 3 is the same as Option 1 but with more weight for Project 3.

Class participation (summaries, class notes, discussion): 10%
Homework: 15%
Project 1: 10%
Project 2: 10%
Project 3: 40%
Paper presentations: 15%
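
To compare the three options for a given set of scores, the weights above can be put into a short script. This is a sketch for illustration only: the component names and the 0 to 100 score scale are assumptions, and the "equal weight" project percentages in Options 1 and 2 are split evenly across the 3 projects.

```python
# Weights in percent, taken from the three grading options.
WEIGHTS = {
    1: {"participation": 10, "homework": 15, "project1": 20,
        "project2": 20, "project3": 20, "presentations": 15},
    2: {"participation": 15, "homework": 20, "project1": 50 / 3,
        "project2": 50 / 3, "project3": 50 / 3, "presentations": 15},
    3: {"participation": 10, "homework": 15, "project1": 10,
        "project2": 10, "project3": 40, "presentations": 15},
}


def final_grade(scores, option=1):
    """Weighted average of component scores (each assumed to be 0-100)."""
    weights = WEIGHTS[option]
    return sum(weights[k] * scores[k] for k in weights) / 100


def best_option(scores):
    """Return the option number that yields the highest final grade."""
    return max(WEIGHTS, key=lambda opt: final_grade(scores, opt))
```

A student whose Project 3 score is well above the other two project scores will generally do best under Option 3, since it moves weight from Projects 1 and 2 to Project 3.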

Teaching Assistants

No TA for this class.