Difference between revisions of "CSC352 Syllabus 2013"


Revision as of 10:09, 9 August 2013

--D. Thiebaut (talk) 10:55, 9 August 2013 (EDT)




Prof

Dominique Thiébaut (dthiebaut at smith.edu)
Dept. Computer Science
Ford Hall 356
Telephone: 3854
Office hours TBA and by appointment

Introduction

Parallel and Distributed Processing (formerly Parallel Processing) is a seminar mixing theory and programming. It explores the issues facing today's programmers who need to process data that either exists in large volume or is distributed over a network (local or the Internet).

This semester the course centers on one problem and on formulating a solution for it. The problem is to take a large collection of images (possibly several million) and create a collage in which each image is scaled according to some quantity (popularity, frequency of use, date of posting, date of last viewing, etc.). The goal is to understand the various tasks required to process such a large amount of data, to investigate various parallel computing resources, and to test different approaches to solving this problem in an acceptable time (minutes or hours rather than days or years of computation!).

The class mixes lectures, the reading and presentation of research papers, and programming assignments/projects.

We discuss different levels of parallelism, uncovering it in processors and in operating systems, and sink our teeth into multi-threading, which we explore with Java. We learn about various theoretical approaches to sharing data safely (e.g. semaphores), and the problems one can expect when not adopting safe solutions (e.g. deadlocks).
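As a rough illustration of the kind of safe data sharing mentioned above, here is a minimal sketch in which several Java threads increment one shared counter, with a binary semaphore guarding each update. The class name SemaphoreCounter and the method runDemo are made up for this example and are not part of any course material; it assumes Java 8 or later for the lambda syntax.

```java
import java.util.concurrent.Semaphore;

public class SemaphoreCounter {
    // A semaphore with a single permit behaves like a mutual-exclusion lock.
    private final Semaphore mutex = new Semaphore(1);
    private int counter = 0;

    // Increment the shared counter while holding the permit.
    public void increment() throws InterruptedException {
        mutex.acquire();
        try {
            counter++;            // critical section: one thread at a time
        } finally {
            mutex.release();      // always give the permit back
        }
    }

    // Hypothetical helper: start numThreads workers, each incrementing
    // the counter incrementsPerThread times, and return the final total.
    public static int runDemo(int numThreads, int incrementsPerThread)
            throws InterruptedException {
        SemaphoreCounter shared = new SemaphoreCounter();
        Thread[] workers = new Thread[numThreads];
        for (int i = 0; i < numThreads; i++) {
            workers[i] = new Thread(() -> {
                try {
                    for (int j = 0; j < incrementsPerThread; j++) {
                        shared.increment();
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
            workers[i].start();
        }
        for (Thread worker : workers) {
            worker.join();        // wait for every worker to finish
        }
        return shared.counter;
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println(runDemo(4, 10000));
    }
}
```

Without the semaphore, two threads could read the same old value of the counter and one increment would be lost. A deadlock, by contrast, typically appears when threads acquire two or more such permits in different orders and each waits forever for the other.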

The final paradigm we visit is the processing of data in parallel on a grand scale, and we learn about Google's Map-Reduce solution for processing large amounts of textual data. We study Hadoop, the open-source version of Map-Reduce, on a local cluster of computers.
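To give a feel for the Map-Reduce idea before any cluster is involved, the sketch below counts words with three explicit phases (map, shuffle, reduce) in plain Java on a single machine. It deliberately does not use the Hadoop API; the class WordCountSketch and its method names are invented for this illustration, and a real Hadoop job would spread the same phases over many machines.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class WordCountSketch {
    // Map phase: emit a (word, 1) pair for every word in one line of text.
    public static List<Map.Entry<String, Integer>> map(String line) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String word : line.toLowerCase().split("\\W+")) {
            if (!word.isEmpty()) {
                pairs.add(new SimpleEntry<String, Integer>(word, 1));
            }
        }
        return pairs;
    }

    // Shuffle phase: group all emitted values by their key.
    public static Map<String, List<Integer>> shuffle(
            List<Map.Entry<String, Integer>> pairs) {
        Map<String, List<Integer>> groups = new TreeMap<>();
        for (Map.Entry<String, Integer> pair : pairs) {
            groups.computeIfAbsent(pair.getKey(), k -> new ArrayList<>())
                  .add(pair.getValue());
        }
        return groups;
    }

    // Reduce phase: combine the values collected for one key into a total.
    public static int reduce(String word, List<Integer> counts) {
        int sum = 0;
        for (int count : counts) {
            sum += count;
        }
        return sum;
    }

    // Run the three phases in sequence over a small in-memory input.
    public static Map<String, Integer> wordCount(List<String> lines) {
        List<Map.Entry<String, Integer>> pairs = new ArrayList<>();
        for (String line : lines) {
            pairs.addAll(map(line));
        }
        Map<String, Integer> totals = new TreeMap<>();
        for (Map.Entry<String, List<Integer>> group : shuffle(pairs).entrySet()) {
            totals.put(group.getKey(), reduce(group.getKey(), group.getValue()));
        }
        return totals;
    }

    public static void main(String[] args) {
        // prints {cat=1, dog=1, the=2}
        System.out.println(wordCount(Arrays.asList("the cat", "The dog")));
    }
}
```

The point of the split is that each call to map depends only on its own line, and each call to reduce depends only on its own key, so both phases can in principle run on different machines at the same time.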

The goal of the semester's work is a project with two parts: a parallel program that solves one of the problems associated with processing the collection of images, and a scientific paper, written in the LaTeX document formatting language, describing the semester-long research.

Newsletter

Everybody will be responsible for generating a 2-page newsletter every other week.

Homework assignments/Projects

There will be homework assignments and a project. The homework assignments will contribute to the advancement of the project.

Smith Cloud

We will use different computer clusters available on campus. More on this later.

Presentations

We'll read, present and discuss papers during the semester. Papers will be posted on the Links & Resources page. More information will be available as we proceed through the semester.


Whenever a paper is scheduled for presentation or discussion, everybody not presenting the paper is responsible for handing in, at the beginning of class, a one-page (possibly two-page) summary of the paper in 3 parts:

  • a one-sentence summary of the paper
  • a one-paragraph summary of the paper
  • a half-page summary of the paper.

Prerequisites

CSC252 (Algorithms), or permission of the instructor. A good knowledge of Java is important.

Schedule

The class meets twice a week, on Tuesdays and Thursdays, 1:00-2:50 p.m., in Ford Hall 345.

Textbook

There are no textbooks for this course. The Web has a rich collection of documents that we'll be using, and they are catalogued on the Links & Resources page.

Other Sources of Material

The science library has a good collection of books on parallel processing and algorithms that you might find useful for supplementing the material covered in class. "Parallel Algorithm," "Parallel Programming," and "Grid Computing" are good keywords with which to start a search.

Lateness Policy

No late assignment, paper summary, or project will be accepted (except in case of documented illness or personal difficulties). Do your work on time!

You can, however, drop any one homework assignment and any one reading assignment without penalty. If you hand in every homework assignment and every reading summary, I will automatically remove the one with the lowest grade in each category.

Grading

Class participation (summaries, class notes, discussion)    10%
Homework                                                    15%
Project                                                     60%
Paper presentations                                         15%


Teaching Assistants

No TA for this class.