Difference between revisions of "CSC352 Syllabus -- Spring 2017"

From dftwiki3
(Software)
 
(7 intermediate revisions by the same user not shown)
Line 19: Line 19:
 
<br />
 
<br />
 
<center>[[CSC_352_--_2017 | Home]] | [[CSC352 Syllabus -- Spring 2017 | Syllabus]] | [[CSC352 Class Page 2017 | Schedule]] |
 
<center>[[CSC_352_--_2017 | Home]] | [[CSC352 Syllabus -- Spring 2017 | Syllabus]] | [[CSC352 Class Page 2017 | Schedule]] |
[[CSC352_Class_Page_2017#Links_and_Resources | Links &amp; Resources]]</center><br />
+
[[CSC352_Class_Page_2017#Links_and_Resources | Links &amp; Resources]] | [[CSC352_Project_Page_2017|Final Project]]</center><br />
 
<br />
 
<br />
  
Line 35: Line 35:
 
Ford Hall 356<br />
 
Ford Hall 356<br />
 
Telephone: 3854<br />
 
Telephone: 3854<br />
Office hours '''Wed 1-4 p.m'''.  and by appointments
+
Office hours '''TuTh 3-5 p.m'''.  and by appointments
 
<br />
 
<br />
  
 
=Introduction=
 
=Introduction=
  
Parallel and Distributed Processing is a seminar mixing '''theory''', '''programming''' and '''research'''.  It explores the issues facing today's programmers in need of process data existing in either a large volume, or distributed over a network (local area, or wide area).
+
Parallel and Distributed Processing is a seminar mixing '''theory''', '''programming''' and '''research'''.  It explores the issues facing today's programmers needing to process data in an efficient way, either because its size, or because the computation is required to be fast.
 
    
 
    
 
The goal of the seminar is to understand how to devise solutions for the various tasks required to process, catalog, sort, and display such a large amount of data.  In the process we will investigate various parallel computing tools, and test different approaches to solve this problem in an acceptable time (minutes or hours rather than days or years of computation!).
 
The goal of the seminar is to understand how to devise solutions for the various tasks required to process, catalog, sort, and display such a large amount of data.  In the process we will investigate various parallel computing tools, and test different approaches to solve this problem in an acceptable time (minutes or hours rather than days or years of computation!).
  
The class mixes lectures, the reading and presentation of research papers, and programming assignments/projects.
+
The class mixes lectures, the reading and presentation of research papers, and programming assignments/[[CSC352_Project_Page_2017 | projects]].
  
 
==Topics Covered==
 
==Topics Covered==
Line 55: Line 55:
 
* ''Message-Passing'' with ''MPI''
 
* ''Message-Passing'' with ''MPI''
 
* ''Manager-Worker'' paradigm on MPI
 
* ''Manager-Worker'' paradigm on MPI
* ''MPI'' and "MySQL''
 
 
* ''Amdahl's law''
 
* ''Amdahl's law''
 
* ''Data caching''
 
* ''Data caching''
* Creating an ''MPI cluster'' and running applications on it
 
 
* ''MapReduce'' and ''Hadoop.''  Running Hadoop applications
 
* ''MapReduce'' and ''Hadoop.''  Running Hadoop applications
  
 
   
 
   
A group project will cap the end of the semester.  The goal of the project will be to pick a topic or problem, generate a parallel solution to solve it, and comparing its performance to the current state of the art, and reporting the results in a research paper written in Latex.
+
A [[CSC352_Project_Page_2017|group project]] will cap the end of the semester.  The goal of the project will be to pick a topic or problem, generate a parallel solution to solve it, and comparing its performance to the current state of the art, and reporting the results in a research paper written in Latex.
 
<br />
 
<br />
  
 
=Newsletter=
 
=Newsletter=
 
+
<br />
 
Students will be responsible for generating a 2-page newsletter every other week.   
 
Students will be responsible for generating a 2-page newsletter every other week.   
 
<br />
 
<br />
  
 
=Homework assignments/Projects=
 
=Homework assignments/Projects=
 
+
<br />
There will be homework assignments and a project.  The homework assignments will contribute to the advancement of the overall project.
+
There will be homework assignments and a [[CSC352_Project_Page_2017|project]].
 
<br />
 
<br />
 
=Piazza=
 
=Piazza=
 +
<br />
 +
We will use Piazza for on-line discussion of issues related to the class material. 
  
On an experimental basis, we will use Piazza four on-line discussion of issues related to the class material.
+
Find our class page at: [https://piazza.com/smith/spring2017/csc352/home  https://piazza.com/smith/spring2017/csc352/home], and its user guide [http://www.piazza.com/pdfs/piazza_product_introduction.pdf here].
The system is  catered to getting you help fast and efficiently from classmates, and your instructor.   When a question is about an assignment, a software bug, or something the whole class could benefit knowing about, you are encouraged to post your questions on Piazza.  
+
<br />
  
Find our class page at: [https://piazza.com/smith/spring2017/csc352/home https://piazza.com/smith/spring2017/csc352/home], and its user guide [http://www.piazza.com/pdfs/piazza_product_introduction.pdf  here].
+
=Smith Cloud=
 
<br />
 
<br />
=Smith Cloud=
+
We will use different computer servers and clusters available on campus, and on Amazon.  More information will be released as the course progresses.
 
 
We will use different computer clusters available on campus.  More information will be released as the course progresses.
 
 
<br />
 
<br />
 
=Presentations=
 
=Presentations=
 +
<br />
 +
We will read, present and discuss papers during the semester.  Papers will be posted on the [[CSC352 Resources 2017| Links &amp; Resources]] page.  More information will be available as we progress through the semester.
  
We'll read, present and discuss papers during the semester.  Papers will be posted on the [[CSC352 Resources 2017| Links &amp; Resources]] page.  More information will be available as we proceed through the semester.
+
Whenever a paper is scheduled for presentation or discussion, everybody not presenting the paper will be <font color="magenta">responsible for handing out at the beginning of the class a one-page summary of the paper, formatted in Latex.</font>
 
 
<!--For the presenters, the following [http://www.cs.swarthmore.edu/~newhall/presentation.html page] from Prof. Tia Newall of Swarthmore College for good advice on preparing a presentation.-->
 
 
 
Whenever a paper is scheduled for presentation or discussion, everybody not presenting the paper will be responsible for handing out at the beginning of the class a one-page summary of the paper, formatted in Latex.
 
 
<br />
 
<br />
 
=Prerequisites=
 
=Prerequisites=
 
+
<br />
Algorithms CSC252, or permission of the instructor.  A good knowledge of  Java is important.
+
Algorithms CSC252 is waved.  A good knowledge of  Java and Python is important.
 
<br />
 
<br />
 
=Schedule=
 
=Schedule=
 
+
<br />
The class meets twice a week, on Tuesdays and Thursdays, 1:00-2:50 p.m., in '''Ford Hall 345'''.
+
The class meets twice a week, on Tuesdays and Thursdays, 1:00-2:50 p.m., in '''Ford Hall 345'''.  Assuming everybody's availability, we will meet during the lunch hour section before class the last 4 lectures of the semester.
 
<br />
 
<br />
  
 
=Textbook=
 
=Textbook=
 
+
<br />
 
There are no textbooks for this course.  The Web has a rich collection of documents we'll be using and which are catalogued in the[[CSC352 Resources | Links &amp; Resources]] page.
 
There are no textbooks for this course.  The Web has a rich collection of documents we'll be using and which are catalogued in the[[CSC352 Resources | Links &amp; Resources]] page.
 
<br />
 
<br />
 
=Other Sources of Material=
 
=Other Sources of Material=
 
+
<br />
 
The science library has a good collection of books on parallel processing and algorithms that you might find useful for supplementing the material presented and covered in class.  "Parallel algorithm", "Parallel Programming," or "Grid Computing" are good keywords to start a search on.
 
The science library has a good collection of books on parallel processing and algorithms that you might find useful for supplementing the material presented and covered in class.  "Parallel algorithm", "Parallel Programming," or "Grid Computing" are good keywords to start a search on.
 +
<br />
 +
Rocco Piccinino, the head of the Science Library, will be giving a presentation on some of the library resources we have available on Feb 16, 2017.
 
<br />
 
<br />
 
=Lateness Policy=
 
=Lateness Policy=
 
+
<br />
 
No late assignment/paper summariy/project will be accepted (except in case of ''documented'' illness or personal difficulties).
 
No late assignment/paper summariy/project will be accepted (except in case of ''documented'' illness or personal difficulties).
 
Do your work on time!
 
Do your work on time!
Line 118: Line 116:
 
<br />
 
<br />
 
=Grading=
 
=Grading=
 
+
<br />
  
 
{|
 
{|
 
|
 
|
Class participation (summaries, class notes, discussion) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br />
+
Class participation (class notes, discussions) &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp; <br />
Homework <br />
+
Homework (summaries, newsletter, programs)<br />
 
Project<br />
 
Project<br />
 
Paper presentations &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
 
Paper presentations &nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;<br />
 
|
 
|
 
10% <br />
 
10% <br />
15% <br />
+
35% <br />
60% <br />
+
35% <br />
15%  
+
20%  
 
|}
 
|}
  
 
<br />
 
<br />
 +
 
=Teaching Assistants=
 
=Teaching Assistants=
 
+
<br />
 
No TA for this class.
 
No TA for this class.
 
<br />
 
<br />
Line 145: Line 144:
 
* [http://matplotlib.org/users/intro.html MatPlotLib] or [http://www.harding.edu/fmccown/r/ R] for processing data and generating graphs.
 
* [http://matplotlib.org/users/intro.html MatPlotLib] or [http://www.harding.edu/fmccown/r/ R] for processing data and generating graphs.
 
* [http://www.mpich.org/ MPI], the Message-Passing Interface platform for parallel programs.  It is installed on beowulf and beowulf2, but you may like to also have it on your computer, although it is not necessary.
 
* [http://www.mpich.org/ MPI], the Message-Passing Interface platform for parallel programs.  It is installed on beowulf and beowulf2, but you may like to also have it on your computer, although it is not necessary.
 +
* [https://en.wikipedia.org/wiki/Apache_Hadoop Hadoop], the open-source version of MapReduce.
 
<br />
 
<br />
 
<br />
 
<br />

Latest revision as of 10:37, 21 March 2017

--D. Thiebaut (talk) 10:32, 17 October 2016 (EDT)


                             



Limited Enrollment


This class is a seminar, with limited enrollment. Make sure you pre-register early to be able to get in. Priority will be given to

  • Seniors, CS majors
  • Juniors, CS majors
  • CS & SDS majors
  • Others


Instructor

Prof. Dominique Thiébaut (dthiebaut at smith.edu)
Dept. Computer Science
Ford Hall 356
Telephone: 3854
Office hours TuTh 3-5 p.m. and by appointment

Introduction

Parallel and Distributed Processing is a seminar mixing theory, programming and research. It explores the issues facing today's programmers who need to process data efficiently, either because of its size or because the computation must be fast.

The goal of the seminar is to understand how to devise solutions for the various tasks required to process, catalog, sort, and display large amounts of data. In the process we will investigate various parallel computing tools, and test different approaches to solving these problems in an acceptable time (minutes or hours rather than days or years of computation!).

The class mixes lectures, the reading and presentation of research papers, and programming assignments/projects.

Topics Covered

  • Writing research papers with Latex
  • Taxonomy of parallel architectures
  • The elementary parallel concept: processor interrupts in the 80X86 processor
  • Differences between processes and threads
  • Data-sharing programming with Java threads, how to avoid data inconsistency and deadlocks with mutexes, locks and semaphores.
  • Learning the language C and working in a Bash console
  • Message-Passing with MPI
  • Manager-Worker paradigm on MPI
  • Amdahl's law
  • Data caching
  • MapReduce and Hadoop. Running Hadoop applications
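
As a small illustration of one of the topics above, Amdahl's law bounds the speedup of a program when only a fraction p of its running time can be parallelized over n processors: speedup = 1 / ((1 - p) + p / n). The sketch below is not taken from the course material; the function name and the sample values are assumptions chosen only for illustration.

```python
def amdahl_speedup(parallel_fraction: float, processors: int) -> float:
    """Return the upper bound on speedup predicted by Amdahl's law.

    parallel_fraction: share of the running time that can be parallelized (0..1).
    processors: number of processors the parallel part is spread over (>= 1).
    Both parameter names are hypothetical, chosen for this illustration.
    """
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if processors < 1:
        raise ValueError("processors must be at least 1")
    # The serial part is not helped by adding processors.
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / processors)


if __name__ == "__main__":
    # Illustrative case: 90% of the work is assumed to be parallelizable.
    for n in (1, 2, 4, 8, 16, 1024):
        print(f"{n:5d} processors: speedup <= {amdahl_speedup(0.9, n):.2f}")
```

With 90% of the work parallelizable, the bound approaches but never reaches 10, however many processors are added, because the remaining 10% always runs serially.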


A group project will cap the end of the semester. The goal of the project will be to pick a topic or problem, develop a parallel solution for it, compare its performance to the current state of the art, and report the results in a research paper written in Latex.

Newsletter


Students will be responsible for generating a 2-page newsletter every other week.

Homework assignments/Projects


There will be homework assignments and a project.

Piazza


We will use Piazza for on-line discussion of issues related to the class material.

Find our class page at https://piazza.com/smith/spring2017/csc352/home, and its user guide at http://www.piazza.com/pdfs/piazza_product_introduction.pdf.

Smith Cloud


We will use different computer servers and clusters available on campus, and on Amazon. More information will be released as the course progresses.

Presentations


We will read, present and discuss papers during the semester. Papers will be posted on the Links & Resources page. More information will be available as we progress through the semester.

Whenever a paper is scheduled for presentation or discussion, everybody not presenting the paper will be responsible for handing out at the beginning of the class a one-page summary of the paper, formatted in Latex.

Prerequisites


The Algorithms (CSC252) prerequisite is waived. A good knowledge of Java and Python is important.

Schedule


The class meets twice a week, on Tuesdays and Thursdays, 1:00-2:50 p.m., in Ford Hall 345. Provided everybody is available, we will also meet during the lunch hour before class for the last 4 lectures of the semester.

Textbook


There are no textbooks for this course. The Web has a rich collection of documents we'll be using and which are catalogued in the Links & Resources page.

Other Sources of Material


The science library has a good collection of books on parallel processing and algorithms that you might find useful for supplementing the material presented and covered in class. "Parallel algorithm", "Parallel Programming," or "Grid Computing" are good keywords to start a search on.
Rocco Piccinino, the head of the Science Library, will give a presentation on Feb 16, 2017, about some of the library resources available to us.

Lateness Policy


No late assignment/paper summary/project will be accepted (except in case of documented illness or personal difficulties). Do your work on time!

You can, however, drop any one homework assignment and any one reading assignment without penalty. If you do not drop a homework assignment or an assigned reading, I will automatically drop the one with the lowest grade in that category.

Grading


  • Class participation (class notes, discussions): 10%
  • Homework (summaries, newsletter, programs): 35%
  • Project: 35%
  • Paper presentations: 20%


Teaching Assistants


No TA for this class.

Software

Below is a non-exhaustive list of software packages we'll use in the class. You may want to investigate installing them on your computer.

  • Java and Eclipse. All serious programmers should know how to use Eclipse, and should have it installed on their computer. One advantage of having Eclipse is that it supports Processing with very little additional effort (see these tutorials for examples of how to set this up).
  • Latex for writing scientific papers. TexStudio is a good visual editor, but there is also a nice on-line editor at sharelatex.com that does not require any installation and works well.
  • MatPlotLib or R for processing data and generating graphs.
  • MPI, the Message-Passing Interface platform for parallel programs. It is installed on beowulf and beowulf2, but you may like to also have it on your computer, although it is not necessary.
  • Hadoop, the open-source version of MapReduce.