Difference between revisions of "CSC352 Class Page 2013"

From dftwiki3
(Weekly Schedule)
 
(19 intermediate revisions by the same user not shown)
Line 61: Line 61:
 
*** Regroup and gather statistics on the different machines in the classroom
 
*** Regroup and gather statistics on the different machines in the classroom
  
----
 
**Introduction to '''Latex'''
 
*** [http://cs.smith.edu/dftwiki/index.php/Tutorial:_Writing_a_Latex_paper_with_ShareLatex.com Tutorial #1] on Latex and ShareLatex
 
*** [http://cs.smith.edu/dftwiki/index.php/Latex_Skeleton_for_Simple_Articles_and_Tech_Reports Latex document template]
 
*** [[Latex Example: Bib File: Example Bib File]] for Latex paper (from ACM)
 
*** Learn how to find BibTex entries.  Example: [http://dl.acm.org/citation.cfm?id=1525689 The Unreasonable Effectiveness of Data] (go to ACM and click on BibTex link).
 
*** If you are considering working on an honors thesis, you might want to take a look at this [[Latex and Editing Tools to write an Honors Thesis|  page]] on writing Honors thesis with Latex.
 
<br />
 
 
----
 
----
 
** Comments on '''bimonthly newsletter'''
 
** Comments on '''bimonthly newsletter'''
Line 85: Line 77:
 
**** The Official Google blog
 
**** The Official Google blog
 
**** Review: Tom's Hardware
 
**** Review: Tom's Hardware
**** Some of the  sites listed in [http://tech.blorge.com/Structure:%20/2008/11/15/top-40-technology-news-sites-the-definitive-guide/ blorge.com]'s top-40 list.
+
**** Some of the  sites listed in [https://rohidassanap.wordpress.com/2013/06/18/top-40-best-technology-news-websites-the-definitive-list/ this page's] top 40 list.
 
*** Recommendation for news aggregator:  [http://cloud.feedly.com/#welcome Feedly.com]
 
*** Recommendation for news aggregator:  [http://cloud.feedly.com/#welcome Feedly.com]
  
Line 324: Line 316:
  
 
----
 
----
* [[CSC352 Homework 5 2013 | Homework 5]] will be due 11/14 at 11:59 p.m.
+
* [[CSC352 Homework 5 2013 | Homework 5]] and [[CSC352 Homework 5 Solution 2013| Solution]]
 
||
 
||
 
&nbsp;
 
&nbsp;
Line 368: Line 360:
 
** Finish the [[Tutorial:_Creating_a_Hadoop_Cluster_on_Amazon_AWS | MapReduce lab on AWS]] and make sure you do the [[Tutorial:_Creating_a_Hadoop_Cluster_with_StarCluster_on_Amazon_AWS#Challenge_.23_2 | Challenge 2]] part of the lab.
 
** Finish the [[Tutorial:_Creating_a_Hadoop_Cluster_on_Amazon_AWS | MapReduce lab on AWS]] and make sure you do the [[Tutorial:_Creating_a_Hadoop_Cluster_with_StarCluster_on_Amazon_AWS#Challenge_.23_2 | Challenge 2]] part of the lab.
 
** Food for thought: some videos<br />I suggest one of you connects her laptop to the projection system and you all watch these videos together.  After each one, discuss it as a group.  Take notes and be ready to share your comments during Thursday's class when we resume our regular schedule.
 
** Food for thought: some videos<br />I suggest one of you connects her laptop to the projection system and you all watch these videos together.  After each one, discuss it as a group.  Take notes and be ready to share your comments during Thursday's class when we resume our regular schedule.
 +
*** The Cave 2 Project at the University of Illinois:  Just another hardware solution for presenting the user with a large number of pixels; in this case  27320 x 3072 pixels. ''Short, 3 minutes.''
 +
<center>
 +
<videoflash>yf0sllpZx3w</videoflash>
 +
</center>
 +
<br />
 
*** The Creators Projects video<br />
 
*** The Creators Projects video<br />
 
::::This video is not necessarily anything that can work for us, but it's just "food for thought."  Just a different way an artist has come up to make still pictures interesting to look at.  ''Short, 6 minutes''.
 
::::This video is not necessarily anything that can work for us, but it's just "food for thought."  Just a different way an artist has come up to make still pictures interesting to look at.  ''Short, 6 minutes''.
Line 382: Line 379:
 
</center>
 
</center>
 
<br />
 
<br />
::: Good interview of Tim O'Reilly describing Web 2.0, and his view of a data-driven Internet.  8-minute long.  You may want to think about how our wikipedia data (images, stats) relate to what is said about data is described in the interview.  ''About 8 minutes''.
+
::: Good interview of Tim O'Reilly describing Web 2.0, and his view of a data-driven Internet.  8-minute long.  You may want to think about how our wikipedia data (images, stats) relate to what is said about data as described in the interview.  ''About 8 minutes''.
 
<center>
 
<center>
 
<videoflash>FJ3TxeE_tHI</videoflash>
 
<videoflash>FJ3TxeE_tHI</videoflash>
 
</center>
 
</center>
 
<br />
 
<br />
::: The next video filmed in June 2013 presents Bruno Fernandez-Ruiz of Yahoo, who speaks about Hadoop since 2005, Hadoop today, and what is ahead.  An important type of data property Fernandez-Ruiz is interested about is ''timeliness'', which we haven't really looked at for our project, but you will see that it could apply easily to the dynamics of wikipedia.  Some interesting statistics about the number of servers, the size of the HDFS they use, the number of processes are given.  ''About 17 minutes''.
+
::: The next video filmed in June 2013 presents Bruno Fernandez-Ruiz of Yahoo, who speaks about Hadoop since 2005, Hadoop today, and what is ahead.  An important type of data property Fernandez-Ruiz is interested in is ''timeliness'', which we haven't really looked at for our project, but you will see that it could apply easily to the dynamics of wikipedia.  Some interesting statistics about the number of servers, the size of the HDFS they use, the number of processes are given.  ''About 17 minutes''.
 
<center>
 
<center>
 
[[Image:LookingBeyondHadoop.png | 430px | link=http://fora.tv/2013/06/26/Hadoop_and_Continuous_Computing_Looking_Beyond_MapReduce ]]
 
[[Image:LookingBeyondHadoop.png | 430px | link=http://fora.tv/2013/06/26/Hadoop_and_Continuous_Computing_Looking_Beyond_MapReduce ]]
Line 396: Line 393:
 
* '''Thursday''':  
 
* '''Thursday''':  
 
** <font color="magenta">Tentative guest lecture: Nick Howe on CUDA and GPUs</font>
 
** <font color="magenta">Tentative guest lecture: Nick Howe on CUDA and GPUs</font>
** Going over Homework #5
+
** Some thoughts about INFOCOMP 2013 ([[CSC352 Keynote Presentations 2013| keynote]])
** A bit of Bash
+
** Going over Homework #5 ([[CSC352 Walking a 2-Level Directory in C| Walking a 2-Level Directory in C]])
** [[Tutorial:_Running_a_Python_version_of_WorkCount_on_an_AWS_cluster| MapReducing in Python]]
+
 
 +
 
 
----
 
----
 
*  
 
*  
Line 411: Line 409:
 
** <font color="red">No newsletter due</font>
 
** <font color="red">No newsletter due</font>
 
** <font color="goldenrod">Paper presentation</font>:  [[Media:AViewOfCloudComputing_CACM_Apr2010.pdf| A View of Cloud Computing]] presented by Dana&euml;.
 
** <font color="goldenrod">Paper presentation</font>:  [[Media:AViewOfCloudComputing_CACM_Apr2010.pdf| A View of Cloud Computing]] presented by Dana&euml;.
 +
** 5-minute project presentations (everybody)
 
**  Instead of a newsletter, you may turn today a [[CSC352 Project Introduction  in Latex | draft of an introduction to your final project]].  If you have too much work this week, you can turn this in on 12/3.
 
**  Instead of a newsletter, you may turn today a [[CSC352 Project Introduction  in Latex | draft of an introduction to your final project]].  If you have too much work this week, you can turn this in on 12/3.
 +
** [[Tutorial: A bit of Bash | A bit of Bash]]
 +
** The challenges of the [[Tutorial:_Running_a_Python_version_of_WorkCount_on_an_AWS_cluster| MapReducing in Python]] lab
 
* '''Thursday''': <font color="magenta">Thanksgiving Break</font>
 
* '''Thursday''': <font color="magenta">Thanksgiving Break</font>
  
Line 425: Line 426:
 
** <font color="goldenrod">Paper presentation</font>:  [[Media:unreasonableEffectivenessOfData2009_HalevyNorvigPereira.pdf | The Unreasonable Effectiveness of Data]] presented by Julia
 
** <font color="goldenrod">Paper presentation</font>:  [[Media:unreasonableEffectivenessOfData2009_HalevyNorvigPereira.pdf | The Unreasonable Effectiveness of Data]] presented by Julia
 
**  Instead of a newsletter, you need to turn in a [[CSC352 Project Introduction  in Latex | draft of an introduction to your final project]] (unless you submitted it last week).
 
**  Instead of a newsletter, you need to turn in a [[CSC352 Project Introduction  in Latex | draft of an introduction to your final project]] (unless you submitted it last week).
 +
 +
** The challenges of the [[Tutorial:_Running_a_Python_version_of_WorkCount_on_an_AWS_cluster| MapReducing in Python]] lab.  We have done Challenge #1 last time.  We'll look at Challenge #2 and #3.
 +
** Some feedback on Homework #5 and one [[CSC352 Homework 5 Solution 2013| solution]].
 +
** MapReduce task graphs
 +
----
 +
----
 
* '''Thursday'''
 
* '''Thursday'''
 
+
** [[Hadoop_Tutorial_1.1_--_Generating_Task_Timelines | Distribution of Map and Reduce tasks over time]]
 +
** Project work and discussion
 +
** 20-minute individual session (in class) to go over project, questions, setup, etc...
 
----
 
----
 
*
 
*
Line 450: Line 459:
  
 
----
 
----
*
+
An afternoon of packing circular crepes, including some imaginative variations...
 +
[[Image:PackingCrepes1.jpg|200px]][[Image:PackingCrepes2.jpg|200px]]
 +
[[Image:PackingCrepes3.jpg|200px]]
 +
[[Image:PackingCrepes4.jpg|200px]]
 +
[[Image:PackingCrepes5.jpg|200px]]
 +
[[Image:PackingCrepes6.jpg|200px]]
 
||
 
||
  
Line 468: Line 482:
 
<br />
 
<br />
  
 +
==Smith Elements of Style==
 +
<br />
 +
* [[media:SmithJacobsonCenterWritingPapers-1.pdf | "Writing Papers" from the Smith College Jacobson Center for writing]]
 +
<br />
 
==On-Line Resources==
 
==On-Line Resources==
 
* [https://computing.llnl.gov/tutorials/parallel_comp/ Introduction to Parallel Processing], by Blaise Barney, Lawrence Livermore National Laboratory.  A good read.  Covers most of the important topics.
 
* [https://computing.llnl.gov/tutorials/parallel_comp/ Introduction to Parallel Processing], by Blaise Barney, Lawrence Livermore National Laboratory.  A good read.  Covers most of the important topics.

Latest revision as of 11:31, 31 January 2017

--D. Thiebaut (talk) 11:15, 9 August 2013 (EDT)




Main Page|Syllabus|Project Page | PIAZZA



Weekly Schedule

Week Topics Reading
Week 1
9/3
  • Tuesday
    • Syllabus
    • Introduction to final project
      • Approach
      • Programming
      • Testing
      • ==> paper (see 2011 paper for example).
    • Parallelism: going to the source: Interrupts!
      • 8086 type of interrupts (simplified)
      • Interrupt Vector
      • Interrupt Priority
      • Context Switch
      • Stack and Stack Frame
      • Global and Local Variables
    • What is a process?
    • What is a thread?




  • Thursday
    • Goals of multithreading:
      • Enhanced performance
      • Increased throughput
      • Greater user responsiveness
    • What should we remember 5 years from now?



    • Introduction to a graph we'll use all throughout the semester. The idea of threads
    Thread 1 ----------------------|====|-------------------------> time

    Thread 2 ------------|====|-----------------------------------> time


    • Multithreaded programming.

    • Comments on bimonthly newsletter
      • The format should be similar to the ACM Tech News format.
      • The header should contain a title, your name, the class (CSC352) and the date
      • Each paragraph should have a header with a title, the source of news, the date, and possibly a link to the full article.
      • The paragraph describing a news item should be between 3 and 10 lines, give or take.
      • Write 1 full page to 2 pages, depending on the richness of events in the technology field
      • Feel free to present N-1 topics with just 3 lines, and 1 topic which you highlight with a longer paragraph.
      • Topics: anything related to parallelism: computers, mobile platforms, cloud, companies, new software, new algorithms, conferences, people in the field, etc.
      • Good sources of information to get started:
      • Recommendation for news aggregator: Feedly.com

  • Homework: play with Latex. Find or adapt a document template for your needs (minimalist is the name of the game at this point), and start gathering news bits. First newsletter due Thursday Sept. 19th. The ACM Tech News format is a good and simple format to emulate.

Week 2
9/10
  • Tuesday
      • Introduction to measuring performance. Comparing execution times.
      • Introduction to Speedup( N ), where N is the number of threads, or the number of processors.
      • Amdahl's Law
AmdahlsLaw.jpg
      • A bit of Computer Architecture: Cores and Caches
4CoreAndLevel123Caches.png



4CoreAndLevel3CacheDie.jpg



LatenciesInMemoryHierarchy.png

(last slide taken from http://www.cs.utexas.edu/users/mckinley/352/lectures/16.pdf)
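
The Speedup( N ) and Amdahl's Law items from Tuesday can be made concrete with a few lines of Python. This is only a sketch under stated assumptions: the function names and the 90% parallel fraction are illustrative and are not taken from the class material.

```python
def amdahl_speedup(parallel_fraction: float, n: int) -> float:
    """Upper bound on Speedup(N) when only `parallel_fraction` of the
    single-thread run time can be spread over N threads or processors."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if n < 1:
        raise ValueError("n must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / n)


def measured_speedup(time_one_thread: float, time_n_threads: float) -> float:
    """Speedup(N) computed from two measured execution times."""
    return time_one_thread / time_n_threads


if __name__ == "__main__":
    # Hypothetical program in which 90% of the work parallelizes.
    for n in (1, 2, 4, 8, 16):
        print(n, round(amdahl_speedup(0.9, n), 2))
```

The serial fraction sets the ceiling: with 10% serial work the speedup can never exceed 10, no matter how many cores are added.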





  • Thursday
    • Discussion of A View of Parallel Processing from Berkeley. Prepare a 1- to 2-page summary of the paper in Latex. Hand in the summary in class. No summaries will be accepted after class.
AViewFromBerkeleyWordle.png


    • Some topics taken from the paper:
      • Moore's Law:

MooresLawProcessorMemoryGap.gif

      • Barnes and Hut approach to N-Body problem


ManyCoreArchitecture.jpg

(Image taken from URL: http://www.altera.com/technology/system-design/articles/2012/multicore-many-core.html)

      • nanometers: where are we now?
Nm fabricationProcess.png


( Image taken from http://en.wikipedia.org/wiki/22_nanometer)

RingNetworkLinkingMultiCoreIntelArch.png
    • Short preparation for Maggie Lind's tour of the SCMA on Tuesday. Meeting place is entrance of SCMA.
      • What the project is about can be included in the field of Culturomics




Week 3
9/17

All the data structures of interest (concurrent non-blocking and blocking) can be found in the Oracle documentation. The information is a bit cryptic, but you need to get comfortable with it!

Week 4
9/24
  • Tuesday: Guest Lecture/Informal discussion with Tim Draper
    • Some questions to start the conversation:
    • How has the cloud infrastructure changed entrepreneurship, if at all?
    • There is a whole ecosystem growing around the cloud services offered by Amazon and the other players: new companies offering services and using Amazon's AWS for example. What are some of the most interesting companies/ideas/technologies emerging that you have discovered or been involved with?
    • There are tremendous worries about the safety and privacy of data in the cloud. Is this an area of growth students should consider?
    • What other areas of growth do you see that students should keep in view?
    • If a graduating major is interested in joining a start-up company, what are the signs she should be looking for before joining such a group?
    • Some students are interested in a management track, starting at a big company and climbing fast. What is your advice for best preparing for this type of career?
    • What is the most exciting development in your eyes happening now with cloud technology?
    • It has been said that the 21st century is the century of the entrepreneur. Do you see this as true?
    • Companies rise and fall. Microsoft was once the place where all our majors wanted to go. The most prestigious company for programmers. Now it's Google, and Facebook. Which company(ies) do you see as potential new meccas for programmers?
    • Suppose somebody were to form a start-up with friends, say 10 people. Who/What/Where? Who should the people be? What field should they be experts in? Where should the company locate?



TimMelissaDraper.png


    • Review of Homework 1 and its Solution.
      • Understand static variables
      • don't use global random generators!
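      • /usr/bin/time reports user CPU time summed over all threads, so for threaded applications it can look as if the time were multiplied by the # of cores; compare the elapsed (wall-clock) time instead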
      • be sure to understand if you need the same random seed or a different seed in your experiments
      • create a different user on your laptop with no extra applications loaded in the background (e.g. Skype): less stress on the O.S.



  • Thursday Mountain Day!


MountainDay.png


 

Week 5
10/1
  • Tuesday (Grace Hopper Conference)
    • Introduction to Packing pdf and ppt
    • Studying the Red-Black Tree data-structure
      • Why is it not thread-safe?
      • How can we make it thread-safe?
      • Devise a test to verify that the modifications have resulted in a thread-safe class
      • Profiling Java applications (introduction to Java's GC).




  • Thursday (Grace Hopper Conference)
    • Newsletter #2 due today. Please include 1 news item about some form of image collage, representation of many images in some form, hopefully digital. Also, please use a Latex feature you haven't used in your first newsletter
    • Elaborating a roadmap for the final [Project]

Week 6
10/8

  • MPI by Blaise Barney, at Lawrence Livermore National Laboratory: an excellent reference on MPI
Week 7
10/15
  • Tuesday: Fall Break
  • Thursday



Week 8
10/22
  • Tuesday
    • Paper presentation: Learning from the Success of MPI, presented by Gavi ( Bibtex)
    • Hadoop0 accounts
    • Learn how to become rsync champions!
    • Continuation of the introduction to MPI ( keynote). We stopped on Thursday on the MPI_Send() function.
    • Code for the pi2.c program computing Pi using summation of a series
    • Newsletter #3 due today!
  • Thursday
    • Continuation of the introduction to MPI ( keynote)
    • Introduction on how to operate a MySQL database ( keynote)
    • A project-oriented MPI example. Bring your Mac!
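
The idea of computing Pi by summation of a series, split among several processes, can be sketched in plain Python. This is not the class's pi2.c: the choice of series (midpoint rule for the integral of 4/(1+x^2) on [0, 1]), the function names, and the number of terms are all assumptions made for illustration. Each "rank" sums every size-th term, and the final sum stands in for the step where an MPI program would collect the partial results.

```python
import math


def partial_pi(rank: int, size: int, n: int) -> float:
    """Sum the terms i = rank, rank + size, rank + 2*size, ... of the
    midpoint-rule series approximating the integral of 4 / (1 + x^2)."""
    h = 1.0 / n
    total = 0.0
    for i in range(rank, n, size):
        x = h * (i + 0.5)          # midpoint of the i-th slice
        total += 4.0 / (1.0 + x * x)
    return h * total


def estimate_pi(size: int, n: int = 100_000) -> float:
    """Combine the partial sums of `size` simulated processes."""
    # In an MPI program each rank would compute its own partial sum
    # and the results would be gathered on one process.
    return sum(partial_pi(rank, size, n) for rank in range(size))


if __name__ == "__main__":
    approx = estimate_pi(size=4)
    print(approx, abs(approx - math.pi))
```

Because the terms are dealt out round-robin, every rank gets nearly the same amount of work regardless of how n divides by the number of processes.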

 

Week 9
10/29

Week 10
11/5
  • Tuesday: Otelia Cromwell Day
  • Thursday:
    • Paper presentation: MapReduce: Simplified Data Processing on Large Clusters presented by Sharon Pamela
    • Newsletter #4 due today! Please include at least one image, and at least one news item covering some form of project that could be related or influential for our own wiki-collage project. See this document on writing theses for information about the inclusion of images in Latex. The end section has a good list of sites that have good coverage of Latex topics. There is also plenty of information on the Web about this subject.
    • Preparation for Homework 5: attaching EBS volumes. We'll do a lab in class to create and attach an EBS volume to your AWS cluster.



 

Week 11
11/12
  • Tuesday
    • Paper presentation: General-Purpose vs. GPU: Comparisons of Many-Cores on Irregular Workloads, presented by Yoshie
      • Questions about the paper:
        • What kind of paper is this? Broad distribution? Research? Small group?
        • Organization? Abstract? Introduction? Definition of specialized terms? Early enough in the paper?
        • Are the contributions of the paper clear? Is the section on related research sufficient?
        • What is being compared? Similar machines? Hardware? Software?
        • Are authors partial? Do they have a stake?
        • How does the paper advance the state of research?
        • What does it tell us about the way computer systems evolve?
    • Thinking about the project
      • What do we know better about the overall project? What pieces have we looked at?
      • What is it we don't know?
      • Can we turn any of these questions into a project?



Yahoo has some very good reading material on Hadoop. One reason is that they may be one of the largest users of AWS and of Hadoop.

Week 12
11/19
  • Tuesday:
    • 1 month to go (exactly) before the project is due (Dec. 19)!
    • Student-directed work (DT @ INFOCOMP 2013)
    • Finish the MapReduce lab on AWS and make sure you do the Challenge 2 part of the lab.
    • Food for thought: some videos
      I suggest one of you connects her laptop to the projection system and you all watch these videos together. After each one, discuss it as a group. Take notes and be ready to share your comments during Thursday's class when we resume our regular schedule.
      • The Cave 2 Project at the University of Illinois: Just another hardware solution for presenting the user with a large number of pixels; in this case 27320 x 3072 pixels. Short, 3 minutes.


      • The Creators Projects video
This video is not necessarily anything that can work for us, but it's just "food for thought." Just a different way an artist has come up with to make still pictures interesting to look at. Short, 6 minutes.


      • O'Reilly Radar Videos
OReillyPerlBookCover.jpg
Tim O'Reilly is a visionary who figured out a long time ago that computer technology was an exploding field and he started a very successful line of books to support all new technology projects that were emerging and promising. The books all have animals on them and are uniquely easy to spot. O'Reilly now also has an on-line channel (O'Reilly Radar), and organizes conferences with top researchers and intellectuals in the field of computer science.
The first video is with Doug Cutting, one of the creators of Hadoop. He makes some very good points about what Hadoop is, what it is good at, and what it might not be good at (Homework 5 lesson?). After Cutting you can skip the 2nd interview (about video technology) and zip to the 3rd interview with Jeremy Howard, at time-tag 13:47. Then learn about big data and analytics, and what is said of data scientists. About 12 minutes total.


Good interview of Tim O'Reilly describing Web 2.0, and his view of a data-driven Internet. You may want to think about how our wikipedia data (images, stats) relate to what is said about data in the interview. About 8 minutes.


The next video, filmed in June 2013, presents Bruno Fernandez-Ruiz of Yahoo, who speaks about Hadoop since 2005, Hadoop today, and what is ahead. An important data property Fernandez-Ruiz is interested in is timeliness, which we haven't really looked at for our project, but you will see that it could apply easily to the dynamics of wikipedia. Some interesting statistics are given about the number of servers, the size of the HDFS they use, and the number of processes. About 17 minutes.

LookingBeyondHadoop.png

    • If you have at least 25 minutes left before the class time is over, do the MapReduce-Python lab, without attempting the challenges at the end. We'll do these together.
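
Before starting the MapReduce-Python lab, it may help to see the map, shuffle, and reduce steps of a word count written as ordinary Python functions. This is a local sketch only, not the lab's code: the function names are hypothetical, and the sort stands in for the shuffle phase that Hadoop performs between the mappers and the reducers.

```python
from itertools import groupby
from operator import itemgetter


def mapper(lines):
    """Map step: emit a (word, 1) pair for every word of every line."""
    for line in lines:
        for word in line.lower().split():
            yield word, 1


def reducer(sorted_pairs):
    """Reduce step: add up the counts of each word.
    The pairs must already be sorted by word, as they are after a shuffle."""
    for word, group in groupby(sorted_pairs, key=itemgetter(0)):
        yield word, sum(count for _, count in group)


def word_count(lines):
    """Run map, a sort standing in for the shuffle, then reduce."""
    shuffled = sorted(mapper(lines), key=itemgetter(0))
    return dict(reducer(shuffled))


if __name__ == "__main__":
    text = ["the quick brown fox", "the lazy dog", "The end"]
    print(word_count(text))
```

Keeping the mapper and the reducer as separate generators mirrors the way the two steps run as separate programs on a cluster, each seeing only a stream of records.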




 

Week 13
11/26

Week 14
12/3
    • The challenges of the MapReducing in Python lab. We have done Challenge #1 last time. We'll look at Challenge #2 and #3.
    • Some feedback on Homework #5 and one solution.
    • MapReduce task graphs



 

Week 15
12/10
CSC352Row.jpg
  • Tuesday: Last Day of Class
    • 20-minute presentations of projects. Suggested outline:
      • The context: how your project fits in the overall picture
      • Has other similar work been done and documented before?
      • What you decided to do
        • The challenges
        • The choices
        • The target experiments
      • Preliminary results
      • Expected results
      • Possible directions for continuing research after the project

An afternoon of packing circular crepes, including some imaginative variations... PackingCrepes1.jpg PackingCrepes2.jpg PackingCrepes3.jpg PackingCrepes4.jpg PackingCrepes5.jpg PackingCrepes6.jpg



Links and Resources


Latex



Smith Elements of Style



On-Line Resources


Classics



Papers

This is a tentative and non-exhaustive list of papers scheduled for reading this semester.

Introduction

Paper Pages

50

2

General/Parallelism

Paper Pages

5

7

5

MPI

Paper Pages
  • Learning from the Success of MPI, by William D. Gropp, Argonne National Lab, 2002.

11

GPUs

Paper Pages

6

Virtualization

Paper Pages

5

Cloud

Paper Pages

1.5

  • A View of Cloud Computing, 2010, By Armbrust, Michael and Fox, Armando and Griffith, Rean and Joseph, Anthony D. and Katz, Randy and Konwinski, Andy and Lee, Gunho and Patterson, David and Rabkin, Ariel and Stoica, Ion and Zaharia, Matei.

9

13

5

2

Project-Related

Paper Pages

8