Difference between revisions of "Tackling Big Data MIT Course"

From dftwiki3
(Certificate)
 
 
* http://professionaleducation.mit.edu
 
* Self Service Center: https://crm.orionondemand.com/crm/forms/C6700mB00x6G0x67028F
 

=Login to EdX=
* https://edge.edx.org/
* screen name: dominique
<onlydft>
* Account: dthiebaut@smith.edu
* Password: edxxxx111!!?
</onlydft>

=Overall Syllabus=

==MODULES, TOPICS, AND FACULTY==

===Module One: Introduction and Use Cases===
The introductory module aims to give a broad survey of Big Data challenges and opportunities and highlights applications as case studies.
* Introduction: Big Data Challenges (Sam Madden)
* Case Study: Transportation (Daniela Rus)
* Case Study: Visualizing Twitter (Sam Madden)

===Module Two: Big Data Collection===
The data capture module surveys approaches to data collection, cleaning, and integration.
* Data Cleaning and Integration (Mike Stonebraker)
* Hosted Data Platforms and the Cloud (Matei Zaharia)

===Module Three: Big Data Storage===
The module on Big Data storage describes modern approaches to databases and computing platforms.
* Modern Databases (Mike Stonebraker)
* Distributed Computing Platforms (Matei Zaharia)
* NoSQL, NewSQL (Sam Madden)

===Module Four: Big Data Systems===
The systems module discusses solutions to creating and deploying working Big Data systems and applications.
* Multicore Scalability (Nickolai Zeldovich)
* Security (Nickolai Zeldovich)
* User Interfaces for Data (David Karger)

===Module Five: Big Data Analytics===
The analytics module covers state-of-the-art algorithms for very large data sets and streaming computation.
* Machine Learning Tools (Tommi Jaakkola)
* Fast Algorithms I (Ronitt Rubinfeld)
* Fast Algorithms II (Piotr Indyk)
* Data Compression (Daniela Rus)
* Case Study: Information Summarization (Regina Barzilay)
* Applications: Medicine (John Guttag)
* Applications: Finance (Andrew Lo)

Note: Schedule and faculty are subject to change without notice

=Notes=
<onlydft>

==Rus: Transportation==
* Transportation in Singapore. Singapore is a small country with 16,000 taxis and a high number of loops embedded in its streets. Can a sample of GPS-equipped taxis approximate well the real traffic reported (every 15 minutes) by the loops, which are expensive to maintain?
* Taxis can yield data every 30 sec to 1 min.
* Loops report every 15 min.
* Problem: can the taxis be used in place of the loops?
* Answer: on their own, taxis are not a good representative of overall traffic.
* Use linear regression to find coefficients during time windows. The windows cannot be too small.
* Taxi data can be represented by Markov chains, so a sample is representative of the whole population (I didn't get that).
* After applying the model, a sample of taxis (maybe 1000) can be good at estimating real traffic.
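
A minimal Python sketch of the windowed-regression idea in the notes above, under loud assumptions: the notes do not record the actual model used in the lecture, so this simply fits one straight line (loop count ≈ a × taxi count + b) per time window by ordinary least squares. The function names, the 60-minute window, and the sample numbers are all made up for illustration; none of them come from the lecture.

```python
from collections import defaultdict


def fit_line(xs, ys):
    """Ordinary least squares for y ~ a*x + b. Returns (a, b)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        # All x values identical: fall back to a flat line at the mean.
        return 0.0, mean_y
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def fit_by_window(samples, window_minutes=60):
    """Fit one line per time window.

    samples: iterable of (minute_of_day, taxi_count, loop_count).
    Returns {window_index: (a, b)}.
    """
    buckets = defaultdict(list)
    for minute, taxi, loop in samples:
        buckets[minute // window_minutes].append((taxi, loop))
    return {
        w: fit_line([t for t, _ in pts], [l for _, l in pts])
        for w, pts in buckets.items()
    }


def estimate(coeffs, minute, taxi_count, window_minutes=60):
    """Estimate the loop count from a taxi count using that window's line."""
    a, b = coeffs[minute // window_minutes]
    return a * taxi_count + b


# Hypothetical readings: (minute of day, taxis seen, loop count).
samples = [
    (0, 1, 15), (15, 2, 25), (30, 3, 35), (45, 4, 45),    # window 0
    (60, 1, 20), (75, 2, 40), (90, 3, 60), (105, 4, 80),  # window 1
]
coeffs = fit_by_window(samples)
print(estimate(coeffs, 20, 5))
```

The per-window fit is one way to reflect the note that coefficients are found "during time windows": each window needs enough readings to pin down its line, which is consistent with the remark that windows cannot be too small.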
----
;Problems
: Goes very quickly over concepts, such as Markov chains.
: Jumps straight from a problem to its solution, with no explanation.
: The slides given as support are way too small. Hard to read unless you enlarge them.
: When using full screen for the video, the transcript overlaps the right quarter of the video.

==Madden: MapD==
* Presents a product developed at MIT: MapD.
* Shows that words in tweets can be tied to geographic locations.
* No real applications. "So what?"
* Links
** MapD: [http://mapd.csail.mit.edu/tweetmap/ http://mapd.csail.mit.edu/tweetmap/]
** Harvard '''tweetmap''': [http://worldmap.harvard.edu/tweetmap/ http://worldmap.harvard.edu/tweetmap/]
<br />
----
<br />
=Complaints=
* Quiz not satisfying
* Forced to contribute to discussion: Ok
* Forced to comment on somebody else's comment: bad
* Should be on each course, rather than on each unit
* Plenty to complain about regarding UI
[[Image:MITMoocComplaint1.jpg|300px]]
[[Image:MITMoocComplaint2.jpg|300px]]
[[Image:MITMoocComplaint3.jpg|300px]]
[[Image:MITMoocComplaint4.jpg|300px]]
<br />
=Ideas for Presentation=
* Great idea for syllabus for new course
* Great way to learn new techniques in algorithms: streaming vs sampling
* Great way to get a student to learn new material for independent study or for thesis (UI)
* Great way to find an organization for a given subject: UI, for example
* Simplicity of presentation hardware
* For UI, invent a system that allows users to avoid programming
* Good to watch videos twice in a row.
* HD essential for graphs
* Office hours held 2-3 times by a few profs. Cisco WebEx used for office hours. About 47 participants. Audio only.
<br />
<center>[[Image:MITTacklingBigData_OfficeHours.png|700px]]</center>
<br />
<center>[[Image:MITTacklingBigData_Format.png|700px]]</center>
<br />
<center>[[Image:MITTacklingBigDataAssessment.png|700px]]</center>
<br />
<center>[[Image:MITTacklingBigDataOfficeHours.png|700px]]</center>

</onlydft>
<br />
<center>[[Image:MITEmailParticipationWorkshops.png|700px]]</center>
<br />
=Certificate=
* [[Media:TacklingBigData_Certificate.pdf| Certificate]]
<center>[[Image:TacklingBigData_Certificate.png|500px]]</center>

Latest revision as of 12:38, 23 April 2014

--D. Thiebaut (talk) 20:12, 3 March 2014 (EST)

