Difference between revisions of "Tackling Big Data MIT Course"

From dftwiki3
(Login to EdX)
(Certificate)
 
(16 intermediate revisions by the same user not shown)
Line 21: Line 21:
 
</onlydft>
 
</onlydft>
  
 +
 +
=Overall Syllabus=
 +
 +
==MODULES, TOPICS, AND FACULTY==
 +
 +
===Module One: Introduction and Use Cases===
 +
The introductory module aims to give a broad survey of Big Data
 +
challenges and opportunities and highlights applications as case
 +
studies.
 +
* Introduction: Big Data Challenges (Sam Madden)
 +
* Case Study: Transportation (Daniela Rus)
 +
* Case Study: Visualizing Twitter (Sam Madden)
 +
 +
===Module Two: Big Data Collection===
 +
The data capture module surveys approaches to data collection,
 +
cleaning, and integration.
 +
* Data Cleaning and Integration (Mike Stonebraker)
 +
* Hosted Data Platforms and the Cloud (Matei Zaharia)
 +
 +
===Module Three: Big Data Storage===
 +
The module on Big Data storage describes modern approaches
 +
to databases and computing platforms.
 +
* Modern Databases (Mike Stonebraker)
 +
* Distributed Computing Platforms (Matei Zaharia)
 +
* NoSQL, NewSQL (Sam Madden)
 +
 +
 +
===Module Four: Big Data Systems===
 +
The systems module discusses solutions to creating and deploying
 +
working Big Data systems and applications.
 +
* Multicore Scalability (Nickolai Zeldovich)
 +
* Security (Nickolai Zeldovich)
 +
* User Interfaces for Data (David Karger)
 +
 +
===Module Five: Big Data Analytics===
 +
The analytics module covers state-of-the-art algorithms for very
 +
large data sets and streaming computation.
 +
* Machine Learning Tools (Tommi Jaakkola)
 +
* Fast Algorithms I (Ronitt Rubinfeld)
 +
* Fast Algorithms II (Piotr Indyk)
 +
* Data Compression (Daniela Rus)
 +
* Case Study: Information Summarization (Regina Barzilay)
 +
* Applications: Medicine (John Guttag)
 +
* Applications: Finance (Andrew Lo)
 +
 +
Note: Schedule and faculty are subject to change without notice
  
 
=Notes=
 
=Notes=
 
<onlydft>
 
<onlydft>
 +
 
==Rus: Transportation==
 
==Rus: Transportation==
 
* Rus.  Transportation in Singapore.  Singapore is a small country with 16,000 taxis and a high number of loops embedded in its streets.  Can a sample of taxis with GPS be used to approximate well the real traffic reported (every 15 minutes) by the loops, which are expensive to maintain?
 
* Rus.  Transportation in Singapore.  Singapore is a small country with 16,000 taxis and a high number of loops embedded in its streets.  Can a sample of taxis with GPS be used to approximate well the real traffic reported (every 15 minutes) by the loops, which are expensive to maintain?
Line 40: Line 87:
 
: When using full screen for the video, the transcript overlaps the right quarter of the video.
 
: When using full screen for the video, the transcript overlaps the right quarter of the video.
  
 +
==Madden: MapD==
 +
* Presents a product developed at MIT: MapD
 +
* Shows that words in tweets can be geolocated, i.e. mapped to a geographical location
 +
* No real applications.  "So what?"
 +
* Links
 +
** MapD: [http://mapd.csail.mit.edu/tweetmap/ http://mapd.csail.mit.edu/tweetmap/]
 +
** Harvard '''tweetmap''': [http://worldmap.harvard.edu/tweetmap/ http://worldmap.harvard.edu/tweetmap/]
 +
<br />
 +
----
 +
<br />
 +
=Complaints=
 +
* Quiz not satisfying
 +
* Forced to contribute to discussion: OK
 +
* Forced to comment on somebody else's comment: bad
 +
* Should be on each course, rather than on each unit
 +
* Plenty to complain about regarding UI
 +
[[Image:MITMoocComplaint1.jpg|300px]]
 +
[[Image:MITMoocComplaint2.jpg|300px]]
 +
[[Image:MITMoocComplaint3.jpg|300px]]
 +
[[Image:MITMoocComplaint4.jpg|300px]]
 +
<br />
 +
=Ideas for Presentation=
 +
* Great idea for syllabus for new course
 +
* Great way to learn new techniques in algorithms: streaming vs sampling
 +
* Great way to get a student to learn new material for independent study or for thesis (UI)
 +
* Great way to find some organization for a given subject: UI, for example
 +
* Simplicity of presentation hardware
 +
* For UI, invent a system that lets users avoid having to program
 +
* Good to watch videos twice in a row.
 +
* HD essential for graphs
 +
* Office hours 2-3 times by a few profs.  Use Cisco WebEx for office hours.  About 47 participants.  Audio only.
 +
<br />
 +
<center>[[Image:MITTacklingBigData_OfficeHours.png|700px]]</center>
 +
<br />
 +
<center>[[Image:MITTacklingBigData_Format.png|700px]]</center>
 +
<br />
 +
<center>[[Image:MITTacklingBigDataAssessment.png|700px]]</center>
 +
<br />
 +
<center>[[Image:MITTacklingBigDataOfficeHours.png|700px]]</center>
  
 
</onlydft>
 
</onlydft>
 +
<br />
 +
<center>[[Image:MITEmailParticipationWorkshops.png|700px]]</center>
 +
<br />
 +
=Certificate=
 +
* [[Media:TacklingBigData_Certificate.pdf| Certificate]]
 +
<center>[[Image:TacklingBigData_Certificate.png|500px]]</center>

Latest revision as of 11:38, 23 April 2014

--D. Thiebaut (talk) 20:12, 3 March 2014 (EST)


Misc. Information


...

Login to EdX


...


Overall Syllabus

MODULES, TOPICS, AND FACULTY

Module One: Introduction and Use Cases

The introductory module aims to give a broad survey of Big Data challenges and opportunities and highlights applications as case studies.

  • Introduction: Big Data Challenges (Sam Madden)
  • Case Study: Transportation (Daniela Rus)
  • Case Study: Visualizing Twitter (Sam Madden)

Module Two: Big Data Collection

The data capture module surveys approaches to data collection, cleaning, and integration.

  • Data Cleaning and Integration (Mike Stonebraker)
  • Hosted Data Platforms and the Cloud (Matei Zaharia)

Module Three: Big Data Storage

The module on Big Data storage describes modern approaches to databases and computing platforms.

  • Modern Databases (Mike Stonebraker)
  • Distributed Computing Platforms (Matei Zaharia)
  • NoSQL, NewSQL (Sam Madden)


Module Four: Big Data Systems

The systems module discusses solutions to creating and deploying working Big Data systems and applications.

  • Multicore Scalability (Nickolai Zeldovich)
  • Security (Nickolai Zeldovich)
  • User Interfaces for Data (David Karger)

Module Five: Big Data Analytics

The analytics module covers state-of-the-art algorithms for very large data sets and streaming computation.

  • Machine Learning Tools (Tommi Jaakkola)
  • Fast Algorithms I (Ronitt Rubinfeld)
  • Fast Algorithms II (Piotr Indyk)
  • Data Compression (Daniela Rus)
  • Case Study: Information Summarization (Regina Barzilay)
  • Applications: Medicine (John Guttag)
  • Applications: Finance (Andrew Lo)

Note: Schedule and faculty are subject to change without notice

Notes


...


MITEmailParticipationWorkshops.png


Certificate

TacklingBigData Certificate.png