** Finish the [[Tutorial:_Creating_a_Hadoop_Cluster_on_Amazon_AWS | MapReduce lab on AWS]] and make sure you do the [[Tutorial:_Creating_a_Hadoop_Cluster_with_StarCluster_on_Amazon_AWS#Challenge_.23_2 | Challenge 2]] part of the lab.
If you are considering working on an honors thesis, you might want to take a look at this page on writing an honors thesis with LaTeX.
Comments on bimonthly newsletter
The format should be similar to the ACM Tech News format.
The header should contain a title, your name, the class (CSC352), and the date.
Each paragraph should have a header with a title, the source of news, the date, and possibly a link to the full article.
The paragraph describing a news item should be between 3 and 10 lines, give or take.
Write 1 to 2 full pages, depending on the richness of events in the technology field.
Feel free to present N-1 topics with just 3 lines, and 1 topic which you highlight with a longer paragraph.
Topics: anything related to parallelism: computers, mobile platforms, cloud, companies, new software, new algorithms, conferences, people in the field, etc.
Homework: play with LaTeX. Find or adapt a document template for your needs (minimalist is the name of the game at this point), and start gathering news bits. First newsletter due Thursday Sept. 19th. The ACM Tech News format is a good and simple format to emulate.
Discussion of A View of Parallel Processing from Berkeley. Prepare a 1- to 2-page summary of the paper in LaTeX. Hand in the summary in class. No summaries will be accepted after class.
Some topics taken from the paper:
Moore's Law:
Processor-DRAM gap increasing (graph taken from www.cs.virginia)
All the data structures of interest (concurrent non-blocking and blocking) can be found in the Oracle documentation. The information is a bit cryptic, but you need to get comfortable with it!
Tuesday: Guest lecture/informal discussion with Tim Draper
Some questions to start the conversation:
How has the cloud infrastructure changed entrepreneurship, if at all?
There is a whole ecosystem growing around the cloud services offered by Amazon and the other players: new companies offering services and using Amazon's AWS for example. What are some of the most interesting companies/ideas/technologies emerging that you have discovered or been involved with?
There are tremendous worries about the safety and privacy of data in the cloud. Is this an area of growth students should consider?
What other areas of growth do you see that students should keep in sight?
If a graduating major is interested in joining a start-up company, what are the signs she should be looking for before joining such a group?
Some students are interested in a management track, starting at a big company and climbing fast. What is your advice for best preparing for this type of career?
What is the most exciting development in your eyes happening now with cloud technology?
It has been said that the 21st century is the century of the entrepreneur. Do you see this as true?
Companies rise and fall. Microsoft was once the place where all our majors wanted to go, the most prestigious company for programmers. Now it's Google and Facebook. Which company or companies do you see as potential new meccas for programmers?
Suppose somebody were to form a start-up with friends, say 10 people. Who/What/Where? Who should the people be? What fields should they be experts in? Where should the company locate?
Newsletter #2 due today. Please include 1 news item about some form of image collage or representation of many images, preferably digital. Also, please use a LaTeX feature you did not use in your first newsletter.
Newsletter #4 due today! Please include at least one image, and at least one news item covering a project that could be related to, or influential for, our own wiki-collage project. See this document on writing theses for information about including images in LaTeX. Its end section has a good list of sites with good coverage of LaTeX topics. There is also plenty of information on the Web about this subject.
Preparation for Homework 5: attaching EBS volumes. We'll do a lab in class to create and attach an EBS volume to your AWS cluster.
Food for thought: some videos. I suggest that one of you connect her laptop to the projection system so that you can all watch these videos together. After each one, discuss it as a group. Take notes and be ready to share your comments during Thursday's class, when we resume our regular schedule.
The Cave 2 Project at the University of Illinois: Just another hardware solution for presenting the user with a large number of pixels; in this case 27320 x 3072 pixels. Short, 3 minutes.
The Creators Project video
This video is not necessarily anything that can work for us; it is just "food for thought." It shows a different way an artist has come up with to make still pictures interesting to look at. Short, 6 minutes.
O'Reilly Radar Videos
Tim O'Reilly is a visionary who figured out a long time ago that computer technology was an exploding field. He started a very successful line of books to support the new and promising technology projects that were emerging. The books all have animals on their covers and are easy to spot. O'Reilly now also has an on-line channel (O'Reilly Radar), and organizes conferences with top researchers and intellectuals in the field of computer science.
The first video is with Doug Cutting, one of the creators of Hadoop. He makes some very good points about what Hadoop is, what it is good at, and what it might not be good at (Homework 5 lesson?). After Cutting you can skip the 2nd interview (about video technology) and zip to the 3rd interview with Jeremy Howard, at time-tag 13:47. Then learn about big data and analytics, and what is said of data scientists. About 12 minutes total.
Good interview with Tim O'Reilly describing Web 2.0 and his view of a data-driven Internet. You may want to think about how our Wikipedia data (images, stats) relate to what is said about data in the interview. About 8 minutes.
The next video, filmed in June 2013, presents Bruno Fernandez-Ruiz of Yahoo, who speaks about Hadoop since 2005, Hadoop today, and what is ahead. An important data property Fernandez-Ruiz is interested in is timeliness, which we haven't really looked at for our project, but you will see that it could easily apply to the dynamics of Wikipedia. He gives some interesting statistics about the number of servers, the size of the HDFS they use, and the number of processes. About 17 minutes.
If you have at least 25 minutes left before the class time is over, do the MapReduce-Python lab, without attempting the challenges at the end. We'll do these together.
Thursday:
Tentative guest lecture: Nick Howe on CUDA and GPUs
Introduction to Parallel Processing, by Blaise Barney, Lawrence Livermore National Laboratory. A good read. Covers most of the important topics.
Introduction to MPI, by Blaise Barney, Lawrence Livermore National Laboratory. Another short but excellent coverage of a topic in parallel processing, this time MPI.
The Unreasonable Effectiveness of Data, by Halevy, Norvig, Pereira, IEEE Intelligent Systems, March 2009, Vol. 24, No. 2, pp. 8-12.
A View of Cloud Computing, by Armbrust, Fox, Griffith, Joseph, Katz, Konwinski, Lee, Patterson, Rabkin, Stoica, Zaharia, 2010.