AWS Bootcamp, July 2015

Revision as of 10:12, 10 July 2015 by Thiebaut (talk | contribs)

--D. Thiebaut (talk) 11:07, 10 July 2015 (EDT)


AWSSummit2015.png

AWS Summit 2015

New York City, NY, July 7-8, 2015

JavitsConventionCenter.jpg



  • I went to the Amazon Web Services (AWS) Summit 2015 in New York City on July 7-8, 2015, and attended the one-day bootcamp titled "Store, Manage, and Analyze Big Data in the Cloud":
The Store, Manage, and Analyze Big Data in the Cloud Bootcamp provides a broad, hands-on introduction to data collection, data storage, and analysis using AWS analytics services plus third party tools. In this one-day bootcamp, we show you how to use cloud-based big data solutions and Amazon Elastic MapReduce (EMR), the AWS big data platform, to capture and process data. We also teach you how to work with Amazon Redshift, Amazon Kinesis, and Amazon Data Pipeline. Hands-on lab activities help you learn to work with services and leverage best practices in designing big data environments.


This was a super fast overview of some of the technologies Amazon makes available for dealing with Big Data. The labs allowed participants to work on AWS with Hive, Kinesis, and mostly Redshift. It would probably take me three weeks of class and lab time to cover the same material in my CSC352 seminar.
  • The second day, I quickly went through the hall with all the different vendors of services built on top of AWS. Keywords found on most of the banners: "Backup", "Agile", "Secure", "Analytics", "Make sense" (as in make sense of your data), "Visualize." Most displays showed colorful graphs of timelines: the variation of workloads and traffic patterns experienced by software apps running on AWS clusters. One display showed the console and a bash prompt in it...
  • At 10 that morning, Werner Vogels, CTO of Amazon, gave a presentation on where Amazon is, what it does, and what is new. It was interesting to see how CTOs can be treated as rock stars. Here are some of the comments and ideas I jotted down:
  • SQL is very much alive. There's an effort to make the cloud look like a big SQL repository. Why? Because a lot of programmers out there know SQL and how to get information out of SQL databases, and few understand Hadoop (which is mostly out, by the way). So it's easier to provide a view of the cloud as an SQL space.
  • "There's no hardware any longer," a quote by Werner Vogels. You manage your hardware infrastructure through the Web. It behaves as just another piece of software.
  • Instead of tuning your cluster of computers so that you get the performance wanted, "just add another node!" That seemed to be the message during the Bootcamp session. Instances (computers) are just pennies an hour. You need more power, just add some more nodes; it's cheaper than paying somebody long hours to figure out how to tune the system.
  • In some of the new AWS services, instead of specifying the hardware infrastructure, the system manager specifies the throughput wanted, both for input and for output. AWS then figures out how many nodes and what configuration to use. Cool!
  • AWS is making the cloud real-time by introducing cloud events: functions that are triggered when new data comes in, or when some result is generated in the cloud. Streaming and real-time applications become possible.
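
To make the last point concrete, here is a minimal sketch of what an event-triggered function might look like. The notes do not name the service, so treat this as an assumption: the handler follows the general shape of an AWS Lambda function reacting to an S3 "object created" notification. The function name handle_new_data and the sample bucket and key are made up for illustration, and nothing here talks to AWS; the event is just a Python dictionary.

```python
# Hypothetical handler in the style of an event-triggered cloud function.
# Assumption: the event mimics the layout of an S3 "object created"
# notification. The function name and sample values are illustrative only.

def handle_new_data(event, context=None):
    """Return one summary string per newly arrived object in the event."""
    summaries = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        key = record["s3"]["object"]["key"]
        size = record["s3"]["object"].get("size", 0)
        summaries.append(f"new object s3://{bucket}/{key} ({size} bytes)")
    return summaries


if __name__ == "__main__":
    # Simulate the cloud delivering an event when new data comes in.
    sample_event = {
        "Records": [
            {
                "s3": {
                    "bucket": {"name": "example-bucket"},
                    "object": {"key": "logs/2015-07-08.csv", "size": 1024},
                }
            }
        ]
    }
    for line in handle_new_data(sample_event):
        print(line)
```

The point of the design is that the code never polls for data: the platform calls the function once per event, so the same handler works whether one object arrives per day or thousands per second.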