Hadoop Tutorial 3.3 -- How much for 1 month of AWS MapReduce?

From dftwiki3

--D. Thiebaut 17:45, 18 April 2010 (UTC)




This is Part 3 of the Hadoop on AWS Tutorial. This part deals with the economics of AWS.



How Much Will It Cost You to Run Your Own Hadoop Cluster?

Let's figure out how much it would cost to have Amazon host our cluster of 6 Hadoop Linux boxes for a month.

First we need to figure out what to rent, and what is charged.

Instances

Amazon rents different types of instances (http://aws.amazon.com/ec2/#instance). As of 4/18/10, these are:

  • Standard/Small: 1 virtual core (32-bit), 1.7 GB RAM, 160 GB disk
  • Standard/Large: 2 virtual cores (64-bit), 7.5 GB RAM, 850 GB disk
  • Standard/Extra-Large: 4 virtual cores (64-bit), 15 GB RAM, 1.6 TB disk
  • High-Memory/Extra-Large: 2 virtual cores (64-bit), 17.1 GB RAM, 420 GB disk
  • High-Memory/Double Extra-Large: 4 virtual cores (64-bit), 34.2 GB RAM, 850 GB disk
  • High-Memory/Quadruple Extra-Large: 8 virtual cores (64-bit), 68.4 GB RAM, 1.6 TB disk
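The instance list above is easier to reason about as data. The Python sketch below stores the specs exactly as listed on 4/18/10 and totals what a cluster of identical instances gives us. The `InstanceType` class, the dictionary keys, and the `cluster_resources` helper are hypothetical names made up for this illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstanceType:
    bits: int        # 32- or 64-bit platform
    cores: int       # number of virtual cores
    ram_gb: float    # memory, in GB
    disk_gb: float   # local disk, in GB (1.6 TB entered as 1600 GB)


# Specs as listed above (as of 4/18/10); the keys are hypothetical labels.
INSTANCE_TYPES = {
    "standard-small": InstanceType(32, 1, 1.7, 160),
    "standard-large": InstanceType(64, 2, 7.5, 850),
    "standard-xlarge": InstanceType(64, 4, 15.0, 1600),
    "highmem-xlarge": InstanceType(64, 2, 17.1, 420),
    "highmem-2xlarge": InstanceType(64, 4, 34.2, 850),
    "highmem-4xlarge": InstanceType(64, 8, 68.4, 1600),
}


def cluster_resources(kind: str, count: int) -> dict:
    """Total cores, RAM and disk for `count` identical instances of `kind`."""
    spec = INSTANCE_TYPES[kind]
    return {
        "cores": spec.cores * count,
        "ram_gb": round(spec.ram_gb * count, 2),  # round away float noise
        "disk_gb": spec.disk_gb * count,
    }


print(cluster_resources("standard-small", 6))
```

For our 6 Small instances this works out to 6 cores, 10.2 GB of RAM and 960 GB of disk in total.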

The pricing (as of 4/18/10) for these is:


(Image: EC2 instance pricing table)


Data Transfer

Amazon normally charges for data transferred both into and out of EC2/S3, but until June 30, 2010, only outbound transfers are charged. The pricing is shown below (as of 4/18/10):

(Image: AWS data transfer pricing table)
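Outbound transfer is priced in volume tiers: the first block of GB is billed at one rate, the next block at another, and so on. Here is a minimal sketch of how such a tiered charge adds up. The tier sizes and rates in `EXAMPLE_OUT_TIERS` are hypothetical placeholders, not Amazon's actual prices (copy the real ones from the table), and inbound traffic defaults to free to reflect the offer that runs until June 30, 2010.

```python
def tiered_cost(gb: float, tiers: list) -> float:
    """Charge for `gb` gigabytes under tiered per-GB pricing.

    `tiers` is an ordered list of (tier_size_gb, price_per_gb) pairs: the first
    tier_size_gb gigabytes are billed at the first price, the next block at the
    second price, and so on. Use float("inf") as the size of the last tier.
    """
    cost = 0.0
    remaining = gb
    for size, price in tiers:
        used = min(remaining, size)   # GB that fall into this tier
        cost += used * price
        remaining -= used
        if remaining <= 0:
            break
    return cost


# Hypothetical tiers, NOT Amazon's real prices: first 10,000 GB at $0.15/GB,
# everything above that at $0.10/GB.
EXAMPLE_OUT_TIERS = [(10_000, 0.15), (float("inf"), 0.10)]


def transfer_cost(gb_out: float, gb_in: float = 0.0, in_rate: float = 0.0) -> float:
    """Tiered charge for outbound GB plus a flat rate for inbound GB (0 by default)."""
    return tiered_cost(gb_out, EXAMPLE_OUT_TIERS) + gb_in * in_rate


print(transfer_cost(gb_out=12_000))  # 10,000 GB at $0.15 plus 2,000 GB at $0.10
```

For the few GB our cluster moves, only the first tier ever matters, but the same function handles larger volumes.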


S3 Storage

S3 storage is charged per GB stored, plus per GB transferred into and out of storage. The pricing as of 4/18/10 is shown below:


(Image: S3 pricing table)
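The S3 bill therefore has two parts: what we keep in storage, and what we move in and out of it. The sketch below assumes a single flat rate for each part, a reasonable simplification for a few tens of GB. `s3_monthly_cost` is a hypothetical helper, and the rates in the example call are made-up placeholders to be replaced with the values from the table.

```python
def s3_monthly_cost(stored_gb: float, gb_in: float, gb_out: float,
                    storage_rate: float, in_rate: float, out_rate: float) -> float:
    """One month of S3: GB stored plus GB transferred in and out.

    storage_rate is in dollars per GB-month; in_rate and out_rate are in
    dollars per GB transferred. All rates are placeholders.
    """
    storage = stored_gb * storage_rate
    transfer = gb_in * in_rate + gb_out * out_rate
    return storage + transfer


# Hypothetical rates, for illustration only.
print(s3_monthly_cost(100, 10, 5, storage_rate=0.15, in_rate=0.10, out_rate=0.15))
```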


MapReduce Infrastructure

Amazon is banking on the interest in MapReduce and offers clusters set up for MapReduce (Amazon Elastic MapReduce), priced on top of the regular EC2 and S3 charges. The pricing as of 4/18/10 is shown below.


(Image: Elastic MapReduce pricing table)
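Since the MapReduce charge comes on top of EC2, every instance-hour in the cluster is billed twice. The sketch below assumes both charges are simple per-instance-hour rates and splits a bill into its two parts, which makes it easy to see how much the MapReduce convenience adds. `emr_bill` is a hypothetical helper and both rates in the example are placeholders, not the actual 4/18/10 prices.

```python
def emr_bill(instance_hours: float, ec2_rate: float, emr_rate: float) -> dict:
    """Split an Elastic MapReduce bill into its EC2 part and the MapReduce surcharge.

    Both rates are in dollars per instance-hour, and every instance-hour is
    assumed to be billed at both rates.
    """
    ec2 = instance_hours * ec2_rate
    surcharge = instance_hours * emr_rate
    return {"ec2": ec2, "mapreduce": surcharge, "total": ec2 + surcharge}


# Hypothetical rates, for illustration only; read the real ones off the table.
bill = emr_bill(instance_hours=100, ec2_rate=0.10, emr_rate=0.015)
print(bill["total"], bill["mapreduce"] / bill["ec2"])  # total, surcharge as a fraction of EC2
```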


Calculator

  • Launch the calculator.
  • Compute EC2 cost:
    (Image: AWS calculator, EC2 tab)

    1. Select EC2 tab
    2. Use EC2 on-demand instances
    3. Pick 6 Small instances
    4. Set demand to 4 hours/day (pretty reasonable)
    5. Add to Bill

(Image: AWS calculator, EC2 result)

  • Note the cost/month under the pie chart (the $0.00 shown in the screenshot above is a fluke).
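Because the calculator can come back with a bogus $0.00, it is worth checking its EC2 figure by hand: 6 instances running 4 hours/day for a 30-day month is 720 instance-hours, which we then multiply by the hourly on-demand rate from the pricing table. In this sketch, `ec2_monthly_estimate` is a hypothetical helper, the 30-day month is an assumption (the calculator may use a slightly different month length), and the $0.10/hour rate in the example is a placeholder.

```python
def ec2_monthly_estimate(instances: int, hours_per_day: float,
                         rate_per_hour: float, days_per_month: int = 30) -> tuple:
    """Hand check of the calculator's on-demand EC2 figure.

    Returns (instance_hours, dollars). days_per_month = 30 is an assumption.
    """
    instance_hours = instances * hours_per_day * days_per_month
    return instance_hours, instance_hours * rate_per_hour


# Placeholder rate; use the Small on-demand price from the pricing table.
hours, dollars = ec2_monthly_estimate(6, 4, rate_per_hour=0.10)
print(hours, round(dollars, 2))
```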




Lab Experiment #1
Continue calculating the costs for a 6-node cluster using the following assumptions:
  • We only keep the cluster for a month
  • We store all the pages of Wikipedia in S3 (27 GB)
  • We upload the pages only once
  • Each month, we download results (categories/most-frequent words) amounting to 5% of the size of Wikipedia.
  • We use MapReduce on the 6 instances
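Before looking up any prices, it helps to turn these assumptions into the quantities Amazon actually bills for. The sketch below does only that and leaves the dollar rates to you. `lab1_usage` and the constant names are hypothetical, and the 4 hours/day and 30-day month are assumptions carried over from the calculator step.

```python
WIKIPEDIA_GB = 27          # size of the Wikipedia pages stored in S3
DOWNLOAD_FRACTION = 0.05   # results downloaded each month, as a fraction of that size


def lab1_usage(nodes: int = 6, hours_per_day: float = 4, days: int = 30) -> dict:
    """Monthly usage quantities for Lab Experiment #1, before any prices are applied.

    hours_per_day and days are assumptions taken from the calculator step.
    """
    return {
        "instance_hours": nodes * hours_per_day * days,  # billed by EC2 and again by MapReduce
        "s3_stored_gb": WIKIPEDIA_GB,                     # kept in S3 all month
        "gb_in": WIKIPEDIA_GB,                            # uploaded once (may be free before June 30, 2010)
        "gb_out": WIKIPEDIA_GB * DOWNLOAD_FRACTION,       # results downloaded per month
    }


print(lab1_usage())
```

Each quantity then gets multiplied by the matching rate from the EC2, data transfer, S3, and MapReduce tables.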
Question 1
How much will it cost to maintain our 6-node cluster on Amazon?
Question 2
How does that compare to buying 6 Dell PCs today, comparable to Standard/Small EC2 instances, and keeping them for 4 years before retiring them? We also need a hub/switch (~$100 for a 1 Gbps switch).
Question 3
Is it fair to compare the Amazon cost to the purchase price of 6 PCs amortized over 4 years? Why or why not? What are some of the hidden costs? What are some of the advantages of purchasing the PCs? What are some of the advantages of renting them?
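For the comparison in Question 2, the purchase option has to be put on the same monthly footing as the Amazon bill. A simple straight-line amortization spreads the price of the 6 PCs plus the ~$100 switch evenly over 4 × 12 = 48 months. `amortized_monthly_cost` is a hypothetical helper, and the $500 PC price in the example is a made-up placeholder; substitute a real quote.

```python
def amortized_monthly_cost(num_pcs: int, pc_price: float,
                           switch_price: float = 100.0, years: int = 4) -> float:
    """Purchase price of the PCs plus the switch, spread evenly over their lifetime.

    Deliberately ignores power, cooling, space, repairs and administration,
    the kind of hidden costs Question 3 asks about.
    """
    months = years * 12
    return (num_pcs * pc_price + switch_price) / months


# pc_price is a hypothetical figure for a PC comparable to a Small instance.
print(round(amortized_monthly_cost(6, pc_price=500.0), 2))
```

With that placeholder price, the purchase works out to (6 × 500 + 100) / 48, or about $64.58 per month, before any of the hidden costs.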





Lab Experiment #2
If we assume that execution time scales linearly with the number of processors in the MapReduce cluster (doubling the processors halves the execution time), and if we




...