AWS Startups Blog

Follow the Money

Guest post by Eric Feng of Flipboard


An important responsibility for any startup is managing technology costs and investments, of which AWS can be a significant component. When reviewing your AWS account activity in the AWS Billing and Cost Management console, you may notice large cost items that you don’t fully understand because there’s no detailed breakdown showing which hosts and services are transferring data between Availability Zones. This is particularly true when operating a large network of machines.

For example, imagine you look at the Data Transfer portion of the Details section on the Bills page and see a big cost for “AWS Data Transfer (excluding Amazon CloudFront)” under the subcategory of “$0.010 per GB — regional data transfer — in/out/between EC2 AZs or using IPs or ELB.” When you have thousands of machines all talking to each other across Availability Zones, it can be difficult to know which ones are the largest contributors to that expense. Fortunately, you can use AWS Billing and Cost Management reports to figure this out.

  1. First, generate tags for each of your Amazon EC2 instances based on its host name, service, and Availability Zone. For instance, create a key called “service” and set the value to “appserver” for all EC2 instances that run as application services. Or use the value “mydb” for all your databases.
  2. Then set the EC2 tag “Name” to the host name to make it easier to identify the role of the machine in your billing analysis.
  3. Finally, remember to add your own tag for the Availability Zone in which your EC2 instance resides, using the key “az,” since the detailed reports do not yet populate the Availability Zone field in the Bills page details for regional transfers. I recommend setting these tags up automatically when you deploy your instances. Note that when you add tags later on, your cost allocation reports will only contain those tags for charges incurred after the tag is created.
  4. Next, in the AWS Billing and Cost Management console, click Preferences and select Receive Billing Reports. Follow the instructions to designate an Amazon S3 bucket for your reports. You’ll need to copy some variant of the sample policy to your S3 bucket to grant AWS access to publish the reports to your bucket.
  5. Then select the options to sign up for Monthly report, Detailed billing report with resources and tags, and Cost allocation report.
  6. Finally, below the list of report options, click Manage report tags. Here, you can enter the three tags you created earlier (in this example, the “service,” “az,” and “Name” tags).
  7. Now, wait until your next report gets generated. I’ve noticed reports tend to get generated several times a day, overwriting the original. Once available, download it using your favorite S3 tool (I like s3cmd). The name will look something like: s3://your-bucket/XXXXXXXXXXXX-aws-billing-detailed-line-items-with-resources-and-tags-2013-10.csv.zip where XXXXXXXXXXXX is your account ID. Uncompress the file and you’ll see a detailed .csv file that you can start reporting against.
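
The tagging in steps 1 through 3 can be automated at deploy time. The following is a minimal sketch, assuming the boto3 library and an EC2 client created with boto3.client("ec2"). The helper names build_cost_tags and apply_cost_tags are hypothetical; the tag keys are the ones used in this post.

```python
def build_cost_tags(host_name, service, availability_zone):
    """Return the three tags from steps 1-3 in the shape EC2 CreateTags expects."""
    return [
        {"Key": "Name", "Value": host_name},
        {"Key": "service", "Value": service},
        {"Key": "az", "Value": availability_zone},
    ]


def apply_cost_tags(ec2_client, instance_id, tags):
    """Attach the tags to a single instance.

    ec2_client is assumed to be a boto3 EC2 client, e.g. boto3.client("ec2").
    """
    ec2_client.create_tags(Resources=[instance_id], Tags=tags)
```

A deploy script might then call apply_cost_tags(client, instance_id, build_cost_tags("app01", "appserver", "a")), where the host name and instance ID are placeholders and the “az” value is whatever zone label you want to see in your reports.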

For this example, we might want to extract just the header and billing lines related to regional data transfer in order to shorten our processing times. Run the following command to do that and save the output to the file regional.csv:

$ egrep -i '(invoiceid|regional)' downloaded.csv > regional.csv

Now, we can run a simple Python script to read the .csv file and print the relative regional transfer cost of each of our core services. The output might look similar to the following:

===============================
Top 10 Most expensive services (%cost)
===============================
appservices:a:InterZone-Out (14.52%)
mydb:d:InterZone-In (13.07%)
serviceX:a:InterZone-Out (12.44%)
serviceY:d:InterZone-In (5.55%)
mydb:a:InterZone-Out (1.76%)
serviceY:a:InterZone-Out (1.48%)
serviceZ:a:InterZone-Out (1.03%)
memcached:a:InterZone-In (1.01%)
serviceT:a:InterZone-In (0.93%)
appservices:d:InterZone-Out (0.80%)
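
A script along these lines could produce output in that shape. This is a minimal sketch rather than the exact script used for the numbers above. It assumes that the tag columns in the report header carry a “user:” prefix (“user:service” and “user:az”), that the “Operation” column holds values such as InterZone-In and InterZone-Out, and that the cost column is named “Cost”. Check all four names against the header row of your own report and adjust the constants if they differ.

```python
import csv
from collections import defaultdict

# Assumed column names; verify against the header row of your own report.
SERVICE_COL = "user:service"
AZ_COL = "user:az"
OPERATION_COL = "Operation"
COST_COL = "Cost"  # some reports may use BlendedCost or UnBlendedCost instead


def top_regional_costs(rows, limit=10):
    """Return (label, percent_of_total) pairs, most expensive first."""
    totals = defaultdict(float)
    for row in rows:
        try:
            cost = float(row.get(COST_COL) or 0)
        except ValueError:
            continue  # skip rows whose cost field is not numeric
        label = ":".join([
            row.get(SERVICE_COL) or "untagged",
            row.get(AZ_COL) or "?",
            row.get(OPERATION_COL) or "?",
        ])
        totals[label] += cost
    grand_total = sum(totals.values())
    if grand_total == 0:
        return []
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return [(label, 100.0 * cost / grand_total) for label, cost in ranked[:limit]]


def print_report(path, limit=10):
    """Read the filtered .csv file and print the top contributors."""
    with open(path, newline="") as handle:
        results = top_regional_costs(csv.DictReader(handle), limit)
    print("=" * 31)
    print(f"Top {limit} Most expensive services (%cost)")
    print("=" * 31)
    for label, percent in results:
        print(f"{label} ({percent:.2f}%)")
```

Calling print_report("regional.csv") runs it against the filtered file from the previous step. Percentages here are relative to the total cost of the rows in that file, and instances that were not yet tagged are grouped under “untagged”.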

You can see that application servers in zone a account for over 14% of the total costs via outbound traffic, while outbound traffic from application servers in zone d is a fraction of that. This would imply that application servers in zone a are writing to some service (or services) in another AZ. Since mydb in zone d shows a similarly large cost for inbound data, one could surmise that the cause of the high cost is application servers in zone a writing to mydb in zone d. Mystery solved! We now know where to focus our time and energy to reduce the largest share of our inter-AZ transfer costs.

By using AWS billing reports along with EC2 tags and some custom scripts, you’ll be able to understand (and most importantly optimize) your EC2 costs in efficient and effective ways. For more information, see Use Cost Allocation Tags for Custom Billing Reports in the AWS Billing and Cost Management User Guide. Enjoy!