AWS DevOps Blog

Part 5: Develop, Deploy, and Manage for Scale with Elastic Beanstalk and CloudFormation

RDS Read Replicas and ElastiCache: Bolting Jet Engines onto our Elastic Beanstalk App

Welcome to the 5th (and final!) part of this 5-part series. Today we’ll talk about the performance optimizations we made to our application, all in the context of a real load test that simulates actual customers using the application. We’ll look at how we automatically discover (and use) RDS Read Replicas and the impact that introducing a Read Replica has on our loaded application. We’ll also discuss how we use Amazon ElastiCache to speed up some common queries, and then measure exactly what impact ElastiCache has on the performance of a loaded application. Ready? Let’s go!

All application source and accompanying CloudFormation templates are available on GitHub at

The Hangout

We’ll be discussing this blog post – including your Q&A – during a live Office Hours Hangout at 9a Pacific on Thursday, May 8, 2014. Sign up at

How the aMediaManager App Uses RDS

It’s worth a quick refresher on how the aMediaManager application uses RDS. Here’s the schema:

You can see that we store the following data:

  1. Video metadata in the videos table. This includes important data about every video, including the owner, title, description, location in S3, preview image, etc.
  2. Video tags in the tags table, joined to the videos table via the videos_tags table.

And every time the user loads the default page or clicks the Videos link in our app, we have to make two separate queries (including a JOIN+GROUP BY query for tags) to draw the page:

The Load Test

To simulate load on the application, we wrote a simple Ruby script that uses the Mechanize library to interact with the aMediaManager application. Here’s what happens when we execute the script:

  1. Spawn two threads
  2. In each thread, flush the DNS cache and resolve the application’s URL to a single IP address (i.e., the address of an ELB)
  3. Register a new user with a random name and e-mail address
  4. Sign out (the user is automatically authenticated when a new account is created)
  5. Sign in
  6. Upload a profile picture
  7. Upload a small video (~256KB)
  8. View the /videos page 10 times
  9. Sign out

By my count, each run of the script (accounting for the 2 threads in each run) results in 8 POSTs and 34 GETs (including GETs that result from 302 redirects after some POSTs).

Quick sidebar comment: I usually think about Elastic Beanstalk running web applications, APIs, or background worker tasks, but I was really pleased with how easy the service made it to deploy and scale a load test like this. We’ll go into more detail on the load test in a future post…

Load ’em Up

Using CloudWatch we’ll observe 5 metrics to baseline performance:

  1. Requests – Sum of requests received by the application’s ELB in a 5-minute period
  2. Latency – Average latency for ELB to receive a response from the app
  3. Database CPU – Average CPU utilization (%) of the RDS database
  4. Cache Hits/Misses – Sum of cache hits and misses to the ElastiCache cluster
  5. App Server CPU – Average CPU utilization (%) across all EC2 instances in an Auto Scaling Group running our app

To start the load test, we used the Elastic Beanstalk Docker environment to run the script in an infinite loop on 2 x t1.micros. After 20 minutes, here’s what we see as measured by CloudWatch (order of the graphs corresponds to the list above):

We’re serving ~1200 requests every 5 minutes with an average latency of almost 1s. It looks like the CPU of our RDS database is pegged at 100%. This isn’t terribly surprising: I’m running the app to stay inside of the free tier, which means a t1.micro database. We’re throwing a lot of queries (including that join and group by query to show tags) at this t1.micro.

We also notice that there are no cache hits or misses (we haven’t enabled caching yet). The CPU utilization of our app servers looks fine.

Crank it Up

Let’s double the load and see what happens:

We can see that doubling the load didn’t increase the number of requests served by our application, but it did cause latency to more than double! RDS CPU utilization is still at 100%.

Introduce an RDS Read Replica

From the documentation: “Amazon RDS uses MySQL’s built-in replication functionality to create a special type of DB instance called a read replica from a source DB instance. Updates made to the source DB instance are copied to the read replica. You can reduce the load on your source DB instance by routing read queries from your applications to the read replica.”

We built our application to automatically discover and use any Read Replicas that exist. Our application’s configuration (which you can read more about in Part 3 of this series) contains the ID of the master RDS. When the app starts up, our code queries the RDS API to determine if any read replicas exist for the master. If they do, we include the Read Replica hosts in the connection string and use the com.mysql.jdbc.ReplicationDriver JDBC driver.

Here’s a snippet from com.amediamanager.dao.RdsDriverManagerDataSource:

private void initializeDataSource() {
    // Use the RDS API (via dbEndpointRetriever) to discover the URL of the
    // database. If there are read replicas, set the correct driver and use them.
    final String masterId = config // ...the rest of this statement is cut off in the source

    try {
        Endpoint master = dbEndpointRetriever.getMasterDbEndpoint(masterId);
        List<Endpoint> replicas = dbEndpointRetriever.getReadReplicaEndpoints(masterId);
        // Treat an empty list the same way as "no replicas".
        final boolean hasReplicas = replicas != null && !replicas.isEmpty();

        if (master != null) {
            LOG.info("Detected RDS Master database");
            StringBuilder builder = new StringBuilder("jdbc:mysql:");
            if (hasReplicas) {
                builder.append("replication:");
                super.setDriverClassName("com.mysql.jdbc.ReplicationDriver");
            } else {
                super.setDriverClassName("com.mysql.jdbc.Driver");
            }

            // The master is always the first host in the connection string.
            builder.append("//").append(master.getAddress()).append(":").append(master.getPort());
            if (hasReplicas) {
                LOG.info("Detected RDS Read Replicas");
                for (Endpoint endpoint : replicas) {
                    builder.append(",").append(endpoint.getAddress()).append(":").append(endpoint.getPort());
                }
            } else {
                LOG.info("No Read Replicas detected");
            }
            builder.append("/")
                    .append(config.getProperty(ConfigurationSettings.ConfigProps.RDS_DATABASE));
            String connectionString = builder.toString();
            // ... (the rest of the method is not shown in this snippet)
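
To make the shape of the resulting connection string concrete, here is a minimal, self-contained sketch of just the URL-building step. The class name ReplicationUrl, the hostnames, and the database name are made up for illustration and are not part of aMediaManager. As far as I know, Connector/J's ReplicationDriver only sends statements to the replicas when the connection has been marked read-only (Connection.setReadOnly(true)); how the app handles that is not shown in the snippet above.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper (not from aMediaManager): builds only the JDBC URL.
final class ReplicationUrl {
    private ReplicationUrl() {}

    /** Each host is a "host:port" string; replicas may be null or empty. */
    static String build(String master, List<String> replicas, String database) {
        boolean hasReplicas = replicas != null && !replicas.isEmpty();
        StringBuilder url = new StringBuilder("jdbc:mysql:");
        if (hasReplicas) {
            url.append("replication:");
        }
        url.append("//").append(master); // the master always comes first
        if (hasReplicas) {
            for (String replica : replicas) {
                url.append(',').append(replica);
            }
        }
        return url.append('/').append(database).toString();
    }

    public static void main(String[] args) {
        // Hostnames and database name are placeholders.
        System.out.println(build("master.example.com:3306",
                Arrays.asList("replica1.example.com:3306", "replica2.example.com:3306"),
                "exampledb"));
        System.out.println(build("master.example.com:3306", null, "exampledb"));
    }
}
```

With replicas present the URL uses the jdbc:mysql:replication:// scheme and lists the master first; without them it falls back to a plain jdbc:mysql:// URL, which is why the app keeps working when no replica exists.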

I navigated to the RDS Management Console, selected my master database and created a Read Replica:


When the replica was ready, I used the Elastic Beanstalk Management Console to restart Tomcat on all of my instances:

com.amediamanager.dao.RdsDriverManagerDataSource detected the new Read Replica and began using it.

The load generators were running the entire time and here’s what we saw:

Neither requests served nor latency improved. But in the third graph we can see the Read Replica appear: its CPU spikes as it takes over the read requests, which lets the master take a break and serve only writes. The lack of improvement is expected, since we didn’t increase the size of the replica (i.e., it’s still a t1.micro) and it’s now serving all read requests. We also see a small spike in app server CPU from the Tomcat restart.

Amazon ElastiCache

From the documentation: “Amazon ElastiCache is a web service that makes it easy to set up, manage, and scale distributed in-memory cache environments in the cloud. It provides a high performance, resizeable, and cost-effective in-memory cache, while removing the complexity associated with deploying and managing a distributed cache environment.”

When we launched the app with CloudFormation we provisioned an ElastiCache cluster using the memcached engine with 2 t1.micro nodes in the cache cluster. Although there are several Java libraries for interacting with memcached, the ElastiCache Java Cluster Client is preferred because it automatically discovers and uses all of the nodes in our cache cluster. You can read more about auto discovery in the documentation at

Using the ElastiCache client in the aMediaManager application was straightforward:

  1. In our Maven pom.xml, add the client as a dependency:

  2. In com.amediamanager.springconfig.ServerConfig we create a MemcachedClient when the app is deployed. Because we’re using the elasticache-java-cluster-client we included as a dependency above, the client will automatically detect both nodes in our cache cluster and use them:

    package com.amediamanager.springconfig;
    import java.io.IOException;
    import java.net.InetSocketAddress;
    import net.spy.memcached.MemcachedClient;

    public class ServerConfig {
        public MemcachedClient memcachedClient(final ConfigurationSettings settings) throws IOException {
            String configEndpoint = settings.getProperty(ConfigurationSettings.ConfigProps.CACHE_ENDPOINT);
            int clusterPort = Integer.parseInt(settings.getProperty(ConfigurationSettings.ConfigProps.CACHE_PORT));
            return new MemcachedClient(new InetSocketAddress(configEndpoint, clusterPort));
        }
    }

Now we can simply inject the MemcachedClient into our code to cache queries to list a user’s videos and tags.

See com.amediamanager.service.VideoServiceImpl or com.amediamanager.service.TagsServiceImpl for the simple implementation that stores Videos and Tags in the cache (and removes them whenever a new video is added).
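
The general pattern (check the cache, fall back to the database on a miss, and drop the cached entry when a video is added) can be sketched as follows. This is not the code from VideoServiceImpl: the class CachedVideoLookup is hypothetical, and plain in-memory maps stand in for both memcached and RDS so the example runs on its own. The cachingEnabled flag mirrors the idea of switching caching on or off through configuration.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: Maps stand in for memcached and for the database.
final class CachedVideoLookup {
    private final Map<String, List<String>> cache = new HashMap<>();
    private final Map<String, List<String>> database = new HashMap<>();
    private final boolean cachingEnabled;
    int databaseReads = 0; // how many times the "database" was queried

    CachedVideoLookup(boolean cachingEnabled) {
        this.cachingEnabled = cachingEnabled;
    }

    List<String> listVideos(String owner) {
        String key = "videos:" + owner;
        if (cachingEnabled) {
            List<String> cached = cache.get(key);
            if (cached != null) {
                return cached; // cache hit: no database query
            }
        }
        databaseReads++; // cache miss, or caching is disabled
        List<String> titles = new ArrayList<>(
                database.getOrDefault(owner, Collections.<String>emptyList()));
        if (cachingEnabled) {
            cache.put(key, titles);
        }
        return titles;
    }

    void addVideo(String owner, String title) {
        database.computeIfAbsent(owner, k -> new ArrayList<>()).add(title);
        cache.remove("videos:" + owner); // invalidate so the next read is fresh
    }

    public static void main(String[] args) {
        CachedVideoLookup lookup = new CachedVideoLookup(true);
        lookup.addVideo("alice", "first.mp4");
        for (int i = 0; i < 10; i++) {
            lookup.listVideos("alice"); // like viewing /videos 10 times
        }
        System.out.println("database reads: " + lookup.databaseReads);
    }
}
```

With caching enabled, ten page views cost a single database read; with the flag off, every view goes to the database, which is the situation the load test was in before caching was enabled.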

We chose to enable or disable caching via the app’s configuration, and also exposed it at the /config route in the app. I visit that page and enable caching:

The app will now begin using the 2 x t1.micro ElastiCache memcached cluster that was provisioned when we deployed the app:

Wow, a lot to talk about here!

Requests served jumped roughly sixfold (from ~1,200 to ~7,500 per 5-minute period), and latency dropped from an average of ~2s/request down to ~0.25s/request.

We also reduced CPU utilization on our t1.micro Read Replica from 100% to 90%.

The ElastiCache graph indicates our application is retrieving ~3,000 cached items per 5-minute period (the GetHits metric). We also see ~700 GetMisses per 5-minute period, which indicates our app asked for an item from the cache that wasn’t there. This 3,000:700 hit:miss ratio is great and dramatically improved the performance of our app as measured by requests and request latency, which also allowed our application servers to consume more of their CPU (up from 10% average to 30%).
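
For reference, the hit:miss figures above can be turned into a hit rate with a couple of lines of arithmetic. This is a small sketch using the approximate numbers read off the graph; the class name CacheHitRate is just for illustration.

```java
import java.util.Locale;

final class CacheHitRate {
    /** Fraction of cache lookups that were hits; 0 when there were no lookups. */
    static double hitRate(long hits, long misses) {
        long total = hits + misses;
        return total == 0 ? 0.0 : (double) hits / total;
    }

    public static void main(String[] args) {
        // ~3,000 GetHits and ~700 GetMisses per 5-minute period
        double rate = hitRate(3000, 700);
        System.out.println(String.format(Locale.ROOT, "hit rate: %.1f%%", 100 * rate));
    }
}
```

That works out to roughly 81% of cache lookups being answered without touching the database.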

Add Another Read Replica

We’re quite satisfied with the performance improvements that ElastiCache provided, but let’s see if we can serve even more requests even faster by adding another RDS Read Replica. I add one more t1.micro replica and restart my app servers so it will be detected:

Nice! Adding the second read replica boosts the requests we serve in a 5-minute period by 33% (from 7,500 to 10,000). And latency drops by about 50%, this time from ~0.25s to ~0.12s. Jet engines attached and operational!

And we’ve really eased up on the CPU utilization of the databases. Where the single Read Replica was sitting at ~90% utilization after we enabled caching, adding the 2nd replica shares that load, with both hovering around 70% CPU utilization.
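
To put the request counts from the three stages of the test on a common footing, here is a small sketch that converts the approximate 5-minute sums quoted in this post into requests per second and percentage changes. The class and method names are made up for illustration.

```java
import java.util.Locale;

final class LoadTestNumbers {
    /** CloudWatch sums here cover 5-minute (300-second) periods. */
    static double requestsPerSecond(double requestsPerFiveMinutes) {
        return requestsPerFiveMinutes / 300.0;
    }

    static double percentChange(double before, double after) {
        return 100.0 * (after - before) / before;
    }

    public static void main(String[] args) {
        double baseline = 1200;           // before caching
        double withCache = 7500;          // after enabling ElastiCache
        double withSecondReplica = 10000; // after adding the second replica

        System.out.println(String.format(Locale.ROOT, "baseline:       %.1f req/s",
                requestsPerSecond(baseline)));
        System.out.println(String.format(Locale.ROOT, "with cache:     %.1f req/s",
                requestsPerSecond(withCache)));
        System.out.println(String.format(Locale.ROOT, "second replica: %.1f req/s",
                requestsPerSecond(withSecondReplica)));
        System.out.println(String.format(Locale.ROOT, "second replica vs cache: %+.0f%%",
                percentChange(withCache, withSecondReplica)));
    }
}
```

In per-second terms the app went from about 4 requests/s at baseline to 25 requests/s with caching and about 33 requests/s with the second replica.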


When designing this app we chose to consider how we could scale the read load to our database, and settled on using Read Replicas and Caching. We then incorporated into our app the ability to use replicas and caching, but – and I think very importantly – we built it to be completely configurable: If no read replicas exist, the app works fine. If ElastiCache isn’t present (or if we disable it in the configuration) – the app still works. If there are 10 Read Replicas and no ElastiCache, the app uses the replicas. If there are no replicas but a 6-node ElastiCache cluster…well, you get the idea. As more customers use our application, we can scale with them.

Finally, we have the ability to optimize our application for both performance and cost. I can use the Simple Monthly Calculator to look up the cost of those Read Replicas or ElastiCache clusters and attribute an exact dollar amount per request served, or we can say how much we’re willing to pay to keep latency below 300ms. That’s a mighty powerful set of sliders we can use to dial in the cost/performance of our app.
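
As a rough sketch of that cost-per-request calculation: the hourly cost below is a placeholder, not a real AWS price, so substitute whatever the calculator gives you for your replicas or cache nodes. The class and method names are hypothetical.

```java
import java.util.Locale;

final class CostPerRequest {
    /** USD per one million requests, given an hourly cost and a 5-minute request count. */
    static double usdPerMillionRequests(double hourlyCostUsd, double requestsPerFiveMinutes) {
        double requestsPerHour = requestsPerFiveMinutes * 12; // twelve 5-minute periods per hour
        return hourlyCostUsd / requestsPerHour * 1_000_000;
    }

    public static void main(String[] args) {
        double hourlyCostUsd = 0.10;            // placeholder value, NOT an actual AWS price
        double requestsPerFiveMinutes = 10_000; // request rate measured with two replicas
        System.out.println(String.format(Locale.ROOT, "USD per million requests: %.2f",
                usdPerMillionRequests(hourlyCostUsd, requestsPerFiveMinutes)));
    }
}
```

Running the same calculation for each configuration (one replica, cache enabled, two replicas) gives a simple way to compare what each performance improvement costs.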

Thank You!

I hope you’ve enjoyed this 5-part series! You can find links to the other posts in the series at, and don’t forget that we’ll be discussing this blog post – including your Q&A – during a live Office Hours Hangout at 9a Pacific on Thursday, May 8, 2014. Sign up at