Amazon ElastiCache – Distributed In-Memory Caching
Today we are introducing Amazon ElastiCache so that you can easily add caching logic to your application. You can now create Cache Clusters, each made up of one or more Cache Nodes, in a matter of minutes. Each Cache Cluster is a distributed, in-memory cache that can be accessed using the popular Memcached protocol.
You can often make your application run faster by caching critical pieces of data in memory. Information that is often cached includes the results of time-consuming database queries or the results of complex calculations.
Suppose that your application includes a function called Calculate that accepts two parameters and is a function in the mathematical sense: there’s precisely one output for each pair of inputs. The non-cached version of Calculate would look like this:
function Calculate(A, B)
    C = [some lengthy calculation dependent on A and B];
    return C;
If numerous calls to Calculate are making your application run too slowly, you can cache previous answers like this:
function CachedCalculate(A, B)
    C = Cache.Get("Calculate", A, B);
    if (C == null)
        // Cache miss: compute the answer and store it for next time.
        C = Calculate(A, B);
        Cache.Put("Calculate", A, B, C);
    return C;
In this example, the Cache keys are the string “Calculate” and the values of A and B. In practice these three values are generally combined into a single string key. The Cache will store previously computed values. Implicit in this example is the assumption that it takes more time to perform the calculation than it does to check the cache. Also implicit is the fact that the cache can expire or evict values if they become too old or if the cache becomes full.
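To make the single-string key concrete, here is a minimal Python sketch of the same pattern. The `DictCache` class is a hypothetical in-process stand-in for a Memcached client, `make_key` is an illustrative helper, and the body of `calculate` is a placeholder for the lengthy calculation; none of these names come from a real library.

```python
class DictCache:
    """Hypothetical in-process stand-in for a Memcached client."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        # Return None on a miss.
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value


def make_key(name, *args):
    # Combine the function name and its arguments into one string key.
    # Memcached keys must not contain whitespace, so this simple scheme
    # assumes the arguments render as plain tokens.
    return name + ":" + ":".join(str(a) for a in args)


def calculate(a, b):
    # Placeholder for the lengthy calculation.
    return a ** b


def cached_calculate(cache, a, b):
    key = make_key("Calculate", a, b)
    c = cache.get(key)
    if c is None:
        # Cache miss: compute once, then remember the answer.
        c = calculate(a, b)
        cache.set(key, c)
    return c


cache = DictCache()
print(cached_calculate(cache, 2, 10))  # computed, then cached under "Calculate:2:10"
print(cached_calculate(cache, 2, 10))  # served from the cache
```

A real client would also let you pass an expiration time when storing a value, which is how the "too old" case mentioned above is usually handled.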
You can also cache the results of database queries. The tradeoffs here can be a little bit more complicated and will often involve the ratio of reads to writes for a given query or for the tables referenced in the query. If you are implementing your own social network, it would be worthwhile to cache each user’s list of friends if this information is required with great regularity (perhaps several times per minute) but changes infrequently (hourly or daily). In this case your cache key would include the name of the query and the user name; something like “getfriends_jeffbarr”. In order to make sure that the cache does not contain outdated information, you would invalidate the data stored under this key each time you alter the friend list for a particular user. I don’t have room to list all of the considerations; for more information check out the following articles on the High Scalability blog:
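The friend-list example can be sketched as follows. This is only an illustration: `FakeCache` is a hypothetical stand-in for a Memcached client, the `db` dictionary stands in for the database, and the friend names are invented.

```python
class FakeCache:
    """Hypothetical in-process stand-in for a Memcached client."""

    def __init__(self):
        self._store = {}

    def get(self, key):
        return self._store.get(key)

    def set(self, key, value):
        self._store[key] = value

    def delete(self, key):
        self._store.pop(key, None)


def friends_key(user):
    # Query name plus user name, as in "getfriends_jeffbarr".
    return "getfriends_" + user


def get_friends(cache, db, user):
    key = friends_key(user)
    friends = cache.get(key)
    if friends is None:
        # Cache miss: run the (stand-in) database query and cache a copy.
        friends = list(db.get(user, []))
        cache.set(key, friends)
    return friends


def add_friend(cache, db, user, friend):
    # Write to the database first, then invalidate the cached list so
    # that the next read repopulates it with fresh data.
    db.setdefault(user, []).append(friend)
    cache.delete(friends_key(user))


db = {"jeffbarr": ["alice", "bob"]}
cache = FakeCache()
print(get_friends(cache, db, "jeffbarr"))   # miss, reads the database
add_friend(cache, db, "jeffbarr", "carol")  # invalidates the cached list
print(get_friends(cache, db, "jeffbarr"))   # miss again, sees the new friend
```

Deleting the key on every write keeps the logic simple; whether that is cheaper than updating the cached value in place depends on the read-to-write ratio discussed above.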
- A Bunch of Great Strategies for Using Memcached and MySQL Better Together
- Secrets to Fotolog’s Scaling Success
If you are already running Memcached on some Amazon EC2 instances, you can simply create a new cluster and point your existing code at the nodes in the cluster. If you are not using any caching, you’ll need to spend some time examining your application architecture in order to figure out how to get started. Memcached client libraries exist for just about every popular programming language.
You will need to learn a few new terms in order to fully understand and appreciate ElastiCache. Here is a quick reference:
- A Cache Security Group regulates access to the Cache Nodes in a Cache Cluster.
- A Cache Cluster is a collection of Cache Nodes. Each cluster resides in a particular AWS Availability Zone.
- A Cache Node is a processing and storage unit within a Cache Cluster. The size of a cluster can be increased or decreased as needed. Each node runs a particular version of a Cache Engine. Amazon ElastiCache supports nodes with cache sizes ranging from 6 to 67 GB. A DNS name is assigned to each Cache Node when it is created.
- A Cache Engine implements a caching protocol, algorithm, and strategy. The initial release of Amazon ElastiCache supports version 1.4.5 of Memcached.
- A Cache Parameter Group holds a set of configuration values that are specific to a particular type and version of a Cache Engine.
Here is how it all fits together:
Creating a Cluster Using the Console
The AWS Management Console includes complete support for Amazon ElastiCache. Let’s walk through the process of creating a cluster.
The first step is to create a Cache Security Group. Each such group allows access to the cluster from the EC2 instances associated with one or more EC2 Security Groups. The EC2 security groups are identified by name and AWS Account Id:
Next, we can create the Cache Cluster. The console makes this quick and easy using a wizard. Push the button to get started:
First, name the cluster, choose the node type, and set the number of nodes. You can also set the port and the Availability Zone, and you can choose to receive notification from Amazon SNS on the topic of your choice. You can also give Amazon ElastiCache permission to automatically perform upgrades to the Cache Engine when a new minor version is available:
Next, you can select one or more Cache Security Groups, and a Cache Parameter Group. You can also specify a maintenance window during which Amazon ElastiCache will install patches and perform other pending modifications to the cluster.
Finally, confirm your selections and launch the cluster:
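The same choices can be made programmatically. The sketch below only assembles the request parameters, using the parameter names of the CreateCacheCluster API action; every value shown (cluster name, node type, group names, maintenance window) is a hypothetical example, and the function name `build_create_cluster_request` is my own.

```python
def build_create_cluster_request(
    cluster_id,
    node_type,
    num_nodes,
    security_groups,
    parameter_group,
    port=11211,
    availability_zone=None,
    sns_topic_arn=None,
    auto_minor_version_upgrade=True,
    maintenance_window=None,
):
    # Settings from the first two pages of the wizard.
    params = {
        "CacheClusterId": cluster_id,
        "Engine": "memcached",
        "CacheNodeType": node_type,
        "NumCacheNodes": num_nodes,
        "Port": port,
        "CacheSecurityGroupNames": list(security_groups),
        "CacheParameterGroupName": parameter_group,
        "AutoMinorVersionUpgrade": auto_minor_version_upgrade,
    }
    # Optional settings are included only when a value was supplied.
    optional = {
        "PreferredAvailabilityZone": availability_zone,
        "NotificationTopicArn": sns_topic_arn,
        "PreferredMaintenanceWindow": maintenance_window,
    }
    params.update({k: v for k, v in optional.items() if v is not None})
    return params


# Hypothetical example values.
request = build_create_cluster_request(
    cluster_id="my-cluster",
    node_type="cache.m1.large",
    num_nodes=3,
    security_groups=["my-cache-sg"],
    parameter_group="default.memcached1.4",
    maintenance_window="sun:05:00-sun:06:00",
)
print(request)
```

With an SDK such as boto3 (not covered in this post), a dictionary like this could be passed along as `client.create_cache_cluster(**request)`.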
The cluster will be up and running within a few minutes. Once it is ready, you can copy the list of endpoints and use them to configure your application (you can also retrieve this information programmatically using the Amazon ElastiCache APIs):
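If you fetch the endpoints through the API instead of copying them from the console, you end up walking a nested response. The sketch below assumes the response shape returned by the DescribeCacheClusters action when node information is requested (as exposed by an SDK such as boto3); the sample response and its host names are invented for illustration.

```python
def node_endpoints(response):
    """Collect (address, port) pairs for every node in the response."""
    endpoints = []
    for cluster in response.get("CacheClusters", []):
        for node in cluster.get("CacheNodes", []):
            endpoint = node["Endpoint"]
            endpoints.append((endpoint["Address"], endpoint["Port"]))
    return endpoints


# Invented sample in the assumed response shape. With boto3 the real
# response would come from something like:
#   client.describe_cache_clusters(CacheClusterId="my-cluster",
#                                  ShowCacheNodeInfo=True)
sample_response = {
    "CacheClusters": [
        {
            "CacheClusterId": "my-cluster",
            "CacheNodes": [
                {"CacheNodeId": "0001",
                 "Endpoint": {"Address": "node-1.example.internal", "Port": 11211}},
                {"CacheNodeId": "0002",
                 "Endpoint": {"Address": "node-2.example.internal", "Port": 11211}},
            ],
        }
    ]
}

for address, port in node_endpoints(sample_response):
    print("%s:%d" % (address, port))
```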
You can click on any of your clusters to see a description of the cluster:
The Nodes tab contains information about each of the Cache Nodes in the selected cluster:
Each Cache Node reports a number of metrics to Amazon CloudWatch. You can watch these metrics to measure the efficacy of your caching strategy. The metrics should also give you the information that you need to make sure that you have enough memory devoted to caching.
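One simple way to gauge efficacy is the hit ratio: the fraction of Get requests that were answered from the cache. The sketch below assumes that you have already read the hit and miss counts for some time period (for example, from the GetHits and GetMisses metrics); the function name and the sample numbers are illustrative.

```python
def hit_ratio(get_hits, get_misses):
    """Fraction of Get requests served from the cache, or None if idle."""
    total = get_hits + get_misses
    if total == 0:
        # No requests in this period, so the ratio is undefined.
        return None
    return get_hits / total


# Illustrative numbers for one period.
ratio = hit_ratio(get_hits=900, get_misses=100)
print("hit ratio: %.0f%%" % (ratio * 100))
```

A low ratio combined with a steadily climbing eviction count is a hint that the cluster may not have enough memory for your working set.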
You can also inspect each of your Cache Parameter Groups. The groups can be modified using the Amazon ElastiCache APIs or from the command line.
Caching in Action
Once you have launched your cluster, you can configure the DNS names of the nodes into the client library of your choice. At present this is a manual copy and paste process. However, over time, I expect some of the client libraries to add Amazon ElastiCache support and thereby obviate this configuration step.
Your application can elect to receive an Amazon SNS (Simple Notification Service) notification when a cluster is created, or when nodes are added to or removed from an existing cluster.
You should definitely watch the CloudWatch metrics for your Nodes, and you should adjust the type and number of nodes as necessary.
Client Libraries and Node Selection
Most of the client libraries treat the cluster as a unit. In other words, you direct your Put and Get requests to the cluster and the library will algorithmically choose a particular node. The libraries do this using a hash function to spread the data out across the nodes.
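Here is a minimal sketch of the simplest form of this idea: hash the key and take the result modulo the number of nodes. Real client libraries differ in the hash function and details; the node names below are hypothetical, and `pick_node` is an illustrative helper rather than any library's API.

```python
import zlib


def pick_node(key, nodes):
    # Hash the key to an integer, then map it onto one of the nodes.
    index = zlib.crc32(key.encode("utf-8")) % len(nodes)
    return nodes[index]


# Hypothetical node endpoints.
nodes = ["node-a:11211", "node-b:11211", "node-c:11211"]

for key in ["getfriends_jeffbarr", "Calculate:2:10"]:
    print(key, "->", pick_node(key, nodes))
```

Because the same key always hashes to the same node, every client configured with the same node list will look for a given key in the same place.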
If you plan to dynamically resize your cluster, you need to make sure that your client library uses a consistent hash function. A function of this type produces results that will remain valid even as the size of the cluster changes. Ketama is a popular consistent hashing algorithm for Memcached; you can read all about it here.
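To see why this matters, here is a small hash ring in Python. It is a simplified sketch in the spirit of consistent hashing, not the actual Ketama algorithm; the `HashRing` class, the number of points per node, and the node names are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(value):
    # Map a string to a 32-bit integer position on the ring.
    digest = hashlib.md5(value.encode("utf-8")).digest()
    return int.from_bytes(digest[:4], "big")


class HashRing:
    def __init__(self, nodes, points_per_node=100):
        # Place several points per node on the ring to even out the load.
        self._ring = sorted(
            (_hash("%s#%d" % (node, i)), node)
            for node in nodes
            for i in range(points_per_node)
        )
        self._positions = [position for position, _ in self._ring]

    def node_for(self, key):
        # Walk clockwise to the first point past the key's position,
        # wrapping around at the end of the ring.
        index = bisect.bisect(self._positions, _hash(key)) % len(self._ring)
        return self._ring[index][1]


keys = ["key%d" % i for i in range(1000)]
before = HashRing(["node-a", "node-b", "node-c"])
after = HashRing(["node-a", "node-b", "node-c", "node-d"])

moved = [k for k in keys if before.node_for(k) != after.node_for(k)]
print("keys that changed nodes:", len(moved), "of", len(keys))
```

Adding a node only moves the keys that the new node takes over, so most cached values stay reachable. With simple modulo hashing, going from three nodes to four would remap roughly three quarters of the keys.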
Watch the Movie
AWS Evangelist Simone Brunozzi has produced a complete demonstration of Amazon ElastiCache in action: