Caching for high-volume workloads with Amazon ElastiCache

Introduction

In the last lesson, you created an application that allowed users to review restaurants and browse reviews left by other users. You used Amazon DynamoDB as the primary database for this application because of its performance at scale.

In this lesson, you see how to add caching to your restaurant-review application. Caching can help relieve pressure on your primary data store or reduce latency for popular workflows.

For this lesson, you use Amazon ElastiCache, a fully managed, in-memory caching solution provided by AWS. First, you learn why you would want to use ElastiCache. Then you work through the steps to create and configure an ElastiCache instance and use it in your application. At the end of this lesson, you should know when and how to use ElastiCache in your application.

Time to complete: 30-45 minutes

Purpose-built Databases - ElastiCache (17:05)
Why use ElastiCache?

Applications often need a durable, powerful database for their primary data storage. This could mean a relational database, such as Amazon Aurora, or it could mean a NoSQL database, such as DynamoDB or Amazon DocumentDB (with MongoDB compatibility). These databases provide powerful query capabilities with strong data guarantees that make them an excellent fit for your primary database.

But sometimes you need lower latency than you can get from a primary data store. Perhaps you have a common request flow in your application that is read significantly more often than it is written. For these situations, you may want to use an in-memory cache to help. You can save the results of frequently read queries to a cache to improve response times and reduce pressure on your primary database.

ElastiCache provides support for two open-source, in-memory cache engines: Redis and Memcached. When you use ElastiCache, you get a fully managed, in-memory cache. You aren’t responsible for instance failovers, backups and restores, or software upgrades. This frees you to innovate for your customers.

Lesson contents

In this lesson, you learn how to use ElastiCache as an in-memory cache for an existing application. This lesson has four steps.

  • 1. Create an AWS Cloud9 environment

    In this module, you create and prepare an AWS Cloud9 environment. AWS Cloud9 is a cloud-based integrated development environment (IDE). It gives you a fast, consistent development environment from which you can quickly build AWS applications.


    To get started, navigate to the AWS Cloud9 console. Choose Create environment to start the AWS Cloud9 environment creation wizard.

    Screenshot of AWS Cloud9

    On the first page of the wizard, give your environment a name and a description. Then choose Next step.

    Give your environment a name and description

    The next step allows you to configure environment settings, such as the instance type for your environment, the platform, and network settings.

    The default settings work for this lesson, so scroll to the bottom and choose Next step.

    Retain the default environment settings for this lesson

    The last step shows your settings for review. Scroll to the bottom and choose Create environment.

    Choose "Create environment"

    Your AWS Cloud9 environment should take a few minutes to provision. As it is being created, the following screen is displayed.

    The AWS Cloud9 environment as it is provisioning

    After a few minutes, you should see your AWS Cloud9 environment. There are three areas of the AWS Cloud9 console to know, as illustrated in the following screenshot:

    • File explorer: On the left side of the IDE, the file explorer shows a list of the files in your directory.
    • File editor: In the upper right area of the IDE, the file editor is where you view and edit files that you’ve chosen in the file explorer.
    • Terminal: In the lower right area of the IDE, the terminal is where you run commands to execute code samples.
    The three main areas of the AWS Cloud9 console

    In this lesson, you use Python to interact with your ElastiCache database. Run the following commands in your AWS Cloud9 terminal to download and unpack the module code.

    cd ~/environment
    curl -sL https://s3.amazonaws.com/aws-data-labs/restaurant-dynamodb.tar | tar -xv

    Run the following command in your AWS Cloud9 terminal to view the contents of your directory.

    ls

    You should see the following files in your AWS Cloud9 terminal:

    • cache.py
    • dynamodb.py
    • entities.py
    • fetch_restaurant_summary.py
    • items.json
    • README.md
    • requirements.txt
    • test_connection.py

    Run the following command in your terminal to install the dependencies for your application.

    sudo pip install -r requirements.txt


    In this module, you configured an AWS Cloud9 instance to use for development. In the next module, you create a Redis instance by using ElastiCache.

  • 2. Create a Redis instance by using ElastiCache

    In this module, you provision an ElastiCache instance by using the Redis engine. You also configure access to your Redis instance and test your connection from your AWS Cloud9 instance.


    First, navigate to the ElastiCache console. Choose Create to begin the instance-creation wizard.

    Choose "Create" to begin the instance-creation wizard

    In the Cluster engine settings, choose Redis.

    Choose "Redis" in the "Cluster engine" settings

    In the Redis settings section, give your Redis instance a name. Then change the Node type to cache.t2.micro and the Number of replicas to 0. In a production setting, you might choose to use replicas and a larger Node type.

    Redis settings to establish

    In the Advanced Redis settings, you can leave the first few settings with their default options.

    Under the Security section, take note of the security group that is attached to your Redis instance. You will edit this after your cluster is created to allow access to Redis from your AWS Cloud9 instance.

    The default security group is chosen and works fine for this lesson.

    Take note of the security group attached to your Redis instance

    The rest of the default settings are fine. Scroll to the bottom and choose Create to create your Redis instance.

    Choose "Create" to create your Redis instance

    ElastiCache begins creating your Redis instance. While your instance is being created, it shows a Status of creating in the ElastiCache console.

    When your ElastiCache instance is ready, its Status is available.

    While your instance is being created, it shows a "Status" of "creating"

    Next, you need to allow inbound access to the security group for your Redis instance from your AWS Cloud9 environment. This allows you to connect to your Redis instance.

    Navigate to the Security Groups page of the Amazon EC2 console. You should see the default security group you used for your ElastiCache instance as well as a security group for your AWS Cloud9 instance.

    Choose the default security group to see more details.

    Image showing the AWS Cloud9 security group and the default security group

    After you choose the default security group, you should see details about your group as well as the inbound rules for your security group. Choose Edit inbound rules to edit the rules.

    Choose "Edit inbound rules"

    Add an inbound rule to allow traffic to your Redis instance from your AWS Cloud9 instance. The Type of your rule should be Custom TCP with a Port range of 6379. For the Source, choose the security group used for your AWS Cloud9 instance.

    Your screen should look as follows.

    Add an inbound rule

    Choose Save rules to save your new inbound security group rules.
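    The same inbound rule can also be expressed in code. The following is a minimal sketch, assuming you have boto3 installed and AWS credentials with permission to modify security groups. The function names and the security group ID are hypothetical placeholders, not part of the lesson code. Only the rule builder runs here; the boto3 call is wrapped in a function you would call yourself.

```python
def build_redis_ingress_rule(source_group_id, port=6379):
    """Build one IpPermissions entry that allows TCP traffic on the Redis
    port from members of another security group."""
    return {
        "IpProtocol": "tcp",
        "FromPort": port,
        "ToPort": port,
        "UserIdGroupPairs": [{"GroupId": source_group_id}],
    }


def allow_redis_access(redis_group_id, cloud9_group_id):
    """Apply the rule with boto3 (needs AWS credentials and permissions)."""
    import boto3  # imported here so the rule builder works without boto3

    ec2 = boto3.client("ec2")
    ec2.authorize_security_group_ingress(
        GroupId=redis_group_id,
        IpPermissions=[build_redis_ingress_rule(cloud9_group_id)],
    )


# Hypothetical placeholder ID; substitute the AWS Cloud9 security group
# from your own account.
rule = build_redis_ingress_rule("sg-0123456789abcdef0")
print(rule)
```

    Referencing the AWS Cloud9 security group as the source, rather than an IP address range, keeps the rule valid even if the instance's private IP address changes.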

    Finally, test the connection to your Redis instance from your AWS Cloud9 environment. In the ElastiCache console, find your instance and expand the details. Find the value for Primary endpoint and copy the value.

    Set the value of the endpoint in your AWS Cloud9 environment by running the following command in your terminal.

    export REDIS_HOSTNAME=<yourPrimaryEndpoint>

    Be sure to replace <yourPrimaryEndpoint> with the primary endpoint you copied from the ElastiCache console.

    There is a script called test_connection.py to test that you can connect to Redis. The contents of the script are as follows.

    import os
    
    import redis
    
    HOST = os.environ["REDIS_HOSTNAME"].replace(":6379", "")
    
    r = redis.Redis(host=HOST)
    
    r.ping()
    
    print("Connected to Redis!")

    The script uses Python's redis library to connect to Redis. It reads the hostname from the REDIS_HOSTNAME environment variable you set in your terminal, strips a trailing :6379 port suffix if one is present, and initializes a connection to Redis. It then runs a PING command against the Redis server. Run the script with python test_connection.py. If the connection succeeds, the script prints Connected to Redis! to the console. If you see an exception, review the preceding steps to ensure you set your environment variable correctly and allowed inbound access from your AWS Cloud9 environment.
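    The script removes a literal :6379 suffix from the endpoint. If your endpoint might use a different port, a more general approach is to split the value into a host and a port. The following is a minimal sketch; parse_endpoint is a hypothetical helper that is not part of the lesson code, and the hostname is a made-up example.

```python
def parse_endpoint(endpoint, default_port=6379):
    """Split 'host:port' into (host, port), using the default port
    when the endpoint has no numeric port suffix."""
    endpoint = endpoint.strip()
    host, sep, port = endpoint.rpartition(":")
    if sep and port.isdigit():
        return host, int(port)
    return endpoint, default_port


with_port = parse_endpoint("example.cache.amazonaws.com:6379")
without_port = parse_endpoint("example.cache.amazonaws.com")
print(with_port)
print(without_port)
```

    You could then pass both values to the client, for example redis.Redis(host=host, port=port).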


    In this module, you created a Redis instance by using ElastiCache. You also configured the security group on your Redis instance to allow incoming network traffic from your AWS Cloud9 environment. Finally, you ran a test command in your AWS Cloud9 environment to ensure your configuration was correct.

    In the next module, you implement a cache-aside strategy in your Redis instance.

  • 3. Implement a cache-aside strategy with your instance

    In this module, you learn how to implement a cache-aside strategy with your Redis instance. You use this strategy to cache results from the Fetch Restaurant Summary access pattern in the restaurant-ratings service. This helps reduce the load on your database and speed up response times to your end users.


    There are a number of different strategies for implementing caching in applications. Two of the more popular strategies are cache-aside (or lazy caching) and write-ahead caching.

    With a cache-aside strategy, results are cached after you fetch them for the first time. For example, imagine you implement a cache-aside strategy for specific webpages. When a user requests a page for the first time, your application checks the cache for the cached data. Because it is the first request, the data does not exist in the cache. The application goes to the source database for the data and stores the results in the cache before returning to the user. On subsequent requests, the data is available in the cache and your application doesn’t need to go to the source database.

    A cache-aside strategy is good when data is being read much more often than it is being written. By caching the results, you can prevent a heavy load on your database. With a cache-aside strategy, you need to think carefully about how to evict or expire your cached data so that you don't show stale data to your users.
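    The cache-aside flow and its invalidation concern can be sketched without any external services. In the following example, a plain dictionary stands in for Redis and load_from_database is a hypothetical stand-in for a query against your primary database. It illustrates the pattern only; it is not the lesson's implementation.

```python
cache = {}          # stands in for Redis in this sketch
database_reads = 0  # counts trips to the pretend primary database


def load_from_database(key):
    """Hypothetical slow lookup against the primary data store."""
    global database_reads
    database_reads += 1
    return {"name": key, "cuisine": "Fine Dining"}


def get_with_cache_aside(key):
    """Return the cached value, loading and caching it on a miss."""
    if key in cache:
        return cache[key]            # cache hit: no database read
    value = load_from_database(key)  # cache miss: go to the source
    cache[key] = value               # store the result for later requests
    return value


def invalidate(key):
    """Drop a cached entry, for example after the underlying data changes."""
    cache.pop(key, None)


get_with_cache_aside("The Vineyard")  # miss, reads the database
get_with_cache_aside("The Vineyard")  # hit, served from the cache
invalidate("The Vineyard")
get_with_cache_aside("The Vineyard")  # miss again after invalidation
print(database_reads)
```

    Three requests result in only two database reads. The second read happens only because the entry was explicitly invalidated.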

    The second strategy, write-ahead caching (a pattern often called write-through caching), is when you proactively add data to your cache. Rather than waiting for a user to request the data for the first time, you update the cached data whenever the underlying data changes. This is a good pattern to use when the data being requested is derived from other sources and hard to calculate at query time. If you have a good understanding of when a change in the underlying data invalidates the cache, you can speed up your response times and ensure fresh data with a write-ahead strategy.
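    As a rough illustration of proactively updating the cache, the following sketch recomputes a derived summary each time a review is written. Dictionaries stand in for both Redis and the primary database, and the function names and rating values are hypothetical.

```python
cache = {}     # stands in for Redis
database = {}  # stands in for the primary data store


def save_review(restaurant_name, rating):
    """Write a review, then refresh the derived summary in the cache."""
    database.setdefault(restaurant_name, []).append(rating)
    ratings = database[restaurant_name]
    # Recompute the derived value at write time so reads never have to.
    cache[restaurant_name] = {
        "review_count": len(ratings),
        "average_rating": sum(ratings) / len(ratings),
    }


def get_summary(restaurant_name):
    """Reads are served from the cache, which writes keep current."""
    return cache.get(restaurant_name)


save_review("The Vineyard", 5)
save_review("The Vineyard", 4)
summary = get_summary("The Vineyard")
print(summary)
```

    The tradeoff is that every write does extra work, which pays off when reads greatly outnumber writes.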

    In this module, you use a cache-aside strategy. Implementing the Fetch Restaurant Summary access pattern in the restaurant-ratings application requires reading and returning six items from DynamoDB. This data is likely to remain valid for quite a while because restaurant page views are much more common than new reviews. Accordingly, you can cache this data to reduce load on your DynamoDB table.

    There is a file named fetch_restaurant_summary.py in your AWS Cloud9 environment. Open the file in your file editor. The contents should look as follows.

    from dynamodb import fetch_restaurant_summary_from_database
    from cache import fetch_restaurant_summary_from_cache, store_restaurant_summary_in_cache
    
    
    def fetch_restaurant_summary(restaurant_name):
        restaurant = fetch_restaurant_summary_from_cache(restaurant_name)
        if restaurant:
            print("Using cached result!")
            return restaurant
    
        restaurant = fetch_restaurant_summary_from_database(restaurant_name)
        store_restaurant_summary_in_cache(restaurant)
    
        print("Using uncached result!")
    
        return restaurant
    
    
    restaurant = fetch_restaurant_summary("The Vineyard")
    
    print(restaurant)
    for review in restaurant.reviews:
        print(review)

    This file includes a function, fetch_restaurant_summary, that is similar to a function you would have in your application. The function takes a restaurant name and performs the following actions:

    1. It checks the Redis cache for the restaurant summary. If the data exists in the cache, it is returned to the user.
    2. If the data does not exist in the cache, the function retrieves the restaurant summary from DynamoDB. This would use code similar to that used to handle the Fetch Restaurant Summary access pattern in the DynamoDB lesson.
    3. After receiving the results from DynamoDB, the function caches the results in Redis for future requests.
    4. The summary is returned to the user.

    There is a file named cache.py that contains the code for interacting with the Redis cache. Open the file in your file editor. The contents look as follows.

    
    from entities import Restaurant, Review
    
    import json
    import os
    
    import redis
    
    HOST = os.environ["REDIS_HOSTNAME"].replace(":6379", "")
    
    r = redis.Redis(host=HOST)
    
    
    class ObjectEncoder(json.JSONEncoder):
        def default(self, o):
            return o.__dict__
    
    
    def store_restaurant_summary_in_cache(restaurant):
        key = restaurant.name
        r.set(key, json.dumps(restaurant, cls=ObjectEncoder), ex=900)
    
        return True
    
    
    def fetch_restaurant_summary_from_cache(restaurant_name):
        response = r.get(restaurant_name)
        if response:
            data = json.loads(response)
            restaurant = Restaurant(data)
            restaurant.reviews = [Review(review) for review in data["reviews"]]
            return restaurant
        return None

    There are two functions here. The first function stores a restaurant summary in Redis. It uses the restaurant name as the key name in Redis and turns the restaurant summary into a JSON string. Then it uses the Redis SET command to set the value of a string key in Redis. When doing so, it passes ex=900 to set a Time to Live (TTL) of 900 seconds so that the value expires after 15 minutes. This helps to ensure your data doesn't get too stale.
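    To see what expiration means in practice without waiting 15 minutes, the following sketch uses a small in-memory class that loosely mimics SET with an expiry. TTLCache and the fake clock are hypothetical teaching aids and do not reflect how Redis implements expiration internally.

```python
import time


class TTLCache:
    """A tiny in-memory cache whose entries expire after a set time."""

    def __init__(self, clock=time.monotonic):
        self._clock = clock
        self._entries = {}  # key -> (value, expiry time)

    def set(self, key, value, ex):
        self._entries[key] = (value, self._clock() + ex)

    def get(self, key):
        entry = self._entries.get(key)
        if entry is None:
            return None
        value, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[key]  # expired: behave as if the key is absent
            return None
        return value


# A fake clock lets the example advance time without sleeping.
now = [0.0]
ttl_cache = TTLCache(clock=lambda: now[0])

ttl_cache.set("The Vineyard", "cached summary", ex=900)
now[0] = 899
before_expiry = ttl_cache.get("The Vineyard")
now[0] = 900
after_expiry = ttl_cache.get("The Vineyard")
print(before_expiry, after_expiry)
```

    Once the entry expires, the next lookup behaves like a cache miss, so a cache-aside function would fall back to the database and repopulate the cache.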

    The second function is responsible for retrieving a restaurant summary from Redis. It takes a restaurant name and performs the Redis GET command to retrieve the value. It then parses the data, reconstitutes it into objects in your application, and returns the result.
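    The serialization round trip can be tried on its own, without a Redis connection. In the following sketch, the Restaurant and Review classes are simplified, hypothetical stand-ins for the classes in entities.py (whose real fields are not shown in this lesson), and the rating value is made up.

```python
import json


class Review:
    """Hypothetical stand-in for the Review class in entities.py."""

    def __init__(self, data):
        self.username = data["username"]
        self.rating = data["rating"]


class Restaurant:
    """Hypothetical stand-in for the Restaurant class in entities.py."""

    def __init__(self, data):
        self.name = data["name"]
        self.cuisine = data["cuisine"]
        self.reviews = []


class ObjectEncoder(json.JSONEncoder):
    def default(self, o):
        # Called only for objects that json cannot serialize natively.
        return o.__dict__


original = Restaurant({"name": "The Vineyard", "cuisine": "Fine Dining"})
original.reviews = [Review({"username": "markmartin", "rating": 5})]

# Serialize: this string is what the SET command would store.
payload = json.dumps(original, cls=ObjectEncoder)

# Deserialize: parse the string that GET would return and rebuild objects.
data = json.loads(payload)
restored = Restaurant(data)
restored.reviews = [Review(item) for item in data["reviews"]]
print(restored.name, restored.reviews[0].username)
```

    Because the encoder writes each object's __dict__, nested Review objects are serialized automatically, but they come back as plain dictionaries and must be rebuilt explicitly.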

    You can test this cache usage by running the following command in your terminal.

    python fetch_restaurant_summary.py

    The first time you run the command, you should see the following output.

    $ python fetch_restaurant_summary.py
    Using uncached result!
    Restaurant<The Vineyard -- Fine Dining>
    Review<The Vineyard -- markmartin (2020-05-24T11:59:44)>
    Review<The Vineyard -- kgraham (2020-05-14T15:01:52)>
    Review<The Vineyard -- ewilliams (2020-05-13T02:36:30)>
    Review<The Vineyard -- hannah21 (2020-05-04T03:44:26)>
    Review<The Vineyard -- john97 (2020-04-27T20:45:52)>

    Notice that the script indicates that it used an uncached result before printing out the restaurant and reviews.

    Run the script again to test your caching behavior.

    python fetch_restaurant_summary.py

    This time you should see the following output.

    $ python fetch_restaurant_summary.py
    Using cached result!
    Restaurant<The Vineyard -- Fine Dining>
    Review<The Vineyard -- markmartin (2020-05-24T11:59:44)>
    Review<The Vineyard -- kgraham (2020-05-14T15:01:52)>
    Review<The Vineyard -- ewilliams (2020-05-13T02:36:30)>
    Review<The Vineyard -- hannah21 (2020-05-04T03:44:26)>
    Review<The Vineyard -- john97 (2020-04-27T20:45:52)>

    You can see that the script indicates it's using a cached result this time. The output is still the same as the initial fetch from DynamoDB though. Based on the TTL set on the key (expiration value of 900 seconds), for the next 15 minutes these results stay in your Redis instance to serve cached results quickly.


    In this module, you learned about two popular caching strategies: cache-aside and write-ahead caching. Then you saw how to implement the cache-aside strategy in your application. The cache-aside strategy can be an easy way to boost application performance and reduce load on your primary database.

    In the next module, you clean up the resources you created in this lesson.

  • 4. Clean up the resources you created

    In this lesson, you created an in-memory cache instance by using ElastiCache. An in-memory cache is a great way to provide fast response times in your application and reduce the load on your primary database. When you use ElastiCache, you get a fully managed experience that allows you to focus on delivering value for your users.

    In this module, clean up the resources you created in this lesson to avoid incurring additional charges.


    First, delete your ElastiCache instance. Navigate to the ElastiCache console. Find the ElastiCache instance you created for this module and select the check box. Then choose Delete to delete your instance.

    Delete your ElastiCache instance

    In the confirmation window, decline the option to create a final backup for your Redis instance, and then choose Delete.

    Choose "Delete" to delete the cluster

    Additionally, you need to delete your AWS Cloud9 development environment.

    To do so, navigate to the AWS Cloud9 console. Choose the environment you created for this lesson, and then choose Delete.

    Delete your AWS Cloud9 development environment

    In this module, you learned how to clean up the ElastiCache instance and the AWS Cloud9 environment that you created in this lesson.

Caching is an important tool for keeping your application fast under load. In this lesson, you saw how to create and use a cache to speed up commonly used access patterns in your application. You should feel confident in your ability to use a cache in the future.