Caching Overview

Caching helps applications perform dramatically faster and cost significantly less at scale

What is Caching?

In computing, a cache is a high-speed data storage layer which stores a subset of data, typically transient in nature, so that future requests for that data are served up faster than is possible by accessing the data’s primary storage location. Caching allows you to efficiently reuse previously retrieved or computed data.

How does Caching work?

The data in a cache is generally stored in fast access hardware such as RAM (Random-access memory) and may also be used in correlation with a software component. A cache's primary purpose is to increase data retrieval performance by reducing the need to access the underlying slower storage layer.

Trading off capacity for speed, a cache typically stores a subset of data transiently, in contrast to databases whose data is usually complete and durable.

Getting Started with Caching: Turbocharging Your Application Workloads

Caching Overview

RAM and In-Memory Engines: Due to the high request rates or IOPS (Input/Output operations per second) supported by RAM and In-Memory engines, caching results in improved data retrieval performance and reduces cost at scale. To support the same scale with traditional databases and disk-based hardware, additional resources would be required. These additional resources drive up cost and still fail to achieve the low latency performance provided by an In-Memory cache.

Applications: Caches can be applied and leveraged throughout various layers of technology including Operating Systems, Networking layers including Content Delivery Networks (CDN) and DNS, web applications, and Databases. You can use caching to significantly reduce latency and improve IOPS for many read-heavy application workloads, such as Q&A portals, gaming, media sharing, and social networking. Cached information can include the results of database queries, computationally intensive calculations, API requests/responses and web artifacts such as HTML, JavaScript, and image files. Compute-intensive workloads that manipulate data sets, such as recommendation engines and high-performance computing simulations also benefit from an In-Memory data layer acting as a cache. In these applications, very large data sets must be accessed in real-time across clusters of machines that can span hundreds of nodes. Due to the speed of the underlying hardware, manipulating this data in a disk-based store is a significant bottleneck for these applications.

Design Patterns: In a distributed computing environment, a dedicated caching layer enables systems and applications to run independently from the cache with their own lifecycles without the risk of affecting the cache. The cache serves as a central layer that can be accessed from disparate systems with its own lifecycle and architectural topology. This is especially relevant in a system where application nodes can be dynamically scaled in and out. If the cache is resident on the same node as the application or systems utilizing it, scaling may affect the integrity of the cache. In addition, when local caches are used, they only benefit the local application consuming the data. In a distributed caching environment, the data can span multiple cache servers and be stored in a central location for the benefit of all the consumers of that data.

Caching Best Practices: When implementing a cache layer, it’s important to understand the validity of the data being cached. A successful cache results in a high hit rate which means the data was present when fetched. A cache miss occurs when the data fetched was not present in the cache. Controls such as TTLs (Time to live) can be applied to expire the data accordingly. Another consideration may be whether or not the cache environment needs to be Highly Available, which can be satisfied by In-Memory engines such as Redis. In some cases, an In-Memory layer can be used as a standalone data storage layer in contrast to caching data from a primary location. In this scenario, it’s important to define an appropriate RTO (Recovery Time Objective--the time it takes to recover from an outage) and RPO (Recovery Point Objective--the last point or transaction captured in the recovery) on the data resident in the In-Memory engine to determine whether or not this is suitable. Design strategies and characteristics of different In-Memory engines can be applied to meet most RTO and RPO requirements.

Layer Client-Side DNS Web App Database
Use Case

Accelerate retrieval of web content from websites (browser or device)

Domain to IP Resolution Accelerate retrieval of web content from web/app servers. Manage Web Sessions (server side) Accelerate application performance and data access Reduce latency associated with database query requests
Technologies HTTP Cache Headers, Browsers DNS Servers HTTP Cache Headers, CDNs, Reverse Proxies, Web Accelerators, Key/Value Stores Key/Value data stores, Local caches Database buffers, Key/Value data stores
Solutions Browser Specific Amazon Route 53 Amazon CloudFrontElastiCache for RedisElastiCache for MemcachedPartner Solutions Application Frameworks, ElastiCache for RedisElastiCache for MemcachedPartner Solutions  ElastiCache for RedisElastiCache for Memcached

Caching with Amazon ElastiCache

Amazon ElastiCache is a web service that makes it easy to deploy, operate, and scale an in-memory data store or cache in the cloud. The service improves the performance of web applications by allowing you to retrieve information from fast, managed, in-memory data stores, instead of relying entirely on slower disk-based databases. Learn how you can implement an effective caching strategy with this technical whitepaper on in-memory caching.

Benefits of Caching

Improve Application Performance

Because memory is orders of magnitude faster than disk (magnetic or SSD), reading data from in-memory cache is extremely fast (sub-millisecond). This significantly faster data access improves the overall performance of the application.

Reduce Database Cost

A single cache instance can provide hundreds of thousands of IOPS (Input/output operations per second), potentially replacing a number of database instances, thus driving the total cost down. This is especially significant if the primary database charges per throughput. In those cases the price savings could be dozens of percentage points.

Reduce the Load on the Backend

By redirecting significant parts of the read load from the backend database to the in-memory layer, caching can reduce the load on your database, and protect it from slower performance under load, or even from crashing at times of spikes.

Predictable Performance

A common challenge in modern applications is dealing with times of spikes in application usage. Examples include social apps during the Super Bowl or election day, eCommerce websites during Black Friday, etc. Increased load on the database results in higher latencies to get data, making the overall application performance unpredictable. By utilizing a high throughput in-memory cache this issue can be mitigated.

Eliminate Database Hotspots

In many applications, it is likely that a small subset of data, such as a celebrity profile or popular product, will be accessed more frequently than the rest. This can result in hot spots in your database and may require overprovisioning of database resources based on the throughput requirements for the most frequently used data. Storing common keys in an in-memory cache mitigates the need to overprovision while providing fast and predictable performance for the most commonly accessed data.

Increase Read Throughput (IOPS)

In addition to lower latency, in-memory systems also offer much higher request rates (IOPS) relative to a comparable disk-based database. A single instance used as a distributed side-cache can serve hundreds of thousands of requests per second.

Use Cases & Industries

  • Use Cases
  • Learn about various caching use cases

    Database Caching

    The performance, both in speed and throughput that your database provides can be the most impactful factor of your application’s overall performance. And despite the fact that many databases today offer relatively good performance, for a lot use cases your applications may require more. Database caching allows you to dramatically increase throughput and lower the data retrieval latency associated with backend databases, which as a result, improves the overall performance of your applications. The cache acts as an adjacent data access layer to your database that your applications can utilize in order to improve performance. A database cache layer can be applied in front of any type of database, including relational and NoSQL databases. Common techniques used to load data into you’re a cache include lazy loading and write-through methods. For more information, click here.

    Content Delivery Network (CDN)

    When your web traffic is geo-dispersed, it’s not always feasible and certainly not cost effective to replicate your entire infrastructure across the globe. A CDN provides you the ability to utilize its global network of edge locations to deliver a cached copy of web content such as videos, webpages, images and so on to your customers. To reduce response time, the CDN utilizes the nearest edge location to the customer or originating request location in order to reduce the response time. Throughput is dramatically increased given that the web assets are delivered from cache. For dynamic data, many CDNs can be configured to retrieve data from the origin servers.

    Amazon CloudFront is a global CDN service that accelerates delivery of your websites, APIs, video content or other web assets. It integrates with other Amazon Web Services products to give developers and businesses an easy way to accelerate content to end users with no minimum usage commitments. To learn more about CDNs, click here.

    Domain Name System (DNS) Caching

    Every domain request made on the internet essentially queries DNS cache servers in order to resolve the IP address associated with the domain name. DNS caching can occur on many levels including on the OS, via ISPs and DNS servers.

    Amazon Route 53 is a highly available and scalable cloud Domain Name System (DNS) web service.

    Session Management

    HTTP sessions contain the user data exchanged between your site users and your web applications such as login information, shopping cart lists, previously viewed items and so on. Critical to providing great user experiences on your website is managing your HTTP sessions effectively by remembering your user’s preferences and providing rich user context. With modern application architectures, utilizing a centralized session management data store is the ideal solution for a number of reasons including providing, consistent user experiences across all web servers, better session durability when your fleet of web servers is elastic and higher availability when session data is replicated across cache servers.

    For more information, click here.

    Application Programming Interfaces (APIs)

    Today, most web applications are built upon APIs. An API generally is a RESTful web service that can be accessed over HTTP and exposes resources that allow the user to interact with the application. When designing an API, it’s important to consider the expected load on the API, the authorization to it, the effects of version changes on the API consumers and most importantly the API’s ease of use, among other considerations. It’s not always the case that an API needs to instantiate business logic and/or make a backend requests to a database on every request. Sometimes serving a cached result of the API will deliver the most optimal and cost-effective response. This is especially true when you are able to cache the API response to match the rate of change of the underlying data. Say for example, you exposed a product listing API to your users and your product categories only change once per day. Given that the response to a product category request will be identical throughout the day every time a call to your API is made, it would be sufficient to cache your API response for the day. By caching your API response, you eliminate pressure to your infrastructure including your application servers and databases. You also gain from faster response times and deliver a more performant API.

    Amazon API Gateway is a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale.

    Caching for Hybrid Environments

    In a hybrid cloud environment, you may have applications that live in the cloud and require frequent access to an on-premises database. There are many network topologies that can by employed to create connectivity between your cloud and on-premises environment including VPN and Direct Connect. And while latency from the VPC to your on-premises data center may be low, it may be optimal to cache your on-premises data in your cloud environment to speed up overall data retrieval performance.

    Web Caching

    When delivering web content to your viewers, much of the latency involved with retrieving web assets such as images, html documents, video, etc. can be greatly reduced by caching those artifacts and eliminating disk reads and server load. Various web caching techniques can be employed both on the server and on the client side. Server side web caching typically involves utilizing a web proxy which retains web responses from the web servers it sits in front of, effectively reducing their load and latency. Client side web caching can include browser based caching which retains a cached version of the previously visited web content. For more information on Web Caching, click here.

    General Cache

    Accessing data from memory is orders of magnitude faster than accessing data from disk or SSD, so leveraging data in cache has a lot of advantages. For many use-cases that do not require transactional data support or disk based durability, using an in-memory key-value store as a standalone database is a great way to build highly performant applications. In addition to speed, application benefits from high throughput at a cost-effective price point. Referenceable data such product groupings, category listings, profile information, and so on are great use cases for a general cache. For more information on general cache, click here.

    Integrated Cache

    An integrated cache is an in-memory layer that automatically caches frequently accessed data from the origin database. Most commonly, the underlying database will utilize the cache to serve the response to the inbound database request given the data is resident in the cache. This dramatically increases the performance of the database by lowering the request latency and reducing CPU and memory utilization on the database engine. An important characteristic of an integrated cache is that the data cached is consistent with the data stored on disk by the database engine.

  • Industries
  • Learn about different industries and the various caching use cases


    Mobile applications are an incredibly fast growing market segment given the rapid consumer device adoption and the decline in use of traditional computer equipment. Whether it be for games, commercial applications, health applications, and so on, virtually every market segment today has a mobile friendly application. From an application development perspective, building mobile apps is very similar to building any other form of application. You have the same areas of concern, your presentation tier, business tier and data tier. While your screen real estate and development tools are different, delivering a great user experience is a shared goal across all applications. With effective caching strategies, your mobile applications can deliver the performance your users expect, scale massively, and reduce your overall cost.

    The AWS Mobile Hub is a console that provides an integrated experience for discovering, configuring, and accessing AWS cloud services for building, testing, and monitoring usage of mobile apps.

    Internet of Things (IoT)

    The Internet of Things is a concept behind gathering and delivering information from a device and the physical world via device sensors to the internet or application consuming the data. The value of IoT is being able to understand the captured data at near real time intervals which ultimately allows the consuming system and applications the ability to respond rapidly to that data. Take for example, a device that transmits its GPS coordinates. Your IoT application could respond by suggesting points of interest relative to the proximity of those coordinates. Furthermore, if you had stored preferences related to the user of the device, you could fine tune those recommendations tailored to that individual. In this particular example, the speed at which the application can respond to the coordinates is critical to achieving a great user experience. Caching can play an important role here, for example, the points of interest along with the geo coordinates could be stored in a key/value store such as Redis to enable fast retrieval. From an application development perspective, you can essentially code your IoT application to respond to any event given there is a programmatic means to do so. Important considerations to be made when building an IoT architecture include the response time involved with analyzing the ingested data, architecting a solution that can scale N number of devices and delivering an architecture that is cost-effective.

    AWS IoT is a managed cloud platform that lets connected devices easily and securely interact with cloud applications and other devices.

    Further Reading: Managing IoT and Time Series Data with Amazon ElastiCache for Redis

    Advertising Technology

    Modern Ad Tech applications are particularly demanding in terms of performance. An example of a significant area of growth in AdTech is real-time bidding (RTB), which is the auction-based approach for transacting digital display ads in real time, at the most granular impression level. RTB was the dominant transaction method in 2015, accounting for 74.0 percent of programmatically purchased advertising, or 11 billion dollars in the US (according to eMarketer Analysis). When building a real-time bidding app, a millisecond can be the difference between submitting the bid on time and it becoming irrelevant. This means that getting the bidding information from the database must be extremely fast. Database caching, which can access bidding details in sub milliseconds, is a great solution for achieving that high performance.


    Interactivity is a cornerstone requirement for almost any modern game. Nothing frustrates players more than a slow or unresponsive game, and those rarely become successful. The requirement on performance is even more demanding for mobile multiplayer games, where an action that any one player takes needs to be shared with others in real time. Caching plays a crucial role in keeping the game smooth by providing sub-millisecond query response for frequently accessed data. It is also helpful to alleviate hot key issues when the same data is queried multiple times, such as “who are the current top 10 players by score?”

    To learn more about developing games on AWS click here.


    Media companies often deal with the need to transmit a large amount of static content to their customers with a constantly changing number of readers/viewers. An example is a video streaming service such as Netflix or Amazon Video, which streams a large amount of video content to the viewers. This is a perfect fit for a Content Delivery Network, where data is stored on a globally distributed set of caching servers. Another aspect of media applications is that load tends to be spikey and unpredictable. Imagine a blog on a website that a celebrity just tweeted about, or the website of a Football team during the Super Bowl. Such a large spike of demand to a small subset of content is a challenge to most databases since they are limited in their per-key throughput. Since memory has a much higher throughput than disk, a database cache would resolve the issue by redirecting the reads to the in memory cache.


    Modern eCommerce applications are becoming more sophisticated, offering personalized shopping experience, including real-time recommendations based on a user’s data and shopping history. Those often also include looking at a user’s social network and providing the recommendation based on what her friends liked or purchased. While the amount of data needed to process is increasing, customer’s patience is not. Therefore, keeping the application performing in real-time is not a luxury, but a necessity; a well-executed caching strategy is a critical aspect of the application performance, and could be the difference between an application’s success or failure, between making a sale or losing a customer.

    Social Media

    Social Media apps have taken the world by storm. Social networks like Facebook, Twitter, Instagram and Snapchat have a vast number of users who consume ever growing amount of content. When a user opens her feed, she expects to see her latest personalized content in near real-time. That is not static content since each user has different friends, images, interests, etc., exacerbating the engineering complexity needs of the underlying platform. Social Media apps are also very prone to spikes in usage around major entertainment, sports, and political events. Such spike resiliency and real-time performance are achieved through multiple layers of caching – including Content Delivery Network for the static content such as background images, session cache for keeping track of a user’s current session data, and database cache for keeping frequently accessed data such as latest news from closest friends and the last few images handy.

    Healthcare and Wellness

    The healthcare industry is going through a digital revolution, making care both available and accessible to more and more patients around the world. Some applications allow patients to see Doctors for video consultations, and most major providers have apps that allow patients to see their test results and interact with the medical staff. On the wellness side, there is a plethora of applications that range from tracking a user’s specific sensor activity (e.g. FitBit and Jawbone), to comprehensive wellness coaching and data. Given the interactive nature of those apps, the need for fast-performing applications, business, and data tiers must be addressed. With an effective caching strategy you will be able to provide fast performance, reduce the overall infrastructure costs, and scale as your usage grows.

    To learn more about building Healthcare apps on AWS click here.

    Finance and Financial Technology

    The way we consume financial services has evolved dramatically over the recent years. Applications include accessing banking and insurance services, fraud detection, investment services, optimizing capital markets via real-time algorithms, and more. Providing real-time access to a customer’s financial data, allowing him to make transactions such as transferring money, or making payments is challenging. First, similar constraints apply as in other applications where a user wants to interact with the app in near real-time. In addition, financial applications may impose additional requirements such as increased security and fraud detection. An efficient architecture, including multi-layer caching strategy, is critical to achieve the performance expected by users. Based on the application needs, the caching layers would include a session cache for storing a user’s session data, a Content Delivery Network for serving static content, and a database cache for frequently accessed data such as the customer’s 10 most recent purchases.

    To learn more about Financial Services apps on AWS click here.

Get started with Amazon ElastiCache

Step 1 - Sign up for an Amazon Web Services account

Sign up for an AWS account

Instantly get access to the AWS Free Tier.

Learn with simple tutorials

Explore how to create a Redis cluster.

Start building

Begin building with help from the user guide.