
Amazon CloudFront Announces Cache and Origin Request Policies

Amazon CloudFront’s new Cache and Origin Request Policies give you more control over the way CloudFront uses request data to influence both the cache key and the request that is forwarded to the origin on a cache miss. This gives you more flexibility while enabling better control and efficiency of the caching that CloudFront performs. Some of these settings already existed, but the cache key configuration is now independent of the origin forwarding settings. Previously, most forwarded data automatically modified the cache key. Now, you can forward most request elements without affecting the cache key (unless you specifically want to). You can configure any combination of headers, cookies, and query string parameters to be included in or excluded from the cache key, or forwarded as needed.

In addition to the core configurability improvements, these options are now set using “Policies”. A Policy allows for the same specific combination of settings to be applied across any number of distribution behaviors. This saves setup time, reduces complexity, and allows teams to manage consistency across configurations.

CloudFront also provides several preconfigured system Policies. There are system Policies set for maximum cache retention, proxying dynamic transactions, and for common use cases and integrations with other AWS services. For example, there is a system Policy for personalized video streaming with AWS Elemental MediaPackage. You can create your own Policies for different content and application profiles and then apply them to any distributions and behaviors in your account.

What is the cache key and why does it matter?

First, let’s make sure we understand what the cache key is and how it’s constructed. The cache key is the way that CloudFront uniquely identifies every resource that is cached. By default, it consists of the CloudFront distribution hostname and the resource portion of the request URL (path, file name, and extension) as in this example:

GET /content/stories/example-story.html?ref=0123abc&split-pages=false HTTP/1.1
Host: d111111abcdef8.cloudfront.net
User-Agent: Mozilla/5.0 Gecko/20100101 Firefox/68.0
Accept: text/html,*/*
Accept-Language: en-US,en
Cookie: session_id=01234abcd
Referer: https://news.example.com/

The default cache key for the above request would contain:

  • The domain name of the CloudFront distribution  (d111111abcdef8.cloudfront.net)
  • The URL path and file name of the requested object (/content/stories/example-story.html)

Other values from the viewer request are not included in the cache key by default. In the request above, the default cache key consists only of the distribution domain name and the URL path. The other elements present (headers, query string parameters, and cookies) are included only if you add them to the cache key using a Cache Policy.
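As a rough illustration of the default cache key described above, the following Python sketch builds a tuple that stands in for the key. It is a conceptual model only, not how CloudFront computes keys internally. The function name `cache_key` and its `key_*` parameters are hypothetical names chosen for this example.

```python
from urllib.parse import parse_qsl


def cache_key(host, path, query="", headers=None, cookies=None,
              key_headers=(), key_query_params=(), key_cookies=()):
    """Return a tuple standing in for the cache key (conceptual model only)."""
    headers = {k.lower(): v for k, v in (headers or {}).items()}
    cookies = cookies or {}
    params = dict(parse_qsl(query))
    return (
        host,
        path,
        # Only elements explicitly listed by the policy become part of the key.
        tuple((h.lower(), headers.get(h.lower())) for h in sorted(key_headers)),
        tuple((q, params.get(q)) for q in sorted(key_query_params)),
        tuple((c, cookies.get(c)) for c in sorted(key_cookies)),
    )


# The example request from this article.
request = dict(
    host="d111111abcdef8.cloudfront.net",
    path="/content/stories/example-story.html",
    query="ref=0123abc&split-pages=false",
    headers={"Accept-Language": "en-US,en",
             "User-Agent": "Mozilla/5.0 Gecko/20100101 Firefox/68.0"},
    cookies={"session_id": "01234abcd"},
)

# Default: only the distribution domain name and the path identify the object.
default_key = cache_key(**request)

# With a policy that adds Accept-Language, the header value joins the key.
language_key = cache_key(**request, key_headers=["Accept-Language"])
```

In this model, two requests that differ only in their `ref` query string parameter share `default_key`, while `language_key` changes whenever the Accept-Language value changes.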

The combination of data in the cache key uniquely identifies each resource across the entire cache footprint. But, what if you have an application that serves up content that varies based on other “metadata” that can be provided in an HTTP request, using the same base URL (path, file name, extension)? This is where headers, query strings, and cookies come in.

Many modern applications use information like this to customize or personalize the resulting responses. This could be serving alternate versions of graphics or icons based on user or device characteristics, serving up different language versions of text based on client location, or rendering different outputs on a web page based on a cookie. There are infinite ways that this data can be used, but the key consideration is the need to differentiate between the data you want to send to the origin application server, and the specific elements that actually determine whether your application serves and caches a different version of the object using the same base URL.

This is where separating the forwarding behavior from the cache key behavior is critical. If the cache key omits an element that your origin uses to vary its response, CloudFront may ignore legitimate variants. If it includes elements that do not change the response, CloudFront may cache the same file multiple times under different cache key values. The first scenario can result in the application not working as expected. The second often results in less efficient use of CloudFront caching, which can affect performance. Also, keep in mind that the number of unique resources (or copies of the same resource) that can be cached equals the number of unique combinations of values across all the elements included in the cache key. If those permutations do not actually result in different resources being served by your origin, or if they produce tens of thousands or more combinations that each receive only a small number of repeated requests, you should probably consider a different strategy.

There are several approaches you can take in this situation. You can exclude these high-cardinality elements from the cache key using a Cache Policy. You could also proxy the requests by marking them non-cacheable with Default TTL = 0 or Max TTL = 0 settings in the policy. Or, you could override caching using origin-supplied Cache-Control headers such as “Cache-Control: no-cache” or “no-store”. (See the CloudFront Developer Guide for more information on how to do this.)
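To make the multiplication of variants concrete, here is a small Python sketch that computes the upper bound on cached copies of a single URL. The element names and the counts of distinct values are invented for illustration and do not come from any real workload.

```python
from math import prod


def max_cached_variants(distinct_values):
    """Upper bound on copies of one URL: product of distinct values per key element."""
    return prod(distinct_values.values())


# Hypothetical counts of distinct values seen for each cache key element.
key_elements = {
    "Accept-Language header": 20,
    "device type": 4,
    "session_id cookie": 5000,
}

with_session = max_cached_variants(key_elements)

# Dropping the high-cardinality cookie from the cache key shrinks the bound.
without_session = max_cached_variants(
    {name: count for name, count in key_elements.items()
     if name != "session_id cookie"}
)
```

With these made-up numbers, keeping the session cookie in the key allows up to 400,000 copies of one object, while removing it leaves 80.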

What are cache and origin request policies?

Policies are a new concept for CloudFront and can be thought of as templates of configuration information that can be applied to any number of distribution behaviors in your account. Policies allow you to define standards that can be applied to similar content or application use cases where the characteristics of how you want CloudFront to cache or forward request information to your origin are the same. This reduces repetition and enforces consistency across properties, teams, and workflows. Cache Policies allow you to control how CloudFront caches content. Origin Request Policies allow you to control the types of data that are included in the request to the origin on a cache miss.

Policies are created and configured in the CloudFront console using a new set of screens. These are accessed either from the Policies menu item on the left-hand navigation panel, or by selecting the “Create a new Policy” button from within the create/edit behavior screen as described in the “Applying Policies to a Behavior” section below. Each Policy type is distinct and each has a list screen where all of the existing Policies in the account can be viewed, a view screen where the details of the Policy can be viewed but not edited, and an edit/create screen in which the values for the Policy can be configured or changed.

Policies List screen and navigation

Cache policies

Cache Policies govern how CloudFront caches content, including setting how long CloudFront caches objects before revalidating with the origin (TTLs), how CloudFront uses HTTP headers, query string parameters and cookies to cache variants of content, and how CloudFront treats caching of compressed variants of resources.

For Cache Policies, the following options are available:

Name – required. Select a unique and descriptive name for your Cache Policy. This value is what appears in the drop down selection field in the Behavior screen.

Comment – optional. Enter any additional descriptive text to help you organize your Policies. This field is not shown during selection.

TTL Settings – these values control how long CloudFront caches objects in conjunction with other explicit origin-supplied cache-control directives.

  • Minimum TTL – The minimum amount of time that an object remains in cache without a revalidation (If-Modified-Since) request back to the origin.
  • Maximum TTL – The maximum amount of time that an object remains in cache without a revalidation (If-Modified-Since) request back to the origin.
  • Default TTL – The default amount of time to keep an object in cache if no other cache control directives are supplied by the origin.

Minimum and Maximum values work with origin-supplied cache control headers (such as max-age, s-maxage, and Expires) and act as a governor on the minimum and maximum values that those directives can enforce in the CloudFront cache. For more information about how TTL settings work with origin-supplied Cache-Control headers, refer to the CloudFront Developer Guide.
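The interaction between the three TTL settings and an origin-supplied lifetime can be sketched as a simple clamp. This is a simplified model under stated assumptions: it ignores directives such as no-cache, no-store, and private, and the function name `effective_ttl` is hypothetical. The Developer Guide remains the authority on the exact rules.

```python
def effective_ttl(origin_ttl, min_ttl, default_ttl, max_ttl):
    """Seconds an object stays cached before revalidation (simplified model).

    origin_ttl is the lifetime derived from max-age, s-maxage, or Expires,
    or None when the origin supplied no such directive.
    """
    if origin_ttl is None:
        # No origin directive: fall back to the policy's Default TTL.
        return default_ttl
    # Origin directive present: clamp it between Minimum and Maximum TTL.
    return max(min_ttl, min(origin_ttl, max_ttl))


# Hypothetical policy: 1 minute minimum, 1 day default, 1 year maximum.
policy = dict(min_ttl=60, default_ttl=86_400, max_ttl=31_536_000)

no_directive = effective_ttl(None, **policy)          # Default TTL applies
short_lived = effective_ttl(5, **policy)              # raised to Minimum TTL
within_bounds = effective_ttl(3_600, **policy)        # origin value is kept
too_long = effective_ttl(10**9, **policy)             # capped at Maximum TTL
```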

Cache key contents – the following values determine how CloudFront uses additional request metadata such as headers, query strings, and cookies to cache content variants. Indicate which of these elements your origin or application uses to decide which content to serve for the same base URL. For example, you may vary HTML page content based on an Accept-Language header. You might serve different image variants based on User-Agent or device-type headers supplied by the client or by CloudFront. Remember that values specified in the cache key are automatically forwarded to the origin, on the presumption that if you use a value as a cache key modifier, your origin must see it in order to generate the proper variants.

Headers

  • None – Do not include any additional headers in the cache key.
  • Whitelist – Specify headers to include in the cache key.  Choose from the list of predefined common headers or type your own custom headers.

Query strings

  • All – Include values of all query string parameters in the cache key.
  • None – Do not use any of the query string parameter values in the cache key.
  • Whitelist – Include only the values of the specified query string parameters in the cache key.
  • All-except – Include all except the listed query string parameters in the cache key.

Cookies

  • All – Include all cookies present in the request in the cache key.
  • None – Do not use any of the cookies in the cache key.
  • Whitelist – Include only the specified cookies in the cache key.
  • All-except – Include all except the listed cookies in the cache key.

Compressed object caching

This check box governs how CloudFront caches GZIP compressed variants that either your origin or CloudFront can generate. When this check box is selected, if a GZIP compressed variant of the object is available, it is cached.  De-selecting the check box for a particular compression type means that CloudFront does not cache that variant. This setting is independent of (but related to) the setting for CloudFront to perform edge GZIP compression that is configured elsewhere. If edge compression is enabled, make sure that this check box is also checked if you want the CloudFront-generated compressed version to be cached. This is the recommended behavior, since if you are asking CloudFront to perform the compression you should cache the result of that operation. If you are already compressing resources at the origin, make sure you check this box if you want CloudFront to cache both the compressed and uncompressed versions.
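The options above map onto a configuration structure when policies are created through the API. The sketch below builds a plain Python dictionary shaped like the `CachePolicyConfig` structure of the CreateCachePolicy API as I understand it; verify the field names against the current API reference before relying on them. The policy name, comment, TTL values, and the `name_list` helper are hypothetical.

```python
def name_list(names):
    """Wrap names in the Quantity/Items form used for lists in this structure."""
    return {"Quantity": len(names), "Items": list(names)}


cache_policy_config = {
    "Name": "example-language-variants",          # hypothetical policy name
    "Comment": "Vary cached HTML on Accept-Language only",
    "MinTTL": 1,
    "DefaultTTL": 86_400,
    "MaxTTL": 31_536_000,
    "ParametersInCacheKeyAndForwardedToOrigin": {
        # Corresponds to the compressed object caching check box.
        "EnableAcceptEncodingGzip": True,
        # Headers: None or Whitelist.
        "HeadersConfig": {
            "HeaderBehavior": "whitelist",
            "Headers": name_list(["Accept-Language"]),
        },
        # Query strings: All-except keeps everything but the listed parameters.
        "QueryStringsConfig": {
            "QueryStringBehavior": "allExcept",
            "QueryStrings": name_list(["ref"]),
        },
        # Cookies: None keeps cookies out of the cache key entirely.
        "CookiesConfig": {"CookieBehavior": "none"},
    },
}
```

With an SDK such as boto3, a dictionary like this would typically be passed as the `CachePolicyConfig` argument of `create_cache_policy`. The call is left out here so the example runs without AWS credentials.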

Origin Request policies

Origin Request Policies govern how CloudFront transmits request-time metadata to your origin when an origin request is made (a cache miss or revalidation). This feature has been renamed from Header Forwarding to Origin Request because some of the metadata is generated by CloudFront rather than supplied directly by the client, so sending it is not strictly a forwarding operation. Examples are the Geo headers and Device Type headers that CloudFront can generate from client-supplied data like the IP address and User-Agent header. Origin Request Policies let you configure which headers, query string parameters, and cookies CloudFront sends to the origin.

Accepted values for each of these fields are described in the following lists:

Headers

  • None – Do not forward any optional headers to the origin.
  • Whitelist – Forward specific listed headers to the origin. Choose from the list of predefined common headers or type your own custom headers.
  • All viewer headers and whitelisted CloudFront-* headers – Forward all headers presented by the client and the selected CloudFront generated headers.
  • All viewer headers – Forward all headers presented by the client, but not CloudFront generated headers.

Query strings

  • All – Forward all query string parameters present in the request URL.
  • None – Do not forward any of the query string parameters.
  • Whitelist – Forward only the specified query string parameters.

Cookies

  • All – Forward all cookies present in the request.
  • None – Do not forward any of the cookies.
  • Whitelist – Forward only the specified cookies.
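The following Python sketch models how the All, None, and Whitelist behaviors select values, and how the origin request combines what the Origin Request Policy selects with what the Cache Policy put in the cache key. It is an illustrative model, not CloudFront's implementation. The `select` function is hypothetical, and names are matched exactly for simplicity even though real header names are case-insensitive.

```python
def select(values, behavior, names=()):
    """Apply an all / none / whitelist behavior to a dict of request values."""
    if behavior == "all":
        return dict(values)
    if behavior == "whitelist":
        return {k: v for k, v in values.items() if k in names}
    if behavior == "none":
        return {}
    raise ValueError(f"unknown behavior: {behavior}")


viewer_headers = {
    "Accept-Language": "en-US,en",
    "User-Agent": "Mozilla/5.0 Gecko/20100101 Firefox/68.0",
    "Referer": "https://news.example.com/",
}

# Cache Policy: only Accept-Language distinguishes cached variants.
cache_key_headers = select(viewer_headers, "whitelist", ["Accept-Language"])

# Origin Request Policy: also send User-Agent, for example for logging.
policy_headers = select(viewer_headers, "whitelist", ["User-Agent"])

# Cache key values are always forwarded, so the origin sees both sets.
origin_headers = {**cache_key_headers, **policy_headers}
```

In this model the origin receives User-Agent on a cache miss, but requests from different browsers still share one cached object per Accept-Language value.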

Applying Policies to a Behavior

To activate a policy, you apply it to a Distribution Behavior. The following is a screenshot of the updated Create/Edit Behavior screen with the enablement option highlighted. You can see that the previous functionality can still be enabled with the “Use legacy object caching” setting, and the new functionality is enabled with the “Use Cache Policy and Origin Request Policy” setting. It’s important to note that this new functionality is opt-in. Any existing CloudFront configurations continue functioning exactly as they do today, unless and until you decide to change them over to this new style. This ensures that no customer applications are disturbed and that no sudden changes in the way CloudFront caches your content are introduced unless you take explicit action. For new distributions, the Cache Policy and Origin Request Policy mode will be the default in the console workflow after launch. If you are in a mixed console/API configuration environment and you use the console to activate the new functionality, make sure that you also upgrade all your API/SDK implementations to the newest version so that they are compatible with the new feature. Because of the improved configurability, we highly encourage customers to actively migrate to the new method.

When you select the “Use Cache Policy and Origin Request Policy” mode, you see the Policy selection dropdown lists appear where you can select from the existing Policies configured in your account.

Policies must exist to be applied to Behaviors

A Policy must exist before it can be attached to a distribution behavior. For console-based administration, this means you need to use the Policy creation screens to create the policies you need before creating the distribution behaviors that require them. If you are using the API or other automation workflows, you must ensure the Policy you intend to use in any behavior already exists. You can then either retrieve the correct Policy ID using one of the list APIs (ListCachePolicies or ListOriginRequestPolicies), or maintain a separate repository of the available Policies using whatever automation tools you prefer. Either way, the CreateDistribution and UpdateDistribution APIs require that you identify a specific Policy ID when you perform that action.
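In an automation workflow, resolving a policy name to its ID before building the behavior might look like the sketch below. The `sample_response` dictionary is a hand-written stand-in shaped like a ListCachePolicies response as I understand it, and both IDs in it are placeholders rather than real policy IDs. The `find_policy_id` helper and the custom policy name are hypothetical.

```python
def find_policy_id(list_response, name):
    """Return the Id of the cache policy called `name`, or None if absent."""
    for item in list_response["CachePolicyList"].get("Items", []):
        policy = item["CachePolicy"]
        if policy["CachePolicyConfig"]["Name"] == name:
            return policy["Id"]
    return None


# Stand-in for a ListCachePolicies response; the IDs are placeholders.
sample_response = {
    "CachePolicyList": {
        "Quantity": 2,
        "Items": [
            {"Type": "managed",
             "CachePolicy": {"Id": "placeholder-managed-id",
                             "CachePolicyConfig": {"Name": "Managed-CachingOptimized"}}},
            {"Type": "custom",
             "CachePolicy": {"Id": "placeholder-custom-id",
                             "CachePolicyConfig": {"Name": "example-language-variants"}}},
        ],
    }
}

policy_id = find_policy_id(sample_response, "example-language-variants")
if policy_id is None:
    # The policy must be created before any behavior can reference it.
    raise LookupError("cache policy does not exist yet")

# Fragment of a cache behavior that refers to the policy by ID.
behavior_fragment = {"CachePolicyId": policy_id}
```

The same pattern would apply to Origin Request Policies, with the behavior referring to the policy through an `OriginRequestPolicyId` field.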

Predefined managed policies

We have provided a predefined set of managed system Policies for common defaults, such as maximizing cache retention times and disabling caching for dynamic proxy use cases. We have also created policies implementing common defaults for other AWS services, such as Amazon S3 and AWS Elemental Media Services. You see these in the Policy drop-down list, where they typically use the prefix “Managed-” to indicate system-supplied managed Policies.

Other examples

Over time, we’ve seen numerous cases in which the new functionality could be useful for customers. Examples include:

  • Forwarding information such as the User-Agent to the origin for analytics/logging but without serving different content variants based on device type (now you can forward the User-Agent header and exclude it from the cache key).
  • Forwarding CloudFront’s custom device or geo headers but not including them in the cache key.
  • Forwarding authentication information in headers or query string parameters that allow you to protect your content with authentication logic but not cache different versions of the objects based on that data.
  • Including data in your URLs in a query string or in headers that are used for redirection, URL shortening, URL rewriting, or other uses either at the origin or in Lambda@Edge functions, but ensuring that they do not affect caching of the resulting content.
  • Custom authentication logic in which query string-based tokens are needed but do not affect the underlying content being cached.
  • Using Include or Exclude logic in establishing policies depending on which represents a more manageable list of parameters.

Conclusion

CloudFront provides flexibility in how cache keys are constructed and in how request metadata is transmitted to the origin on cache misses. With these new Policy options, you can create configurations that are highly specific in the data that you receive and process in your origin application logic and still ensure that you are not generating unnecessary duplicate cached objects. We’ve also heard feedback that the introduction of policies, while a change to the workflow, helps distributed teams that maintain multiple web applications enforce consistent configurations, particularly where the CDN configuration is not managed directly by development teams. In cases like this, developers can apply pre-configured standards without having to manage the policies themselves. For additional information on this feature, please see the CloudFront Developer Guide.

 

Ted Middleton

About the author

Ted Middleton is the global leader of the Edge Specialized Solutions Architect team for AWS and a former Principal Product Manager in the CloudFront team. He has over 20 years of experience in CDN and Edge services.