AWS for M&E Blog

Moving your paywall to the edge with Amazon CloudFront

Publishers face two conflicting requirements when serving content: deliver content quickly by using a Content Delivery Network (CDN), and protect and monetize that content using a paywall. The conflict arises because CDNs operate at the network edge, while paywall logic usually runs only at the originating website. Because the CDN cannot tell whether a given user is entitled to a cached copy, every request must go to the origin.

With Amazon CloudFront, publishers do not need to choose between a good user experience and monetization. Instead, they can combine their CDN and paywall in a single system. CloudFront makes this possible by invoking a compute resource at the network edge using Amazon Web Services (AWS) Lambda@Edge functions. Paywall logic can be applied immediately to provide a fast user experience.

The following diagram depicts the process, and this post walks you through each step.

Figure 1. Paywall process using Lambda@Edge and CloudFront

Process overview

Every time CloudFront receives a request for content, it generates a unique identifier, or “cache key.” CloudFront then looks inside its cache for content stored under that key. If the content is cached, CloudFront returns it to the user without ever sending the request to the origin. This creates a fast user experience. If the content is not present in cache, CloudFront sends the request to the origin and, when it gets the response, stores the content under the cache key so that it can be reused later.
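The hit-or-miss flow described above can be sketched in a few lines of Python. This is a toy in-memory model, not how CloudFront is implemented: the EdgeCache class, the fake_origin function, and the two-part key of domain and path are all assumptions made for illustration.

```python
from typing import Callable, Dict, Tuple

# Simplified cache key: (domain, path). CloudFront's real key can hold more.
CacheKey = Tuple[str, str]


class EdgeCache:
    """Toy in-memory cache that mimics the hit/miss flow described above."""

    def __init__(self, fetch_from_origin: Callable[[str, str], str]) -> None:
        self._store: Dict[CacheKey, str] = {}
        self._fetch_from_origin = fetch_from_origin
        self.origin_requests = 0  # how many times the origin was contacted

    def get(self, domain: str, path: str) -> str:
        key = (domain, path)
        if key in self._store:
            # Cache hit: respond without contacting the origin.
            return self._store[key]
        # Cache miss: ask the origin, then store the response for reuse.
        self.origin_requests += 1
        body = self._fetch_from_origin(domain, path)
        self._store[key] = body
        return body


def fake_origin(domain: str, path: str) -> str:
    # Hypothetical origin used only for this illustration.
    return f"content for {path}"


PATH = "/product-a/content/1234/how-to-use-cloudfront/2022-06-22"
cache = EdgeCache(fake_origin)
first = cache.get("example.com", PATH)   # miss: goes to the origin
second = cache.get("example.com", PATH)  # hit: served from the cache
```

After both calls, cache.origin_requests is 1, because only the first request reached the origin.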

To support a paywall, CloudFront needs to know whether a user is authorized before serving content. Authorized users receive the full content item from cache. Unauthorized users receive only a preview. CloudFront does not create the different content versions. Instead, it retrieves the different versions from the originating web server and then stores them under different cache keys.

The next sections walk you through how CloudFront determines the user’s authorization status and creates cache keys. You can learn more about the implementation details by referring to the attached code sample.

Different paywall models

For simplicity, this post focuses on implementing a paywall based on subscriptions. However, the logic can be extended to support other paywall types. For example, the paywall could retrieve user information from a database and determine access based on a range of parameters. Review the sample code to see where the logic can be extended to meet your paywall needs.

System prerequisites

For this system to work, the publisher must do the following:

  • Create an authentication process based on JSON Web Tokens (JWTs)
  • Implement a URL convention that identifies content requests
  • Configure a CloudFront distribution

JWT-based authentication

After a user logs on, the authentication system will issue a JWT that identifies users and their subscriptions. JWTs are based on an open standard and include a digital signature to validate that they have not been modified. Publishers can select from many identity providers, including Amazon Cognito, to authenticate users and generate JWTs.

If the user has subscribed to “Product A” and “Product B,” the JWT could contain a claim that looks like this:

{
   "custom:subs": "A,B"
}

JSON snippet from identity token
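As a minimal sketch, the comma-separated claim shown above could be turned into a set of subscription codes like this. The function name parse_subscriptions is hypothetical, and the code assumes the token has already been validated and decoded into a dictionary of claims.

```python
from typing import Mapping, Set


def parse_subscriptions(
    claims: Mapping[str, object], claim_name: str = "custom:subs"
) -> Set[str]:
    """Return the subscription codes from already-validated JWT claims."""
    raw = claims.get(claim_name, "")
    if not isinstance(raw, str):
        # Treat a missing or malformed claim as "no subscriptions".
        return set()
    return {code.strip() for code in raw.split(",") if code.strip()}


subs = parse_subscriptions({"custom:subs": "A,B"})
```

Returning an empty set for a missing or malformed claim means the paywall defaults to the preview rather than to full access.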

URLs for content

The publisher must employ a URL convention that distinguishes content requests from other resource requests. For example, if the user is retrieving content item “1234” from “Product A,” the URL could look like the following:

/product-a/content/1234/how-to-use-cloudfront/2022-06-22
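One possible way to pull the product and the content item out of a URL that follows this convention is sketched below. The regular expression and the names ContentRequest and parse_content_path are assumptions for illustration; a publisher with a different URL layout would need a different pattern.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout: /<product>/content/<item id>/<anything else>
CONTENT_URL = re.compile(r"^/(?P<product>[^/]+)/content/(?P<item_id>[^/]+)(?:/.*)?$")


class ContentRequest(NamedTuple):
    product: str
    item_id: str


def parse_content_path(path: str) -> Optional[ContentRequest]:
    """Return the product and item ID, or None if this is not a content URL."""
    match = CONTENT_URL.match(path)
    if match is None:
        return None
    return ContentRequest(match.group("product"), match.group("item_id"))


parsed = parse_content_path("/product-a/content/1234/how-to-use-cloudfront/2022-06-22")
not_content = parse_content_path("/static/site.css")
```

Here parsed holds the product "product-a" and the item ID "1234", while not_content is None because the path does not follow the content convention.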

CloudFront configuration

To handle content requests, publishers will create a distribution in CloudFront that is tied to their website domain. Within that distribution, the publisher defines a behavior that controls what CloudFront does when it receives a request. Note the following five behavior settings:

  • Path pattern: Identifies which URLs will invoke the behavior. In this case, you want the behavior to handle content requests only, so the pattern is set to “/*/content/*”. This will match URLs like “/product-a/content/1234/how-to-use-cloudfront/2022-06-22”.
    CloudFront console showing path pattern and origin settings

  • Origin: Specifies the source that returns content if the item is not found in the cache. For this demonstration, the origin is a Lambda function. In a real case, it will be your website.
  • Cache policy: Determines how CloudFront creates the cache key. The policy used in this demonstration bases the key on a request header that indicates whether or not the user has a subscription. This header is set by the Lambda@Edge function and is discussed in greater detail later in this post.

CloudFront console showing cache key and origin request policies

  • Origin request policy: Defines what information CloudFront sends to the origin if the content is not cached. Specifically, CloudFront sends the special header created by the Lambda@Edge function so that the origin knows whether the user is a subscriber.
  • Function associations: Identifies the Lambda@Edge function that executes paywall logic. The sample code creates this function for you and associates it with the viewer request event, so it runs each time CloudFront receives a content request.
    CloudFront console showing Lambda@Edge function association

Refer to the sample code for more details as well as documentation for creating a CloudFront distribution on the AWS website.
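To make the cache policy setting more concrete, the sketch below builds a configuration that adds only the subscriber header to the cache key. The dictionary follows the shape of the CachePolicyConfig parameter accepted by the CloudFront CreateCachePolicy API, but the policy name and TTL values are assumptions. The sample code may configure the policy differently, so check the structure against the current API reference before using it.

```python
import json
from typing import Any, Dict


def build_cache_policy_config(header_name: str = "x-is-subscriber") -> Dict[str, Any]:
    """Build a cache policy config that keys the cache on one request header."""
    return {
        "Name": "paywall-subscriber-cache-policy",  # hypothetical name
        "Comment": "Cache full and preview versions under separate keys",
        "MinTTL": 0,            # assumed TTLs, in seconds
        "DefaultTTL": 86400,
        "MaxTTL": 31536000,
        "ParametersInCacheKeyAndForwardedToOrigin": {
            "EnableAcceptEncodingGzip": True,
            "EnableAcceptEncodingBrotli": True,
            "HeadersConfig": {
                "HeaderBehavior": "whitelist",
                "Headers": {"Quantity": 1, "Items": [header_name]},
            },
            # Cookies stay out of the cache key, so each user's JWT cookie
            # does not create a separate cached copy.
            "CookiesConfig": {"CookieBehavior": "none"},
            "QueryStringsConfig": {"QueryStringBehavior": "none"},
        },
    }


config = build_cache_policy_config()
print(json.dumps(config, indent=2))
```

With the AWS SDK for Python, a dictionary like this would be passed as the CachePolicyConfig argument of the CloudFront client's create_cache_policy call. Keeping cookies out of the key is the important design choice: only the two values of the header produce distinct cache entries.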

Request processing

Let’s follow the process that handles a request for the example URL previously provided:

/product-a/content/1234/how-to-use-cloudfront/2022-06-22

For this walk-through, let’s assume the user has authenticated and has a JWT stored in a cookie. Let’s also assume the user is requesting a piece of content that has not yet been cached in CloudFront. The following steps reflect the process shown in Figure 1.

Step 1: Viewer request

Because the URL matches the pattern “/*/content/*”, CloudFront routes the request to the Lambda@Edge function that executes custom code. The function does the following:

  • It validates the JWT to ensure it has not expired or been tampered with, then retrieves the user’s subscriptions from the token.
  • It maps the product in the URL to a subscription code. In this example, all subscription codes are contained in the function. If a publisher has a large number of products that change frequently, it would be better to call a database such as DynamoDB to retrieve the information needed for the paywall logic.
  • If the JWT contains a subscription code matching the product in the URL, the function creates an “x-is-subscriber” request header and sets its value to “true.” Otherwise, it sets the header to “false.”
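The three tasks above might look roughly like the following viewer request handler. This is a sketch under stated assumptions, not the sample code itself: the cookie name, the product map, and the shared secret are hypothetical, and the token check uses HS256 with Python's standard library so that the example is self-contained. Tokens issued by an identity provider such as Amazon Cognito are typically signed with RS256 and would be verified against the provider's public keys with a JWT library instead.

```python
import base64
import hashlib
import hmac
import json
import time
from typing import Any, Dict, Optional

# Assumptions for illustration only.
TOKEN_COOKIE = "id_token"
SECRET = b"demo-secret"
PRODUCT_TO_SUB = {"product-a": "A", "product-b": "B"}


def _b64url_encode(data: bytes) -> str:
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode()


def _b64url_decode(part: str) -> bytes:
    return base64.urlsafe_b64decode(part + "=" * (-len(part) % 4))


def sign_demo_token(claims: Dict[str, Any]) -> str:
    """Create an HS256 token; exists only so the sketch can be exercised."""
    header = _b64url_encode(json.dumps({"alg": "HS256", "typ": "JWT"}).encode())
    payload = _b64url_encode(json.dumps(claims).encode())
    sig = hmac.new(SECRET, f"{header}.{payload}".encode(), hashlib.sha256).digest()
    return f"{header}.{payload}.{_b64url_encode(sig)}"


def verify_jwt(token: str) -> Optional[Dict[str, Any]]:
    """Return the claims if the signature and expiry are valid, else None."""
    try:
        header_b64, payload_b64, sig_b64 = token.split(".")
        if json.loads(_b64url_decode(header_b64)).get("alg") != "HS256":
            return None
        expected = hmac.new(
            SECRET, f"{header_b64}.{payload_b64}".encode(), hashlib.sha256
        ).digest()
        if not hmac.compare_digest(expected, _b64url_decode(sig_b64)):
            return None  # token was tampered with
        claims = json.loads(_b64url_decode(payload_b64))
        if claims.get("exp", 0) <= time.time():
            return None  # token has expired
        return claims
    except (ValueError, TypeError, AttributeError):
        return None  # malformed token


def read_cookie(headers: Dict[str, Any], name: str) -> Optional[str]:
    for entry in headers.get("cookie", []):
        for pair in entry["value"].split(";"):
            key, _, value = pair.strip().partition("=")
            if key == name:
                return value
    return None


def lambda_handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
    request = event["Records"][0]["cf"]["request"]
    headers = request["headers"]

    token = read_cookie(headers, TOKEN_COOKIE)
    claims = verify_jwt(token) if token else None
    raw_subs = str(claims.get("custom:subs", "")) if claims else ""
    subs = {code.strip() for code in raw_subs.split(",") if code.strip()}

    # The product is the first path segment, e.g. /product-a/content/...
    product = request["uri"].lstrip("/").split("/", 1)[0]
    is_subscriber = PRODUCT_TO_SUB.get(product) in subs

    # Always overwrite the header so a client cannot supply its own value.
    headers["x-is-subscriber"] = [
        {"key": "x-is-subscriber", "value": "true" if is_subscriber else "false"}
    ]
    return request  # returning the request lets CloudFront continue processing
```

The handler overwrites any x-is-subscriber header that arrives with the request, so a client cannot grant itself access by sending the header directly. Any failure to validate the token results in the value false.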

Step 2: CloudFront handles the updated viewer request

CloudFront checks its cache to see whether the requested content is present. For this purpose, it calculates the cache key using the domain, the path, and the “x-is-subscriber” header created in Step 1. CloudFront includes the “x-is-subscriber” header because this was defined in the cache policy. Because we are assuming this is the first request for content, CloudFront will send the request to the origin.
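The effect of adding the header to the cache key can be illustrated with a small sketch. The build_cache_key function and the tuple it returns are a simplified stand-in for CloudFront's internal key, and defaulting a missing header to false is an assumption.

```python
from typing import Mapping, Tuple

CACHE_KEY_HEADER = "x-is-subscriber"


def build_cache_key(domain: str, path: str, headers: Mapping[str, str]) -> Tuple[str, str, str]:
    """Combine domain, path, and the subscriber flag into one key."""
    flag = headers.get(CACHE_KEY_HEADER, "false")  # assumed default
    return (domain.lower(), path, f"{CACHE_KEY_HEADER}={flag}")


PATH = "/product-a/content/1234/how-to-use-cloudfront/2022-06-22"

# Two different subscribers: cookies differ, but the flag is the same.
alice = build_cache_key("example.com", PATH, {"x-is-subscriber": "true", "cookie": "id_token=aaa"})
bob = build_cache_key("example.com", PATH, {"x-is-subscriber": "true", "cookie": "id_token=bbb"})
# A visitor without a subscription.
carol = build_cache_key("example.com", PATH, {"x-is-subscriber": "false"})
```

Because the cookie is not part of the key, alice and bob share one cache entry, while carol maps to a separate entry. Each content item therefore has at most two cached versions, regardless of how many users request it.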

Step 3: Origin retrieves content

The origin uses the “x-is-subscriber” header set by the Lambda@Edge function to determine whether or not the user is authorized to view the content. The origin receives this header because the origin request policy specified that it should be included in the request.

If the “x-is-subscriber” header is set to “true,” the origin returns the full content requested. If the header is set to “false,” the origin returns only a preview.
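On the origin side, the decision could be as simple as the sketch below. The render_article function, the sample article text, and the preview length are hypothetical. A real origin would load the article from its content store and might build the preview differently.

```python
from typing import Mapping


def render_article(body: str, headers: Mapping[str, str], preview_chars: int = 38) -> str:
    """Return the full article for subscribers and a short preview otherwise."""
    if headers.get("x-is-subscriber", "").lower() == "true":
        return body
    # A missing or unexpected header value falls through to the preview.
    return body[:preview_chars].rstrip() + "..."


ARTICLE = (
    "CloudFront caches content at the edge. "
    "Subscribers can read every paragraph of this article."
)

full = render_article(ARTICLE, {"x-is-subscriber": "true"})
preview = render_article(ARTICLE, {"x-is-subscriber": "false"})
no_header = render_article(ARTICLE, {})
```

Treating anything other than true as unauthorized means a misconfigured request yields the preview rather than the full content.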

Step 4: CloudFront handles the origin response

CloudFront now caches the response from the origin using the domain, URL path, and the “x-is-subscriber” header. Using this cache key ensures that users get only the content they are authorized to see. For example, the next time a subscriber requests this content item, CloudFront will return the version stored in the cache with “x-is-subscriber=true”. This will be the full content returned previously by the origin. If the user is not a subscriber, then CloudFront will return the preview-only version stored under “x-is-subscriber=false.”

Step 5: Client displays result

Finally, the client displays the result to the user. Note that if the same or another user requests the same content, the result will be served directly from cache, thus speeding up delivery while reducing the number of requests sent to the origin.

Screenshot of REST client. User has a subscription to “Product A”, and so the origin responds with “Full content” to emulate returning the full content item.

Code sample

To give you a sense of how to implement this solution, we provide sample code. When you are finished, run the included cleanup scripts to remove any resources that the sample created.

Conclusion

This walk-through demonstrates how publishers can cache content in CloudFront while maintaining a low-latency paywall. When users request content, CloudFront immediately runs a Lambda@Edge function that checks whether the user is authorized. If the user has a subscription, CloudFront retrieves the full version of the content from cache. If the user is not authorized, CloudFront returns a limited view of the content.

CloudFront runs all the logic from the network edge and also returns content from the edge. In most cases, the only time requests are sent to the origin is when content is requested for the first time. By combining CloudFront and Lambda@Edge, content publishers do not need to make a compromise between a good user experience and monetization. To find out more about CloudFront, refer to the AWS website or talk to your AWS account representative.

Demian Hess

Demian Hess is a Sr Partner Solutions Architect at AWS, focusing on Digital Publishing. With over 18 years of experience in the publishing industry, Demian has written extensively on Digital Asset Management, Content Management, and flexible metadata schemes using NoSQL and Semantic Web technologies.