How Arc XP lowered data transfer costs by $500k per year with Amazon CloudFront and Lambda@Edge on AWS
The Washington Post, an American daily newspaper company, delivers digital news content using Arc XP’s digital experience platform. Arc XP originated in The Post and has grown into a Software-as-a-Service (SaaS) business used by publishers, broadcasters, and brands to create, host, and monetize engaging content for over 1,500 websites globally.
Photo Center is an Arc XP product that enables customers to store, resize, publish, and deliver image binaries. Fully built on AWS, Photo Center provisions a dedicated Amazon S3 bucket for each customer to store their content. As their business scaled, Arc XP’s bucket count and data transfer costs grew commensurately, and they sought a more cost-efficient architecture to deliver their content with Amazon CloudFront. However, Photo Center’s requirement to maintain hundreds of buckets implied a corresponding number of CloudFront distributions, one paired with each origin. That in turn presented a configuration and management challenge. To solve that problem, Arc XP incorporated an additional AWS edge service, Lambda@Edge.
In this blog, we review the key cost considerations for AWS data transfer as it relates to Amazon S3 and how Amazon CloudFront can help reduce those costs. We describe Photo Center’s before and after architecture to see how Arc XP incorporated CloudFront with Lambda@Edge to optimize their data transfer cost while minimizing configuration overhead. Finally, we will discuss how this solution not only reduced costs but increased Photo Center’s image delivery performance.
Arc XP Photo Center’s initial architecture
Built with Amazon S3, Arc XP’s Photo Center application manages images for clients located across the globe. For compliance, multi-region, and multi-tenancy considerations, using a single Amazon S3 bucket and a prefix-per-client approach was not an optimal design option. Accordingly, Photo Center was designed to provision each client with a dedicated S3 bucket for their content. As an aspect of this initial architecture, Arc XP’s clients retrieved their images via S3 URLs for display within their web applications.
As their business has scaled, Arc XP has assumed the management of hundreds of such dedicated Amazon S3 buckets and, accompanying that business success, they have also experienced commensurate growth in data transfer to the internet. Over time, Arc XP’s S3 Data Transfer Out became a substantial cost factor to the Photo Center application.
Figure 1 illustrates Photo Center’s initial architecture. Each client’s website was served by S3-specific URLs for data downloads from their respective Amazon S3 bucket. Increasing data transfer costs were consequently incurred as these images were downloaded directly from S3 to the internet.
Figure 1: Initial architecture with direct S3 access deployment
From a performance perspective, a sample image in the following S3-based HTTP request was retrieved in 0.106 seconds total time. While this performance was within acceptable limits to Arc XP, we include it here to later compare it with the retrieval time achieved via Photo Center’s revised architecture.
% curl -w "@curl-format.txt" -o /dev/null -s "https://arc***.s3.amazonaws.com/public/SSTVFAY****.jpg"
time_namelookup: 0.028715s
time_connect: 0.036250s
time_appconnect: 0.063145s
time_pretransfer: 0.063265s
time_redirect: 0.000000s
time_starttransfer: 0.096473s
----------
time_total: 0.106519s
AWS data transfer considerations
There are many forms of data transfer associated with the AWS cloud, and each has specific purposes and cost implications. For example, transfer can be within a single Availability Zone (AZ), across AZs in a single Region, across multiple Regions, or out of a Region to the internet. It is important to familiarize yourself with AWS Data Transfer pricing models in detail as you design your AWS workloads. Refer to the complete pricing information here.
Some of the key architectural considerations for data transfer costs pertaining to Photo Center’s usage include:
- There is no data transfer charge into an Amazon S3 bucket (data upload) from the internet.
- For the first 100 GB per month, you are not charged for data transferred out to the internet, aggregated across all AWS services and Regions (except China and GovCloud).
- You are not charged for data transferred between S3 buckets in the same AWS Region.
- You are not charged for data transferred from an S3 bucket to any AWS service(s) within the same AWS region as the S3 bucket (including to a different account in the same AWS Region).
- Data transferred out (data download) to the internet is charged but the rate may differ by region and service.
- Data transfer costs from Amazon CloudFront to the internet vary across geographic regions and are based on the edge location through which your content is served.
- Data transferred from Amazon CloudFront to the internet is offered at lower rates than data directly from services like Amazon S3, Amazon Elastic Compute Cloud (EC2), Amazon Relational Database Service (RDS) and others.
- AWS offers free data transfer from all AWS services to Amazon CloudFront, which means that data transferred from an Amazon S3 origin to CloudFront is not charged.
- Requests (PUT, COPY, POST, LIST, GET, SELECT), Lifecycle Transition, and Data Retrieval charges still apply to Amazon S3 data movement.
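To make the comparison concrete, the following Python sketch estimates monthly internet egress cost for the two delivery paths. The per-GB rates and the traffic volume are placeholder assumptions chosen for illustration, not published AWS prices, and the function names are hypothetical. The sketch also deliberately ignores free tiers, tiered pricing, and request charges.

```python
def monthly_egress_cost(egress_gb: float, rate_per_gb: float) -> float:
    """Flat-rate estimate of data transfer out to the internet."""
    return egress_gb * rate_per_gb


def estimate_monthly_savings(egress_gb: float,
                             s3_rate_per_gb: float,
                             cloudfront_rate_per_gb: float) -> float:
    """Compare direct-from-S3 delivery with delivery through CloudFront.

    Transfer from the S3 origin to CloudFront is not charged, so the
    CloudFront path is billed only for CloudFront-to-internet transfer.
    """
    direct = monthly_egress_cost(egress_gb, s3_rate_per_gb)
    via_cloudfront = monthly_egress_cost(egress_gb, cloudfront_rate_per_gb)
    return direct - via_cloudfront


if __name__ == "__main__":
    # Placeholder inputs: 1,000 GB per month at assumed rates.
    savings = estimate_monthly_savings(1000, s3_rate_per_gb=0.09,
                                       cloudfront_rate_per_gb=0.08)
    print(f"Estimated monthly savings: ${savings:.2f}")
```

Substitute current rates for your Regions and edge locations from the AWS pricing pages before drawing any conclusions from the result.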
Arc XP Photo Center’s revised architecture
As Arc XP sought a solution to reduce Photo Center’s data transfer costs, they first looked to Amazon CloudFront, a low-latency, high transfer speed content delivery network (CDN) that optimizes the delivery of data, applications, video, APIs and other content. CloudFront caches content within AWS’s global network of edge locations, making it especially beneficial for applications like Photo Center to enhance performance and end-user experience. CloudFront was an appealing choice for Arc XP to reduce data transfer costs because Data Transfer Out from CloudFront distributions is offered at a lower rate than direct internet transfer from Amazon S3. Additionally, data transfer from S3 to CloudFront is free while its caching further minimizes the request volume costs to the underlying buckets.
While leveraging CloudFront as a cost-reducing edge cache for their customers’ images was an easy choice, having hundreds of individual buckets serving as origins presented another challenge. Arc XP was initially faced with the prospect of having to create a CloudFront distribution per-bucket, which would be complicated and ultimately untenable at their required scale. They instead needed a design to deploy only a limited number of distributions that could efficiently front all of their S3 buckets for their global clients.
Lambda@Edge solved this problem by providing Arc XP with a mechanism to share CloudFront distributions across multiple clients while still allowing dynamic, code-driven, targeted access to any number of buckets. Lambda@Edge is a global serverless computing service that can execute user-defined functions at 400+ AWS edge locations around the world. Combined with CloudFront, it enables developers to build highly customized, low-latency applications and APIs. With Lambda@Edge functions, developers can inspect and manipulate requests and responses to improve performance and security, generate dynamic content, and, in Arc XP’s case, achieve automated origin independence for all of their S3 buckets.
Using the Lambda@Edge origin request event handler, Arc XP implemented lightweight code that dynamically inspects each request and helps CloudFront route it to the appropriate Amazon S3 bucket for the calling client. With Lambda@Edge now acting as a dynamic origin proxy, Arc XP could aggregate their clients into a limited number of CloudFront distributions to produce a single caching layer over hundreds of S3 buckets.
Figure 2 illustrates the revised Photo Center architecture. Note that with the new solution, Amazon CloudFront is now positioned between the S3 buckets and end client.
Figure 2: Revised architecture with Amazon CloudFront & Lambda@Edge deployment
Following the diagram, Photo Center’s new data flow is:
- A client request for an image flows to CloudFront.
- Each request to CloudFront invokes the Lambda@Edge function.
- The function uses the customer-specific URL context of the request and determines the appropriate Amazon S3 origin path.
- CloudFront uses the dynamically assigned path to retrieve the image, caching it after it does so.
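The flow above can be sketched as a small Lambda@Edge function in Python. This is a minimal illustration, not Arc XP's actual code. It is written as an origin request trigger, the Lambda@Edge event type in which the origin can be changed, and it assumes the distribution's default origin is an S3 origin so that `request["origin"]["s3"]` is present. The `CLIENT_BUCKETS` table, the bucket names, and the `/<client-id>/<object-key>` URL layout are all hypothetical.

```python
# Hypothetical lookup: first URL path segment (client ID) -> bucket endpoint
# and Region. A real deployment might derive this from naming rules or config.
CLIENT_BUCKETS = {
    "client-a": ("client-a-photos.s3.us-east-1.amazonaws.com", "us-east-1"),
    "client-b": ("client-b-photos.s3.eu-west-1.amazonaws.com", "eu-west-1"),
}


def lambda_handler(event, context):
    """Origin request handler: point CloudFront at the client's bucket."""
    request = event["Records"][0]["cf"]["request"]

    # Assumed URL layout: /<client-id>/<object-key>
    client_id, _, object_key = request["uri"].lstrip("/").partition("/")
    target = CLIENT_BUCKETS.get(client_id)
    if target is None or not object_key:
        # Returning a response object short-circuits the origin fetch.
        return {"status": "404", "statusDescription": "Not Found"}

    bucket_domain, bucket_region = target

    # Retarget the S3 origin for this request only.
    s3_origin = request["origin"]["s3"]
    s3_origin["domainName"] = bucket_domain
    s3_origin["region"] = bucket_region

    # Drop the client prefix so the URI becomes the object key in the bucket.
    request["uri"] = "/" + object_key

    # The Host header must match the bucket endpoint being requested.
    request["headers"]["host"] = [{"key": "Host", "value": bucket_domain}]

    # Returning the request tells CloudFront to continue to the new origin.
    return request
```

Because the function only edits the request and returns it, CloudFront still performs the fetch and caches the response, so one distribution can serve any bucket listed in the lookup table.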
Cost savings are achieved because no data transfer charges are incurred from the Amazon S3 origin to CloudFront, and lower transfer rates apply from CloudFront to the internet. CloudFront also caches the image, which reduces the number of retrievals from the S3 bucket for the same image and therefore the total cost of S3 GET operations.
Turning to performance, CloudFront improves end-user experience through its caching capability. For Photo Center, CloudFront enhances delivery speed by serving cached client images from edge locations closer to end users. Lambda@Edge further enables Photo Center to consolidate multiple origins behind a small number of CloudFront distributions, which improves the cache hit ratio. As a result, overall Photo Center performance has notably improved. To demonstrate, the same image from the previous example is now retrieved in a total time of only 0.064 seconds. That represents a 40% improvement over the previous direct-from-S3 time of 0.106 seconds achieved by the initial architecture.
% curl -w "@curl-format.txt" -o /dev/null -s "https://cloudfront-us-***.arcpublishing.com/***/SSTVFAY***.jpg"
time_namelookup: 0.009418s
time_connect: 0.016141s
time_appconnect: 0.038679s
time_pretransfer: 0.038954s
time_redirect: 0.000000s
time_starttransfer: 0.048977s
----------
time_total: 0.064193s
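The 40% figure follows from the two time_total values reported by curl. As a quick check, this short Python sketch computes the relative reduction from those two measurements. The function name is illustrative, and a single sample per architecture is only indicative, not a benchmark.

```python
def percent_improvement(before_s: float, after_s: float) -> float:
    """Return the relative reduction in latency as a percentage."""
    return (before_s - after_s) / before_s * 100


s3_total = 0.106519          # time_total, direct from S3
cloudfront_total = 0.064193  # time_total, via CloudFront

# Prints 39.7%, which rounds to the 40% quoted in the text.
print(f"{percent_improvement(s3_total, cloudfront_total):.1f}%")
```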
Solution benefits summary
Arc XP has since experienced several important benefits from their revised architecture:
- Photo Center’s Amazon S3 Data Transfer Out costs have been eliminated. Amazon CloudFront data transfer charges now apply instead, resulting in a $500,000 reduction in its annual data transfer costs.
- With CloudFront caching images at the edge, image retrieval latency has been reduced by 40% compared with direct retrieval from client S3 buckets.
- Managing the hundreds of S3 bucket origins and CloudFront distributions has been simplified. The solution accommodates multiple buckets fronted by a single CloudFront distribution to provide a single layer of caching for Photo Center image requests.
- Client metering was also simplified because the solution enabled the creation of a single source of access logs.
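As an illustration of the metering benefit, the sketch below totals bytes served per client from CloudFront standard access log lines. It relies on the `#Fields:` header and the tab-separated rows of the standard log format, and reads the `cs-uri-stem` and `sc-bytes` fields. Treating the first URL path segment as the client ID is an assumption made for this example, and the function name is hypothetical. How Arc XP actually meters clients is not described in this post.

```python
from collections import defaultdict
from typing import Dict, Iterable


def bytes_by_client(log_lines: Iterable[str]) -> Dict[str, int]:
    """Sum sc-bytes per client from CloudFront standard log lines."""
    fields = []
    totals = defaultdict(int)
    for raw in log_lines:
        line = raw.rstrip("\n")
        if line.startswith("#Fields:"):
            # Header names are space-separated; data rows are tab-separated.
            fields = line[len("#Fields:"):].split()
            continue
        if not line or line.startswith("#") or not fields:
            continue
        row = dict(zip(fields, line.split("\t")))
        # Assumption: URLs look like /<client-id>/<object-key>.
        client_id = row["cs-uri-stem"].lstrip("/").split("/", 1)[0]
        totals[client_id] += int(row["sc-bytes"])
    return dict(totals)
```

With every client served through the same distributions, one pass over a single set of log files produces per-client totals, rather than a separate pass over each bucket's access logs.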
In this blog, we described how Arc XP incorporated multiple AWS edge services into their Photo Center architecture to achieve substantial cost savings and also improve end-user experience. Arc XP now uses Amazon CloudFront to cache and serve images from hundreds of Amazon S3 buckets within a highly optimized CDN architecture. By further incorporating Lambda@Edge and leveraging its ability to manipulate requests, Arc XP’s Photo Center application automatically maps their clients’ incoming traffic to the appropriate Amazon S3 bucket. Finally, by achieving this origin independence through a code-driven approach, their cost-effective solution is also easy to configure and manage at scale.
To review your architecture and to help optimize your data transfer costs, contact your AWS account team to get started today.