Networking & Content Delivery

Lowering Latency by Moving OPTIONS to the Edge

At IMDb, we run a Federated GraphQL Gateway on AWS Lambda that backs our website and apps and handles over 10,000 peak TPS. For more information about how we built it, see our three posts: building GraphQL on Lambda, managing federated schemas, and monitoring and tuning. As our website adds more features that call GraphQL directly from the customer’s browser, we’ve seen OPTIONS preflight calls for CORS requests grow to approximately 25% of our total traffic. CORS is a browser security mechanism for cross-origin JavaScript requests: before certain calls, the browser sends an OPTIONS preflight to confirm that the GraphQL domain allows access from the website’s domain.
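To make the trigger concrete, here is a rough sketch (our own illustration, simplified from the Fetch specification, not IMDb code) of the rule browsers apply: a cross-origin request that isn’t “simple” gets an OPTIONS preflight first. A GraphQL POST with `Content-Type: application/json` is never simple, which is why every one of these calls preflights.

```javascript
// Rough sketch of the browser's preflight decision (simplified from the
// Fetch spec; our own illustration, not IMDb's code).
function needsPreflight(method, headers) {
  const simpleMethods = ['GET', 'HEAD', 'POST'];
  const simpleHeaders = ['accept', 'accept-language', 'content-language', 'content-type'];
  const simpleContentTypes = [
    'application/x-www-form-urlencoded',
    'multipart/form-data',
    'text/plain',
  ];
  if (!simpleMethods.includes(method)) return true;
  for (const [name, value] of Object.entries(headers)) {
    const n = name.toLowerCase();
    if (!simpleHeaders.includes(n)) return true;
    if (n === 'content-type' && !simpleContentTypes.includes(value.toLowerCase())) return true;
  }
  return false;
}

// A typical GraphQL POST sends JSON, so it always preflights:
console.log(needsPreflight('POST', { 'Content-Type': 'application/json' })); // true
console.log(needsPreflight('GET', { Accept: 'text/html' }));                 // false
```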

Sample ping time chart

Figure 1

When investigating the request flow, we were surprised to learn how much OPTIONS requests hurt the customer experience. These calls add an extra round trip to our Gateway before every request from the customer’s browser. Most of IMDb’s production systems currently run in a single AWS Region, but our customers are all over the world. Even though we use Amazon CloudFront to terminate connections closer to the customer, and to utilize fast inter-Region connections, we’re still limited by the speed of light.

Looking at Figure 1, the chart of ping times between other AWS Regions and our Gateway in us-east-1, a customer near London will see an additional ~80ms latency with every OPTIONS call. A customer near Mumbai is even more impacted, seeing an additional ~190ms of latency. We can’t make light go any faster, and full regionalization of our stack is a large undertaking. Therefore, we wondered if we could solve the OPTIONS latency in a different way.

After an initial change to add a max-age header to the response only lowered our preflight traffic from ~2,400 to ~1,500 peak TPS, we decided we needed to consider more innovative options.
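For context, that first mitigation looked roughly like this (an illustrative Lambda-proxy-style sketch, not our actual Gateway code): `Access-Control-Max-Age` lets the browser cache a preflight verdict per URL-and-origin pair, so repeat calls skip the round trip until the cache expires.

```javascript
// Illustrative sketch only (not IMDb's actual handler): an OPTIONS
// response carrying Access-Control-Max-Age so the browser caches the
// preflight verdict for 600 seconds per URL + origin pair.
function buildPreflightResponse(allowedOrigin) {
  return {
    statusCode: 204,
    headers: {
      'Access-Control-Allow-Origin': allowedOrigin,
      'Access-Control-Allow-Credentials': 'true',
      'Access-Control-Max-Age': '600',
      'Vary': 'Origin',
    },
  };
}

console.log(buildPreflightResponse('https://www.valid.com').headers['Access-Control-Max-Age']);
```

Caching only helps repeat calls: a browser’s first request to each URL still pays the full round trip, which is why this alone couldn’t take the traffic to zero.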

CloudFront Functions provided our path forward. We were able to take our TypeScript code that was handling our OPTIONS calls, convert it to vanilla JavaScript for CloudFront, and then make some minor changes in how we read and write headers to meet CloudFront’s APIs. This lets the edge construct and deliver our HTTP response without consulting our Gateway back in us-east-1.

We put the code in a new corsFunctionCode.js file so that we could unit test it in isolation:

var ORIGIN_PATTERN = /^.*\.((your|valid|domains)\.com)(:[0-9]+)?$/

function copyAccessControlHeaderIfPresent(req, res, headerIn, headerOut) {
	var inExpanded = 'access-control-request-' + headerIn
	var outExpanded = 'access-control-allow-' + headerOut
	if (req[inExpanded]) {
		res[outExpanded] = req[inExpanded]
	}
}

function handler(event) {
	if (event.request.method === 'OPTIONS') {
		var reqHeaders = event.request.headers
		var respHeaders = { }
		if (reqHeaders.origin && ORIGIN_PATTERN.test(reqHeaders.origin.value)) {
			// We have an origin, and it's one that we allow - add the other CORS headers
			respHeaders['access-control-allow-credentials'] = { value: 'true' }
			respHeaders['access-control-allow-origin'] = reqHeaders.origin
			respHeaders['access-control-max-age'] = { value: '600' }
			respHeaders['vary'] = { value: 'Origin' }
			// Let the client have whatever headers/methods they asked for
			copyAccessControlHeaderIfPresent(reqHeaders, respHeaders, 'headers', 'headers')
			copyAccessControlHeaderIfPresent(reqHeaders, respHeaders, 'method', 'methods')
		}
		// Since we're handling the response completely, return a response object
		return {
			statusCode: 204,
			statusDescription: 'No Content',
			headers: respHeaders,
		}
	} else {
		// We're not handling this response, return a request object
		return event.request
	}
}
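Keeping the function in its own file lets us unit test pieces like the allow-list regex in isolation. A minimal sketch (Node.js, our own example) of that kind of check follows; note the `\.` before the allowed domains means only subdomains match, so suffix look-alike origins are rejected.

```javascript
// Sanity-checking the origin allow-list pattern in isolation (Node.js).
// The leading `.*\.` means only subdomains of the listed domains match,
// so suffix look-alikes such as "evilvalid.com" are rejected.
const ORIGIN_PATTERN = /^.*\.((your|valid|domains)\.com)(:[0-9]+)?$/;

console.log(ORIGIN_PATTERN.test('https://www.valid.com'));        // true
console.log(ORIGIN_PATTERN.test('https://app.domains.com:8443')); // true
console.log(ORIGIN_PATTERN.test('https://evilvalid.com'));        // false
console.log(ORIGIN_PATTERN.test('https://valid.com.evil.com'));   // false
```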

Although we could copy and paste the code into the AWS Management Console, we prefer to keep our Infrastructure-as-Code (IaC) in a TypeScript CDK package. Therefore, we load corsFunctionCode.js into a string, turn it into a CloudFront Function, and attach it to the functionAssociations of our CloudFront Distribution. And that’s it! It worked like a charm.

// CDK v2 import paths
import { readFileSync } from 'fs';
import { Construct } from 'constructs';
import { Function, FunctionCode } from 'aws-cdk-lib/aws-cloudfront';

export function corsFunction(scope: Construct) {
	const corsFunctionCode = readFileSync('./lib/corsFunctionCode.js', 'utf-8');
	return new Function(scope, 'CorsFunction', {
		code: FunctionCode.fromInline(corsFunctionCode),
	});
}

const cachingDistribution = new Distribution(this, 'YourDistroName', {
	...,
	defaultBehavior: {
		...,
		functionAssociations: [
			{
				eventType: FunctionEventType.VIEWER_REQUEST,
				function: corsFunction(this),
			},
		],
		...,
	},
});

Figure 2 graphs the TPS of just our OPTIONS calls for CORS requests. The first drop, on March 2nd, came from adding the max-age header to the response. We run two CDN endpoints: one that caches and one that doesn’t. We rolled the CloudFront Function out to them separately, on March 17th and 31st, causing the second drop and the final fall to zero.

Graph of preflight request volume

Figure 2

As this rolled out, we wanted to confirm that customers were actually seeing benefits from the change. To measure the customer experience accurately, we rely on client-side latency reporting. Across all of our customers, we saw a small overall latency change between Mar 17 and 18, shown in Figure 3 for one of our client-side metrics on our heaviest page. The change took approximately 50ms off of our trough latency and made our peak latency more stable.

Click to Body Begin graph

Figure 3

However, if we zoom in to just our customers in India (Figure 4), the change is larger and easier to see. We saw a reduction of almost exactly 200ms in trough latency for the customers who are farthest away from us. That is exactly what we expected, since the inter-Region ping time is just under 190ms. We also saw a similar, but larger, stabilization in peak latency.

Click to Body Begin graph for India

Figure 4

Taking a 10-12 second metric and shaving 200ms off might not seem like a lot, but every bit counts when it comes to latency. The latency spikiness for our customers on the slowest connections and lowest power mobile devices was reduced even more, giving them a smoother browsing experience. A simple change shaved time off of every single OPTIONS call that we’ll ever get, and offloaded undifferentiated work from our system to CloudFront. It was a large overall win for a small amount of investment, and a low amount of added system complexity.

Give it a try! At IMDb, we’ve doubled down on moving logic like this to the edge, and continue to reap the benefits. I recommend that you do, too!

About the Author

Jeff Abshire


Jeff Abshire is a Senior Engineer at IMDb, an Amazon Company.

Thanks to Luke Xu for reviewing this blog post.