Improve your website availability with Amazon CloudFront

In this blog post, you will learn about the features of Amazon CloudFront that help you avoid unexpected failures and improve website availability.

When you use CloudFront with your website, one inherent benefit is the ability to cache content. Caching reduces the load on the origin server (the web server from which CloudFront retrieves the content) and improves content delivery performance. It also increases your website’s availability, which is one of the major benefits of using the service. There are many approaches to improving availability, such as using Elastic Load Balancing and multiple Availability Zones when the origin server that hosts the website is within the AWS Cloud, and CloudFront can raise availability further on top of them.

Website availability is most commonly affected by network failures or server outages, but a number of other factors can also affect it. For example, a website outage might be due to an unexpected hardware failure. You can mitigate this type of risk by making all components fully redundant, but in a typical on-premises environment the cost increases significantly with each additional redundancy point. There is also a risk that an unexpected increase in traffic overloads the web server and network, causing them to stop working.

In addition, there is a risk of external Distributed Denial of Service (DDoS) attacks impacting a website’s availability. There are various DDoS attack types, but the most common attack vector is the UDP reflection attack, which generates a large volume of traffic. (You can see our observation here.) An attacker uses multiple resources (for example, distributed groups of bots, such as malware-infected computers, routers, or IoT devices) to attack the target website. These bots generate a large volume of packets and requests that can exceed the capacity of the server or network, causing the website to stop working.

Improving the availability and response speed is a critical factor in operating your website. Downtime for an ecommerce website directly impacts sales, while downtime for a corporate or product-related website could lead to a bad brand/product image.

The rest of this blog post walks through key features for improving your website availability: origin failover, custom error pages, and protection of the origin server against flash crowds and DDoS attacks.

For additional information on how to implement these features, see the CloudFront Developer Guide.

Origin failover

Amazon CloudFront provides origin failover. This feature automatically routes requests to a pre-configured secondary origin server when the primary origin server is unavailable. Even if the primary origin server goes down or becomes unstable, CloudFront switches to the secondary origin to provide a consistent user experience. Failover is triggered by the HTTP status codes that the origin application returns and by timeouts. As a result, even if an application error occurs because of unexpected factors, your website can continue to serve viewers without interruption.

To use the failover feature, create an origin group and specify the secondary origin so that CloudFront automatically switches to it when it receives an error response with a specific HTTP status code from the primary origin. The two origins in the origin group can be any combination of Amazon S3 buckets and custom origins. Failover applies only when the HTTP method of the viewer request is GET, HEAD, or OPTIONS. For those requests, CloudFront fails over to the secondary origin if the primary origin returns one of the specified error status codes, or if the connection to the primary origin fails or times out.
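The failover conditions above can be sketched in Python. This is a minimal sketch, not CloudFront's implementation. The origin IDs, the group ID, and the chosen status codes are hypothetical, and the dictionary follows the shape of the OriginGroups element of the distribution configuration as I understand it, so check it against the API reference before relying on it.

```python
# Methods for which CloudFront fails over, as stated in the text above.
FAILOVER_METHODS = {"GET", "HEAD", "OPTIONS"}


def build_origin_group(group_id, primary_id, secondary_id, status_codes):
    """Build an origin group definition with the primary origin listed first."""
    codes = sorted(status_codes)
    return {
        "Id": group_id,
        "FailoverCriteria": {
            "StatusCodes": {"Quantity": len(codes), "Items": codes}
        },
        "Members": {
            "Quantity": 2,
            "Items": [{"OriginId": primary_id}, {"OriginId": secondary_id}],
        },
    }


def should_fail_over(method, primary_status, group, connection_failed=False):
    """Return True if the request would be retried on the secondary origin."""
    if method.upper() not in FAILOVER_METHODS:
        return False  # other methods are never failed over
    if connection_failed:
        return True  # connection failure or timeout on the primary origin
    return primary_status in group["FailoverCriteria"]["StatusCodes"]["Items"]


# Hypothetical IDs and status codes, for illustration only.
group = build_origin_group(
    "example-group", "primary-origin", "secondary-origin", {500, 502, 503, 504}
)
```

With this group, a GET that receives a 503 from the primary origin fails over, while a POST with the same status does not.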

The following timeout settings can trigger origin failover (extended timeout settings are available as of the June 2020 update):

  • Origin Connection Attempts: The number of times CloudFront attempts to connect to the origin (configurable from 1 to 3)
  • Origin Connection Timeout: The amount of time CloudFront waits when establishing a connection to the origin (configurable from 1 to 10 seconds)
  • Origin Response Timeout: The amount of time CloudFront waits for a response from the origin (configurable from 1 to 60 seconds; applies only to custom origins)

For some use cases, such as live streaming, you might want CloudFront to fail over to the secondary origin more quickly and with fewer attempts, for example within half of a video segment’s duration. If your segments are 4 seconds long, set the timeout to 2 seconds with no retries. For other use cases, such as a database query that takes a long time to process, you might want a longer timeout value. You can adjust these values depending on the nature of your application or origin server.
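To see how these settings interact, the following sketch validates the ranges listed above and estimates how long a viewer might wait before failover. It assumes that the worst-case wait for connection failures is simply the number of attempts multiplied by the connection timeout. The function names are hypothetical and are not part of any AWS SDK.

```python
def validate_timeouts(connection_attempts, connection_timeout, response_timeout=None):
    """Check the values against the ranges listed in this post."""
    if not 1 <= connection_attempts <= 3:
        raise ValueError("connection attempts must be between 1 and 3")
    if not 1 <= connection_timeout <= 10:
        raise ValueError("connection timeout must be between 1 and 10 seconds")
    if response_timeout is not None and not 1 <= response_timeout <= 60:
        raise ValueError("response timeout must be between 1 and 60 seconds")


def worst_case_connection_wait(connection_attempts, connection_timeout):
    """Seconds spent on failed connection attempts before failover (assumed model)."""
    validate_timeouts(connection_attempts, connection_timeout)
    return connection_attempts * connection_timeout


def live_streaming_settings(segment_seconds):
    """Half the segment duration with a single attempt, clamped to 1-10 seconds."""
    timeout = max(1, min(10, int(segment_seconds // 2)))
    return {"connection_attempts": 1, "connection_timeout": timeout}
```

For 4-second segments this yields one attempt with a 2-second timeout, which matches the example above. Three attempts with a 10-second timeout would mean up to 30 seconds before failover under the same assumption.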

Origin failover with Amazon Route 53 health checks

If you want to distribute traffic across three or more custom origin servers by weighting, or if you want to switch to a secondary custom origin server without relying on CloudFront’s retries or timeouts when your origin returns an error, you can fail over by placing Route 53 between CloudFront and the custom origins. Route 53 health checks monitor the health and performance of your origin applications. If a health check determines that the underlying resource is unhealthy, Route 53 routes traffic away from the associated record. In this setup, Route 53 plays the role of the CloudFront origin group: CloudFront is configured with a single origin domain name (a CNAME), and Route 53 resolves it to the appropriate origin based on the health checks. Note that all custom origin servers must be set up to respond to the same Host header.
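The routing behavior described above can be illustrated with a small simulation. This is a sketch of the idea only and does not call Route 53. The record names and weights are hypothetical, and falling back to all records when none is healthy is an assumption of this sketch.

```python
import random


def pick_origin(records, health, rng=random):
    """Pick an origin by weight, considering only records that pass health checks.

    records: list of {"name": str, "weight": int} (hypothetical record layout)
    health:  dict mapping a record name to True (healthy) or False (unhealthy)
    """
    healthy = [r for r in records if health.get(r["name"], False)]
    # Assumption of this sketch: if nothing is healthy, consider every record.
    candidates = healthy or records
    weights = [r["weight"] for r in candidates]
    return rng.choices(candidates, weights=weights, k=1)[0]["name"]


# Hypothetical weighted records for three custom origins.
records = [
    {"name": "origin-a.example.com", "weight": 60},
    {"name": "origin-b.example.com", "weight": 30},
    {"name": "origin-c.example.com", "weight": 10},
]
```

When origin-a is marked unhealthy, all answers come from origin-b and origin-c in proportion to their weights, which is the effect CloudFront sees through the single origin domain name.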

Displaying custom error pages

Amazon CloudFront has a feature called “Custom Error Response” that provides alternative content when the origin server is unable to respond. You may want viewers to see a more “branded” or consistent page when errors occur, rather than the default server or client error pages. For example, the following is a customized error message:

If you have only a single custom origin, but you want to provide a consistent user experience when your origin server goes down or enters an unstable condition, this may be a useful solution.

You can do this by preparing custom error pages that suit your website design and by configuring Custom Error Response behavior for each HTTP error status code. Custom Error Response allows you to specify, for each HTTP error status code returned by the origin server, a custom error page stored in an S3 bucket. For example, for a “404 Not Found” error, CloudFront can return a page indicating that the requested content was not found or was deleted, and for 5xx error codes, it can return a page indicating that the server is busy or temporarily unavailable. Alternatively, you can specify the same custom error page for all status codes, or specify custom error pages for only some status codes.
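As an illustration of the per-status-code mapping described above, the following sketch builds a configuration fragment in Python. The page paths are hypothetical. The field names follow the CustomErrorResponses element of the distribution configuration as I understand it, so verify them against the API reference.

```python
def custom_error_response(error_code, page_path, response_code=None,
                          error_caching_min_ttl=10):
    """Map one origin error status code to a custom error page."""
    return {
        "ErrorCode": error_code,
        "ResponsePagePath": page_path,
        # Status code returned to the viewer; defaults to the origin's code.
        "ResponseCode": str(response_code or error_code),
        # Seconds before CloudFront asks the origin again for this error.
        "ErrorCachingMinTTL": error_caching_min_ttl,
    }


def custom_error_responses(items):
    """Wrap the items in the Quantity/Items structure."""
    return {"Quantity": len(items), "Items": items}


# Hypothetical pages: one for 404, and a shared page for several 5xx codes.
config = custom_error_responses(
    [custom_error_response(404, "/errors/not-found.html")]
    + [custom_error_response(code, "/errors/busy.html") for code in (500, 502, 503, 504)]
)
```

Here the 404 page and the shared 5xx page are separate objects, while status codes that are not listed keep the default behavior.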

You can choose between origin failover and custom error response for each error status code. Also, if you have configured neither origin failover nor custom error pages, CloudFront serves the object as long as it is in the edge cache, even though it has expired. For the duration of the error caching minimum TTL (default: 10 seconds), CloudFront continues to respond to viewer requests by serving the object from the edge cache. You can specify a different error caching minimum TTL value for each error status code. If you want to minimize the caching time for error status code responses, set a small value for the error caching minimum TTL. However, this increases the number of requests relayed to your origin server when it is returning error status codes, so adjust the value according to your application’s or origin server’s resources.

Protecting the origin server against a flash crowd

Amazon CloudFront has connection collapsing functionality to protect against a flash crowd (that is, an unexpected surge of visitors who request the same content repeatedly). CloudFront does not relay a large volume of identical requests to the origin server; it relays only the first request and uses the response for the subsequent requests. This mitigates the risk of the origin server becoming overloaded and unresponsive when a flash crowd occurs, for example when a new product or service is released.
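The collapsing behavior described above can be illustrated with a small simulation. This is a sketch of the general technique and not CloudFront's implementation. The class name, the object path, and the simulated origin are all hypothetical.

```python
import threading
import time
from concurrent.futures import Future


class CollapsingCache:
    """Forward only one request per key to the origin and share the response."""

    def __init__(self, fetch_from_origin):
        self._fetch = fetch_from_origin
        self._lock = threading.Lock()
        self._cache = {}        # completed responses
        self._in_flight = {}    # key -> Future for a fetch in progress
        self.origin_requests = 0

    def get(self, key):
        with self._lock:
            if key in self._cache:
                return self._cache[key]
            pending = self._in_flight.get(key)
            leader = pending is None
            if leader:  # the first request for this key goes to the origin
                pending = Future()
                self._in_flight[key] = pending
                self.origin_requests += 1
        if not leader:
            return pending.result()  # wait for the first request's response
        try:
            body = self._fetch(key)
        except Exception as exc:
            with self._lock:
                del self._in_flight[key]
            pending.set_exception(exc)  # waiting requests see the same error
            raise
        with self._lock:
            self._cache[key] = body
            del self._in_flight[key]
        pending.set_result(body)
        return body


def simulate_flash_crowd(viewers=50):
    """Send many concurrent requests for one object through the cache."""
    def slow_origin(key):
        time.sleep(0.05)  # simulated origin latency
        return f"content for {key}"

    edge = CollapsingCache(slow_origin)
    responses = []

    def viewer():
        responses.append(edge.get("/new-product.html"))

    threads = [threading.Thread(target=viewer) for _ in range(viewers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return edge.origin_requests, responses
```

However many viewers request the object at once, the simulated origin receives a single request, and every viewer gets the same response.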

Even with the preceding features, a flash crowd may affect the availability of your service if your backend origin application has limited resources. For example, you might receive an unusually large number of requests during a limited-time sale on an ecommerce site or during an interactive TV game show with a companion app. In this case, you can prevent backend applications from breaking by using Lambda@Edge to control which requests are forwarded to the custom origin and directing overflow users to a waiting room page hosted on Amazon S3. This is similar to the custom error page described earlier, but the difference is that you control the transfer to the S3 bucket yourself. For example, you can forward only premium users (users with a specific cookie) to the custom origin and decide by lottery which origin serves the other users. For details, refer to this blog.
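The waiting room routing described above could look roughly like the following origin request handler for Lambda@Edge, written in Python. This is a sketch under several assumptions: the bucket domain, cookie name, page path, and admit ratio are hypothetical, the cookie is forwarded to the origin request, and caching is configured so that each viewer's request reaches the function. The event and origin fields follow the Lambda@Edge event structure as I understand it.

```python
import random

# Hypothetical values, for illustration only.
WAITING_ROOM_DOMAIN = "example-waiting-room.s3.amazonaws.com"
WAITING_ROOM_URI = "/waiting-room.html"
PREMIUM_COOKIE = "premium=1"
ADMIT_RATIO = 0.2  # fraction of non-premium users admitted by lottery


def has_premium_cookie(request):
    """Return True if any Cookie header carries the premium cookie."""
    for header in request.get("headers", {}).get("cookie", []):
        cookies = [c.strip() for c in header["value"].split(";")]
        if PREMIUM_COOKIE in cookies:
            return True
    return False


def route_request(request, admit_ratio=ADMIT_RATIO, draw=random.random):
    """Leave the request on the custom origin or send it to the waiting room."""
    if has_premium_cookie(request) or draw() < admit_ratio:
        return request  # unchanged: forwarded to the custom origin
    # Overflow user: switch the origin to the S3 bucket with the waiting room.
    request["origin"] = {
        "s3": {
            "domainName": WAITING_ROOM_DOMAIN,
            "region": "",
            "authMethod": "none",
            "path": "",
            "customHeaders": {},
        }
    }
    request.setdefault("headers", {})["host"] = [
        {"key": "Host", "value": WAITING_ROOM_DOMAIN}
    ]
    request["uri"] = WAITING_ROOM_URI
    return request


def handler(event, context):
    """Entry point for an origin request trigger."""
    request = event["Records"][0]["cf"]["request"]
    return route_request(request)
```

Passing the lottery draw in as a parameter keeps the routing decision easy to test. In production you would tune the admit ratio to what the backend can absorb.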

Protecting against DDoS attacks

AWS Shield, a DDoS protection service, is enabled by default on Amazon CloudFront and automatically protects against network and transport layer DDoS attacks. The automatic protection provided by AWS Shield Standard is available to all AWS customers at no additional cost.

Customers can also use AWS WAF (Web Application Firewall) to protect against application layer DDoS attacks. From a protocol perspective, an HTTP GET flood, a common application layer DDoS attack, is indistinguishable from legitimate traffic, so your website could be flooded with a large volume of HTTP GET requests and stop working. AWS WAF can mitigate such floods by using rate-based rules. A rate-based rule lets you define a condition for requests and a threshold that limits access to your website. If the number of requests that arrive from the same IP address in any five-minute period exceeds the limit you defined, the rule can trigger an action such as blocking requests from that IP address. For more information about rate-based rules, see the Developer Guide.

Another type of DoS attack that targets the application layer is the slow attack (for example, Slowloris), which deliberately reads and writes slowly to occupy TCP sessions for a long time in an attempt to exhaust the web server’s resources. CloudFront automatically closes such connections to protect against these attacks.
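The rate-based rule described above can be modeled as a per-IP counter over a five-minute window. The following is a simplified sketch of that logic and not how AWS WAF is implemented. The class name is hypothetical, and the sketch assumes that every request counts toward the limit, including requests that are blocked.

```python
from collections import defaultdict, deque


class RateBasedRule:
    """Block an IP address once it exceeds the limit within the window."""

    def __init__(self, limit, window_seconds=300):
        self.limit = limit
        self.window = window_seconds
        self._seen = defaultdict(deque)  # IP address -> request timestamps

    def evaluate(self, ip, now):
        """Record a request at time `now` (in seconds) and return the action."""
        timestamps = self._seen[ip]
        timestamps.append(now)
        # Drop requests that fell out of the trailing window.
        while timestamps and timestamps[0] <= now - self.window:
            timestamps.popleft()
        return "BLOCK" if len(timestamps) > self.limit else "ALLOW"
```

With a limit of 3, the fourth request from one IP address inside five minutes is blocked, while other IP addresses are unaffected and the blocked address is allowed again once its earlier requests age out of the window.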

In addition, AWS Shield Advanced enables you to visualize the status of DDoS attacks or to contact the AWS DDoS Response Team (DRT) for assistance. For more information about the best practices against DDoS attacks, see this whitepaper.

Conclusion

In this blog post, I discussed Amazon CloudFront features that help you improve the availability of your website. Even if your origin server is hosted on-premises and not in the AWS environment, you can still use CloudFront to take advantage of origin failover, protect against DDoS attacks, and increase the availability of your website.

Although I focused on improving your website’s availability, the benefits of using CloudFront are not limited to increased availability. There are many other benefits, such as faster website load times and a reduced load on the origin server through caching. I hope CloudFront will help you maintain the high availability and stable performance of your website.

Yutaka Oka

Yutaka Oka is a Specialist Solutions Architect based in Tokyo. His technical focus areas are CDN, Web Application Firewall and DDoS mitigation.