Saving money a pod at a time with EKS, Fargate, and AWS Compute Savings Plans
At re:Invent 2019, we announced the ability to deploy Kubernetes pods on AWS Fargate via Amazon Elastic Kubernetes Service (Amazon EKS). Since then, we’ve seen customers rapidly adopt the Kubernetes API to deploy pods onto Fargate, the AWS serverless infrastructure for running containers. This lets them offload much of the undifferentiated heavy lifting associated with maintaining a Kubernetes cluster, such as node management, patching, security, isolation, and scaling.
If you want to learn more about how Amazon EKS and Fargate work together, watch my re:Invent breakout session.
Until today, AWS Fargate was eligible for AWS Compute Savings Plans, but only for tasks launched through Amazon ECS. Today, we are announcing that EKS pods launched on Fargate are also eligible for AWS Compute Savings Plans. You can read the What’s New post here. If you have an active Compute Savings Plan, there is nothing you need to do: the system automatically applies the appropriate discounts to your Fargate pods according to the terms of your Savings Plan. If you don’t have an active Savings Plan, the AWS Compute Savings Plans FAQ is a good starting point to learn how they can help you save money. Customers can save up to 52% on Fargate in exchange for committing to a consistent amount of compute usage for a 1- or 3-year term.
Compute Savings Plans aren’t about committing to a specific technology. They provide flexibility because they let you save money regardless of the compute services you use. For instance, you can choose among EC2, Fargate, and Lambda; if you choose Fargate, you are free to pick either ECS or EKS as your orchestrator. Whatever you pick, Compute Savings Plans let you benefit from discounted prices.
Compute Savings Plans are one of the most direct and tangible ways to reduce and optimize costs when combining EKS and Fargate. However, they are not the only factor. In this blog post, we also recap some considerations about when it makes sense to choose Fargate over a traditional fleet of EC2 instances.
The acquisition cost of AWS Fargate
We sometimes see customers comparing Fargate compute costs to EC2 compute costs. Let’s run a quick example comparing on-demand pricing for an m5.large Linux instance (2 vCPUs and 8 GB of memory) in us-east-1 with comparable Fargate capacity in the same region. At the time of writing, the EC2 instance costs $0.096/hour. The cost of an equivalent Fargate pod in the same region is ($0.04048 per vCPU-hour x 2 vCPU) + ($0.004445 per GB-hour x 8 GB) = $0.08096 + $0.03556 = $0.11652/hour.
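The arithmetic above can be reproduced with a short Python sketch. The per-vCPU-hour rate, per-GB-hour rate, and m5.large price are the us-east-1 on-demand figures quoted in this post at the time of writing; current prices may differ, so treat them as inputs rather than constants. The premium works out to roughly 21%.

```python
# On-demand rates in us-east-1 as quoted in this post (they may have changed since).
FARGATE_VCPU_HOUR = 0.04048   # USD per vCPU-hour
FARGATE_GB_HOUR = 0.004445    # USD per GB-hour
EC2_M5_LARGE_HOUR = 0.096     # USD per hour, m5.large Linux on-demand


def fargate_hourly_cost(vcpu: float, memory_gb: float) -> float:
    """Hourly on-demand cost of a Fargate pod with the given vCPU and memory."""
    return vcpu * FARGATE_VCPU_HOUR + memory_gb * FARGATE_GB_HOUR


# Same shape as an m5.large: 2 vCPU and 8 GB of memory.
fargate = fargate_hourly_cost(2, 8)
premium = fargate / EC2_M5_LARGE_HOUR - 1

print(f"Fargate 2 vCPU / 8 GB: ${fargate:.5f}/hour")
print(f"Premium over m5.large: {premium:.1%}")
```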
Note that Fargate runs on a fleet of mixed EC2 instance types whose performance may vary compared to the latest-generation EC2 instances (such as m5 or c5). So while the example above shows a best-case scenario in which Fargate costs only about 21% more than EC2, the real value for money may differ once instance generation is taken into account. We suggest that customers test their specific setup and draw their own conclusions. We are continuously updating the compute that powers Fargate and aim to reduce the situations in which customers notice these performance gaps and inconsistencies.
As mentioned, until now EKS pods launched on Fargate were not covered by Compute Savings Plans. This created a notable gap between the cost of Fargate and the cost of EC2 instances whenever Compute Savings Plans were applied to the EC2 instances. With this announcement, Compute Savings Plans also apply to EKS/Fargate, reducing the gap between the cost of Fargate and EC2 regardless of the purchasing model being used.
We believe, however, that a proper comparison needs to consider the full picture. With Fargate, we take on much of the management burden, and we also absorb much of the idle and unused capacity that customers often have with traditional virtual machine clusters. Because of the scale at which we operate, we can typically address these infrastructure needs more cheaply than customers could at their own scale. Serving millions of customers has its benefits.
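To see why the lack of coverage mattered, the sketch below applies a discount to the EC2 rate only (the previous situation for EKS/Fargate) and then to both rates. It uses the on-demand figures from the m5.large example earlier in this post. The 40% discounts are hypothetical placeholders, not published Savings Plans rates; actual rates depend on service, region, term, and payment option. With these placeholder values, the Fargate premium drops from roughly 100% to roughly 21%.

```python
# On-demand rates from the m5.large example in this post (USD/hour, us-east-1).
EC2_ON_DEMAND = 0.096        # m5.large
FARGATE_ON_DEMAND = 0.11652  # 2 vCPU / 8 GB Fargate pod

# HYPOTHETICAL discount rates for illustration only; real Compute Savings
# Plans rates vary by service, region, term, and payment option.
EC2_DISCOUNT = 0.40
FARGATE_DISCOUNT = 0.40


def discounted(rate: float, discount: float) -> float:
    """Apply a fractional discount (0.40 means 40% off) to an hourly rate."""
    return rate * (1 - discount)


def premium(fargate_rate: float, ec2_rate: float) -> float:
    """Extra cost of Fargate relative to EC2, as a fraction of the EC2 rate."""
    return fargate_rate / ec2_rate - 1


ec2_sp = discounted(EC2_ON_DEMAND, EC2_DISCOUNT)

# Before: only EC2 was discounted, EKS pods on Fargate paid on-demand prices.
gap_before = premium(FARGATE_ON_DEMAND, ec2_sp)
# After: both are covered by Compute Savings Plans.
gap_after = premium(discounted(FARGATE_ON_DEMAND, FARGATE_DISCOUNT), ec2_sp)

print(f"Fargate premium, EC2 discounted only: {gap_before:.0%}")
print(f"Fargate premium, both discounted:     {gap_after:.0%}")
```

If the real discounts for the two services differ, the remaining gap will differ too; plug in the rates that apply to your account.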
The hidden costs of operations and the value of AWS Fargate
One of the benefits of using managed services is that customers save time by not having to perform operations that are considered undifferentiated heavy lifting. Fargate is no different: it allows you to NOT think about infrastructure details and to focus on building your applications to drive business outcomes.
Below is a comprehensive, yet not exhaustive, list of things you no longer need to worry about when you opt to use a managed service like Fargate. Each of these has a “cost of ownership” associated with it that needs to be taken into account alongside the “acquisition cost.”
While containers offer unmatched value in isolating and packaging application dependencies (as container images), they do not provide the same level of runtime isolation and security as virtual machines. The problem with “container escapes” is that a malicious user could gain control of the host and effectively gain access to all other containers running on the same host.
Various techniques are available to mitigate these problems, including creating separate host enclaves to run particularly sensitive workloads, configuring containers to run as non-root users, and more. Some customers adopt advanced Kubernetes configurations that use taints and tolerations or affinity and anti-affinity rules, and they find out the hard way how much additional complexity these mechanisms introduce. These mechanisms can also add cost, in the form of less optimized infrastructure or additional operational burden. Fargate solves this problem at the root by allocating a dedicated, right-sized virtual machine to run each pod. At no point in time do two pods run on the same VM. With Fargate, you get the packaging and flexibility advantages of containers together with the security advantages of running code within the hard boundaries of a virtual machine. Kubernetes pods running on Fargate do not share an operating system, which mitigates the problems associated with container escapes.
Note also that Fargate ephemeral storage is encrypted by default, without any additional configuration on your part.
Generally speaking, Fargate shifts the AWS shared responsibility model in your favor. With Fargate, AWS is responsible for security-related operational tasks such as updating the operating systems of the virtual machines used to run pods. The Amazon EKS Best Practices Guide for Security is another good source of information; it calls out some of the security-related differences between using EC2 and Fargate to run Kubernetes pods.
Customers that operate in highly regulated industries spend a lot of time making sure the stack they run is compliant. Be it ISO, HIPAA, PCI, or any other type of compliance, the engineering cycles required to produce the necessary documentation are very expensive. One of the many advantages of using managed services such as AWS Fargate is that you can offload that burden to AWS and point the auditor to the relevant AWS documentation for a particular (compliant) service. The alternative is to use compute primitives (such as Amazon EC2) and invest time and money to make sure that the setup is compliant (and properly documented). At the time of this post, most of the Fargate compliance certifications apply to ECS running on Fargate. We are working to expand that coverage to EKS/Fargate as well; check the AWS compliance documentation for the latest information.
We introduced EKS managed node groups last year as a way to reduce the burden of managing Kubernetes worker nodes. Managed nodes still run in your AWS account, and you are responsible for securing and patching them, even if that is simplified by replacing instances with the updated AMIs that AWS provides. You retain root access to those instances and, while EKS helps with lifecycle management, they are not considered fully managed by AWS. With Fargate, by contrast, no additional work is necessary: you don’t have to be concerned with AMIs or with patching the underlying host OS. Every new pod you launch runs on fully patched infrastructure, so you never have to think about which AMI is used on the node running a pod.
Another aspect to consider is that, even with managed node groups, node updates require the nodes to be recycled through a rolling deployment of the new AMI. This has a side effect on pods, because they are sent a termination signal to evacuate the node before it is terminated. This is often not a problem for purely scale-out, stateless applications, but it may introduce some turbulence for other types of applications as the infrastructure gets recycled for updates. Considering that CISOs at financial institutions often recommend a 30-day AMI rotation cadence, you can see how this may become a regular burden for these organizations.
Generic K8s worker node management
In addition to AMI management, covered in the section above, generic worker node management is something that needs to be considered and its cost accounted for. Managed node groups and their Auto Scaling groups (ASGs) mitigate much of this effort, yet the Kubernetes ecosystem abounds with tools that are recommended to run on your instances for proper infrastructure operations. One such tool is the node problem detector. This is nothing earth-shattering, but it adds to the list of things you need to do when you are in charge of operating the infrastructure backing your pods.
With Fargate, managing the infrastructure is completely on AWS. The infrastructure receives regular updates. When a pod launches, a brand new virtual machine is provisioned with all the latest software versions and the pod is started on an always up-to-date stack.
Cluster Autoscaler (CA) is a commonly used Kubernetes add-on that scales worker nodes out and in based on the load from the pods running in the cluster. CA is very rich in functionality and can be correspondingly complicated to configure appropriately for your scenario. For example, the settings that determine when nodes are added to and removed from the cluster greatly impact the cost of running it. The CA FAQ gives a sense of how rich and flexible its configuration can be, including the list of supported parameters you can use to tune CA behavior.
Another area to consider is the impact on running pods when CA decides to perform a scale-in operation. The pods running on the node targeted for scale-in are evicted and relaunched on a different node. This topic is covered in a dedicated section of the CA FAQ. Depending on what your pods are doing, this can be disruptive, especially if they aren’t completely stateless.
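To make the scale-in settings more concrete, here is a simplified Python model of how CA picks scale-down candidates, assuming its documented behavior: a node qualifies when both the sum of CPU requests and the sum of memory requests of its pods are below `--scale-down-utilization-threshold` (default 0.5) of the node's allocatable capacity, and it has stayed that way for `--scale-down-unneeded-time` (default 10 minutes). The `NodeState` class and function names are hypothetical, and real CA performs many more checks.

```python
from dataclasses import dataclass


@dataclass
class NodeState:
    name: str
    alloc_cpu: float         # allocatable vCPU on the node
    alloc_mem_gib: float     # allocatable memory on the node (GiB)
    req_cpu: float           # sum of CPU requests of pods on the node
    req_mem_gib: float       # sum of memory requests of pods on the node
    unneeded_minutes: float  # how long the node has stayed below the threshold


def utilization(node: NodeState) -> float:
    # CA compares pod *requests* (not live usage) with allocatable capacity;
    # both CPU and memory must be below the threshold, so the higher one decides.
    return max(node.req_cpu / node.alloc_cpu, node.req_mem_gib / node.alloc_mem_gib)


def scale_down_candidates(nodes, utilization_threshold=0.5, unneeded_time_min=10.0):
    """Names of nodes this simplified model would consider for removal.

    Real CA also checks that the node's pods can be rescheduled elsewhere,
    honors PodDisruptionBudgets, and applies other rules not modeled here.
    """
    return [
        n.name
        for n in nodes
        if utilization(n) < utilization_threshold
        and n.unneeded_minutes >= unneeded_time_min
    ]


# Hypothetical cluster state.
nodes = [
    NodeState("node-a", 1.9, 7.0, 0.5, 2.0, 15),  # lightly requested for 15 minutes
    NodeState("node-b", 1.9, 7.0, 1.5, 2.0, 30),  # CPU requests too high
    NodeState("node-c", 1.9, 7.0, 0.2, 0.5, 5),   # not below threshold long enough
]
print(scale_down_candidates(nodes))  # ['node-a']
```

Raising the threshold removes nodes more aggressively (lower cost, more evictions); lengthening the unneeded time keeps idle capacity around longer (more cost, fewer evictions).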
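Workloads that should not be evicted can opt out through the `cluster-autoscaler.kubernetes.io/safe-to-evict: "false"` pod annotation, which CA honors; pods that are not managed by a controller also block node removal. The Python sketch below checks only those two rules, on pods represented as plain dicts shaped like Kubernetes Pod objects. The function name is hypothetical, and real CA applies further rules (for example PodDisruptionBudgets and pods using local storage). Note the trade-off: every pod that blocks eviction keeps its node, and its cost, around.

```python
# Annotation recognized by Cluster Autoscaler; "false" tells CA not to evict the pod.
SAFE_TO_EVICT = "cluster-autoscaler.kubernetes.io/safe-to-evict"


def blocking_pods(pods):
    """Return names of pods that, in this simplified model, keep CA from
    removing their node. Only pod metadata is inspected."""
    blockers = []
    for pod in pods:
        meta = pod.get("metadata", {})
        annotations = meta.get("annotations") or {}
        if annotations.get(SAFE_TO_EVICT) == "false":
            blockers.append(meta.get("name"))
        elif not meta.get("ownerReferences"):
            # A bare pod has no controller to recreate it after eviction.
            blockers.append(meta.get("name"))
    return blockers


pods_on_node = [
    {"metadata": {"name": "web-7d9f",
                  "ownerReferences": [{"kind": "ReplicaSet", "name": "web-7d9"}]}},
    {"metadata": {"name": "batch-job",
                  "ownerReferences": [{"kind": "Job", "name": "batch"}],
                  "annotations": {SAFE_TO_EVICT: "false"}}},
    {"metadata": {"name": "debug-shell"}},
]
print(blocking_pods(pods_on_node))  # ['batch-job', 'debug-shell']
```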
With Fargate, CA isn’t needed, and as a result none of these concerns are relevant. When you use Fargate, each pod is launched on a right-sized VM that has the same lifecycle as the pod itself. There are no nodes involved and hence no need to scale the cluster.
Worker node sizing and usable capacity
Your Kubernetes pods usually need to run on a fleet of EC2 instances. In aggregate, these instances determine the total cluster capacity, but picking the right instance size isn’t trivial. Also, picking a specific instance size may lead to unbalanced capacity, given that a single node group only supports a single instance type. You can have Cluster Autoscaler work across different node groups to optimize your capacity, but this definitely increases the complexity of the cluster setup. With Fargate, each pod runs on a right-sized VM and you only pay for the resources required to run the pod. You don’t have to think about instance sizes, types, or utilization.
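A quick way to see the unbalanced-capacity problem is to compute how many identical pods fit on a node and what is left over. The Python sketch below assumes uniform pod sizes and ignores system reservations; the function names and numbers are illustrative only.

```python
import math


def pods_per_node(node_cpu, node_mem_gib, pod_cpu, pod_mem_gib, max_pods=None):
    """How many identical pods fit on one node, limited by CPU, memory,
    and (optionally) a per-node pod limit."""
    fit = min(math.floor(node_cpu / pod_cpu), math.floor(node_mem_gib / pod_mem_gib))
    if max_pods is not None:
        fit = min(fit, max_pods)
    return fit


def stranded_capacity(node_cpu, node_mem_gib, pod_cpu, pod_mem_gib, max_pods=None):
    """Return (pods that fit, leftover vCPU, leftover GiB) for one node."""
    n = pods_per_node(node_cpu, node_mem_gib, pod_cpu, pod_mem_gib, max_pods)
    return n, node_cpu - n * pod_cpu, node_mem_gib - n * pod_mem_gib


# An m5.large-sized node (2 vCPU, 8 GiB, ignoring system reservations)
# running CPU-heavy pods that each request 0.5 vCPU and 1 GiB.
print(stranded_capacity(2.0, 8.0, 0.5, 1.0))  # (4, 0.0, 4.0)
```

Here CPU runs out first, so half of the node's memory is paid for but cannot be used by this workload.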
Another aspect of node sizing is how much of the host’s total capacity is actually available to pods. If your worker node has 8 GB of memory, only a portion of it can be used to run applications. For example, some resources are reserved for services running in the operating system itself, some for the kubelet, and some for Kubernetes eviction thresholds. In aggregate, these “system reservations” alone may account for between 10% and 30% of the host resources. This external article shows a few examples of this system-reserved capacity. These reservations also have ramifications for how you prioritize workloads in case something needs to be evicted from the node. Nor does this mean that the remaining resources are fully utilized by pods, because there may be other intrinsic inefficiencies in how many pods you can actually fit on a host. With the exception of the resources for the kubelet, the Fargate capacity you pay for is net compute capacity and doesn’t include these additional “system reservations.” We explore this further later in this post.
In addition to these built-in system reservations, many customers tend to artificially over-provision their clusters. They do so for various reasons, including faster pod scaling and better high availability, using the many options that Cluster Autoscaler makes available. In essence, you can tell CA to keep the cluster artificially inflated in anticipation of pods that may be launched in the future. While this provides a great deal of flexibility, it also incurs additional infrastructure costs, because you are paying for capacity you are not actually using (at least not all the time).
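Kubernetes documents node allocatable capacity as node capacity minus kube-reserved, minus system-reserved, minus the hard eviction threshold. The Python sketch below applies that formula to memory on an 8 GiB node. The reservation values are hypothetical placeholders chosen for illustration; read your own kubelet configuration for the real numbers. With these placeholders, about 11% of the node's memory is unavailable to pods before a single pod is scheduled.

```python
def allocatable_mib(capacity_mib, kube_reserved_mib, system_reserved_mib, eviction_hard_mib):
    """Node Allocatable = Capacity - kube-reserved - system-reserved - eviction threshold."""
    allocatable = capacity_mib - kube_reserved_mib - system_reserved_mib - eviction_hard_mib
    if allocatable <= 0:
        raise ValueError("reservations exceed node capacity")
    return allocatable


# HYPOTHETICAL reservation values for an 8 GiB node; check your own
# kubelet configuration for the values actually in effect.
capacity = 8 * 1024
alloc = allocatable_mib(capacity, kube_reserved_mib=574,
                        system_reserved_mib=256, eviction_hard_mib=100)
reserved_share = 1 - alloc / capacity
print(f"allocatable: {alloc} MiB, reserved: {reserved_share:.1%}")
```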
Many EKS customers use multi-tenant clusters. Because of this, being able to allocate costs across internal users (the tenants of the cluster) is very important for the centralized IT team. However, there is a disconnect here. For an EKS cluster administrator, the unit of cost is the instance used as a worker node. For an EKS cluster user, the unit of cost is the pod(s) they run. Customers try to reconcile the two by tracking “who’s using what” and then building a chargeback mechanism. This isn’t trivial for a number of reasons. Kubernetes pods are not first-class AWS objects (they are Kubernetes objects), so it is not possible to use the native AWS cost allocation mechanisms to track pods.
Because of this, customers often use third-party tools to track this usage and build the reports from which they manage chargeback. However, these tools need to compensate for the disconnect above: who pays for unused resources? As we learned above, it is very likely that your clusters never really run at 100% utilization. It’s not uncommon to find clusters that are less than 50% utilized. There is, however, a subset of customers that spend a lot of time and engineering resources fine-tuning these clusters. Even these customers typically do not go beyond 80% utilization. There isn’t a specific science behind these numbers; they come from anecdotal evidence gathered talking to customers. Given this disconnect, who pays for those 50% or 20% of idle resources? Are they spread across all tenants? Or are they absorbed by the central IT organization that owns the cloud bill?
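The two chargeback policies in question can be sketched in a few lines of Python. This is a simplified model, assuming a single capacity dimension (for example vCPU) and tenant usage measured by requested capacity; the function name and the numbers are hypothetical. In the example, a cluster at 50% utilization either doubles every tenant's bill or leaves half the bill with central IT.

```python
def chargeback(cluster_cost, requested_by_tenant, total_capacity, spread_idle=True):
    """Split a cluster bill across tenants by requested capacity.

    Returns (bill per tenant, cost left with central IT). Capacity is a
    single dimension here (e.g. vCPU) to keep the sketch short.
    """
    used = sum(requested_by_tenant.values())
    if used > total_capacity:
        raise ValueError("requests exceed cluster capacity")
    unit_cost = cluster_cost / total_capacity
    bills = {t: r * unit_cost for t, r in requested_by_tenant.items()}
    idle_cost = (total_capacity - used) * unit_cost
    if spread_idle and used > 0:
        # Policy 1: every tenant absorbs idle cost in proportion to its usage.
        for t, r in requested_by_tenant.items():
            bills[t] += idle_cost * r / used
        idle_cost = 0.0
    # Policy 2 (spread_idle=False): central IT keeps the idle cost.
    return bills, idle_cost


# Hypothetical month: $1,000 cluster, 100 vCPU, tenants request 50 vCPU in total.
tenants = {"team-a": 30, "team-b": 20}
print(chargeback(1000.0, tenants, 100))                     # idle cost spread
print(chargeback(1000.0, tenants, 100, spread_idle=False))  # idle cost kept by IT
```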
Using EKS/Fargate allows central IT organizations building multi-tenant clusters to eliminate this disconnect at the root: they can pass the costs on the cloud bill through to their internal users with a 1:1 mapping between pods and billed capacity.
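With Fargate, a chargeback report reduces to summing per-pod costs by tenant, with no idle capacity left to allocate. The Python sketch below uses the us-east-1 on-demand rates quoted earlier in this post and hypothetical usage records. It works in hours for simplicity, while Fargate actually bills per second with a one-minute minimum, and the vCPU and memory values should be the size Fargate provisions for each pod rather than the raw container requests.

```python
from collections import defaultdict

# us-east-1 on-demand rates quoted earlier in this post (may have changed).
VCPU_HOUR = 0.04048   # USD per vCPU-hour
GB_HOUR = 0.004445    # USD per GB-hour


def pod_cost(vcpu, memory_gb, hours):
    """On-demand cost of one Fargate pod; ignores per-second billing granularity."""
    return (vcpu * VCPU_HOUR + memory_gb * GB_HOUR) * hours


def bill_by_tenant(pods):
    """pods: iterable of (tenant, vcpu, memory_gb, hours); each pod maps 1:1 to a charge."""
    bills = defaultdict(float)
    for tenant, vcpu, memory_gb, hours in pods:
        bills[tenant] += pod_cost(vcpu, memory_gb, hours)
    return dict(bills)


# Hypothetical usage records.
usage = [
    ("team-a", 1, 2, 10),
    ("team-a", 0.5, 1, 10),
    ("team-b", 2, 4, 5),
]
print(bill_by_tenant(usage))
```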
Fargate pod right sizing and reducing waste
While all of the aspects above are important and contribute heavily to the business case for using Fargate, one of the key elements to consider is how much capacity you actually use vs. how much capacity you pay for.
For example, a common mistake when evaluating Fargate is to compare Fargate pod capacity to traditional worker node capacity. As we have seen above, pod capacity is the net capacity your containers can consume, whereas worker node capacity is the gross amount of CPU and memory you pay for, only a portion of which can effectively be used to run your containers.
We mentioned above that, in a traditional Kubernetes cluster, depending on your particular requirements and your operational maturity, you are anecdotally wasting between 20% and 50% of the total compute capacity in your cluster at any point in time. With Fargate, your theoretical capacity usage moves to (almost) 100%, because you pay only for the capacity your pods request, which eliminates this cluster-level waste.
This obviously assumes that you take full advantage of the capacity you request for your pods. It goes without saying that if your EKS pods are only 50% utilized while you pay for the full pod capacity, Fargate isn’t going to be more cost effective than traditional EC2 instances. That is why right-sizing your Fargate pods is very important for your economics. This is in addition to (and complementary to) using AWS Compute Savings Plans.
Let’s discuss in more detail how you can right-size your Kubernetes pods.
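One way to frame this trade-off is cost per unit of capacity actually used: the hourly price divided by utilization. The Python sketch below uses the on-demand figures from the m5.large example in this post and hypothetical utilization levels. It compares nominal capacity only, ignoring system reservations, instance-generation performance differences, and operational costs. Under these assumptions, Fargate matches a 60%-utilized EC2 cluster once pods run at roughly 73% of their configured capacity.

```python
def cost_per_used_unit(hourly_cost, utilization):
    """Hourly cost divided by the fraction of paid capacity actually used."""
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_cost / utilization


def break_even_fargate_utilization(ec2_cost, ec2_utilization, fargate_cost):
    """Pod utilization at which Fargate's cost per used unit matches EC2's."""
    return ec2_utilization * fargate_cost / ec2_cost


EC2 = 0.096        # m5.large on-demand, USD/hour
FARGATE = 0.11652  # 2 vCPU / 8 GB pod on-demand, USD/hour

ec2 = cost_per_used_unit(EC2, 0.6)                 # hypothetical 60%-utilized cluster
fargate_tuned = cost_per_used_unit(FARGATE, 0.95)  # well right-sized pods
fargate_loose = cost_per_used_unit(FARGATE, 0.5)   # pods used at 50%

print(f"EC2 @60%: ${ec2:.4f}  Fargate @95%: ${fargate_tuned:.4f}  Fargate @50%: ${fargate_loose:.4f}")
print(f"Break-even pod utilization: {break_even_fargate_utilization(EC2, 0.6, FARGATE):.0%}")
```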
On a traditional Kubernetes cluster, worker nodes are the atomic unit of compute. They define both the total capacity available to all your pods and the cost of running the cluster. In this context, using Kubernetes requests and limits is advised but technically optional. If you set CPU and memory requests and limits on your pods, they define a virtual resource sandbox for each pod. If you don’t set these parameters, all pods compete for the total cluster resources available (specifically, for the resources of the worker node they are scheduled on).
On EKS/Fargate, the resource model is different. Since there are no worker nodes in the cluster, the pods themselves dictate the size of the underlying capacity required. This is why properly configuring pod resources is extremely important to achieve good performance and avoid waste.
You can learn more about how Fargate sizes Kubernetes pods by referring to this page in the documentation or by watching this section of the EKS/Fargate re:Invent breakout session. In short, Fargate reads the resource requests of each container and applies the sizing logic described in the documentation.
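The sizing logic can be approximated in Python as follows. This sketch reflects our reading of the EKS documentation at the time of writing and is not the actual Fargate implementation: per resource, the pod needs the larger of the biggest init container request and the sum of the long-running container requests; Fargate adds 256 MB of memory for the required Kubernetes components (kubelet, kube-proxy, and containerd); and the result is rounded up to the smallest supported vCPU and memory combination. The configuration table below is an assumption based on the documented combinations at the time of writing and may have changed, and the function names are hypothetical.

```python
# (vCPU, allowed memory in MiB) as listed in the EKS/Fargate documentation at
# the time of writing; check the current docs, this list may have changed.
FARGATE_CONFIGS = [
    (0.25, [512, 1024, 2048]),
    (0.5, list(range(1024, 4096 + 1, 1024))),
    (1.0, list(range(2048, 8192 + 1, 1024))),
    (2.0, list(range(4096, 16384 + 1, 1024))),
    (4.0, list(range(8192, 30720 + 1, 1024))),
]
# Memory Fargate adds for Kubernetes components (kubelet, kube-proxy, containerd).
K8S_OVERHEAD_MIB = 256


def effective_requests(containers, init_containers=()):
    """containers: list of (vcpu, mem_mib) requests. Per resource, the pod needs
    the larger of the biggest init container and the sum of regular containers."""
    cpu = max(sum(c for c, _ in containers), max((c for c, _ in init_containers), default=0.0))
    mem = max(sum(m for _, m in containers), max((m for _, m in init_containers), default=0))
    return cpu, mem


def fargate_pod_size(containers, init_containers=()):
    """Smallest supported (vCPU, MiB) configuration that covers the requests."""
    cpu, mem = effective_requests(containers, init_containers)
    mem += K8S_OVERHEAD_MIB
    for vcpu, memory_options in FARGATE_CONFIGS:
        if vcpu < cpu:
            continue
        for option in memory_options:
            if option >= mem:
                return vcpu, option
    raise ValueError("requests exceed the largest Fargate configuration")


# One container requesting 0.5 vCPU / 1 GiB: the 256 MiB overhead pushes
# the pod to the next memory size.
print(fargate_pod_size([(0.5, 1024)]))  # (0.5, 2048)
```

The example shows why it pays to size requests with the overhead and rounding in mind: a pod that requests exactly 1 GiB is billed for 2 GB.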
It is very important that the size you pick (via explicit requests) fits your workload patterns. Ideally, you want to use as much of the configured capacity as possible. Monitoring the utilization of your pods is therefore of paramount importance, because you want to make sure they are not underutilized. You can monitor your pods using tools like Datadog (which supports EKS/Fargate out of the box) or open source technologies such as Prometheus and Grafana, as suggested in this post.
Note that pods running on EKS/Fargate fully support both the Kubernetes Horizontal Pod Autoscaler and the Kubernetes Vertical Pod Autoscaler. Developers use the former to scale the number of pods in and out based on load, and the latter to scale pods up and down to right-size the resources assigned to them. The two do not work well together when driven by the same CPU or memory metrics, so if you intend to let Kubernetes automate the sizing of your pods, pick the approach that best fits your consumption patterns, keeping in mind that you should strive to get as close as possible to consuming 100% of the configured pod capacity.
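Once you have usage data from your monitoring tool, a simple check compares a high percentile of observed usage with the pod's configured size. The Python sketch below uses a nearest-rank percentile; the 95th percentile, the 80% utilization target, and the sample values are hypothetical choices, not recommendations from this post. The suggested value leaves some headroom above observed peak usage and would still need to be rounded up to a supported Fargate configuration.

```python
import math


def nearest_rank_percentile(samples, p):
    """Nearest-rank percentile of a non-empty list, p in (0, 100]."""
    if not samples:
        raise ValueError("no samples")
    ordered = sorted(samples)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]


def utilization_report(usage_samples, configured, target=0.8, p=95):
    """Compare observed usage (same unit as `configured`, e.g. vCPU) with the
    pod's configured size and suggest a request that leaves (1 - target) headroom."""
    observed = nearest_rank_percentile(usage_samples, p)
    utilization = observed / configured
    return {
        "observed": observed,
        "utilization": utilization,
        "underutilized": utilization < target,
        "suggested": observed / target,
    }


# Hypothetical CPU usage samples (vCPU) for a pod configured with 1 vCPU.
samples = [0.1, 0.2, 0.2, 0.3, 0.25, 0.3, 0.35, 0.3, 0.2, 0.4]
print(utilization_report(samples, configured=1.0))
```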
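For reference, the core of the Horizontal Pod Autoscaler algorithm, as described in the Kubernetes documentation, is desiredReplicas = ceil(currentReplicas × currentMetricValue / desiredMetricValue), with no change when the ratio is within a tolerance (0.1 by default). The Python sketch below implements only that core; the real controller also handles unready pods, missing metrics, and stabilization windows, and the `min_replicas`/`max_replicas` defaults here are hypothetical placeholders for values you would set on the HPA. On Fargate, each additional replica gets its own right-sized VM, so scaling out does not depend on spare node capacity.

```python
import math


def hpa_desired_replicas(current_replicas, current_metric, target_metric,
                         min_replicas=1, max_replicas=10, tolerance=0.1):
    """desiredReplicas = ceil(currentReplicas * currentMetric / targetMetric),
    skipped when the ratio is within the tolerance, then clamped to [min, max]."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


# 4 pods averaging 90% CPU against a 60% target: scale out.
print(hpa_desired_replicas(4, 90, 60))  # 6
```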
In this post, we introduced a new feature that allows EKS customers to take full advantage of AWS Compute Savings Plans when using Fargate to deploy Kubernetes pods. Previously, only Fargate tasks launched via Amazon ECS were eligible for Compute Savings Plans. We also took the opportunity to discuss the value proposition of Fargate in the context of EKS more generally; specifically, we discussed the ways in which you may be able to reduce total cost of ownership (TCO) with a serverless approach.
It is also important to understand that Fargate may not be the right answer in every situation. There are many reasons why customers keep using traditional EC2 instances, including the need for specific hardware such as GPUs, or the need to pick a particular instance type to fine-tune the deployment. In addition, there are some EKS deployment considerations when using Fargate, detailed in this documentation page, that may prevent you from leveraging it depending on your application needs. Check the documentation page linked above regularly, because these considerations will be updated as EKS and Fargate introduce more features over time.
If you are interested in achieving the benefits discussed in this post and want to start experimenting with EKS and Fargate together, you can start by leveraging our getting started user guide.