AWS Open Source Blog

Right-size your Kubernetes Applications Using Open Source Goldilocks for Cost Optimization

In the last few years as companies have modernized their business applications, many have moved to microservices based architectures using containers on Kubernetes. A lot of the initial focus was on designing and building new cloud native architectures to support the applications. As environments have grown, we’ve seen a shift in focus to optimize resource allocation and right-size workloads to reduce costs.

In this blog post we will share guidance on how to optimize resource allocation and right-size applications in Kubernetes environments using Goldilocks. We’ll walk through how to install Goldilocks as well as a sample application to view the suggested resource recommendations. This applies to all Kubernetes applications, including those running on Amazon Elastic Kubernetes Service (Amazon EKS), that are deployed with managed node groups, self-managed node groups, and AWS Fargate.

Right-sizing applications on Kubernetes

In Kubernetes, resource right-sizing is done through setting resource specifications in the application manifest. These settings directly impact:

  • Performance — Kubernetes applications running on the same node will arbitrarily compete for resources without proper resource specifications. This can adversely impact application performance.
  • Cost Optimization — Applications deployed with oversized resource specifications will result in increased costs and underutilized infrastructure.
  • Autoscaling — The Kubernetes Cluster Autoscaler and Horizontal Pod Autoscaling require resource specifications to function.

The most common resource specifications in Kubernetes are for CPU and memory requests and limits.

Requests and Limits

Containerized applications are deployed on Kubernetes as Pods. CPU and memory requests and limits are an optional part of the Pod definition. CPU is specified in units of Kubernetes CPUs while memory is specified in bytes, usually as mebibytes (Mi).

Requests and limits each serve different functions in Kubernetes and affect scheduling and resource enforcement differently.

Scheduling

The Kubernetes scheduler only considers requests when determining where to place Pods in your cluster. Acceptable nodes are those that have enough available resources to satisfy the Pod’s resource requests.  Limits are not considered by the scheduler.

Resource Enforcement

The container runtime on the node where your Pods are running is responsible for resource enforcement.  Both requests and limits are factors in ensuring applications have access to their required compute resources. Their effect on CPU and memory is different:

  • CPU — If no limits are specified, then each Pod on a node can use all the available CPU on the host. As soon as available CPU is exhausted, Pods are throttled using a Linux primitive called cgroups. This is a resource sharing primitive that ensures each Pod gets its fair share of CPU time. CPU requests determine that fair share and are weighted to give more CPU time to Pods with larger CPU requests. If a limit is specified then CPU time will not exceed the specific limit.
  • Memory — Just like CPU, if no memory limits are specified, then each Pod can use all the available memory on the host. Unlike CPU, when memory is exhausted, there is no sharing mechanism. The Pod will either be terminated by the Linux Out-of-memory (OOM) killer or the kubelet will evict the Pod. The same process will happen if a Pod’s memory usage exceeds its limit.

Vertical Pod Autoscaler

So how do application owners choose the “right” values for their CPU and memory resource requests? An ideal solution is to load test the application in a development environment and measure resource usage using observability tooling. While that might make sense for your organization’s most critical applications, it’s likely not feasible for every containerized application deployed in your cluster.

Fortunately, there is a Kubernetes project that has a feature specifically designed to help provide resource recommendations — the Vertical Pod Autoscaler (VPA). VPA is a Kubernetes sub-project owned by the Autoscaling special interest group (SIG). It’s designed to automatically set Pod requests based on observed application performance. VPA collects resource usage using the Kubernetes Metric Server by default but can be optionally configured to use Prometheus as a data source.

VPA has a recommendation engine that measures application performance and makes sizing recommendations. The VPA recommendation engine can be deployed stand-alone so VPA will not perform any autoscaling actions. It’s configured by creating a VerticalPodAutoscaler custom resource for each application and VPA updates the object’s status field with resource sizing recommendations.

Creating VerticalPodAutoscaler objects for every application in your cluster and trying to read and interpret the JSON results is challenging at scale. Goldilocks is an open source project that makes this easy.

Goldilocks

Goldilocks is an open source project from Fairwinds that is designed to help organizations get their Kubernetes application resource requests “just right”. It takes its name, very appropriately, from the well known fairly tale Goldilocks and the Three Bears. Goldilocks builds on top of the Kubernetes Vertical Pod Autoscaler and provides:

  • A controller that automates the creation of VerticalPodAutoscaler objects for workloads in your cluster.
  • A dashboard that displays resource recommendations for all the monitored workloads.

The default configuration of Goldilocks is an opt-in model. You choose which workloads are monitored by adding the goldilocks.fairwinds.com/enabled: true label to a namespace.

Solution Overview

Let’s walk through how to install Goldilocks, including its dependencies Metrics Server and Vertical Pod Autoscaler. Then we’ll install a sample application to view the suggested resource recommendations. The diagram shown here illustrates all of the components on an Amazon EKS cluster and their interactions.

components in a Goldilocks namespace

The Metrics Server collects resource metrics from the Kubelet running on worker nodes and exposes them through Metrics API for use by the Vertical Pod Autoscaler. The Goldilocks controller watches for namespaces with the goldilocks.fairwinds.com/enabled: true label and creates VerticalPodAutoscaler objects for each workload in those namespaces.

In this blog post, we will be creating a namespace called javajmx-sample and will be creating a tomcat deployment. We will label this namespace in order to get a recommendation from Goldilocks. As soon as we label the namespace, we will be able to see a VPA object called goldilocks-tomcat-example created.

Prerequisites

You will need the following to complete the steps in this post:

Step 1: Deploying the Metrics Server

In this step, we will be deploying the Metrics server which provides the resource metrics to be used by Vertical Pod Autoscaler.

helm repo add metrics-server https://kubernetes-sigs.github.io/metrics-server

helm upgrade --install metrics-server metrics-server/metrics-server

Let’s verify the status of the metrics-server. Once successfully deployed, you should be able to see the resource utilization of the deployments within seconds:

kubectl top pods  -n kube-system

NAME                     CPU(cores)   MEMORY(bytes)  
aws-node-czlb8           2m           35Mi            
aws-node-fs22v           3m           35Mi            
aws-node-nl4js           2m           60Mi            
aws-node-vth4m           2m           59Mi            
coredns-d5b9bfc4-lbhb7   4m           13Mi            
coredns-d5b9bfc4-ngtf9   4m           14Mi            
kube-proxy-5gq76         1m           12Mi            
kube-proxy-mvp6g         1m           12Mi            
kube-proxy-vxpw9         1m           33Mi            
kube-proxy-zsfs4         1m           34Mi           

Step 2 : Enable namespaces which needs resource recommendation from Goldilocks

We will be deploying sample workloads in the javajmx-sample namespace and we will get the resource recommendation for the applications running on it. Let’s go ahead and create the namespace and label it.

kubectl create ns javajmx-sample
kubectl label ns javajmx-sample goldilocks.fairwinds.com/enabled=true

To ensure the label was applied successfully, run describe on the javajmx-sample namespace

kubectl describe ns javajmx-sample

Name:         javajmx-sample
Labels:       goldilocks.fairwinds.com/enabled=true
              kubernetes.io/metadata.name=javajmx-sample
Annotations:  <none>
Status:       Active

No resource quota.

No LimitRange resource.

Step 3 : Deploy Goldilocks

We will be using a helm chart to deploy Goldilocks. The deployment creates three objects :

  • Goldilocks-controller: responsible for creating the VPA objects for the workloads whose namespace is enabled for a Goldilocks recommendation
  • Goldilocks-vpa-recommender:  responsible for providing the resource recommendations for the workloads
  • Goldilocks-dashboard: summarizes the resource recommendation of the workloads and will also provide the yaml manifest for implementing the recommendation.

To deploy Goldilocks, run the following helm commands:

helm repo add fairwinds-stable https://charts.fairwinds.com/stable
helm upgrade --install goldilocks fairwinds-stable/goldilocks --namespace goldilocks --create-namespace --set vpa.enabled=true

Now, we will use kubectl to verify if the deployment was successful:

NAME                                          READY   STATUS    RESTARTS   AGE
goldilocks-controller-7bc5788596-q752s        1/1     Running   0          18h
goldilocks-dashboard-7ffff8966b-dphmj         1/1     Running   0          18h
goldilocks-dashboard-7ffff8966b-s2dgf         1/1     Running   0          18h
goldilocks-vpa-recommender-5ddf6dcd66-njgt4   1/1     Running   0          18h

Step 4 : Deploy the sample application

In this step, we will be deploying a sample application in the javajmx-sample namespace to get recommendations from Goldilocks. The application tomcat-example  is initially provisioned with a CPU and Memory request of 100m and 180Mi respectively and limits of 300m CPU and 300 Mi Memory.

kubectl apply -f https://raw.githubusercontent.com/aws-observability/aws-o11y-recipes/main/sandbox/javajmx/example/sample-javajmx-app.yaml

nht-admin:~/environment $ kubectl get pods -n javajmx-sample
NAME                              READY   STATUS    RESTARTS   AGE
tomcat-bad-traffic-generator      1/1     Running   0          127m
tomcat-example-5c874c8b8b-zt2tv   1/1     Running   0          127m
tomcat-traffic-generator          1/1     Running   0          127m

As mentioned earlier, Goldilocks will be creating VPAs for each deployment in a Goldilocks enabled namespace. Using the kubectl command, we can verify that a VPA was created in thejavajmx-sample namespace for the goldilocks-tomcat-example:

nht-admin:~/environment $ kubectl get vpa -n javajmx-sample
NAME                        MODE   CPU   MEM         PROVIDED   AGE
goldilocks-tomcat-example   Off    15m   109814751   True       127m

Step 5 : Review the Goldilocks recommendation dashboard

Goldilocks-dashboard will expose the dashboard in the port 8080 and we can access it to get the resource recommendation.  We now run this kubectl command to access the dashboard:

kubectl -n goldilocks port-forward svc/goldilocks-dashboard 8080:80

We can now open a browser to http://localhost:8080 to display the Goldilocks dashboard.

Goldilocks dashboard

Let’s analyze the javajmx-sample namespace to see the recommendations provided by Goldilocks. We should be able to see the recommendations for the goldilocks-tomcat-example deployment.

Goldilocks namespace details

Here the screen shows the request and limit recommendations for the javajmx-sample workload. The Current column under each Quality of Service (QoS) indicates the currently configured CPU and Memory request and limits. The Guaranteed and Burstable column under each QoS indicates the recommended CPU and Memory request limits for the respective QoS.

namespace

We can clearly notice  that we have over provisioned the resources and Goldilocks has made the recommendations to optimize the CPU and Memory request. The recommended level for CPU request and CPU limit is 15m and 15m compared to the current setting of 100m and 300m for Guaranteed QoS.  Memory request and limits are recommended to be 105M and 105M, compared to the current setting of 180Mi and 300 Mi.

Notice that the recommendations are available for two different Quality of Service (QoS) types: Guaranteed and Burstable. Kubernetes provides different levels of Quality of Service to pods depending on what they request and what limits are set for them. Pods that need to stay up and consistently good can request guaranteed resources, while pods with less exacting requirements can use resources with less or no guarantee.

Guaranteed (QoS) pods are considered top priority and are guaranteed to not be killed until they exceed their limits. If limits, and optionally requests, (not equal to 0) are set for all resources across all containers and limits and requests  are equal, then the pod is classified as Guaranteed.

Burstable (QoS) pods have some form of minimal resource guarantee, but can use more resources when available. Under system memory pressure, these containers are more likely to be killed once they exceed their requests and no Best-Effort pods exist. If requests, and optionally limits, are set (not equal to 0) for one or more resources across one or more containers, and they are not equal, then the pod is classified as Burstable.

To follow the recommended resource specification, customers can simply copy  the respective manifest file for the QoS class they are interested in and deploy the workloads which will then be right-sized and optimized.

For example, if we decide to apply the recommendations for the Guaranteed QoS, we could copy the YAML from the dashboard as shown here and apply them to the deployment object:

Let’s run the kubectl edit command to the deployment to apply the recommendations:

kubectl edit deployment tomcat-example -n javajmx-sample

The resources section in the containers spec  shows that we have successfully applied the recommendation of request and limits for CPU, and memory:

Once we apply the recommendations, we should be able to verify that the pod is trying to restart and come online with the updated resource configuration. Let’s verify the same by running the kubectl describe  command on the tomcat-example deployment:

kubectl describe deployment tomcat-example -n javajmx-sample

The output should look like the following:

Name:                   tomcat-example
Namespace:              javajmx-sample
CreationTimestamp:      Mon, 06 Feb 2023 17:41:38 +0000
Labels:                 <none>
Annotations:            deployment.kubernetes.io/revision: 2
Selector:               app=tomcat-example-pods
Replicas:               1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:           RollingUpdate
MinReadySeconds:        0
RollingUpdateStrategy:  25% max unavailable, 25% max surge
Pod Template:
  Labels:  app=tomcat-example-pods
  Containers:
   tomcat-example-pod:
    Image:       public.ecr.aws/u6p4l7a1/sample-java-jmx-app:latest
    Ports:       8080/TCP, 9404/TCP
    Host Ports:  0/TCP, 0/TCP
    Limits:
      cpu:     15m
      memory:  105Mi
    Requests:
      cpu:        15m
      memory:     105Mi
    Environment:  <none>
    Mounts:       <none>
  Volumes:        <none>

Cleanup

To delete the deployments and sample workloads we created in the blog, execute the following commands:

helm delete metrics-server
helm delete goldilocks -n goldilocks
kubectl delete -f https://raw.githubusercontent.com/aws-observability/aws-o11y-recipes/main/sandbox/javajmx/example/sample-javajmx-app.yaml

Conclusion

This post demonstrated how Goldilocks can be used to efficiently rightsize the resource requests for Kubernetes applications. Customers in modernization efforts often have minimal time to decide the resource requirements for their applications, which usually involves a complex process of reviewing monitoring dashboards. By adopting the recommendations from Goldilocks, customers can shorten the time to market for their applications and optimize their Amazon EKS costs.

Further reading

Vikram Venkataraman

Vikram Venkataraman

Vikram Venkataraman is a Principal Solution Architect at Amazon Web Services and also a container enthusiast. He helps organization with best practices for running workloads on AWS. In his spare time, he loves to play with his two kids and follows Cricket.

Aaron Miller

Aaron Miller

Aaron Miller is a Principal Specialist Solutions Architect at Amazon Web Services. He helps customers modernize, scale and adopt best practices for their containerized workloads. Prior to AWS, Aaron worked at Docker and Heptio helping customers move workloads to Kubernetes.