VMware Carbon Black cuts workload costs using Amazon EBS gp3 volumes

VMware Carbon Black is a leader in global cybersecurity specializing in endpoint detection, application control, and next-generation antivirus. They currently support over 8,000 customers, using Amazon Elastic Kubernetes Service (Amazon EKS) to orchestrate containers in their microservice architecture and Amazon Elastic Block Store (Amazon EBS) for their Amazon EKS data volumes. With Amazon EBS gp2 volumes, which scale performance with storage capacity, VMware Carbon Black had to allocate larger-than-required EBS volumes to meet their high-performance requirements, resulting in additional cost and unused storage. By migrating to Amazon EBS gp3 volumes, the latest generation of general purpose solid state drive (SSD)-based storage that allows customers to scale performance independently of storage, VMware Carbon Black was able to reduce their Amazon EKS cluster size by 20%. This saved over $25,000 per month on Amazon Elastic Compute Cloud (Amazon EC2) instance costs and Amazon EBS storage costs.

In this blog, we share the benefits of Amazon EBS gp3 volumes, why VMware Carbon Black decided to migrate to gp3, and how they were able to realize their savings.

Amazon EBS

Since 2014, Amazon EBS has offered an SSD-based general purpose volume type that delivers lower latency and higher input/output operations per second (IOPS) than magnetic or HDD solutions, at a lower cost than the Provisioned IOPS family. This volume type was quickly and widely adopted because it delivers consistent low latency and high IOPS. AWS continues to innovate on behalf of customers, and we launched the latest generation of the general purpose family, gp3, at re:Invent 2020. The gp3 volume type offers a higher baseline IOPS performance than gp2, a 20% lower storage price than gp2, and gives customers the option to provision performance (IOPS and throughput) and storage independently. With gp3, customers can scale IOPS and throughput without needing to overprovision capacity, thereby paying only for the resources they need.

The new gp3 volume type comes with a baseline performance of 3000 IOPS and 125 MiBps regardless of volume size. Customers who need higher performance can scale up to 16,000 IOPS and 1,000 MiBps for an additional fee. Using Elastic Volumes, an existing feature of Amazon EBS, customers can easily and non-disruptively migrate from gp2 to gp3 without stopping their Amazon EC2 instances.
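To make the Elastic Volumes step concrete, here is a minimal sketch of how a gp2-to-gp3 conversion request could be assembled in Python. The function names are hypothetical and are not taken from VMware Carbon Black's tooling. The parameter names mirror the Amazon EC2 ModifyVolume API, and the range checks use only the limits quoted in this post, so treat them as assumptions rather than a complete validation of every gp3 constraint.

```python
def build_gp3_modify_request(volume_id, iops=3000, throughput_mibps=125):
    """Build keyword arguments for an EC2 ModifyVolume call that converts a volume to gp3.

    The helper name and defaults are illustrative; the defaults are the gp3
    baseline described in this post.
    """
    # Limits as quoted in this post: 3,000-16,000 IOPS and 125-1,000 MiBps.
    if not 3000 <= iops <= 16000:
        raise ValueError("gp3 IOPS must be between 3000 and 16000")
    if not 125 <= throughput_mibps <= 1000:
        raise ValueError("gp3 throughput must be between 125 and 1000 MiBps")
    return {
        "VolumeId": volume_id,
        "VolumeType": "gp3",
        "Iops": iops,
        "Throughput": throughput_mibps,
    }


def convert_to_gp3(ec2_client, volume_id, **performance):
    """Send the request through any client object that exposes modify_volume."""
    return ec2_client.modify_volume(**build_gp3_modify_request(volume_id, **performance))
```

With the AWS SDK for Python installed, the client would typically be `boto3.client("ec2")`, and a call such as `convert_to_gp3(client, "vol-0123456789abcdef0")` (a placeholder volume ID) would request the change while the volume stays attached.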

Amazon EKS

Kubernetes is an open-source system for automating the deployment, scaling, and management of containerized applications. Amazon EKS is a managed service that you can use to run Kubernetes on AWS without needing to install, operate, and maintain your own Kubernetes control plane or nodes. With Amazon EKS, you have a choice to build both stateless and stateful containerized workloads. While containers by themselves are ephemeral, Kubernetes supports running stateful workloads by attaching persistent volumes to pods. A pod with a persistent volume attached can store data that can outlive the pod itself. If the pod crashes or terminates, another pod attaches the volume and resumes the work without losing data.

The Amazon EBS Container Storage Interface (CSI) driver provides a CSI interface that allows Amazon EKS clusters to manage the lifecycle of Amazon EBS volumes for persistent volumes. You can configure a Kubernetes StorageClass to provision any of the Amazon EBS volume types. In addition, the Kubernetes Volume Snapshots feature lets you create a copy of your Amazon EBS volume at a specific point in time. You can use this copy to bring a volume back to a prior state or to provision a new volume. For an in-depth look at how to use volume snapshots, see the blog post “Using EBS Snapshots for persistent storage with your EKS cluster.”
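As an illustration of the StorageClass configuration mentioned above, the following sketch builds a gp3 StorageClass manifest as a Python dictionary and prints it as JSON, which `kubectl apply -f` accepts. The class name, the performance values, and the binding mode are assumptions chosen for the example, not settings reported by VMware Carbon Black.

```python
import json


def gp3_storage_class(name="gp3", iops=3000, throughput_mibps=125):
    """Return a Kubernetes StorageClass manifest for gp3 volumes as a dict."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},  # hypothetical class name
        # Provisioner name used by the Amazon EBS CSI driver.
        "provisioner": "ebs.csi.aws.com",
        # StorageClass parameters are string-valued, so numbers are converted.
        "parameters": {
            "type": "gp3",
            "iops": str(iops),
            "throughput": str(throughput_mibps),
        },
        # Assumption: delay volume creation until a pod is scheduled.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


if __name__ == "__main__":
    print(json.dumps(gp3_storage_class(), indent=2))
```

A PersistentVolumeClaim that names this class in `storageClassName` would then receive a gp3 volume when a pod that uses the claim is scheduled.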

VMware Carbon Black migration from gp2 to gp3

VMware Carbon Black uses a large Amazon EKS cluster as an event forwarder, reading millions of records per second from Amazon Kinesis, sorting and compressing the records, and then writing them to Amazon S3 for future analysis and long-term storage. Since this service does not store the data, the Amazon EBS volumes used are small (40 GB) but need consistent IOPS performance to keep up with the steady workload. They follow infrastructure as code principles to provision and update this cluster, using Launch Templates in AWS CloudFormation to manage worker nodes.
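When worker nodes are defined through a launch template, the EBS volume type is a single property in the template's block device mapping. The sketch below generates such an AWS CloudFormation resource fragment from Python. It is not VMware Carbon Black's actual template: the device name is an assumption, required settings such as the AMI are omitted, and only the 40 GB size and c5.4xlarge instance type come from this post.

```python
import json


def worker_launch_template(volume_type="gp3", size_gib=40, device_name="/dev/xvda"):
    """Return an AWS::EC2::LaunchTemplate resource fragment as a dict."""
    ebs = {
        "VolumeType": volume_type,  # changing "gp2" to "gp3" here is the code-side migration
        "VolumeSize": size_gib,
        "DeleteOnTermination": True,
    }
    return {
        "Type": "AWS::EC2::LaunchTemplate",
        "Properties": {
            "LaunchTemplateData": {
                "InstanceType": "c5.4xlarge",
                # device_name is an assumed root device; it depends on the AMI.
                "BlockDeviceMappings": [{"DeviceName": device_name, "Ebs": ebs}],
            }
        },
    }


if __name__ == "__main__":
    print(json.dumps({"WorkerLaunchTemplate": worker_launch_template()}, indent=2))
```

Keeping the volume type in the template means every node launched later picks it up automatically, with no per-node configuration.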

This is an ideal use case for gp3 volumes, which offer the ability to scale performance separately from volume size. The performance of each 40 GB volume in the Amazon EKS cluster increased from a baseline of 120 IOPS (burstable to 3000 IOPS) and 128 MiB/s throughput on gp2 to a consistent 3000 IOPS and 125 MiB/s throughput on gp3. With gp3, you no longer need to accept limited burstable performance. Instead, the volumes consistently deliver the performance they are configured to provide, which makes them ideal for steady-state workloads. The migration to gp3 was seamless to the end user: VMware Carbon Black used Elastic Volumes to convert existing gp2 volumes in real time, and also updated the volume type to gp3 in their infrastructure code to ensure that future nodes would continue to use gp3 volumes.
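The 120 IOPS figure follows from how gp2 ties baseline performance to size: 3 IOPS per GiB, with a floor of 100 IOPS and a ceiling of 16,000 IOPS. The short sketch below reproduces that arithmetic and shows how large a gp2 volume would have to be to match the gp3 baseline without bursting. The function names are illustrative.

```python
import math

GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16000
GP3_BASELINE_IOPS = 3000


def gp2_baseline_iops(size_gib):
    """Baseline (non-burst) IOPS of a gp2 volume of the given size."""
    return min(max(GP2_IOPS_PER_GIB * size_gib, GP2_MIN_IOPS), GP2_MAX_IOPS)


def gp2_size_for_iops(target_iops):
    """Smallest gp2 size in GiB whose baseline reaches target_iops."""
    return math.ceil(target_iops / GP2_IOPS_PER_GIB)


if __name__ == "__main__":
    print(gp2_baseline_iops(40))                 # 120
    print(gp2_size_for_iops(GP3_BASELINE_IOPS))  # 1000
```

In other words, sustaining 3000 IOPS on gp2 would take a volume of about 1,000 GiB, 25 times the 40 GB this workload actually needs for data.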

Results

By changing to gp3 volumes, VMware Carbon Black reduced their storage costs for the Amazon EKS cluster by 20%. More importantly from a cost-savings perspective, they were also able to reduce the cluster size by 20%. This resulted in an overall Amazon EKS worker cost reduction of over $25,000 per month. When asked about the adoption of gp3, VMware Carbon Black commented:

The GP3 conversion allowed us to take advantage of the improved I/O performance of the GP3 storage class. This service is traditionally I/O bound due to very large stateful sets (800 pods Amazon EKS, stable workload).

The immediate result was a removal of the I/O bottleneck that caused the event-forwarder service to scale up to accommodate workloads. This meant that event-forwarder was able to process the same amount of work with considerably less pods or containers. The result was a scale down event that released approximately 50 c5.4xlarge EC2 instances from the Auto Scaling group.

This means a stable long running service is now costing several thousands of dollars less per month to operate.¹

[1] The VMware Carbon Black Observability and Tooling group enables internal engineering teams to adopt standardized tooling and procedures to deliver consistent and measurable environments, strategic views, and accountability for cloud-based services. The team partners with Finance and Engineering teams along with AWS to identify and drive cost effective and efficient cloud environments without sacrificing capability, cost, and performance.

After gp3 was enabled, the number of Amazon EKS worker nodes dropped significantly, as shown in Figure 1. The cluster scaled down from a steady state of 254 c5.4xlarge instances to 204 c5.4xlarge instances, for savings of over $25,000 per month.
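These figures can be cross-checked with a few lines of arithmetic. The sketch below uses only the numbers reported in this post. Because the savings are stated as "over $25,000", the per-instance value is a lower bound on the average rather than an actual instance price.

```python
def cluster_reduction(nodes_before, nodes_after):
    """Return (instances released, fractional reduction in cluster size)."""
    released = nodes_before - nodes_after
    return released, released / nodes_before


def average_savings_per_instance(monthly_savings, instances_released):
    """Average monthly savings attributed to each released instance."""
    return monthly_savings / instances_released


if __name__ == "__main__":
    released, fraction = cluster_reduction(254, 204)
    print(released)                                       # 50
    print(f"{fraction:.1%}")                              # 19.7%
    print(average_savings_per_instance(25000, released))  # 500.0
```

The 50 released instances match the scale-down described in the quote above, and the 19.7% reduction is the roughly 20% smaller cluster cited throughout this post.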

Figure 1 - Amazon EKS nodes count by state

Summary

Gp3 volumes let customers provision storage and performance independently, so they can right-size both at a 20% lower storage cost than gp2. By migrating to gp3, VMware Carbon Black was able to right-size their volume performance and decrease their total Amazon EKS cluster size, resulting in a 20% reduction in Amazon EKS nodes and a 20% reduction in storage costs.

The gp3 volume type is available in all AWS Regions. You can use the AWS Management Console to launch your first gp3 volume or convert existing volumes by following this guide. If you have any comments or questions, feel free to leave them in the comments section.