How Sentra manages data workflows using Amazon EKS, Dagster, and Karpenter to maximize cost-efficiency with minimal operational overhead
By Yael Grossman, Sr. Compute Specialist Solutions Architect at AWS, and Roei Jacobovich, Software Engineer at Sentra
In this post, we’ll illustrate how Sentra uses Amazon Elastic Kubernetes Service (Amazon EKS), AWS Fargate, Amazon EC2 Spot, Karpenter, and an open-source version of Dagster, a cloud-native orchestrator, to run efficient and scalable data processing workloads on AWS. This approach maximizes cost-efficiency while minimizing the operational overhead that such optimization might otherwise require.
Sentra is a cloud data security company that enables security teams to gain full visibility and control of their data, as well as to protect against sensitive data breaches across the entire public cloud stack. Sentra reduces the attack surface of critical data by automatically detecting and remediating the highest risks.
Available on AWS Marketplace, Sentra’s SaaS multi-tenant platform continuously discovers and scans hundreds of petabytes of data simultaneously on thousands of cloud accounts. By accurately discovering and classifying sensitive data in its customers’ AWS environments, Sentra provides previously unachievable levels of visibility for security and data teams. Sentra allows every AWS user to understand where their sensitive data is located and how it’s secured. Users can be confident that as data moves throughout their AWS infrastructure, it will be tracked and properly secured. This classification is used, along with security properties of the data, to enforce customizable data security policies as well as out-of-the-box policies that cover privacy, security, and compliance regulations.
Sentra needed a way to deploy and maintain dozens of microservices using Infrastructure-as-Code best practices, with cost optimization and operational excellence in mind. Their platform runs on Amazon EKS with compute capacity managed by AWS Fargate, combined with Amazon EC2 Spot instances, using Karpenter as an autoscaler. Karpenter is an open-source, flexible Kubernetes cluster autoscaler that helps make optimal use of Amazon EC2 Spot capacity with minimal operational overhead. AWS Fargate is a serverless, pay-as-you-go compute engine that lets you build applications without managing servers. Amazon EC2 Spot instances are spare AWS compute capacity offered at a discount of up to 90% compared to On-Demand prices. Amazon EC2 can reclaim Spot instances when it needs the capacity back. Since modern microservices are commonly elastic, fault-tolerant, and stateless, their flexibility allows them to take advantage of the reduced pricing without service interruptions.
Combining these compute options minimizes the operational overhead of maintaining an Amazon EKS cluster while keeping its spend under control, by using cost-effective compute options and minimizing under-utilized resources.
Sentra uses AWS Fargate for its external-facing microservices, and Amazon EC2 instances for processing, service integrations, and alerts. With security as a primary concern, Sentra chose AWS Fargate mainly due to its out-of-the-box security isolation capabilities. With AWS Fargate, each Kubernetes pod has its own isolation boundary, and different pods don’t share the underlying kernel, CPU, memory, or networking resources.
The Sentra team chose Dagster, a new open-source big-data orchestration framework, to run efficient, scalable, and sophisticated workflows that require dependency and stage management. Sentra deployed Dagster’s control plane on AWS Fargate. Using Dagster and Amazon EKS, the Sentra team was able to reduce the effort needed to run and maintain their jobs.
The Dagster control plane executes Dagster jobs, also known as DAGs (directed acyclic graphs), which represent dependency trees. A DAG is invoked by a Kubernetes Job that launches a Run Manager. The Run Manager takes care of the orchestration for a specific run. Each step in the graph is executed as a Kubernetes Job and performs a specific task, while together they execute a broader use case. These Kubernetes Jobs run on Amazon EKS self-managed nodes. This shortens the overall runtime, because it allows all resources required for a workflow to be provisioned at once and avoids wait time between steps.
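To make the dependency handling concrete, here is a minimal sketch that uses only Python's standard library, not Dagster itself. The step names and the dependency map are hypothetical and are not taken from Sentra's workflows. It groups steps into "waves" in which every step has all of its upstream steps completed; a real orchestrator may start each step as soon as its own dependencies finish rather than waiting for a whole wave.

```python
from graphlib import TopologicalSorter

# Hypothetical step names; each key maps to the set of steps it depends on.
dependencies = {
    "discover": set(),
    "sample": {"discover"},
    "classify": {"sample"},
    "collect_metadata": {"discover"},
    "report": {"classify", "collect_metadata"},
}


def execution_waves(deps):
    """Group steps into waves; every step in a wave has all its dependencies done."""
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises graphlib.CycleError if the graph is not acyclic
    waves = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # steps whose dependencies are all done
        waves.append(ready)
        sorter.done(*ready)  # mark the wave finished, unlocking downstream steps
    return waves


waves = execution_waves(dependencies)
for number, wave in enumerate(waves, start=1):
    print(f"wave {number}: {', '.join(wave)}")
```

In this sketch, "sample" and "collect_metadata" land in the same wave because both depend only on "discover", which is the kind of parallelism that running each step as its own Kubernetes Job makes possible.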
The following diagram illustrates how this solution is deployed into Amazon EKS, AWS Fargate, and Amazon EC2 Spot instances:
Dagster allows defining different resources and configurations per step in the workflow, which are translated to Kubernetes requests and limits per container. The resources required for all steps are aggregated, and containers are scheduled onto shared Kubernetes nodes managed by Karpenter. Karpenter terminates under-utilized nodes, and when a step requires more CPU or memory than is currently available, Karpenter meets the demand by quickly scaling up. With consolidation enabled, Karpenter looks for opportunities to reschedule workloads onto a set of more cost-efficient Amazon EC2 instances, which is useful for dynamic workloads whose resource requirements change over time. To avoid running the Karpenter controller on a node that it manages, the Karpenter controller runs on AWS Fargate.
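The effect of scheduling step containers onto shared nodes can be illustrated with a small packing sketch. This is not Karpenter's or the Kubernetes scheduler's actual algorithm; it is a plain first-fit-decreasing heuristic, and the pod sizes, the heavy step's CPU request, and the node size (1 vCPU, 2 GiB allocatable) are all illustrative assumptions.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    cpu_m: int    # remaining allocatable CPU in millicores
    mem_mib: int  # remaining allocatable memory in MiB
    pods: list = field(default_factory=list)

    def fits(self, cpu_m, mem_mib):
        return cpu_m <= self.cpu_m and mem_mib <= self.mem_mib

    def place(self, name, cpu_m, mem_mib):
        self.cpu_m -= cpu_m
        self.mem_mib -= mem_mib
        self.pods.append(name)


def pack(pods, node_cpu_m, node_mem_mib):
    """First-fit decreasing: put each pod on the first node with room, adding nodes as needed."""
    nodes = []
    # Largest requests first, so big pods are not left stranded at the end.
    for name, cpu_m, mem_mib in sorted(pods, key=lambda p: (p[1], p[2]), reverse=True):
        if cpu_m > node_cpu_m or mem_mib > node_mem_mib:
            raise ValueError(f"{name} does not fit on an empty node")
        target = next((n for n in nodes if n.fits(cpu_m, mem_mib)), None)
        if target is None:  # no existing node has room: "scale up" by one node
            target = Node(node_cpu_m, node_mem_mib)
            nodes.append(target)
        target.place(name, cpu_m, mem_mib)
    return nodes


# Illustrative pods: (name, CPU request in millicores, memory request in MiB).
pods = [(f"step-{i}", 200, 300) for i in range(8)] + [("heavy-step", 200, 700)]
nodes = pack(pods, node_cpu_m=1000, node_mem_mib=2048)
for index, node in enumerate(nodes):
    print(f"node {index}: {len(node.pods)} pods, {node.cpu_m}m CPU and {node.mem_mib} MiB left")
```

With these assumed sizes, CPU rather than memory is the binding constraint, so the nine pods share two nodes instead of each step needing a node of its own.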
The following diagram demonstrates a Dagster graph. Each step is invoked as the previous step completes, which makes it possible to configure complex dependencies between steps. As explained previously, each step runs as a Kubernetes Job and can be configured with different resource requirements. In the following graph, most of the steps are configured with an average request of 200 millicores and 300 MiB, with a limit of 500 millicores and 600 MiB. Some heavy steps request 700 MiB of memory with a limit of 1 GiB.
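As a rough sketch of how such per-step settings can be expressed, the helper below builds a dictionary in the shape that the dagster-k8s integration reads from the `dagster-k8s/config` tag. The helper name `k8s_resource_tags` is hypothetical, the post does not show Sentra's actual configuration, and the CPU values for the heavy step are an assumption, since only its memory figures are given above.

```python
def k8s_resource_tags(cpu_request, mem_request, cpu_limit, mem_limit):
    """Build a tag dict carrying Kubernetes requests and limits for one step's container."""
    return {
        "dagster-k8s/config": {
            "container_config": {
                "resources": {
                    "requests": {"cpu": cpu_request, "memory": mem_request},
                    "limits": {"cpu": cpu_limit, "memory": mem_limit},
                }
            }
        }
    }


# Typical step: 200 millicores / 300 MiB requested, capped at 500 millicores / 600 MiB.
typical_step_tags = k8s_resource_tags("200m", "300Mi", "500m", "600Mi")

# Heavy step: 700 MiB requested, capped at 1 GiB (CPU values assumed unchanged).
heavy_step_tags = k8s_resource_tags("200m", "700Mi", "500m", "1Gi")

print(typical_step_tags["dagster-k8s/config"]["container_config"]["resources"])
```

In a Dagster project, a dictionary like this would typically be passed as the `tags` argument of an op, so that the Kubernetes Job created for that step carries the matching requests and limits.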
Sentra explored ways to fine-tune their Kubernetes node sizes and types in order to reduce their overall compute costs. Dagster executors are fault-tolerant, and therefore can take advantage of Spot instances to get the best pricing for the compute needed. One of the Amazon EC2 Spot best practices is to be flexible across a wide range of instance types to increase the chances of getting the aggregate compute capacity needed. Karpenter seamlessly applies the Amazon EC2 Spot best practices, and provisions Amazon EC2 Spot instances from different instance types, sizes, and families, using the price-capacity-optimized allocation strategy. Karpenter also takes care of Amazon EC2 Spot interruption handling, and replaces nodes that are about to be interrupted with capacity from other available Amazon EC2 Spot capacity pools. This ensures the Sentra team is able to use Spot instances in a reliable way and reduce their compute spend without compromising on availability and reliability. Sentra was able to reduce costs by 57% compared to On-Demand instances.
“Amazon EKS unlocked the full capabilities of Dagster, enabling us to spin up a scalable and fault-tolerant orchestration platform at the core of our product.” – Ron Reiter, CTO and Co-Founder, Sentra
In this post, we showed you how Sentra leverages several compute services and tools, including Amazon EKS, AWS Fargate, Amazon EC2 Spot, and Karpenter, with minimal effort. This allows them to reach their goals of minimal operational overhead while taking advantage of cost-effective infrastructure practices. The scalable infrastructure Sentra built on AWS allows AWS customers of any size to scan and classify their entire environment within a matter of days and in a cost-effective manner. Sentra continues to evaluate how to optimize and improve their usage of AWS services. Sentra is planning to evaluate the use of Amazon EMR Serverless for fully serverless Dagster steps, to reduce the operational overhead further by combining different compute options for different workload requirements.