AWS Partner Network (APN) Blog
Enhancing Availability and Amazon EC2 Spot Utilization of Databricks Workloads on AWS
By Kinnar Sen and Dan Kelly, EC2 Flexible Compute Team – AWS
By Venkat Viswanathan, Sr. Solutions Architect Data and Analytics – AWS
By Andrew Wiegand, Sr. Technical Account Manager – AWS
By Piyush Singh, Staff Product Manager – Databricks
Databricks allows customers to run analytics workloads on a simple, open platform that is optimized for cost and performance. Amazon EC2 Spot instances, which are available at up to 90% off On-Demand prices, can help optimize compute costs because they let customers use spare Amazon Elastic Compute Cloud (Amazon EC2) capacity for flexible, fault-tolerant workloads.
Databricks fleet clusters enable customers to launch clusters using EC2 Spot instances while adopting EC2 Spot best practices by using diversified node types. This increases a cluster’s chances of getting EC2 Spot instances, which in turn further reduces costs.
In this post, we will discuss the new Databricks fleet clusters feature and the steps to launch fleet clusters on Amazon Web Services (AWS). Databricks fleet instances were developed for large Databricks workloads on AWS to improve the cluster provisioning process, increase EC2 Spot utilization, and drive overall cost optimization.
Maximizing availability and EC2 Spot utilization is key to running cost-optimized Databricks workloads on AWS. Previously, customers launching Databricks clusters could select only a single AWS instance type, which could cause availability issues or low Spot utilization when running large Databricks clusters.
Databricks is an AWS Specialization Partner and AWS Marketplace Seller with the Amazon EC2 Spot Ready designation. Databricks allows you to handle all of your data, analytics, and artificial intelligence (AI) on one simple platform.
Databricks Fleet Clusters
Databricks architected fleet clusters on AWS to improve cluster launches by integrating Amazon EC2 Spot best practices. Instead of being tied to a single instance type in a random AWS Availability Zone (AZ), you can now launch a cluster with multiple instance types in the AZ with the best capacity.
A Databricks fleet cluster is composed of fleet instance types defined by Databricks, each of which maps to multiple EC2 instance types under the hood. When you create a fleet cluster, multiple predefined instance types are evaluated, and the instance type from the deepest Spot pool with the lowest price is chosen.
There are currently four different fleet instance types:
- General purpose
- General purpose + local disk
- Memory optimized
- Memory optimized + local disk
Databricks uses the price capacity optimized (PCO) allocation strategy to determine which of the mapped instance types is launched, based on cost and availability. The PCO allocation strategy identifies the EC2 Spot instances with the most capacity and lowest price. Databricks uses this strategy to minimize costs for customers: lower prices and fewer evictions result in direct cost and time savings.
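To make the PCO strategy more concrete, the sketch below builds the kind of request that the EC2 CreateFleet API accepts when several instance types compete under the price-capacity-optimized allocation strategy. It is an illustration only, not Databricks' implementation: Databricks issues its own requests for you. The launch template name, the list of candidate instance types, and the target capacity are hypothetical values chosen for the example.

```python
import json

# Hypothetical candidates for one fleet instance type. The actual mapping
# that Databricks uses is not listed in this post.
CANDIDATE_TYPES = ["m5d.xlarge", "m5ad.xlarge", "m6id.xlarge"]


def build_pco_fleet_request(launch_template_name, instance_types, target_capacity):
    """Build an EC2 CreateFleet request that diversifies across instance types."""
    return {
        "Type": "instant",
        "SpotOptions": {
            # Ask EC2 to weigh both Spot pool depth and price.
            "AllocationStrategy": "price-capacity-optimized",
        },
        "LaunchTemplateConfigs": [
            {
                "LaunchTemplateSpecification": {
                    "LaunchTemplateName": launch_template_name,
                    "Version": "$Latest",
                },
                # One override per instance type that is acceptable for the workload.
                "Overrides": [{"InstanceType": t} for t in instance_types],
            }
        ],
        "TargetCapacitySpecification": {
            "TotalTargetCapacity": target_capacity,
            "DefaultTargetCapacityType": "spot",
        },
    }


# "example-worker-template" is a placeholder name, not a real resource.
request = build_pco_fleet_request("example-worker-template", CANDIDATE_TYPES, 4)
print(json.dumps(request, indent=2))
```

With AWS credentials and a real launch template in place, a dictionary of this shape could be passed to the boto3 EC2 client as `create_fleet(**request)`. The more instance types listed under `Overrides`, the more Spot pools EC2 can choose from.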
Databricks also improved the AutoAZ feature by integrating the Spot placement score API. When a fleet cluster is created, Databricks immediately selects the AZs that have the best capacity for the selected fleet type. This improves cluster launch latency by avoiding cluster creation attempts in AZs that don’t have enough capacity.
Overall, fleet clusters simplify the cluster creation process because Databricks and AWS do the heavy lifting to optimize cost and availability. This reduces the cost of running workloads on Databricks while also improving reliability.
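As a rough illustration of how placement scores can drive AZ selection, the sketch below ranks Availability Zones from a response shaped like the output of the EC2 GetSpotPlacementScores API (in boto3, `get_spot_placement_scores` called with `SingleAvailabilityZone=True`). The ranking logic, the `min_score` threshold, and the sample data are assumptions made for this example and do not describe how Databricks AutoAZ is implemented.

```python
from typing import Any


def rank_zones(response: dict[str, Any], min_score: int = 1) -> list[str]:
    """Return AZ IDs ordered from highest to lowest Spot placement score."""
    scores = response.get("SpotPlacementScores", [])
    # Keep only per-AZ entries that meet the minimum score (scores range from 1 to 10).
    eligible = [
        s for s in scores
        if "AvailabilityZoneId" in s and s["Score"] >= min_score
    ]
    eligible.sort(key=lambda s: s["Score"], reverse=True)
    return [s["AvailabilityZoneId"] for s in eligible]


# Hypothetical response data, shaped like the real API output.
sample_response = {
    "SpotPlacementScores": [
        {"Region": "us-east-1", "AvailabilityZoneId": "use1-az1", "Score": 4},
        {"Region": "us-east-1", "AvailabilityZoneId": "use1-az2", "Score": 9},
        {"Region": "us-east-1", "AvailabilityZoneId": "use1-az4", "Score": 7},
    ]
}

best_first = rank_zones(sample_response, min_score=5)
print(best_first)
```

Filtering out low-scoring AZs before attempting a launch mirrors the idea described above: skip zones that are unlikely to have enough capacity rather than try them and fail.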
Getting Started
When creating new Databricks clusters, you can select the new fleet instance types. To get started using Databricks fleet clusters, make sure your Databricks AWS Identity and Access Management (IAM) role has the right permissions. Remember, when you set permissions with IAM policies, grant only the permissions required to perform specific tasks, also known as least-privilege permissions.
After confirming permissions, enter “fleet” in the worker type or driver type fields, and you should see the new fleet instance types appear.
Figure 1 – Selecting fleet instance types.
If you don’t see the instance types appear as above, make sure your cluster policy is set to “Unrestricted” (that is, don’t set a cluster policy). Cluster policies can restrict the node types available for use in a cluster, which may prevent the fleet types from appearing. Alternatively, you can edit a cluster policy to allow fleet node types.
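For the second option, editing a cluster policy, a minimal sketch of a policy definition that allows a fleet node type might look like the following. It assumes the Databricks cluster policy format in which `node_type_id` and `driver_node_type_id` can be constrained with an `allowlist` rule. The helper name and the starting policy are hypothetical.

```python
import json


def allow_fleet_node_types(policy, fleet_types):
    """Return a copy of a policy definition that allows the given fleet node types."""
    updated = dict(policy)
    # Constrain both worker and driver node types to the allowed fleet types.
    updated["node_type_id"] = {"type": "allowlist", "values": list(fleet_types)}
    updated["driver_node_type_id"] = {"type": "allowlist", "values": list(fleet_types)}
    return updated


# Hypothetical existing policy that only fixes the auto-termination time.
existing_policy = {"autotermination_minutes": {"type": "fixed", "value": 60}}

policy_definition = allow_fleet_node_types(existing_policy, ["md-fleet.xlarge"])
print(json.dumps(policy_definition, indent=2))
```

The printed JSON could then be pasted into the policy definition editor. Any other fleet types you want to offer would be added to the `values` list.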
Here, we’ve selected a worker and driver type of “md-fleet.xlarge”.
Figure 2 – Selecting md-fleet.xlarge.
You can see Databricks checked AWS instance type availability and selected the “m5d.xlarge” instance type, utilizing Spot instances for two of the workers.
Figure 3 – The m5d.xlarge instance type is selected.
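The same selection can also be expressed as a cluster specification for the Databricks Clusters API instead of the UI. The sketch below only builds the request body and does not send it. The cluster name, runtime version, and worker count are placeholder values, and the `aws_attributes` settings (Spot with On-Demand fallback, one On-Demand node for the driver, automatic AZ selection) are one reasonable configuration, assumed for illustration.

```python
import json


def build_fleet_cluster_spec(cluster_name, spark_version, fleet_type, num_workers):
    """Build a Clusters API request body that uses a fleet instance type."""
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        # The fleet type is used where a concrete EC2 instance type would normally go.
        "node_type_id": fleet_type,
        "driver_node_type_id": fleet_type,
        "num_workers": num_workers,
        "aws_attributes": {
            # Use Spot where possible and fall back to On-Demand otherwise.
            "availability": "SPOT_WITH_FALLBACK",
            # Keep the first node (the driver) on On-Demand capacity.
            "first_on_demand": 1,
            # Let Databricks pick the Availability Zone.
            "zone_id": "auto",
        },
    }


# Placeholder values: adjust the name, runtime version, and size for your workspace.
cluster_spec = build_fleet_cluster_spec(
    "fleet-demo", "13.3.x-scala2.12", "md-fleet.xlarge", 2
)
print(json.dumps(cluster_spec, indent=2))
```

Because the node type is a fleet type rather than a concrete instance type, the EC2 instance type that actually launches (m5d.xlarge in the example above) is decided at cluster creation time.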
Conclusion
In this post, we showed how using Databricks fleet clusters can improve the availability of Amazon EC2 Spot instances and reduce your costs with enhanced Spot utilization on AWS.
Instance type flexibility and diversity are key to successfully running workloads on AWS with EC2 Spot. Databricks fleet clusters unlock the savings from Spot while following best practices so you can avoid the complexities of managing your cloud infrastructure and spend time on what truly matters: your data-driven insights.
By taking advantage of EC2 Spot best practices through Databricks fleet clusters, customers have seen a significant reduction in fallback to On-Demand instances. Customers using Spot without fallback to On-Demand have also seen a reduction in capacity failures when using fleet instance types compared to picking a single instance type.
If you wish to explore this topic further, contact your representative from Databricks or AWS. If you have not yet tried Databricks on AWS, you can begin a free 14-day trial in AWS Marketplace.
Databricks – AWS Partner Spotlight
Databricks is an AWS Partner that allows you to handle all of your data, analytics, and AI on one simple platform.