reference deployment

Databricks on AWS

A collaborative workspace for data science, machine learning, and analytics

This Partner Solution is for IT infrastructure architects, administrators, and DevOps professionals who want to use the Databricks API to create Databricks workspaces on the Amazon Web Services (AWS) Cloud. This Partner Solution creates a new workspace in your AWS account and sets up the environment for deploying more workspaces.

Databricks is a unified data-analytics platform for data engineering, machine learning, and collaborative data science. A Databricks workspace is a software-as-a-service (SaaS) environment for accessing all Databricks assets. The workspace organizes objects (for example, notebooks, libraries, and experiments) into folders and provides access to data and computational resources, such as clusters and jobs.

Important: This AWS Partner Solution deployment requires your Databricks account to be the E2 version of the platform. For more information, contact Databricks.

This Partner Solution was created by Databricks in collaboration with AWS. Databricks is an AWS Partner.

  •  What you'll build
  • The Partner Solution sets up the following, which constitutes the Databricks workspace:

    • A highly available architecture that spans at least three Availability Zones.
    • A Databricks-managed or customer-managed virtual private cloud (VPC) in your AWS account. This VPC is configured with private subnets and a public subnet, according to AWS best practices, to provide you with your own virtual network on AWS.
    • In the private subnets:
      • Databricks clusters of Amazon Elastic Compute Cloud (Amazon EC2) instances.
      • One or more security groups to enable secure cluster connectivity.
    • In the public subnet:
      • A network address translation (NAT) gateway to allow outbound internet access.
    • Amazon CloudWatch for the Databricks workspace instance logs.
    • (Optional) A customer-managed AWS Key Management Service (AWS KMS) key to encrypt notebooks.
    • An Amazon Simple Storage Service (Amazon S3) bucket to store objects such as cluster logs, notebook revisions, and job results.
    • AWS Security Token Service (AWS STS) to enable you to request temporary, limited-privilege credentials for users to authenticate.
    • A VPC endpoint for access to S3 artifacts and logs.
    • A cross-account AWS Identity and Access Management (IAM) role to enable Databricks to deploy clusters in the VPC for the new workspace. Depending on the deployment option you choose, you either create this IAM role during deployment or use an existing IAM role.
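
The cross-account IAM role in the last item depends on a trust policy that lets another AWS account assume the role. The following is a minimal sketch of what such a policy document can look like, not the policy that this Partner Solution actually creates. The function name `build_cross_account_trust_policy`, the account ID `111122223333`, and the external ID value are all placeholders chosen for illustration; the real values come from the deployment guide and your Databricks account.

```python
import json


def build_cross_account_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Return an IAM trust policy that lets another AWS account assume a role.

    Both arguments are placeholders here; they are not values taken from
    the Partner Solution templates.
    """
    # AWS account IDs are always 12-digit numbers.
    if not (trusted_account_id.isdigit() and len(trusted_account_id) == 12):
        raise ValueError("AWS account IDs are 12-digit numbers")

    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is allowed to assume the role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # Require a shared external ID on every AssumeRole call.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical values for illustration only.
    policy = build_cross_account_trust_policy("111122223333", "example-external-id")
    print(json.dumps(policy, indent=2))
```

The external-ID condition is a common safeguard for cross-account roles: the role can be assumed only by a caller that presents the agreed identifier.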
  •  How to deploy
  • To deploy Databricks, follow the instructions in the deployment guide. Databricks needs access to a cross-account IAM role in your AWS account to launch clusters into the VPC of the new workspace. The deployment process, which takes about 15 minutes, includes these steps:

    1. If you don't already have an AWS account, sign up at https://aws.amazon.com, and sign in to your account.
    2. Launch the Partner Solution, choosing from the following options:

    Amazon may share user-deployment information with the AWS Partner that collaborated with AWS on this solution.  
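
Because the Partner Solution is built on AWS CloudFormation templates, launching it amounts to creating a stack with a set of parameter values. As a rough sketch, the helper below converts a plain dictionary of settings into the list-of-dictionaries shape that the CloudFormation API uses for stack parameters. The helper name `to_cloudformation_parameters` and the parameter keys `WorkspaceName` and `DeploymentRegion` are hypothetical; the template's real parameter names are listed in the deployment guide.

```python
from typing import Dict, List


def to_cloudformation_parameters(settings: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert {"Key": "value"} settings into CloudFormation parameter entries."""
    # Sort by key so the output is deterministic and easy to review.
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in sorted(settings.items())
    ]


if __name__ == "__main__":
    # Hypothetical parameter names, for illustration only.
    example = {"WorkspaceName": "analytics-dev", "DeploymentRegion": "us-east-1"}
    for entry in to_cloudformation_parameters(example):
        print(entry)
```

A list in this shape can be supplied as the stack parameters when you create the stack programmatically instead of through the console.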

  •  Costs and licenses
  • You are responsible for the cost of the AWS services used while running this Partner Solution. There is no additional cost for using this Partner Solution.

    The AWS CloudFormation template for this Partner Solution includes configuration parameters that you can customize. Some of the settings, such as the instance type, affect the cost of deployment. For cost estimates, refer to the pricing pages for each AWS service you use. Prices are subject to change.
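
To see how a setting such as the instance type can change the bill, the sketch below multiplies instance count, hourly rate, and running hours. It is a back-of-the-envelope calculation only: the function name `estimate_monthly_ec2_cost` is hypothetical, the hourly rates are made-up placeholders rather than real AWS prices, and the estimate ignores storage, data transfer, and Databricks charges.

```python
def estimate_monthly_ec2_cost(
    instance_count: int,
    hourly_rate_usd: float,
    hours_per_day: float,
    days: int = 30,
) -> float:
    """Estimate compute cost as count x hourly rate x hours per day x days."""
    if instance_count < 0 or hourly_rate_usd < 0 or hours_per_day < 0 or days < 0:
        raise ValueError("all inputs must be non-negative")
    return round(instance_count * hourly_rate_usd * hours_per_day * days, 2)


if __name__ == "__main__":
    # Placeholder rates for two hypothetical instance sizes; look up
    # current prices on the AWS pricing pages before relying on a figure.
    for label, rate in [("smaller instance", 0.25), ("larger instance", 0.50)]:
        cost = estimate_monthly_ec2_cost(instance_count=4, hourly_rate_usd=rate, hours_per_day=8)
        print(f"{label}: ${cost:.2f} per month")
```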

    Tip: After you deploy the Partner Solution, enable the AWS Cost and Usage Report to deliver billing metrics to an Amazon S3 bucket in your account. It provides cost estimates based on usage throughout each month and aggregates the data at the end of the month. For more information, refer to What are AWS Cost and Usage Reports?

    For Databricks cost estimates, refer to the Databricks pricing page for product tiers and features.

Partner success story
Databricks Simplifies Deployment Using AWS Partner Solution

When Databricks needed to reduce the complex configuration steps and the time required to deploy Databricks workspaces to the AWS Cloud, it worked with the AWS Integration and Automation team to design an AWS Partner Solution: an automated reference architecture built on AWS CloudFormation templates with integrated best practices.

Read the full partner reference