AWS Marketplace

Start using Databricks Data Intelligence Platform with AWS Marketplace

Databricks Data Intelligence Platform in AWS Marketplace is a unified, open analytics platform for building, deploying, sharing, and maintaining enterprise-grade data, analytics, and AI solutions at scale. The Databricks Data Intelligence Platform on Amazon Web Services (AWS) integrates with and manages cloud storage and compute services in the customer’s own AWS account. This makes working with cloud infrastructure easier, without limiting the customizations and control that experienced data, operations, and security teams require.

In this post, we show you how to use the new launch experience in AWS Marketplace to create your own Databricks workspace. We then walk you through a demonstration that runs a classification model to make annual income predictions from census data.

Solution overview

Previously, deploying Databricks on AWS required manual configuration and knowledge of infrastructure provisioning services. Databricks and AWS have since collaborated to create an enhanced version of SaaS Quick Launch for Databricks Data Intelligence Platform in AWS Marketplace. This streamlined deployment experience lets customers deploy and access a Databricks workspace in minutes.

Databricks Data Intelligence Platform architecture

The Databricks Data Intelligence Platform is a web application and management service that uses Amazon Elastic Compute Cloud (Amazon EC2) resources running in a customer AWS account to execute jobs against customer data sources such as Amazon Simple Storage Service (Amazon S3). Databricks fully manages the customer’s Databricks infrastructure by using AWS Identity and Access Management (IAM) cross-account roles. For more information about the Databricks Data Intelligence Platform architecture, refer to the Databricks architecture overview.
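To make the cross-account role idea concrete, the following Python sketch builds the general shape of an IAM trust policy that lets a principal in another AWS account assume a role in your account. It is illustrative only: the launch experience described in this post creates the required identity resources for you, and the principal ARN and external ID below are placeholders we made up, not real Databricks values.

```python
import json


def build_cross_account_trust_policy(trusted_principal_arn: str, external_id: str) -> dict:
    """Return an IAM trust policy allowing an external principal to assume a role.

    Both arguments are supplied by the caller; nothing here is specific to Databricks.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The AWS principal (in another account) that may assume the role.
                "Principal": {"AWS": trusted_principal_arn},
                "Action": "sts:AssumeRole",
                # Requiring an external ID limits who can use the role on your behalf.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values for illustration only (assumptions, not real identifiers).
policy = build_cross_account_trust_policy(
    trusted_principal_arn="arn:aws:iam::111122223333:root",
    external_id="EXAMPLE-EXTERNAL-ID",
)
print(json.dumps(policy, indent=2))
```

A role that carries a trust policy like this one, together with a permissions policy, is what allows a managing service to create and operate resources in your account without sharing long-lived credentials.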

Solution walkthrough: Launch a Databricks workspace

In the following steps, we show you, at a high level, how to subscribe to the Databricks Data Intelligence Platform in AWS Marketplace. We then use the new streamlined launch experience to create a Databricks workspace.

Step 1 – Start your Databricks Data Intelligence Platform subscription

Start by subscribing to Databricks Data Intelligence Platform in AWS Marketplace. Follow these steps.

  1. Open the Databricks Data Intelligence Platform product detail page and choose Try for free or View purchase options.
  2. On the next screen, choose Subscribe.
  3. Your subscription may take a couple of minutes to process. In the meantime, choose Set up your account (Figure 1) to begin the launch process.

Your subscription is in progress. You can now setup your account in your vendor's website.

Figure 1 – Subscription progress message and Set up your account button

Step 2 – Launch a Databricks workspace

On the next screen, you enter the new streamlined experience that guides you through Databricks authentication, workspace configuration, and launch. Follow these steps.

  1. Choose Create account. You will be redirected to the Databricks account registration page. Follow the on-screen prompts to register with Databricks.
  2. Return to AWS Marketplace and notice the success message (Figure 2) indicating that your Databricks account has been linked. Choose Next.

You have successfully linked a Databricks account

Figure 2 – Databricks account linking confirmation message

  3. In the Configure workspace section, keep the default parameters and select the check box under Acknowledgement to confirm that you grant Databricks permission to create resources in your AWS account. Choose Next.
  4. In the Review and launch section, choose Launch product. Over the next few minutes, the network and identity resources necessary to create a Databricks workspace are deployed in your AWS account. When the deployment is complete, you are automatically redirected to the Databricks Data Intelligence Platform console. Follow the on-screen prompts to access your new Databricks workspace (Figure 3).

Databricks Workspace is being provisioned

Figure 3 – Databricks Workspaces screen

Create a binary classification model in your new Databricks workspace

Binary classification is a machine learning (ML) task that assigns each data point to one of two groups or classes. It’s used to predict a binary outcome, often described as positive or negative. In this post, the binary classification model predicts whether a person’s annual income is greater than $50,000 or less than or equal to $50,000.
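To illustrate the idea before you run the notebook, here is a minimal sketch of binary classification in plain Python. It is not the notebook's code and does not use Spark MLlib: we swap in a small logistic regression trained with stochastic gradient descent. The two features and all data values are invented for illustration and are not taken from the census dataset. A label of 1 stands for "income greater than $50,000" and 0 for "less than or equal to $50,000".

```python
import math


def sigmoid(z: float) -> float:
    """Map any real number to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-z))


def train_logistic_regression(features, labels, learning_rate=0.1, epochs=2000):
    """Fit weights and a bias with per-sample gradient descent on the log loss."""
    weights = [0.0] * len(features[0])
    bias = 0.0
    for _ in range(epochs):
        for x, y in zip(features, labels):
            probability = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
            error = probability - y  # gradient of the log loss with respect to z
            weights = [w - learning_rate * error * xi for w, xi in zip(weights, x)]
            bias -= learning_rate * error
    return weights, bias


def predict(weights, bias, x, threshold=0.5):
    """Return 1 (positive class) if the predicted probability meets the threshold."""
    probability = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)
    return 1 if probability >= threshold else 0


# Invented toy data: [years of education / 10, hours worked per week / 10].
features = [[0.8, 2.0], [1.6, 5.0], [1.0, 2.5], [1.8, 5.5], [0.9, 3.0], [1.4, 6.0]]
labels = [0, 1, 0, 1, 0, 1]

weights, bias = train_logistic_regression(features, labels)
print(predict(weights, bias, [0.8, 1.5]))  # a point deep in the "0" region
print(predict(weights, bias, [1.8, 6.5]))  # a point deep in the "1" region
```

The model outputs a probability, and the threshold turns that probability into one of the two classes. The example notebook applies the same idea at scale with Spark MLlib.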

In the next steps, you run an example Databricks notebook that demonstrates binary classification using the Spark Machine Learning library (MLlib). You first need to configure compute resources for your workspace before running the notebook. Follow these steps.

Step 1 – Configure compute for your workspace

To configure compute for your workspace, follow these steps.

  1. Once the new workspace status shows Ready, choose Open to open your workspace.
  2. Select Compute and then choose Create compute.
  3. Configure the desired compute. The default selections are sufficient for this use case. However, you can set the cluster to terminate after 10 minutes of inactivity to avoid unnecessary costs.
  4. Start the compute and wait for the cluster to show in a running state.
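If you later want to script this configuration instead of using the console, the same settings can be expressed as a request body for the Databricks Clusters API. The following Python sketch only builds and prints that body; it does not call any API. The field names follow the Clusters API, while the cluster name, runtime version, and node type are placeholders that you would replace with values available in your workspace.

```python
import json


def build_cluster_spec(
    cluster_name: str,
    spark_version: str,
    node_type_id: str,
    num_workers: int = 1,
    autotermination_minutes: int = 10,
) -> dict:
    """Build a cluster definition that terminates after a period of inactivity."""
    return {
        "cluster_name": cluster_name,
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # Idle time, in minutes, after which the cluster is terminated.
        "autotermination_minutes": autotermination_minutes,
    }


# Placeholder values (assumptions): substitute a runtime and node type
# that your workspace actually offers.
cluster_spec = build_cluster_spec(
    cluster_name="binary-classification-demo",
    spark_version="<runtime-version>",
    node_type_id="<node-type>",
)
print(json.dumps(cluster_spec, indent=2))
```

Keeping the idle timeout as an explicit parameter makes the cost control from step 3 visible wherever the cluster definition is reused.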

Step 2 – Run the binary classification example Databricks notebook

To run the binary classification example Databricks notebook, follow these steps.

  1. On the workspace menu, choose a folder to run the notebook in. You can use the home folder or create a new folder. Refer to the Organize workspace objects into folders documentation for more information.
  2. Right-click the folder and choose Import. Select Import from URL and provide the following URL: https://docs.databricks.com/en/_extras/notebooks/source/binary-classification.html
  3. With your notebook open, choose the Connect button and select the compute resource you created in the previous step.
  4. Choose Run all to run the notebook.

Follow the notebook for an introduction to binary classification algorithms, and use them to make income predictions from a sample dataset.

Conclusion

In this post, we showed you how to use the Databricks Data Intelligence Platform to create a Spark MLlib-based binary classification model and make predictions. Getting started with Databricks workspaces takes just a few clicks when you subscribe and launch using AWS Marketplace. For more information, visit Get started with Databricks.


About the authors

Leno Piperi is a Specialist Solutions Architect supporting AWS Marketplace. His professional interests include cloud governance and serverless computing on AWS. When he’s not in the office delighting customers, you’ll find him skiing or at the Home Depot.
Venkatavaradhan (Venkat) Viswanathan is a Global Partner Solutions Architect at Amazon Web Services. Venkat is a Technology Strategy Leader in data, AI, ML, generative AI, and advanced analytics. Venkat is a Global SME for Databricks and helps AWS customers design, build, secure, and optimize Databricks workloads on AWS.
Sabha Parameswaran is a Senior Solutions Architect at AWS with over 20 years of deep experience in enterprise application integration, microservices, containers and distributed systems performance tuning, prototyping, and more. He is based out of the San Francisco Bay Area. At AWS, he is focused on helping customers in their cloud journey and is also actively involved in microservices and serverless-based architecture and frameworks.