Deploy on AWS into a new VPC, or deploy into your existing VPC.
Deployment requires a Qubole account; see the deployment guide.

This reference architecture is automated by AWS CloudFormation templates that you can customize to meet your specific requirements. For detailed information about the architecture and step-by-step instructions, see the deployment guide.

See also: If this architecture doesn't meet your specific requirements, check the other data lake deployments in the Quick Start catalog.

If you need assistance
For help setting up your data lake, see the consulting offers from APN Consulting Partners.


This Quick Start configures a production-ready Qubole Data Service (QDS) environment built on a data lake foundation in the AWS Cloud. You can use this Qubole environment to process and analyze your own datasets and extend it for your specific use cases. The Quick Start also deploys an optional environment with prepopulated data, notebooks, and queries that analyze structured and semi-structured data to gain key business insights into product sales performance.

QDS is a cloud-native, autonomous data platform for analyzing and processing big data. Qubole manages itself: it continuously analyzes and learns from platform usage through a combination of heuristics and machine learning, and it provides insights and recommendations to optimize reliability, performance, and costs. Qubole works with AWS services such as Amazon Simple Storage Service (Amazon S3), Amazon Elastic Compute Cloud (Amazon EC2), and Amazon Redshift.

This Quick Start deploys QDS on a data lake foundation built with AWS services, so that users can also take advantage of additional AWS big data services such as Amazon Kinesis. (Read more about the underlying data lake foundation.)

  • What you'll build

    The Quick Start adds the following components and capabilities to the underlying data lake environment:
    • Standard VPC and Linux bastion infrastructure, extended to support communication between instances in the private subnets and Qubole SaaS, and to provide access to the metastore within Qubole SaaS.
    • Preconfigured Apache Spark and Hadoop clusters. Qubole manages these clusters and automatically starts and scales them based on user workloads.
    • Preconfigured data sources that provide access to Amazon Relational Database Service (Amazon RDS), Amazon Redshift, and S3 buckets in the data lake.
    • Preconfigured Qubole metastore, notebooks, and queries to show business insights.
    • A basic wizard that helps you with Qubole account creation and data source installation, introduces features, and provides examples.
    • Data analysis and visualization, using Qubole’s Analyze and Notebooks interfaces.

    For details, see the Quick Start deployment guide.
  • Deployment details

    You can build the Qubole environment on AWS in about 50 minutes by following these steps:

    1. Sign up for an AWS account, if you don't already have one.
    2. Create a Qubole account.
    3. Get a Qubole ID token, AWS account ID, and external ID.
    4. Launch the Quick Start into a new VPC if you want to build new AWS infrastructure, or into an existing VPC if you already have your AWS environment set up. Either option takes about 50 minutes to deploy.
    5. Finish the Qubole configuration.
    6. Test the deployment.

    The Quick Start includes parameters that you can customize. For example, you can configure your network or customize the settings for Qubole, Hadoop, Spark, and AWS services. 

    For complete details, see the Quick Start deployment guide.
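    Steps 3 and 4 can also be scripted instead of done through the console. The following Python sketch shows one way to pass the values gathered in step 3 to AWS CloudFormation. It assumes boto3 is installed and AWS credentials are configured. The parameter keys (`QuboleIdToken`, `AWSAccountId`, `QuboleExternalId`), the stack name, and the template URL are hypothetical placeholders, not the Quick Start's actual names; take the real parameter names and template location from the deployment guide. The `Capabilities` value is likewise an assumption about what the templates require.

```python
"""Hypothetical sketch: launch the Qubole Quick Start stack with boto3.

Parameter keys, stack name, and template URL are placeholders, not the
Quick Start's actual names; check the deployment guide for the real ones.
"""
from typing import Dict, List


def build_stack_parameters(values: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a flat mapping into the ParameterKey/ParameterValue list
    that CloudFormation's CreateStack API expects, rejecting empty values."""
    missing = sorted(key for key, value in values.items() if not value)
    if missing:
        raise ValueError("missing values for: " + ", ".join(missing))
    # Sort by key so the generated parameter list is deterministic.
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in sorted(values.items())
    ]


def launch_stack(stack_name: str, template_url: str,
                 params: List[Dict[str, str]]) -> str:
    """Create the stack and return its ID (needs AWS credentials)."""
    import boto3  # imported here so build_stack_parameters works without boto3

    cfn = boto3.client("cloudformation")
    response = cfn.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Parameters=params,
        # Assumption: the templates create (named) IAM resources.
        Capabilities=["CAPABILITY_IAM", "CAPABILITY_NAMED_IAM"],
    )
    return response["StackId"]


if __name__ == "__main__":
    # Values gathered in step 3 (placeholders; hypothetical parameter keys).
    params = build_stack_parameters({
        "QuboleIdToken": "<your-qubole-token>",
        "AWSAccountId": "123456789012",
        "QuboleExternalId": "<external-id-from-qubole>",
    })
    print([p["ParameterKey"] for p in params])
    # launch_stack("qubole-data-lake", "<template-url-from-guide>", params)
```

    Validating the values locally means a missing token or ID is caught before any call to CloudFormation is made.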

  • Cost and licenses

    You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. The AWS CloudFormation templates for this Quick Start include configuration parameters that you can customize. Some of these settings, such as instance type, affect the cost of deployment. For cost estimates, see the pricing page for each AWS service you use.

    The Quick Start deploys QDS Business Edition, which allows you to consume up to 10,000 Qubole Compute Usage Hours (QCUH) per month at no cost. However, you are responsible for the cost of AWS resources that Qubole manages on your behalf. To learn more about QDS Business Edition, see the Qubole FAQ.
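    To see how the Business Edition allowance applies to a month's usage, the following Python sketch computes how many QCUH fall outside the 10,000-hour no-cost allowance. It only counts hours: this page doesn't say how QCUH accrue for a given cluster or what happens to usage beyond the allowance, and the AWS resources that Qubole manages are billed separately in any case. The function name is hypothetical.

```python
FREE_QCUH_PER_MONTH = 10_000  # QDS Business Edition allowance stated above


def qcuh_beyond_allowance(monthly_qcuh: float) -> float:
    """Return the QCUH consumed in a month beyond the no-cost allowance.

    AWS charges for the underlying EC2, S3, and other resources are
    not included; those apply regardless of QCUH usage.
    """
    if monthly_qcuh < 0:
        raise ValueError("QCUH consumption cannot be negative")
    return max(0.0, float(monthly_qcuh) - FREE_QCUH_PER_MONTH)


print(qcuh_beyond_allowance(12_500))  # 2500.0
print(qcuh_beyond_allowance(8_000))   # 0.0
```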

    After you deploy the Quick Start, you can upgrade to QDS Enterprise Edition and use Qubole Cloud Agents, which provide actionable Alerts, Insights, and Recommendations (AIR) to optimize reliability, performance, and costs. To upgrade your license to QDS Enterprise Edition, see the Enterprise Edition upgrade webpage on the Qubole website.

AWS Big Data Competency Partners offer consulting services to help you quickly discover value from this data lake solution. Follow the partner links to learn more about these partners and their consulting offers, and to request more information or support.