AWS Quick Starts — Customer Ready Solutions

Data Lake Foundation on AWS

Using Apache Zeppelin, Amazon RDS, Amazon S3, and other AWS services

This Quick Start deploys a data lake foundation that integrates various AWS Cloud services and components to help you migrate data to the AWS Cloud, and store, monitor, and analyze the data.

The deployment uses Amazon Simple Storage Service (Amazon S3) as a core service to store the data, and deploys Apache Zeppelin and Kibana for analyzing and visualizing the data. It also integrates with Amazon Relational Database Service (Amazon RDS), AWS Data Pipeline, Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), Amazon Kinesis Firehose, and AWS CloudTrail.

This reference architecture is automated by AWS CloudFormation templates that deploy the data lake environment in about 20 minutes. You can customize the templates to meet your specific requirements.



This Quick Start was developed by Cloudwick Technologies Inc. in partnership with AWS. Cloudwick is an APN Partner.

  •  What you'll build
  • The Quick Start architecture for the data lake includes the following infrastructure:

    • A virtual private cloud (VPC) with multiple public and private subnets across multiple Availability Zones, so that AWS resources can be deployed in highly available configurations.
    • In the public subnets, optional Linux bastion hosts in an Auto Scaling group to provide secure access to Linux instances located in the public and private subnets.
    • In the public subnets, managed NAT gateways to provide outbound Internet connectivity for instances in the private subnets.
    • AWS Identity and Access Management (IAM) roles to enable AWS resources created through the Quick Start to access other AWS resources when required. For example, these IAM roles control access to data in Amazon S3, and enable Amazon Redshift to copy data from Amazon S3 into its tables.
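
To make the IAM role idea above concrete, here is a minimal sketch of the two policy documents such a role might carry: a trust policy that lets the Amazon Redshift service assume the role, and a read-only permissions policy on one S3 bucket. This is an illustration, not the policy the Quick Start templates actually create; the bucket name `example-datalake-bucket` is a hypothetical placeholder.

```python
import json


def redshift_trust_policy() -> dict:
    """Trust policy that lets the Amazon Redshift service assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "redshift.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def s3_read_policy(bucket: str) -> dict:
    """Read-only permissions on a single bucket (the name is a placeholder)."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing applies to the bucket itself.
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": f"arn:aws:s3:::{bucket}",
            },
            {
                # Reading applies to the objects inside the bucket.
                "Effect": "Allow",
                "Action": ["s3:GetObject"],
                "Resource": f"arn:aws:s3:::{bucket}/*",
            },
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name, used only for illustration.
    print(json.dumps(redshift_trust_policy(), indent=2))
    print(json.dumps(s3_read_policy("example-datalake-bucket"), indent=2))
```

Splitting the bucket-level and object-level statements keeps each action paired with the resource type it applies to.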

    The Quick Start gives you the option to build a new VPC infrastructure with these components or use your existing VPC infrastructure. Within this infrastructure, the Quick Start deploys:

    • In a private subnet, a web server instance (Amazon Machine Image, or AMI) in an Auto Scaling group to host the data lake portal. Apache Zeppelin is also installed on this web server to run analytics on the data loaded into Amazon S3.
    • In the private subnets, Amazon RDS to enable migrating data from a relational database to Amazon Redshift using AWS Data Pipeline.
    • Integration with Amazon S3 as the core service for storing data.
    • Integration with additional AWS services such as AWS Lambda, Amazon ES with Kibana, Amazon Kinesis Firehose, and AWS CloudTrail for data analysis.
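
Because Amazon S3 is the core store and Amazon Redshift copies data from it, a load step usually comes down to a Redshift `COPY` statement that names an S3 location and an IAM role. The sketch below only builds that statement as a string. The table name, S3 URI, and role ARN are hypothetical, and the CSV-with-header format is an assumption rather than something this Quick Start prescribes.

```python
def build_copy_statement(table: str, s3_uri: str, iam_role_arn: str) -> str:
    """Return a Redshift COPY statement that loads CSV files from S3.

    The arguments are interpolated directly into the SQL text, so they
    must come from trusted configuration, never from user input.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    return (
        f"COPY {table}\n"
        f"FROM '{s3_uri}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        "FORMAT AS CSV\n"
        "IGNOREHEADER 1;"  # assumes each file has one header row
    )


if __name__ == "__main__":
    # All three values are placeholders for illustration.
    print(
        build_copy_statement(
            "public.orders",
            "s3://example-datalake-bucket/orders/",
            "arn:aws:iam::123456789012:role/ExampleRedshiftCopyRole",
        )
    )
```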
  •  How to deploy
  • You can build your data lake environment on AWS in about 20 minutes by following a few simple steps:

    1. If you don't already have an AWS account, sign up at
    2. Launch the Quick Start. You can choose from two options:
    3. Log in to the data lake portal to test your deployment.
    4. Use the portal to manage your data in Amazon S3 or Kinesis Firehose, check your cloud resources for the data lake, and migrate your data to Amazon Redshift for analysis in Zeppelin or Kibana.

    The Quick Start includes parameters that you can customize. For example, you can change instance types and configure your settings for Amazon RDS, Amazon Redshift, and Amazon ES.
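
As a rough illustration of parameter customization, AWS CloudFormation accepts overrides as a list of `ParameterKey`/`ParameterValue` pairs, for example in a JSON parameters file passed to the AWS CLI. The sketch below converts a plain dictionary into that shape. The parameter names `WebServerInstanceType` and `RedshiftNumberOfNodes` are hypothetical; the real names are defined in the Quick Start templates.

```python
import json


def to_cfn_parameters(overrides: dict) -> list:
    """Convert {name: value} into CloudFormation's parameter list shape.

    CloudFormation parameter values are passed as strings, so every
    value is converted with str(). Keys are sorted for stable output.
    """
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in sorted(overrides.items())
    ]


if __name__ == "__main__":
    # Hypothetical parameter names and values, for illustration only.
    overrides = {
        "WebServerInstanceType": "t2.medium",
        "RedshiftNumberOfNodes": 2,
    }
    print(json.dumps(to_cfn_parameters(overrides), indent=2))
```

Keeping overrides in a small dictionary makes it easy to review exactly which defaults a deployment changes.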

    Amazon may share user-deployment information with the AWS Partner that collaborated with AWS on the Quick Start.  

  •  Cost and licenses
  • You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional cost for using the Quick Start. See the pricing pages for each AWS service you will be using for cost estimates.

    The AWS CloudFormation templates for this Quick Start include configuration parameters that you can customize. Some of these settings, such as instance type, affect the cost of deployment.

    This Quick Start also deploys the Kibana and Apache Zeppelin open-source software, which are both free of charge.

  •  Resources
  • This Quick Start reference deployment is related to a solution featured in Solution Space that includes a solution brief, optional consulting offers crafted by AWS Competency Partners, and AWS co-investment in proof-of-concept (PoC) projects. To learn more about these resources, visit Solution Space.