Deploy on AWS into a new VPC

See also: If this architecture doesn't meet your specific requirements, see the data lake foundation that uses Apache Zeppelin and Amazon RDS, also available in the Quick Start catalog.


This Quick Start deploys a data lake foundation that integrates Amazon Web Services (AWS) services such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Kinesis, Amazon Athena, Amazon Elasticsearch Service (Amazon ES), and Amazon QuickSight.

The data lake foundation uses these AWS services to provide capabilities for data submission, ingest processing, dataset management, data transformation, aggregation and analysis, search, publishing, and visualization. Once this foundation is in place, you can augment the data lake with independent software vendor (ISV) and software as a service (SaaS) tools.

The deployment also includes an optional wizard and a sample dataset, which is loaded into Amazon Redshift and Amazon Kinesis streams to demonstrate data lake capabilities.

This reference architecture is automated by AWS CloudFormation templates that you can customize to meet your specific requirements. For detailed information about the architecture and step-by-step instructions, see the deployment guide.

  • What you'll build

    The Quick Start architecture for the data lake includes the following infrastructure:
    • A virtual private cloud (VPC) that spans two Availability Zones and includes two public and two private subnets.*
    • An Internet gateway to allow access to the Internet.*
    • In the public subnets, managed NAT gateways to allow outbound Internet access for resources in the private subnets.*
    • In the public subnets, Linux bastion hosts in an Auto Scaling group to allow inbound Secure Shell (SSH) access to EC2 instances in public and private subnets.*
    • In a private subnet, a web application instance that hosts an optional wizard, which guides you through the data lake architecture and functionality.
    • IAM roles to provide permissions to access AWS resources; for example, to permit Amazon Redshift and Amazon Athena to read and write curated datasets.
    • In the private subnets, Amazon Redshift for data aggregation, analysis, transformation, and creation of new curated and published datasets. When you launch the Quick Start with the optional wizard and sample data, Amazon Redshift is launched in a public subnet instead.
    • Integration with other Amazon services such as Amazon S3, Amazon Athena, AWS Lambda, Amazon ES with Kibana, Amazon Kinesis, and Amazon QuickSight.
    • Your choice to create a new VPC or deploy the data lake components into your existing VPC on AWS. The template that deploys the Quick Start into an existing VPC skips the components marked by asterisks above.

    For details, see the Quick Start deployment guide.
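
    The network layout listed above (two Availability Zones, each with one public and one private subnet) can be sketched with Python's standard ipaddress module. This is only an illustration: the 10.0.0.0/16 range, the Availability Zone names, and the plan_subnets helper are assumptions, not the values or logic the Quick Start templates use.

```python
import ipaddress


def plan_subnets(vpc_cidr, azs):
    """Split a VPC CIDR into one public and one private subnet per AZ.

    Hypothetical helper for illustration; the real subnet ranges are
    set by the Quick Start template parameters.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    needed = 2 * len(azs)                   # one public + one private per AZ
    extra_bits = (needed - 1).bit_length()  # bits needed for that many blocks
    blocks = list(vpc.subnets(prefixlen_diff=extra_bits))
    plan = []
    for i, az in enumerate(azs):
        # First len(azs) blocks are public, the following len(azs) are private.
        plan.append({"az": az, "tier": "public", "cidr": str(blocks[i])})
        plan.append({"az": az, "tier": "private",
                     "cidr": str(blocks[len(azs) + i])})
    return plan


# Assumed example values, not Quick Start defaults.
layout = plan_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"])
for subnet in layout:
    print(subnet["az"], subnet["tier"], subnet["cidr"])
```

    Carving equal-sized blocks keeps the sketch simple; a real deployment might size public subnets smaller than private ones.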
  • Deployment details

    You can build your data lake environment on AWS in about 50 minutes by following these steps:

    1. Sign up for an AWS account, if you don't already have one, at https://aws.amazon.com.
    2. Launch the Quick Start into a new VPC, if you want to build a new AWS infrastructure.
      —or—
      Launch the Quick Start into an existing VPC, if you already have your AWS environment set up.
    3. Test the deployment by checking the resources created by the Quick Start.
    4. If you've included the wizard and the sample dataset in your deployment, use the wizard to explore data lake features.
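
    One way to approach step 3 is to scan the stack's resource list for anything that has not finished creating. The sketch below assumes records shaped like the StackResources entries returned by the CloudFormation DescribeStackResources API (LogicalResourceId, ResourceType, ResourceStatus). The sample records and the unfinished_resources helper are made up for illustration and do not reflect the Quick Start's actual logical resource names.

```python
# Statuses that indicate a resource is in a finished, healthy state.
OK_STATUSES = {"CREATE_COMPLETE", "UPDATE_COMPLETE"}


def unfinished_resources(stack_resources):
    """Return (logical id, status) pairs for resources not yet complete."""
    return [
        (r["LogicalResourceId"], r["ResourceStatus"])
        for r in stack_resources
        if r["ResourceStatus"] not in OK_STATUSES
    ]


# Hypothetical sample data; real output comes from your own stack.
sample = [
    {"LogicalResourceId": "VPC",
     "ResourceType": "AWS::EC2::VPC",
     "ResourceStatus": "CREATE_COMPLETE"},
    {"LogicalResourceId": "RedshiftCluster",
     "ResourceType": "AWS::Redshift::Cluster",
     "ResourceStatus": "CREATE_IN_PROGRESS"},
    {"LogicalResourceId": "CuratedBucket",
     "ResourceType": "AWS::S3::Bucket",
     "ResourceStatus": "CREATE_COMPLETE"},
]

problems = unfinished_resources(sample)
for logical_id, status in problems:
    print(f"{logical_id}: {status}")
```

    In practice the input list would come from the AWS console, the AWS CLI, or an SDK call rather than a hard-coded sample.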


    The Quick Start includes parameters that you can customize. For example, you can configure your network or customize the Amazon Redshift, Kinesis, and Elasticsearch settings. You can also extend the sample dataset or use your own dataset.
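
    Parameter overrides can be kept in a small file and passed to CloudFormation at launch. The sketch below builds the JSON list of ParameterKey/ParameterValue pairs that CloudFormation accepts for stack parameters. The parameter names and values shown are hypothetical placeholders; the real names are defined in the Quick Start templates and documented in the deployment guide.

```python
import json


def to_cfn_parameters(overrides):
    """Convert a dict of overrides into CloudFormation's parameter list form."""
    # CloudFormation expects every parameter value as a string.
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in sorted(overrides.items())
    ]


# Hypothetical parameter names and values, for illustration only.
overrides = {
    "RedshiftNodeType": "dc2.large",
    "RedshiftNumberOfNodes": 2,
    "KinesisShardCount": 1,
}

params = to_cfn_parameters(overrides)
print(json.dumps(params, indent=2))
```

    Sorting the keys keeps the generated file stable between runs, which makes changes easier to review.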

    For complete details, see the Quick Start deployment guide.

  • Cost and licenses

    You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional cost for using the Quick Start. 

    The AWS CloudFormation templates for this Quick Start include configuration parameters that you can customize. Some of these settings, such as instance type, affect the cost of deployment. For cost estimates, see the pricing pages for each AWS service you will be using.

    Because this Quick Start uses AWS-native solution components, there are no costs or license requirements beyond AWS infrastructure costs. This Quick Start also deploys Kibana, which is an open-source tool that’s included with Amazon ES.