Deploy on AWS into a new VPC

If you need assistance
To get assistance setting up your data lake, see the consulting offers by APN consulting partners.


This Quick Start deploys a data lake foundation that integrates various AWS Cloud services and components to help you migrate data to the AWS Cloud, and store, monitor, and analyze the data. 

The deployment uses Amazon Simple Storage Service (Amazon S3) as a core service to store the data, and deploys Apache Zeppelin and Kibana for analyzing and visualizing the data. It also integrates with Amazon Relational Database Service (Amazon RDS), AWS Data Pipeline, Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), Amazon Kinesis Firehose, and AWS CloudTrail. 

This reference architecture is automated by AWS CloudFormation templates that deploy the data lake environment in about 20 minutes. You can customize the templates to meet your specific requirements. For detailed information about the architecture and step-by-step instructions, see the deployment guide.

  • What you'll build

    The Quick Start architecture for the data lake includes the following infrastructure:
     
    • A virtual private cloud (VPC) with multiple public and private subnets across multiple Availability Zones, so that AWS resources can be deployed in highly available configurations.
    • In the public subnets, optional Linux bastion hosts in an Auto Scaling group to provide secure access to Linux instances located in the public and private subnets.
    • In the public subnets, managed NAT gateways to provide outbound Internet connectivity for instances in the private subnets.
    • AWS Identity and Access Management (IAM) roles to enable AWS resources created through the Quick Start to access other AWS resources when required. For example, these IAM roles control access to data in Amazon S3, and enable Amazon Redshift to copy data from Amazon S3 into its tables.
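
    To illustrate the last point, the sketch below builds an Amazon Redshift COPY statement that loads CSV data from Amazon S3 under an IAM role. It is a minimal illustration, not the code the Quick Start runs: the helper name, table, bucket, and role ARN are all hypothetical placeholders.

```python
def build_copy_statement(table: str, s3_uri: str, role_arn: str) -> str:
    """Return a Redshift COPY statement that loads CSV data from S3 via an IAM role.

    All argument values are supplied by the caller; nothing here is taken
    from the Quick Start templates.
    """
    if not s3_uri.startswith("s3://"):
        raise ValueError("s3_uri must start with s3://")
    return (
        f"COPY {table} "
        f"FROM '{s3_uri}' "
        f"IAM_ROLE '{role_arn}' "
        "FORMAT AS CSV;"
    )


# Hypothetical values, for illustration only.
statement = build_copy_statement(
    "sales",
    "s3://example-bucket/sales/",
    "arn:aws:iam::123456789012:role/ExampleRedshiftRole",
)
print(statement)
```

    The role named in the statement must be attached to the Amazon Redshift cluster and must allow read access to the bucket; otherwise the load is rejected.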

    The Quick Start gives you the option to build a new VPC infrastructure with these components or use your existing VPC infrastructure. Within this infrastructure, the Quick Start deploys:
     
    • In a private subnet, a web server instance (Amazon Machine Image, or AMI) in an Auto Scaling group to host the data lake portal. This web server also installs Apache Zeppelin to run analytics on the data loaded into Amazon S3.
    • In the private subnets, Amazon RDS to enable migrating data from a relational database to Amazon Redshift using AWS Data Pipeline.
    • Integration with Amazon S3 as the core service for storing data.
    • Integration with additional AWS services such as AWS Lambda, Amazon ES with Kibana, Amazon Kinesis Firehose, and AWS CloudTrail for data analysis.

    For details, see the Quick Start deployment guide.
  • Deployment details

    You can build your data lake environment on AWS in about 20 minutes by following these steps:

    1. Sign up for an AWS account, if you don't already have one, at https://aws.amazon.com.
    2. Launch the Quick Start into a new VPC, if you want to build a new AWS infrastructure.
      —or—
      Launch the Quick Start into an existing VPC, if you already have your AWS environment set up.
    3. Log in to the data lake portal to test your deployment.
    4. Use the portal to manage your data in Amazon S3 or Kinesis Firehose, check your cloud resources for the data lake, and migrate your data to Amazon Redshift for analysis in Zeppelin or Kibana.


    The Quick Start includes parameters that you can customize. For example, you can change instance types and configure your settings for Amazon RDS, Amazon Redshift, and Amazon ES.
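
    If you prefer to launch the stack from a script rather than the console, parameter overrides can be passed to AWS CloudFormation programmatically. The following is a minimal sketch using boto3, assuming you have AWS credentials configured. The template URL and the parameter names (KeyPairName, RDSInstanceType) are hypothetical placeholders; take the real values from the deployment guide.

```python
from typing import Dict, List


def to_cfn_parameters(overrides: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a plain dict into the Parameters list that CreateStack expects."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in sorted(overrides.items())
    ]


def launch_stack(stack_name: str, template_url: str, overrides: Dict[str, str]) -> str:
    """Create the stack and return its stack ID (needs boto3 and AWS credentials)."""
    import boto3  # imported here so to_cfn_parameters works without boto3 installed

    cfn = boto3.client("cloudformation")
    response = cfn.create_stack(
        StackName=stack_name,
        TemplateURL=template_url,
        Parameters=to_cfn_parameters(overrides),
        # Assumption: the templates create IAM roles, so IAM capabilities must
        # be acknowledged; the real templates may require other capabilities.
        Capabilities=["CAPABILITY_IAM"],
    )
    return response["StackId"]


# Hypothetical parameter names and values, for illustration only.
example_overrides = {"KeyPairName": "my-key-pair", "RDSInstanceType": "db.t2.small"}
print(to_cfn_parameters(example_overrides))
```

    Calling launch_stack with a stack name, the template URL from the deployment guide, and a dict such as example_overrides starts the deployment; any parameter you leave out keeps the default defined in the template.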

    For complete details, see the Quick Start deployment guide.

  • Cost and licenses

    You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional cost for using the Quick Start.

    The AWS CloudFormation templates for this Quick Start include configuration parameters that you can customize. Some of these settings, such as instance type, will affect the cost of deployment. See the pricing pages for each AWS service you will be using for cost estimates.

    This Quick Start also deploys the Kibana and Apache Zeppelin open-source software, which are both free of charge.

AWS competency partners offer consulting services to help you quickly discover value from this data lake foundation solution. Follow these links to find out more about these partners and their consulting offers, and to request more information or support. We'll be adding offers as we finalize details with partners, so please check back for more options.