reference deployment

Data Lake Foundation on AWS

Using AWS services, including Amazon Redshift, Amazon Kinesis, AWS Glue, and Amazon SageMaker

This solution deploys a data lake foundation that integrates Amazon Web Services (AWS) services such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Kinesis, Amazon Athena, AWS Glue, Amazon Elasticsearch Service (Amazon ES), Amazon SageMaker, and Amazon QuickSight.

The data lake foundation uses these AWS services to provide capabilities such as data submission, ingest processing, dataset management, data transformation and analysis, building and deploying machine learning tools, search, publishing, and visualization. When this foundation is in place, you may choose to augment the data lake with ISV and SaaS tools.

This reference architecture is automated by AWS CloudFormation templates that you can customize to meet your requirements.

This solution was developed by AWS.

  •  What you'll build
  • This solution sets up the following:

    • A virtual private cloud (VPC) that spans two Availability Zones and includes two public and two private subnets.*
    • An internet gateway to allow access to the internet.*
    • In the public subnets, managed NAT gateways to allow outbound internet access for resources in the private subnets.*
    • In the public subnets, Linux bastion hosts in an Auto Scaling group to allow inbound Secure Shell (SSH) access to EC2 instances in public and private subnets.*
    • AWS Identity and Access Management (IAM) roles to provide permissions to access AWS resources; for example, to permit Amazon Redshift and Amazon Athena to read and write curated datasets.
    • In the private subnets, Amazon Redshift for data aggregation, analysis, transformation, and creation of new curated and published datasets.
    • An Amazon SageMaker instance, which you can access by using AWS authentication.
    • Integration with other Amazon services such as Amazon S3, Amazon Athena, AWS Glue, AWS Lambda, Amazon ES with Kibana, Amazon Kinesis, and Amazon QuickSight.

    * The template that deploys the solution into an existing VPC skips the tasks marked by asterisks and prompts you for your existing VPC configuration.
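
As a rough illustration of the network layout listed above (two Availability Zones, each with one public and one private subnet), the following sketch carves a VPC address range into four non-overlapping subnets. The `10.0.0.0/16` range, the `/20` subnet size, and the zone labels are assumptions chosen for the example; they are not taken from the solution's templates.

```python
import ipaddress


def plan_subnets(vpc_cidr="10.0.0.0/16", zones=("a", "b"), new_prefix=20):
    """Split a VPC CIDR into one public and one private subnet per zone.

    All default values are illustrative assumptions, not the solution's defaults.
    """
    vpc = ipaddress.ip_network(vpc_cidr)
    # subnets() yields consecutive, non-overlapping blocks of the requested size.
    blocks = vpc.subnets(new_prefix=new_prefix)
    plan = []
    for tier in ("public", "private"):
        for zone in zones:
            plan.append({"tier": tier, "zone": zone, "cidr": str(next(blocks))})
    return plan


if __name__ == "__main__":
    for subnet in plan_subnets():
        print(subnet)
```

When you deploy into an existing VPC, you supply your own subnet IDs instead, so a plan like this applies only to the new-VPC case.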

  •  How to deploy
  • To deploy this solution, follow the instructions in the deployment guide, which includes these steps.

    1. Sign in to your AWS account. If you don’t have an AWS account, sign up at https://aws.amazon.com.
    2. Launch the solution. Before you create the stack, choose the AWS Region from the top toolbar, and then choose whether to deploy into a new VPC or into an existing VPC. The stack takes about 50 minutes to deploy.
    3. Test the deployment by checking the resources created by the solution.

    The solution includes parameters that you can customize. For example, you can configure your network or customize the Amazon Redshift, Amazon Kinesis, and Amazon ES settings.
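
If you prefer to script the launch rather than use the console, the steps above might look roughly like the following sketch. It assumes the boto3 SDK is installed and that AWS credentials are configured. The template URL and the parameter names in the usage note are hypothetical placeholders; take the real values from the deployment guide.

```python
def build_stack_request(stack_name, template_url, overrides=None):
    """Build keyword arguments for the CloudFormation CreateStack call.

    `overrides` maps template parameter names to values. The names you pass
    must match the parameters that the template actually declares.
    """
    parameters = [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in sorted((overrides or {}).items())
    ]
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": parameters,
        # The solution creates IAM roles, so the stack must acknowledge IAM
        # capabilities. Assumption: the template may require further capabilities.
        "Capabilities": ["CAPABILITY_IAM"],
    }


def launch(request, region):
    """Create the stack and block until creation finishes."""
    import boto3  # imported lazily so build_stack_request works without boto3

    cfn = boto3.client("cloudformation", region_name=region)
    cfn.create_stack(**request)
    # Deployment takes about 50 minutes, so allow up to 90 minutes of polling.
    cfn.get_waiter("stack_create_complete").wait(
        StackName=request["StackName"],
        WaiterConfig={"Delay": 60, "MaxAttempts": 90},
    )
```

A call such as `launch(build_stack_request("data-lake", "https://example.com/template.yaml", {"ExampleParameter": "value"}), "us-east-1")` would then start the deployment; every argument in that call is a placeholder.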

    Amazon may share user-deployment information with the AWS Partner that collaborated with AWS on this solution.  

  •  Costs and licenses
  • You are responsible for the cost of the AWS services and any third-party licenses used while running this solution. There is no additional cost for using the solution.

    This solution includes configuration parameters that you can customize. Some of these settings, such as instance type, affect the cost of deployment. For cost estimates, refer to the pricing pages for each AWS service you use. Prices are subject to change.

    Tip: After you deploy a solution, create AWS Cost and Usage Reports to track associated costs. These reports deliver billing metrics to an Amazon S3 bucket in your account. They provide cost estimates based on usage throughout each month and aggregate the data at the end of the month. For more information, refer to "What are AWS Cost and Usage Reports?" in the AWS documentation.
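
Once a report has been delivered, a short script can total the costs per service. The sketch below assumes a CSV-format report that has already been downloaded and decompressed, and it assumes the report uses the `lineItem/ProductCode` and `lineItem/UnblendedCost` column names; check the header row of your own report before relying on them. The sample rows are invented purely to show the input shape.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal


def cost_by_product(report_file):
    """Sum unblended cost per product code from an open CSV report."""
    totals = defaultdict(Decimal)  # Decimal() starts each total at 0
    for row in csv.DictReader(report_file):
        cost = row["lineItem/UnblendedCost"] or "0"  # treat blank cells as zero
        totals[row["lineItem/ProductCode"]] += Decimal(cost)
    return dict(totals)


# Invented sample rows, for illustration only.
SAMPLE = """lineItem/ProductCode,lineItem/UnblendedCost
AmazonRedshift,1.25
AmazonS3,0.10
AmazonRedshift,0.75
"""

if __name__ == "__main__":
    for product, cost in sorted(cost_by_product(io.StringIO(SAMPLE)).items()):
        print(f"{product}: {cost}")
```

Using `Decimal` rather than `float` avoids rounding drift when many small line items are added together.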
  •  Resources
  • This solution is related to one featured in Solution Space that includes a briefing, optional consulting offers crafted by AWS Competency Partners, and AWS co-investment in proof-of-concept (PoC) projects. For more information, refer to Solution Space.