New Quick Start: Build a Data Lake Foundation on the AWS Cloud with AWS Services

Posted on: Sep 8, 2017

This Quick Start deploys a data lake foundation that integrates Amazon Web Services (AWS) Cloud services such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Kinesis, Amazon Athena, Amazon Elasticsearch Service (Amazon ES), and Amazon QuickSight.  

The data lake foundation provides these features:

  • Data submission, including batch submissions to Amazon S3 and streaming submissions via Amazon Kinesis Firehose
  • Ingest processing, including data validation, metadata extraction, and indexing via Amazon S3 events, Amazon Simple Notification Service (Amazon SNS), AWS Lambda, Amazon Kinesis Analytics, and Amazon ES
  • Dataset management through Amazon Redshift transformations and Kinesis Analytics
  • Data transformation, aggregation, and analysis through Amazon Athena and Amazon Redshift Spectrum
  • Search, by indexing metadata in Amazon ES and exposing it through Kibana dashboards
  • Publishing into an S3 bucket for use by visualization tools, and visualization with Amazon QuickSight
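
To make the ingest-processing step more concrete, the following Python sketch shows how an AWS Lambda function might pull basic object metadata out of an Amazon S3 event notification, whether it arrives directly or wrapped in an Amazon SNS message. It is an illustration only, not the Quick Start's actual code: the function names and output field names are assumptions, and the step that would send the result to Amazon ES is left out.

```python
import json
from urllib.parse import unquote_plus


def extract_s3_metadata(event):
    """Return one metadata dict per S3 object referenced in the event.

    Accepts either a direct S3 event notification or an SNS event whose
    Message field carries the S3 notification as a JSON string.
    The output field names are illustrative assumptions.
    """
    documents = []
    for record in event.get("Records", []):
        if "Sns" in record:
            # SNS delivers the original S3 notification as a JSON string.
            s3_records = json.loads(record["Sns"]["Message"]).get("Records", [])
        else:
            s3_records = [record]

        for s3_record in s3_records:
            if "s3" not in s3_record:
                continue  # skip records that do not describe an S3 object
            obj = s3_record["s3"]["object"]
            documents.append({
                "bucket": s3_record["s3"]["bucket"]["name"],
                # Object keys are URL-encoded in S3 event notifications.
                "key": unquote_plus(obj["key"]),
                "size_bytes": obj.get("size"),
                "event_name": s3_record.get("eventName"),
                "event_time": s3_record.get("eventTime"),
            })
    return documents


def handler(event, context):
    """Hypothetical Lambda entry point: extract metadata and return it.

    A real deployment would forward each document to a search index
    here; that step is omitted from this sketch.
    """
    documents = extract_s3_metadata(event)
    return {"count": len(documents), "documents": documents}
```

Keeping the extraction logic in a plain function, separate from the Lambda entry point, lets it be exercised locally with a hand-built event before anything is deployed.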

Once this foundation is in place, you can augment the data lake with independent software vendor (ISV) and software as a service (SaaS) tools.

The deployment also includes an optional wizard and a sample dataset that is loaded into the Amazon Redshift cluster and Kinesis streams. The data lake wizard uses the dataset to demonstrate data lake capabilities such as search, transforms, queries, analytics, and visualization. 

AWS CloudFormation templates automate the deployment and provide customization options for network resources and AWS services. You can choose to build a new virtual private cloud (VPC) infrastructure that’s configured for security, scalability, and high availability, or use your existing VPC infrastructure for the data lake foundation.  

To get started, use the following resources:

About Quick Starts

Quick Starts are automated reference deployments for key workloads on the AWS Cloud. Each Quick Start launches, configures, and runs the AWS compute, network, storage, and other services required to deploy a specific workload on AWS, using AWS best practices for security and availability. This Quick Start is the latest in a set of AWS customer-ready solutions, which are ready-to-deploy reference architectures and best practices that address specific use cases or business processes.