New Quick Start: Build a Data Lake Foundation on the AWS Cloud with Apache Zeppelin, Amazon RDS, and Other AWS Services

Posted on: Aug 2, 2017

This Quick Start deploys a data lake foundation that integrates services and components to help you migrate data to the Amazon Web Services (AWS) Cloud, and store, monitor, and analyze the data.

The deployment uses Amazon Simple Storage Service (Amazon S3) as a core service to store the data. It integrates with additional AWS services such as Amazon Relational Database Service (Amazon RDS), AWS Data Pipeline, Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), Amazon Kinesis Firehose, and AWS CloudTrail. The Quick Start also deploys Apache Zeppelin and Kibana for analyzing and visualizing the data stored in Amazon S3.

The Quick Start supports multiple user scenarios, including:
• Ingestion, storage, and analytics of structured or unstructured datasets
• Integration and analysis of data from disparate sources
• Reduction in analytics costs as the volume of captured data grows exponentially
• Use of multiple analytic engines and processing frameworks on the same data stored in Amazon S3

AWS CloudFormation templates automate the deployment and provide customization options for network resources, instance types, and configuration for AWS services. You can choose to build a new virtual private cloud (VPC) infrastructure that’s configured for security, scalability, and high availability, or use your existing VPC infrastructure for the data lake foundation.
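
As a rough illustration of how such a template could be launched programmatically with custom options, here is a minimal Python sketch that assembles a request for the CloudFormation CreateStack API and submits it through boto3. The stack name, template URL, and parameter keys are hypothetical placeholders, not values from the Quick Start; the actual template location and parameter names are listed in the deployment guide. The sketch also assumes the template creates IAM resources and therefore needs the CAPABILITY_IAM acknowledgement.

```python
from typing import Any, Dict, Mapping


def build_stack_request(stack_name: str, template_url: str,
                        overrides: Mapping[str, str]) -> Dict[str, Any]:
    """Assemble keyword arguments for CloudFormation's CreateStack call."""
    if not stack_name:
        raise ValueError("stack_name must not be empty")
    # CloudFormation expects parameters as a list of key/value dictionaries.
    parameters = [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in sorted(overrides.items())
    ]
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": parameters,
        # Assumption: the template creates IAM resources, so this
        # acknowledgement is required.
        "Capabilities": ["CAPABILITY_IAM"],
    }


def launch_stack(request: Dict[str, Any], region: str) -> str:
    """Submit the request and return the new stack's ID.

    Requires boto3 and valid AWS credentials.
    """
    import boto3  # imported here so build_stack_request works without boto3

    client = boto3.client("cloudformation", region_name=region)
    return client.create_stack(**request)["StackId"]
```

A call such as launch_stack(build_stack_request("my-data-lake", template_url, {"SomeParameter": "value"}), "us-east-1") would start the deployment, where template_url and "SomeParameter" stand in for the real values from the deployment guide. Keeping the request builder separate from the API call makes it possible to review the chosen options before any resources are created.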

After deployment, you can use the data lake portal provided by the Quick Start to manage your files in the data lake repository, migrate data to Amazon Redshift, monitor real-time streaming data using Amazon Kinesis Firehose, and analyze and explore the data you’ve uploaded in Kibana and Zeppelin.
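
For readers who want to feed their own streaming data into the deployment, the following Python sketch shows one way a single event might be sent to an Amazon Kinesis Firehose delivery stream with boto3. It is an illustration only: the delivery stream name and event fields are hypothetical, and encoding each event as one line of JSON is an assumption rather than a format prescribed by the Quick Start.

```python
import json
from typing import Any, Dict, Mapping


def to_firehose_record(event: Mapping[str, Any]) -> Dict[str, bytes]:
    """Encode an event as one newline-terminated JSON line.

    Firehose concatenates records without adding delimiters, so the
    trailing newline keeps events separable at the destination.
    """
    line = json.dumps(event, sort_keys=True, separators=(",", ":")) + "\n"
    return {"Data": line.encode("utf-8")}


def send_event(stream_name: str, event: Mapping[str, Any], region: str) -> str:
    """Send one event to a delivery stream and return its record ID.

    Requires boto3 and valid AWS credentials; stream_name is a
    hypothetical placeholder for the stream created in your account.
    """
    import boto3  # imported here so to_firehose_record works without boto3

    client = boto3.client("firehose", region_name=region)
    response = client.put_record(
        DeliveryStreamName=stream_name,
        Record=to_firehose_record(event),
    )
    return response["RecordId"]
```

The delivery stream name to use depends on what the deployment created in your account; check the stack outputs or the deployment guide rather than relying on a guessed name.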

To get started, use the following resources:
• Learn more about the data lake foundation architecture
• View the deployment guide
• Browse and launch other AWS Quick Start reference deployments

About Quick Starts

Quick Starts are automated reference deployments for key workloads on the AWS Cloud. Each Quick Start launches, configures, and runs the AWS compute, network, storage, and other services required to deploy a specific workload on AWS, using AWS best practices for security and availability. This Quick Start is the latest in a set of AWS customer-ready solutions, which are ready-to-deploy reference architectures and best practices that address specific use cases or business processes.