New Quick Start: Build a hybrid data lake on the AWS Cloud with WANdisco Fusion and AWS services

Posted on: Sep 18, 2017

This Quick Start deploys a hybrid architecture that integrates on-premises Hadoop clusters with a data lake environment on the Amazon Web Services (AWS) Cloud. The deployment takes about 15 minutes and includes WANdisco Fusion, Amazon Simple Storage Service (Amazon S3), and Amazon Athena. It supports cloud migration and burst-out processing scenarios.

The Quick Start provides the option to deploy a Docker container that represents your on-premises Hadoop cluster for demonstration purposes, so you can gain hands-on experience with the hybrid data lake architecture. WANdisco Fusion continuously replicates data from the Docker container to Amazon S3, ensuring strong consistency between data residing on premises and data in the cloud. You can use Amazon Athena to view and analyze the replicated data.

You can also customize the Quick Start to enable a disaster recovery scenario for your on-premises Hadoop cluster by provisioning an Amazon EMR cluster that references the data replicated into Amazon S3.

AWS CloudFormation templates automate the deployment and provide customization options for network resources, WANdisco Fusion, and AWS services. You can choose to build a new virtual private cloud (VPC) infrastructure that’s configured for security, scalability, and high availability, or use your existing VPC infrastructure for the hybrid data lake.
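
As a rough illustration of the choice between a new and an existing VPC, the following minimal sketch assembles the arguments for a CloudFormation stack launch using boto3. The template URL, the stack name, and the `VPCID` parameter key are hypothetical placeholders and are not taken from the Quick Start itself; check the actual template for its real parameter names before using anything like this.

```python
def build_stack_request(stack_name, template_url, existing_vpc_id=None):
    """Assemble keyword arguments for CloudFormation's create_stack call.

    The "VPCID" parameter key is a hypothetical name used for illustration;
    the real Quick Start template may name its parameters differently.
    """
    params = {}
    if existing_vpc_id is not None:
        # Deploy into an existing VPC instead of building a new one.
        params["VPCID"] = existing_vpc_id
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in sorted(params.items())
        ],
        # Templates that create IAM resources need this acknowledgement.
        "Capabilities": ["CAPABILITY_IAM"],
    }


def launch(request, region):
    """Submit the request to CloudFormation and return the new stack ID."""
    import boto3  # imported lazily so the request builder works without it

    cloudformation = boto3.client("cloudformation", region_name=region)
    return cloudformation.create_stack(**request)["StackId"]
```

Calling `build_stack_request("hybrid-data-lake", template_url)` without a VPC ID yields an empty parameter list, which in this sketch stands in for the option that builds a new VPC; passing `existing_vpc_id` stands in for the existing-VPC option.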

About Quick Starts
Quick Starts are automated reference deployments for key workloads on the AWS Cloud. Each Quick Start launches, configures, and runs the AWS compute, network, storage, and other services required to deploy a specific workload on AWS, using AWS best practices for security and availability. This is the latest in a set of AWS customer-ready solutions, which are ready-to-deploy reference architectures and best practices that address specific use cases or business processes.