Deploy on AWS into a new VPC, or deploy into your existing VPC (deployment requires an AMI subscription; see the deployment guide).


This reference architecture is automated by AWS CloudFormation templates that you can customize to meet your specific requirements. For detailed information about the architecture and step-by-step instructions, see the deployment guide.

See also: If this architecture doesn't meet your specific requirements, see the other data lake deployments in the Quick Start catalog.


This Quick Start deploys a hybrid cloud environment that integrates on-premises Hadoop clusters with a data lake on the Amazon Web Services (AWS) Cloud. The deployment includes WANdisco Fusion, Amazon Simple Storage Service (Amazon S3), and Amazon Athena, and supports cloud migration and burst-out processing scenarios.

The Quick Start provides the option to deploy a Docker container that stands in for your on-premises Hadoop cluster for demonstration purposes, so you can gain hands-on experience with the hybrid data lake architecture. WANdisco Fusion continuously replicates data from the Docker container to Amazon S3, ensuring strong consistency between data residing on premises and data in the cloud. You can then use Amazon Athena to view and analyze the replicated data.

 

  • What you'll build

    The Quick Start architecture for the hybrid data lake includes the following:
     
    • A virtual private cloud (VPC) that spans two Availability Zones and includes two public subnets.*
    • An Internet gateway to provide access to the Internet.*
    • In the public subnets, WANdisco Fusion server instances in an Auto Scaling group, functioning as a single clustered service. 
    • (Optional) An on-premises WANdisco server deployed in a Docker container, to demonstrate the synchronization from HDFS to the S3 bucket in the cloud. The Quick Start uses a sample open dataset consisting of publicly available NYC taxi data.
    • (Optional) Amazon Athena to query and analyze the data from the local WANdisco Fusion server, which is synchronized with Amazon S3.
    • (Optional) An S3 bucket to store the content that is being synchronized by WANdisco Fusion and the analysis information processed by Athena.


    * The template that deploys the Quick Start into an existing VPC skips the components marked by asterisks.
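
    The component list and the asterisk rule above can be read as a small lookup. The following is a minimal sketch, not part of the Quick Start itself: the `Component` class, the `deployed` helper, and the single `include_optional` switch are assumptions made for illustration (the actual templates may control each optional component separately).

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Component:
    name: str
    new_vpc_only: bool = False  # marked with an asterisk in the list above
    optional: bool = False      # marked "(Optional)" in the list above


# Components as described in the architecture list.
COMPONENTS: List[Component] = [
    Component("VPC across two Availability Zones with two public subnets",
              new_vpc_only=True),
    Component("Internet gateway", new_vpc_only=True),
    Component("WANdisco Fusion servers in an Auto Scaling group"),
    Component("On-premises WANdisco server in a Docker container",
              optional=True),
    Component("Amazon Athena", optional=True),
    Component("S3 bucket", optional=True),
]


def deployed(existing_vpc: bool, include_optional: bool) -> List[str]:
    """Return the names of the components a deployment choice would include.

    Assumption: all optional components are toggled together, which is a
    simplification for this sketch.
    """
    return [
        c.name
        for c in COMPONENTS
        if not (existing_vpc and c.new_vpc_only)
        and (include_optional or not c.optional)
    ]


if __name__ == "__main__":
    for name in deployed(existing_vpc=True, include_optional=True):
        print(name)
```

    With `existing_vpc=True`, the VPC and Internet gateway entries drop out, which mirrors the note about the asterisked components.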

    For details, see the Quick Start deployment guide.
  • Deployment details

    You can build your data lake environment on AWS in about 15 minutes by following these steps:

    1. Sign up for an AWS account, if you don't already have one, at https://aws.amazon.com.
    2. Subscribe to the Amazon Machine Image (AMI) for WANdisco Fusion in AWS Marketplace.
    3. Launch the Quick Start into a new VPC, if you want to build a new AWS infrastructure.
      —or—
      Launch the Quick Start into an existing VPC, if you already have your AWS environment set up.
    4. (Optional) Deploy an on-premises WANdisco server in a Docker container and set up replication to see data synchronize from HDFS to Amazon S3.


    The Quick Start includes parameters that you can customize. For example, you can configure your network or customize the WANdisco Fusion and Amazon Athena settings. 

    For complete details, see the Quick Start deployment guide.
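
    As a rough illustration of customizing parameters, the sketch below converts a plain dictionary of overrides into the list of `ParameterKey`/`ParameterValue` pairs that the AWS CloudFormation CreateStack API accepts. The parameter names and values shown are hypothetical placeholders; the real keys are defined by the Quick Start templates and listed in the deployment guide.

```python
from typing import Dict, List

# Hypothetical parameter names and values, for illustration only.
# The real keys come from the Quick Start templates.
EXAMPLE_OVERRIDES: Dict[str, str] = {
    "KeyPairName": "my-key-pair",
    "ClusterInstanceType": "m4.xlarge",
    "S3BucketName": "my-hybrid-datalake-bucket",
}


def to_cfn_parameters(overrides: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert a dict of overrides into the list-of-dicts shape used by
    the CloudFormation CreateStack `Parameters` argument.

    Keys are sorted so the output order is deterministic.
    """
    return [
        {"ParameterKey": key, "ParameterValue": value}
        for key, value in sorted(overrides.items())
    ]


if __name__ == "__main__":
    for parameter in to_cfn_parameters(EXAMPLE_OVERRIDES):
        print(parameter)
```

    The resulting list could be passed as the `Parameters` argument when creating the stack through an AWS SDK, or the same values could be entered by hand in the AWS CloudFormation console.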

  • Cost and licenses

    You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional cost for using the Quick Start. 

    The AWS CloudFormation templates for this Quick Start include configuration parameters that you can customize. Some of these settings, such as instance type, affect the cost of deployment. For cost estimates, see the pricing page for each AWS service you will be using.

    The Quick Start requires a subscription to the WANdisco Fusion AMI in AWS Marketplace. The WANdisco Fusion software is provided under the Bring Your Own License (BYOL) model. If you don't provide a license, the Quick Start configures the application with a trial key. To continue using WANdisco Fusion beyond the 14-day trial period, you must purchase a license by contacting WANdisco at http://www.wandisco.com/contact.
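
    To keep track of the trial window, a small date calculation can help. This is a minimal sketch that assumes the 14-day trial starts on the day you deploy; the text above does not say when the clock starts, so confirm this with WANdisco. The function names are illustrative.

```python
from datetime import date, timedelta

TRIAL_DAYS = 14  # trial length stated in the licensing paragraph above


def trial_end(deployed_on: date) -> date:
    """Return the date the trial ends.

    Assumption: the trial period begins on the deployment date.
    """
    return deployed_on + timedelta(days=TRIAL_DAYS)


def days_remaining(deployed_on: date, today: date) -> int:
    """Return the number of trial days left, never less than zero."""
    return max(0, (trial_end(deployed_on) - today).days)


if __name__ == "__main__":
    start = date.today()
    print("Trial ends on:", trial_end(start).isoformat())
    print("Days remaining:", days_remaining(start, date.today()))
```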