New Quick Start: Build a data lake on the AWS Cloud with Talend Big Data Platform and AWS services

Posted on: Nov 7, 2017

This Quick Start automates the design, setup, and configuration of hardware and software to implement a data lake on the Amazon Web Services (AWS) Cloud. The Quick Start provisions Talend Big Data Platform components and AWS services such as Amazon EMR, Amazon Redshift, Amazon Simple Storage Service (Amazon S3), and Amazon Relational Database Service (Amazon RDS) to build a data lake. It also provides an optional sample dataset and Talend jobs developed by Cognizant Technology Solutions to illustrate big data practices for integrating Apache Spark, Apache Hadoop, Amazon EMR, Amazon Redshift, and Amazon S3 technologies into a data lake implementation. 

The Quick Start is for users who are evaluating big data in the cloud or looking to accelerate their big data initiative through the adoption of best practices for big data integration. The Quick Start provides the following features:

  • Enables self-service by provisioning required services and components to build a data lake.
  • Provides flexibility to spin up environments for development, test, and production. 
  • Includes an optional sample dataset and prebuilt Talend Spark jobs that help you explore the architecture and understand the stages of the end-to-end dataflow. 
  • Includes data ingestion, data processing, and data repository features, using Talend and Spark capabilities.
  • Optionally offers the Cognizant ingestion framework, big data validation, and DevOps platform to ingest, validate, and deploy big data solutions. (These features aren’t automated through the Quick Start CloudFormation template.)

The AWS CloudFormation templates that automate the deployment are customizable.  
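As a rough illustration of what customizing a deployment might look like in practice, the sketch below assembles a CloudFormation stack request with parameter overrides. The template URL and the parameter names (`KeyPairName`, `EnvironmentType`) are hypothetical placeholders, not values taken from this Quick Start; the actual parameters are defined by the Quick Start's templates. The `CAPABILITY_IAM` capability is also an assumption about what the template needs.

```python
from typing import Dict, List


def to_cfn_parameters(overrides: Dict[str, object]) -> List[Dict[str, str]]:
    """Convert a plain dict into the ParameterKey/ParameterValue list
    shape that CloudFormation stack requests use."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in sorted(overrides.items())
    ]


def build_create_stack_request(
    stack_name: str, template_url: str, overrides: Dict[str, object]
) -> Dict[str, object]:
    """Assemble keyword arguments for a CloudFormation create-stack call."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        "Parameters": to_cfn_parameters(overrides),
        # Assumption: the template creates IAM resources and needs this capability.
        "Capabilities": ["CAPABILITY_IAM"],
    }


# Hypothetical values for illustration only.
request = build_create_stack_request(
    stack_name="talend-data-lake-dev",
    template_url="https://example.com/talend-data-lake.template",
    overrides={"KeyPairName": "my-key-pair", "EnvironmentType": "development"},
)
```

With AWS credentials configured, a request built this way could be passed to boto3 as `boto3.client("cloudformation").create_stack(**request)`. Keeping one set of overrides per environment is one way to reuse the same template for development, test, and production.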

About Quick Starts
Quick Starts are automated reference deployments for key workloads on the AWS Cloud. Each Quick Start launches, configures, and runs the AWS compute, network, storage, and other services required to deploy a specific workload on AWS, using AWS best practices for security and availability. This Quick Start is the latest in a set of AWS customer-ready solutions, which are ready-to-deploy reference architectures and best practices that address specific use cases or business processes. It was created by Talend and Cognizant in partnership with AWS.