reference deployment

CDP Public Cloud Partner Solution on AWS—Terraform module

Integrated platform for data analytics, AI functionality, and data governance

This Partner Solution uses a Terraform module to deploy Cloudera Data Platform (CDP) to the Amazon Web Services (AWS) Cloud. CDP is an integrated analytics and data management platform that offers broad data analytics and artificial intelligence functionality, along with stronger user-access security and data governance features. The Terraform module automates the setup of the required AWS prerequisites and the creation of a CDP Public Cloud environment.

For more information, refer to the CDP Public Cloud documentation.

This Partner Solution was developed by Cloudera in collaboration with AWS. Cloudera is an AWS Partner.

  •  What you'll build
    This Partner Solution sets up the following:

    • A highly available architecture that spans three Availability Zones.
    • A virtual private cloud (VPC) configured with public and private subnets, according to AWS best practices, to provide you with your own virtual network on AWS.*
    • In the public subnets:
      • Managed NAT gateways for outbound traffic.*
      • Network Load Balancers for routing external traffic to endpoints in the public subnets.**
      • One internet gateway per deployment to allow outbound traffic (not shown).*
    • In the private subnets:
      • Managed NAT gateways for outbound traffic.*
      • Network Load Balancers for routing internal traffic to endpoints in the private subnets.**
      • Auto Scaling groups for the following:
        • CDP environment with one scaling group.**
        • Data lake with one scaling group per host group, for a total of five.**
        • (Optional) Data hub with one scaling group per host group. The total number depends on the data hub configuration.***
      • Amazon RDS for PostgreSQL Multi-AZ database cluster with two instances in an Amazon RDS group, used by the CDP data lake.**
    • Two AWS security groups, as required by CDP (not shown).
    • A cross-account role and an attached cross-account policy providing access to the AWS Cloud account from your CDP Management Console (not shown). 
    • Various AWS Identity and Access Management (IAM) roles, policies, and instance profiles for configuring resource-level permissions for cloud storage access and AWS compute services.
    • An Amazon Simple Storage Service (Amazon S3) bucket with three default locations for storing data, table metadata, logs, and audits.

    *If you configure the module to deploy the Partner Solution into an existing VPC, the deployment skips the components marked by a single asterisk.

    **These components are created by CDP, not directly by the Terraform module that deploys this solution. They are created when the CDP provider is called in the second stage of the deployment. The creation requests come from the CDP control plane.

    ***Created by CDP. These components are optional and can be created using the CDP UI.
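
    The asterisk footnotes above can be read as a small lookup table. The following Python sketch is illustrative only: the component names are abbreviated from the list above, the enum and function names are hypothetical, and it assumes that every unmarked item is created directly by the Terraform module. It shows which components the module itself would create for a new VPC versus an existing one.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List


class Creator(Enum):
    """Who creates a component, keyed by the footnote marker in the list."""
    TERRAFORM_NEW_VPC_ONLY = "*"  # skipped when deploying into an existing VPC
    CDP_CONTROL_PLANE = "**"      # created by CDP in the second stage
    CDP_OPTIONAL = "***"          # optional; can be created using the CDP UI
    TERRAFORM_ALWAYS = ""         # unmarked (assumed: created by the module)


@dataclass(frozen=True)
class Component:
    name: str
    creator: Creator


COMPONENTS = [
    Component("VPC with public and private subnets", Creator.TERRAFORM_NEW_VPC_ONLY),
    Component("Managed NAT gateways", Creator.TERRAFORM_NEW_VPC_ONLY),
    Component("Internet gateway", Creator.TERRAFORM_NEW_VPC_ONLY),
    Component("Network Load Balancers", Creator.CDP_CONTROL_PLANE),
    Component("CDP environment Auto Scaling group", Creator.CDP_CONTROL_PLANE),
    Component("Data lake Auto Scaling groups", Creator.CDP_CONTROL_PLANE),
    Component("Data hub Auto Scaling groups", Creator.CDP_OPTIONAL),
    Component("Amazon RDS for PostgreSQL cluster", Creator.CDP_CONTROL_PLANE),
    Component("Security groups", Creator.TERRAFORM_ALWAYS),
    Component("Cross-account role and policy", Creator.TERRAFORM_ALWAYS),
    Component("IAM roles, policies, and instance profiles", Creator.TERRAFORM_ALWAYS),
    Component("Amazon S3 bucket", Creator.TERRAFORM_ALWAYS),
]


def created_by_terraform(use_existing_vpc: bool) -> List[str]:
    """Return the components the Terraform module creates directly."""
    skipped = {Creator.CDP_CONTROL_PLANE, Creator.CDP_OPTIONAL}
    if use_existing_vpc:
        # Single-asterisk components are skipped for an existing VPC.
        skipped.add(Creator.TERRAFORM_NEW_VPC_ONLY)
    return [c.name for c in COMPONENTS if c.creator not in skipped]


if __name__ == "__main__":
    for existing in (False, True):
        print(f"existing VPC = {existing}:")
        for name in created_by_terraform(existing):
            print(f"  - {name}")
```

    With an existing VPC, only the security groups, cross-account role and policy, IAM resources, and S3 bucket remain on the Terraform side in this model. Everything marked with two or three asterisks comes from CDP in either case.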

  •  Costs and licenses
    To use CDP Public Cloud, you must have a license. For more information, refer to Try CDP Public Cloud.

    You are responsible for the cost of the AWS services and any third-party licenses used while running this solution. There is no additional cost for using the solution.

    This solution includes configuration parameters that you can customize. Some of these settings, such as instance type, affect the cost of deployment. For cost estimates, refer to the pricing pages for each AWS service you use. Prices are subject to change.

    Tip: After you deploy a solution, create AWS Cost and Usage Reports to track associated costs. These reports deliver billing metrics to an Amazon Simple Storage Service (Amazon S3) bucket in your account. They provide cost estimates based on usage throughout each month and aggregate the data at the end of the month. For more information, refer to What are AWS Cost and Usage Reports?
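
    As one possible way to act on this tip, the sketch below defines a report with the AWS SDK for Python (boto3) through the Cost and Usage Reports `put_report_definition` call. The report name, bucket name, and prefix are hypothetical placeholders, and the chosen settings (daily granularity, gzip-compressed CSV) are assumptions, not recommendations from this solution. It also assumes the target bucket already has a policy that lets the billing service write to it.

```python
from typing import Any, Dict


def build_report_definition(bucket: str, prefix: str, bucket_region: str) -> Dict[str, Any]:
    """Build the ReportDefinition body for a Cost and Usage Report."""
    return {
        "ReportName": "cdp-partner-solution-costs",  # hypothetical name
        "TimeUnit": "DAILY",
        "Format": "textORcsv",
        "Compression": "GZIP",
        "AdditionalSchemaElements": ["RESOURCES"],  # include resource IDs
        "S3Bucket": bucket,
        "S3Prefix": prefix,
        "S3Region": bucket_region,
        "RefreshClosedReports": True,
        "ReportVersioning": "OVERWRITE_REPORT",
    }


def create_report(definition: Dict[str, Any]) -> None:
    """Submit the report definition; requires boto3 and AWS credentials."""
    import boto3  # imported lazily so the builder above has no dependencies

    # Assumption: the Cost and Usage Reports API is called in us-east-1.
    client = boto3.client("cur", region_name="us-east-1")
    client.put_report_definition(ReportDefinition=definition)


if __name__ == "__main__":
    # Hypothetical bucket and prefix; replace with values from your account.
    definition = build_report_definition("my-billing-bucket", "cur/", "us-east-1")
    create_report(definition)
```

    The definition is built separately from the API call so that it can be reviewed or logged before anything is created in the account.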

    Amazon may share user-deployment information with the AWS Partner that collaborated with AWS on this solution.