reference deployment

IBM Cloud Pak for Data on AWS

An end-to-end data and AI platform with data management, governance, and analytics capabilities

This Quick Start provides step-by-step instructions for deploying IBM Cloud Pak for Data on a Red Hat OpenShift Container Platform cluster on the AWS Cloud. Cloud Pak for Data is an analytics platform that helps prepare data for artificial intelligence (AI). It enables data engineers, data stewards, data scientists, and business analysts to collaborate using an integrated multicloud platform.

Cloud Pak for Data can use AWS services and features, including virtual private clouds (VPCs), Availability Zones, security groups, Amazon Elastic Block Store (Amazon EBS), Amazon Elastic File System (Amazon EFS), and Elastic Load Balancing to build a reliable and scalable cloud platform.

This deployment is for enterprise users who connect, catalog, govern, transform, and analyze data, regardless of location.

This Quick Start was developed by IBM in collaboration with AWS. IBM is an AWS Partner.

  •  What you'll build
  • The Quick Start sets up the following:

    • A highly available architecture that spans one or three Availability Zones.*
    • A VPC configured with public and private subnets, according to AWS best practices, to provide you with your own virtual network on AWS.*
    • In the public subnets:
      • A boot node Amazon Elastic Compute Cloud (Amazon EC2) instance that also serves as a bastion host to allow inbound Secure Shell (SSH) access to EC2 instances in the private subnets.
      • Managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.*
    • In the private subnets:
      • OpenShift Container Platform (OCP) master nodes in up to three Availability Zones.
      • OCP compute nodes that run the Cloud Pak for Data services (Collect, Organize, and Analyze).
      • For container-persistent data, Amazon EBS volumes that are mounted on the compute nodes.
    • A Master Load Balancer, which spans the private subnets, for accessing the OCP compute nodes. This provides web-browser access to Cloud Pak for Data.
    • An Application Load Balancer, which spans the private subnets, for accessing the OCP compute nodes.
    • OpenShift auto scaling for the OCP compute nodes.
    • Amazon Route 53, as your public Domain Name System (DNS), for resolving domain names for the IBM Cloud Pak for Data management console.
    • Amazon S3 for storing the pull secret and deployment logs.

    * The template that deploys the Quick Start into an existing VPC skips the components marked by asterisks and prompts you for your existing VPC configuration.
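
    As a rough illustration of the network layout described above (one public and one private subnet per Availability Zone, across one or three zones), the following Python sketch carves a VPC address range into subnets. The CIDR range, the function name `plan_subnets`, and the way the range is split are assumptions made for this example; they are not the values the Quick Start template uses.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, az_count: int) -> dict:
    """Carve one public and one private subnet per Availability Zone.

    Hypothetical helper: the real template defines its own CIDR parameters.
    """
    if az_count not in (1, 3):
        raise ValueError("the architecture spans one or three Availability Zones")

    network = ipaddress.ip_network(vpc_cidr)
    # Split the VPC range into 8 equal blocks (3 extra prefix bits), which is
    # enough for up to 3 public and 3 private subnets.
    blocks = list(network.subnets(prefixlen_diff=3))

    plan = {"public": [], "private": []}
    for i in range(az_count):
        plan["public"].append(str(blocks[i]))
        plan["private"].append(str(blocks[az_count + i]))
    return plan


# Example: a three-zone layout in an assumed 10.0.0.0/16 VPC.
layout = plan_subnets("10.0.0.0/16", 3)
for tier, cidrs in layout.items():
    print(tier, cidrs)
```

    When you deploy into an existing VPC, a plan like this is unnecessary because the template prompts you for the subnets you already have.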

  •  How to deploy
  • To deploy IBM Cloud Pak for Data on AWS, follow the instructions in the deployment guide. A standard deployment takes about 3 hours, and a high-availability (HA) deployment takes about 4 hours. The deployment process includes these steps:

    1. Obtain a Red Hat subscription, which this Quick Start requires. During the deployment, you provide your OpenShift installer-provisioned infrastructure pull secret. To procure a 60-day evaluation license for OpenShift, follow the instructions at Evaluate Red Hat OpenShift Container Platform.
    2. Subscribe to Cloud Pak for Data. If you don't have a paid entitlement, you can create a 60-day trial subscription key.
    3. Choose a container-storage option.
    4. If you don't already have an AWS account, sign up for one, and then sign in to your account.
    5. Launch the Quick Start, choosing whether to deploy into a new VPC or an existing VPC. Be sure to choose your Region in the toolbar before creating the stack.
    6. (Optional) Edit the application security group.
    7. Test the deployment by using the Cloud Pak for Data web client.
    8. Manage your cluster using the OpenShift Console.
    9. (Optional) Provide boot-node SSH access.
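
    The launch step can also be scripted instead of done from the console. The sketch below only assembles a request in the shape that the CloudFormation `create_stack` call in boto3 accepts; it does not contact AWS. The stack name, template URL, and parameter keys (`NumberOfAZs`, `StorageType`) are hypothetical placeholders, and the capability listed is an assumption. Check the deployment guide and the template itself for the real values.

```python
def build_stack_request(stack_name, template_url, parameters):
    """Assemble keyword arguments for a CloudFormation create_stack call.

    All names passed in by the caller are illustrative; the template defines
    the parameter keys it actually accepts.
    """
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        # CloudFormation expects a list of key/value pairs with string values.
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": str(value)}
            for key, value in sorted(parameters.items())
        ],
        # Assumption: the template creates IAM resources and therefore needs
        # an explicit capability acknowledgement.
        "Capabilities": ["CAPABILITY_IAM"],
    }


request = build_stack_request(
    stack_name="cp4d-quickstart",  # hypothetical stack name
    template_url="https://example.com/cloud-pak-for-data.template.yaml",  # placeholder
    parameters={"NumberOfAZs": 3, "StorageType": "EFS"},  # hypothetical keys
)
print(request["Parameters"])

# With boto3 installed and credentials configured, the request could be sent
# with the Region chosen explicitly, mirroring the console's Region selector:
#   boto3.client("cloudformation", region_name="us-east-1").create_stack(**request)
```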

    Amazon may share user-deployment information with the AWS Partner that collaborated with AWS on this solution.  

  •  Cost and licenses
  • You are responsible for the cost of the AWS services used while running this Quick Start reference deployment. There is no additional cost for using the Quick Start.

    This Quick Start deploys the Cloud Pak for Data environment by using an AWS CloudFormation template, which you can use to build a new VPC for your AWS cluster. The AWS CloudFormation template for this Quick Start includes configuration parameters that you can customize. Some of these settings, such as instance type, affect the cost of deployment. For cost estimates, see the pricing pages for each AWS service you use. Prices are subject to change.

    Tip: After you deploy the Quick Start, we recommend that you enable the AWS Cost and Usage Report to track costs associated with the Quick Start. This report delivers billing metrics to an Amazon Simple Storage Service (Amazon S3) bucket in your account. It provides cost estimates based on usage throughout each month, and finalizes the data at the end of the month. For more information about the report, see the AWS documentation.

    You can get started with a 60-day trial or use your existing Cloud Pak for Data entitlement. Additional costs depend on the storage type you choose for your platform. Three storage options are available: Amazon EFS, Portworx, and Red Hat OpenShift Container Storage (OCS). Amazon EFS costs are charged automatically to your AWS account. You can use a Portworx trial version or an evaluation version of OCS. For licensed versions, visit Portworx or OCS.

    For Cloud Pak for Data pricing information, or to use your existing entitlements, contact your IBM sales representative at +1 (877) 426-3774. For more information about licensing terms, see the IBM License Agreement.