reference deployment

IBM Cloud Pak for Data on AWS

A data and AI platform with data management, governance, and analytics

This Partner Solution deploys IBM Cloud Pak for Data on a Red Hat OpenShift Container Platform cluster to the Amazon Web Services (AWS) Cloud. Cloud Pak for Data is an analytics platform that helps prepare data for artificial intelligence (AI). It enables data engineers, data stewards, data scientists, and business analysts to collaborate using an integrated multicloud platform.

This deployment is for enterprise users who connect, catalog, govern, transform, and analyze data, regardless of location.


This Partner Solution was developed by IBM in collaboration with AWS. IBM is an AWS Partner.


AWS Service Catalog administrators can add this architecture to their own catalog.  

  •  What you'll build
  • The Partner Solution sets up the following:

    • A highly available architecture that spans one or three Availability Zones.*
    • A VPC configured with public and private subnets, according to AWS best practices, to provide you with your own virtual network on AWS.*
    • In the public subnets:
      • A boot node Amazon Elastic Compute Cloud (Amazon EC2) instance that also serves as a bastion host to allow inbound Secure Shell (SSH) access to EC2 instances in the private subnets.
      • Managed network address translation (NAT) gateways to allow outbound internet access for resources in the private subnets.*
    • In the private subnets:
      • OpenShift Container Platform (OCP) master nodes in up to three Availability Zones.
      • OCP compute nodes that host the Cloud Pak for Data services (Collect, Organize, and Analyze).
      • For container-persistent data, Amazon Elastic Block Store (Amazon EBS) volumes that are mounted on the compute nodes.
    • A Master Load Balancer, which spans the private subnets, for accessing the OCP compute nodes. This provides web-browser access to Cloud Pak for Data.
    • An Application Load Balancer, which spans the private subnets, for accessing the OCP compute nodes.
    • OpenShift auto scaling for the OCP compute nodes.
    • Amazon Route 53, as your public Domain Name System (DNS), for resolving domain names for the IBM Cloud Pak for Data management console.
    • Amazon S3 for storing the pull secret and deployment logs.

    * The template that deploys the Partner Solution into an existing VPC skips the components marked by asterisks and prompts you for your existing VPC configuration.
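The public and private subnet layout described above can be sketched with Python's standard `ipaddress` module. This is only an illustration: the VPC CIDR `10.0.0.0/16`, the `/20` subnet size, and the function name `plan_subnets` are assumptions, not values taken from the Partner Solution templates.

```python
import ipaddress


def plan_subnets(vpc_cidr: str, az_count: int, new_prefix: int = 20) -> dict:
    """Carve one private and one public subnet per Availability Zone.

    Hypothetical helper: the real templates define their own CIDR defaults.
    """
    if az_count not in (1, 3):
        raise ValueError("this sketch supports one or three Availability Zones")
    network = ipaddress.ip_network(vpc_cidr)
    # subnets() yields equal-sized, non-overlapping blocks in address order.
    blocks = network.subnets(new_prefix=new_prefix)
    plan = {"public": [], "private": []}
    # Allocate the private subnets first, then the public ones.
    for tier in ("private", "public"):
        for _ in range(az_count):
            plan[tier].append(str(next(blocks)))
    return plan


plan = plan_subnets("10.0.0.0/16", az_count=3)
for tier, cidrs in plan.items():
    print(tier, cidrs)
```

When you deploy into an existing VPC, a plan like this is replaced by the subnet IDs you already have.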

  •  How to deploy
  • To deploy IBM Cloud Pak for Data, follow the instructions in the deployment guide. A standard deployment takes about 3 hours, and a high-availability (HA) deployment takes about 4 hours. The deployment process includes these steps:

    1. This Partner Solution requires a Red Hat subscription. During the deployment, provide your OpenShift installer-provisioned infrastructure pull secret. To procure a 60-day evaluation license for OpenShift, follow the instructions at Evaluate Red Hat OpenShift Container Platform.
    2. Subscribe to Cloud Pak for Data. If you don't have a paid entitlement, you can create a 60-day trial subscription key.
    3. Choose a container-storage option.
    4. If you don't already have an AWS account, sign up at https://aws.amazon.com, and sign in to your account.
    5. Launch the Partner Solution by choosing a deployment option: deploy into a new VPC or deploy into an existing VPC. Be sure to choose your Region in the toolbar before creating the stack.
    6. (Optional) Edit the application security group.
    7. Test the deployment by using the Cloud Pak for Data web client.
    8. Manage your cluster using the OpenShift Console.
    9. (Optional) Provide boot-node SSH access.
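Before launching the stack, it can help to sanity-check the inputs gathered in the steps above. The following is a minimal sketch, not part of the Partner Solution. It assumes the pull secret is a JSON document with a top-level `auths` object, and it shapes settings into the `ParameterKey`/`ParameterValue` list that AWS CloudFormation accepts for stack parameters. The parameter names (`NumberOfAZs`, `StorageType`, `ClusterName`), the sample registry host, and both helper functions are hypothetical; check the deployment guide for the real parameter names.

```python
import json


def check_pull_secret(text: str) -> list:
    """Return the registry hosts found in a pull secret, or raise ValueError."""
    secret = json.loads(text)
    auths = secret.get("auths")
    if not isinstance(auths, dict) or not auths:
        raise ValueError("pull secret has no 'auths' entries")
    return sorted(auths)


def to_stack_parameters(settings: dict) -> list:
    """Convert a flat dict into a CloudFormation-style parameter list."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in settings.items()
    ]


# Dummy secret for illustration only; a real one comes from Red Hat.
sample_secret = '{"auths": {"registry.example.com": {"auth": "ZHVtbXk="}}}'
registries = check_pull_secret(sample_secret)

# Hypothetical parameter names and values.
settings = {"NumberOfAZs": 3, "StorageType": "EFS", "ClusterName": "cpd-demo"}
parameters = to_stack_parameters(settings)

print(registries)
print(parameters)
```

Checking the secret locally surfaces a malformed file in seconds rather than partway through a deployment that takes hours.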

    Amazon may share user-deployment information with the AWS Partner that collaborated with AWS on this solution.  

  •  Costs and licenses
  • You can get started with a 60-day trial or use your existing Cloud Pak for Data entitlement. Additional costs depend on the storage type for your platform. You can choose among three storage classes: Amazon Elastic File System (Amazon EFS), Portworx, or Red Hat OpenShift Container Storage (OCS). Amazon EFS costs are charged automatically to your AWS account. You can also use a Portworx trial version or an evaluation version of OCS. For more information, refer to Portworx or OCS.

    For Cloud Pak for Data pricing information, or to use your existing entitlements, contact your IBM sales representative at +1 (877) 426-3774. For more information about licensing terms, refer to the IBM License Agreement.

    You are responsible for the cost of the AWS services and any third-party licenses used while running this solution. There is no additional cost for using the solution.

    This solution includes configuration parameters that you can customize. Some of these settings, such as instance type, affect the cost of deployment. For cost estimates, refer to the pricing pages for each AWS service you use. Prices are subject to change.
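To see how instance type and node count drive cost, a rough estimate can be computed as count × hourly rate × hours per month. The sketch below uses placeholder instance labels and placeholder hourly rates, not real AWS prices, and the function name is hypothetical. It assumes 730 hours per month (8,760 hours per year divided by 12) and covers compute only.

```python
def monthly_compute_estimate(nodes: dict, hourly_rates: dict, hours: float = 730.0) -> float:
    """Estimate monthly compute cost from node counts and hourly rates."""
    total = 0.0
    for instance_type, count in nodes.items():
        total += count * hourly_rates[instance_type] * hours
    return round(total, 2)


# Placeholder labels and rates; look up current prices on the AWS pricing pages.
nodes = {"type-a": 3, "type-b": 3}
hourly_rates = {"type-a": 0.50, "type-b": 1.00}

estimate = monthly_compute_estimate(nodes, hourly_rates)
print(f"Estimated monthly compute cost: ${estimate:,.2f}")
```

Storage, data transfer, load balancers, and third-party licenses are not included and need to be added separately.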

    Tip: After you deploy a solution, create AWS Cost and Usage Reports to track associated costs. These reports deliver billing metrics to an Amazon Simple Storage Service (Amazon S3) bucket in your account. They provide cost estimates based on usage throughout each month and aggregate the data at the end of the month. For more information, refer to What are AWS Cost and Usage Reports?
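Once a Cost and Usage Report has been delivered to your bucket, you could total the line items per service with a few lines of Python. This sketch assumes a CSV-format report that has been downloaded and decompressed, and that it contains the `lineItem/ProductCode` and `lineItem/UnblendedCost` columns. The sample rows are made up for illustration and the function name is hypothetical.

```python
import csv
import io
from collections import defaultdict


def cost_by_service(report_csv: str) -> dict:
    """Sum unblended cost per product code from CSV report text."""
    totals = defaultdict(float)
    for row in csv.DictReader(io.StringIO(report_csv)):
        totals[row["lineItem/ProductCode"]] += float(row["lineItem/UnblendedCost"])
    return {service: round(cost, 2) for service, cost in totals.items()}


# Made-up sample rows; a real report has many more columns.
sample = """lineItem/ProductCode,lineItem/UnblendedCost
AmazonEC2,12.50
AmazonEC2,7.25
AmazonS3,0.40
"""

totals = cost_by_service(sample)
print(totals)
```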