IBM DataStage for IBM Cloud Pak for Data

IBM DataStage on Cloud Pak for Data is a modern, cloud-native, secure data integration solution that enables you to collect, transform, enrich, and deliver data at any scale and complexity. Bring the best-in-breed IBM DataStage parallel engine to run data integration tasks in your AWS account.

4.1

Overview

IBM DataStage on IBM Cloud Pak for Data is a modernized data integration solution to collect and deliver trusted data anywhere, at any scale and complexity, on and across multi-cloud and hybrid cloud environments. Save on data movement costs by bringing a best-in-breed parallel engine to where your data is. Increase the productivity of your business and IT users through automated job design and out-of-the-box, native integration with cloud data lakes, real-time data sources, relational databases, big data, and NoSQL data stores.

IBM DataStage is the data integration tool of choice for customers across industries.

A design-once, run-anywhere paradigm lets you bring data integration to where your data resides.
A best-in-breed parallel engine processes substantial data volumes, and built-in workload balancing supports multi-cloud scalability and elasticity.
Prebuilt functions reduce development time and improve consistency of design and deployment.
Packaged CI/CD DevOps tooling and out-of-the-box integration with data virtualization, governance, business intelligence, and data science services on IBM Cloud Pak for Data accelerate DataOps.
In-flight data quality and security help ensure trusted data delivery to data lakes.
Automated design templates, backward compatibility, and license cost savings benefit existing DataStage customers.

Highlights

Best-in-breed parallel engine and automated load balancing to process data at scale and maximize throughput for your AWS data lake and data warehouse projects.
Extensive prebuilt connectors to move data between AWS sources, multi-cloud sources and data warehouses, and on-premises sources. Increase developer productivity with hundreds of out-of-the-box, ready-to-use functions, and design and development capabilities.
Design your data integration jobs once and deploy runtime components in your AWS environment to save development costs while eliminating data latencies. Deliver data quickly and securely.

Details

Sold by

IBM Software

Pricing

IBM DataStage for IBM Cloud Pak for Data

Pricing is based on the duration and terms of your contract with the vendor. This entitles you to a specified quantity of use for the contract duration. If you choose not to renew or replace your contract before it ends, access to these entitlements will expire.

Additional AWS infrastructure costs may apply. Use the AWS Pricing Calculator to estimate your infrastructure costs.

12-month contract (1)

Dimension	Description	Cost/12 months
IBM DataStage for Cloud Pak for Data	IBM DataStage for Cloud Pak for Data, 6 VPC	$199,440.00
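For budgeting, the listed contract price can be broken down per month and per VPC. The sketch below uses only the figures from the table above; it assumes the cost divides evenly across the 6 VPCs and the 12 months, which the listing does not state, and it excludes any additional AWS infrastructure costs. The function name `cost_breakdown` is illustrative.

```python
from decimal import Decimal

# Figures taken from the pricing table above.
CONTRACT_COST = Decimal("199440.00")  # total for the 12-month contract
VPC_COUNT = 6                         # VPCs in the listed dimension
CONTRACT_MONTHS = 12

def cost_breakdown(total, vpcs, months):
    """Return (cost per month, cost per VPC per year, cost per VPC per month).

    Assumes the price scales evenly across VPCs and months; this is an
    illustration, not a statement of how the vendor prices other quantities.
    """
    per_month = total / months
    per_vpc_year = total / vpcs
    per_vpc_month = total / (vpcs * months)
    return per_month, per_vpc_year, per_vpc_month

per_month, per_vpc_year, per_vpc_month = cost_breakdown(
    CONTRACT_COST, VPC_COUNT, CONTRACT_MONTHS
)
print(f"Per month:         ${per_month:,.2f}")      # $16,620.00
print(f"Per VPC per year:  ${per_vpc_year:,.2f}")   # $33,240.00
print(f"Per VPC per month: ${per_vpc_month:,.2f}")  # $2,770.00
```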

Vendor refund policy

Please contact us with any questions

Legal

Vendor terms and conditions

Upon subscribing to this product, you must acknowledge and agree to the terms and conditions outlined in the vendor's End User License Agreement (EULA).

Content disclaimer

Vendors are responsible for their product descriptions and other product content. AWS does not warrant that vendors' product descriptions or other product content are accurate, complete, reliable, current, or error-free.

Usage information

Delivery details

Software as a Service (SaaS)

SaaS delivers cloud-based software applications directly to customers over the internet. You can access these applications through a subscription model. You will pay recurring monthly usage fees through your AWS bill, while AWS handles deployment and infrastructure management, ensuring scalability, reliability, and seamless integration with other AWS services.

Resources

Vendor resources

Gartner: Magic Quadrant for Data Integration Tools

Support

Vendor support

https://www.ibm.com/support/pages/node/795690

AWS infrastructure support

AWS Support is a one-on-one, fast-response support channel that is staffed 24x7x365 with experienced and technical support engineers. The service helps customers of all sizes and technical abilities to successfully utilize the products and features provided by Amazon Web Services.

Similar products

IBM Data & AI Software Professional Services

By The Fillmore Group - IBM Db2 Solutions

Since 1987 The Fillmore Group (TFG) has provided IBM data management systems integration, consulting, and training solutions to commercial, government, and not-for-profit clients around the world.

IBM QRadar SIEM v7.5.0 (BYOL)

By IBM Security

IBM QRadar SIEM empowers security analysts and security operations teams with the visibility, automation and insights needed to quickly detect anomalies and uncover advanced threats in real-time.

IBM Verify Identity Access v11

By IBM Security

IBM Verify Identity Access helps you simplify your users' access while more securely adopting web, mobile and cloud technologies.

IBM Granite 3.2 Instruct 8B

By IBM Data and AI

IBM Granite 3.2 Instruct is an open-source model with controllable reasoning, offering strong performance and enhanced complex thinking.

IBM Granite 3.0 8B Instruct

By IBM Data and AI

IBM's Granite 3.0 8B Instruct is an 8B-parameter AI model for enterprise use, excelling in multilingual and code tasks; Apache 2.0 licensed.

IBM Security Guardium Data Protection - Collector

By IBM Security

Safeguard critical, sensitive, or regulated data wherever it resides

Customer reviews

Ratings and reviews

4.1

72 ratings

Rating breakdown, listed from 5 star down: 44%, 45%, 10%

0 AWS reviews

72 external reviews

External reviews are from G2.

Steve L.

Blazingly Fast, Full-Featured ETL tool with Flexible Data Connections

Reviewed on Apr 22, 2026

Review provided by G2

What do you like best about the product?

DataStage is a full-featured and blazingly fast ETL tool. It handles many different types of data connection, and gives excellent options for parameterising processes to facilitate code promotion.

What do you dislike about the product?

The UI feels dated, and some "Stage" types (most notably "Hierarchical Stages") can be difficult to understand. There isn't a lot of online assistance from typical forums (fora?), and much of IBM's help is difficult to access as it's hidden behind their login requirements.

What problems is the product solving and how is that benefiting you?

DataStage helps us process huge volumes of data into our Data Warehouse (on a Netezza appliance) on a regular basis. We also use it for many of our system-to-system integrations. It handles many use cases that SSIS had previously struggled with, though this is partly due to being paired with further tooling that wasn't available to us when using SSIS.

Poojasree M.

Unmatched Performance and Reliability for Enterprise Data Workloads

Reviewed on Dec 21, 2025

Review provided by G2

What do you like best about the product?

The most impressive aspect of DataStage is its high-performance parallel processing engine, which allows it to handle massive enterprise data volumes with ease. By utilizing "pipelining" and "partitioning," the system can process different stages of a job simultaneously across multiple CPU nodes. This means that instead of waiting for one task to finish before the next begins, data flows through the pipeline like an assembly line, ensuring that even petabyte-scale workloads are completed within tight processing windows.
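The two ideas named above can be illustrated with a small Python sketch. This is not DataStage code or its API: the generator stages stand in for pipelining (each row moves to the next stage without waiting for the whole stage to finish), modulus partitioning on a hypothetical `id` column stands in for partitioning, and a thread pool stands in for multiple CPU nodes. All function names are assumptions made for illustration.

```python
from concurrent.futures import ThreadPoolExecutor

def partition(rows, n_partitions, key):
    """Modulus-partition rows on an integer key column."""
    parts = [[] for _ in range(n_partitions)]
    for row in rows:
        parts[row[key] % n_partitions].append(row)
    return parts

def extract(rows):
    # Stage 1: emit rows one at a time.
    for row in rows:
        yield row

def transform(rows):
    # Stage 2: consumes each row as soon as stage 1 yields it.
    for row in rows:
        yield {**row, "amount_cents": row["amount"] * 100}

def load(rows):
    # Stage 3: collect the transformed rows (a stand-in for a target write).
    return list(rows)

def run_partition(rows):
    # Chained generators: rows flow through all stages like an assembly line.
    return load(transform(extract(rows)))

def run_job(rows, n_partitions=3):
    parts = partition(rows, n_partitions, key="id")
    # One worker per partition; threads here only model separate nodes.
    with ThreadPoolExecutor(max_workers=n_partitions) as pool:
        return list(pool.map(run_partition, parts))

rows = [{"id": i, "amount": i * 10} for i in range(6)]
results = run_job(rows, n_partitions=3)
for number, part in enumerate(results):
    print(number, [row["id"] for row in part])
```

In a real parallel engine the stages and partitions run as separate processes, possibly on separate machines; the sketch only shows how the data is divided and how it flows.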
Furthermore, its visual design environment offers a sophisticated balance between simplicity and power. The drag-and-drop interface allows engineers to build complex ETL logic using pre-built "Stages" for joins, lookups, and transformations without needing to write manual code. However, it remains highly extensible for developers; if a specific requirement isn't met by a standard component, you can integrate custom Python scripts or SQL, making it flexible enough for both standard reporting and complex data science pipelines.
Finally, DataStage excels in enterprise-grade reliability and governance, which is why it remains a staple in highly regulated industries like finance and healthcare. It integrates seamlessly with metadata catalogs to provide end-to-end data lineage, allowing users to track exactly how data has changed from source to target. Combined with robust error-handling and "Reject Links" that capture bad data without crashing the entire job, it provides a level of stability and auditability that many lightweight or open-source tools struggle to match.
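The reject-link idea mentioned above, where bad rows are routed aside instead of failing the whole job, can be sketched in plain Python. This is a generic illustration under assumed column names (`id`, `amount`), not DataStage's implementation or API.

```python
def transform_with_reject(rows):
    """Split rows into (accepted, rejected) instead of raising on bad input.

    Rejected entries keep the original row plus the reason, so they can be
    written to a separate target for later inspection.
    """
    accepted, rejected = [], []
    for row in rows:
        try:
            accepted.append({"id": int(row["id"]), "amount": float(row["amount"])})
        except (KeyError, TypeError, ValueError) as exc:
            rejected.append({"row": row, "reason": f"{type(exc).__name__}: {exc}"})
    return accepted, rejected

sample_rows = [
    {"id": "1", "amount": "9.50"},
    {"id": "2", "amount": "n/a"},   # not a number: goes to the reject output
    {"id": "3"},                    # missing column: goes to the reject output
    {"id": "4", "amount": "12"},
]

accepted, rejected = transform_with_reject(sample_rows)
print("accepted ids:", [row["id"] for row in accepted])
print("rejected:", [(entry["row"], entry["reason"]) for entry in rejected])
```

The job completes with two accepted rows and two rejected rows, and each rejected entry records why it was set aside.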

What do you dislike about the product?

One of the most significant drawbacks of IBM DataStage is its prohibitive cost and complex licensing model, which often makes it inaccessible for small-to-medium businesses. Beyond the high initial purchase price, the "IBM Tax" includes ongoing maintenance and specialized infrastructure requirements that scale aggressively with data volume. Furthermore, because the tool is highly proprietary, organizations face heavy vendor lock-in; migrating logic out of DataStage to a modern, open-source-friendly stack like dbt or Airbyte is notoriously difficult and time-consuming.
From a technical standpoint, many engineers find the platform increasingly clunky and "legacy" compared to agile, cloud-native alternatives. While its parallel engine is powerful, it requires deep, specialized expertise to tune—settings like partition methods and buffer sizes are manual and unintuitive, leading to a steep learning curve for new hires. Additionally, while the newer "Next Gen" versions have improved, the ecosystem is still criticized for being batch-heavy, making it less agile for teams that require modern real-time streaming or "DataOps" automation.

What problems is the product solving and how is that benefiting you?

IBM DataStage primarily solves the challenge of data fragmentation and processing bottlenecks in massive enterprise environments. Large organizations often have data trapped in "silos" across legacy mainframes, modern cloud databases, and various third-party applications; DataStage provides a unified, high-performance bridge to extract and harmonize this information. Its parallel processing engine solves the "time problem" by breaking down petabyte-scale datasets into smaller chunks and processing them simultaneously, ensuring that critical business reports and data warehouses are updated within strict overnight windows rather than taking days to complete.
The primary benefit to you and your organization is data trust and operational efficiency. Because the platform includes built-in data quality and governance tools, it automatically cleanses and validates records as they move through the pipeline, reducing the risk of making business decisions based on "dirty" or inaccurate data. Furthermore, its "design once, run anywhere" architecture allows your team to build a data flow once and deploy it across on-premises servers or multiple cloud providers without rewriting code. This saves significant development time and future-proofs your infrastructure, allowing you to focus on gaining insights rather than troubleshooting manual data transfers.

Ivan S.

Exceptional Performance and Connectivity with Intuitive Interface

Reviewed on Dec 03, 2025

Review provided by G2

What do you like best about the product?

Wide Connectivity, High Performance and Scalability, Intuitive Graphical Interface

What do you dislike about the product?

High Learning Curve, Infrastructure Dependency

What problems is the product solving and how is that benefiting you?

Complex data integration, Data transformation and cleaning

Max R.

Data Integration and Quality with DataStage

Reviewed on Jun 18, 2025

Review provided by G2

What do you like best about the product?

Best data integration tool on the market with a wide range of connectors and advanced data integration and quality features.

What do you dislike about the product?

I quite like the platform as a whole, but I believe it can improve regarding data lineage (it should indeed improve now with the arrival of Manta to the IBM portfolio).

What problems is the product solving and how is that benefiting you?

Help our clients work with integrated, qualified, and reliable data.

Banking

IBM DataStage for ETL

Reviewed on Mar 08, 2024

Review provided by G2

What do you like best about the product?

IBM InfoSphere DataStage is a simple yet efficient tool for ETL processing.
It has a variety of stages to implement your designs and test them at runtime.
It has additional features compared to other ETL tools, which help in debugging and error handling.

What do you dislike about the product?

The DataStage UI takes a bit of a back seat compared to other ETL tools.
Stages could be categorised based on functionality.

What problems is the product solving and how is that benefiting you?

It solves data integration problems across a variety of platforms and provides appropriate data formats to the end user,
such as JSON, files, text files, databases, and big data.