IBM watsonx.data integration now available on AWS Marketplace

Customers building AI and analytics workloads often need to move data across multiple sources, formats, and integration styles. As organizations increasingly embed real-time and agentic workflows across hybrid environments, teams require robust integration functionality, flexibility, portability, and governance.

IBM watsonx.data integration is now available on AWS Marketplace to help customers meet these requirements. It provides a unified control plane for data integration with built-in observability.

In this post, we introduce IBM watsonx.data integration on AWS, describe its capabilities and AWS integrations, and explain how to get started.

What is IBM watsonx.data integration?

IBM watsonx.data integration is a unified data integration control plane designed to scale the delivery of AI-ready data. It offers code-first, low-code, and no-code approaches to accommodate different user preferences and skill levels. It orchestrates data movement across structured and unstructured data using several integration styles: bulk and batch extract, transform, load (ETL) and extract, load, transform (ELT); real-time streaming; data replication; and data observability.

This unified approach helps reduce fragmented tooling and technical debt as data storage paradigms evolve. Organizations can build adaptable data infrastructure that supports their data strategy, with pipelines that flex to meet changing business requirements.

Organizations across industries use IBM watsonx.data integration to:

Consolidate tools with a unified control plane for batch, streaming, replication, and unstructured data integration
Build reusable pipelines that adapt to changes in data architectures and technologies, avoiding costly rewrites
Optimize pipeline execution for cost, performance, and compliance across hybrid and multi-cloud environments
Deliver real-time data for faster decision-making, supporting use cases such as fraud detection, personalization, and operational responsiveness
Support reliable data delivery with built-in observability, helping teams detect issues early and maintain pipeline health at scale

Capabilities on AWS

IBM watsonx.data integration includes the following capabilities:

Move and transform data with high-performance batch processing

Create batch data flows that extract structured and semi-structured data from multiple sources, transform it, and deliver it to target systems. A high-performance parallel processing engine is designed to help your ETL and ELT jobs deliver data reliably.

Stream real-time data

Create streaming data flows that continuously read, process, and write data as it arrives. Add processors to transform data in flight. A schema-on-read approach identifies and adapts to data drift.
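To make the idea of data drift concrete, the following Python sketch compares each incoming record against the fields seen so far and then widens the known schema. It is a minimal illustration of the general schema-on-read technique, not the product's implementation. The function names `detect_drift` and `adapt_schema` and the sample field names are hypothetical.

```python
from __future__ import annotations

from typing import Any


def detect_drift(known: dict[str, str], record: dict[str, Any]) -> dict[str, list[str]]:
    """Compare one incoming record against the fields seen so far.

    `known` maps field name -> Python type name (hypothetical representation).
    """
    observed = {name: type(value).__name__ for name, value in record.items()}
    shared = set(observed) & set(known)
    return {
        # Fields present in the record but not in the known schema.
        "added": sorted(set(observed) - set(known)),
        # Fields the known schema expects but the record lacks.
        "missing": sorted(set(known) - set(observed)),
        # Fields whose observed type differs from the known type.
        "retyped": sorted(name for name in shared if observed[name] != known[name]),
    }


def adapt_schema(known: dict[str, str], record: dict[str, Any]) -> dict[str, str]:
    """Return a new schema that covers the record; the latest observed type wins."""
    merged = dict(known)
    for name, value in record.items():
        merged[name] = type(value).__name__
    return merged


known_schema = {"id": "int", "amount": "float"}
incoming = {"id": 7, "amount": "12.50", "currency": "USD"}

drift = detect_drift(known_schema, incoming)
known_schema = adapt_schema(known_schema, incoming)
```

In this sketch the pipeline keeps running when a field is added or changes type, and the drift report can be logged or routed to an alert instead of failing the flow.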

Prepare data for AI with unstructured data integration

Ingest, transform, and enrich unstructured data from sources such as PDFs, HTML files, and markdown documents. Prebuilt operators handle text extraction, PII removal, deduplication, and quality filtering. Chunking and embedding operators prepare data for retrieval-augmented generation (RAG) by populating vector databases.

Note: Unstructured data integration requires IBM watsonx.ai Runtime and IBM watsonx.data, acquired separately. You can optionally add IBM watsonx.data intelligence for unstructured data governance.
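As a rough illustration of what a chunking step does before embedding, the following Python sketch splits extracted text into fixed-size, overlapping chunks. It assumes character-based chunking with a hypothetical `chunk_text` function and hypothetical default sizes; the prebuilt operators in the product may chunk differently (for example, by tokens or document structure). The embedding step is omitted.

```python
from __future__ import annotations


def chunk_text(text: str, chunk_size: int = 200, overlap: int = 40) -> list[str]:
    """Split text into chunks of at most `chunk_size` characters.

    Consecutive chunks share `overlap` characters so that a sentence cut at a
    chunk boundary still appears intact in at least one chunk.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in the range [0, chunk_size)")

    step = chunk_size - overlap
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current chunk reaches the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


sample_chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```

Each chunk would then be passed to an embedding model and written to a vector database along with a reference to its source document.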

Monitor data pipelines with data observability

Create alerts to track DataStage job health. Configure thresholds for job run states, pipeline durations, and data quality deviations. Route alerts to your team through PagerDuty, Slack, Microsoft Teams, or email.
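Conceptually, a threshold-based alert compares a metric from a job run against a configured limit. The following Python sketch shows that comparison in isolation. It is not the product's alerting API: the `Threshold` class, the `evaluate` function, and the metric names are hypothetical, and delivery to a notification channel is left out.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Threshold:
    metric: str   # hypothetical metric name, e.g. "duration_seconds"
    limit: float  # an alert fires when the metric exceeds this value


def evaluate(metrics: dict[str, float], thresholds: list[Threshold]) -> list[str]:
    """Return one alert message per threshold that the job run exceeded."""
    alerts: list[str] = []
    for threshold in thresholds:
        value = metrics.get(threshold.metric)
        # Metrics that were not reported for this run are skipped.
        if value is not None and value > threshold.limit:
            alerts.append(f"{threshold.metric}={value} exceeds {threshold.limit}")
    return alerts


run_metrics = {"duration_seconds": 950.0, "failed_rows": 0.0}
rules = [Threshold("duration_seconds", 900.0), Threshold("failed_rows", 0.0)]
alerts = evaluate(run_metrics, rules)
```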

AWS Service Integrations

IBM watsonx.data integration stores data in Amazon Simple Storage Service (Amazon S3). Access is delegated to establish trust between the customer AWS account and the IBM-managed AWS account.
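This post does not describe the exact delegation mechanism. One common way to establish cross-account trust on AWS is an IAM role whose trust policy lets a principal in another account assume it. The following Python sketch builds such a trust policy document under that assumption. The `build_trust_policy` function, the account ID, and the external ID are placeholders for illustration; refer to the IBM documentation for the actual setup steps.

```python
import json


def build_trust_policy(trusted_account_id: str, external_id: str) -> dict:
    """Build an IAM trust policy that lets another AWS account assume a role.

    Assumption: the service uses role assumption for delegated access.
    Both arguments are placeholders supplied by the caller.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # The account that is allowed to assume the role.
                "Principal": {"AWS": f"arn:aws:iam::{trusted_account_id}:root"},
                "Action": "sts:AssumeRole",
                # An external ID narrows which requests may assume the role.
                "Condition": {"StringEquals": {"sts:ExternalId": external_id}},
            }
        ],
    }


# Placeholder values only; they do not identify a real account.
policy = build_trust_policy("111122223333", "example-external-id")
policy_json = json.dumps(policy, indent=2)
```

The role's permissions policy, which is separate from the trust policy shown here, would then limit access to the specific Amazon S3 buckets involved.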

Native connectors are available for the following AWS services:

Amazon S3: Read and write structured and unstructured data in formats such as Avro, CSV, JSON, Parquet, PDF, and DOCX. Also supports Delta Lake and Iceberg table formats.
Amazon Relational Database Service (Amazon RDS): Connect to Amazon RDS for MySQL, PostgreSQL, and Oracle.
Amazon Aurora: Connect to Amazon Aurora MySQL-Compatible Edition and Amazon Aurora PostgreSQL-Compatible Edition databases.
Amazon Redshift: Connect to your Amazon Redshift data warehouse.
Amazon DynamoDB: Connect to Amazon DynamoDB tables for NoSQL data. You can also read change data from Amazon DynamoDB Streams.
Amazon Kinesis Data Streams: Read streaming data for real-time pipelines.
Amazon CloudWatch: Read observability data.
Amazon Simple Queue Service (Amazon SQS): Read from Amazon SQS queues for message-based ingestion.

For the full list of supported connectors, refer to the IBM watsonx.data integration documentation.

AWS Regional Availability

IBM watsonx.data integration is available in the US East (N. Virginia) and Asia Pacific (Mumbai) AWS Regions.

Note: Capabilities vary by Region. For the current capability matrix for each Region, refer to the IBM documentation.

Get Started with IBM watsonx.data Integration on AWS

IBM watsonx.data integration is a unified control plane for batch ETL/ELT, real-time streaming, and unstructured data integration with built-in observability on AWS. With native connectors for AWS services and multiple authoring experiences, it helps data teams deliver AI-ready data from a single solution.

To get started, visit the IBM watsonx.data integration as a Service listing on AWS Marketplace, or contact your AWS representative to learn more.

AWS Marketplace:

Additional Content:

IBM & Red Hat on AWS
