
Data integration with AWS

Connect to and act on all your data, no matter where it lives


Overview

The most impactful data-driven insights come from getting a full picture of your business and customers. You can only achieve this when you connect data from all your data sources across multiple departments, services, on-premises tools, and third-party applications.

Data integration with AWS makes it easy to connect to and act on all your data, no matter where it lives. With AWS data integration capabilities, you can bring together data from multiple sources, then transform, operationalize, and manage it to deliver high-quality data across your data lakes and data warehouses.

Watch the re:Invent 2023 session on new releases

AWS Announces Four Zero-ETL Integrations to Make Data Access and Analysis Faster and Easier Across Data Stores

Circular diagram illustrating the data integration lifecycle, with four colored sections labeled Manage, Connect, Transform, and Operationalize.

Benefits of data integration with AWS

Connect data from every source

Your data comes from many sources in a variety of formats: third-party hosted applications, on-premises data stores, and operational data stores. AWS services connect to hundreds of data sources, including third-party software as a service (SaaS) applications, on-premises systems, and other clouds. Once you’ve connected or moved data into data lakes, data warehouses, and databases, you can make it securely available across your entire organization.

Simplify data integration pipeline development

AWS Glue, one of the many AWS data integration services, consolidates major data integration capabilities in one place, including data discovery; extract, transform, and load (ETL); cleansing; transformation; and centralized cataloging. It is serverless and can automatically provision and manage workers as needed.

Provide tools designed for users of all skillsets

AWS provides tools that meet the needs of data engineers, ETL developers, and business analysts, helping users at all technical levels interactively explore and work with their data. You can visually transform data with a drag-and-drop interface in AWS Glue Studio, clean and normalize data with AWS Glue DataBrew, a data preparation tool, and test data using your preferred integrated development environment (IDE) or notebook.

Support all of your workloads

Organizations often must support a variety of data processing frameworks, such as ETL, reverse ETL, and extract, load, and transform (ELT), as well as different workloads such as batch, micro-batch, and streaming. AWS provides flexible support for all of these frameworks and workloads, and enables portability by building on open-source standards.
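To make the difference between ETL and ELT concrete, here is a minimal Python sketch. It does not use any AWS API: an in-memory SQLite database stands in for a data warehouse, and the table names and sample orders are hypothetical. In the ETL path the pipeline cleans the rows before loading them; in the ELT path the raw rows are loaded first and the cleanup runs as SQL inside the store.

```python
import sqlite3

# Hypothetical raw records: (order_id, amount as text). One amount is invalid.
RAW_ORDERS = [("a-1", "19.99"), ("a-2", "5.00"), ("a-3", "not-a-number")]


def is_number(text):
    """Return True if the text parses as a float."""
    try:
        float(text)
        return True
    except ValueError:
        return False


def etl(conn, rows):
    """ETL: transform in the pipeline, then load only the cleaned rows."""
    cleaned = [(oid, float(amt)) for oid, amt in rows if is_number(amt)]
    conn.execute("CREATE TABLE orders_etl (order_id TEXT, amount REAL)")
    conn.executemany("INSERT INTO orders_etl VALUES (?, ?)", cleaned)


def elt(conn, rows):
    """ELT: load the raw rows as-is, then transform inside the store with SQL."""
    conn.execute("CREATE TABLE orders_raw (order_id TEXT, amount TEXT)")
    conn.executemany("INSERT INTO orders_raw VALUES (?, ?)", rows)
    # Keep only rows whose amount starts with a digit, and cast it to a number.
    conn.execute(
        "CREATE TABLE orders_elt AS "
        "SELECT order_id, CAST(amount AS REAL) AS amount "
        "FROM orders_raw WHERE amount GLOB '[0-9]*'"
    )


conn = sqlite3.connect(":memory:")  # stand-in for a warehouse, an assumption
etl(conn, RAW_ORDERS)
elt(conn, RAW_ORDERS)
etl_rows = conn.execute("SELECT * FROM orders_etl ORDER BY order_id").fetchall()
elt_rows = conn.execute("SELECT * FROM orders_elt ORDER BY order_id").fetchall()
raw_count = conn.execute("SELECT COUNT(*) FROM orders_raw").fetchone()[0]
```

Both paths end with the same two valid orders. The ELT path also keeps the untouched raw table, which is one reason teams choose it when they want to re-run transformations later.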

Address data integration challenges with AWS

Data systems are sprawling and siloed

AWS helps you access and integrate data from anywhere you store it, letting your data integration teams focus on high-value activities that maximize the value of your data.

AWS Glue makes it easy to discover, prepare, and integrate all your data at any scale. AWS Database Migration Service helps move database and analytics workloads to AWS quickly, securely, and with minimal downtime and zero data loss. Amazon Managed Workflows for Apache Airflow (MWAA) provides secure and highly available managed workflow orchestration for Apache Airflow. AWS Data Exchange connects you with third-party data, offering 3,500+ data products from 300+ data providers.

Data integration can be complex and time-consuming

For complex use cases where you need to transform data, normalize data, check data quality before ingesting data from a raw data store, or reduce table columns, rows, or data size, AWS Glue makes it easier for you to prepare and integrate data.
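As an illustration of the preparation steps listed above (reducing columns, normalizing values, and checking data quality before ingestion), here is a small plain-Python sketch. It is not AWS Glue code. The records, the column names, and the quality rule are hypothetical and chosen only to show the shape of such a step.

```python
# Hypothetical raw records, standing in for rows read from a raw data store.
RAW_ROWS = [
    {"id": "1", "email": " Ana@Example.com ", "country": "us", "debug_blob": "x" * 10},
    {"id": "2", "email": "", "country": "DE", "debug_blob": "y" * 10},
    {"id": "3", "email": "li@example.org", "country": "cn", "debug_blob": "z" * 10},
]

# Columns worth keeping; everything else is dropped to reduce data size.
KEEP_COLUMNS = ("id", "email", "country")


def normalize(row):
    """Project to the kept columns, then normalize types, whitespace, and case."""
    projected = {key: row[key] for key in KEEP_COLUMNS}
    return {
        "id": int(projected["id"]),
        "email": projected["email"].strip().lower(),
        "country": projected["country"].strip().upper(),
    }


def passes_quality_check(row):
    """A deliberately simple rule: an email with '@' and a two-letter country."""
    return "@" in row["email"] and len(row["country"]) == 2


def prepare(rows):
    """Split normalized rows into accepted and rejected lists."""
    accepted, rejected = [], []
    for raw in rows:
        clean = normalize(raw)
        (accepted if passes_quality_check(clean) else rejected).append(clean)
    return accepted, rejected


accepted, rejected = prepare(RAW_ROWS)
```

Keeping the rejected rows, rather than silently dropping them, makes it possible to report on data quality or route bad records for review.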

For common use cases where ETL is regularly repeated, you can use AWS zero-ETL capabilities, which eliminate the need to build and manage ETL pipelines. With AWS zero-ETL capabilities, you can, for example, directly ingest streaming data from Amazon Managed Streaming for Apache Kafka into Amazon Redshift and analyze Amazon Aurora data with Amazon Redshift in near real time.

Different user skillsets require specific capabilities

Users across the business have varying levels of technical skill, and many cannot work with data without the right tools.

AWS provides skill-specific interfaces and job-authoring tools for all user types, from developers to business users. AWS Glue Studio automatically generates ETL code, and lets ETL developers and business analysts transform data with a no-code interface. AWS Glue also lets developers and engineers use their preferred IDE, notebook, and processing engines. Amazon Managed Workflows for Apache Airflow enables scientists and engineers to orchestrate end-to-end data pipelines.

Data integration workloads are diverse

AWS supports a variety of workloads with no lock-in.

AWS Glue Studio helps you author highly scalable ETL jobs without becoming an Apache Spark expert, and load structured and unstructured data into data warehouses and data lakes. Amazon Managed Streaming for Apache Kafka (MSK) and Amazon Kinesis make it easy to ingest and process streaming data in real time. Other common workloads include batch data transformation, database replication, data ingestion from SaaS, data sharing among teams, and subscription to third-party data.
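Micro-batch processing sits between batch and streaming: events arrive continuously but are processed in small groups. The following Python sketch shows the idea using only the standard library. The event stream and the `micro_batches` helper are hypothetical; a real stream would come from a streaming service rather than a generator.

```python
from itertools import islice


def micro_batches(events, batch_size):
    """Yield lists of up to batch_size events from any iterable."""
    iterator = iter(events)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


# Hypothetical event stream of seven events with values 0, 10, ..., 60.
events = ({"seq": n, "value": n * 10} for n in range(7))

# Process each micro-batch as a unit; here, sum the values per batch.
batch_totals = [
    sum(event["value"] for event in batch)
    for batch in micro_batches(events, 3)
]
```

With a batch size of 3, the seven events are processed as two full batches and one partial batch.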
