Why AWS Glue?

Preparing your data to obtain quality results is the first step in an analytics or AI project. AWS Glue is a serverless service that makes data integration simpler, faster, and cheaper. You can discover and connect to more than 100 diverse data sources, manage your data in a centralized data catalog, and visually create, run, and monitor data pipelines to load data into your data lakes, data warehouses, and lakehouses. With built-in generative AI capabilities, you can modernize Apache Spark jobs and develop faster with intelligent assistance for ETL authoring and Spark troubleshooting.

Integrate your data with AWS Glue in the next generation of Amazon SageMaker

With AWS Glue in the next generation of Amazon SageMaker, you can manage and build your workloads in one place with cost-effective, serverless, and scalable data integration.

Benefits

Simplify ETL pipeline management

Eliminate infrastructure management with automatic provisioning and worker management, and consolidate all your data integration needs into a single service. Learn more about AWS Glue Auto Scaling.

Interactively explore, experiment on, and process data

With AWS Glue interactive sessions, data engineers can explore and prepare data interactively in the integrated development environment (IDE) or notebook of their choice. Learn more about AWS Glue Interactive Sessions.

Discover data efficiently

Quickly identify data across AWS, on premises, and other clouds, and make it available for querying and transforming right away. Learn more about AWS Glue Data Catalog.

Support various processing frameworks and workloads

Support a range of data processing patterns, such as ETL and ELT, and a range of workloads, including batch, micro-batch, and streaming. Learn more about streaming ETL jobs.

What's New

2025-05-15

AWS Glue Studio now supports additional file types and single file output

Today, AWS Glue Studio announces support for additional compressed file types, Excel files (as a source), and XML and Tableau Hyper files (as targets). We are also introducing the option to select the number of output files for an S3 target. These enhancements let you use visual ETL jobs for data processing workflows that were previously unsupported, for example, loading data from an Excel file into a single XML output file.

You can now produce a single output file from your Glue job or specify a custom number of output files. Glue also supports Excel files in S3 file source nodes, and XML or Tableau Hyper files in S3 file target nodes. The newly available compression types are LZ4, SNAPPY, DEFLATE, LZO, BROTLI, ZSTD, and ZLIB.

These new features are now available in all AWS commercial Regions and AWS GovCloud (US) Regions where AWS Glue is available. Access the AWS Regional Services List for the most up-to-date availability information.

To learn more, visit the AWS Glue documentation.

2025-04-23

Amazon Redshift adds history mode support to 8 third-party SaaS applications

Amazon Redshift now supports history mode for zero-ETL integrations with eight third-party applications, including Salesforce, ServiceNow, and SAP. This addition complements existing history mode support for Amazon Aurora PostgreSQL-Compatible Edition, Amazon Aurora MySQL-Compatible Edition, Amazon DynamoDB, and Amazon RDS for MySQL. The expansion lets you track historical data changes without Extract, Transform, and Load (ETL) processes, simplifying data management across AWS and third-party applications.

History mode for zero-ETL integrations with third-party applications lets customers run advanced analytics on historical application data, build comprehensive lookback reports, and perform trend analysis and data auditing across multiple zero-ETL data sources. The feature preserves the complete history of data changes without requiring duplicate copies of data from each external source, which helps organizations meet data retention requirements while reducing storage needs and operational costs. History mode is available for both existing and new integrations, and you can enable historical tracking selectively for specific tables within a third-party application integration, giving you precise control over your data analysis and storage strategies.

To learn more about history mode for zero-ETL integrations in Amazon Redshift and how it can benefit your data analytics workflows, visit the history mode documentation. To learn more about the supported third-party applications, visit the AWS Glue documentation. To get started with zero-ETL integrations, visit the getting started guides for Amazon Redshift.