1 million objects stored free with the AWS Free Tier

AWS Glue

Discover, prepare, and integrate all your data at any scale

Get started with AWS Glue

Learn more about AWS Data integration

Why AWS Glue?

Preparing your data to obtain quality results is the first step in an analytics or ML project. AWS Glue is a serverless data integration service that makes data preparation simpler, faster, and cheaper. You can discover and connect to over 70 diverse data sources, manage your data in a centralized data catalog, and visually create, run, and monitor ETL pipelines to load data into your data lakes.

Benefits of AWS Glue

Support all workloads

Flexible support for ETL, ELT, batch, streaming, and more, with no lock-in

Scale on demand

Petabyte scale, pay-as-you-go billing, any data size

Tailored tools

Support all data users, from developers to business users

All in one

Complete data integration capabilities in one serverless service

How it works

AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.

Data integration engine options
Choose your preferred data integration engine in AWS Glue to support your users and workloads.

The diagram shows how AWS Glue users can choose from interface options to create job workloads using multiple data integration engines. Four sections display: one on the left, two in the middle, and one on the right.

The first section on the left is called “Data sources.” It includes the following data sources: “Amazon S3,” “Amazon DynamoDB,” “Databases running on Amazon EC2,” “Databases,” and “SaaS.”

From the first section, there is an arrow pointing to the middle section at the top of the diagram called "Choice of interfaces." Three items are included in this second section: “AWS Glue Studio,” “Amazon SageMaker notebooks,” and “Notebooks and IDEs.”

Below the second section is text that says, "Open interfaces support interactive and job workloads." One arrow points from this text up to the second section, and another points down to the third section.

This third section is called "Data integration engines." The text says, "Choose a preferred serverless, scalable data processing engine with automatic scaling and pay-as-you-go pricing." This section includes three engine names: “AWS Glue for Ray,” “AWS Glue for Python Shell,” and “AWS Glue for Apache Spark.”

The fourth section appears to the right of the second section with an arrow pointing from the second section to the fourth section. The fourth section says, "Create and load data into data lakes and data warehouses." This section also includes three items: “Amazon Redshift,” “Data lakes,” and “Data warehouses.”

Event-driven ETL
AWS Glue can run your extract, transform, and load (ETL) jobs as new data arrives. For example, you can configure AWS Glue to start your ETL jobs as soon as new data becomes available in Amazon Simple Storage Service (Amazon S3).
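
As a rough illustration of the event-driven pattern described above, the sketch below starts one job run per newly arrived object. It assumes one common setup that this page does not spell out: an AWS Lambda function subscribed to Amazon S3 event notifications that calls the boto3 Glue client's `start_job_run`. The function name `start_jobs_for_new_objects` and the `--source_path` job argument are hypothetical names chosen for the example.

```python
from urllib.parse import unquote_plus


def start_jobs_for_new_objects(event, glue_client, job_name):
    """Start one Glue job run per object listed in an S3 event notification.

    `glue_client` is expected to behave like boto3.client("glue").
    """
    run_ids = []
    for record in event.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys arrive URL-encoded in S3 event notifications.
        key = unquote_plus(record["s3"]["object"]["key"])
        response = glue_client.start_job_run(
            JobName=job_name,
            # "--source_path" is a hypothetical argument the job script would read.
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        run_ids.append(response["JobRunId"])
    return run_ids
```

Inside a Lambda handler, this could be called as `start_jobs_for_new_objects(event, boto3.client("glue"), "my-etl-job")`, where the job name is again a placeholder. Passing the client in as a parameter keeps the function easy to exercise with a stub.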
AWS Glue Data Catalog
You can use the Data Catalog to quickly discover and search multiple AWS datasets without moving the data. Once the data is cataloged, it is immediately available for search and query using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.
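
To make the discovery step more concrete, here is a minimal sketch that condenses Data Catalog table metadata into a name-to-schema map. It assumes the response shape returned by the boto3 Glue `get_tables` call (a `TableList` whose entries carry `Name` and an optional `StorageDescriptor` with `Location` and `Columns`). The helper name `summarize_tables` is hypothetical.

```python
def summarize_tables(pages):
    """Map each table name to its storage location and column names.

    `pages` is an iterable of get_tables response pages (dicts).
    """
    summary = {}
    for page in pages:
        for table in page.get("TableList", []):
            # Some catalog entries may not carry a storage descriptor.
            descriptor = table.get("StorageDescriptor", {})
            summary[table["Name"]] = {
                "location": descriptor.get("Location"),
                "columns": [col["Name"] for col in descriptor.get("Columns", [])],
            }
    return summary
```

With boto3 available, the pages could come from `boto3.client("glue").get_paginator("get_tables").paginate(DatabaseName="sales_db")`, where `sales_db` is a placeholder database name.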
No-code ETL jobs
AWS Glue Studio makes it easier to visually create, run, and monitor AWS Glue ETL jobs. You can build ETL jobs that move and transform data using a drag-and-drop editor, and AWS Glue automatically generates the code.
Manage and monitor data quality
AWS Glue Data Quality automates data quality rule creation, management, and monitoring to help ensure high-quality data across your data lakes and pipelines.

The diagram shows how AWS Glue Data Quality can be used to create rule recommendations, monitor data quality, and send alerts when data quality deteriorates. Three sections display from left to right.

The first section has an illustration of AWS Glue Data Catalog and AWS Glue ETL. Under AWS Glue Data Catalog, it says, “Catalog all datasets in your data lakes.” Under AWS Glue ETL, it says, “Integrate and transform data from disparate data sources.”

The second section is titled "AWS Glue Data Quality." There are three icons in this section. The first is a checklist. Underneath it, it says, “Data quality rule recommendations. Get started quickly with automatic data quality rule recommendations.” The second icon is a pencil. Underneath it, it says, “Preconfigured data quality rules. Edit or augment recommendations with preconfigured data quality rules.” The third icon is a bell. Underneath it, it says, “Alerts and actions. Add alerts and actions to perform when data quality deteriorates.”

The third section has two icons stacked. The first icon is a bar chart. Underneath it, it says, “Metrics. Use data quality metrics to make confident business decisions.” The second icon is a warning sign. Underneath it, it says, “Alerts. Use alerts to get notified when quality deteriorates, and take actions to fix the data.”
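
The rule, metric, and alert flow in the diagram can be illustrated with a small plain-Python sketch. This is not the AWS Glue Data Quality API or its rule language. It only shows, under simplified assumptions, what a completeness rule measures and how a threshold turns a metric into an alert. The function names and the threshold format are hypothetical.

```python
def completeness(rows, column):
    """Return the fraction of rows where `column` is present and not None."""
    if not rows:
        return 1.0  # An empty dataset has no missing values to report.
    filled = sum(1 for row in rows if row.get(column) is not None)
    return filled / len(rows)


def failed_rules(rows, thresholds):
    """Return the columns whose completeness falls below the required minimum."""
    return [
        column
        for column, minimum in thresholds.items()
        if completeness(rows, column) < minimum
    ]
```

For example, calling `failed_rules(rows, {"id": 1.0, "email": 0.9})` returns the columns that would trigger an alert in this simplified model.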

Data preparation
With AWS Glue DataBrew, you can explore and experiment with data directly from your data lakes, data warehouses, and databases, including Amazon S3, Amazon Redshift, AWS Lake Formation, Amazon Aurora, and Amazon Relational Database Service (Amazon RDS). You can choose from over 250 prebuilt transformations in DataBrew to automate data preparation tasks such as filtering anomalies, standardizing formats, and correcting invalid values.

Additionally, AWS Glue Studio offers a data preparation tool that allows you to prepare data with an interactive, point-and-click visual interface without writing code.

Use cases

Simplify ETL pipeline development

Remove infrastructure management with automatic provisioning and worker management, and consolidate all your data integration needs into a single service.

Learn more about AWS Glue Auto Scaling

Interactively explore, experiment on, and process data

Using AWS Glue interactive sessions, data engineers can interactively explore and prepare data using the integrated development environment (IDE) or notebook of their choice.

Learn more about AWS Glue Interactive Sessions

Discover data efficiently

Quickly identify data across AWS, on premises, and other clouds, and then make it instantly available for querying and transforming.

Learn more about AWS Glue Data Catalog

Support various processing frameworks and workloads

More easily support various data processing frameworks, such as ETL and ELT, and various workloads, including batch, micro-batch, and streaming.

Learn more about streaming ETL jobs

Get started with AWS Glue

Try AWS Glue at no cost

Build with AWS Glue

Integrate your data

Explore the developer guide
