Why AWS Glue?
Preparing your data to obtain quality results is the first step in an analytics or ML project. AWS Glue is a serverless data integration service that makes data preparation simpler, faster, and cheaper. You can discover and connect to over 70 diverse data sources, manage your data in a centralized data catalog, and visually create, run, and monitor ETL pipelines to load data into your data lakes. With built-in generative AI capabilities, you can modernize Spark jobs and develop faster with intelligent assistance for ETL authoring and Spark troubleshooting.

Benefits of AWS Glue

Flexible support for ETL, ELT, batch, streaming, and more, with no lock-in
Petabyte scale, pay-as-you-go billing, any data size
Support all data users from developers to business users
Get AI-powered help throughout your data integration journey, from automatically generating ETL code to modernizing your Spark jobs. AWS Glue provides intelligent code generation, AI-assisted Spark upgrades (preview), and built-in Spark troubleshooting (preview).
Complete data integration capabilities in one serverless service

How it works

AWS Glue is a serverless data integration service that makes it easier to discover, prepare, move, and integrate data from multiple sources for analytics, machine learning (ML), and application development.

  • Data integration engine options
    Choose your preferred data integration engine in AWS Glue to support your users and workloads.

  • Event-driven ETL
    AWS Glue can run your extract, transform, and load (ETL) jobs as new data arrives. For example, you can configure AWS Glue to initiate your ETL jobs to run as soon as new data becomes available in Amazon Simple Storage Service (S3).

  • AWS Glue Data Catalog
    You can use the Data Catalog to quickly discover and search multiple AWS datasets without moving the data. Once the data is cataloged, it is immediately available for search and query using Amazon Athena, Amazon EMR, and Amazon Redshift Spectrum.

  • No-code ETL jobs
    AWS Glue Studio makes it easier to visually create, run, and monitor AWS Glue ETL jobs. You can build ETL jobs that move and transform data using a drag-and-drop editor, and AWS Glue automatically generates the code.

  • Manage and monitor data quality
    AWS Glue Data Quality automates data quality rule creation, management, and monitoring to help ensure high-quality data across your data lakes and pipelines.

  • Data preparation
    With AWS Glue DataBrew, you can explore and experiment with data directly from your data lake, data warehouses, and databases, including Amazon S3, Amazon Redshift, AWS Lake Formation, Amazon Aurora, and Amazon Relational Database Service (RDS). You can choose from over 250 prebuilt transformations in DataBrew to automate data preparation tasks such as filtering anomalies, standardizing formats, and correcting invalid values.

    Additionally, AWS Glue Studio offers a data preparation tool that allows you to prepare data with an interactive, point-and-click visual interface without writing code.

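
As an illustration of the event-driven ETL pattern described above, the following is a minimal sketch of code that could run in response to an S3 event notification (for example, inside an AWS Lambda function) and start a Glue job run for each new object. It assumes `glue_client` is a boto3 Glue client (`boto3.client("glue")`). The job name `example-etl-job` and the `--source_path` job argument are hypothetical placeholders, not names defined by AWS Glue.

```python
from urllib.parse import unquote_plus


def s3_objects_from_event(event):
    """Return (bucket, key) pairs found in an S3 event notification payload."""
    pairs = []
    for record in event.get("Records", []):
        s3 = record.get("s3", {})
        bucket = s3.get("bucket", {}).get("name")
        key = s3.get("object", {}).get("key")
        if bucket and key:
            # Object keys arrive URL-encoded in S3 event notifications.
            pairs.append((bucket, unquote_plus(key)))
    return pairs


def start_jobs_for_event(glue_client, event, job_name="example-etl-job"):
    """Start one Glue job run per new object and return the run IDs.

    job_name and the "--source_path" argument are hypothetical; use the
    name and parameters of your own job.
    """
    run_ids = []
    for bucket, key in s3_objects_from_event(event):
        response = glue_client.start_job_run(
            JobName=job_name,
            # Job arguments are passed as a dict of "--name": "value" strings.
            Arguments={"--source_path": f"s3://{bucket}/{key}"},
        )
        run_ids.append(response["JobRunId"])
    return run_ids
```

Passing the client in as a parameter keeps the sketch testable without AWS credentials. A real deployment would also need IAM permissions and error handling, both omitted here.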

Use Cases

Simplify ETL pipeline development

Remove infrastructure management with automatic provisioning and worker management, and consolidate all your data integration needs into a single service.

Interactively explore, experiment on, and process data

Using AWS Glue interactive sessions, data engineers can interactively explore and prepare data using the integrated development environment (IDE) or notebook of their choice.

Discover data efficiently

Quickly identify data across AWS, on premises, and other clouds, and then make it instantly available for querying and transforming.
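
As a rough sketch of what programmatic discovery can look like, the function below lists the tables registered in one Data Catalog database together with their storage locations. It assumes `glue_client` is a boto3 Glue client (`boto3.client("glue")`). The database name in the usage note is a hypothetical placeholder.

```python
def list_catalog_tables(glue_client, database_name):
    """Return (table name, storage location) pairs for one catalog database."""
    tables = []
    # get_tables is paginated, so iterate over every page of results.
    paginator = glue_client.get_paginator("get_tables")
    for page in paginator.paginate(DatabaseName=database_name):
        for table in page.get("TableList", []):
            # Location is None for tables without a storage descriptor location.
            location = table.get("StorageDescriptor", {}).get("Location")
            tables.append((table["Name"], location))
    return tables
```

For example, `list_catalog_tables(boto3.client("glue"), "sales_db")` would return the tables in a database named `sales_db`, assuming such a database exists in your account.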

Support various processing frameworks and workloads

More easily support various data processing frameworks, such as ETL and ELT, and various workloads, including batch, micro-batch, and streaming.
