Introducing AWS Glue: A Simple, Flexible, and Cost-Effective Extract, Transform, and Load (ETL) Service

Posted on: Aug 14, 2017

Today we announced the general availability of AWS Glue, a new, fully managed extract, transform, and load (ETL) service that makes it easy for customers to prepare and load their data for analytics. With a few clicks in the AWS Management Console, customers can create and run an ETL job. You simply point AWS Glue to your data stored on AWS, and AWS Glue discovers your data and stores the associated metadata (e.g., table definition and schema) in the AWS Glue Data Catalog. Once cataloged, your data is immediately searchable, queryable, and available for ETL. AWS Glue also generates the code that executes your data transformations and loading processes.
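To make the idea of "discovering" data and recording a table definition more concrete, here is a minimal, purely illustrative sketch in plain Python. It is not AWS Glue's API or its actual discovery logic: the function names `infer_type` and `infer_table_definition`, the type names, and the shape of the returned dictionary are all assumptions chosen for this example.

```python
import csv
import io


def infer_type(values):
    """Return the narrowest type name that fits every non-empty value.

    Hypothetical helper: the type names are illustrative only.
    """
    non_empty = [v for v in values if v not in ("", None)]
    if not non_empty:
        return "string"  # nothing to infer from, fall back to string

    def fits(cast):
        try:
            for v in non_empty:
                cast(v)
            return True
        except ValueError:
            return False

    if fits(int):
        return "bigint"
    if fits(float):
        return "double"
    return "string"


def infer_table_definition(name, csv_text):
    """Build a catalog-style table definition from CSV text.

    Hypothetical function: the dictionary layout is an assumption,
    not the format used by the AWS Glue Data Catalog.
    """
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = list(reader)
    columns = [
        {"Name": col, "Type": infer_type([row[col] for row in rows])}
        for col in (reader.fieldnames or [])
    ]
    return {"Name": name, "Columns": columns}


# Made-up sample data for illustration.
SAMPLE_CSV = "order_id,amount,region\n1,19.99,us-east-1\n2,5,us-east-1\n"
table = infer_table_definition("orders", SAMPLE_CSV)
```

In this sketch, `table` holds a name plus one entry per column with an inferred type, which is the kind of metadata a catalog needs before data can be searched or queried.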

AWS Glue generates Python code that is entirely customizable, reusable, and portable. Once your ETL job is ready, you can schedule it to run on AWS Glue’s fully managed, scale-out Apache Spark environment. AWS Glue is serverless, so there is no infrastructure to provision, set up, or manage. It automatically provisions the environment needed to complete the job, and customers pay only for the compute resources consumed while running the jobs. AWS Glue provides a flexible scheduler to manage your ETL jobs, with dependency resolution, job monitoring, and alerting. With AWS Glue, data can be available for analytics in minutes.

AWS Glue is available in the US East (N. Virginia) region and will expand to additional regions in the coming months. To learn more, please visit https://aws.amazon.com/glue/.