Posted On: Mar 30, 2021
AWS Glue is a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning, and application development. Using AWS Glue Workflows, you can orchestrate and execute a complex multi-job, multi-crawler data-integration workflow. AWS Glue custom blueprints make it easy for data engineers to create repeatable AWS Glue workflows.
Before creating an AWS Glue blueprint, you first identify a repeatable data integration workflow. For example, suppose you have an ETL workflow that converts CSV data in your raw S3 bucket to Parquet format in your production S3 bucket, and you want to run it multiple times in different AWS accounts. Instead of creating one workflow for each ETL process, you can create and register an AWS Glue blueprint that accepts the S3 bucket as an input parameter. A data analyst then only needs to provide input parameters (e.g., data sources and targets) to create new data integration workflows.
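To make the idea of a parameterized workflow concrete, here is a minimal plain-Python sketch. It is not the AWS Glue blueprint API: the names `BlueprintParams` and `generate_workflow`, the job argument keys, and the bucket paths are all hypothetical and chosen only for illustration. It shows how one template can expand a small set of inputs (source and target S3 paths) into a separate workflow description for each run.

```python
import json
from dataclasses import dataclass


@dataclass(frozen=True)
class BlueprintParams:
    """Inputs a user supplies for each run (hypothetical names)."""
    workflow_name: str
    source_s3_path: str  # location of the raw CSV data
    target_s3_path: str  # location for the Parquet output


def generate_workflow(params: BlueprintParams) -> dict:
    """Expand one parameter set into a workflow description (a plain dict).

    Illustrative stand-in only; a real blueprint would create Glue jobs,
    crawlers, and triggers rather than return a dictionary.
    """
    for path in (params.source_s3_path, params.target_s3_path):
        if not path.startswith("s3://"):
            raise ValueError(f"expected an s3:// path, got {path!r}")
    return {
        "name": params.workflow_name,
        "jobs": [
            {
                "name": f"{params.workflow_name}-csv-to-parquet",
                "arguments": {
                    "--source": params.source_s3_path,
                    "--target": params.target_s3_path,
                    "--output_format": "parquet",
                },
            }
        ],
    }


if __name__ == "__main__":
    # The same template reused with different inputs (made-up buckets).
    runs = [
        BlueprintParams("sales-etl", "s3://raw-bucket/sales/", "s3://prod-bucket/sales/"),
        BlueprintParams("orders-etl", "s3://raw-bucket/orders/", "s3://prod-bucket/orders/"),
    ]
    for run in runs:
        print(json.dumps(generate_workflow(run), indent=2))
```

The design point is that the workflow logic is written once and only the inputs vary from run to run, which is the same separation a blueprint provides between the data engineer who authors it and the analyst who supplies parameters.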