How can I use AWS Glue workflows to automatically start a job when a crawler run completes?

Last updated: 2020-03-13

I want to use AWS Glue workflows to automatically start a job when a crawler run completes. How can I do that?

Short Description

To start a job when a crawler run completes, create an AWS Glue workflow and two triggers: one that starts the crawler and one that starts the job after the crawler finishes. With this method, the job starts only when the crawler is run through the workflow, for example from the Workflows page on the AWS Glue console.

Note: You can also use an AWS Lambda function and an Amazon CloudWatch Events rule to automate job runs. When you choose this option, the rule and the function stay active and respond to the crawler's completion regardless of where or when you start the crawler. For more information, see How can I use a Lambda function to automatically start an AWS Glue job when a crawler run completes?

Resolution

Before completing the following steps, be sure that you have:

  • An AWS Glue extract, transform, and load (ETL) job.
  • An AWS Glue crawler.
  • An AWS Identity and Access Management (IAM) role for AWS Glue that has the AWSGlueServiceRole policy attached to it.

Create the workflow

  1. Open the AWS Glue console.
  2. In the navigation pane, choose Workflows, and then choose Add workflow.
  3. Enter a name for the workflow, and then choose Add workflow. The new workflow appears in the list on the Workflows page.
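
If you prefer to script this step, the following is a minimal sketch that assumes a boto3 AWS Glue client is passed in as `glue`. The helper name `create_workflow` and the example workflow name are illustrative and not part of the console procedure.

```python
def create_workflow(glue, name, description=None):
    """Create an AWS Glue workflow and return its name.

    `glue` is assumed to be a boto3 Glue client, for example
    boto3.client("glue"). The helper itself is hypothetical.
    """
    params = {"Name": name}
    if description:
        # Description is optional, so pass it only when provided.
        params["Description"] = description
    response = glue.create_workflow(**params)
    return response["Name"]


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   create_workflow(boto3.client("glue"), "crawler-to-job-workflow")
```

The client is passed in as an argument so that the helper can be exercised with a stub before it is pointed at a real account.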

Create the trigger for the crawler

  1. On the Workflows page, select your new workflow, and then choose the Graph tab.
  2. Choose Add trigger, and then choose the Add new tab. For Trigger type, choose On demand.
  3. Choose Add. The trigger appears on the graph.
  4. On the graph, choose Add node.
  5. On the Crawlers tab, select your crawler, and then choose Add.
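
The same on-demand trigger can be sketched as a request for the AWS Glue `CreateTrigger` API. The function below only builds the parameters. Its name and all resource names in the usage comment are assumptions for illustration.

```python
def on_demand_crawler_trigger(workflow_name, trigger_name, crawler_name):
    """Build CreateTrigger parameters for an on-demand trigger that
    starts a crawler as the first node of a workflow.

    All three names are placeholders supplied by the caller.
    """
    return {
        "Name": trigger_name,
        "WorkflowName": workflow_name,
        # ON_DEMAND triggers fire only when the workflow is started.
        "Type": "ON_DEMAND",
        "Actions": [{"CrawlerName": crawler_name}],
    }


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   params = on_demand_crawler_trigger(
#       "crawler-to-job-workflow", "start-crawler", "my-crawler")
#   boto3.client("glue").create_trigger(**params)
```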

Create the trigger for the AWS Glue job

  1. On the Actions menu above the graph, choose Add trigger.
  2. Choose the Add new tab, and then select the following options:
       • For Trigger type, choose Event.
       • For Trigger logic, choose Start after ALL watched event.
  3. Choose Add. The trigger appears on the graph.
  4. On the graph, to the left of the job trigger that you just created, choose Add node.
  5. On the Crawlers tab, select your crawler, and then choose Add. The crawler appears on the graph as the event that the trigger watches.
  6. On the graph, to the right of the job trigger that you just created, choose Add node.
  7. On the Jobs tab, select the job that you want to start when the crawler run completes, and then choose Add.
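
As a rough API equivalent of the steps above, the sketch below builds `CreateTrigger` parameters for a trigger that watches the crawler and starts the job. It assumes that the console's Event trigger type corresponds to the `CONDITIONAL` type in the API and that "Start after ALL watched event" corresponds to the `AND` logical operator. It also assumes that the job should start only when the crawler succeeds. The function name and resource names are hypothetical.

```python
def job_after_crawler_trigger(workflow_name, trigger_name, crawler_name, job_name):
    """Build CreateTrigger parameters for a trigger that starts a job
    after a crawler in the same workflow finishes successfully.
    """
    return {
        "Name": trigger_name,
        "WorkflowName": workflow_name,
        # Assumption: the console's "Event" type maps to CONDITIONAL.
        "Type": "CONDITIONAL",
        # Activate the trigger as soon as it is created.
        "StartOnCreation": True,
        "Predicate": {
            # Assumption: "Start after ALL watched event" maps to AND.
            "Logical": "AND",
            "Conditions": [
                {
                    "LogicalOperator": "EQUALS",
                    "CrawlerName": crawler_name,
                    # Assumption: fire only on a successful crawl.
                    "CrawlState": "SUCCEEDED",
                }
            ],
        },
        "Actions": [{"JobName": job_name}],
    }


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   params = job_after_crawler_trigger(
#       "crawler-to-job-workflow", "start-job", "my-crawler", "my-etl-job")
#   boto3.client("glue").create_trigger(**params)
```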

Test the workflow

  1. On the Actions menu, next to the Add workflow button, choose Run. The Last run status column changes to Running.
  2. Check the Graph tab to see the status of the workflow. Or, open your corresponding crawler or job to confirm that it's running.
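
The test can also be scripted. The following sketch starts a workflow run and polls its status until the run leaves the running state. It assumes a boto3 AWS Glue client is passed in as `glue`. The helper name, polling interval, and poll limit are illustrative choices.

```python
import time

# Statuses after which a workflow run no longer changes.
TERMINAL_STATUSES = {"COMPLETED", "STOPPED", "ERROR"}


def run_workflow_and_wait(glue, workflow_name, poll_seconds=30, max_polls=60):
    """Start a workflow run and poll until it reaches a terminal status.

    Returns (run_id, last_status). If the poll limit is reached first,
    last_status is whatever the final poll reported.
    """
    run_id = glue.start_workflow_run(Name=workflow_name)["RunId"]
    status = "RUNNING"
    for _ in range(max_polls):
        run = glue.get_workflow_run(Name=workflow_name, RunId=run_id)["Run"]
        status = run["Status"]
        if status in TERMINAL_STATUSES:
            break
        time.sleep(poll_seconds)
    return run_id, status


# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   run_id, status = run_workflow_and_wait(
#       boto3.client("glue"), "crawler-to-job-workflow")
#   print(run_id, status)
```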
