How can I automatically start an AWS Glue job when a crawler run completes?
Last updated: 2022-03-24
I want to configure AWS Glue to automatically start a job when a crawler run completes.
Resolution
You can use AWS Glue triggers to start a job when a crawler run completes. However, the AWS Glue console supports only jobs and doesn't support crawlers when working with triggers. You can use the AWS Command Line Interface (AWS CLI) or AWS Glue API to configure triggers for both jobs and crawlers.
Run the following AWS CLI command to create a trigger that can start a job when a crawler run completes:
$ aws glue create-trigger --name testTrigger --type CONDITIONAL --predicate 'Logical=AND,Conditions=[{LogicalOperator=EQUALS,CrawlerName=testCrawler,CrawlState=SUCCEEDED}]' --actions JobName=testJob --start-on-creation
Note: If you receive errors when running AWS CLI commands, make sure that you’re using the most recent version of the AWS CLI.
You can also create the trigger using the AWS SDK for Python (Boto3):
import boto3

client = boto3.client("glue")

response = client.create_trigger(
    Name="testTrigger",
    Type="CONDITIONAL",
    Predicate={
        "Logical": "AND",
        "Conditions": [
            {
                "LogicalOperator": "EQUALS",
                "CrawlerName": "testCrawler",
                "CrawlState": "SUCCEEDED",
            },
        ],
    },
    Actions=[
        {"JobName": "testJob"},
    ],
    StartOnCreation=True,
)
With either of the preceding approaches, you can create the trigger testTrigger that can start the job testJob after the crawler testCrawler runs successfully.
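After you create the trigger, it can help to confirm that it is active before you rely on it. The following is a minimal sketch, assuming a Boto3 Glue client is passed in; the helper name trigger_is_active is hypothetical and not part of any AWS API. It reads the trigger with get_trigger and checks whether its State is ACTIVATED, which is the state expected after creation with --start-on-creation.

```python
def trigger_is_active(glue, trigger_name="testTrigger"):
    """Return True if the named trigger reports the ACTIVATED state.

    `glue` is expected to be a Boto3 Glue client, for example
    boto3.client("glue"). The helper name is illustrative only.
    """
    # get_trigger returns the trigger definition under the "Trigger" key.
    trigger = glue.get_trigger(Name=trigger_name)["Trigger"]
    return trigger.get("State") == "ACTIVATED"


# Example usage (requires AWS credentials and an existing trigger):
#   import boto3
#   print(trigger_is_active(boto3.client("glue")))
```

If the helper returns False, the trigger might still be activating or might have been created without being started.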
Note: The crawler testCrawler must be started by a trigger. If you start the crawler manually, then the trigger doesn't start the job. In AWS Glue, dependent jobs or crawlers are started only if the job or crawler that completed was itself started by a trigger. Be sure that all jobs or crawlers in a dependency chain are descendants of a scheduled or on-demand trigger.
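One way to satisfy the requirement in the preceding note is to start the crawler from an on-demand trigger instead of starting it directly. The following is a minimal sketch under that assumption; the function name and the trigger name startTestCrawler are hypothetical. It creates an on-demand trigger whose only action is the crawler, and then fires that trigger.

```python
def start_crawler_via_trigger(glue, crawler_name="testCrawler",
                              trigger_name="startTestCrawler"):
    """Create an on-demand trigger for the crawler and fire it.

    `glue` is expected to be a Boto3 Glue client. The trigger name is a
    placeholder chosen for this example.
    """
    # An ON_DEMAND trigger has no predicate; it runs its actions when started.
    glue.create_trigger(
        Name=trigger_name,
        Type="ON_DEMAND",
        Actions=[{"CrawlerName": crawler_name}],
    )
    # Starting the trigger starts the crawler, so the crawler run
    # counts as trigger-started for the conditional trigger.
    return glue.start_trigger(Name=trigger_name)


# Example usage (requires AWS credentials):
#   import boto3
#   start_crawler_via_trigger(boto3.client("glue"))
```

Because create_trigger fails if a trigger with the same name already exists, later runs would call only start_trigger on the existing trigger.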
Additionally, you can use one of the following methods:
- Create an AWS Lambda function and an Amazon EventBridge rule. When you choose this option, the Lambda function is always on. It monitors the crawler regardless of where or when you start it. You can also modify this method to automate other AWS Glue functions. For more information, see How can I use a Lambda function to automatically start an AWS Glue job when a crawler run completes?
- Use AWS Glue workflows. This method requires you to start the crawler from the Workflows page on the AWS Glue console. For more information, see How can I use AWS Glue workflows to automatically start a job when a crawler run completes?
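To illustrate the Lambda and EventBridge option, the following is a rough sketch of a handler, not the implementation from the linked article. It assumes that the EventBridge rule matches events with the source aws.glue and the detail type "Glue Crawler State Change", and that the event detail carries crawlerName and state fields. The optional glue parameter is an addition for this example so that the handler can be exercised without AWS access.

```python
CRAWLER_NAME = "testCrawler"  # crawler to watch (name taken from this article)
JOB_NAME = "testJob"          # job to start (name taken from this article)

# Assumed EventBridge rule pattern; verify it against the events you receive.
EVENT_PATTERN = {
    "source": ["aws.glue"],
    "detail-type": ["Glue Crawler State Change"],
    "detail": {"crawlerName": [CRAWLER_NAME], "state": ["Succeeded"]},
}


def lambda_handler(event, context, glue=None):
    """Start JOB_NAME when CRAWLER_NAME reports a successful run."""
    detail = event.get("detail", {})
    # Ignore events for other crawlers or for unsuccessful runs.
    if detail.get("crawlerName") != CRAWLER_NAME:
        return None
    if detail.get("state") != "Succeeded":
        return None
    if glue is None:
        # Imported lazily so the module loads even where boto3 is absent.
        import boto3
        glue = boto3.client("glue")
    run = glue.start_job_run(JobName=JOB_NAME)
    return run["JobRunId"]
```

Unlike the conditional trigger, this approach reacts to the crawler's state change event, so it doesn't depend on how the crawler was started.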