How can I automatically start an AWS Glue job when a crawler run completes?

Last updated: 2019-01-16

I want to configure AWS Glue to automatically start a job when a crawler run completes. How can I do that?

Short Description

It's not possible to use AWS Glue triggers to start a job when a crawler run completes. Instead, create an AWS Lambda function and an Amazon CloudWatch Events rule.

Resolution

Before completing the following steps, be sure that you have:

  • An AWS Glue extract, transform, and load (ETL) job.
  • An AWS Glue crawler.
  • An AWS Identity and Access Management (IAM) role for Lambda with the permission to run AWS Glue jobs. For example, set up a service role (execution role) for Lambda that has the AWSGlueServiceRole policy attached to it.
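
If the AWSGlueServiceRole managed policy grants more than you want this function to have, a narrower inline policy is one option. The sketch below builds such a policy document in Python. It assumes that starting a job run is the only AWS Glue action the function needs, and the "*" resource is a placeholder that you would normally narrow to the ARN of your job. The CloudWatch Logs permissions that a Lambda role usually carries are not shown.

```python
import json

# Assumption: the Lambda function only needs to start AWS Glue job runs.
policy_document = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["glue:StartJobRun"],
            # Placeholder: restrict this to your job's ARN where possible.
            "Resource": "*",
        }
    ],
}

# Serialize the policy so it can be pasted into the IAM console or passed to an API call.
policy_json = json.dumps(policy_document, indent=2)
print(policy_json)
```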

Create the Lambda function

1.     Open the Lambda console.

2.     Choose Create function.
Note: If you have no Lambda functions, then the Get started page appears. Choose Create a function and then continue to the next step.

3.     Be sure that Author from scratch is selected, and then configure the following options:
  • For Name, enter a name for your function.
  • For Runtime, choose Python 2.7, Python 3.6, or Python 3.7.
  • For Role, choose Choose an existing role.
  • For Existing role, select an IAM role that has permission to run AWS Glue jobs.

4.     Choose Create function.

5.     Paste the following code into the Function code section. Be sure to replace MyTestJob with the name of your AWS Glue ETL job.

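import boto3

# Create the client once, outside the handler, so that warm invocations reuse it.
client = boto3.client('glue')

def lambda_handler(event, context):
    # Start the AWS Glue ETL job. Replace MyTestJob with the name of your job.
    response = client.start_job_run(JobName='MyTestJob')
    return response['JobRunId']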

6.     In the top-right corner of the page, choose Save, and then choose Test.

7.     Open the AWS Glue console and confirm that the job started.
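
The function in step 5 starts the job for any event it receives. A slightly more defensive variant, sketched here under stated assumptions, reads the crawlerName and state fields from the event's detail block (the same fields that the event pattern in the next section matches on) and only then starts the job. The CRAWLER_TO_JOB mapping and the optional glue_client parameter are illustrative additions, not part of the original procedure. The parameter exists only so that the handler can be exercised without AWS access.

```python
# Hypothetical mapping from crawler name to the AWS Glue job it should start.
CRAWLER_TO_JOB = {"MyTestCrawl": "MyTestJob"}

_glue = None


def _get_glue_client():
    """Create the boto3 Glue client lazily and reuse it across invocations."""
    global _glue
    if _glue is None:
        import boto3  # imported here so the module loads without AWS access
        _glue = boto3.client("glue")
    return _glue


def lambda_handler(event, context, glue_client=None):
    detail = event.get("detail", {})
    crawler = detail.get("crawlerName")
    state = detail.get("state")
    job_name = CRAWLER_TO_JOB.get(crawler)

    # Ignore events for unknown crawlers or for runs that did not succeed.
    if state != "Succeeded" or job_name is None:
        return {"started": False, "crawlerName": crawler, "state": state}

    client = glue_client if glue_client is not None else _get_glue_client()
    response = client.start_job_run(JobName=job_name)
    return {"started": True, "jobName": job_name, "jobRunId": response["JobRunId"]}
```

With this variant, the console Test in step 6 starts a job only if the test event contains a matching detail block, for example {"detail": {"crawlerName": "MyTestCrawl", "state": "Succeeded"}}.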

Create the CloudWatch Events rule

1.     Open the CloudWatch console.

2.     In the navigation pane, choose Rules, and then choose Create rule.

3.     In the Event Source section, choose Event Pattern. Then, from the Build event pattern to match events by service drop-down list, choose Custom event pattern.

4.     In the Build custom event pattern box, replace the existing code with the following code. Be sure to replace MyTestCrawl with the name of your AWS Glue crawler.

{
    "detail-type": [
        "Glue Crawler State Change"
    ],
    "source": [
        "aws.glue"
    ],
    "detail": {
        "crawlerName": [
            "MyTestCrawl"
        ],
        "state": [
            "Succeeded"
        ]
    }
}

5.     In the Targets section on the right side of the page, choose Add target.

6.     In the drop-down list, choose Lambda function, if it isn't already selected.

7.     In the Function drop-down list, choose the name of your Lambda function.

8.     In the lower-right corner of the page, choose Configure details.

9.     Enter a Name and Description for your CloudWatch Events rule, and then choose Create rule.
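
The console steps above can also be scripted. The following is a minimal sketch using boto3, not a definitive implementation: the function name create_crawler_rule, the target ID, and the statement ID are hypothetical names chosen for illustration. It creates the rule with the event pattern from step 4, attaches the Lambda function as a target, and grants CloudWatch Events permission to invoke the function. The console adds that permission for you, but a script has to add it explicitly.

```python
import json

# Event pattern from step 4. Replace MyTestCrawl with the name of your crawler.
EVENT_PATTERN = {
    "detail-type": ["Glue Crawler State Change"],
    "source": ["aws.glue"],
    "detail": {"crawlerName": ["MyTestCrawl"], "state": ["Succeeded"]},
}


def create_crawler_rule(events_client, lambda_client, rule_name, function_arn):
    """Create the rule, attach the Lambda target, and allow the rule to invoke it."""
    rule = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(EVENT_PATTERN),
        State="ENABLED",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "start-glue-job", "Arn": function_arn}],  # hypothetical target ID
    )
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=rule_name + "-invoke",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule["RuleArn"],
    )
    return rule["RuleArn"]


# Example usage (requires AWS credentials; the rule name and ARN are placeholders):
# import boto3
# create_crawler_rule(boto3.client("events"), boto3.client("lambda"),
#                     "glue-crawler-succeeded", "<your-lambda-function-arn>")
```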

To test the Lambda function and CloudWatch Events rule, run your AWS Glue crawler. Then, check the History tab of your AWS Glue ETL job. The Run status should display Starting or Running.
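
As an alternative to checking the History tab, the run status can be read programmatically. The sketch below assumes a boto3 Glue client is passed in and that the most recent run appears in the first page of get_job_runs results. The helper name latest_run_state is hypothetical. It picks the run with the latest StartedOn timestamp rather than relying on the order of the response.

```python
def latest_run_state(glue_client, job_name):
    """Return the JobRunState of the most recently started run, or None if there are no runs."""
    response = glue_client.get_job_runs(JobName=job_name)
    runs = response.get("JobRuns", [])
    if not runs:
        return None
    # Choose the run that started last instead of assuming a response order.
    latest = max(runs, key=lambda run: run["StartedOn"])
    return latest["JobRunState"]


# Example usage (requires AWS credentials):
# import boto3
# print(latest_run_state(boto3.client("glue"), "MyTestJob"))
```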

