How can I use a Lambda function to automatically start an AWS Glue job when a crawler run completes?

Last updated: 2022-08-05

I want to use an AWS Lambda function to automatically start an AWS Glue job when a crawler run completes.

Short description

To start a job when a crawler run completes, create an AWS Lambda function and an Amazon EventBridge rule. You can modify this method to automate other AWS Glue functions.

Note: You can also use AWS Glue workflows to automatically start a job when a crawler run completes. This method requires that you start the crawler from the Workflows page on the AWS Glue console. For more information, see How can I use AWS Glue workflows to automatically start a job when a crawler run completes?

Resolution

Before completing the following steps, be sure that you have:

  • An AWS Glue extract, transform, and load (ETL) job.
  • An AWS Glue crawler.
  • An AWS Identity and Access Management (IAM) role for Lambda with permission to run AWS Glue jobs. For example, create a Lambda execution role that has the AWSGlueServiceRole managed policy attached to it.

Create the Lambda function

1.    Open the Lambda console.

2.    Choose Create function.
Note: If you have no Lambda functions, then the Get started page appears. Choose Create a function and then continue to the next step.

3.    Be sure that Author from scratch is selected, and then configure the following options:
  • For Name, enter a name for your function.
  • For Runtime, choose one of the Python options.
  • For Architecture, use the default option, x86_64.
  • For Role, expand Change default execution role, and then select Use an existing role.
  • For Existing role, select an IAM role that has permission to run AWS Glue jobs.

4.    Choose Create function.

5.    In the Function code section, paste code similar to the following. Be sure to replace MyTestJob with the name of your AWS Glue ETL job.

# Set up logging
import logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)

# Import Boto 3 for AWS Glue
import boto3
client = boto3.client('glue')

# Variables for the job: 
glueJobName = "MyTestJob"

# Define Lambda function
def lambda_handler(event, context):
    logger.info('## INITIATED BY EVENT: ')
    # Use .get() so that a test event without a 'detail' key does not raise a KeyError
    logger.info(event.get('detail'))
    response = client.start_job_run(JobName = glueJobName)
    logger.info('## STARTED GLUE JOB: ' + glueJobName)
    logger.info('## GLUE JOB RUN ID: ' + response['JobRunId'])
    return response

6.    At the top of the code editor, choose Deploy, and then choose Test.

7.    Open the AWS Glue console and confirm that the job started.
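
The check in step 7 can also be scripted instead of done in the console. The following is a minimal sketch, not part of the original procedure. It assumes that boto3 is installed and that AWS credentials are configured. The helper name latest_job_run is hypothetical, and the helper inspects only the first page of results that get_job_runs returns.

```python
def latest_job_run(glue_client, job_name):
    """Return (run_id, state) for the most recently started run, or None.

    glue_client is expected to be boto3.client('glue'). Only the first page
    of get_job_runs results is inspected, which keeps this sketch short.
    """
    runs = glue_client.get_job_runs(JobName=job_name).get("JobRuns", [])
    if not runs:
        return None
    # Pick the run with the latest start time rather than relying on ordering.
    newest = max(runs, key=lambda run: run["StartedOn"])
    return newest["Id"], newest["JobRunState"]


# Example usage (needs AWS credentials, so it is left commented out):
#   import boto3
#   print(latest_job_run(boto3.client("glue"), "MyTestJob"))
```

If the Lambda test worked, the returned state is typically a value such as STARTING or RUNNING. A result of None means that the job has no runs yet.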

Create the EventBridge rule

1.    Open the Amazon EventBridge console.

2.    In the navigation pane, choose Rules, and then choose Create rule.

3.    Enter a name and description for the rule, and then choose Next.

4.    Use default values for Event source and Sample event. In the Event pattern section, select Custom Patterns (JSON editor).

5.    Copy and paste the following code in the Event pattern box. Be sure to replace MyTestCrawl with the name of your AWS Glue crawler.

{
    "detail-type": [
        "Glue Crawler State Change"
    ],
    "source": [
        "aws.glue"
    ],
    "detail": {
        "crawlerName": [
            "MyTestCrawl"
        ],
        "state": [
            "Succeeded"
        ]
    }
}

6.    In the Select targets section, do the following:
  • For Target, select Lambda function.
  • For Function, select the name of your Lambda function.

7.    Choose Create.
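
The console steps above can also be approximated with the AWS SDK for Python (Boto3). The following is a sketch under stated assumptions, not the method that this article prescribes. The function names build_event_pattern and create_crawler_rule, the target ID start-glue-job, and the statement ID suffix are all hypothetical. The sketch also assumes that the rule name contains only letters, digits, hyphens, and underscores so that it can be reused in the statement ID.

```python
import json


def build_event_pattern(crawler_name):
    """Build the same event pattern as in step 5 for the given crawler."""
    return {
        "detail-type": ["Glue Crawler State Change"],
        "source": ["aws.glue"],
        "detail": {"crawlerName": [crawler_name], "state": ["Succeeded"]},
    }


def create_crawler_rule(events_client, lambda_client, rule_name,
                        crawler_name, function_arn):
    """Create the rule, add the Lambda target, and allow EventBridge to invoke it.

    events_client and lambda_client are expected to be boto3.client('events')
    and boto3.client('lambda'). Returns the ARN of the rule.
    """
    rule_arn = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(build_event_pattern(crawler_name)),
        State="ENABLED",
    )["RuleArn"]

    # The target Id only needs to be unique within the rule; this value is arbitrary.
    result = events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "start-glue-job", "Arn": function_arn}],
    )
    if result.get("FailedEntryCount", 0):
        raise RuntimeError("put_targets failed: %s" % result.get("FailedEntries"))

    # The API calls above do not grant EventBridge permission to invoke the
    # function, so this sketch adds that permission explicitly.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=rule_name + "-invoke",
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )
    return rule_arn


# Example usage (needs AWS credentials, so it is left commented out):
#   import boto3
#   create_crawler_rule(boto3.client("events"), boto3.client("lambda"),
#                       "MyCrawlerRule", "MyTestCrawl", "<your function ARN>")
```

Running the sketch a second time with the same rule name is expected to fail at add_permission, because a statement with that ID already exists on the function.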

To test the Lambda function and EventBridge rule, run your AWS Glue crawler. Then, check the History tab of your AWS Glue ETL job. The Run status should display Starting or Running.
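
The crawler run for this test can also be started from a script. The following is a minimal sketch, assuming that boto3 is installed and that credentials are configured. The helper name run_crawler_and_wait, the polling interval, and the timeout are hypothetical choices. The sketch also assumes that the crawler has already left the READY state by the time of the first poll.

```python
import time


def run_crawler_and_wait(glue_client, crawler_name, poll_seconds=15,
                         timeout_seconds=1800, sleep=time.sleep):
    """Start the crawler, wait until it is READY again, and return the
    status of the last crawl (for example, 'SUCCEEDED').

    glue_client is expected to be boto3.client('glue'). The sleep argument
    exists only so that the wait can be replaced in tests.
    """
    glue_client.start_crawler(Name=crawler_name)
    waited = 0
    while waited <= timeout_seconds:
        crawler = glue_client.get_crawler(Name=crawler_name)["Crawler"]
        if crawler["State"] == "READY":
            return crawler.get("LastCrawl", {}).get("Status")
        sleep(poll_seconds)
        waited += poll_seconds
    raise TimeoutError("Crawler %s did not finish in time" % crawler_name)


# Example usage (needs AWS credentials, so it is left commented out):
#   import boto3
#   print(run_crawler_and_wait(boto3.client("glue"), "MyTestCrawl"))
```

A returned status of SUCCEEDED corresponds to the Succeeded state that the event pattern matches, so the job run should appear on the History tab shortly afterward.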

