How can I use a Lambda function to receive SNS alerts when an AWS Glue job fails a retry?

Last updated: 2021-03-30

I want to be notified by an Amazon Simple Notification Service (Amazon SNS) alert when my AWS Glue job fails a retry.

Short description

You can use Amazon EventBridge events for AWS Glue to create Amazon SNS alerts, but EventBridge rules alone might not be specific enough for some situations. To receive SNS notifications for specific AWS Glue events, such as an AWS Glue job that fails on retry, you can use AWS Lambda. You can create a Lambda function to do the following:

  1. Check the incoming event for a specific string.
  2. Publish a message to Amazon SNS if the string in the event matches the string in the Lambda function.

To use an AWS Lambda function to receive an email from SNS when any of your AWS Glue jobs fails a retry, do the following:

  1. Create an Amazon SNS topic.
  2. Create an AWS Lambda function.
  3. Create an Amazon EventBridge event that uses the Lambda function to initiate email notifications.

Resolution

Be sure that you have:

  • An AWS Glue extract, transform, and load (ETL) job
  • An AWS Identity and Access Management (IAM) role for AWS Lambda with permission to publish SNS notifications
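The permission that the role needs can be sketched as an IAM policy document. The following is a minimal example, not an official policy: the topic ARN is a hypothetical placeholder, and the helper name build_publish_policy is an assumption made for illustration.

```python
import json

# Hypothetical topic ARN, used for illustration only; replace it with your own.
TOPIC_ARN = "arn:aws:sns:us-east-1:123456789012:Glue_Job_Failure_Notification"


def build_publish_policy(topic_arn):
    """Return an IAM policy document that allows publishing to one SNS topic."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": "sns:Publish",
                "Resource": topic_arn,
            }
        ],
    }


# Render the policy as JSON so that it can be pasted into the IAM policy editor.
policy_json = json.dumps(build_publish_policy(TOPIC_ARN), indent=4)
print(policy_json)
```

Scoping Resource to a single topic ARN keeps the role from publishing to other topics. If you want the function's log output to appear in Amazon CloudWatch Logs, the role also needs logging permissions in addition to this policy.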

Create an Amazon SNS topic

  1. Open the Amazon SNS Console.
  2. Choose Topics, and then choose Create topic.
  3. For Type, select Standard.
  4. For Name, enter the topic name.
  5. (Optional) For Display name, enter the display name for your topic.
  6. Choose Create topic.
    Your topic is created.
  7. Choose Create subscription.
    For Topic ARN, select the topic that you created.
    For Protocol, select your desired protocol.
    For Endpoint, enter the address where you want to receive the SNS notifications.
  8. Choose Create subscription.
    Your subscription is created.
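If you prefer to script these console steps, the following sketch shows one possible equivalent with the AWS SDK for Python (Boto3). It assumes that you pass in an SNS client, for example boto3.client('sns'). The function name and its arguments are hypothetical and are not part of the original procedure.

```python
def create_topic_with_subscription(sns_client, topic_name, protocol, endpoint):
    """Create a standard SNS topic and subscribe one endpoint to it.

    sns_client is assumed to be a Boto3 SNS client, for example
    boto3.client('sns'). Returns the topic ARN and the subscription ARN.
    """
    # create_topic returns the ARN of the new topic (or of an existing
    # topic that has the same name and attributes).
    topic_arn = sns_client.create_topic(Name=topic_name)["TopicArn"]

    # Subscribe the endpoint. ReturnSubscriptionArn=True asks SNS to return
    # the subscription ARN even while the subscription is pending confirmation.
    subscription = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol=protocol,
        Endpoint=endpoint,
        ReturnSubscriptionArn=True,
    )
    return topic_arn, subscription["SubscriptionArn"]
```

For example, calling the function with the protocol "email" and your email address as the endpoint subscribes that address to the topic. Email subscriptions must be confirmed from the message that SNS sends to the address before notifications are delivered.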

Create an AWS Lambda function

1.    Open the Lambda console.

2.    Choose Create function.

3.    On the Create function page, do the following:

Select Author from scratch.
For Function name, enter a name for your function.
For Runtime, select one of the Python options. (For script compatibility, Python 3.7 is recommended.)
Expand the Change default execution role dropdown list.
For Execution role, select Use an existing role.
For Existing role, select the IAM role with permission to publish SNS notifications.

4.    Choose Create function.
Your Lambda function is created.

5.    On the Code tab, in the Code source section, choose File, and then choose New file.
In the new file, enter code similar to the following:

# Import modules
import boto3
import json
import logging
# Set up logging
logger = logging.getLogger()
logger.setLevel(logging.INFO)
# Set up Boto 3 client for SNS
client = boto3.client('sns')
# ARN of the SNS topic that receives the notification
snsTopicARN = "arn:aws:sns:us-east-1:012345627499:Glue_Job_Failure_Notification"
# Define Lambda function
def lambda_handler(event, context):
    logger.info('## INITIATED BY EVENT: ')
    logger.info(event['detail'])
    # Define variables based on the event
    glueJobName = event['detail']['jobName']
    jobRunId = event['detail']['jobRunId']
    # Only send an SNS notification if the job run ID ends with _attempt_1,
    # which marks the first retry of the job
    if jobRunId.endswith('_attempt_1'):
        logger.info('## GLUE JOB FAILED RETRY: ' + glueJobName)
        message = \
            "A Glue Job has failed after attempting to retry. JobName: " \
            + glueJobName + ", JobRunID: " + jobRunId
        print(message)
        # With MessageStructure='json', the 'default' value must be a plain string
        response = client.publish(
            TopicArn=snsTopicARN,
            Message=json.dumps({'default': message}),
            Subject='An AWS Glue Job has failed',
            MessageStructure='json')

Note: Be sure to replace the value of snsTopicARN with the ARN of your SNS topic.
Choose File, and then choose Save.
For Filename, enter the filename of your choice.

6.    Choose Deploy.
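The function above notifies only when the job run ID ends with _attempt_1. If you set more than one retry on your job, you might want to match any retry attempt instead. The following is a minimal sketch of that check. It assumes that every retried run carries an _attempt_N suffix in the same form as the _attempt_1 suffix used above. The helper names retry_attempt and is_failed_retry are hypothetical.

```python
import re

# Assumption: retried runs end with "_attempt_N", where N is the retry number.
_ATTEMPT_SUFFIX = re.compile(r"_attempt_(\d+)$")


def retry_attempt(job_run_id):
    """Return the retry number parsed from a job run ID, or 0 for a first run."""
    match = _ATTEMPT_SUFFIX.search(job_run_id)
    return int(match.group(1)) if match else 0


def is_failed_retry(event):
    """Return True if the event describes a FAILED run that was a retry."""
    detail = event.get("detail", {})
    return (
        detail.get("state") == "FAILED"
        and retry_attempt(detail.get("jobRunId", "")) >= 1
    )
```

You could call is_failed_retry(event) in place of the endswith check in lambda_handler. Because the helpers don't call any AWS service, you can run them locally against sample events before you deploy.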

(Optional) You can test your function with a sample event by doing the following:

1.    Choose the Test tab.

For Name, enter the event name. Enter JSON similar to the following:

{
    "version": "0",
    "id": "abcdef01-1234-5678-9abc-def012345678",
    "detail-type": "Glue Job State Change",
    "source": "aws.glue",
    "account": "123456789012",
    "time": "2017-09-07T06:02:03Z",
    "region": "us-west-2",
    "resources": [],
    "detail": {
        "jobName": "MyTestJob",
        "severity": "ERROR",
        "state": "FAILED",
        "jobRunId": "jr_0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef_attempt_1",
        "message": "JobName:MyTestJob and JobRunId:jr_0123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef failed to execute with exception Role arn:aws:iam::123456789012:role/Glue_Role should be given assume role permissions for Glue Service."
    }
}

Note: Be sure to replace MyTestJob with the name of your AWS Glue job.

2.    Choose Save changes.

3.    Choose Test.

4.    View the Execution result that opens up after the test is complete.

5.    Confirm that you receive an SNS notification.
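As an alternative to the console test, you can invoke the deployed function from a script. The following sketch assumes that you pass in a Boto3 Lambda client, for example boto3.client('lambda'), and the name of your function. The helper name invoke_with_test_event is hypothetical.

```python
import json


def invoke_with_test_event(lambda_client, function_name, event):
    """Invoke a Lambda function synchronously with a sample event.

    lambda_client is assumed to be a Boto3 Lambda client. Returns the HTTP
    status code and the FunctionError value, which is None when the handler
    finishes without raising an exception.
    """
    response = lambda_client.invoke(
        FunctionName=function_name,
        InvocationType="RequestResponse",  # wait for the handler to finish
        Payload=json.dumps(event).encode("utf-8"),
    )
    return response["StatusCode"], response.get("FunctionError")
```

A status code of 200 with no FunctionError indicates that the handler ran to completion. It doesn't by itself prove that the notification was delivered, so still confirm that the message arrives at your endpoint.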

Use Amazon EventBridge to initiate email notifications

1.    Open the EventBridge console.

2.    On the navigation pane, choose Rules, and then choose Create rule.

3.    On the Create rule page, do the following:

For Name, enter the rule name.
(Optional) For Description, enter a description of the rule.
For Define pattern, select Event pattern.
For Event matching pattern, select Custom pattern.
For Event pattern, enter the following pattern, or a pattern of your choice:

{
    "detail-type": [
        "Glue Job State Change"
    ],
    "source": [
        "aws.glue"
    ],
    "detail": {
        "state": [
            "FAILED"
        ]
    }
}

Choose Save.
In the Select targets section, do the following:
For Target, choose Lambda function.
For Function, choose the function that you created.

4.    Choose Create.
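The rule and its target can also be created from a script. The following sketch is one possible Boto3 equivalent of the console steps above. It assumes that you pass in an EventBridge client (boto3.client('events')) and a Lambda client (boto3.client('lambda')). The function name, the statement ID, and the target ID are hypothetical values chosen for illustration.

```python
import json

# The same event pattern that is entered in the console steps above.
EVENT_PATTERN = {
    "detail-type": ["Glue Job State Change"],
    "source": ["aws.glue"],
    "detail": {"state": ["FAILED"]},
}


def create_failure_rule(events_client, lambda_client, rule_name, function_arn):
    """Create an EventBridge rule for failed Glue jobs that targets a Lambda function."""
    # Create (or update) the rule and capture its ARN.
    rule_arn = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(EVENT_PATTERN),
        State="ENABLED",
    )["RuleArn"]

    # Allow EventBridge to invoke the function for this rule only.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId=rule_name + "-invoke",  # hypothetical statement ID
        Action="lambda:InvokeFunction",
        Principal="events.amazonaws.com",
        SourceArn=rule_arn,
    )

    # Attach the function as the target of the rule.
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "glue-failure-lambda", "Arn": function_arn}],
    )
    return rule_arn
```

The console adds the invoke permission for you when you select a Lambda function as the target. When you create the rule through the API, the add_permission call is needed so that EventBridge is allowed to invoke the function.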

Test the notification with your AWS Glue job

  1. Open the AWS Glue Console.
  2. On the navigation pane, choose Jobs.
  3. Select the AWS Glue job that you want to use to test the notification.
  4. Choose the Action dropdown list, and then choose Edit job.
  5. Expand Security configuration, script libraries, and job parameters (optional).
  6. Under Security configuration, for Number of retries, enter 1.
  7. Choose Save.
  8. On the Jobs page, select the Glue job where you want to test the notification.
  9. Choose the Action dropdown list, and then choose Edit Script.
  10. Change an element of your code so that your job fails. For example, append "_BROKEN" to a table name.
  11. Choose Save.
  12. On the Jobs page, select the Glue job where you want to test the notification.
  13. Choose the Action dropdown list, and then choose Run job.
    You should receive a notification on the second failed attempt.
  14. After the testing is complete, edit your Glue job and undo the changes.
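If the notification doesn't arrive, it can help to confirm that the retry actually ran and failed. The following sketch lists the failed retry runs for a job. It assumes that you pass in a Boto3 AWS Glue client, for example boto3.client('glue'). The helper name failed_retry_runs is hypothetical.

```python
def failed_retry_runs(glue_client, job_name):
    """Return the IDs of job runs that failed on a retry attempt.

    glue_client is assumed to be a Boto3 AWS Glue client. Only the first
    page of results from get_job_runs is inspected in this sketch.
    """
    response = glue_client.get_job_runs(JobName=job_name)
    return [
        run["Id"]
        for run in response["JobRuns"]
        # Attempt is 0 for the first run and 1 or higher for retries.
        if run.get("JobRunState") == "FAILED" and run.get("Attempt", 0) >= 1
    ]
```

An empty list suggests that the job either didn't fail or wasn't retried, in which case check the Number of retries setting before you troubleshoot the Lambda function or the EventBridge rule.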
