AWS for M&E Blog

Monitoring AWS Media Services using Amazon CloudWatch Events

The Media Services Application Mapper (MSAM) tool was created to help video streaming operators monitor their live channels. The tool provides a web interface that visualizes the components of live video workflows, including logical connections between AWS Media Services and other services from AWS. MSAM can be configured to show error messages, alerts, and alarms associated with each specific media service resource. MSAM takes advantage of a few different technologies to make these visualizations happen, including Amazon CloudWatch Events, which is the focus of this blog post.

What exactly are CloudWatch Events? They are essentially a mechanism to describe changes in an AWS resource, represented in JavaScript Object Notation (JSON) format. AWS services can choose to publish these events, which can be delivered to a number of different AWS services as soon as the changes occur, and then captured, stored, and acted upon. AWS Elemental MediaPackage, for example, publishes several CloudWatch event types, including input notifications that let you know when there are changes related to the video ingest process. AWS Elemental MediaLive, on the other hand, publishes events related to the state of its inputs and channels. AWS Elemental MediaStore publishes events related to the creation and deletion of containers and their objects. The following example CloudWatch event indicates a MediaLive channel changing from a stopped to a starting state:

{
    "version": "0",
    "id": "645951c9-b88c-6cbe-5d65-d3db7c9ea753",
    "detail-type": "MediaLive Channel State Change",
    "source": "aws.medialive",
    "account": "123456789012",
    "time": "2020-05-12T19:34:18Z",
    "region": "us-west-2",
    "resources": [
        "arn:aws:medialive:us-west-2:123456789012:channel:1234567"
    ],
    "detail": {
        "pipelines_running_count": 1,
        "state": "STARTING",
        "pipeline": "0",
        "channel_arn": "arn:aws:medialive:us-west-2:123456789012:channel:1234567",
        "message": "Pipeline started for channel"
    }
}
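As a rough illustration of how such an event can be read in code, the following Python sketch pulls the channel ID, state, and message out of a trimmed copy of the example event. The helper name summarize_state_change is hypothetical and is not part of MSAM; it only assumes the field names visible in the example.

```python
import json

# Trimmed copy of the example event shown above.
EXAMPLE_EVENT = json.loads("""
{
    "detail-type": "MediaLive Channel State Change",
    "source": "aws.medialive",
    "time": "2020-05-12T19:34:18Z",
    "resources": ["arn:aws:medialive:us-west-2:123456789012:channel:1234567"],
    "detail": {
        "state": "STARTING",
        "channel_arn": "arn:aws:medialive:us-west-2:123456789012:channel:1234567",
        "message": "Pipeline started for channel"
    }
}
""")


def summarize_state_change(event):
    """Return a one-line summary of a MediaLive channel state change event.

    Hypothetical helper for illustration; not part of MSAM.
    """
    detail = event.get("detail", {})
    # The channel ID is the last colon-separated field of the channel ARN.
    channel_id = detail.get("channel_arn", "").split(":")[-1]
    return "{} channel {} is {} ({})".format(
        event.get("time", "unknown time"),
        channel_id or "unknown",
        detail.get("state", "UNKNOWN"),
        detail.get("message", ""),
    )


if __name__ == "__main__":
    print(summarize_state_change(EXAMPLE_EVENT))
```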

Among the variety of ways to receive CloudWatch events, MSAM uses an AWS Lambda function as its target. In MSAM, a CloudWatch event rule is subscribed to events published by AWS Media Services, and a Lambda function is triggered by this event rule. These resources are created automatically through an AWS Serverless Application Model (SAM) template, as shown in the following code snippet. The Lambda function is defined as Collector under Resources, and the CloudWatch event rule is defined in the Events section within that resource. Any CloudWatch event that comes in is checked against the Pattern definition, and those that match (that is, the source matches one of the media services in the list) are forwarded to the Lambda function for handling.

{
    "AWSTemplateFormatVersion": "2010-09-09",
    "Transform": "AWS::Serverless-2016-10-31",
    "Description": "Media Services Application Mapper (MSAM) event capture (ID: DEV_0_0_0)",
    "Resources": {
        "Collector": {
            "Type": "AWS::Serverless::Function",
            "Properties": {
                "Handler": "media_events.lambda_handler",
                "Description": "MSAM Lambda for handling CloudWatch event notifications",
                ...
                "Events": {
                    "MediaEvents": {
                        "Type": "CloudWatchEvent",
                        "Properties": {
                            "Pattern": {
                                "source": [
                                    "aws.medialive",
                                    "aws.mediapackage",
                                    "aws.mediastore",
                                    "aws.mediatailor",
                                    "aws.mediaconnect"
                                ]
                            }
                        }
                    }
                }
            }
        }
        ...
    }
}
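To make the matching step more concrete, here is a deliberately simplified Python sketch of how a pattern like the one above selects events. It is not how the service implements matching, and real event patterns support more than this (nested fields and other operators); the function name matches_pattern is hypothetical. It only checks that each top-level key in the pattern lists the event's value.

```python
# The pattern from the SAM template above, expressed as a Python dict.
MEDIA_SOURCES_PATTERN = {
    "source": [
        "aws.medialive",
        "aws.mediapackage",
        "aws.mediastore",
        "aws.mediatailor",
        "aws.mediaconnect",
    ]
}


def matches_pattern(event, pattern):
    """Return True if every top-level pattern key lists the event's value.

    Simplified illustration only; hypothetical helper, not the service logic.
    """
    for key, allowed_values in pattern.items():
        if event.get(key) not in allowed_values:
            return False
    return True


if __name__ == "__main__":
    print(matches_pattern({"source": "aws.medialive"}, MEDIA_SOURCES_PATTERN))
    print(matches_pattern({"source": "aws.ec2"}, MEDIA_SOURCES_PATTERN))
```

With this pattern, an event whose source is aws.medialive is forwarded, while an event from an unrelated source such as aws.ec2 is ignored.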

Once the SAM template described previously is deployed through AWS CloudFormation, the CloudWatch event rule is created and can be inspected in the console.

The target Lambda function decides what to do with the event data depending on the source of the event. For example, if the event received is from MediaLive and is a channel or multiplex alert, the information is saved in an Amazon DynamoDB table specifically for alerts. The following Lambda function code snippet from MSAM does exactly that. The cached information in the DynamoDB table is then used by the front-end code to make the appropriate visual changes to the node representing the alert state of a MediaLive channel or multiplex (for example, a node’s color is red if an alert is set).

def lambda_handler(event, _):
    """
    Entry point for CloudWatch event receipt.
    """
    try:
        ...
        # alerts are stored into a DynamoDB table after some processing
        if "Alert" in event["detail-type"]:
            # MediaLive alerts
            if "MediaLive" in event["detail-type"]:
                event["alarm_id"] = event["detail"]["alarm_id"]
                event["alarm_state"] = event["detail"]["alarm_state"].lower()
                event["detail"]["pipeline_state"] = get_pipeline_state(event)
                ...
            EVENTS_TABLE.put_item(Item=event)
            print(event["detail-type"] + " stored.")
            
        # all CloudWatch events, including alerts, are stored in another DynamoDB table
        # (item is presumably built from the event in code elided from this snippet)
        if "resource_arn" in item:
            print("Storing media service event.")
            print(item)
            CLOUDWATCH_EVENTS_TABLE.put_item(Item=item)
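If you want to experiment with this branching without deploying anything, a small stand-in for the tables is enough. The following sketch is an assumption-laden simplification, not MSAM's code: InMemoryTable and store_event are hypothetical names, the table only mimics put_item, and every event is stored in the second table without the resource_arn check shown above.

```python
class InMemoryTable:
    """Hypothetical stand-in for a DynamoDB table; only put_item is mimicked."""

    def __init__(self):
        self.items = []

    def put_item(self, Item):  # keyword name mirrors the call used in the snippet
        self.items.append(Item)


def store_event(event, alerts_table, all_events_table):
    """Store alerts in one table and every event in another (simplified)."""
    if "Alert" in event.get("detail-type", ""):
        alerts_table.put_item(Item=event)
    all_events_table.put_item(Item=event)


if __name__ == "__main__":
    alerts, everything = InMemoryTable(), InMemoryTable()
    store_event({"detail-type": "MediaLive Channel Alert"}, alerts, everything)
    store_event({"detail-type": "MediaLive Channel State Change"}, alerts, everything)
    print(len(alerts.items), len(everything.items))
```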

MSAM previously only stored events related to media services that required action on its part but was updated more recently to store all events. These events are now displayed on the MSAM web console, where the live stream operator gets an even more holistic view of monitored resources. In the following diagram, we see how selecting the node corresponding to a MediaLive channel makes available the “Recent CloudWatch Events” tab that shows the CloudWatch events associated with that channel. We see there are multiple channel alert events received, and correspondingly, the node’s color is set to red to indicate there are ongoing alerts.

Recent CloudWatch alert events of a MediaLive channel on MSAM

In addition to the CloudWatch events published by the media services, API calls made to those services are also captured as events by another service, AWS CloudTrail. These include calls made through the console as well as direct calls through the SDK. CloudTrail is enabled by default on your AWS account, and you can see events on the CloudTrail console under Event History. However, if you explicitly create a trail, you’ll be able to capture these API calls through the same CloudWatch event rule created by MSAM, described previously in this post.

MSAM doesn’t automatically create a trail for you, but you can follow the instructions in the documentation to create one manually. Note that depending on how you set up your trail, you could incur additional costs. Refer to the CloudTrail pricing page for more information. Once a trail has been created in your account, you will begin to see events like the following example (an API call to update a MediaPackage channel description) captured by the CloudWatch event rule and processed by the target Lambda function. You can tell that it is a CloudTrail event by examining the detail-type field.

{
    "version": "0",
    "id": "07038943-4537-6b3f-c686-02f74e556a5d",
    "detail-type": "AWS API Call via CloudTrail",
    "source": "aws.mediapackage",
    "account": "123456789012",
    "time": "2020-05-01T22:28:55Z",
    "region": "us-west-2",
    "resources": [],
    "detail": {
        "eventVersion": "1.05",
        "userIdentity": { ...
        },
        "eventTime": "2020-05-01T22:28:55Z",
        "eventSource": "mediapackage.amazonaws.com",
        "eventName": "UpdateChannel",
        "userAgent": "aws-sdk-java/1.11.761",
        "requestParameters": {
            "description": "workshop channel - update description",
            "id": "workshop-channel"
        },
        "responseElements": {
            "description": "workshop channel - update description",
            "id": "workshop-channel",
            "hls_ingest": {
                "ingest_endpoints": [ ...
                ]
            },
            "arn": "arn:aws:mediapackage:us-west-2:123456789012:channels/1dd2170fbfdf4a0890dd575f7a2646b2",
            "tags": {}
        },
        "requestID": "d13c3f60-8666-416c-a9f0-97291bcac75f",
        "eventID": "1743d7e8-a70b-4a12-ab55-1fdcbb782a51",
        "readOnly": false,
        "eventType": "AwsApiCall"
    }
}
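A handler that needs to treat the two kinds of events differently can branch on that field. The sketch below is one possible way to do it, using only the field names visible in the examples in this post; describe_origin is a hypothetical helper and not part of MSAM.

```python
CLOUDTRAIL_DETAIL_TYPE = "AWS API Call via CloudTrail"


def describe_origin(event):
    """Say whether an event came via CloudTrail or directly from a service.

    Hypothetical helper for illustration; not part of MSAM.
    """
    if event.get("detail-type") == CLOUDTRAIL_DETAIL_TYPE:
        detail = event.get("detail", {})
        return "CloudTrail API call {} on {}".format(
            detail.get("eventName", "unknown"),
            detail.get("eventSource", "unknown"),
        )
    return "service event: {}".format(event.get("detail-type", "unknown"))


if __name__ == "__main__":
    cloudtrail_event = {
        "detail-type": "AWS API Call via CloudTrail",
        "source": "aws.mediapackage",
        "detail": {
            "eventSource": "mediapackage.amazonaws.com",
            "eventName": "UpdateChannel",
        },
    }
    print(describe_origin(cloudtrail_event))
    print(describe_origin({"detail-type": "MediaLive Channel State Change"}))
```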

If you choose to store CloudTrail events, you may log a lot of events depending on the activity in your account. For example, if you are using a MediaStore container in your live workflow, every time a manifest file is updated in a container, an API call is made and captured by CloudTrail. Although this only happens every 10 minutes or so, it can add up. Make sure that you create an appropriate lifecycle policy for the objects in the Amazon S3 bucket where your CloudTrail events are stored. In addition, you should adjust the ITEM_TTL setting of your Events stack in MSAM accordingly. This setting determines how long items in your DynamoDB event tables are retained.
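DynamoDB's Time to Live feature expires items based on a numeric attribute holding a timestamp in epoch seconds. As a minimal sketch of how a retention setting can translate into such a timestamp, the following example stamps an item before it is stored. It assumes the retention value is expressed in seconds, and the attribute name expires and the helper with_expiry are hypothetical; check the MSAM source for the names and units it actually uses.

```python
import time


def with_expiry(item, ttl_seconds, now=None):
    """Return a copy of item with an epoch-seconds expiry attribute added.

    The attribute name "expires" is an assumption made for this sketch.
    """
    current = int(time.time()) if now is None else int(now)
    stamped = dict(item)  # copy so the caller's item is left untouched
    stamped["expires"] = current + int(ttl_seconds)
    return stamped


if __name__ == "__main__":
    two_hours = 2 * 60 * 60
    print(with_expiry({"detail-type": "MediaLive Channel Alert"}, two_hours))
```

A shorter retention value keeps the event tables small when a trail is generating many events, at the cost of a shorter history on the console.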

Also note that you may end up capturing the same event twice because the service published an event as a CloudWatch event, and the corresponding API call of the event was also recorded by CloudTrail. When a MediaLive channel is started, for instance, MediaLive publishes a CloudWatch event for this, but there’s also a corresponding API call captured by CloudTrail. This is described in the following diagram:

MediaLive channel start CloudWatch events published by service and CloudTrail on MSAM

Having said that, a trail allows you to capture events you may not have received otherwise. As an example, when the credentials of a MediaPackage input have been rotated, MediaPackage doesn’t publish that directly as a CloudWatch event. But the API call is recorded by CloudTrail and therefore received by the CloudWatch event rule, as described in the following diagram. So if your MediaLive channel suddenly has problems pushing to a MediaPackage channel, this captured event can be a clue to an issue that would otherwise seem to have occurred for no apparent reason.

MediaPackage rotate input credentials CloudWatch event published by CloudTrail on MSAM
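When troubleshooting a case like this, it can help to filter stored events for credential rotations. The sketch below assumes the CloudTrail eventName values match the MediaPackage API operation names RotateChannelCredentials and RotateIngestEndpointCredentials; verify these against the events in your own account. The helper find_credential_rotations is hypothetical and not part of MSAM.

```python
# Assumed eventName values, taken from the MediaPackage API operation names.
ROTATION_EVENT_NAMES = {
    "RotateChannelCredentials",
    "RotateIngestEndpointCredentials",
}


def find_credential_rotations(events):
    """Return (time, eventName) pairs for MediaPackage credential rotations."""
    rotations = []
    for event in events:
        if event.get("source") != "aws.mediapackage":
            continue
        if event.get("detail-type") != "AWS API Call via CloudTrail":
            continue
        name = event.get("detail", {}).get("eventName")
        if name in ROTATION_EVENT_NAMES:
            rotations.append((event.get("time", ""), name))
    # ISO 8601 timestamps in the same format sort chronologically as strings.
    return sorted(rotations)


if __name__ == "__main__":
    sample = [
        {
            "source": "aws.mediapackage",
            "detail-type": "AWS API Call via CloudTrail",
            "time": "2020-05-01T22:28:55Z",
            "detail": {"eventName": "UpdateChannel"},
        },
        {
            "source": "aws.mediapackage",
            "detail-type": "AWS API Call via CloudTrail",
            "time": "2020-05-01T22:40:00Z",
            "detail": {"eventName": "RotateIngestEndpointCredentials"},
        },
    ]
    print(find_credential_rotations(sample))
```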

In this post, we’ve shown you how a tool like MSAM captures and stores CloudWatch events published by AWS Media Services in order to provide a monitoring system for live streaming workflows. We’ve shown you how creating a trail in CloudTrail gives you access to API calls made by services in your account, including the media services, through the same CloudWatch event mechanism. The concepts and code snippets presented here can be used to customize your own monitoring system or to make improvements and contributions to MSAM, which is an open source project.

Resources