Capture Amazon Chime SDK Meetings Using Media Capture Pipelines

Update August 18, 2022 – Introducing video compositing and concatenation to generate a single video recording file with all attendees’ video and audio. See the section “Compositing and Concatenation”.

Today we’re launching a new feature for the Amazon Chime SDK to allow builders to capture the contents of a meeting. This new feature, media capture pipeline, captures meeting audio, video, and content share streams, along with meeting events and data messages, and saves them to an Amazon Simple Storage Service (Amazon S3, or S3) bucket designated by the builder. By default, the media capture pipeline captures the mixed meeting audio stream for the duration of the capture, the video of the active speaker when their video is available, and the content share streams when available. Builders can request to turn off the active speaker video and instead capture individual video streams of each attendee when they are active. All meeting artifacts are delivered to the S3 bucket designated by the builder in file chunks of up to five seconds throughout the meeting.

Previously, we published a blog post demonstrating how you can enable client-side recording using the Amazon Chime SDK on your own Amazon Elastic Container Service (Amazon ECS) infrastructure. That approach requires builders to deploy and maintain their serverless application in containers and record the meeting through a browser’s built-in API to capture the browser session as a single video stream. With the new media capture pipeline feature, builders only need to call the new APIs to start the media capture and no longer need to worry about extra infrastructure.

Overview

Media capture can be started by calling the CreateMediaCapturePipeline API. When you specify the Amazon Chime SDK meeting and the S3 bucket in the API request, the media capture pipeline service starts to capture the meeting’s content and upload it to the specified S3 bucket. The capture stops when the DeleteMediaCapturePipeline API is called or when the Amazon Chime SDK meeting ends.

In this blog post, we will walk through the functionality of the new media capture pipeline. We will manually start a media capture pipeline to capture an ongoing Amazon Chime SDK meeting, and then look at what we can do with the captured artifacts. We have also included an AWS Cloud Development Kit (CDK) package that deploys a simple demo app, so you can try out this new feature from a button on a React page.

Note: Deploying this demo and receiving traffic from the demo created in this post can incur AWS charges.

Recording or capturing Amazon Chime SDK meetings with the demo in this blog may be subject to laws or regulations regarding the recording of electronic communications. It is your and your end users’ responsibility to comply with all applicable laws regarding the recordings, including properly notifying all participants in a recorded session or communication that the session or communication is being recorded, and obtaining their consent.

Walkthrough Media Capture Pipeline

Four new APIs have been added to the AWS SDK under Amazon Chime for media capture pipeline functionalities:

CreateMediaCapturePipeline
DeleteMediaCapturePipeline
GetMediaCapturePipeline
ListMediaCapturePipelines

Please read the AWS API reference for more details on how to use these APIs.

Prerequisites

You have basic knowledge of the AWS SDK in any language, or of the AWS Command Line Interface (used in this blog post). Update to the latest version if you have not already.
You have basic knowledge of the Amazon Chime SDK for JavaScript or the Amazon Chime SDK for mobile.
You have created an Amazon Chime SDK meeting (see included CDK demo).
You have your AWS account’s admin role or a role with a policy allowing chime:*, s3:GetBucketPolicy and s3:GetBucketLocation.
You have created the S3 bucket in the same account and region where your meeting is hosted, and configured its bucket policy following the instructions below.

Prepare the S3 bucket to receive captured artifacts

The S3 bucket must be owned by the same AWS account as the Amazon Chime SDK meeting.
The S3 bucket must have a bucket policy that allows the Amazon Chime service to upload files to it. We recommend the bucket policy below. Replace BUCKET_NAME with the actual S3 bucket name and YOUR_ACCOUNT_ID with your AWS account ID.
The S3 bucket must be in the same region as the meeting that is to be captured.

{
    "Version": "2012-10-17",
    "Id": "AWSChimeMediaCaptureBucketPolicy",
    "Statement": [
        {
            "Sid": "AWSChimeMediaCaptureBucketPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": "mediapipelines.chime.amazonaws.com"
            },
            "Action": [ "s3:PutObject", "s3:PutObjectAcl"],
            "Resource": "arn:aws:s3:::BUCKET_NAME/*",
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "YOUR_ACCOUNT_ID"
                },
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:chime:*:YOUR_ACCOUNT_ID:*"
                }
            }
        }
    ]
}
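If you prefer to generate this policy rather than edit it by hand, the substitution can be scripted. The following is a minimal Python sketch, not part of the original demo; the function name, bucket name, and account ID are hypothetical and chosen only for illustration.

```python
import json


def build_capture_bucket_policy(bucket_name: str, account_id: str) -> dict:
    """Return the media capture bucket policy shown above as a Python dict."""
    return {
        "Version": "2012-10-17",
        "Id": "AWSChimeMediaCaptureBucketPolicy",
        "Statement": [
            {
                "Sid": "AWSChimeMediaCaptureBucketPolicy",
                "Effect": "Allow",
                "Principal": {"Service": "mediapipelines.chime.amazonaws.com"},
                "Action": ["s3:PutObject", "s3:PutObjectAcl"],
                # Objects under the bucket, not the bucket itself.
                "Resource": f"arn:aws:s3:::{bucket_name}/*",
                "Condition": {
                    "StringEquals": {"aws:SourceAccount": account_id},
                    "ArnLike": {"aws:SourceArn": f"arn:aws:chime:*:{account_id}:*"},
                },
            }
        ],
    }


if __name__ == "__main__":
    # Hypothetical bucket name and account ID, for illustration only.
    policy = build_capture_bucket_policy("example-capture-bucket", "111122223333")
    print(json.dumps(policy, indent=4))
```

The printed JSON can be saved to a file and attached to the bucket, for example with aws s3api put-bucket-policy --bucket [BUCKET-NAME] --policy file://policy.json.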

Start Media Capture Pipeline

This blog post uses the AWS Command Line Interface to call the AWS APIs. The request is similar, in a different syntax, when you use the AWS SDK in other languages such as Java, JavaScript, or Go.

To start the media capture pipeline, an existing Amazon Chime SDK meeting is required. You can start a new meeting with the Amazon Chime SDK meeting serverless demo or create your own Amazon Chime SDK meeting from your existing service. Then obtain the Amazon Chime SDK meeting ID and construct the Amazon Chime SDK meeting ARN:

arn:aws:chime::[ACCOUNT-ID]:meeting:[MEETING-ID]

In a command line terminal, ensure you have AWS credentials configured in your ~/.aws folder for a role with the permissions listed in the prerequisites (chime:*, s3:GetBucketPolicy, and s3:GetBucketLocation). Run this command to start the media capture pipeline:

aws chime-sdk-media-pipelines create-media-capture-pipeline \
    --source-type ChimeSdkMeeting \
    --source-arn arn:aws:chime::[ACCOUNT-ID]:meeting:[MEETING-ID] \
    --sink-type S3Bucket \
    --sink-arn arn:aws:s3:::[BUCKET-NAME]/[Prefix]

Note:

source-type must be ChimeSdkMeeting.
source-arn is the meeting ARN we constructed. The ACCOUNT-ID must be the account that is hosting the meeting.
sink-type is where the media capture files are stored. S3Bucket is the only option for now.
sink-arn is the S3 path ARN. The BUCKET-NAME is the S3 bucket we prepared earlier, and it must be owned by the same account.
It is the developer’s choice to store all meeting captures in the same S3 bucket or to store each meeting in a different bucket, as long as the bucket policy is set properly. We recommend using the MEETING-ID as the prefix so that each meeting’s artifacts are stored under a separate prefix.
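The same request can be assembled programmatically. Below is a minimal Python sketch that builds the source ARN and a sink ARN with the meeting ID as the prefix, as recommended above. The helper names and the account ID are hypothetical; the dictionary keys mirror the CLI parameters, and we assume they can be passed as keyword arguments to the create_media_capture_pipeline method of the boto3 "chime-sdk-media-pipelines" client.

```python
def meeting_arn(account_id: str, meeting_id: str) -> str:
    """Build the meeting ARN in the format shown in this post."""
    return f"arn:aws:chime::{account_id}:meeting:{meeting_id}"


def sink_arn(bucket_name: str, prefix: str = "") -> str:
    """Build the S3 sink ARN, with an optional key prefix."""
    prefix = prefix.strip("/")
    base = f"arn:aws:s3:::{bucket_name}"
    return f"{base}/{prefix}" if prefix else base


def build_capture_request(account_id: str, meeting_id: str, bucket_name: str) -> dict:
    """Assemble the request, using the meeting ID as the S3 prefix."""
    return {
        "SourceType": "ChimeSdkMeeting",
        "SourceArn": meeting_arn(account_id, meeting_id),
        "SinkType": "S3Bucket",
        "SinkArn": sink_arn(bucket_name, prefix=meeting_id),
    }


if __name__ == "__main__":
    # Hypothetical account ID and bucket name, for illustration only.
    request = build_capture_request(
        "111122223333",
        "23793cce-000e-44a4-82ce-f9f1ae28b309",
        "example-capture-bucket",
    )
    print(request)
    # Assumed usage with boto3 (not executed here):
    #   boto3.client("chime-sdk-media-pipelines").create_media_capture_pipeline(**request)
```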

When the command succeeds, it returns a response like the one below. Note the MediaPipelineId; it is needed later to stop the pipeline.

{
    "MediaCapturePipeline": {
        "MediaPipelineId": "93494fbe-48d1-447c-a923-fe483c0c534d",
        "SourceType": "ChimeSdkMeeting",
        "SourceArn": "arn:aws:chime::0000000000:meeting:23793cce-000e-44a4-82ce-f9f1ae28b309",
        "Status": "Initializing",
        "SinkType": "S3Bucket",
        "SinkArn": "arn:aws:s3:::[YOUR-BUCKET]/23793cce-000e-44a4-82ce-f9f1ae28b309",
        "CreatedTimestamp": "2021-04-30T18:53:54.076Z",
        "UpdatedTimestamp": "2021-04-30T18:53:54.077Z"
    }
}

In the Amazon Chime SDK meeting roster, a new attendee named “MediaPipeline-*” appears. This is the attendee we use to capture the meeting audio, video, content, data messages, and events. It will always be muted.

Shortly after the capture is started, the captured artifacts will begin to be uploaded into the S3 destination defined in SinkArn.

The artifacts are organized into the following folders:

The audio folder contains the captured audio and active speaker video in mp4 format. Each file holds the mixed audio track from all attendees and the video stream of the active speaker (if any), chunked into five-second pieces. Optionally, you can choose to record only audio. Please refer to the next section on configuring audio-only capture through API parameters.
The data-channel folder contains the data messages recorded in this meeting in text format.
The meeting-events folder contains the events that occur while the recording is in progress, including active speaker events, video/content track added/removed events, and attendee join/leave events.
The transcription-messages folder contains files of transcription messages if Chime live transcription is enabled for the SDK meeting.
The video folder contains the content share or video stream from each attendee in mp4 format. The media data are also chunked into five-second pieces. Individual video stream capture is disabled by default. To enable individual video stream capture (which will incur additional cost), or to capture only certain attendees’ video streams, please refer to the next section.

Customizing Source and Artifacts Configuration

The media capture pipeline supports configurable source and artifacts settings. By providing parameters to create-media-capture-pipeline, you can choose between AudioOnly mode and AudioWithActiveSpeakerVideo mode. In addition, you can enable capture of individual video streams, either for all attendees or only for selected attendees. Please check out the API reference for details.

Here is an example that uses the AWS command line to provide customized parameters to the create-media-capture-pipeline API.

We start by generating the CLI skeleton JSON file:

aws chime-sdk-media-pipelines create-media-capture-pipeline --generate-cli-skeleton > mediacapture.json

The JSON file will look like:

{
    "SourceType": "ChimeSdkMeeting",
    "SourceArn": "arn:aws:chime::[ACCOUNT-ID]:meeting:[MEETING-ID]",         
    "SinkType": "S3Bucket",
    "SinkArn": "arn:aws:s3:::[BUCKET-NAME]/[Prefix]",
    "ChimeSdkMeetingConfiguration": {
        "SourceConfiguration": {
            "SelectedVideoStreams": {
                "AttendeeIds": ["[ATTENDEE-ID]"],
                "ExternalUserIds": ["[EXTERNAL-USER-ID]"]
            }
        },
        "ArtifactsConfiguration": {
            "Audio": {
                "MuxType": "AudioOnly | AudioWithActiveSpeakerVideo"
            },
            "Video": {
                "State": "Enabled | Disabled",
                "MuxType": "VideoOnly"
            },
            "Content": {
                "State": "Enabled | Disabled",
                "MuxType": "ContentOnly"
            }
        }
    }
}

ChimeSdkMeetingConfiguration is an optional field to define the capture configuration for the media capture pipeline.

Note:

SourceConfiguration is optional. You can provide a list of attendee IDs or external user IDs to specify which video or content streams are captured.
ArtifactsConfiguration is optional. You can provide the proper mux type to specify how the audio, video, and content streams are captured.
If you have previously contacted AWS to set up a static configuration for media capture, these API parameters will override the previous configuration.

After editing the JSON file with the proper parameters, run the following command to create a media capture pipeline:

aws chime-sdk-media-pipelines create-media-capture-pipeline --cli-input-json file://mediacapture.json
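Instead of editing the skeleton by hand, the ChimeSdkMeetingConfiguration block can also be generated. The Python sketch below is illustrative only: the function name and its arguments are hypothetical, and it covers only the two audio mux types named in this section. Its defaults follow the behavior described above (active speaker video on, individual video off, content share on).

```python
import json

# The two audio modes named in this section.
AUDIO_MUX_TYPES = {"AudioOnly", "AudioWithActiveSpeakerVideo"}


def build_meeting_configuration(
    audio_mux_type: str = "AudioWithActiveSpeakerVideo",
    capture_individual_video: bool = False,
    capture_content: bool = True,
    attendee_ids=None,
    external_user_ids=None,
) -> dict:
    """Build a ChimeSdkMeetingConfiguration block like the JSON shown above."""
    if audio_mux_type not in AUDIO_MUX_TYPES:
        raise ValueError(f"unsupported audio mux type: {audio_mux_type}")

    config = {
        "ArtifactsConfiguration": {
            "Audio": {"MuxType": audio_mux_type},
            "Video": {
                "State": "Enabled" if capture_individual_video else "Disabled",
                "MuxType": "VideoOnly",
            },
            "Content": {
                "State": "Enabled" if capture_content else "Disabled",
                "MuxType": "ContentOnly",
            },
        }
    }

    # SourceConfiguration is optional; add it only when a selection is given.
    selected = {}
    if attendee_ids:
        selected["AttendeeIds"] = list(attendee_ids)
    if external_user_ids:
        selected["ExternalUserIds"] = list(external_user_ids)
    if selected:
        config["SourceConfiguration"] = {"SelectedVideoStreams": selected}
    return config


if __name__ == "__main__":
    # Audio-only capture plus individual video for one (placeholder) attendee.
    config = build_meeting_configuration(
        "AudioOnly", capture_individual_video=True, attendee_ids=["[ATTENDEE-ID]"]
    )
    print(json.dumps({"ChimeSdkMeetingConfiguration": config}, indent=4))
```

The printed block can be merged with the SourceType, SourceArn, SinkType, and SinkArn fields to form the mediacapture.json file.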

Delete Media Capture Pipeline

When you want to stop the media capture pipeline, run the command below. The media-pipeline-id is the MediaPipelineId value returned in the create-media-capture-pipeline response.

aws chime-sdk-media-pipelines delete-media-capture-pipeline \
    --media-pipeline-id 93494fbe-48d1-447c-a923-fe483c0c534d

Once the API is called, the MediaPipeline attendee leaves the meeting and the capture stops. No new captured artifacts are uploaded to the S3 bucket unless a new media capture pipeline is started with the same SinkArn.

Note: Ending the meeting also stops the media capture pipeline.

Notifications

A media capture pipeline reports a status, such as the Initializing status shown in the create response above. Please refer to the AWS developer guide for details of these statuses. With each status change, a notification is sent to the Amazon Simple Notification Service (Amazon SNS) topic or Amazon Simple Queue Service (Amazon SQS) queue specified by the builder when calling the CreateMeeting API, or to the Amazon EventBridge event source under the Amazon Chime service.

Using the Captured Artifacts

Now let’s take a look at the artifacts under each folder.

The “audio” folder contains the mixed audio tracks of the meeting, with active speaker video, chunked into 5-second pieces. The file name is in the format of yyyy-MM-dd-HH-mm-ss-SSS.mp4.

By default, the “video” folder contains the content share streams. Their file names include the attendee ID followed by the suffix “#content”.

The file name format is yyyy-MM-dd-HH-mm-ss-SSS-[ATTENDEE-ID]#content.mp4.

The “meeting-events”, “data-channel” and “transcription-messages” folders contain the event messages, the data channel messages, and the transcriptions (if Chime SDK live transcription is enabled) respectively. These artifacts only appear if an event or message occurs in the meeting during the recording.
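Because the timestamp and the attendee ID are encoded in the file names, a small parser is often enough to order and group the chunks. The following Python sketch assumes only the two name formats described above; the Chunk type and function name are hypothetical, and the sketch makes no assumption about the time zone of the timestamps.

```python
import re
from datetime import datetime
from typing import NamedTuple, Optional

# yyyy-MM-dd-HH-mm-ss-SSS, optionally followed by "-<attendee id>", then ".mp4".
_CHUNK_NAME = re.compile(
    r"^(?P<ts>\d{4}(?:-\d{2}){5}-\d{3})(?:-(?P<attendee>.+))?\.mp4$"
)
_CONTENT_SUFFIX = "#content"


class Chunk(NamedTuple):
    timestamp: datetime
    attendee_id: Optional[str]  # None for files in the audio folder
    is_content: bool            # True when the name carries "#content"


def parse_chunk_name(file_name: str) -> Chunk:
    """Parse a captured chunk file name into its timestamp and attendee ID."""
    match = _CHUNK_NAME.match(file_name)
    if match is None:
        raise ValueError(f"unrecognized chunk file name: {file_name}")
    timestamp = datetime.strptime(match.group("ts"), "%Y-%m-%d-%H-%M-%S-%f")
    attendee = match.group("attendee")
    is_content = attendee is not None and attendee.endswith(_CONTENT_SUFFIX)
    if is_content:
        attendee = attendee[: -len(_CONTENT_SUFFIX)]
    return Chunk(timestamp, attendee, is_content)


if __name__ == "__main__":
    # Hypothetical file names following the formats described above.
    names = [
        "2021-04-30-18-53-59-080.mp4",
        "2021-04-30-18-53-54-076.mp4",
    ]
    for name in sorted(names, key=lambda n: parse_chunk_name(n).timestamp):
        print(name, parse_chunk_name(name))
```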

Demo for combining the meeting artifacts

We have also included an AWS Cloud Development Kit (AWS CDK) deployment that can be used to see these new features in action. This deployment includes the following:

S3 Bucket – Used as sink for storage of the captured media as well as the processed media
Create Lambda – Used to create the meeting and join users to the Amazon Chime SDK Meeting.
Record Lambda – Used to start and stop the media capture pipeline.
Process Lambda – Used after the recording has stopped to process the video from separate chunks into a single mp4 file.
Amazon API Gateway – Used to trigger Lambdas from the client side browser
AWS SDK Layer – Used by the Create and Record Lambdas to have access to Amazon Chime APIs that are not currently available in Lambda
Python Layer – Used by the Process Lambda to assist with ffmpeg
FFMPEG Layer – Used by Process Lambda to concat files together. Static build of FFmpeg/FFprobe for Amazon Linux 2. Bundles FFmpeg 4.1.3. Deployed from serverlessrepo.

The instructions for deploying and using the demo are included in the GitHub repo. This demo uses the AWS SDK versions of createMediaCapturePipeline and deleteMediaCapturePipeline in a Node.js Lambda to create and delete media capture pipelines. A React client is used to create and join an Amazon Chime SDK meeting. Finally, a Python Lambda is used to create a combined media file once recording has stopped.
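To give a sense of what combining chunks with FFmpeg can look like, here is a minimal Python sketch that prepares input for FFmpeg’s concat demuxer. It is an illustration under our own assumptions and not necessarily how the demo’s Process Lambda works; the function names and file paths are hypothetical. It relies on the chunk names sorting chronologically because the timestamps are zero-padded.

```python
def build_concat_list(chunk_paths) -> str:
    """Return the text of an FFmpeg concat-demuxer list file.

    Chunks are sorted by name, which orders them by capture time when they
    share a folder and follow the yyyy-MM-dd-HH-mm-ss-SSS naming scheme.
    """
    lines = []
    for path in sorted(chunk_paths):
        # Inside single quotes, a literal quote is written as '\''.
        escaped = path.replace("'", "'\\''")
        lines.append(f"file '{escaped}'")
    return "\n".join(lines) + "\n"


def build_ffmpeg_command(list_file: str, output_file: str) -> list:
    """Return an FFmpeg command that joins the listed chunks without re-encoding."""
    return [
        "ffmpeg",
        "-f", "concat",   # use the concat demuxer
        "-safe", "0",     # allow absolute paths in the list file
        "-i", list_file,
        "-c", "copy",     # copy streams as-is
        output_file,
    ]


if __name__ == "__main__":
    # Hypothetical local paths of downloaded chunks.
    chunks = [
        "/tmp/audio/2021-04-30-18-53-59-080.mp4",
        "/tmp/audio/2021-04-30-18-53-54-076.mp4",
    ]
    print(build_concat_list(chunks), end="")
    print(" ".join(build_ffmpeg_command("/tmp/list.txt", "/tmp/combined.mp4")))
```

In practice the list text would be written to the list file and the command run with subprocess.run after the chunks have been downloaded from S3.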

Compositing and Concatenation

On August 18, 2022, the Amazon Chime SDK launched the video compositing and concatenation feature through new APIs. You can find detailed API information in the CreateMediaCapturePipeline and CreateMediaConcatenationPipeline API reference documentation.

The basic workflow of a media concatenation pipeline is to fetch all the chunk files that were uploaded by the media capture pipeline and concatenate each stream, such as composited video, audio, or active speaker video. Each stream generates a single file that is uploaded to the designated destination (which can be the same S3 bucket or a different one). To fetch the chunk files from your S3 bucket, the service requires additional permissions on that bucket. Here is an updated S3 bucket policy that supports concatenation:

{
    "Version": "2012-10-17",
    "Id": "AWSChimeMediaCaptureBucketPolicy",
    "Statement": [
        {
            "Sid": "AWSChimeMediaCaptureConcatBucketPolicy",
            "Effect": "Allow",
            "Principal": {
                "Service": ["mediapipelines.chime.amazonaws.com"]
            },
            "Action": [
                "s3:PutObject",
                "s3:PutObjectAcl",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::[Bucket-Name]/*",
                "arn:aws:s3:::[Bucket-Name]"
            ],
            "Condition": {
                "StringEquals": {
                    "aws:SourceAccount": "[Account-Id]"
                },
                "ArnLike": {
                    "aws:SourceArn": "arn:aws:chime:*:[Account-Id]:*"
                }
            }
        }
    ]
}
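Before calling the concatenation API, it can help to confirm that a bucket policy grants the four actions listed above. The Python sketch below is a deliberately shallow check with hypothetical names: it only looks at Allow statements for the media pipelines service principal and compares action names literally, without evaluating resources, conditions, or wildcards.

```python
REQUIRED_ACTIONS = {"s3:PutObject", "s3:PutObjectAcl", "s3:GetObject", "s3:ListBucket"}
SERVICE_PRINCIPAL = "mediapipelines.chime.amazonaws.com"


def _as_list(value):
    """IAM policy fields may hold a single value or a list; normalize to a list."""
    return value if isinstance(value, list) else [value]


def missing_concat_actions(policy: dict) -> set:
    """Return the required actions that the policy does not grant to the service."""
    granted = set()
    for statement in _as_list(policy.get("Statement", [])):
        if statement.get("Effect") != "Allow":
            continue
        principal = statement.get("Principal", {})
        if not isinstance(principal, dict):
            continue
        if SERVICE_PRINCIPAL in _as_list(principal.get("Service", [])):
            granted.update(_as_list(statement.get("Action", [])))
    return REQUIRED_ACTIONS - granted


if __name__ == "__main__":
    # A capture-only policy statement (placeholder bucket name).
    capture_only = {
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "mediapipelines.chime.amazonaws.com"},
                "Action": ["s3:PutObject", "s3:PutObjectAcl"],
                "Resource": "arn:aws:s3:::[Bucket-Name]/*",
            }
        ]
    }
    print(sorted(missing_concat_actions(capture_only)))
```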

With the bucket permissions properly set, we can call the create-media-concatenation-pipeline API. This API can be called any time after the media capture pipeline is created; it does not have to wait for the meeting or the recording to finish. You can also concatenate previously recorded meetings up to 30 days old. Concatenation is an asynchronous workflow that waits for the media capture to complete. Here is a sample request; you can find more detailed information in our developer guide.

{
    "Sources": [
        {
            "Type": "MediaCapturePipeline",
            "MediaCapturePipelineSourceConfiguration": {
                "MediaPipelineArn": "[Media_Pipeline_Arn]",
                "ChimeSdkMeetingConfiguration": {
                    "ArtifactsConfiguration": {
                        "Audio": {
                            "State": "Enabled"
                        },
                        "Video": {
                            "State": "Enabled | Disabled"
                        },
                        "Content": {
                            "State": "Enabled | Disabled"
                        },
                        "DataChannel": {
                            "State": "Enabled | Disabled"
                        },
                        "TranscriptionMessages": {
                            "State": "Enabled | Disabled"
                        },
                        "MeetingEvents": {
                            "State": "Enabled | Disabled"
                        },
                        "CompositedVideo": {
                            "State": "Enabled | Disabled"
                        }
                    }
                }
            }
        }
    ],
    "Sinks": [
        {
            "Type": "S3Bucket",
            "S3BucketSinkConfiguration": {
                "Destination": "arn:aws:s3:::[Bucket_Name]/[Path]"
            }
        }
    ]
}
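The request above can also be built in code, which avoids hand-editing the “Enabled | Disabled” placeholders. The Python sketch below uses hypothetical function and argument names; the dictionary mirrors the sample request, and we assume its top-level keys can be passed as keyword arguments to the create_media_concatenation_pipeline method of the boto3 "chime-sdk-media-pipelines" client.

```python
# Artifact types from the sample request whose state can be toggled.
OPTIONAL_ARTIFACTS = (
    "Video",
    "Content",
    "DataChannel",
    "TranscriptionMessages",
    "MeetingEvents",
    "CompositedVideo",
)


def build_concatenation_request(
    media_pipeline_arn: str,
    destination_arn: str,
    enabled_artifacts=("CompositedVideo",),
) -> dict:
    """Build a concatenation request; Audio is always Enabled, as in the sample."""
    unknown = set(enabled_artifacts) - set(OPTIONAL_ARTIFACTS)
    if unknown:
        raise ValueError(f"unknown artifact types: {sorted(unknown)}")

    artifacts = {"Audio": {"State": "Enabled"}}
    for name in OPTIONAL_ARTIFACTS:
        state = "Enabled" if name in enabled_artifacts else "Disabled"
        artifacts[name] = {"State": state}

    return {
        "Sources": [
            {
                "Type": "MediaCapturePipeline",
                "MediaCapturePipelineSourceConfiguration": {
                    "MediaPipelineArn": media_pipeline_arn,
                    "ChimeSdkMeetingConfiguration": {
                        "ArtifactsConfiguration": artifacts
                    },
                },
            }
        ],
        "Sinks": [
            {
                "Type": "S3Bucket",
                "S3BucketSinkConfiguration": {"Destination": destination_arn},
            }
        ],
    }


if __name__ == "__main__":
    import json

    # Placeholders as in the sample request; replace them with real values.
    request = build_concatenation_request(
        "[Media_Pipeline_Arn]",
        "arn:aws:s3:::[Bucket_Name]/[Path]",
        enabled_artifacts=("CompositedVideo", "MeetingEvents"),
    )
    print(json.dumps(request, indent=4))
```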

Next, let’s walk through how to set up the video compositing and concatenation feature to generate the preferred single video recording file.

Before we get started, we need to clarify these two concepts: What is video compositing? And what is concatenation?

Generally speaking, video compositing is the process of combining visual elements from different sources into a single image. In the context of an Amazon Chime SDK meeting, video compositing combines the attendees’ videos into one video stream, displayed in a layout such as a grid view. Note that this combined video stream recording is still stored in your S3 bucket as 5-second chunks.

Concatenation is the technique of merging all these chunks into a single file. Earlier in this blog, we showed how to use FFmpeg to concatenate the video and audio chunks yourself, but now you can use CreateMediaConcatenationPipeline to merge the artifacts generated by the media capture pipeline.

We will use CreateMediaCapturePipeline request examples to demonstrate the different video compositing layouts.

PresenterOnly is one of the two options with content share enabled. The content share occupies the majority of the screen, with the presenter’s video overlaid on the corner you set with PresenterPosition.

{
    "SourceType": "ChimeSdkMeeting",
    "SourceArn": "arn:aws:chime::<account-id>:meeting:<meeting-id>",
    "SinkType": "S3Bucket",
    "SinkArn": "arn:aws:s3:::<bucket-name>",
    "ChimeSdkMeetingConfiguration": {
        "ArtifactsConfiguration": {
            "Audio": {
                "MuxType": "AudioWithCompositedVideo"
            },
            "Video": {
                "State": "Disabled",
                "MuxType": "VideoOnly"
            },
            "Content": {
                "State": "Disabled",
                "MuxType": "ContentOnly"
            },
            "CompositedVideo": {
                "Layout": "GridView",
                "Resolution": "FHD",
                "GridViewConfiguration": {
                    "ContentShareLayout": "PresenterOnly",
                    "PresenterOnlyConfiguration": {
                        "PresenterPosition": "TopRight"
                    }
                }
            }
        }
    }
}

Feature content share is the second option with content share enabled. The content share occupies the majority of the screen, with the other video tiles shown either horizontally or vertically, depending on the configuration. The four most recently enabled videos are shown, with the presenter’s video in the first tile.

{
    "SourceType": "ChimeSdkMeeting",
    "SourceArn": "arn:aws:chime::<account-id>:meeting:<meeting-id>",
    "SinkType": "S3Bucket",
    "SinkArn": "arn:aws:s3:::<bucket-name>",
    "ChimeSdkMeetingConfiguration": {
        "ArtifactsConfiguration": {
            "Audio": {
                "MuxType": "AudioWithCompositedVideo"
            },
            "Video": {
                "State": "Disabled",
                "MuxType": "VideoOnly"
            },
            "Content": {
                "State": "Disabled",
                "MuxType": "ContentOnly"
            },
            "CompositedVideo": {
                "Layout": "GridView",
                "Resolution": "FHD",
                "GridViewConfiguration": {
                    "ContentShareLayout": "Horizontal"
                }
            }
        }
    }
}

When content is not being shared in either “Presenter only” or “Feature content share,” Grid View is the default layout. In this layout, each video stream occupies a tile and automatically scales based on the number of video streams in the meeting session. Tiles are automatically arranged in rows and columns based on the number of video streams, with a maximum of 25 video tiles. The example JSON below adds a SourceConfiguration block. Note that SourceConfiguration is an optional field; it can use either AttendeeIds or ExternalUserIds to select the source. An ExternalUserId is often the friendly name or another identifier of the attendee that the builder chooses.

{
    "SourceType": "ChimeSdkMeeting",
    "SourceArn": "arn:aws:chime::<account-id>:meeting:<meeting-id>",
    "SinkType": "S3Bucket",
    "SinkArn": "arn:aws:s3:::<bucket-name>",
    "ChimeSdkMeetingConfiguration": {
        "ArtifactsConfiguration": {
            "Audio": {
                "MuxType": "AudioWithCompositedVideo"
            },
            "Video": {
                "State": "Disabled",
                "MuxType": "VideoOnly"
            },
            "Content": {
                "State": "Disabled",
                "MuxType": "ContentOnly"
            },
            "CompositedVideo": {
                "Layout": "GridView",
                "Resolution": "FHD",
                "GridViewConfiguration": {
                    "ContentShareLayout": "Horizontal"
                }
            }
        },
        "SourceConfiguration": {
            "SelectedVideoStreams": { 
                "AttendeeIds": ["attendeeID1","attendeeID2"], 
                "ExternalUserIds": [ "string" ]
            }
        }
    }
}
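The three request examples differ only in the CompositedVideo block, so that block can be generated from a couple of parameters. The Python sketch below is a hedged illustration with hypothetical function names. PresenterOnly and Horizontal appear in the examples above; the value "Vertical" is our assumption, inferred from the text that tiles can be shown horizontally or vertically, so please confirm the accepted values in the API reference.

```python
# "PresenterOnly" and "Horizontal" appear in the examples above;
# "Vertical" is an assumption inferred from the surrounding text.
CONTENT_SHARE_LAYOUTS = {"PresenterOnly", "Horizontal", "Vertical"}


def build_composited_video(
    content_share_layout: str = "Horizontal",
    presenter_position: str = "TopRight",
    resolution: str = "FHD",
) -> dict:
    """Build the CompositedVideo block for the chosen content share layout."""
    if content_share_layout not in CONTENT_SHARE_LAYOUTS:
        raise ValueError(f"unsupported content share layout: {content_share_layout}")

    grid = {"ContentShareLayout": content_share_layout}
    # The presenter position only applies to the PresenterOnly layout.
    if content_share_layout == "PresenterOnly":
        grid["PresenterOnlyConfiguration"] = {"PresenterPosition": presenter_position}

    return {
        "Layout": "GridView",
        "Resolution": resolution,
        "GridViewConfiguration": grid,
    }


def build_compositing_artifacts(**composited_video_options) -> dict:
    """Build the ArtifactsConfiguration used in the compositing examples."""
    return {
        "Audio": {"MuxType": "AudioWithCompositedVideo"},
        "Video": {"State": "Disabled", "MuxType": "VideoOnly"},
        "Content": {"State": "Disabled", "MuxType": "ContentOnly"},
        "CompositedVideo": build_composited_video(**composited_video_options),
    }


if __name__ == "__main__":
    import json

    artifacts = build_compositing_artifacts(content_share_layout="PresenterOnly")
    print(json.dumps({"ArtifactsConfiguration": artifacts}, indent=4))
```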

The video compositing configuration generates the composited video chunks in your designated S3 location under the “CompositedVideo” prefix. To merge these chunks into a single file, you can follow our developer guide on media concatenation pipelines.

Conclusion

In this blog, we walked through the new media capture pipeline feature, which enables builders to capture Amazon Chime SDK meetings without needing to maintain their own container fleet. You also learned about deploying the demo to try out the media capture functionality, as well as combining the media artifacts.

As a next step, we suggest you check out the media capture pipeline developer guide for more details and requirements.
