Track and visualize streaming beacon data on AWS

Capturing viewers’ experience when delivering valuable media content is more important than ever, as more streaming service providers compete for consumer viewership. Tracking quality of service helps providers keep customers. Although data from the edge, such as content delivery network (CDN) and origin logs, is extremely valuable, it does not paint a complete picture. Client-side beaconing (analytical data sourced from the end user’s application) provides detailed, near-real-time information about all aspects of the playback session. This offers valuable insight into performance, viewer behavior, and overall satisfaction. One example of the information collected is data about stalls or interruptions during video playback.

Many third-party solutions exist to capture this data using proprietary plugins, providing dashboards and logs to analyze beacon data. While these packaged solutions are right for some, others are interested in building their own solutions to manage cost and features.

Using only a few managed services within Amazon Web Services (AWS), you can capture and analyze player beacons in near real-time for a fraction of the cost of a packaged solution, while gaining ownership over customization options.

To deploy the scalable cloud-based beaconing solution described in this blog post, you need to have access to the AWS Management Console (a web interface for accessing and managing your AWS accounts) with permission to use the following services:

Amazon API Gateway – a fully managed service that makes it easy for developers to create, publish, maintain, monitor, and secure APIs at any scale.
Amazon Kinesis – a set of AWS streaming services that makes it easy to collect, process, and analyze near-real-time streaming data so that you can get timely insights and react quickly to new information.
- Amazon Kinesis Data Streams – a massively scalable and durable near real-time data streaming service that can continuously capture gigabytes of data per second from sources such as website clickstreams. The data collected is available in milliseconds to facilitate near real-time analytics use cases such as dashboards, anomaly detection, and more.
- Amazon Kinesis Data Analytics – an easy way to transform and analyze streaming data in near real time using Apache Flink, an open-source framework and engine for processing data streams. Amazon Kinesis Data Analytics reduces the complexity of building, managing, and integrating Apache Flink applications for use alongside AWS services. It takes care of everything required to run streaming applications continually and scales automatically to match the volume and throughput of incoming data.
Amazon Timestream – a fast, scalable, and serverless time series database service for Internet of Things (IoT) and operational applications. It makes it easy to store and analyze trillions of events per day.
Amazon Managed Grafana – a fully managed service for open-source Grafana, recently launched in 10 AWS Regions. Enhanced with enterprise capabilities, Amazon Managed Grafana makes it easy for you to visualize and analyze your operational data at scale. In this post, we use it to query Amazon Timestream database tables and display beaconing data in a Grafana dashboard.

Cost

The following shows the estimated cost of deploying and running these services for 1 hour in your AWS account:

Amazon API Gateway—no cost for the first one million calls
Amazon Kinesis Data Streams—$0.03
Amazon Kinesis Data Analytics application—$0.22
Amazon Timestream database—$0.55
Amazon Managed Grafana workspace—no cost (90-day free trial, up to five users)
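
As a quick sanity check, the hourly figures above can be added up. The following is a minimal sketch that only sums the listed estimates; it assumes that API Gateway usage stays within the first one million calls and that the Grafana workspace is still in its free trial. The names HOURLY_COST_USD and total_hourly_cost are illustrative and not part of any AWS tool.

```python
from decimal import Decimal

# Hourly estimates copied from the list above (USD).
# API Gateway and Managed Grafana are listed as no-cost for this test.
HOURLY_COST_USD = {
    "Amazon API Gateway": Decimal("0.00"),
    "Amazon Kinesis Data Streams": Decimal("0.03"),
    "Amazon Kinesis Data Analytics": Decimal("0.22"),
    "Amazon Timestream": Decimal("0.55"),
    "Amazon Managed Grafana": Decimal("0.00"),
}


def total_hourly_cost(costs):
    """Return the sum of the per-service hourly estimates."""
    return sum(costs.values(), Decimal("0"))


if __name__ == "__main__":
    print(f"Estimated total for 1 hour: ${total_hourly_cost(HOURLY_COST_USD)}")
```

Under these assumptions, the estimate for one hour comes to $0.80.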

Regions

The chosen Region(s) must support all the preceding services. Currently, these Regions support all services:

US East (Ohio)
US East (N. Virginia)
US West (Oregon)
Europe (Frankfurt)
Europe (Ireland)

Also required is a local system (Windows, Mac, or Linux) capable of running Git, Java, Apache Maven, and Apache Flink to build the Apache Flink application.

Let’s get started!

Step 1: Create an Amazon Kinesis data stream

a) Open the Amazon Kinesis Console and click Data streams. Click Create data stream.

b) For the Data stream name, type “beacon-blog”.

c) For the Number of open shards, specify “1”.
(For the purpose of this post, we use a single shard. A shard calculator is included in the interface to help determine how many shards a production use case requires.)

d) Click Create data stream.
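
For a production workload, the shard count in Step 1c can be roughed out before opening the shard calculator. The sketch below assumes the provisioned-mode write limits of 1,000 records per second and about 1 MB per second per shard; the function name, the traffic figures, and the average beacon size are hypothetical values chosen for illustration.

```python
import math

# Per-shard write limits for Kinesis Data Streams in provisioned mode.
MAX_RECORDS_PER_SEC_PER_SHARD = 1_000
MAX_BYTES_PER_SEC_PER_SHARD = 1_000_000  # about 1 MB/s, rounded conservatively


def estimate_shards(concurrent_viewers, beacons_per_viewer_per_min, avg_beacon_bytes):
    """Estimate the shards needed to ingest beacons from all viewers."""
    records_per_sec = concurrent_viewers * beacons_per_viewer_per_min / 60
    bytes_per_sec = records_per_sec * avg_beacon_bytes
    by_records = math.ceil(records_per_sec / MAX_RECORDS_PER_SEC_PER_SHARD)
    by_bytes = math.ceil(bytes_per_sec / MAX_BYTES_PER_SEC_PER_SHARD)
    # The stricter of the two limits decides; a stream needs at least one shard.
    return max(1, by_records, by_bytes)


if __name__ == "__main__":
    # Hypothetical load: 50,000 viewers, 6 beacons per minute each, 500 bytes per beacon.
    print(estimate_shards(50_000, 6, 500))
```

This only covers the write side. Read throughput, retries, and uneven partition keys can push the real number higher, so treat the result as a starting point.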

Step 2: Create an Amazon Timestream database and an Apache Flink application

The Apache Flink application reads data from the Amazon Kinesis Data Stream and writes that data into an Amazon Timestream database table.

The tools required are Git, Java, Apache Maven, and Apache Flink. You can install these tools on Windows, Linux, or Mac. The following steps are written for macOS; the steps for Linux and Windows differ slightly.

a) Create a database in Amazon Timestream with the name “beacon-blog-db” following the instructions described in Create a database.

b) Create a table in Amazon Timestream with the name “beacon-blog-table” using the instructions described in Create a table. Set Memory store retention to one (1) hour and Magnetic Store retention to one (1) day.

c) On your local system, clone the amazon-timestream-tools GitHub repository by executing the following command:

git clone https://github.com/awslabs/amazon-timestream-tools.git

(Note that according to amazon-timestream-tools/integrations/flink_connector/README.md, Java Development Kit 11 and Apache Maven are required on your local system.)

d) Using your favorite editor, open the following Java source file:

amazon-timestream-tools/integrations/flink_connector/src/main/java/com/amazonaws/services/kinesisanalytics/StreamingJob.java

Then, modify the file in four (4) places:

(line 56) Set DEFAULT_STREAM_NAME to “beacon-blog”

(line 105) If your region is not us-east-1, replace “us-east-1” with your region

(line 106) Replace “kdaflink” with “beacon-blog-db”

(line 107) Replace “kinesisdata1” with “beacon-blog-table”

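e) Set the JAVA_HOME environment variable to point to the location of Java Development Kit 11 on your local system. On macOS, the following command selects the default JDK, so if more than one JDK is installed, confirm that it resolves to JDK 11.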

export JAVA_HOME=$(/usr/libexec/java_home)

f) Build the application by executing the following commands. The resulting jar file is target/timestreamsink-1.0-SNAPSHOT.jar.

cd amazon-timestream-tools/integrations/flink_connector

mvn clean compile

mvn package

g) Upload the Amazon Kinesis Data Analytics application binary, target/timestreamsink-1.0-SNAPSHOT.jar, to an Amazon S3 bucket by following the instructions in Upload the Apache Flink Streaming Java Code.

h) Open the Amazon Kinesis Analytics Console and click Create application.

i) Provide the application details as follows:

Runtime: Apache Flink – Streaming application

Apache Flink version: Apache Flink 1.13

Application name: “beacon-blog”

Description: “beacon blog app”

Access to application resources: Create/Update IAM Role kinesis-analytics-beacon-blog-<region> with required policies

Template for application settings: development

j) Click Create application.

k) Modify the role created in Step 2i so that it has access to Amazon Kinesis and Amazon Timestream. Navigate to the AWS IAM Console. Click Roles. Click the kinesis-analytics-beacon-blog-<region> role. Click Attach policies and select the following AWS managed policies:

- AmazonKinesisReadOnlyAccess
- AmazonTimestreamFullAccess

l) In the Amazon Kinesis Analytics Console, click the beacon-blog application. Click Configure:

Amazon S3 bucket: Click Browse and select the circle next to the Amazon S3 bucket containing your jar file from Step 2g. Click Choose.

Path to S3 object: “timestreamsink-1.0-SNAPSHOT.jar”

Access to application resources: Select Create/update IAM role kinesis-analytics-beacon-blog-<region> with required policies.

Leave all other defaults, and click Save Changes.

m) In the Amazon Kinesis Analytics Console, select the beacon-blog application and click Run.

After a few minutes, the status changes to show that the beacon-blog app has successfully started.
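
The four edits to StreamingJob.java in Step 2d can also be scripted, which helps when the repository is cloned repeatedly. The sketch below works on the text of the file. It assumes the constant is declared in the form DEFAULT_STREAM_NAME = "<value>", and it replaces every occurrence of the quoted defaults rather than only the lines listed in Step 2d, so review the result before building. The function name patch_streaming_job is hypothetical.

```python
import re


def patch_streaming_job(source, region):
    """Apply the Step 2d edits to the text of StreamingJob.java."""
    # Assumption: the constant is declared as DEFAULT_STREAM_NAME = "<value>".
    patched = re.sub(
        r'(DEFAULT_STREAM_NAME\s*=\s*)"[^"]*"',
        r'\1"beacon-blog"',
        source,
    )
    # Replace the quoted defaults named in Step 2d (all occurrences).
    patched = patched.replace('"us-east-1"', f'"{region}"')
    patched = patched.replace('"kdaflink"', '"beacon-blog-db"')
    patched = patched.replace('"kinesisdata1"', '"beacon-blog-table"')
    return patched
```

To use it, read the file with pathlib.Path(...).read_text(), pass the text through patch_streaming_job, and write the result back with write_text().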

Step 3: Create an AWS IAM role and policy for Amazon API Gateway and Amazon Kinesis

In this step, we use AWS Identity and Access Management (AWS IAM), which lets you securely manage access to AWS services and resources, to create an AWS IAM role and policy that allow Amazon API Gateway to post to Amazon Kinesis.

a) Open the AWS IAM Console, click Policies, and click Create Policy. Paste the following policy, which should be further restricted for production use. Name the policy “beacon-blog-kinesis-policy”.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": "kinesis:*",
            "Resource": "*"
        }
    ]
}

b) In the AWS IAM Console, click Roles, and click Create role.

c) Select API Gateway. Click Next: Permissions, click Next: Tags, and click Next: Review.

d) For the Role name, type “beacon-blog-APIGatewayKinesis-role” and click Create role.

e) Search for “beacon-blog-APIGatewayKinesis-role”. Click the role, and click Attach policies.

f) Search for “beacon-blog-kinesis-policy”. Select the checkbox next to the policy, and click Attach policy.

g) Under the role summary, note the Role ARN. We use it in Step 4.
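
The policy in Step 3a allows every Kinesis action on every resource. One way to restrict it for production is sketched below: the policy is limited to write actions on the single beacon-blog stream. This assumes that the API imported in Step 4 only needs kinesis:PutRecord or kinesis:PutRecords, which the post does not confirm, so check the integration before adopting it. The function name, the Sid, and the account ID are placeholders.

```python
import json


def scoped_kinesis_policy(region, account_id, stream_name="beacon-blog"):
    """Build an IAM policy document limited to writes on one stream."""
    stream_arn = f"arn:aws:kinesis:{region}:{account_id}:stream/{stream_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowBeaconWrites",
                "Effect": "Allow",
                # Assumption: the API Gateway integration only writes records.
                "Action": ["kinesis:PutRecord", "kinesis:PutRecords"],
                "Resource": stream_arn,
            }
        ],
    }


if __name__ == "__main__":
    # 123456789012 is a placeholder account ID.
    print(json.dumps(scoped_kinesis_policy("us-east-1", "123456789012"), indent=4))
```

The printed JSON can be pasted into the policy editor in place of the broad policy shown in Step 3a.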

Step 4: Create an API Gateway

a) In the Amazon API Gateway Console, click Create API.

b) Under the REST API type, click Import.

c) Download http://d2mee59kmfnfxw.cloudfront.net/2021/OCT21_2021/api-gw-swagger-export.json and save locally.

d) Under Import from Swagger or Open API 3, click Select Swagger File, and choose the api-gw-swagger-export.json file on your local system.

e) The contents of the JSON file appear in the Amazon API Gateway Console window. Replace the “credentials” value on line 30 with the ARN of the IAM Role created in Step 3d. If your region is not us-east-1, replace “us-east-1” in line 32 with your region.

f) Choose Endpoint type “Edge Optimized”, and click Import.

g) In the Resources menu, click POST.

h) On the right, click Integration Request.

i) Scroll down and expand Mapping Templates.

j) Click application/json. The Mapping Template appears. This template converts the data submitted from the webpage in Step 5 into a JSON format required by Amazon Kinesis and Amazon Timestream.

k) Once you have reviewed the Mapping Template, click Cancel.

l) Click Actions, and choose Deploy API.

m) In the Deployment stage pull-down, create a new stage named “Default”. Click Deploy.

n) Note the Invoke URL that is displayed. You use this in Step 5.
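
To make the role of the Mapping Template in Step 4j more concrete, the sketch below shows roughly what such a transformation produces: a Kinesis PutRecord request with a stream name, base64-encoded data, and a partition key. It is a Python illustration, not the template from the imported file, which may differ. The beacon field names (session, event) and the choice of session as partition key are assumptions.

```python
import base64
import json


def to_put_record_request(beacon, stream_name="beacon-blog"):
    """Wrap a beacon dictionary in the shape of a Kinesis PutRecord request."""
    data = json.dumps(beacon, separators=(",", ":")).encode("utf-8")
    return {
        "StreamName": stream_name,
        # Kinesis expects the record payload as a base64-encoded blob.
        "Data": base64.b64encode(data).decode("ascii"),
        # Assumption: records are partitioned by the viewer session.
        "PartitionKey": str(beacon["session"]),
    }


if __name__ == "__main__":
    example = {"session": "66454545556", "event": "paused"}  # hypothetical fields
    print(json.dumps(to_put_record_request(example), indent=4))
```

Partitioning by session keeps one viewer's events on the same shard, which preserves their order within that shard.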

Step 5: Generate data

The HTML page that hosts the video player generates the data. For details on how this data is transformed for use in Amazon Timestream, see Step 4j. The process differs depending on the player used. For most players, it is as simple as writing some JavaScript code that attaches to player events. There are usually two sets of events to monitor.

HTML5 Video Events:
https://developer.mozilla.org/en-US/docs/Web/HTML/Element/video
Player-Specific Video Events (This example uses the open-source HLS.js player):
https://github.com/video-dev/hls.js/blob/master/docs/API.md

To generate beacon data, follow these steps:

a) In a web browser, navigate to https://d2mee59kmfnfxw.cloudfront.net/2021/OCT21_2021/web2/beacon.html?session=66454545556. Note that the session query string populates a simulated user session variable. The session number shown here (66454545556) matches the session identifier used in the Grafana dashboard queries provided in Step 6.

b) At the top of the webpage, for API Endpoint, paste the Invoke URL from Step 4n.

Note: Do not press Enter/Return; changes take effect immediately.

c) View the video, and click the pause, volume change, mute, and seek buttons in the on-screen player multiple times. View the near-real-time events and notice that each beacon event is sent to the API invoke URL.
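
The sample page sends beacons from the browser, but the same endpoint can be exercised from a script when testing the pipeline without a player. The sketch below builds, but does not send, an HTTP POST request with a JSON body. The payload fields (session, event, position, timestamp_ms) are hypothetical and may not match what the Mapping Template expects, and the URL shown is a placeholder for the Invoke URL from Step 4n.

```python
import json
import time
import urllib.request


def build_beacon_request(invoke_url, session, event, position_sec):
    """Build a POST request carrying one beacon event as JSON."""
    payload = {
        "session": session,
        "event": event,
        "position": position_sec,
        "timestamp_ms": int(time.time() * 1000),
    }
    body = json.dumps(payload).encode("utf-8")
    return urllib.request.Request(
        invoke_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


if __name__ == "__main__":
    # Placeholder URL; replace it with the Invoke URL from Step 4n.
    req = build_beacon_request(
        "https://example.invalid/Default", "66454545556", "paused", 12.5
    )
    print(req.get_method(), req.full_url)
    # Calling urllib.request.urlopen(req) would send the beacon.
```

Keeping request construction separate from sending makes it easy to inspect the payload before any traffic reaches the API.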

Step 6: Visualize data

a) Navigate to the Amazon Managed Service for Grafana Console.

b) Click Create workspace.

c) For Workspace Name, type “beacon-blog”. Click Next.

d) Under Authentication access, check the box next to AWS Single Sign-On (AWS SSO).

- AWS SSO lets you centrally manage access to multiple AWS accounts and applications. If you have not activated AWS SSO in your AWS account, a message prompts you to do so. Click the Create user button and provide an email and name. AWS SSO activates upon user creation.
- If your AWS account already has AWS SSO activated, go to the AWS SSO Console in a new tab. Click Users, then click Add user. Provide an email, username, and the required details. Click Next: Groups. Do not add the user to any groups. Click Add user.

e) For Permission type, choose the default Service managed. Click Next.

f) For IAM Permission Access Settings, choose the default Current account.

g) Under Data Sources, check the box next to Amazon Timestream.

h) We don’t use any Notification channels. Click Next.

i) Review the workspace details, and click Create workspace.

j) Log in to your email and locate the AWS Single Sign-On invitation. In the email message, click Accept invitation. Follow the prompts to set a password.

k) In the AWS Console window, scroll down to the Authentication section. Click Assign new user or group. Select the check box next to the user, and click Assign users and groups.

l) Select the checkbox next to the user’s name, and click Make Admin.

m) In the Amazon Managed Service for Grafana Console, select All workspaces in the left-side menu.

n) The Grafana workspace URL displays next to the beacon-blog workspace name. Click the URL, and a new tab opens in your browser.

o) Log in with the user and password that you created in Step 6d.

p) In the left side menu, click the AWS logo, and click AWS Services.

q) Click Timestream. Under Default region, select your AWS region. Click Add data source.

r) Timestream now appears under Provisioned Data Sources. Click Go to Settings (adjacent to Timestream). Under Default Query Macros, select beacon-blog-db and beacon-blog-table. Click Save & test.

s) To import a dashboard, click the + (plus) icon in the left menu bar, and select Import.

t) Download the JSON dashboard file http://d2mee59kmfnfxw.cloudfront.net/2021/OCT21_2021/beacon-dashboard.json and save it to your local system. If you are using a region other than us-east-1, modify lines 41, 144, 248, and 352 in the .json file and replace “us-east-1” with your region.

u) Click Upload JSON file and select the beacon-dashboard.json file that was downloaded.

v) The dashboard name and unique identifier are displayed. Click Import.

w) If desired, adjust the time-frame pull-down.

The Grafana dashboard features four panels. Each panel shows elapsed time on the x-axis (time 0 represents when the video begins playing) and the number of beacon events on the y-axis. Click on a panel title, and choose Edit to view the panel details, including the associated Amazon Timestream query. This is an example of what you can graph—you can add additional metrics.

The points on the beacon:started panel (upper left) represent the number of viewers who clicked the play button in the video player. The points on the beacon:volume_changed panel (lower left) represent the number of viewers who clicked the volume-up, volume-down, or mute buttons in the video player. The points on the beacon:paused panel (upper right) represent the number of viewers who clicked the pause button in the video player. The points on the beacon:variant_loaded panel (lower right) represent the first media playlist being loaded after the viewer clicks the play button in the video player.
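
When adding panels of your own, a query that counts one type of beacon event per time bin is a reasonable starting point. The sketch below builds such a query as a string. It is not the query used by the imported dashboard. It assumes the event type is stored in the built-in measure_name column and that the table has a dimension named session; both are guesses about the schema, so check the queries in the existing panels first. The function name is hypothetical.

```python
def events_per_bin_query(event_name, session, bin_size="10s", lookback="1h"):
    """Build a Timestream query counting one beacon event type per time bin."""
    # Values are interpolated directly, so pass trusted constants only.
    return (
        f"SELECT bin(time, {bin_size}) AS bin_start, count(*) AS events\n"
        f'FROM "beacon-blog-db"."beacon-blog-table"\n'
        f"WHERE measure_name = '{event_name}'\n"   # assumed: event type column
        f"  AND session = '{session}'\n"            # assumed: session dimension
        f"  AND time > ago({lookback})\n"
        f"GROUP BY bin(time, {bin_size})\n"
        f"ORDER BY bin_start"
    )


if __name__ == "__main__":
    print(events_per_bin_query("beacon:paused", "66454545556"))
```

The bin size controls how coarse the graph is: larger bins smooth out bursts of events, while smaller bins show individual interactions.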

Step 7: Clean up

When testing is complete, avoid further charges by cleaning up the resources you created: delete the API created in Amazon API Gateway, the Amazon Kinesis data stream, the Apache Flink application, the Amazon Managed Grafana workspace, and the Amazon Timestream tables and database.

Conclusion

In this post, we created an AWS workflow to track beacon data from a simulated video feed and visualize the data. You can use this data for many purposes. Common use cases are to maintain quality of service for viewers, optimize encoding bitrates based on real-world consumption, diagnose viewer and regional issues quickly, and intelligently direct traffic.

A production workflow would likely reside in multiple Regions and would require optimization of Amazon Kinesis, Amazon API Gateway, and other components to maintain high availability and scalability. Note that the client-side code is an example only. In production, it would likely exist as a hardened JavaScript library.

As always, AWS Professional Services—our global team of experts on using AWS—and our AWS Partners are available to assist you in implementing a media analytics solution.

If you have questions, feedback, or would like to get involved in discussions with other community members, visit the AWS Developer Forums: Media Services.

AWS for M&E Blog