Ingest and visualize streaming data for games

Game studios are increasingly realizing the value of player and game data. With analytics, you can turn this data into actionable insights to better meet your players’ high expectations for amazing games. This blog post explores two important components of analytics for games—ingestion and visualization—and how they can be implemented using the AWS Game Analytics Pipeline solution.

This solution shows you how to build a serverless data pipeline that helps game developers ingest, store, analyze, and visualize telemetry data generated from games and services. If you’re unfamiliar with this solution and want to learn more about it and how it can be used for different game use cases, check out our blog post, Implement an analytics pipeline for games.

First, consider how data will enter the pipeline. That choice comes before deciding on the extract, transform, and load (ETL) process applied to the game data and on how the data is stored in the data lake. There are two approaches to ingesting telemetry data into the solution:

Direct integration with Amazon Kinesis Data Streams: Choose this option if you want to publish events from your games and services directly to Amazon Kinesis Data Streams (KDS) without the solution API. This is useful if you’re a mobile game developer or are developing a game without an existing backend.

Proxy integration with the solution API events endpoint: Choose this option if you require custom REST proxy integration for ingestion of game events. Applications send events to the events endpoint that synchronously proxies the request to KDS and returns a response to the client.
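As a rough sketch of the proxy approach, a client could build a POST request like the one below. The URL path (`/applications/{application_id}/events`), the `Authorization` header carrying an API key, and the `{"events": [...]}` body shape are assumptions made for illustration; check the solution's developer guide for the actual contract.

```python
import json
import urllib.request


def build_events_request(api_base_url, application_id, api_key, events):
    """Build (but do not send) a POST request for the events endpoint.

    The path, header, and body layout are illustrative assumptions.
    """
    url = f"{api_base_url.rstrip('/')}/applications/{application_id}/events"
    body = json.dumps({"events": events}).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        method="POST",
        headers={"Content-Type": "application/json", "Authorization": api_key},
    )
```

To send the request, pass it to `urllib.request.urlopen(request, timeout=5)` and inspect the response status. Because the endpoint proxies synchronously to KDS, the client learns from the response whether the events were accepted.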

To integrate clients directly with KDS, configure Amazon Cognito identity pools (federated users) to generate temporary AWS credentials that authorize clients to securely interact with AWS services, including KDS. If you choose to integrate directly with KDS, see the Game Analytics Pipeline AWS Developer Guide to review the format required for sending data records to KDS. Alternatively, you can integrate with the solution API events endpoint to abstract your backend implementation from the client with a custom REST interface. This is also helpful if you require additional customization of ingested data.
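The direct integration could look roughly like the following sketch. It assumes the identity pool allows unauthenticated (guest) identities, that `cognito_client` and `kinesis_client` are boto3 `cognito-identity` and `kinesis` clients, and that the record fields shown are placeholders; the required record format is defined in the developer guide, not here.

```python
import json
import uuid
from datetime import datetime, timezone


def temporary_credentials(cognito_client, identity_pool_id):
    """Exchange an identity pool ID for temporary AWS credentials (guest flow)."""
    identity_id = cognito_client.get_id(IdentityPoolId=identity_pool_id)["IdentityId"]
    creds = cognito_client.get_credentials_for_identity(IdentityId=identity_id)["Credentials"]
    # Keys are named so the dict can be passed as boto3.client("kinesis", **result).
    return {
        "aws_access_key_id": creds["AccessKeyId"],
        "aws_secret_access_key": creds["SecretKey"],
        "aws_session_token": creds["SessionToken"],
    }


def build_record(application_id, event_type, event_data):
    """Assemble one telemetry record. Field names are hypothetical."""
    return {
        "application_id": application_id,
        "event": {
            "event_id": str(uuid.uuid4()),
            "event_type": event_type,
            "event_timestamp": int(datetime.now(timezone.utc).timestamp()),
            "event_data": event_data,
        },
    }


def send_event(kinesis_client, stream_name, record):
    """Publish a single record to the stream, partitioned by event ID."""
    return kinesis_client.put_record(
        StreamName=stream_name,
        Data=json.dumps(record).encode("utf-8"),
        PartitionKey=record["event"]["event_id"],
    )
```

In a real client, you would create the Kinesis client with `boto3.client("kinesis", region_name=..., **temporary_credentials(...))` and refresh the credentials before they expire.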

If you operate a backend for your game, such as a game server or other application backend, use Kinesis Agent, Amazon Kinesis Producer Library (KPL), an AWS SDK, or another supported integration to send data directly to KDS from your backend. With this approach, game clients and other applications benefit from reusing an existing client-server connection and authentication in order to send telemetry events to your backend. Your game backend can be configured to ingest events and send them to KDS. Additionally, this approach can be used in situations where you want to minimize changes to client integrations or implement high throughput use cases.

By collecting and aggregating events from multiple clients within your backend, you can increase overall batching and ingestion throughput. You can also perform data enrichment with additional context before sending data to KDS. This can reduce costs, improve security, and simplify client integration for games with existing backends. Many of the existing KDS options provide automated retries and error handling, among other built-in functions. The KPL and AWS SDKs are commonly used to develop custom data producers. The Kinesis Agent can be deployed onto your game servers to process telemetry events in log files and send them to KDS.
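A backend that uses an AWS SDK rather than the KPL has to handle batching and retries itself. The sketch below shows one way to do that with the Kinesis `PutRecords` call: it enriches events with server-side context, sends them in chunks, and resends only the entries that failed. The `player_id` field, the enrichment context, and the retry count are hypothetical choices, and `kinesis_client` is assumed to be a boto3 `kinesis` client.

```python
import json

MAX_RECORDS_PER_CALL = 500  # PutRecords accepts at most 500 records per request


def enrich(event, context):
    """Add server-side context (for example region or build) to a client event."""
    return {**event, **context}


def send_batch(kinesis_client, stream_name, events, max_attempts=3):
    """Send events with PutRecords and return the events that could not be sent."""
    pending = list(events)
    for _ in range(max_attempts):
        if not pending:
            break
        failed = []
        for start in range(0, len(pending), MAX_RECORDS_PER_CALL):
            chunk = pending[start:start + MAX_RECORDS_PER_CALL]
            entries = [
                {"Data": json.dumps(e).encode("utf-8"), "PartitionKey": str(e["player_id"])}
                for e in chunk
            ]
            response = kinesis_client.put_records(StreamName=stream_name, Records=entries)
            # Results come back in request order; failed entries carry an ErrorCode.
            for event, result in zip(chunk, response["Records"]):
                if "ErrorCode" in result:
                    failed.append(event)
        pending = failed
    return pending
```

A production version would normally wait with exponential backoff between attempts and also respect the per-request payload size limit, both of which this sketch omits.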

Now that you understand the different ways to ingest data, let’s explore how ingestion is done on the backend.

The following is an architecture diagram that focuses on how ingestion is done with the Game Analytics Pipeline:

Figure 1: Reference architecture for data ingestion using the AWS Game Analytics Pipeline solution

Data is ingested either via direct integration with KDS or through the solution’s REST API. Kinesis Data Firehose then consumes the event data in real time and invokes an AWS Lambda function with a batch of input event records. The function validates, transforms, and processes the game events, then returns the transformed records to Kinesis Data Firehose for delivery to Amazon Simple Storage Service (Amazon S3). The function can be modified to add further data processing as needed. Kinesis Data Firehose uses the AWS Glue Data Catalog to validate the schema, then compresses the data and loads it in Apache Parquet file format for optimized query performance.
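To make the preprocessing step concrete, here is a minimal sketch of a Kinesis Data Firehose transformation handler. It is not the solution's actual function: the required fields and the lowercase normalization are hypothetical. The request and response shapes (`records`, `recordId`, base64-encoded `data`, and a `result` of `Ok` or `ProcessingFailed`) follow the Firehose data transformation contract.

```python
import base64
import json

REQUIRED_FIELDS = ("event_id", "event_type", "event_timestamp")  # hypothetical schema


def transform(game_event):
    """Example transformation: normalize the event type to lowercase."""
    return {**game_event, "event_type": game_event["event_type"].lower()}


def lambda_handler(event, context):
    output = []
    for record in event["records"]:
        try:
            game_event = json.loads(base64.b64decode(record["data"]))
            if not isinstance(game_event, dict) or any(
                field not in game_event for field in REQUIRED_FIELDS
            ):
                raise ValueError("missing required fields")
            # Newline-delimited JSON keeps records separable downstream.
            line = json.dumps(transform(game_event)) + "\n"
            output.append({
                "recordId": record["recordId"],
                "result": "Ok",
                "data": base64.b64encode(line.encode("utf-8")).decode("ascii"),
            })
        except (ValueError, TypeError, AttributeError):
            # Invalid records are returned unchanged and marked as failed.
            output.append({
                "recordId": record["recordId"],
                "result": "ProcessingFailed",
                "data": record["data"],
            })
    return {"records": output}
```

Every input record must appear in the response with the same `recordId`, which is why invalid records are marked as failed rather than silently omitted.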

Once the data is ingested, you can query and visualize it for both batch and real-time insights. The following diagram illustrates how you can achieve insights for batched data:

Figure 2: Reference architecture for real-time data visualization using AWS

All of your data is stored in an Amazon S3 data lake and collected over time. Storing your data in a centralized repository makes it easy to get insights from it as it grows and becomes historical. The AWS Glue Data Catalog is used to provide metadata storage of game events. It also integrates nicely with other AWS services, in addition to third-party tools, giving you the flexibility to extend this pipeline. Amazon Athena enables you to run queries and reports on the game events data stored in Amazon S3. The solution also comes with a set of pre-built, saved queries that enable you to explore game events data. These saved queries have been built for common use cases, like querying for daily active users (DAU), understanding level completion rates, and discovering how many new players you had last month.
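As an illustration of the kind of query involved, the sketch below builds a DAU query and submits it to Athena. It is not one of the solution's saved queries: the table name and the `event_timestamp` (epoch seconds) and `user_id` columns are hypothetical, and `athena_client` is assumed to be a boto3 `athena` client.

```python
def daily_active_users_query(table, days=30):
    """Return an Athena SQL string counting distinct users per day."""
    if not table.replace("_", "").isalnum():
        raise ValueError("table name must be alphanumeric or underscores")
    return (
        "SELECT date(from_unixtime(event_timestamp)) AS event_date, "
        "COUNT(DISTINCT user_id) AS daily_active_users "
        f"FROM {table} "
        f"WHERE from_unixtime(event_timestamp) >= current_timestamp - interval '{int(days)}' day "
        "GROUP BY 1 ORDER BY 1"
    )


def start_dau_query(athena_client, database, table, output_location):
    """Submit the query and return its execution ID for later polling."""
    response = athena_client.start_query_execution(
        QueryString=daily_active_users_query(table),
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )
    return response["QueryExecutionId"]
```

Athena runs queries asynchronously, so a caller would poll `get_query_execution` with the returned ID until the query finishes, then read the results.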

Finally, Amazon QuickSight is a business intelligence service that enables you to create, configure, and customize dashboards for deep data exploration. The following screenshot shows a custom reporting dashboard that you can set up using data from the Game Analytics Pipeline, including insights like level completion rate trend, tutorial progression, and more:

Figure 3: Screenshot of the Game Analytics Pipeline Reporting Dashboard

Now that you’ve generated reports and visualizations from your batch-processed data, you can gain insights from real-time data. The following diagram illustrates how this can be achieved:

Figure 4: Reference architecture for real-time analytics on AWS

The Game Analytics Pipeline solution provides a real-time streaming analytics application. This enables developers to use raw application event data to generate custom metrics and identify key performance indicators (KPIs). With this solution, you can also filter events using custom SQL, track usage behavior, and aggregate metrics to power live dashboards. The application uses Amazon Kinesis Data Analytics for SQL Applications to process Kinesis stream data and an AWS Lambda function to process analytics outputs. The AWS Lambda function publishes metrics to Amazon CloudWatch for metrics storage and monitoring, and it is integrated with Amazon Simple Notification Service (Amazon SNS) for notifications and alerts.
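The metric publishing step might look something like this sketch. It assumes the analytics output has already been decoded into rows with hypothetical `metric_name`, `value`, `timestamp` (epoch seconds), and optional `dimensions` keys, and that `cloudwatch_client` is a boto3 `cloudwatch` client. The batch size of 20 is a conservative choice, not a documented requirement of the solution.

```python
from datetime import datetime, timezone


def to_metric_data(rows):
    """Convert decoded analytics rows into CloudWatch MetricData entries."""
    return [
        {
            "MetricName": row["metric_name"],
            "Dimensions": [
                {"Name": name, "Value": str(value)}
                for name, value in sorted(row.get("dimensions", {}).items())
            ],
            "Timestamp": datetime.fromtimestamp(row["timestamp"], tz=timezone.utc),
            "Value": float(row["value"]),
            "Unit": "Count",
        }
        for row in rows
    ]


def publish_metrics(cloudwatch_client, namespace, rows, batch_size=20):
    """Publish rows in small batches and return how many metrics were sent."""
    data = to_metric_data(rows)
    for start in range(0, len(data), batch_size):
        cloudwatch_client.put_metric_data(
            Namespace=namespace, MetricData=data[start:start + batch_size]
        )
    return len(data)
```

Using a custom namespace keeps game metrics separate from the AWS resource metrics that services publish on their own.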

The solution uses Amazon CloudWatch to monitor and log resources and to store real-time metrics from Kinesis Data Analytics. The solution also deploys CloudWatch alarms that track the usage of AWS resources and alert subscribed administrators when issues are detected. By sending metrics to CloudWatch, the solution can rely on a single storage location for both real-time and AWS resource metrics.
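For a sense of what such an alarm involves, the sketch below defines one that notifies an SNS topic when writes to a Kinesis data stream are throttled. This is an illustrative example, not necessarily one of the alarms the solution deploys; the alarm name, period, and threshold are assumptions, and `cloudwatch_client` is assumed to be a boto3 `cloudwatch` client.

```python
def throttling_alarm(stream_name, sns_topic_arn):
    """Return put_metric_alarm arguments for a write-throttling alarm."""
    return {
        "AlarmName": f"{stream_name}-write-throttled",
        "Namespace": "AWS/Kinesis",
        "MetricName": "WriteProvisionedThroughputExceeded",
        "Dimensions": [{"Name": "StreamName", "Value": stream_name}],
        "Statistic": "Sum",
        "Period": 60,                 # evaluate one-minute windows
        "EvaluationPeriods": 5,       # alarm after five consecutive breaches
        "Threshold": 0,
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],  # notify subscribed administrators
    }


def create_alarm(cloudwatch_client, stream_name, sns_topic_arn):
    """Create or update the alarm in CloudWatch."""
    cloudwatch_client.put_metric_alarm(**throttling_alarm(stream_name, sns_topic_arn))
```

Treating missing data as not breaching avoids false alerts during quiet periods when the stream receives no writes.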

The following is a screenshot of the Game Analytics Pipeline operational health dashboard that’s deployed with the solution:

Figure 5: Screenshot of Game Analytics Pipeline operational health dashboard

This dashboard pulls metrics from AWS resources used to power the pipeline. You can send your own custom metrics from the Kinesis Data Analytics application and create your own custom dashboards. Learn how to create a custom dashboard by taking our Visualizing with Amazon QuickSight course.

Read the Implement an analytics pipeline for games blog post.

About the Authors

Gena Gizzi is a Games SA for AWS located in Southern California. She helps games customers build, launch, and scale their games and businesses on AWS. She has a focus on analytics for games and helps customers gain insights from their data. Some of Gena’s favorite games include Breath of the Wild, Pokémon, and Minecraft.

Greg Cheng is a Solutions Architect at AWS where he helps customers navigate the AWS platform throughout their cloud journey by providing well-architected best practices and guidance. He is focused on analytics and serverless technologies, and his goal is to enable games customers to provide seamless multiplayer experiences with low latency and no downtime. In his spare time, he enjoys playing Super Smash Bros Melee, competitive online multiplayer games, boardgames, cooking, and reading.

Dominic Mills is a Games Solutions Architect at AWS. He helps video game developers of all sizes utilize AWS to develop, build and deploy games, with a particular focus on analytics and game production in the cloud. He has a deep obsession with dungeon crawlers and any game with roguelite elements.

AWS for Games Blog
