Game Server Observability with Amazon GameLift and Amazon CloudWatch
When you’re running game servers to host session-based games for players around the world, you need as much visibility as possible into what’s happening inside those game server processes. That means collecting metrics and logs in real time, and being able to analyze this data to investigate issues and find opportunities for performance improvement.
Amazon GameLift is a dedicated game server hosting solution that deploys, operates, and scales cloud servers for multiplayer games. GameLift hosting allows you to focus on your game, while GameLift manages the deployment and scaling of your game servers, as well as other tasks such as game session placement and matchmaking. GameLift natively provides you rich metrics on the GameLift resources such as fleets, queues, game sessions, player sessions, and matchmaking. These metrics are extremely useful when visualizing the performance of your game, as well as for improving your configuration to make sure players are matched effectively, and that you use the right amount of resources. We have another blog post for details on how to optimize GameLift fleets with CloudWatch.
On top of these metrics, however, you want visibility into the game server processes themselves. Three typical ways of achieving this are:
- Collecting real-time log output that can be queried effectively and turned into metrics and alarms
- Collecting real-time metrics at the game server process level (such as CPU usage, memory usage, and disk reads/writes)
- Collecting custom metrics directly from your game server process (such as player actions and dropped connections)
In this blog post we’ll cover how to use the Amazon CloudWatch agent to collect all of this data, and then query and visualize it with CloudWatch. Even though we’re focusing on GameLift hosting, all of the covered topics apply to any custom EC2-based game server hosting as well.
Throughout the post, we reference an example solution on GitHub that showcases an end-to-end GameLift integration with Unity and C++. It also has all of the mentioned observability options implemented. See this blog post for a thorough overview of the C++ solution.
Setting up the Amazon CloudWatch agent for GameLift
To enable sending custom logs and metrics to CloudWatch, we need to install and configure the CloudWatch agent. For GameLift we will do this in the install script, which is named install.sh for our Linux servers. The example solution shows how to configure the agent through CLI commands in
You will notice that as part of the configuration, we’re referencing another file called amazon-cloudwatch-agent.json. We will cover the different parts of this file in the topics below, and you can find the full configuration in the LinuxServerBuild folder in the sample.
A key configuration specific to Amazon GameLift is that we need to define an IAM role for the GameLift fleet, and this role needs access to CloudWatch Logs and CloudWatch metrics. You can attach the managed policy called CloudWatchAgentServerPolicy to the IAM role you create, and then assign this role to the GameLift fleets you create.
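As a rough illustration of the role setup described above, the following Python sketch builds a trust policy that lets the GameLift service assume the role, alongside the ARN of the managed policy. It is not taken from the example solution; it only prints the documents you would supply when creating the role with your tool of choice.

```python
import json

# Trust policy that allows the GameLift service to assume the fleet role.
TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "gamelift.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# AWS managed policy named in the text; attach it to the role after creation.
AGENT_POLICY_ARN = "arn:aws:iam::aws:policy/CloudWatchAgentServerPolicy"


def trust_policy_json() -> str:
    """Return the trust policy as a JSON document."""
    return json.dumps(TRUST_POLICY, indent=2)


if __name__ == "__main__":
    print(trust_policy_json())
    print(AGENT_POLICY_ARN)
```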
We also need to explicitly tell the CloudWatch agent to assume this IAM role, because GameLift instances run in accounts owned by the GameLift service rather than in your own account. To do this, we define the following agent configuration in the agent configuration JSON file:
As you can see, we also define the agent output log file as well as the metrics collection interval in seconds.
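As a minimal sketch of what such an agent section could look like, the Python snippet below builds it as a dictionary and prints the JSON. The role ARN is a placeholder, and the interval and log file path are assumptions rather than the sample’s actual settings.

```python
import json


def agent_section(role_arn: str, interval_seconds: int = 60) -> dict:
    """Build the top-level "agent" section of the CloudWatch agent config."""
    return {
        "agent": {
            # How often the agent collects metrics, in seconds (assumed value).
            "metrics_collection_interval": interval_seconds,
            # Where the agent writes its own output log (assumed path).
            "logfile": "/opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log",
            # The IAM role the agent assumes to call CloudWatch.
            "credentials": {"role_arn": role_arn},
        }
    }


if __name__ == "__main__":
    # Placeholder ARN: replace with the role assigned to your fleet.
    example_arn = "arn:aws:iam::111122223333:role/ExampleGameLiftFleetRole"
    print(json.dumps(agent_section(example_arn), indent=2))
```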
Collecting and searching logs with CloudWatch Logs
Now that we have the agent configured, we need to define which log files it sends to CloudWatch Logs. This is done in the logs section of the configuration file. In our example we’re running two game server processes on each GameLift instance, and each creates its own log file. To send these log files to their own log streams, we need to define them separately. We configure the same output files in the processes themselves. In the example we’re using the time zone local to the game servers, but you could set the game server logs to UTC as well.
When you have the solution up and running and navigate to CloudWatch Log Groups, you will find individual log streams for all game server processes in the GameServerLogs log group.
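A logs section along these lines can be sketched in Python as follows. The log file paths and the log stream naming are hypothetical and not copied from the sample. The ports match the two processes discussed in this post, and GameServerLogs is the log group named above.

```python
import json


def logs_section(ports, log_group="GameServerLogs", timezone="Local"):
    """Build a "logs" section with one collect_list entry per server process."""
    entries = []
    for port in ports:
        entries.append({
            # Hypothetical path: must match the file your server process writes to.
            "file_path": f"/local/game/logs/server-{port}.log",
            "log_group_name": log_group,
            # {hostname} is a placeholder the agent resolves at runtime.
            "log_stream_name": f"{{hostname}}-{port}",
            # "Local" or "UTC".
            "timezone": timezone,
        })
    return {"logs": {"logs_collected": {"files": {"collect_list": entries}}}}


if __name__ == "__main__":
    print(json.dumps(logs_section([1935, 7777]), indent=2))
```

Keeping one entry per process is what gives each game server its own log stream.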
Collecting game server process metrics
Next we will want to start collecting metrics from all the individual game server processes separately. You can run up to 50 game server processes on a single GameLift instance, so getting insights from them separately will help a lot in monitoring and debugging.
The CloudWatch agent supports procstat metrics for your processes. This includes extensive metrics for things like memory, CPU, and disk usage. You can configure which metrics you want to monitor, and selecting only the ones you need helps control CloudWatch custom metrics costs. We need to define which processes we want to monitor, and in the example we’ve done this by filtering on the launch command. In my case, the filter values are “-port 1935” and “-port 7777”, which give me the metrics from the two game server processes that have these port configurations passed to them. Using this filter gives us useful dimensions in CloudWatch as well: we can see the DNS name and the port, which together identify the process. It’s worth noting that because the game server processes are run with the sudo command on GameLift, this configuration also tracks the sudo process metrics separately. Below you can see the procstat configuration, which is placed within the metrics settings in the configuration file.
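The shape of such a procstat configuration can be sketched in Python like this. The two patterns come from the text above, while the chosen measurements are only an assumed selection and may differ from what the sample collects.

```python
import json


def procstat_metrics(patterns, measurements):
    """Build a "metrics" section that monitors processes matching each pattern."""
    procstat = [
        # "pattern" matches against the full command line used to start the process.
        {"pattern": pattern, "measurement": list(measurements)}
        for pattern in patterns
    ]
    return {"metrics": {"metrics_collected": {"procstat": procstat}}}


if __name__ == "__main__":
    config = procstat_metrics(
        patterns=["-port 1935", "-port 7777"],
        # Assumed selection; fewer measurements means fewer custom metrics to pay for.
        measurements=["cpu_usage", "memory_rss", "read_bytes", "write_bytes"],
    )
    print(json.dumps(config, indent=2))
```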
Collecting custom metrics from the game server process
In addition to the process level metrics, you often want to track completely custom application metrics, specific to your game. These metrics can only be tracked from the game server process itself, and can include things like player actions, game session events, and other game specific things.
The CloudWatch agent supports these custom metrics through a StatsD agent. StatsD is a simple and lightweight daemon for collecting application-level metrics. It listens for metric events on a UDP port, and your game server processes can use any StatsD client, or a custom implementation, to send the events. This also decouples sending the metrics from your game server process, which minimizes any performance impact. To configure the StatsD agent, we only need to add the following configuration within “metrics” in the CloudWatch agent configuration file. It defines the port we’re listening on, as well as the aggregation and collection intervals.
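As an illustration, the StatsD block could be generated with a Python sketch like the one below. Port 8125 and the 60-second intervals are assumed values, not necessarily those used in the sample.

```python
import json


def statsd_metrics(port=8125, collection_seconds=60, aggregation_seconds=60):
    """Build a "metrics" section that enables the agent's StatsD listener."""
    return {
        "metrics": {
            "metrics_collected": {
                "statsd": {
                    # UDP address the agent listens on for StatsD datagrams.
                    "service_address": f":{port}",
                    # How often collected metrics are sent on, in seconds.
                    "metrics_collection_interval": collection_seconds,
                    # How often received values are aggregated into one datapoint.
                    "metrics_aggregation_interval": aggregation_seconds,
                }
            }
        }
    }


if __name__ == "__main__":
    print(json.dumps(statsd_metrics(), indent=2))
```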
In the example solution, we have StatsD custom metrics implemented in the Unity server. The project has a simple custom StatsD client, which sends metrics using the tags feature of StatsD. The tags are mapped to dimensions in CloudWatch, so we can attribute the metrics to a specific game session. You will find the implementation in Assets/Scripts/Server/SimpleStatsdClient.cs. Feel free to use any existing StatsD client of your choice if you don’t want to implement your own.
We are using both counters and gauges in the sample metrics we’re sending. A gauge sends the exact value of a metric, whereas a counter adds to the total value of a metric.
An example gauge is the number of connected game clients:
And an example counter would be game sessions started:
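Both kinds of metric can be sketched with a few lines of Python. This is not the sample’s C# client, only an illustration of the same idea: each metric is a short text datagram sent over UDP, with tags appended after `|#`. The metric names, the tag name, and the port are hypothetical.

```python
import socket


def format_metric(name, value, metric_type, tags=None):
    """Format one StatsD datagram, e.g. 'players.connected:5|g|#gamesession:abc'."""
    line = f"{name}:{value}|{metric_type}"
    if tags:
        line += "|#" + ",".join(f"{key}:{val}" for key, val in tags.items())
    return line


class SimpleStatsd:
    """Tiny StatsD sender; defaults assume a listener on localhost UDP port 8125."""

    def __init__(self, host="127.0.0.1", port=8125):
        self._address = (host, port)
        self._sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)

    def gauge(self, name, value, tags=None):
        # A gauge reports the current value of a metric.
        self._send(format_metric(name, value, "g", tags))

    def counter(self, name, value=1, tags=None):
        # A counter adds the given amount to the metric's total.
        self._send(format_metric(name, value, "c", tags))

    def _send(self, line):
        # UDP is fire-and-forget: no reply is awaited, so the game loop is not blocked.
        self._sock.sendto(line.encode("utf-8"), self._address)


if __name__ == "__main__":
    client = SimpleStatsd()
    session_tags = {"gamesession": "example-session-id"}  # hypothetical tag
    client.gauge("players.connected", 5, session_tags)
    client.counter("game.sessions.started", 1, session_tags)
```

Because the datagrams are sent without waiting for a response, a missing or slow agent does not stall the game server.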
These metrics can then be visualized in CloudWatch just like any other metric. They are found in CloudWatch under CWAgent, and based on the tags/dimensions we’re using, they are grouped by game session. You can use the tags to define any dimensions of your choice as well.
The image below shows the tracked sample custom metrics for a single game session. Because we’re tracking player position updates, which produce relatively high values, that metric dominates the graph. Removing it would show a scale more suitable for the other metrics.
Visualizing game server process and custom metrics with CloudWatch Dashboards
While you can always search and query the CloudWatch metrics from the management console, sometimes you want to build more permanent views on those metrics. This can be achieved with CloudWatch Dashboards. You can add any custom graph to a CloudWatch Dashboard and manage the configuration for these visualizations. See the documentation for full details on the features.
In the dashboard below, you can see the CPU and memory usage tracked for two different locations for our GameLift Fleet. You can also see a small spike in the CPU and memory usage for one of the processes in the Ireland fleet when a game session was played. As you can see, you can add metrics from multiple AWS regions to a single dashboard to visualize your whole global fleet of game servers at once.
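A multi-region dashboard like this can also be defined as a JSON document. The Python sketch below assembles one with a CPU widget per region. The second region, the metric name, and the dimension names and values are assumptions; copy the exact ones shown under CWAgent in your own account, since a widget only displays data when they match.

```python
import json


def metric_widget(title, region, metric_name, dimensions, x):
    """Build one metric widget; dimensions is a dict of dimension name to value."""
    metric = ["CWAgent", metric_name]
    for name, value in dimensions.items():
        metric += [name, value]
    return {
        "type": "metric",
        "x": x, "y": 0, "width": 12, "height": 6,
        "properties": {
            "title": title,
            "region": region,  # each widget can read from a different region
            "stat": "Average",
            "period": 60,
            "metrics": [metric],
        },
    }


def dashboard_body(regions, metric_name, dimensions):
    """Place one widget per region side by side on a single dashboard."""
    widgets = [
        metric_widget(f"{metric_name} ({region})", region, metric_name, dimensions, x=i * 12)
        for i, region in enumerate(regions)
    ]
    return {"widgets": widgets}


if __name__ == "__main__":
    body = dashboard_body(
        regions=["eu-west-1", "us-east-1"],        # Ireland plus an assumed second region
        metric_name="procstat_cpu_usage",          # assumed metric name
        dimensions={"pattern": "-port 1935"},      # assumed dimension set
    )
    print(json.dumps(body, indent=2))
```

The printed JSON is the kind of document you would supply as the dashboard body when creating the dashboard through the CloudWatch API or CLI.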
We’ve shown how to get insights into your game server processes running on Amazon GameLift using the features of Amazon CloudWatch. The same tools and principles also apply to any custom game server hosting on EC2. These insights into log and monitoring data will help you identify issues and bottlenecks in your game servers, understand their performance, and get business insights into game session details and player activity.
AWS provides a whole portfolio of services for observability. You can extend your monitoring capabilities with services such as Amazon OpenSearch Service, Amazon Managed Service for Prometheus, and Amazon Managed Service for Grafana. You can also use the Amazon Athena CloudWatch Connector to query insights from your metrics with Amazon Athena, and then further visualize that data with Amazon QuickSight. And with a wide range of additional third-party solutions from AWS Partners, whatever your observability tooling of choice is, we’ve got you covered.