Introducing the Amazon GameLift FleetIQ adapter for Agones
Authored by Jeremy Cowan, Principal Specialist SA, Containers, and Trevor Roberts, Senior Solutions Architect
Launching a new game title carries risk, requires significant investment, and might require a lot of compute power. Exciting as a launch may be, you don’t always know whether the game will be a runaway hit. The cloud helps mitigate part of that risk because it allows you to convert what was once a capital investment into an operational expense. As a game studio, you want to deliver a great player experience at the lowest possible cost, which requires game servers that are both inexpensive and reliable. Amazon EC2 Spot Instances let you take advantage of unused EC2 capacity at massive scale in the AWS Cloud, while offering a way to lower your EC2 spend by as much as 70% compared to existing on-premises deployments. What if you could minimize the chance that game sessions end prematurely and make low-cost Spot Instances viable for game hosting?
Agones is an open source project for managing the lifecycle of containerized game servers. It is built on top of Kubernetes, a popular open source container orchestration platform. The popularity of Agones is growing among game studios because it includes features that help them avoid building bespoke solutions for scheduling, scaling, and managing the lifecycle of game servers. By using Agones, studios can spend their time creating rich player experiences while lowering their production costs.
Amazon GameLift FleetIQ optimizes the use of low-cost Spot Instances for cloud-based game hosting with Amazon EC2. With GameLift FleetIQ, you can work directly with your hosting resources in Amazon EC2 and Auto Scaling while taking advantage of GameLift optimizations to deliver inexpensive, resilient game hosting for your players. Developers can access GameLift FleetIQ independently of other GameLift features. This lets them launch low-cost GameLift servers into their own AWS accounts and register those servers with their existing game server management systems to incrementally migrate live games, burst for in-game events, or deploy containerized games onto AWS. FleetIQ introduces the concept of a game server group (GSG), an EC2 Auto Scaling group managed by FleetIQ. As part of its configuration, you provide a list of preferred instance types. FleetIQ provisions Spot Instances from this list and continually monitors the viability of the Spot pools using a predictive algorithm. When FleetIQ predicts that a pool is no longer viable, it automatically replaces the instances provisioned from that pool with instances from viable pools. This reduces the odds of game servers running on non-viable instance types.
FleetIQ includes two key features that make it a good option for running game servers on Spot. First, the ClaimGameServer and UpdateGameServer APIs can be used to protect an instance that FleetIQ has flagged as non-viable from being scaled in. Second, a GSG can be configured to fall back to On-Demand Instances automatically when EC2 Spot Instances are not available.
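To make the On-Demand fallback concrete, here is a minimal sketch of the parameters a game server group might be created with. The parameter names are those of the GameLift CreateGameServerGroup API; the helper function, group name, role ARN, launch template ID, sizes, and instance types are hypothetical placeholders rather than values taken from the adapter. The sketch only builds the request, so passing it to an SDK client is left to the reader.

```python
def build_game_server_group_request(name, role_arn, launch_template_id,
                                    instance_types, min_size=1, max_size=10):
    """Build keyword arguments for a CreateGameServerGroup call (sketch only)."""
    if len(instance_types) < 2:
        # With a single type, FleetIQ has no alternative Spot pool to move to.
        raise ValueError("list at least two preferred instance types")
    return {
        "GameServerGroupName": name,
        "RoleArn": role_arn,
        "MinSize": min_size,
        "MaxSize": max_size,
        "LaunchTemplate": {"LaunchTemplateId": launch_template_id},
        # The list of preferred instance types FleetIQ may provision from.
        "InstanceDefinitions": [{"InstanceType": t} for t in instance_types],
        # Prefer Spot, but fall back to On-Demand when Spot is unavailable.
        "BalancingStrategy": "SPOT_PREFERRED",
    }


# All values below are placeholders for illustration.
request = build_game_server_group_request(
    name="agones-game-servers",
    role_arn="arn:aws:iam::111122223333:role/ExampleFleetIQRole",
    launch_template_id="lt-0123456789abcdef0",
    instance_types=["c5.xlarge", "c5a.xlarge", "m5.xlarge"],
)
```

With boto3, for example, the resulting dictionary could be unpacked into the GameLift client's create_game_server_group call.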
What if I manage my own game servers?
The EC2 Spot team recently released the EC2 instance rebalance recommendation, a signal that alerts users, ahead of the two-minute interruption notice, that a Spot Instance may be at risk of interruption. Developers who are already accustomed to working with the EC2 APIs may choose to work with this feature directly. Otherwise, the FleetIQ integration handles EC2 Spot interruptions for you and automatically provisions replacement EC2 instances. This way, developers can focus on integrating with the Agones APIs while the solution components monitor the FleetIQ APIs for EC2 Spot Instance viability.
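For readers who do manage their own game servers, the following is a rough sketch of how a process on the instance might check for a rebalance recommendation through the instance metadata service (IMDSv2). The function names and the timeout are illustrative assumptions; what to do once a notice appears (for example, stop accepting new sessions) is up to your own server management system.

```python
import json
import urllib.error
import urllib.request

IMDS = "http://169.254.169.254/latest"


def parse_rebalance_notice(body):
    """Return the noticeTime string from a recommendation document, or None."""
    if not body:
        return None
    return json.loads(body).get("noticeTime")


def fetch_rebalance_notice(timeout=1.0):
    """Ask IMDSv2 for a rebalance recommendation; None if none was issued.

    Only works on an EC2 instance; elsewhere the request fails with URLError.
    """
    token_req = urllib.request.Request(
        IMDS + "/api/token", method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"})
    try:
        with urllib.request.urlopen(token_req, timeout=timeout) as resp:
            token = resp.read().decode()
        req = urllib.request.Request(
            IMDS + "/meta-data/events/recommendations/rebalance",
            headers={"X-aws-ec2-metadata-token": token})
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return parse_rebalance_notice(resp.read().decode())
    except urllib.error.HTTPError as err:
        if err.code == 404:  # no recommendation for this instance yet
            return None
        raise
```

A small loop that calls fetch_rebalance_notice every few seconds would be enough to detect the signal.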
The GameLift FleetIQ adapter for Agones
The ability to run game servers reliably on low-cost Spot Instances, with fewer game session interruptions, served as the motivation for creating the FleetIQ adapter for Agones. At the same time, we wanted to accommodate studios that run their game servers on Kubernetes with Agones. The adapter serves both goals by providing a layer that abstracts away the FleetIQ APIs. This allows studios that have chosen Agones and Kubernetes to run their game servers on Spot Instances without having to modify the way they build and run those servers.
How it works
The adapter consists of two components: a DaemonSet and a pubsub application. The DaemonSet is responsible for interfacing with the FleetIQ APIs. When it starts, it registers the instance with FleetIQ, calls the ClaimGameServer API, and updates the instance’s health status once a minute. This protects the instance from being replaced when FleetIQ predicts that it is no longer viable. The pubsub application calls DescribeGameServerInstances and publishes the status of each instance in a game server group to a Redis channel. The DaemonSet subscribes to the channel for the instance that it is running on. When an instance is no longer viable, FleetIQ places it in a draining state. The pubsub application communicates this to the DaemonSet, causing it to take the following actions:
- The instance on which it is running is cordoned to prevent Kubernetes from scheduling new game servers onto the instance.
- A toleration is added to game server pods with active players. Agones refers to such a pod as an allocated game server.
- A taint is applied to the instance that forces all inactive or unallocated game server pods to be evicted and rescheduled onto other instances in the cluster.
- The DaemonSet continues to poll Agones for the list of allocated game servers running on the instance. When the last allocated game server shuts down, the DaemonSet de-registers the instance from FleetIQ. De-registration removes the protection and allows FleetIQ to replace the instance with another instance provisioned from a viable Spot pool. If Spot Instances are unavailable, the game server group continues to provide hosting capacity by falling back to On-Demand Instances.
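The drain sequence above can be sketched as a small in-memory simulation. This is not the adapter's code: the classes, function names, and taint key are hypothetical, and the Kubernetes, Agones, and FleetIQ calls are replaced by plain attribute updates so that the order of operations is easy to follow. The only value taken from FleetIQ is the DRAINING instance status.

```python
from dataclasses import dataclass, field

TAINT_KEY = "example.com/fleetiq-draining"  # hypothetical taint key


@dataclass
class GameServerPod:
    name: str
    allocated: bool  # True while players are connected (Agones "allocated")
    tolerations: set = field(default_factory=set)


@dataclass
class Node:
    instance_id: str
    pods: list
    schedulable: bool = True
    taints: set = field(default_factory=set)
    registered: bool = True  # registered with, and protected by, FleetIQ


def handle_status(node, status):
    """React to an instance status received on the node's channel."""
    if status != "DRAINING":
        return
    node.schedulable = False            # cordon: no new game servers land here
    for pod in node.pods:               # let allocated pods tolerate the taint
        if pod.allocated:
            pod.tolerations.add(TAINT_KEY)
    node.taints.add(TAINT_KEY)          # taint the node
    # Simulated eviction: pods without the toleration are rescheduled elsewhere.
    node.pods = [p for p in node.pods if TAINT_KEY in p.tolerations]


def reconcile(node):
    """One polling step: de-register once no allocated game servers remain."""
    still_busy = any(p.allocated for p in node.pods)
    if not node.schedulable and not still_busy:
        node.registered = False         # FleetIQ may now replace the instance
    return node.registered


node = Node("i-0123456789abcdef0",
            [GameServerPod("gs-a", allocated=True),
             GameServerPod("gs-b", allocated=False)])
handle_status(node, "DRAINING")
remaining = [p.name for p in node.pods]   # only gs-a stays on the node
protected_during_session = reconcile(node)
node.pods[0].allocated = False            # the last session ends
protected_after_session = reconcile(node)
```

In this run the unallocated pod is evicted immediately, while the instance stays registered until the allocated game server finishes its session.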
The adapter has a multitude of benefits, including the ability to:
- Allow game studios to lower the cost of launching a new game title by using low-cost Spot Instances.
- Reduce the risk of using Spot by automatically replacing non-viable instances with instances that are less likely to be interrupted.
- Protect instances with active game sessions from being scaled in prematurely.
- Abstract away the FleetIQ APIs, allowing game developers to concentrate on a single API for managing game server state.
While the adapter has been designed to support the use of Agones and GameLift FleetIQ, a similar approach could be used to support other game server and compute provisioning layers. The adapter is one of several options for hosting containerized game sessions on AWS; another is Game Server Hosting on AWS Fargate.
The team decided to open source the adapter because doing so aligns with the spirit and openness of Agones and Kubernetes. Additionally, we want to nurture and support the community in running game servers on the AWS Cloud. Open sourcing this solution gives the community the ability to contribute, provide feedback, and adapt it to best suit their own purposes. We welcome your ideas and contributions, and we look forward to collaborating with you as we evolve the solution.