Demonstrating Cloud Gaming Concurrency at Scale with Polystream and AWS Game Tech
We invited Scott Perham, Platform Architect at AWS APN Partner Polystream, to write a guest blog. Learn how Polystream delivers 3D interactivity at scale using AWS services including Amazon Elastic Compute Cloud (EC2).
We are witnessing the next evolution of cloud gaming, of 3D interactive apps such as car configurators, and of collaborative development using cloud-based tools, but there has been no pathway to deliver, at scale, the same experiences we see in other environments. For example, how does a cloud-first game deliver a Fortnite-like moment of 10 million concurrent users when existing technology currently limits us to a few thousand concurrent users in the cloud?
Success for this type of mass-market content, and specifically for cloud gaming, relies on being able to deliver at mass scale. But today, scale is elusive. Traditional approaches leveraging cloud-based GPUs struggle to deliver the flexibility and elasticity that cloud solutions promise when streaming 3D interactive content and applications. There are not enough of the right GPUs available, and the costs are prohibitive. At Polystream, with the support of AWS Game Tech, we are looking to the future and redesigning how to deliver 3D interactivity at scale.
Current solutions use VMs with GPUs in the cloud to deliver a video to each user. With Polystream Command Streaming technology we are replacing this cloud hardware dependency with a software-defined architecture that delivers graphics commands, not video. Connecting the computing power of the cloud to the billions of GPUs in games consoles, phones and computers, our ground-breaking technology creates scale at a level once thought impossible.
Supporting Polystream’s revolutionary Command Streaming technology is the Polystream Platform, our globally distributed, multi-cloud service. The Polystream Platform has been architected to take full advantage of our primary differentiator: not being limited to cloud-based GPUs. It can provision, manage and orchestrate huge numbers of interactive streams across any cloud provider anywhere in the world.
To drive this concurrency scaling test, we joined forces with AWS Game Tech, who aided the trial and powered the bulk of the streaming compute. They provided the required number of virtual machines in Amazon Elastic Compute Cloud (Amazon EC2) to support the planned levels of concurrent usage.
In preparation for this large test, we ran a number of smaller tests to iron out any potential provisioning and deployment issues. These tests began at 1,000 concurrent users (CCUs) and scaled to 5,000, then 10,000, and finally to the largest test, aimed at hitting our CCU target.
Our provisioning process runs pre-built Docker images containing the Terraform runtime, which in turn calls AWS provisioning APIs to create the virtual infrastructure and begin setting up the interactive streams. Using this approach, we were able to dynamically scale out before and during the tests and scale in afterwards. It meant we could provision a very large number of virtual machines in a very short period in an automated fashion, gave us a degree of monitoring, and allowed automatic retries should a particular virtual machine fail during provisioning or setup.
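To make the queue-and-retry behavior concrete, here is a minimal Python sketch of a worker draining provisioning requests. It is not Polystream's implementation: `ProvisionRequest`, `drain_queue` and the `provision` callback are hypothetical names, and the callback simply stands in for running the Terraform container against the AWS APIs for one VM.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class ProvisionRequest:
    """One queued request for a single stream VM (hypothetical shape)."""
    region: str
    vm_id: str
    attempts: int = 0


@dataclass
class ProvisionReport:
    succeeded: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)


def drain_queue(
    queue: Deque[ProvisionRequest],
    provision: Callable[[ProvisionRequest], bool],
    max_attempts: int = 3,
) -> ProvisionReport:
    """Process queued requests, re-queueing failures until max_attempts is reached.

    `provision` is a hypothetical stand-in for running the Terraform container
    for one VM; it returns True once the VM has been created and set up.
    """
    report = ProvisionReport()
    while queue:
        req = queue.popleft()
        req.attempts += 1
        if provision(req):
            report.succeeded.append(req.vm_id)
        elif req.attempts < max_attempts:
            queue.append(req)  # automatic retry: send it to the back of the queue
        else:
            report.failed.append(req.vm_id)  # give up and surface it for monitoring
    return report


# Simulate one VM whose setup fails once before succeeding.
flaky = {"eu-west-1-0002": 1}


def fake_provision(req: ProvisionRequest) -> bool:
    remaining = flaky.get(req.vm_id, 0)
    if remaining:
        flaky[req.vm_id] = remaining - 1
        return False
    return True


q = deque(ProvisionRequest("eu-west-1", f"eu-west-1-{i:04d}") for i in range(3))
report = drain_queue(q, fake_provision)
print(report.succeeded)  # ['eu-west-1-0000', 'eu-west-1-0001', 'eu-west-1-0002']
print(report.failed)     # []
```

Re-queueing at the back, rather than retrying immediately, keeps one slow or failing VM from holding up the rest of the queue.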
After some successful smaller tests, we began preparing to reach our target synthetic stream concurrency.
The application we chose required 2 vCPUs and only a small amount of RAM, so we opted for t3.micro instances. Previous tests had shown that we could run approximately 1,000 synthetic clients on a specially configured virtual machine, so we pre-provisioned 40 synthetic-client instances, 4 in each AWS region.
During some of the smaller tests we had hit a couple of configuration issues that made provisioning in specific regions quite slow, so, with our initial target of 23,000 concurrent streams in mind, we chose to over-provision and aim for 4,000 virtual machines in each of the 10 AWS regions. This meant we could still hit our target in a reasonable timeframe even if a couple of regions provisioned more slowly than others.
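The sizing above is simple arithmetic, sketched here in Python. It assumes one interactive stream per t3.micro VM (consistent with the 40,000 provisioning requests and the later 2,000 t3.micro top-up) and takes the worst case where a slow region contributes nothing at all; the constant and function names are illustrative only.

```python
REGIONS = 10
STREAM_VMS_PER_REGION = 4_000   # over-provisioned stream VMs per region
TARGET_STREAMS = 23_000         # initial concurrency target
CLIENT_VMS_PER_REGION = 4       # synthetic-client machines per region
CLIENTS_PER_CLIENT_VM = 1_000   # approximate, from earlier tests


def capacity_if_regions_stall(stalled: int) -> int:
    """Stream capacity if `stalled` regions provision nothing at all (worst case)."""
    return (REGIONS - stalled) * STREAM_VMS_PER_REGION


total_streams = REGIONS * STREAM_VMS_PER_REGION
total_clients = REGIONS * CLIENT_VMS_PER_REGION * CLIENTS_PER_CLIENT_VM
print(total_streams, total_clients)   # 40000 40000
print(capacity_if_regions_stall(2))   # 32000, still above the 23,000 target

# Smallest number of fully provisioned regions that still meets the target (ceiling division).
min_regions = -(-TARGET_STREAMS // STREAM_VMS_PER_REGION)
print(min_regions)                    # 6
```

Under these assumptions, up to four of the ten regions could fail to provision entirely and the target would still be reachable, which is the margin the over-provisioning was buying.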
We began by queuing the 40,000 provisioning requests and scheduling them to start processing at 4am. Previous testing indicated that this would give us a large number of interactive streams by the time the team got to the office, so we could begin running synthetic clients as early as possible.
Our aim was not just to provision lots of infrastructure for a few hours; we wanted to prove that we could actually operate at that number of interactive streams too. To achieve this, we started a few thousand client sessions, let them run for a period of time, terminated them, and then started another batch of a few thousand. This ensured that, at scale, our interactive streams recovered from the previous session they had run and were ready to receive new sessions.
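The batch churn can be modelled as a small state machine in which every stream must return to a ready state before it counts as available again. This is a hypothetical toy model in Python (`StreamPool`, `StreamState` and `churn` are invented for illustration), not the platform's actual session lifecycle.

```python
from enum import Enum
from typing import Dict, List


class StreamState(Enum):
    READY = "ready"
    BUSY = "busy"
    RECOVERING = "recovering"


class StreamPool:
    """Toy model of a pool of interactive streams (hypothetical)."""

    def __init__(self, size: int) -> None:
        self.states: Dict[int, StreamState] = {i: StreamState.READY for i in range(size)}

    def start_sessions(self, count: int) -> List[int]:
        """Assign up to `count` new sessions to ready streams."""
        ready = [i for i, s in self.states.items() if s is StreamState.READY][:count]
        for i in ready:
            self.states[i] = StreamState.BUSY
        return ready

    def end_sessions(self, stream_ids: List[int]) -> None:
        for i in stream_ids:
            self.states[i] = StreamState.RECOVERING

    def recover(self) -> None:
        # A real stream would clean up after its previous session here
        # before advertising itself as available again.
        for i, s in self.states.items():
            if s is StreamState.RECOVERING:
                self.states[i] = StreamState.READY

    def ready_count(self) -> int:
        return sum(s is StreamState.READY for s in self.states.values())


def churn(pool: StreamPool, batch_size: int, rounds: int) -> List[int]:
    """Start a batch, end it, let streams recover; record ready capacity after each round."""
    history = []
    for _ in range(rounds):
        batch = pool.start_sessions(batch_size)
        pool.end_sessions(batch)
        pool.recover()
        history.append(pool.ready_count())
    return history


pool = StreamPool(size=5_000)
print(churn(pool, batch_size=3_000, rounds=3))  # [5000, 5000, 5000]
```

The property being checked is that ready capacity returns to the full pool size after every round; a stream stuck in `RECOVERING` would show up as a shrinking number.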
Once we were happy that we had proved our ability to run and terminate many short-lived sessions, we started ramping up the synthetic clients for the push towards our target. We began running synthetic clients at around midday, and in less than two hours we had almost reached our original target.
We had successfully resolved the configuration issues seen in previous tests, and as a result almost every provisioning request had been fulfilled, so we had plenty of capacity to push beyond the target. We had originally planned for testing to end at 5pm; with about an hour to go, we had almost entirely saturated our available capacity, so we decided to queue an additional 2,000 t3.micro requests.
As we reached the deadline, we crossed the 40,000 CCU mark. At this point, not only had we saturated our interactive stream capacity while waiting for the new agents to be set up, but we had also saturated the machines we had provisioned for our synthetic clients. With the end time approaching and our original target of 23,000 already surpassed by almost 75%, we decided to start gracefully ending all the client sessions and terminating our virtual machines.
At this point, the first sessions to start had been streaming interactive 3D content for over 4 hours, demonstrating that not only could we start tens of thousands of sessions, but we could also maintain that many streaming sessions for hours.
The overall size and performance of the test were recorded and reported through our business intelligence platform. This meant that we were testing not only the provisioning and streaming capabilities but also our telemetry pipelines, the platform's ancillary services, and our integrations with third-party providers, all under load.
These pipelines gathered telemetry data and metrics from each interactive stream, supporting service and piece of infrastructure, and routed them in real time to Logz.io, Grafana Cloud, or Power BI and SQL Server, based on the routing rules configured for each event raised.
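A rule-based router of this kind can be sketched as an ordered list of predicates, each naming a destination. The event types, and the mapping of those types to Logz.io, Grafana Cloud and SQL Server (with Power BI assumed to report from SQL Server), are assumptions for illustration; the sketch only decides where an event goes and makes no calls to those services.

```python
from typing import Callable, Dict, List, Optional, Tuple

Event = Dict[str, object]
Rule = Tuple[Callable[[Event], bool], str]

# Hypothetical rule table: which event type goes where is an assumption.
RULES: List[Rule] = [
    (lambda e: e.get("type") == "log", "Logz.io"),
    (lambda e: e.get("type") == "metric", "Grafana Cloud"),
    (lambda e: e.get("type") == "business", "SQL Server"),  # assumed Power BI source
]


def route(event: Event, rules: List[Rule] = RULES, default: Optional[str] = None) -> Optional[str]:
    """Return the sink of the first rule that matches the event, else `default`."""
    for matches, sink in rules:
        if matches(event):
            return sink
    return default


events = [
    {"type": "metric", "name": "frames_streamed", "value": 60},
    {"type": "business", "name": "session_started", "stream_id": "s-001"},
    {"type": "log", "level": "warn", "message": "slow encode"},
]
print([route(e) for e in events])  # ['Grafana Cloud', 'SQL Server', 'Logz.io']
```

First-match semantics mirrors the "either ... or" wording: each event lands in exactly one destination, and an explicit `default` decides what happens to events no rule covers.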
Once all client sessions had been stopped and all telemetry had been received and verified, the interactive streams were terminated.
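The shutdown order matters: terminating infrastructure before telemetry has landed would lose data. A minimal Python sketch of that ordering follows, assuming hypothetical callbacks for stopping a session, checking that telemetry is complete, and terminating the streams; none of these names come from the platform itself.

```python
import time
from typing import Callable, Iterable, List


def graceful_teardown(
    sessions: Iterable[str],
    stop_session: Callable[[str], None],
    telemetry_complete: Callable[[], bool],
    terminate_streams: Callable[[], None],
    max_checks: int = 10,
    poll_seconds: float = 0.0,
) -> List[str]:
    """Stop client sessions, wait for telemetry to be verified, then terminate streams.

    Returns an ordered log of the steps taken. Raises RuntimeError (and leaves
    the streams running) if telemetry is not verified within max_checks polls.
    """
    log: List[str] = []
    # 1. Gracefully end every client session first.
    for session_id in sessions:
        stop_session(session_id)
        log.append(f"stopped {session_id}")
    # 2. Poll until all telemetry has been received and verified.
    for _ in range(max_checks):
        if telemetry_complete():
            log.append("telemetry verified")
            break
        time.sleep(poll_seconds)
    else:
        raise RuntimeError("telemetry not verified; leaving streams running")
    # 3. Only now is it safe to tear down the infrastructure.
    terminate_streams()
    log.append("streams terminated")
    return log


checks = iter([False, False, True])  # telemetry is confirmed on the third poll
steps = graceful_teardown(
    ["s-1", "s-2"],
    stop_session=lambda s: None,
    telemetry_complete=lambda: next(checks),
    terminate_streams=lambda: None,
)
print(steps)  # ['stopped s-1', 'stopped s-2', 'telemetry verified', 'streams terminated']
```

Failing closed (raising instead of terminating) trades a few extra minutes of compute for not losing the data that proves the test happened.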
The testing was originally planned to prove that we could dynamically scale and run huge numbers of interactive 3D streams concurrently. We achieved that goal, but also proved many more things along the way:
- Scaling up to 42,000 interactive streams over the course of a single day, and then back down to our typical deployment size, proved we could use truly elastic compute to meet the demands of the test.
- During the tests, many thousands of synthetic sessions were started, stopped and restarted, demonstrating that the platform supports more realistic, less synthetic user behavior.
- A large number of sessions were long-running, in some cases nearly 5 hours, showing that we can maintain long sessions at scale.
- Using our own telemetry pipelines and business intelligence tools to prove our concurrency demonstrated our ability to operate at scale.
- All the tests were run on existing platform deployments, showing that our control plane can cope with scaling from tens to tens of thousands of interactive streams and back again.
- We imposed an arbitrary duration and end time for the test, and during the day we didn’t hit a single limitation or bottleneck suggesting we were reaching the limits of what the platform could cope with.
Including the additional time for automated provisioning, the entire test lasted around 12 hours. In that time, we provisioned 42,000 interactive streams, ran more than 50,000 streaming sessions, and reached a peak concurrency of 40,165. We ran interactive streams across 10 AWS regions and around 35 physical data centres, consumed 87,000 CPU cores, and streamed 14.5 billion frames, over 15 years of content.
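As a sanity check on the headline figures, the arithmetic below is a sketch that assumes the reported CPU cores are vCPUs and that each of the 42,000 interactive streams ran on its own 2-vCPU t3.micro. Under those assumptions, 14.5 billion frames over 15 years of content works out to an average of at most about 30.6 frames per second, and the stream hosts account for 84,000 of the 87,000 cores; the post does not break down where the remaining cores went.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60  # Julian year

frames_streamed = 14.5e9
content_years = 15        # "over 15 years", so the rate below is an upper bound
stream_vms = 42_000
vcpus_per_t3_micro = 2    # assumption: one stream per t3.micro
reported_cores = 87_000

avg_fps = frames_streamed / (content_years * SECONDS_PER_YEAR)
stream_vcpus = stream_vms * vcpus_per_t3_micro

print(f"average frame rate <= {avg_fps:.1f} fps")                           # average frame rate <= 30.6 fps
print(f"stream hosts: {stream_vcpus} of {reported_cores} reported cores")   # stream hosts: 84000 of 87000 reported cores
```

An average close to 30 fps is consistent with the frame and duration figures describing the same streams, which is a useful cross-check when several metrics come from different telemetry sinks.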
Achieved in one working day from a small office in Guildford by six service engineers.
In conclusion, traditional streaming approaches for 3D interactive content and applications such as online gaming have difficulty reaching this level of concurrency. The cost is prohibitive, and even if budgets were limitless, the cloud GPU hardware simply isn't available to scale. To secure the future of cloud gaming, with its changing experiences and immersive environments, we must face the realities of current approaches head-on and redefine scale to incorporate concurrency, reach and new experiences. Polystream is focused on building a distributed approach that uses the cloud as it was designed, removing the barriers to explosive scale. We are driving towards a future where the possibilities are limitless, where games are built cloud-first, and where we develop games, environments and experiences we don’t even have the words for yet.