AWS Game Tech Blog

How Backtrace streamlines crash reporting with Amazon Elastic Compute Cloud and Amazon Elastic Block Store

Visual representation of Backtrace.io workflow

For many developers, resolving application, server, and game errors can feel like detective work. The process of identifying, triaging, and finding root causes is long and littered with obstacles. If players find a bug before you push a fix, it could ruin their experience and ultimately hurt the success of your game.

Backtrace is a comprehensive error management suite that helps game teams deliver highly available, bug-free experiences. It helps game developers catch and resolve software quality issues such as errors, crashes, and hangs across a wide range of platforms, including game consoles, desktops, mobile devices, and servers. With a roster of AAA customers, many with hundreds of millions of active players, Backtrace needs a highly resilient infrastructure that is always available when its customers see a problem. Samy Al Bahra, CTO and co-founder of Backtrace, shares his experiences migrating to AWS as the core cloud platform for the company's mission-critical infrastructure.

“From the beginning, Backtrace has been cloud agnostic in order to support our customers who could be hosted anywhere. As we continue to scale, our SLAs become more stringent. We decided to evaluate new ways to exceed our customers’ expectations and demand as we continue to scale,” Bahra shares.

Capturing and analyzing large volumes of data

Backtrace captures detailed dumps of failed application state from customer applications, automates data analysis, and highlights important classifiers. This enables engineers to quickly identify root causes and execute a fix. “We receive errors and dumps from hundreds of millions of users and devices. You can only imagine the volumes we see for a AAA game. We are receiving objects over the network ranging from hundreds of kilobytes to gigabytes for any given game,” Bahra says.

Handling that amount of data could be challenging for any application. For high-performance games, where players want to jump into the action immediately, tolerance for faults and patience for resolution are low. Bahra explains, “We’re dealing with millions of errors a day from a single customer. The errors themselves can be fairly large. We must be able to capture the context of the entire application if needed to help our customers find the root cause. That means we need to be able to scale up to gigabytes per object, per error.”
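To get a feel for what those volumes mean for ingest capacity, here is a rough back-of-the-envelope sketch in Python. The inputs (two million errors a day, 500 KB per object) are hypothetical values chosen only for illustration and are not Backtrace figures; the function name is likewise made up for this example.

```python
def sustained_ingest_mb_per_s(errors_per_day: int, avg_object_bytes: int) -> float:
    """Average ingest rate in megabytes per second (1 MB = 10**6 bytes)."""
    seconds_per_day = 24 * 60 * 60
    return errors_per_day * avg_object_bytes / seconds_per_day / 1e6


# Hypothetical inputs for illustration only, not Backtrace figures.
errors_per_day = 2_000_000
avg_object_bytes = 500 * 1000  # 500 KB per error object

rate = sustained_ingest_mb_per_s(errors_per_day, avg_object_bytes)
print(f"{rate:.1f} MB/s sustained")  # prints "11.6 MB/s sustained"
```

This is an average over a full day. Real traffic is bursty, for example right after a game release, and a single gigabyte-sized dump would dwarf the average object, so peak write capacity would need to sit well above this number.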

Quickly determining how to resolve crashes and errors

Capturing the data is only one piece of the puzzle. Bahra continues, “That data needs to be written to disk extremely quickly and be readily available, so it can be read immediately by games engineers who need to find and resolve issues before they negatively impact players.”

Speed is also integral for Backtrace. Bahra explains, “When a customer’s infrastructure fails, they look to Backtrace to find the root cause. We need the ability to implement fast failover in order to provide high uptime and availability of our service to customers.”

As the ultimate driver for the product, speed acts as a guiding principle for Bahra and his team. “We have two ultimate missions for Backtrace: how quickly can a customer be notified of a major failure occurring; and when an issue is assigned to an engineer, how quickly can they push out a bug fix,” he says.

To deliver on this mission-critical commitment to games customers, the team performed an extensive evaluation to find a solution that would meet its security, scale, performance, and resilience needs. Backtrace eventually chose Amazon Elastic Compute Cloud (Amazon EC2) and Amazon Elastic Block Store (Amazon EBS).

Consistency is key

“We used several benchmarks to test AWS against other providers including CPU performance, storage performance, and cost,” recalls Bahra. “We preloaded instances with our largest dataset and tested query and processing benchmarks across the host. The team found that AWS was the most consistent with regards to performance, helping us build better models and provide more stringent guarantees for our customers. With Reserved Instances (RI) and Savings Plans, it’s also the most cost competitive.”

“Consistent performance is key for us,” Bahra continues. “We also performed a series of storage I/O benchmarks. We chose EBS because it was much more reliable with respect to performance. There was a lot less deviation in performance throughout the benchmarks.”
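One common way to put a number on "deviation in performance" is the coefficient of variation: the standard deviation of the benchmark samples divided by their mean. The sketch below is only an illustration of that idea, not the method Backtrace used; the provider names and throughput samples are invented for the example.

```python
from statistics import mean, pstdev


def coefficient_of_variation(samples):
    """Population standard deviation divided by the mean.

    Lower values mean the samples cluster more tightly around the average,
    i.e. more consistent performance.
    """
    return pstdev(samples) / mean(samples)


# Hypothetical storage throughput samples in MB/s (invented for illustration).
provider_a = [250, 252, 249, 251, 248]
provider_b = [310, 180, 290, 150, 320]

for name, samples in (("provider_a", provider_a), ("provider_b", provider_b)):
    cv = coefficient_of_variation(samples)
    print(f"{name}: mean={mean(samples):.0f} MB/s, cv={cv:.4f}")
```

Both made-up sample sets average 250 MB/s, yet the second varies far more from run to run. That kind of spread is what makes it hard to model capacity and offer stringent guarantees, even when average throughput looks identical.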

Migrating to AWS took the team just six weeks to complete. Today Backtrace uses Amazon EC2, Amazon EBS and Amazon Route 53 for its core real-time analysis pipeline, long-term storage, and dynamic provisioning infrastructure.

Backtrace has seen significant reductions in cost since moving to AWS, as well as improved reliability and performance. “The migration to AWS was very smooth,” says Bahra. “Using Amazon EC2 and Amazon EBS, we’ve seen our operational costs reduce by 30%. The end-user experience and our ability to scale quickly to changes in our workload have also improved. With the various programs AWS offers to startup companies, our first year costs were also reduced by 90%.”

The migration to AWS has delivered other benefits as well. “The suite of security features and convenient scheduled snapshots provided by Amazon EBS help us to significantly reduce the operational burden of bringing security and storage backups. Our engineers can sleep easier at night knowing these systems just work,” Bahra concludes.

Visit Backtrace.io to learn more about its gaming solution.

Visit AWS Game Tech to learn how you can build faster, operate smarter, and create computationally ridiculous games with AWS.