How Ankama shifted its analytics into the cloud using Amazon Redshift
The creatively inspired studio behind Dofus and Wakfu explains how it transformed data into insights with new architecture and data warehousing.
The games industry is fierce in its competitiveness and remarkable in its creativity, but few studios are as immersive in their world building across multiple media as Ankama. The studio takes each of these unique universes and tells their stories through various media including video games, books, comics, and boardgames—each medium adding more depth but never telling the same story.
Established in 2001 and headquartered in Roubaix, France, the company has spent the last 25 years using creative and technical skills to make games and universes that remain immersive for fans and captivating for critics. I caught up with Ankama’s Technical Project Manager, Frédéric Ravidat, to find out how the company migrated its data warehouse from Vertica to Amazon Redshift in order to fully realize the potential of its data and explore machine learning and analytics.
Ankama specializes in Massively Multiplayer Online Role-Playing games (MMORPGs) that have solid gamer statistics—Dofus alone has attracted more than 40 million global players since its launch in 2004. The independent publisher counts Dofus, Wakfu, and Krosmaga as its leading PC and online game titles with Drag’n’Boom, Dofus Touch, Krosmaster Arena, and Dofus Pogo for mobile devices. The company also has TV shows, board games, and console games to carry these immersive stories even further, each one bringing something new to the universe and to the gamer.
“We use technical skills and stories to create universes that are perfect and immersive for our players and fans.”
The games on offer from Ankama have their own original worlds but they share the same backend. As Ravidat points out, why would you need a different payment platform for each game? In-game purchases and payments are centralized so that payments are simple and cohesive, information is accessible and platforms run smoothly. The studio uses a customer relationship management platform and multiple tech architectures to ensure the threads of each game are woven together. As the studio has evolved, its data management system has undergone several iterations to keep up with changing demands.
The weight of data
Ankama was using MySQL to manage their data flows before moving to data warehousing platform Vertica for improved query time and replication of data. It was a solid system at first, but as the company started to collect game events, which are decisions users make at different points in each of their vast game worlds, the data became extremely heavy on storage.
“Every time a player did something in the game, we stored this action for at least three months, but this wasn’t possible with Vertica as we only had one terabyte (TB) of space,” says Ravidat. Increasing storage in Vertica would have led to a significant increase in costs—costs that could be bypassed using Amazon Redshift as an alternative. Redshift is a managed service that proved to be easier to deploy, maintain, and scale than Vertica, and it aligned with the company’s concurrent migration to AWS.
“To bypass the storage issue, we decided to move to Amazon Redshift,” says Ravidat.
To replicate tables with the previous system, the team at Ankama manually plugged the replication from MySQL into Vertica. When they began using Redshift, non game-related data such as user account or payment information was hosted in Vertica, while game event data lived in Redshift. This data was interpreted using data visualization tool Tableau, which was connected to Vertica for some dashboards and to Redshift for others.
It took around three to four months for the team to adapt its data analytics for use in Redshift, and the process wasn’t smooth to start with. Data that took Vertica five minutes to analyze took up to two hours in Redshift. Suddenly, the team couldn’t run queries in its new data warehouse in a workable amount of time.
“I realized I had to adapt the syntax of querying for Redshift and had to invest in best practice optimization for the connections between the tables and for the distribution,” adds Ravidat. “When we finished the optimization, the drop in query time was dramatic, from two hours to two minutes.”
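The kind of tuning Ravidat describes usually comes down to choosing how a table is distributed across the cluster and how it is sorted on disk. The sketch below is an illustration only, not Ankama's actual schema: the table name `game_events`, its columns, and the helper `build_events_ddl` are all hypothetical. It generates Redshift DDL that distributes rows on the column used to join against account data and sorts them by event time, so that joins stay local to a node and time-range filters scan fewer blocks.

```python
def build_events_ddl(table: str, dist_key: str, sort_key: str) -> str:
    """Return Redshift CREATE TABLE DDL for a hypothetical game-events table.

    dist_key: column used in joins, so matching rows land on the same slice.
    sort_key: column used in range filters, so scans can skip blocks.
    """
    # Hypothetical columns; the real schema is not described in the article.
    columns = [
        ("event_id", "BIGINT"),
        ("account_id", "BIGINT"),
        ("event_type", "VARCHAR(64)"),
        ("event_time", "TIMESTAMP"),
    ]
    names = {name for name, _ in columns}
    if dist_key not in names or sort_key not in names:
        raise ValueError("dist_key and sort_key must be columns of the table")

    column_sql = ",\n    ".join(f"{name} {sql_type}" for name, sql_type in columns)
    return (
        f"CREATE TABLE {table} (\n    {column_sql}\n)\n"
        f"DISTSTYLE KEY\n"
        f"DISTKEY ({dist_key})\n"
        f"SORTKEY ({sort_key});"
    )


ddl = build_events_ddl("game_events", "account_id", "event_time")
print(ddl)
```

If the account and payment tables were distributed on the same `account_id` column, a join between them and the events table would not need to move rows between nodes, which is one common source of the kind of slowdown described above.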
Changing the duality conversation
For a while, the system worked. Ankama ran two data warehouses, one on Vertica and one on Redshift. That also meant two servers to pay for each month. The cost of the servers and storage was part of what kickstarted Ankama’s full Redshift integration. For Ravidat, it made sense to reduce the costs that were now associated with the two data warehouses, and to minimize issues around storage and analytics.
The main issue the company struggled with was that more and more of the reporting that required data stored in Redshift also required data stored in Vertica, in order to link and interpret game event data alongside non-game data. This meant that the team would have to develop and maintain additional replication tools from Vertica to Redshift, which would increase the probability of errors in the process and reduce confidence in the final data.
The process was accelerated by a partnership Ankama had formed with a company that would feed its data to machine learning algorithms to unpack deeper insights. This partnership decided the move. It was easier to query on Redshift and send the resulting files to Amazon S3 than to implement new data flows from Vertica. Redshift included a number of tools that made it simpler to manage queries in the cloud and to handle the entire data transfer to the new partner.
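One way to send query results from Redshift to S3 is the `UNLOAD` command, which writes the output of a query to files under an S3 prefix. The article does not say which mechanism Ankama used, so the following is a minimal sketch under that assumption. The helper `build_unload`, the bucket path, the IAM role ARN, and the query are all made-up placeholders.

```python
def build_unload(query: str, s3_prefix: str, iam_role_arn: str) -> str:
    """Return a Redshift UNLOAD statement that exports a query to S3 as Parquet."""
    if not s3_prefix.startswith("s3://"):
        raise ValueError("s3_prefix must start with 's3://'")
    # UNLOAD takes the query as a quoted string, so single quotes
    # inside the query are doubled to escape them.
    escaped_query = query.replace("'", "''")
    return (
        f"UNLOAD ('{escaped_query}')\n"
        f"TO '{s3_prefix}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"FORMAT AS PARQUET;"
    )


# Placeholder values for illustration only.
statement = build_unload(
    query="SELECT account_id, event_time FROM game_events WHERE event_type = 'login'",
    s3_prefix="s3://example-bucket/exports/logins_",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleUnloadRole",
)
print(statement)
```

The generated statement would be run against the cluster with any SQL client; a partner with read access to the bucket could then pick up the files without touching the warehouse itself.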
“Fun fact: The company that was originally going to use our data to run the machine learning algorithms went bankrupt halfway through the project, but we decided to finish it anyway,” says Ravidat. “We were already motivated to complete the process because we wanted to reduce our costs and move to a more stable platform.”
Not only did the move cut costs in half, but it also gave Ankama fresh perspectives on its games and universes that the studio hadn’t previously considered. The studio was now paying less for a more stable infrastructure while also using the analytics and insights to boost customer retention and increase the quality of the game content for players. The move to Redshift also allowed the organization to transform the quality of its data. Now data is cleaner and leaner, giving the studio the opportunity to open doors to new projects and ideas.
Since the migration, Ankama has also managed to reduce its storage space. When the migration first kicked off, the queries were time and space intensive, demanding hefty space on Redshift. But, after a few weeks, Ravidat decided to go for a smaller instance that further saved on costs. Initially Ankama was using 12TB of storage, which has dropped down to 4TB in Redshift with the ability to scale back up to 12TB as needed. With less storage to work with, the team’s focus quickly shifted to optimization.
Optimization. Operations. Transformation.
“Now we’re focusing on continual optimization of the queries and managing Redshift so we can best manage the space we have,” he adds. “Our goal is to understand the data coming from the games or from churn, and use this insight to refine future product developments, to understand the data points of origin, and to optimize these flows so they become easier to manage. This level of optimization will allow us to take data from any location and use it for the right applications and situations.”
The overall migration and shift to smaller instances was an easy process for Ravidat and the Ankama team, and has resulted in significant cost savings, deeper investment into the data, and improved access to data. It has also ignited fresh projects, solved some unexpected problems, and inspired Ravidat to dig deeper into tech and architecture to uncover unexpected optimization and best practice pathways. The result is an agile architecture that can scale to demand and that allows for improved usage of data across both S3 and Redshift.
“We have everything we want with the hot data in Redshift and the storage of cold data in S3,” concludes Ravidat. “We’ll continue optimizing the queries and cutting down the response times, and I’ll be paying close attention to the evolution of these technologies in the future.”