Scaling for success: WinZO’s event-driven automation for orchestrating AWS resources

Introduction

WinZO is a multi-lingual skill-gaming platform in India that aims to build a tech-driven gaming ecosystem and act as a one-stop-shop for online gamers in the country.

WinZO provides customers a combination of entertainment and fun, and an arena for users to test their gaming skills and interact with game lovers. The company partners with third-party developers to host games on WinZO, where users enjoy personalized multiplayer gameplay experiences. The company has onboarded more than 100 games across six formats that are culturally relevant to India. It consistently delivers a complete entertainment package via interactive engagement features. The platform is available in 12 languages including English, Hindi, Gujarati, Marathi, Bengali, and Bhojpuri, and a majority (up to 80%) of its player base play games in their vernacular languages.

The platform’s monetization is driven by a micro-transaction model, where users engage in small-ticket transactions within the WinZO ecosystem during gameplay. WinZO gamers use digital payment methods, predominantly Unified Payments Interface (UPI), to deposit funds in their WinZO wallet, which in turn is used to make in-game micro-transactions. WinZO has surpassed 175 million players, and games trigger over four billion micro-transactions each month.

WinZO’s differentiation in the gaming ecosystem is its support to game developers in resolving challenging tech issues and developing a competitive gaming layer. It also offers additional support in areas such as payments, customer service, and outreach, all of which are integrated with skill-based matchmaking capabilities, a crucial draw factor for games played in vernacular languages. Game developers register for the platform through a self-service console and can immediately begin monetizing their games. Game developers may also use the WinZO platform to gain access to technology like machine learning and big data analytics to improve their games.

To keep millions of gaming enthusiasts engaged on the platform, operators need to pre-provision and scale their infrastructure to meet spikes in traffic. Combined with an efficient system architecture and the cloud-native provisioning and scaling features of Amazon Web Services (AWS), event-driven automation gives DevOps engineers peace of mind during important events.

As the WinZO team states, “For online gaming companies, it is important to effectively manage high traffic volumes and hence infrastructure becomes crucial. While dedicated pre-provisioned capacity is a reliable approach to preventing adverse performance impacts, it comes with an associated additional cost. Striking the right balance is key to maintaining optimal performance and keeping the cloud infrastructure cost-optimized.”

In this blog post, we describe the challenges the WinZO team faced with traditional scaling practices, review their approach to addressing those challenges, and then delve into the architecture of their custom automated scaling solution. The solution leverages Serverless on AWS services and patterns, and is supplemented by AWS Enterprise Support for consultative architectural and operational guidance during critical real-world events.

Traditional scaling practices and challenges

Managing multiple rapid surges of requests within sub-minute time periods presents notable hurdles for conventional auto-scaling methods. The key challenges include:

1. Granularity constraints for scaling policies: Traditional auto scaling relies on critical metrics like CPU and memory that have a minimum granularity of 1 minute. Scaling policies therefore cannot activate promptly when demand spikes within that 1-minute window, and the cooldown period between two scaling events delays the response further.

2. Server provisioning and container start-up duration: The process of provisioning servers or hosts introduces inherent time delays. Additionally, if the system runs on containers, the start-up time and the warm-up period required for the application to become fully operational compound challenges in swiftly addressing increased request loads within really short periods of time.

3. Load balancer pre-warming necessity: Load balancers require pre-provisioning to manage sudden traffic surges, and estimating the right capacity is complex, especially for impromptu events like tournaments and matches.

4. Effective capacity planning: On-demand capacity may need to be provisioned when load exceeds projections, which often happens during tournaments. Designing systems according to architectural and operational best practices helps avoid unexpected issues that can impact the business. When planning for such events, teams can draw on options that help handle the unique workload requirements of the games tech industry.
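To make the first two challenges concrete, the following is a minimal Python sketch of the reaction-time arithmetic. The class name, field names, and all durations are hypothetical values chosen for illustration, not measurements from WinZO's environment. It adds up the delays a reactive scaling loop has to absorb and compares the total with how quickly a spike builds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ReactionBudget:
    """Delays between a traffic spike starting and new capacity serving it.

    All values are illustrative assumptions, in seconds.
    """
    metric_period_s: int   # granularity of the scaling metric (CPU, memory)
    provisioning_s: int    # time to provision a host or place a task
    startup_s: int         # container start plus application warm-up

    def time_to_capacity_s(self) -> int:
        # Worst case: the spike starts right after a metric datapoint,
        # so a full metric period passes before the policy can fire.
        return self.metric_period_s + self.provisioning_s + self.startup_s


def reactive_scaling_keeps_up(budget: ReactionBudget, spike_ramp_s: int) -> bool:
    """True if capacity can arrive before the spike has fully built up."""
    return budget.time_to_capacity_s() <= spike_ramp_s


# Hypothetical numbers: 1-minute metrics, 90 s provisioning, 45 s warm-up.
budget = ReactionBudget(metric_period_s=60, provisioning_s=90, startup_s=45)
print(budget.time_to_capacity_s())                          # 195
print(reactive_scaling_keeps_up(budget, spike_ramp_s=30))   # False
```

Under these assumed numbers, a spike that builds in 30 seconds is over long before reactive scaling can add capacity, which is the motivation for pre-provisioning ahead of known events.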

Strategies for architecting the solution

As WinZO’s workloads use containers and microservice architecture, the team’s strategy involves pre-provisioning resources for each microservice to match anticipated levels of load, running automated load and stress tests that mimic real-world peak loads to ascertain system resiliency during peak periods within the containerized environment, and scaling down the containerized microservices after the load settles. This approach to dynamically managing containerized microservices has proven to be a cost-effective way to reduce overall system costs without impacting performance.

The following goals were established by WinZO when creating a solution for the aforementioned challenges:

Holistic scaling beyond Amazon Elastic Container Service and Amazon EC2: Extend the solution’s capability beyond scaling Amazon Elastic Container Service (Amazon ECS) containers and Amazon EC2 Auto Scaling to encompass other backend components used by the system, like Amazon DynamoDB provisioned read/write throughput and Amazon Relational Database Service (Amazon RDS) provisioned IOPS.
Config-driven microservices scaling: Use internal consumption data to identify microservices that need scaling and accurately quantify the need through historical trend data. Apply artificial intelligence (AI) and machine learning (ML)-based data-driven approaches to compute scaling settings for the various microservices.
Automate pre-provisioning for AWS infrastructure: Develop automation that uses a real-world match schedule to pre-provision AWS infrastructure (Amazon EC2) and scale up/out the required microservices a few hours before the match starts.
Load balancer pre-warming: Pre-warm the Elastic Load Balancers by generating a gradual synthetic load on the appropriate load balancers that need to be scaled prior to the match in order to scale them organically.
Customizable solution for future use cases: Design the solution to be configurable for future use cases such as marketing campaigns, ensuring flexibility and adaptability. A simple change to the config name for the API allows for automated scaling of different sets of services, such as ECS, DynamoDB, RDS, Amazon ElastiCache, and other layered AWS services.
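The config-driven and schedule-driven goals above can be pictured with a small Python sketch. Everything in it is an assumption made for illustration: the config names, service names, scaling factors, lead times, and helper functions are hypothetical and do not describe WinZO's actual configuration format.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical named scaling configs. Selecting a different config name
# selects a different set of services and factors to scale.
SCALING_CONFIGS = {
    "cricket-final": {
        "lead_time": timedelta(hours=3),
        "targets": [
            {"kind": "ecs", "name": "matchmaking", "factor": 5.0},
            {"kind": "ecs", "name": "wallet", "factor": 10.0},
            {"kind": "dynamodb", "name": "player-sessions", "factor": 4.0},
        ],
    },
    "marketing-campaign": {
        "lead_time": timedelta(hours=1),
        "targets": [{"kind": "ecs", "name": "onboarding", "factor": 2.0}],
    },
}


def targets_for(config_name, kind=None):
    """Return the scaling targets of a config, optionally filtered by kind."""
    targets = SCALING_CONFIGS[config_name]["targets"]
    return [t for t in targets if kind is None or t["kind"] == kind]


def pre_provision_time(config_name, match_start):
    """When pre-provisioning should begin for an event starting at match_start."""
    return match_start - SCALING_CONFIGS[config_name]["lead_time"]


match_start = datetime(2024, 4, 14, 14, 0, tzinfo=timezone.utc)
print(pre_provision_time("cricket-final", match_start).isoformat())
print([t["name"] for t in targets_for("cricket-final", kind="ecs")])
```

Keeping the per-event settings in data rather than code is what lets the same automation be pointed at a marketing campaign or a different set of services by changing only the config name.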

ZOScaler: The solution

With these objectives in focus, WinZO developed a serverless, event-driven orchestration system called “ZOScaler”, using AWS Lambda functions, Amazon CloudWatch Events, Application Auto Scaling for Amazon ECS, and Amazon DynamoDB. This system ensures precise resource scaling, caters to dynamic resource demands, and orchestrates efficient pre-warming processes. ZOScaler revolves around three core Lambda functions, each assigned specific tasks.

WinZO’s internal service hosted on Amazon ECS initiates an event 2-3 hours prior to a crucial real-world match, sending a message to an Amazon Simple Notification Service (Amazon SNS) topic, which in turn triggers the ‘Scale-Up’ AWS Lambda function.
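As a rough illustration of this trigger, the sketch below shows how a Lambda function subscribed to an SNS topic could read the incoming message. The `Records[].Sns.Message` envelope is the standard shape of an SNS-triggered Lambda event. The message fields (`event_id`, `config_name`, `match_end`) and the function names are hypothetical, since the post does not describe the actual payload.

```python
import json


def parse_scale_up_request(event):
    """Extract the scale-up request from an SNS-triggered Lambda event.

    SNS delivers the published message as a string in
    event["Records"][i]["Sns"]["Message"]; here it is assumed to be JSON.
    """
    record = event["Records"][0]
    message = json.loads(record["Sns"]["Message"])
    # Hypothetical payload fields, assumed for this sketch.
    return {
        "event_id": message["event_id"],
        "config_name": message["config_name"],
        "match_end": message["match_end"],
    }


def lambda_handler(event, context):
    request = parse_scale_up_request(event)
    # A real Scale-Up function would continue from here: check the state
    # table for an ongoing event, then apply the scale-up configuration.
    return request


# Local example with a hand-built event in the SNS envelope shape.
sample_event = {
    "Records": [
        {
            "EventSource": "aws:sns",
            "Sns": {
                "Message": json.dumps(
                    {
                        "event_id": "match-001",
                        "config_name": "cricket-final",
                        "match_end": "2024-04-14T17:30:00+00:00",
                    }
                )
            },
        }
    ]
}
print(lambda_handler(sample_event, None))
```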

1. Scale-Up Lambda:

a. Checks for any ongoing events in the Amazon DynamoDB State table; if no current event is found, the function proceeds

b. Utilizes a scale-up configuration defined for each microservice, specifying scaling factors based on internal computed data

c. Calculates new min/max values using the scale-up function and invokes the Application Auto Scaling APIs for ECS to apply the configuration

d. Stores the new and old min/max values in DynamoDB for each unique scale-up event

e. Creates a scheduled CloudWatch Events rule that triggers the ‘Scale-Up-Notifier’ function every 2 minutes to check and confirm that all the required ECS microservices have finished scaling up

2. Scale-Up Notifier Lambda:

a. Every 2 minutes, checks the status of all running tasks until the count of running tasks for each microservice reaches its new minimum task count

b. Upon confirming scale-up completion of all the microservices, it notifies stakeholders via Slack and also triggers an internal load test to synthetically and gradually prewarm the appropriate load balancers

c. Finally, with its job completed, it disables the ‘every 2 minutes’ triggering CloudWatch rule and enables the scale-down rule based on match end time

3. Scale-Down Lambda:

a. The CloudWatch cron rule, triggered at match-end time, invokes this function, which uses the state information in DynamoDB to call the Application Auto Scaling APIs for ECS and revert min/max values to their pre-scale-up numbers

b. Notifies stakeholders on Slack and deletes the event-specific scale-down CloudWatch event rule

This automation is integrated with Atlassian’s Opsgenie, leveraging its alerting and monitoring capabilities. In the face of scale-up challenges, Opsgenie promptly issues alerts, facilitating rapid identification and resolution of potential issues, which significantly boosts the overall reliability and efficiency of the scaling processes.
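The core logic of the three functions described above can be sketched in Python as follows. This is a simplified illustration under stated assumptions, not WinZO's implementation: the helper names, the rounding rule, the in-memory `state` dict (standing in for the DynamoDB State table), and the `RecordingClient` stub are all hypothetical. The keyword arguments passed to `register_scalable_target` match the boto3 Application Auto Scaling client, which could be substituted for the stub in a real Lambda function.

```python
import math
from datetime import datetime, timezone


def scaled_bounds(old_min, old_max, factor):
    """Multiply min/max task counts by a factor, rounding up (assumed rule)."""
    new_min = math.ceil(old_min * factor)
    new_max = max(math.ceil(old_max * factor), new_min)
    return new_min, new_max


def one_time_cron(when):
    """CloudWatch Events cron expression matching a single UTC minute."""
    when = when.astimezone(timezone.utc)
    return f"cron({when.minute} {when.hour} {when.day} {when.month} ? {when.year})"


def apply_bounds(autoscaling, cluster, service, min_tasks, max_tasks):
    """Set an ECS service's min/max capacity via Application Auto Scaling."""
    autoscaling.register_scalable_target(
        ServiceNamespace="ecs",
        ResourceId=f"service/{cluster}/{service}",
        ScalableDimension="ecs:service:DesiredCount",
        MinCapacity=min_tasks,
        MaxCapacity=max_tasks,
    )


def scale_up(autoscaling, state, event_id, cluster, services):
    """services maps service name -> (old_min, old_max, factor)."""
    if state:  # an event is already in progress, so do nothing
        return False
    record = {}
    for name, (old_min, old_max, factor) in services.items():
        new_min, new_max = scaled_bounds(old_min, old_max, factor)
        apply_bounds(autoscaling, cluster, name, new_min, new_max)
        record[name] = {"old": (old_min, old_max), "new": (new_min, new_max)}
    state[event_id] = record  # keep old and new values for the scale-down
    return True


def scale_up_complete(running_counts, state, event_id):
    """Notifier check: has every service reached its new minimum?"""
    return all(
        running_counts.get(name, 0) >= bounds["new"][0]
        for name, bounds in state[event_id].items()
    )


def scale_down(autoscaling, state, event_id, cluster):
    """Revert every service to the min/max stored before the scale-up."""
    for name, bounds in state.pop(event_id).items():
        apply_bounds(autoscaling, cluster, name, *bounds["old"])


class RecordingClient:
    """Test stub that records calls instead of contacting AWS."""

    def __init__(self):
        self.calls = []

    def register_scalable_target(self, **kwargs):
        self.calls.append(kwargs)


client, state = RecordingClient(), {}
started = scale_up(client, state, "match-001", "games", {"matchmaking": (4, 10, 5.0)})
done = scale_up_complete({"matchmaking": 20}, state, "match-001")
scale_down(client, state, "match-001", "games")
print(started, done, [(c["MinCapacity"], c["MaxCapacity"]) for c in client.calls])
print(one_time_cron(datetime(2024, 4, 14, 17, 30, tzinfo=timezone.utc)))
```

In a deployed version, the running task counts would come from the ECS service description rather than a hand-built dict, the 2-minute notifier schedule could use a rate expression such as `rate(2 minutes)`, and the expression returned by `one_time_cron` would be attached to the scale-down rule.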

ZOScaler automation achievements in 2024

ZOScaler achieved its objectives by maintaining uninterrupted service during peak traffic hours and optimizing resource usage, resulting in substantial cost savings of up to 70% for computing resources. Additionally, this automation showcased exceptional resilience by handling a 5x surge in overall traffic and a 10x increase on WinZO’s high throughput service, which provided ample proof of its ability and reusability. Further, as a customizable solution, it can apply to numerous other use cases for efficient scaling. It also streamlined the cloud infrastructure provisioning process and instilled confidence in WinZO’s DevOps team to deliver an exceptional user experience even in demanding scenarios.

AWS Countdown

An AWS offering that complements ZOScaler’s capabilities is AWS Countdown, a service designed for various cloud use cases, including migrations, modernizations, product launches, streaming, and go-live events. It’s an excellent service for preparing your infrastructure for high-traffic events, such as game launches, in-game events, promotions, and planned activities that can generate bursts of load on your game systems.

AWS Countdown helps assess operational readiness, identify risks, and plan capacity using proven playbooks. The Premium tier offers critical support across all phases of cloud projects, including design and post-launch retrospectives. Designated engineers provide proactive guidance and troubleshooting, participating in critical event calls for rapid issue resolution. This tier accelerates migrations, modernizations, and high-impact go-live events, enabling you to achieve business goals. It’s available for Business Support, Enterprise On-Ramp, and Enterprise Support customers as a monthly subscription for an additional fee.

Conclusion

Overall, ZOScaler, the cloud-native orchestrator, has been time-tested during multiple peak events and has demonstrated its value by helping the company gain, engage, and retain players effectively. Built using AWS native serverless services and patterns, it has ensured 100% availability of WinZO’s platform and its crucial microservices, and cut compute resource costs by up to 70%. This achievement underscores the strength of WinZO’s automated scaling strategy for critical events and its efficiency in maximizing resource usage. As a lighthouse project, it showcases the ability to effortlessly scale up for future high-traffic situations. The continuous support and guidance from the AWS team, coupled with AWS Countdown, has bolstered the team’s confidence. Looking ahead, WinZO is prepared to replicate this success and address similar challenges confidently through its AWS Enterprise Support partnership.

About WinZO

WinZO is the largest interactive entertainment platform in India. Launched in early 2018, the company partners with third-party developers to host games on its app, where users can enjoy personalized multiplayer gameplay experiences. The platform claims more than 175 million registered users and is available in 12 languages, including English, Hindi, Gujarati, Marathi, Bengali, and Bhojpuri. The WinZO platform facilitates over 4 billion micro-transactions per month across a portfolio of 100+ games. The company envisions a future where the WinZO platform delivers culturally relevant and enjoyable experiences in the Indian gaming ecosystem, monetized through a unique micro-transaction model. WinZO, a series-C-funded venture, has raised $100 million from marquee gaming and entertainment investors such as Griffin Gaming Partners, Courtside Ventures, and Maker’s Fund, all of which made their first investments in the Indian start-up ecosystem through WinZO.