How Dream11 built a multi-tenant user authentication service on Amazon EKS
This blog was co-authored by Qamar Ali Shaikh, AVP – Infrastructure & Platform Engineering, Dream11 and
Arvind Sharma, Engineering Manager – Developer Platform, Dream11.
With more than 190 million users, Dream11 is one of the world’s largest fantasy sports platforms, offering fantasy cricket, football, kabaddi, basketball, hockey, volleyball, handball, rugby, American football, and baseball. Dream11 is the flagship brand of Dream Sports, India’s leading sports technology company. It has partnerships with several national and international sports bodies and cricketers.
In this blog post, we explore how Dream11 built a multi-tenant user authentication service on top of Amazon Elastic Kubernetes Service (Amazon EKS). We also cover the key design benefits of building it, the key features offered by the service, and how Dream11 achieved outcomes such as scalability, isolation, and cost efficiency.
What we think about multi-tenancy and authentication at Dream11
Dream11 and its affiliates have been building numerous services and applications. Some of these services and applications require similar functionalities and features. These similarities suggest an opportunity to build reusable platform components. That way, teams can consume these components instead of reinventing the wheel by building and managing them on their own.
Dream11 realized there is a need for a software architecture pattern where a single instance of a software application serves multiple clients, known as tenants. Each tenant can choose either to operate in isolation from other tenants, with its own infrastructure, data, configurations, user base, and customizations, or to run on shared multi-tenant infrastructure. This architecture is particularly advantageous in cloud computing, where resource efficiency and scalability are essential.
The following characteristics of multi-tenancy architecture are important for Dream11:
- Isolation: Each tenant’s data and configuration are logically separated from other tenants, ensuring privacy and security.
- Customization: Tenants can customize their environment to a certain extent without affecting other tenants.
- Shared resources: Resources such as servers, databases, and networks are shared among multiple tenants, optimizing resource utilization.
- Scalability: Multi-tenant systems can efficiently scale to accommodate new tenants without impacting the existing tenants.
- Cost efficiency: Shared resources and streamlined management contribute to cost savings for providers and tenants.
To read more about this pattern, see the articles “Building a Multi-Tenant SaaS Solution Using AWS Serverless Services” and “Let’s Architect! Designing architectures for multi-tenancy.”
Dream11 has developed a service using this architectural pattern: a centralized multi-tenant authentication service. This service is secure, dependable, and scalable. It complies with the OAuth and OpenID Connect (OIDC) standards. It enables tenants to validate and confirm their end users’ identities using the provided authentication data. Additionally, the service facilitates retrieving and exchanging basic profile details about the end users in an interoperable, REST-like manner.
Key features of the multi-tenant user authentication service
The following key features are built into the authentication service, which the tenant can use:
- Login with Dream11 – Dream11’s own OIDC flow allows tenants to authenticate their users with Dream11 and gives them a single-tap login and registration mechanism that does not require a one-time password (OTP). A built-in consent flow lets the user control what data is shared. This is available on Android, iOS, and web.
- Social logins – Facebook, Google and Apple
- Passwordless authentication – Signing users in without using a password. This is often done with OTPs sent to the end-user’s mobile phone or email.
- Configurable OTPs – Tenants can configure the strength, size, and time-out values for the OTPs.
- Simple Login – User ID and password-based
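To make the configurable OTP feature concrete, here is a minimal Python sketch of how per-tenant OTP settings could drive generation and verification. The names `OtpConfig`, `IssuedOtp`, `issue_otp`, and `verify_otp`, as well as the `max_attempts` setting, are assumptions made for illustration and are not taken from Dream11's implementation.

```python
import hmac
import secrets
import time
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class OtpConfig:
    """Hypothetical per-tenant OTP settings."""
    length: int = 6          # number of digits in the OTP
    ttl_seconds: int = 300   # how long the OTP stays valid
    max_attempts: int = 3    # assumed limit on verification attempts


@dataclass
class IssuedOtp:
    code: str
    expires_at: float
    attempts_left: int


def issue_otp(cfg: OtpConfig, now: Optional[float] = None) -> IssuedOtp:
    """Generate a numeric OTP from a cryptographically secure source."""
    now = time.time() if now is None else now
    code = "".join(secrets.choice("0123456789") for _ in range(cfg.length))
    return IssuedOtp(code, now + cfg.ttl_seconds, cfg.max_attempts)


def verify_otp(otp: IssuedOtp, candidate: str, now: Optional[float] = None) -> bool:
    """Reject expired or exhausted OTPs; compare in constant time otherwise."""
    now = time.time() if now is None else now
    if now >= otp.expires_at or otp.attempts_left <= 0:
        return False
    otp.attempts_left -= 1
    return hmac.compare_digest(otp.code.encode(), candidate.encode())
```

A tenant that wants longer codes with a shorter lifetime would pass something like `OtpConfig(length=8, ttl_seconds=60)`; the service code stays the same across tenants.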
Additionally, the multi-tenant authentication service provides the following functionalities to manage tenants via REST APIs:
- Tenant authentication – A way to make sure that the requests flowing into the system are authenticated (that is, tenants are the ones making the requests), and a way for our tenants to rotate, disable, and create new credentials to identify themselves.
- Tenant management – A location to store our tenants’ details, such as point-of-contact details, the tenant’s name, and other tenant metadata.
- Tenant onboarding – Onboarding of tenants via REST API.
- Tenant configuration management – Ability to change and apply tenant configuration in real time via REST API.
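The tenant authentication capability described above (create, rotate, and disable credentials) can be sketched as follows. This is an in-memory illustration under stated assumptions: the class `TenantCredentialStore`, its method names, and the choice to store only a SHA-256 digest of each secret are hypothetical and do not describe Dream11's actual storage or API.

```python
import hashlib
import hmac
import secrets


def _digest(secret: str) -> str:
    # Store only a hash so the raw secret is never kept at rest.
    return hashlib.sha256(secret.encode()).hexdigest()


class TenantCredentialStore:
    """Hypothetical in-memory store: tenant_id -> {key_id: record}."""

    def __init__(self):
        self._creds = {}

    def create(self, tenant_id):
        """Issue a new key pair; the secret is returned only once."""
        key_id = secrets.token_hex(8)
        secret = secrets.token_urlsafe(32)
        self._creds.setdefault(tenant_id, {})[key_id] = {
            "hash": _digest(secret),
            "active": True,
        }
        return key_id, secret

    def disable(self, tenant_id, key_id):
        self._creds[tenant_id][key_id]["active"] = False

    def rotate(self, tenant_id, old_key_id):
        """Create a replacement credential, then disable the old one."""
        new_pair = self.create(tenant_id)
        self.disable(tenant_id, old_key_id)
        return new_pair

    def authenticate(self, tenant_id, key_id, secret):
        """True only for an active credential that belongs to this tenant."""
        record = self._creds.get(tenant_id, {}).get(key_id)
        if record is None or not record["active"]:
            return False
        return hmac.compare_digest(record["hash"], _digest(secret))
```

Scoping every lookup by `tenant_id` first means a credential issued to one tenant cannot authenticate requests made on behalf of another.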
How we built a multi-tenant user authentication service
Onboarding a new tenant is a seamless process in which an admin or the tenant provides the configuration that the authentication service uses to verify end users. Additionally, the authentication service invokes the tenant’s user information service to retrieve any user-specific details and attributes. This gives a tenant complete control over what kind of information is returned to the mobile or web clients requesting authentication.
The steps below outline the sequence of events that are involved during the tenant onboarding process:
- The tenant registration service receives a request from either the web-based interface page or CLI application, including the tenant’s onboarding & configuration data.
- The tenant registration service stores the tenant details and their configuration in a table inside the Amazon Aurora database.
- During the configuration setup, the tenant registration service performs a schema validation of the configuration object to ensure that the tenant provides appropriate configuration.
Here is a sample configuration that a tenant can provide while registering:
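As an illustration only, the following Python sketch shows what a tenant configuration and its schema validation could look like. Every field name (`tenant_name`, `otp`, `user_service_url`) and the `validate` helper are hypothetical; they are not Dream11's actual schema or code.

```python
# Hypothetical tenant configuration; field names are illustrative assumptions.
SAMPLE_TENANT_CONFIG = {
    "tenant_name": "example-tenant",
    "otp": {"length": 6, "ttl_seconds": 300, "channel": "sms"},
    "user_service_url": "https://tenant.example.com/users",
}

# Minimal schema: each key maps to an expected type or a nested schema.
SCHEMA = {
    "tenant_name": str,
    "otp": {"length": int, "ttl_seconds": int, "channel": str},
    "user_service_url": str,
}


def validate(config, schema, path=""):
    """Return a list of human-readable errors; an empty list means valid."""
    if not isinstance(config, dict):
        return [f"{path or 'config'}: expected object"]
    errors = []
    for key, expected in schema.items():
        where = f"{path}.{key}" if path else key
        if key not in config:
            errors.append(f"{where}: missing")
        elif isinstance(expected, dict):
            errors.extend(validate(config[key], expected, where))
        elif isinstance(config[key], bool) or not isinstance(config[key], expected):
            # bool is a subclass of int in Python, so reject it explicitly.
            errors.append(f"{where}: expected {expected.__name__}")
    for key in sorted(config.keys() - schema.keys()):
        where = f"{path}.{key}" if path else key
        errors.append(f"{where}: unknown field")
    return errors
```

Returning all errors at once, rather than failing on the first, lets the registration interface show a tenant everything that needs fixing in a single round trip.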
The high-level diagram below depicts how clients perform the user login through the Authentication service.
The steps below outline the sequence of events that are involved during the client authentication process:
- The end user initiates a login API call to the authentication service, providing their tenant ID and user credentials or mobile number. This request is sent to a Kong fleet, which acts as a client gateway.
- Kong routes these client requests to the authentication service running on Amazon EKS.
- Perform authentication
a. The authentication service fetches the tenant configuration, such as the OTP TTL and channel, from the Amazon Aurora database if it has not been fetched already, and caches it in Amazon ElastiCache.
b. The service generates a unique OTP for the user and sends it to the end user through a notification service provided either by the platform or by the user.
c. The authentication service then performs the necessary verification, such as OTP verification or a password match.
d. If the authentication is successful, the authentication service invokes the tenant’s user service, where the tenant can register or fetch a user, validate the subscription tier, and perform an authorization check.
- Perform authorization
a. The final response for the authentication/login request is sent back to the client. The response contains a valid token and the user attributes sent by the tenant’s user service.
b. These tokens are saved in Amazon Relational Database Service (Amazon RDS) and cached in Amazon ElastiCache for subsequent validation.
c. If the client is successfully authenticated and authorized at this stage, it can use the token provided to perform further operations. Otherwise, if authentication fails, the client needs to retry the login operation.
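The flow above can be sketched in Python as a cache-aside configuration lookup followed by verification, a call to the tenant's user service, and token issuance. This is a simplified illustration, not Dream11's code: a dictionary stands in for Amazon ElastiCache, a callable stands in for the Aurora query, and the names `CacheAside`, `login`, and `otp_length` are assumptions.

```python
import hmac
import secrets
import time


class CacheAside:
    """Cache-aside lookup; a dict stands in for Amazon ElastiCache."""

    def __init__(self, load_from_db, ttl_seconds=60.0):
        self._load = load_from_db      # stand-in for the Aurora query
        self._ttl = ttl_seconds
        self._entries = {}             # key -> (expires_at, value)
        self.db_reads = 0              # counter kept only for illustration

    def get(self, key, now=None):
        now = time.time() if now is None else now
        hit = self._entries.get(key)
        if hit is not None and hit[0] > now:
            return hit[1]              # served from cache
        self.db_reads += 1
        value = self._load(key)        # cache miss: read the database
        self._entries[key] = (now + self._ttl, value)
        return value


def login(tenant_id, submitted_otp, expected_otp, configs, fetch_user, tokens):
    """Verify the OTP, call the tenant's user service, then issue a token."""
    cfg = configs.get(tenant_id)
    same = hmac.compare_digest(submitted_otp.encode(), expected_otp.encode())
    if len(submitted_otp) != cfg["otp_length"] or not same:
        return None                    # client must retry the login
    user = fetch_user(tenant_id)       # tenant-controlled user attributes
    token = secrets.token_urlsafe(32)
    tokens[token] = {"tenant_id": tenant_id, "user": user}
    return {"token": token, "user": user}
```

Because the tenant configuration is read on every login, caching it keeps the database off the hot path; only the first request per tenant within the cache TTL reaches it.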
How we built a multi-tenant user authentication service on Amazon Elastic Kubernetes Service
Amazon EKS Architecture
The above architecture supports a highly available, fault-tolerant, and scalable application design using multiple Amazon EKS clusters in a single AWS Region. By deploying the multi-tenant authentication service across multiple Amazon EKS clusters within the same Region, we achieve higher redundancy and resilience than a single cluster can provide.
We used Amazon Route 53 weighted traffic routing to manage the distribution of incoming requests across the two Amazon EKS clusters. We also used ExternalDNS, an open-source Kubernetes add-on, to dynamically manage DNS records based on Kubernetes resources. When used in conjunction with Amazon Route 53, it automates creating, updating, and deleting DNS records based on the state of Kubernetes services, ingress resources, or other custom DNS annotations. We used internet-facing and internal application load balancers to route the external and internal traffic, respectively, to the Amazon EKS clusters.
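To illustrate weighted routing across two clusters, the sketch below builds a change batch in the shape accepted by the Route 53 `ChangeResourceRecordSets` API, with one weighted alias record per cluster load balancer. It only constructs the data and makes no AWS call. The record name, load balancer DNS names, hosted zone IDs, and the 50/50 weights are placeholders, and in a setup like the one described these records would normally be managed by ExternalDNS rather than by hand.

```python
def weighted_alias_changes(record_name, targets):
    """Build a Route 53 change batch with one weighted alias record per target.

    targets: list of (set_identifier, alb_dns_name, alb_zone_id, weight).
    """
    changes = []
    for set_id, dns_name, zone_id, weight in targets:
        if not 0 <= weight <= 255:
            raise ValueError("Route 53 weights must be between 0 and 255")
        changes.append({
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "A",
                "SetIdentifier": set_id,   # distinguishes records with the same name
                "Weight": weight,
                "AliasTarget": {
                    "HostedZoneId": zone_id,
                    "DNSName": dns_name,
                    "EvaluateTargetHealth": True,
                },
            },
        })
    return {"Changes": changes}


def expected_share(targets):
    """Fraction of DNS responses per target: weight / sum of all weights."""
    total = sum(t[3] for t in targets)
    if total == 0:
        # With all weights at zero, Route 53 treats the records as equal.
        return {t[0]: 1 / len(targets) for t in targets}
    return {t[0]: t[3] / total for t in targets}
```

Shifting the weights, for example to 100 and 0, drains one cluster for maintenance without changing the hostname that clients use.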
We used open-source Istio because it provides the control needed to deploy canary services. We roll out a new version of the authentication service by first testing it with a small percentage of user traffic. If all goes well, we increase traffic to the new version in gradual increments while simultaneously phasing out the old version. If anything goes wrong along the way, we stop and roll back to the previous version of the service.
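The canary progression described above can be modeled as a simple loop over traffic weights with a health check at each step. This is a sketch of the control logic only; the step values and the `is_healthy` callback are assumptions. In Istio, each (stable, canary) pair would correspond to the destination weights on a VirtualService route.

```python
def canary_rollout(steps, is_healthy):
    """Walk through canary weights and record each (stable, canary) split.

    steps: increasing canary percentages, for example [5, 25, 50, 100].
    is_healthy: callback that reports whether the canary looks good at
                the given percentage (hypothetical; it would normally
                inspect error rates and latency).
    """
    history = []
    for canary_weight in steps:
        history.append((100 - canary_weight, canary_weight))
        if not is_healthy(canary_weight):
            history.append((100, 0))   # roll back to the previous version
            return history
    return history
```

For example, `canary_rollout([5, 25, 50, 100], lambda w: w < 50)` stops at the 50 percent step and ends with all traffic back on the stable version.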
We used Horizontal Pod Autoscaler (HPA), a Kubernetes feature that automatically scales the number of pod replicas in a deployment or replica set based on observed metrics like CPU utilization or custom metrics. This is particularly useful for applications that experience variable loads.
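The scaling decision that HPA makes follows the formula in the Kubernetes documentation: desired replicas = ceil(current replicas × current metric value / target metric value), with no change when the ratio is within a tolerance (0.1 by default). The sketch below reproduces that calculation; the function name and the default minimum and maximum bounds are illustrative.

```python
import math


def desired_replicas(current_replicas, current_metric, target_metric,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Compute the HPA target replica count for a single metric."""
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas        # close enough to target: do nothing
    desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))
```

With 4 replicas averaging 90% CPU against a 60% target, the ratio is 1.5 and the deployment scales to 6 replicas; at 62% the ratio falls inside the tolerance band and nothing changes.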
The external metric we used for scaling the multi-tenant authentication service comes from an internally developed system called Scaler, a one-stop solution for managing our capacity during daily sporting events. Designed by our dedicated capacity and optimization team, Scaler ensures adequate resources during peak traffic times, leaving more room for innovation and creativity elsewhere. Scaler does this by analyzing historical data. From this history, the platform correlates the expected concurrency with the corresponding requests per second (RPS) demand for each microservice. With these insights, the system preemptively provisions infrastructure tailored to meet those demands. All this happens without intervention from service owners or developers. The result? Uninterrupted functionality under extreme usage scenarios.
To read about how we built Scaler, please see the blog article “How Dream11 uses an in-house scaling platform to scale resources on AWS for efficiency and reliability.”
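Scaler's internals are not described here, so the following is only a rough Python sketch of the general idea of predictive scaling: derive an RPS-per-user ratio from historical samples, project the RPS for an expected concurrency, and convert that into a replica count with some headroom. The function names, the worst-case-ratio heuristic, the per-pod capacity, and the 20% headroom are all assumptions.

```python
import math


def estimate_rps(history, expected_concurrency):
    """Project RPS from the worst observed RPS-per-user ratio.

    history: list of (concurrent_users, observed_rps) samples.
    """
    worst_ratio = max(rps / users for users, rps in history if users > 0)
    return worst_ratio * expected_concurrency


def replicas_for(rps, rps_per_pod, headroom_pct=20):
    """Pods needed to serve `rps` with extra headroom, rounded up."""
    return math.ceil(rps * (100 + headroom_pct) / (100 * rps_per_pod))
```

A value computed this way ahead of a match could be published as the external metric that HPA consumes, so capacity is in place before traffic arrives rather than after CPU utilization climbs.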
Amazon EKS, a managed Kubernetes service, provides the availability and scalability of the Kubernetes control plane nodes, handles the scheduling of our workload containers, manages the availability of our applications, stores cluster data, and manages other vital tasks. In addition to these, we want to reduce the complexity of cluster management and application deployment for our developers and teams. To do that, we are also working on building capabilities like:
- Kubernetes Namespace-as-a-Service (k8sNaaS) – The ability to share an Amazon EKS cluster among multiple teams and groups of users, each of which thinks only about its own exclusive and isolated namespace instead of getting into the complexities of managing workloads in a shared cluster.
- Developer Abstraction – Allowing developers to focus on application logic rather than the underlying Amazon EKS platform and infrastructure management.
- Data stores – Ability for teams and developers to deploy stateful applications on Amazon EKS.
What’s next for the multi-tenant authentication Service
Dream11 and several affiliate companies have been successfully onboarded to the central multi-tenant authentication service and have realized benefits such as isolation, scalability, and resource sharing. Dream11 plans to build the following additional features:
- Web UI for managing tenants.
- Guest Login – Guest login allows users to interact with services without revealing personal information or going through a registration process.
- Allowing Dream11 tenants to register as OIDC third parties and offer their own OIDC flows to their partners.
- Single Sign On – Ability for users to seamlessly log in between multiple applications (mobile & web).
- Multi-factor authentication (MFA) – An authentication method that requires the user to provide two or more verification factors to gain access.
- Tenant Analytics and Reporting – An interactive self-service interface to gain insights on spending and billing, usage stats, OTP Statuses, errors, and logs.