How Telkomsel migrated MyOrbit applications to Amazon EKS
This post was created in collaboration with Lutfi Ichsan Effendi, IT Cloud engineer at Telkomsel.
Telkomsel is a leading digital telecommunication company in Indonesia. Established in 1995, Telkomsel currently has 151 million subscribers with more than 121 million mobile data users. Telkomsel offers services across Digital Connectivity, Digital Platform, and Digital Services. In Digital Connectivity, Telkomsel has broad network coverage across Indonesia with 3G, 4G, and 5G services. One of the offerings in the Digital Connectivity landscape is Telkomsel MyOrbit, a home internet service that uses a WiFi modem backed by Telkomsel’s core cellular service.
When the MyOrbit service launched in July 2020, its supporting applications (such as the customer web portal, order service, provisioning service, and payment handling) were deployed in multiple locations. The supporting applications included:
- The frontend and blog website were running in the AWS Singapore Region on a Docker Swarm cluster on Amazon Elastic Compute Cloud (Amazon EC2) instances.
- The API gateway was running on a VMware platform in Telkomsel’s data center.
- Backend services were running on OpenShift Container Platform in Telkomsel’s data center.
The following diagram depicts the original architecture:
Telkomsel had some challenges with this setup:
- Provisioning on-premises components took significant time and effort, approximately 5 days to complete.
- Siloed on-premises and cloud teams drove different cultures and practices.
- Capacity constraints in the on-premises environment hindered proper non-functional testing (NFT).
Telkomsel decided to modernize MyOrbit architecture as part of Telkomsel’s digital transformation initiative.
Telkomsel decided to standardize their container orchestration platform by deploying containerized frontend and backend workloads to Amazon Elastic Kubernetes Service (Amazon EKS).
The decision to choose Amazon EKS was driven by the need for scalable, open-standard container orchestration. Before the migration, Telkomsel performed a series of proof-of-concept activities to explore and test Amazon EKS functionality.
For the Kubernetes data plane, Telkomsel chose Amazon EC2 instances as the Kubernetes worker nodes. Kubernetes has autoscaling functionality that automatically scales resources up or down to meet changing demand. With the Kubernetes Cluster Autoscaler, Telkomsel can optimize the usage of the compute environment based on actual demand. The Kubernetes Cluster Autoscaler implementation on AWS uses Amazon EC2 Auto Scaling groups to manage nodes. It expands an Auto Scaling group to add Amazon EC2 instances to the cluster when a pod can’t be scheduled due to insufficient compute capacity. It also terminates Amazon EC2 instances when utilization is low. Inside the Kubernetes cluster, Telkomsel also uses the Horizontal Pod Autoscaler, which automatically scales the number of pods in a deployment based on CPU usage. This enables the MyOrbit application to scale out to meet increased demand and to scale in when the resources aren’t required.
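The Horizontal Pod Autoscaler behavior described above can be illustrated with the scaling rule from the Kubernetes documentation: the desired replica count is the current count multiplied by the ratio of observed to target metric, rounded up. The following Python sketch is only an illustration of that rule. The function name, the replica bounds, and the sample utilization values are assumptions made for this example and are not taken from Telkomsel’s configuration.

```python
import math


def desired_replicas(current_replicas, current_cpu, target_cpu,
                     min_replicas=1, max_replicas=10, tolerance=0.1):
    """Approximate the HPA rule: ceil(current * observed / target).

    current_cpu and target_cpu are average CPU utilization percentages.
    The bounds and the tolerance are illustrative; 0.1 matches the
    Kubernetes default tolerance.
    """
    ratio = current_cpu / target_cpu
    # Within the tolerance band, the autoscaler leaves the count unchanged.
    if abs(ratio - 1.0) <= tolerance:
        return current_replicas
    # Multiply before dividing to limit floating-point rounding error.
    desired = math.ceil(current_replicas * current_cpu / target_cpu)
    # Clamp to the configured minimum and maximum replica counts.
    return max(min_replicas, min(max_replicas, desired))


# Hypothetical numbers: 4 pods at 90% CPU with a 60% target scale out to 6.
print(desired_replicas(4, 90, 60))
# The same 4 pods at 30% CPU scale in to 2.
print(desired_replicas(4, 30, 60))
```

When the Horizontal Pod Autoscaler adds pods that no longer fit on the existing nodes, those pods become unschedulable, which is the signal the Cluster Autoscaler uses to grow the Auto Scaling group.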
When selecting the Amazon EC2 instance type for the Amazon EKS worker nodes, it is important to consider not only CPU and memory requirements but also network performance and the maximum number of supported pods. Based on the performance testing results, Telkomsel identified the need for low network latency and jitter as well as consistent 10 Gbps network throughput. Consequently, Telkomsel chose Amazon EC2 instance types that support enhanced networking and deliver consistent 10 Gbps network performance.
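To make the "maximum number of supported pods" consideration concrete: with the default Amazon VPC CNI configuration (without prefix delegation), the pod limit of a node follows from how many elastic network interfaces (ENIs) the instance type supports and how many IPv4 addresses each ENI can hold. The sketch below applies that commonly documented formula. The instance figures are examples for illustration only, since the post does not state which instance types Telkomsel selected.

```python
import math


def max_pods(enis, ipv4_per_eni):
    """Pod limit per node for the default Amazon VPC CNI mode.

    One address on each ENI is reserved for the ENI itself; the +2
    accounts for pods that use host networking.
    """
    return enis * (ipv4_per_eni - 1) + 2


def nodes_needed(total_pods, enis, ipv4_per_eni):
    """Minimum node count if pod IP capacity were the only constraint."""
    return math.ceil(total_pods / max_pods(enis, ipv4_per_eni))


# Illustrative ENI limits (assumed here, not from the post):
# m5.large supports 3 ENIs with 10 IPv4 addresses each.
print(max_pods(3, 10))           # 29
# m5.xlarge supports 4 ENIs with 15 IPv4 addresses each.
print(max_pods(4, 15))           # 58
# 100 pods on m5.large-sized nodes need at least 4 nodes.
print(nodes_needed(100, 3, 10))  # 4
```

In practice, CPU, memory, and network throughput requirements also bound the node count, so this calculation gives only a lower limit.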
For managing Kubernetes Ingress, Telkomsel chose the AWS Load Balancer Controller, which provisions an AWS Application Load Balancer (ALB) to satisfy a Kubernetes Ingress resource. By using the native AWS ALB solution, Telkomsel gained scalability, because the ALB scales automatically to handle incoming traffic, as well as other benefits such as Transport Layer Security (TLS) offloading and Server Name Indication (SNI).
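As a rough illustration of what the AWS Load Balancer Controller consumes, the following Python sketch builds an Ingress manifest with the `alb` ingress class and a few of the controller’s documented annotations, then prints it as JSON (which `kubectl apply -f` accepts). The resource name, host, Service name, and port are hypothetical placeholders, and the chosen annotation values are assumptions rather than Telkomsel’s actual settings.

```python
import json


def alb_ingress(name, host, service_name, service_port, certificate_arn=None):
    """Build a networking.k8s.io/v1 Ingress intended for the ALB controller."""
    annotations = {
        # Assumed values; both annotations are defined by the controller.
        "alb.ingress.kubernetes.io/scheme": "internet-facing",
        "alb.ingress.kubernetes.io/target-type": "ip",
    }
    if certificate_arn:
        # TLS is terminated at the ALB when a certificate is attached.
        annotations["alb.ingress.kubernetes.io/listen-ports"] = '[{"HTTPS": 443}]'
        annotations["alb.ingress.kubernetes.io/certificate-arn"] = certificate_arn
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": name, "annotations": annotations},
        "spec": {
            "ingressClassName": "alb",
            "rules": [{
                "host": host,
                "http": {"paths": [{
                    "path": "/",
                    "pathType": "Prefix",
                    "backend": {"service": {
                        "name": service_name,
                        "port": {"number": service_port},
                    }},
                }]},
            }],
        },
    }


# Hypothetical frontend Service exposed on port 80.
manifest = alb_ingress("frontend", "portal.example.com", "frontend-svc", 80)
print(json.dumps(manifest, indent=2))
```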
Telkomsel uses Amazon CloudWatch to monitor the MyOrbit application. They use Amazon CloudWatch metrics to monitor Amazon EKS cluster health and use Container Insights to collect and monitor application metrics and logs from all of MyOrbit’s microservices. Amazon CloudWatch dashboards and alarms are used for day-to-day operations and for proactive detection of anomalies in the cluster.
MyOrbit still uses an on-premises database to store PII (Personally Identifiable Information) data, which complies with regulation and internal architecture guidance. This decision to create a hybrid architecture requires high-performance network connectivity. To achieve consistent performance and low-latency access, Telkomsel uses AWS Direct Connect Dedicated Connections. Currently, Telkomsel gets approximately 5 ms latency between the AWS Jakarta Region and Telkomsel’s data centers. AWS Transit Gateway connects the MyOrbit VPCs (Virtual Private Clouds) to the other supporting services and to the on-premises environment via AWS Direct Connect.
Telkomsel’s internal architecture guideline provides directions related to cloud implementation:
- Use only the AWS Jakarta Region for production workloads, which complies with data residency regulation and provides improved latency for subscribers in Indonesia.
- Separate the workload into multiple environments (such as production, non-production, and development). With the AWS landing zone implementation, Telkomsel uses a multi-account strategy to separate MyOrbit’s environments. In AWS, Telkomsel is now able to add one more environment just for NFT.
As part of the migration, Telkomsel decided to use AWS native-managed services for MyOrbit’s integration component to help them address slow environment creation time and high operational overhead. These services included: Amazon Elastic File System, Amazon ElastiCache for Redis, Amazon MQ for RabbitMQ, and others, as depicted in the previous diagram.
Telkomsel migrated applications that previously ran on Docker Swarm, on-premises virtual machines, and on-premises OpenShift Container Platform cluster to Amazon EKS clusters. This section describes the migration approach.
For the application component, Telkomsel replatformed from the existing container orchestrators to Amazon EKS. To do that, Telkomsel converted the Docker Swarm stack definitions and OpenShift YAML manifests to Kubernetes Deployment and Service manifests. Although OpenShift is compatible with Kubernetes, some syntax modifications were required to convert the OpenShift manifests to Kubernetes manifests (for example, converting OpenShift Route objects to Kubernetes Ingress objects).
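The Route-to-Ingress conversion mentioned above can be sketched as a small mapping between the two object shapes. This Python example is a simplified illustration, not the tooling Telkomsel used: the function name and sample Route are hypothetical, it assumes the Route’s `targetPort` matches the Service port, and it does not translate TLS settings, because a Route embeds certificates while an Ingress references a Secret.

```python
def route_to_ingress(route, ingress_class=None):
    """Map a route.openshift.io/v1 Route dict to a networking.k8s.io/v1 Ingress dict."""
    spec = route["spec"]
    target_port = spec.get("port", {}).get("targetPort")
    # A Route port may be numeric or a named port; Ingress keeps them apart.
    if isinstance(target_port, int):
        port = {"number": target_port}
    elif isinstance(target_port, str):
        port = {"name": target_port}
    else:
        raise ValueError("Route has no spec.port.targetPort to map")

    ingress_spec = {
        "rules": [{
            "host": spec["host"],
            "http": {"paths": [{
                "path": spec.get("path", "/"),
                "pathType": "Prefix",
                "backend": {"service": {"name": spec["to"]["name"], "port": port}},
            }]},
        }],
    }
    if ingress_class:
        ingress_spec["ingressClassName"] = ingress_class
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {"name": route["metadata"]["name"]},
        "spec": ingress_spec,
    }


# Hypothetical Route for an order service.
sample_route = {
    "apiVersion": "route.openshift.io/v1",
    "kind": "Route",
    "metadata": {"name": "order-service"},
    "spec": {
        "host": "orders.example.com",
        "path": "/api",
        "to": {"kind": "Service", "name": "order-service"},
        "port": {"targetPort": 8080},
    },
}
print(route_to_ingress(sample_route, ingress_class="alb"))
```

Route features without a direct Ingress equivalent, such as weighted alternate backends, would need to be handled case by case.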
For the integration component, Telkomsel also modernized the existing Kong API gateway by moving it from VMware virtual machines to a Kubernetes deployment in Amazon EKS. Approximately 28 microservices were migrated to Amazon EKS. Telkomsel uses the on-premises GitLab Container Registry to store the container images for all of MyOrbit’s microservices.
For the database component, Telkomsel took two approaches: replatform and rehost.
- Telkomsel replatformed their on-premises Redis to Amazon ElastiCache for Redis. The MyOrbit application uses Redis for session management.
- Telkomsel rehosted both InfluxDB and MongoDB on Amazon EC2. InfluxDB is used to store some monitoring metrics, while MongoDB is used to store WhatsApp bot data. At the time of writing, Amazon DocumentDB (with MongoDB compatibility) wasn’t yet available in the Jakarta Region, so Telkomsel still uses self-managed MongoDB on Amazon EC2.
Telkomsel substantially reduced the time to provision resources for MyOrbit from 5 days to 13 minutes. They can launch the entire MyOrbit infrastructure stack, including the Amazon EKS cluster, using Terraform in 13 minutes. With the elasticity of the AWS Cloud, Telkomsel also has an additional environment for NFT purposes, including load testing. Application load testing requires large resources and is difficult to do on-premises due to capacity constraints. Having the NFT environment in AWS enhanced software delivery in each sprint cycle. As an additional outcome, Telkomsel reduced the required CPU and memory footprint by 40% while increasing performance, measured in transactions per second, by up to 16%. These improvements allowed Telkomsel to reduce their cost per environment. With this success, Telkomsel became more confident in using Amazon EKS as their orchestrator for container workloads across the group.
In this post, we showed you how Telkomsel used Amazon EKS to successfully modernize MyOrbit and achieve fast deployment of container applications in a secure, highly performant, scalable, and elastic Kubernetes cluster. Amazon EKS is capable of running mission-critical containerized applications at scale. For more information about Amazon EKS, you can visit the Amazon EKS documentation. You can also visit the Amazon EKS Workshop page for hands-on exercises that cover the various capabilities of Amazon EKS.