Migrating from self-managed Kubernetes to Amazon EKS? Here are some key considerations

Overview

We talk to customers every day who are either planning a migration to Amazon Elastic Kubernetes Service (Amazon EKS) or are in the middle of one. These customers often start with a self-managed Kubernetes deployment, but as their Kubernetes footprint scales up, managing the platform becomes cumbersome. At a certain scale, self-managing Kubernetes adds significant operational overhead that is undifferentiated and takes time and resources away from core business applications. For this reason, many of our customers have chosen to offload the undifferentiated heavy lifting of managing Kubernetes in production to AWS. A few of the reasons we see customers moving to Amazon EKS are:

A managed control plane: Maintaining a highly available control plane for production applications is a daunting task. As their Kubernetes workloads scale up, customers find that the effort of managing the control plane keeps them from working through the backlog of feature requests for their business applications.
Security and compliance: Several customers operate in regulated environments and must certify against security and compliance standards such as FedRAMP, HIPAA, PCI, and more. Amazon EKS is in scope for these compliance programs, so customers don’t need to certify their control plane against these standards themselves.
Cost: Amazon EKS charges $0.10 (USD) per hour for each cluster’s managed control plane. This is typically less than the cost of standing up and running a highly available, resilient Kubernetes control plane on Amazon EC2 for production workloads.
Scalability: Leveraging the global scale of AWS, customers can deploy applications and rapidly scale up to meet their workload requirements. In addition, the EKS control plane scales automatically as your cluster grows.
Compute options: EKS offers two deployment models for managed compute. Customers can choose managed node groups, where AWS manages the lifecycle of worker nodes in the customer’s VPC, or AWS Fargate, the serverless compute option for containers. EKS managed node groups natively support a variety of cost-optimized compute options, including Spot Instances and AWS Graviton and AMD-based instances, so customers can choose the most cost-optimized compute for each workload.
Reliability and availability: Amazon EKS offers a 99.95% uptime SLA. To meet it, EKS runs the Kubernetes control plane across multiple AWS Availability Zones, automatically detects and replaces unhealthy control plane nodes, and provides on-demand, zero-downtime upgrades and patching.
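The following small Python sketch makes the cost and availability figures above concrete. It makes two simplifying assumptions that are not AWS billing or SLA rules: a 730-hour average billing month and a 30-day month for the SLA calculation. It covers only the control plane rate quoted above, not worker node costs. The function names are hypothetical.

```python
# Back-of-the-envelope arithmetic for the control plane rate and uptime SLA above.
# Assumptions (not AWS billing rules): 730 billable hours per month, 30-day SLA month.

HOURS_PER_MONTH = 8760 / 12  # 730.0 hours in an average month


def control_plane_cost(clusters: int, rate_per_hour: float = 0.10) -> float:
    """Monthly USD charge for the managed control planes of `clusters` clusters."""
    return round(clusters * rate_per_hour * HOURS_PER_MONTH, 2)


def downtime_budget_minutes(uptime_percent: float, days: int = 30) -> float:
    """Minutes of unavailability an uptime percentage permits over `days` days."""
    total_minutes = days * 24 * 60
    return round(total_minutes * (1 - uptime_percent / 100), 2)


if __name__ == "__main__":
    print(control_plane_cost(1))           # 73.0 USD per month for one cluster
    print(control_plane_cost(5))           # 365.0 USD per month for five clusters
    print(downtime_budget_minutes(99.95))  # 21.6 minutes per 30-day month
```

At the quoted rate, one cluster's control plane comes to about $73 per month. A 99.95% monthly SLA corresponds to roughly 21.6 minutes of allowable downtime in a 30-day month.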

Okay, maybe you know this already and are ready to move to Amazon EKS. Migrating to Amazon EKS is relatively straightforward: Amazon EKS runs upstream Kubernetes, so if your application works with Kubernetes, it should work on Amazon EKS. However, there are design and implementation differences that you will need to review and plan for in advance.

Plan

In addition to planning the migration project and its downstream dependencies, customers should plan for a few technical aspects to avoid surprises during the migration. Here are some key technical considerations to evaluate.

Kubernetes versions: To avoid API inconsistencies during the migration, plan to migrate workloads to the same upstream Kubernetes version in EKS. If you are on a version of Kubernetes that EKS does not currently support, upgrading your existing clusters to the latest version EKS supports before migrating may be time-consuming and unnecessary. In this case, it is important to identify the deprecated and removed APIs your workloads use and address them as part of the migration.
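One way to approach that review is to scan your manifests for API versions that the target Kubernetes version no longer serves. The Python sketch below assumes manifests have already been parsed into dictionaries (for example, with a YAML library). The REMOVED_APIS table is a partial, hand-picked excerpt from the upstream Kubernetes deprecated API migration guide, not a complete list. The function names are hypothetical.

```python
# Sketch of a pre-migration check for APIs removed in newer Kubernetes releases.
# REMOVED_APIS is a partial excerpt of the upstream deprecated API migration guide;
# consult the full guide for your source and target versions.

REMOVED_APIS = {
    # (apiVersion, kind): ((major, minor) release that removed it, replacement apiVersion)
    ("extensions/v1beta1", "Deployment"): ((1, 16), "apps/v1"),
    ("apps/v1beta2", "Deployment"): ((1, 16), "apps/v1"),
    ("extensions/v1beta1", "Ingress"): ((1, 22), "networking.k8s.io/v1"),
    ("networking.k8s.io/v1beta1", "Ingress"): ((1, 22), "networking.k8s.io/v1"),
    ("batch/v1beta1", "CronJob"): ((1, 25), "batch/v1"),
    ("policy/v1beta1", "PodDisruptionBudget"): ((1, 25), "policy/v1"),
    ("autoscaling/v2beta2", "HorizontalPodAutoscaler"): ((1, 26), "autoscaling/v2"),
}


def parse_version(version: str) -> tuple:
    """Turn '1.24' (or '1.24.3') into (1, 24) so versions compare as tuples."""
    major, minor = version.split(".")[:2]
    return int(major), int(minor)


def find_removed_apis(manifests: list, target_version: str) -> list:
    """Return a message for each manifest whose API is no longer served in target_version."""
    target = parse_version(target_version)
    findings = []
    for manifest in manifests:
        key = (manifest.get("apiVersion"), manifest.get("kind"))
        if key not in REMOVED_APIS:
            continue
        removed_in, replacement = REMOVED_APIS[key]
        if target >= removed_in:
            name = manifest.get("metadata", {}).get("name", "<unnamed>")
            findings.append(
                f"{key[1]}/{name}: {key[0]} was removed in "
                f"{removed_in[0]}.{removed_in[1]}; use {replacement}"
            )
    return findings


if __name__ == "__main__":
    manifests = [
        {"apiVersion": "networking.k8s.io/v1beta1", "kind": "Ingress", "metadata": {"name": "web"}},
        {"apiVersion": "apps/v1", "kind": "Deployment", "metadata": {"name": "api"}},
    ]
    for finding in find_removed_apis(manifests, "1.24"):
        print(finding)
```

Against a 1.24 target, the sample flags only the Ingress. networking.k8s.io/v1beta1 was removed in 1.22, while apps/v1 is still current.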
Security:
- Authentication: EKS integrates with IAM. The IAM user or role that creates the cluster is granted administrator (system:masters) permissions, and you grant access to additional IAM users and roles by adding them to the aws-auth ConfigMap. You can also authenticate to an EKS cluster with an OpenID Connect (OIDC) identity provider. To authorize these identities, use Kubernetes Roles and ClusterRoles to define permissions, then bind them to users and groups with RoleBindings and ClusterRoleBindings.
- IAM roles for service accounts: Kubernetes applications often need access to AWS APIs to provision or use cloud resources, so they must be granted permissions. A critical security best practice is the principle of ‘least privilege.’ IAM roles for service accounts (IRSA) follows this principle by associating a specific IAM role with a Kubernetes service account, so that only the pods using that service account receive the role’s permissions.
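The identity pieces described above ultimately become Kubernetes objects and IAM policy documents. The Python sketch below builds them as plain dictionaries so their shape is visible. The account ID, OIDC issuer, role names, namespace, and the dev-team group are hypothetical placeholders. The real aws-auth ConfigMap already holds mappings for your worker node roles, so new entries should be merged into it rather than replacing it.

```python
import json

# Hypothetical placeholders; substitute your own account and cluster OIDC issuer.
ACCOUNT_ID = "111122223333"
OIDC_PROVIDER = "oidc.eks.us-west-2.amazonaws.com/id/EXAMPLE1234567890"
RBAC_API = "rbac.authorization.k8s.io"


def aws_auth_entry(role_arn: str, username: str, groups: list) -> dict:
    """One mapRoles entry; merge it into the existing aws-auth ConfigMap in kube-system."""
    return {"rolearn": role_arn, "username": username, "groups": groups}


def render_map_roles(entries: list) -> str:
    """mapRoles holds a YAML string; JSON is valid YAML, so json.dumps is sufficient."""
    return json.dumps(entries, indent=2)


def group_view_binding(group: str, namespace: str) -> dict:
    """RoleBinding granting the built-in 'view' ClusterRole to a group in one namespace."""
    return {
        "apiVersion": f"{RBAC_API}/v1",
        "kind": "RoleBinding",
        "metadata": {"name": f"{group}-view", "namespace": namespace},
        "subjects": [{"kind": "Group", "name": group, "apiGroup": RBAC_API}],
        "roleRef": {"kind": "ClusterRole", "name": "view", "apiGroup": RBAC_API},
    }


def irsa_trust_policy(namespace: str, service_account: str) -> dict:
    """IAM trust policy that lets only one service account assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {
                "Federated": f"arn:aws:iam::{ACCOUNT_ID}:oidc-provider/{OIDC_PROVIDER}"
            },
            "Action": "sts:AssumeRoleWithWebIdentity",
            "Condition": {"StringEquals": {
                # Restrict the role to one namespace/service account pair.
                f"{OIDC_PROVIDER}:sub": f"system:serviceaccount:{namespace}:{service_account}",
                f"{OIDC_PROVIDER}:aud": "sts.amazonaws.com",
            }},
        }],
    }


def irsa_service_account(namespace: str, name: str, role_arn: str) -> dict:
    """ServiceAccount annotated with the IAM role its pods should assume."""
    return {
        "apiVersion": "v1",
        "kind": "ServiceAccount",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {"eks.amazonaws.com/role-arn": role_arn},
        },
    }


if __name__ == "__main__":
    dev_role = f"arn:aws:iam::{ACCOUNT_ID}:role/dev-team"
    print(render_map_roles([aws_auth_entry(dev_role, "dev-user", ["dev-team"])]))
    print(json.dumps(group_view_binding("dev-team", "app"), indent=2))
    print(json.dumps(irsa_trust_policy("app", "s3-reader"), indent=2))
```

The design keeps authorization in Kubernetes RBAC: the aws-auth entry only maps an IAM role to a group name, and the RoleBinding decides what that group may do.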
Networking:
- VPC CNI: Amazon EKS integrates with the Amazon VPC CNI plugin to provide native VPC networking. The VPC CNI assigns each pod an IP address from the VPC, so a pod has the same IP address inside Kubernetes as it does on the VPC network. This results in increased transparency, observability, and debuggability.
- AWS Load Balancer Controller: To take advantage of new feature releases and additional configuration options, AWS provides the AWS Load Balancer Controller. It provisions Network Load Balancers (NLB) for Kubernetes Services of type LoadBalancer and Application Load Balancers (ALB) for Ingress resources, such as those fronting web applications.
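Two practical consequences of this design can be sketched in Python. First, each pod consumes a VPC IP address, so pod density per node is bounded by the instance type's elastic network interface (ENI) limits. The commonly cited formula for the default VPC CNI configuration (without prefix delegation) is ENIs x (IPv4 addresses per ENI - 1) + 2. Second, pods are directly addressable in the VPC, so the AWS Load Balancer Controller can register pod IPs as ALB targets (target-type ip). The ENI figures below cover a few example instance types only. The Ingress, Service name, and port are hypothetical. The ingress class name is assumed to be alb, the default used by the controller's Helm chart.

```python
# Sketch: pod density under the VPC CNI, and an Ingress for the AWS Load Balancer Controller.
# ENI limits are examples for a few instance types; verify them for your own fleet.

ENI_LIMITS = {
    # instance type: (max ENIs, IPv4 addresses per ENI)
    "t3.medium": (3, 6),
    "m5.large": (3, 10),
    "m5.xlarge": (4, 15),
}


def max_pods(instance_type: str) -> int:
    """Default VPC CNI pod limit: ENIs * (IPs per ENI - 1) + 2."""
    enis, ips_per_eni = ENI_LIMITS[instance_type]
    # Each ENI keeps its primary IP for itself; the +2 covers host-network pods
    # such as aws-node and kube-proxy, which do not consume VPC pod IPs.
    return enis * (ips_per_eni - 1) + 2


def alb_ingress(name: str, namespace: str, service: str, port: int) -> dict:
    """Ingress that the AWS Load Balancer Controller turns into an internet-facing ALB."""
    return {
        "apiVersion": "networking.k8s.io/v1",
        "kind": "Ingress",
        "metadata": {
            "name": name,
            "namespace": namespace,
            "annotations": {
                "alb.ingress.kubernetes.io/scheme": "internet-facing",
                # Route straight to pod IPs, which the VPC CNI makes routable in the VPC.
                "alb.ingress.kubernetes.io/target-type": "ip",
            },
        },
        "spec": {
            "ingressClassName": "alb",  # assumed default class name from the Helm chart
            "rules": [{"http": {"paths": [{
                "path": "/",
                "pathType": "Prefix",
                "backend": {"service": {"name": service, "port": {"number": port}}},
            }]}}],
        },
    }


if __name__ == "__main__":
    for itype in ENI_LIMITS:
        print(itype, max_pods(itype))  # t3.medium 17, m5.large 29, m5.xlarge 58
```

If your self-managed cluster uses an overlay network, comparing these limits with your current pods-per-node figures is a quick way to spot node sizing changes before cutover.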
Storage:
- EBS/EFS/FSx CSI drivers: Kubernetes supports stateful workloads through resources such as PersistentVolumes (PV) and PersistentVolumeClaims (PVC). Kubernetes originally interfaced with storage infrastructure such as Amazon Elastic Block Store (EBS) and Amazon Elastic File System (EFS) through in-tree volume plugins. However, any change or fix to those plugins required a change to upstream Kubernetes, which led to long lead times for needed updates. To alleviate this, the Container Storage Interface (CSI) was introduced so that storage drivers can be updated and fixed independently of Kubernetes releases. AWS supports CSI drivers for EBS, EFS, and Amazon FSx for Lustre. Depending on your workload requirements, you can choose EBS, EFS, or FSx to provide storage for your stateful workloads.
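Here is a Python sketch of what this looks like in manifests. It builds a StorageClass backed by the EBS CSI driver and a PersistentVolumeClaim that uses it. It also includes a small check that flags StorageClasses still pointing at the in-tree EBS provisioner (kubernetes.io/aws-ebs), which is worth catching during a migration. The sketch assumes the EBS CSI driver is installed in the cluster (for example, as an EKS add-on). The gp3 volume type, class name, and claim size are illustrative choices, and the function names are hypothetical.

```python
# Sketch: CSI-backed storage manifests and a check for in-tree EBS StorageClasses.

CSI_PROVISIONERS = {
    "ebs": "ebs.csi.aws.com",
    "efs": "efs.csi.aws.com",
    "fsx-lustre": "fsx.csi.aws.com",
}
IN_TREE_TO_CSI = {"kubernetes.io/aws-ebs": CSI_PROVISIONERS["ebs"]}


def ebs_storage_class(name: str = "ebs-gp3") -> dict:
    """StorageClass that provisions gp3 volumes through the EBS CSI driver."""
    return {
        "apiVersion": "storage.k8s.io/v1",
        "kind": "StorageClass",
        "metadata": {"name": name},
        "provisioner": CSI_PROVISIONERS["ebs"],
        "parameters": {"type": "gp3"},
        # Wait until a pod is scheduled so the volume is created in that pod's AZ.
        "volumeBindingMode": "WaitForFirstConsumer",
    }


def pvc(name: str, namespace: str, storage_class: str, size: str) -> dict:
    """PersistentVolumeClaim requesting `size` from the given StorageClass."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "accessModes": ["ReadWriteOnce"],  # an EBS volume is normally attached to one node
            "storageClassName": storage_class,
            "resources": {"requests": {"storage": size}},
        },
    }


def in_tree_classes(storage_classes: list) -> dict:
    """Map each StorageClass that uses an in-tree provisioner to its CSI replacement."""
    return {
        sc["metadata"]["name"]: IN_TREE_TO_CSI[sc["provisioner"]]
        for sc in storage_classes
        if sc.get("provisioner") in IN_TREE_TO_CSI
    }


if __name__ == "__main__":
    legacy = {"metadata": {"name": "gp2"}, "provisioner": "kubernetes.io/aws-ebs"}
    print(in_tree_classes([ebs_storage_class(), legacy]))
```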

Test

Typically, we see customers start with functional testing on a new EKS cluster to ensure their applications perform as expected. Because Amazon EKS is certified Kubernetes conformant, you can be confident that your applications use the same APIs as open-source Kubernetes. Next, customers validate integrations with third-party and open-source tooling to verify that the versions in use are compatible with EKS. Lastly, if necessary, customers perform load testing to understand how EKS handles peak workload patterns and scales to meet business needs. We also recommend a security audit and an AWS Well-Architected review to ensure that the configuration follows best-practice guidance and is free from known security vulnerabilities.

This is also a good time to reevaluate tooling and review AWS recommended best practices for cluster configuration. It is an AWS best practice to automate as much as possible. For cluster deployment, a common practice is to use Infrastructure as Code (IaC), and there are several options:

AWS CloudFormation is a popular AWS-native solution used by many of our customers.
eksctl, the official CLI for Amazon EKS, uses a declarative configuration format and provisions the cluster through CloudFormation stacks.
The AWS Cloud Development Kit (CDK) uses the familiarity and expressive power of programming languages to model your application infrastructure.
Third-party solutions such as Terraform also work well.
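As an illustration of the declarative approach, the Python sketch below renders an eksctl ClusterConfig. It enables an IAM OIDC provider, which IAM roles for service accounts rely on, and defines two managed node groups: one On-Demand and one Spot. The cluster name, Region, Kubernetes version, instance types, and sizes are placeholders. The output is JSON. Because JSON is valid YAML, it should be usable with `eksctl create cluster -f <file>`, but treat that as an assumption to verify with your eksctl version, or dump the same dictionary with a YAML library.

```python
import json


def cluster_config(name: str, region: str, version: str) -> dict:
    """eksctl ClusterConfig with OIDC enabled and On-Demand plus Spot managed node groups."""
    return {
        "apiVersion": "eksctl.io/v1alpha5",
        "kind": "ClusterConfig",
        "metadata": {"name": name, "region": region, "version": version},
        # Creates the IAM OIDC provider that IAM roles for service accounts depend on.
        "iam": {"withOIDC": True},
        "managedNodeGroups": [
            {
                "name": "on-demand",
                "instanceType": "m5.large",
                "minSize": 2,
                "maxSize": 5,
                "desiredCapacity": 3,
            },
            {
                "name": "spot",
                # Several similarly sized types improve the odds of obtaining Spot capacity.
                "instanceTypes": ["m5.large", "m5a.large", "m6i.large"],
                "spot": True,
                "minSize": 0,
                "maxSize": 10,
                "desiredCapacity": 2,
            },
        ],
    }


if __name__ == "__main__":
    # Placeholder values; match `version` to the version your current clusters run.
    print(json.dumps(cluster_config("migration-target", "us-west-2", "1.27"), indent=2))
```

Keeping this definition in version control means the target cluster can be recreated identically for test runs before the production migration.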

After automating the deployment of the EKS cluster, you can use your CI/CD pipeline to deploy and test your applications on the new cluster.

Migrate

There are a few different migration strategies. In some cases, customers may want to back up the state of the existing cluster and restore it into the new one; some of our customers have used third-party solutions such as Velero and Druva for this. Other customers simply redirect their CI/CD pipeline to deploy into the EKS cluster.
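For the backup-and-restore route with Velero, backups and restores are driven by custom resources in the cluster where Velero is installed. The Python sketch below builds a Backup for the old cluster and a matching Restore for the new one. It assumes Velero runs in the velero namespace on both clusters and that both use the same backup storage location (here, the one named default). The backup name, namespace list, and 30-day retention are hypothetical. The equivalent Velero CLI commands are `velero backup create` and `velero restore create --from-backup`.

```python
import json

VELERO_API = "velero.io/v1"


def velero_backup(name: str, namespaces: list, ttl: str = "720h0m0s") -> dict:
    """Backup of the listed namespaces, applied on the source (self-managed) cluster."""
    return {
        "apiVersion": VELERO_API,
        "kind": "Backup",
        "metadata": {"name": name, "namespace": "velero"},
        "spec": {
            "includedNamespaces": namespaces,
            "storageLocation": "default",
            "ttl": ttl,  # how long Velero keeps the backup (30 days here)
        },
    }


def velero_restore(backup_name: str) -> dict:
    """Restore applied on the target EKS cluster, reading the shared storage location."""
    return {
        "apiVersion": VELERO_API,
        "kind": "Restore",
        "metadata": {"name": f"{backup_name}-restore", "namespace": "velero"},
        "spec": {"backupName": backup_name},
    }


if __name__ == "__main__":
    print(json.dumps(velero_backup("pre-migration", ["app", "payments"]), indent=2))
    print(json.dumps(velero_restore("pre-migration"), indent=2))
```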

Finally, to cut over to the new cluster, use a blue/green deployment with weighted DNS routing in Amazon Route 53, or another traffic management solution, to shift traffic to the new cluster gradually, then shut down the old cluster.
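Here is a minimal sketch of the weighted cutover. It assumes each cluster exposes the application through its own load balancer, and that a CNAME record in a Route 53 hosted zone points at each one. The function builds the ChangeBatch argument for the boto3 Route 53 client's `change_resource_record_sets` call, with one weighted record per cluster: "blue" for the old cluster and "green" for EKS. The record name, load balancer hostnames, hosted zone ID, and percentage schedule are hypothetical.

```python
# Sketch: weighted Route 53 records for a blue/green cutover between clusters.

SHIFT_SCHEDULE = [10, 25, 50, 75, 100]  # percent of traffic sent to the EKS cluster


def weighted_change_batch(record_name: str, blue_target: str, green_target: str,
                          green_percent: int, ttl: int = 60) -> dict:
    """UPSERT both weighted records so green receives green_percent of DNS answers."""
    if not 0 <= green_percent <= 100:
        raise ValueError("green_percent must be between 0 and 100")

    def record(identifier: str, target: str, weight: int) -> dict:
        return {
            "Action": "UPSERT",
            "ResourceRecordSet": {
                "Name": record_name,
                "Type": "CNAME",
                "SetIdentifier": identifier,  # distinguishes records that share a name
                "Weight": weight,
                "TTL": ttl,  # a short TTL lets clients pick up weight changes sooner
                "ResourceRecords": [{"Value": target}],
            },
        }

    return {
        "Comment": f"Send {green_percent}% of traffic to the EKS cluster",
        "Changes": [
            record("blue", blue_target, 100 - green_percent),
            record("green", green_target, green_percent),
        ],
    }


if __name__ == "__main__":
    for percent in SHIFT_SCHEDULE:
        batch = weighted_change_batch(
            "app.example.com", "old-lb.example.com", "eks-lb.example.com", percent
        )
        print(percent, [c["ResourceRecordSet"]["Weight"] for c in batch["Changes"]])
        # With boto3 installed and credentials configured, each step could be applied as:
        # boto3.client("route53").change_resource_record_sets(
        #     HostedZoneId="Z0000000EXAMPLE", ChangeBatch=batch)
```

Rolling back is the same call with a lower percentage. Once green carries 100% of traffic and the old cluster is idle, the old cluster can be decommissioned.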

Conclusion

In this post, we have discussed a high-level migration strategy; each customer must reconcile it with their organization’s needs. We believe that Amazon EKS offers the best experience with Kubernetes for companies running workloads at scale. With careful planning, you can reduce risk and migrate workloads without impacting production. Customers such as New Relic have planned and successfully completed application migrations in a short amount of time. With EKS, customers such as Snap, Lyft, HSBC, and DeliveryHero have offloaded the undifferentiated heavy lifting of managing Kubernetes to AWS and focused on delivering value to their end users.
