Containers

Running Workload on Amazon EKS in Local Zones with a failover strategy

Introduction

Update 08/05/22: We updated the title and conclusion to improve the accuracy of wording.

AWS Local Zones are a type of infrastructure deployment that places compute, storage, and other select AWS services close to large population and industry centers. Customers can build and deploy applications close to end users to enable real-time gaming, live streaming, augmented and virtual reality, virtual workstations, and more.

However, Local Zones come with some limitations: a Local Zone is a single zone, so there is no multi-AZ redundancy within it; the available Amazon EC2 instance types are limited; and Application Load Balancer (ALB) is the only load balancer type available. As a result, applications on Amazon Elastic Kubernetes Service (Amazon EKS) need additional configuration to run Amazon EC2 worker nodes and ALBs for Amazon EKS clusters in a Local Zone.

In this post, we show how to:

  • Deploy an Amazon EKS cluster with Amazon EKS Blueprints, with a self-managed node group running in the Local Zone and a managed node group in the Region.
  • Plan for shared data storage using Amazon Elastic File System (Amazon EFS), and synchronize the database between the Local Zone and the Region using AWS Database Migration Service (AWS DMS).
  • Architect an active-standby configuration that uses Amazon Route 53 to manage traffic between the Local Zone and the Region.

Our goal is to provide an example of running a WordPress workload in the Local Zone on Amazon EKS, and failing over to the Region in the rare event of an issue in the Local Zone.

Solution Overview

The following diagram shows the high-level architecture for running a WordPress website on Amazon EKS in the Local Zone.

The customer-facing endpoint is a Route 53 record (demo.lindarren.com) with a failover routing policy: traffic goes to the primary site in the Local Zone (demo.primary.lindarren.com) and fails over to the backup site in the Region's AZs (demo.backup.lindarren.com).

When a customer connects to the primary site (Local Zone), the request is served by the ALB in the Local Zone, and the backend servers are Kubernetes pods running on self-managed Amazon EC2 nodes. The backend database in the Local Zone is an Amazon EC2 instance with MariaDB installed.

For the backup site in the Region, there is an ALB and Kubernetes pods running on a managed node group. The backup database is running on Amazon Relational Database Service (Amazon RDS). Amazon RDS is a managed Database-as-a-Service (DBaaS) that makes it easy for IT administrators to set up, operate, and scale relational databases in the cloud. For data replication, we use AWS Database Migration Service (AWS DMS) to replicate data from the Amazon EC2 database instance in the Local Zone to the Amazon RDS instance in the Region.

For persistent storage, the files are stored on an Amazon EFS file system. At the time of writing, Amazon EFS mount targets cannot be created in Local Zone subnets. Consequently, a few changes to the Amazon EFS CSI (Container Storage Interface) driver DaemonSet are needed so that pods in the Local Zone can mount an Amazon EFS file system.

Deployment in the Local Zone

For the deployment, we use a combination of Terraform modules and Kubernetes YAML files. We use Terraform to create AWS resources such as Amazon Virtual Private Cloud (Amazon VPC), Amazon EKS, Amazon EC2, Amazon EFS, Amazon RDS, AWS DMS, and Amazon Route 53. For the application itself, we use Kubernetes YAML manifests to deploy WordPress.

Prerequisites

  • An AWS account with Administrator permissions. Amazon EKS Blueprints requires an AWS Identity and Access Management (IAM) role; an IAM user is not supported because of a known issue. For setup details, refer to the documentation.
  • The latest versions of the AWS Command Line Interface (AWS CLI) (v2 recommended), kubectl, eksctl, Git, Terraform, jq (1.6 recommended), and the Session Manager plugin for the AWS CLI.
  • A domain name that you own (for example, lindarren.com) and a hosted zone for it in Amazon Route 53. The domain name is required because we use Amazon Route 53 records and TLS (Transport Layer Security) certificates.
  • A shell environment. An IDE (Integrated Development Environment) such as Visual Studio Code or AWS Cloud9 is recommended. If you use AWS Cloud9, configure your own IAM role credentials instead of AWS Cloud9’s temporary credentials. For details, refer to the AWS Cloud9 guide.
  • Opt in to the Local Zone where you want to run your workload.
  • An existing TLS certificate for web hosting as a resource in AWS Certificate Manager (ACM).
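Opting in to the Local Zone can also be done from the AWS CLI. The sketch below assumes the Local Zone naming convention in which a zone name such as us-east-1-bos-1a belongs to the zone group us-east-1-bos-1, because the opt-in call takes the group name rather than the zone name. lz_group_name is a hypothetical helper, not part of the sample repository, and the aws commands are shown as comments.

```shell
#!/bin/sh
# Hypothetical helper: derive a Local Zone group name from a zone name.
# Assumes the "<group><letter>" convention, e.g. us-east-1-bos-1a -> us-east-1-bos-1.
lz_group_name() {
  local zone="$1"
  # Remove a single trailing lowercase letter (the zone suffix).
  printf '%s\n' "${zone%[a-z]}"
}

# Example usage (requires AWS credentials):
# aws ec2 describe-availability-zones --region us-east-1 --all-availability-zones \
#   --filters Name=zone-type,Values=local-zone \
#   --query 'AvailabilityZones[].[ZoneName,GroupName,OptInStatus]' --output table
# aws ec2 modify-availability-zone-group --region us-east-1 \
#   --group-name "$(lz_group_name us-east-1-bos-1a)" --opt-in-status opted-in
```

Check the GroupName column from describe-availability-zones before opting in, rather than relying only on the naming convention.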

Now clone the source code to your working directory and configure a few aliases.

git clone https://github.com/aws-samples/eks-localzone-demo.git
cd eks-localzone-demo
# A few optional shorthands
alias tf=terraform
alias k=kubectl

Walkthrough

Step 1. Deploy VPC

The first thing we need to provision for this architecture is a VPC that spans both the Local Zone and the AZs, for the Amazon EKS cluster and the database instances. It has three public and three private subnets in the AZs, and one public and one private subnet in the Local Zone.

In main.tf, we use the vpc module to create the subnets in the AZs. For the Local Zone, we create the subnets as separate aws_subnet resources and associate them with the module's route tables.

...

resource "aws_subnet" "public-subnet-lz" {
  vpc_id                  = module.vpc.vpc_id
  availability_zone       = local.lzs[0]
  ...
}

resource "aws_subnet" "private-subnet-lz" {
  ...
}

resource "aws_route_table_association" "public-subnet-lz-rta" {
  subnet_id      = aws_subnet.public-subnet-lz.id
  route_table_id = module.vpc.public_route_table_ids[0]
}

resource "aws_route_table_association" "private-subnet-lz-rta" {
  subnet_id      = aws_subnet.private-subnet-lz.id
  route_table_id = module.vpc.private_route_table_ids[0]
}

To create the VPC, review and define the input variables. In this example, the VPC is in us-east-1 and the Local Zone is us-east-1-bos-1a. You need to provide a name and a vpc_cidr for the VPC.

cd deploy/01-vpc
vim demo.auto.tfvars
name         = "demo-vpc" 
vpc_cidr     = "10.0.0.0/16"
cluster_name = "lindarr-demo" # Name of EKS Cluster, for subnet tagging 
region       = "us-east-1"
lzs          = ["us-east-1-bos-1a"]

Deploy the VPC infrastructure using the Terraform CLI.

terraform init
terraform apply -auto-approve

Copy the VPC ID and subnet IDs, including the subnets in the AZs and in the Local Zone, from the apply output, or run terraform output to print them again. You use these subnet IDs when deploying additional resources such as Amazon RDS, Amazon EKS, and Amazon EFS in the upcoming steps.

➜  01-vpc git:(main) ✗ terraform output
private_subnets = [
  "subnet-04bfbdb56eab20f3f",
  "subnet-0282d89055cab1760",
  "subnet-0e3d213bfb21127fa",
]
private_subnets_local_zone = "subnet-0179a7e06585a551f"
public_subnets = [
  "subnet-0d05de32e811f03c4",
  "subnet-0c2d26c64af1f9889",
  "subnet-0e5495f6c4218f5aa",
]
public_subnets_local_zone = "subnet-0b49a2a528a2d2e68"
vpc_id = "vpc-0c544fbcafdbbb035"
vpc_id_cidr = "10.0.0.0/16"
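Rather than copying the IDs by hand, you can render the tfvars file for the next step from these outputs. This is a minimal sketch: render_eks_tfvars is a hypothetical helper, the variable names are taken from the example demo.auto.tfvars in step 2, and the usage comment assumes jq is installed (it is listed in the prerequisites). Review the generated file before overwriting your own, because it only contains the variables shown in this post.

```shell
#!/bin/sh
# Hypothetical helper: print a demo.auto.tfvars for deploy/02-eks.
# Arguments: VPC ID, Local Zone private subnet ID, cluster name, Route 53 domain,
# followed by the private subnet IDs in the AZs.
render_eks_tfvars() {
  local vpc_id="$1" lz_subnet="$2" cluster_name="$3" domain="$4"
  shift 4  # the remaining arguments are the AZ private subnet IDs
  printf 'vpc_id = "%s"\n' "$vpc_id"
  printf 'private_subnets = [\n'
  for s in "$@"; do
    printf '  "%s",\n' "$s"
  done
  printf ']\n'
  printf 'private_subnets_local_zone = "%s"\n' "$lz_subnet"
  printf 'cluster_name               = "%s"\n' "$cluster_name"
  printf 'domain_name_in_route53     = "%s"\n' "$domain"
}

# Example usage, from deploy/01-vpc:
# render_eks_tfvars \
#   "$(terraform output -raw vpc_id)" \
#   "$(terraform output -raw private_subnets_local_zone)" \
#   lindarr-demo lindarren.com \
#   $(terraform output -json private_subnets | jq -r '.[]') > ../02-eks/demo.auto.tfvars
```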

Step 2. Deploy Amazon EKS Cluster

Next, we use Amazon EKS Blueprints for Terraform to create an Amazon EKS cluster, including the Kubernetes control plane, a managed node group in the Region, and a self-managed node group in the Local Zone.

Change the working directory to 02-eks and edit the variables.

cd ../02-eks
vim demo.auto.tfvars

Modify the variables, mainly the resource IDs of the VPC and subnets, by copying part of the output from the first module. Keep cluster_name consistent with the value used in step 1, because the VPC module uses it to tag the subnets.

vpc_id = "vpc-0c544fbcafdbbb035"
private_subnets = [
  "subnet-04bfbdb56eab20f3f",
  "subnet-0282d89055cab1760",
  "subnet-0e3d213bfb21127fa",
]
private_subnets_local_zone = "subnet-0179a7e06585a551f"
cluster_name               = "lindarr-demo"
domain_name_in_route53     = "lindarren.com"

In main.tf, we use the Amazon EKS Blueprints for Terraform module to create the Amazon EKS cluster. It simplifies cluster creation, especially creating self-managed node groups in Local Zone subnets. Note that in the Local Zone, the available instance types are limited and, at the time of writing, the only supported Amazon EBS volume type is gp2.

In addition, the module's default security group rules for self-managed nodes are restrictive, so we add rules that let self-managed nodes and managed nodes communicate. These rules are required when workloads span both managed and self-managed node groups; without them, security group rules can block CoreDNS queries between the two.

Below are some code snippets from main.tf:

...

  # EKS Self-Managed Node Group in Local Zone
  self_managed_node_groups = {
    self_mg_4 = {
      node_group_name    = "self-managed-ondemand"
      instance_type      = "t3.xlarge" # instance types are limited in the local zone
      launch_template_os = "amazonlinux2eks" # or bottlerocket 
      block_device_mappings = [
        {
          device_name = "/dev/xvda"
          volume_type = "gp2" # Local Zone supports gp2 volumes only 
          volume_size = "100"
        },
      ]
      subnet_ids = [var.local_zone_private_subnet_id]
    },
  }

...
  # https://github.com/aws-ia/terraform-aws-eks-blueprints/issues/619
  # Allow Connection from other nodes 
  node_security_group_additional_rules = {
    egress_all = {
      description      = "Node all egress"
      ...
    }
  }
...


resource "aws_security_group_rule" "allow_node_sg_to_cluster_sg" {
  # Self-managed Nodegroup to Cluster API/Managed Nodegroup all traffic
  source_security_group_id = module.eks_blueprints.worker_node_security_group_id
  security_group_id        = module.eks_blueprints.cluster_primary_security_group_id
  ...
}

resource "aws_security_group_rule" "allow_node_sg_from_cluster_sg" {
  # Cluster API/Managed Nodegroup to Self-Managed Nodegroup all traffic
  source_security_group_id = module.eks_blueprints.cluster_primary_security_group_id
  security_group_id        = module.eks_blueprints.worker_node_security_group_id
  ...
}

In the eks_blueprints_kubernetes_addons module in main.tf, you can enable add-ons directly; the module creates the IAM roles for service accounts and installs the Helm charts. We use the Amazon EFS CSI driver, Amazon EBS CSI driver, AWS Load Balancer Controller, External DNS, and Metrics Server, so we set the respective keys to true. The Terraform module deploys these add-ons after the Amazon EKS cluster is created.

module "eks_blueprints_kubernetes_addons" {
  ...
  enable_amazon_eks_aws_ebs_csi_driver = true
  enable_aws_load_balancer_controller  = true
  enable_metrics_server                = true
  enable_external_dns                  = true
  ...
}

Now let’s move forward and create the EKS Cluster by running terraform commands. The EKS cluster, node group creation, and add-on installation will take approximately 20 minutes to complete.

terraform init
terraform plan 
terraform apply -auto-approve

After the command completes, the EKS cluster, the nodes in the Local Zone, and the add-ons are deployed. Update your kubeconfig, then run kubectl get node to find the node(s) in the us-east-1-bos-1a zone. Running kubectl get pod in the kube-system and external-dns namespaces shows that the aws-load-balancer-controller and external-dns pods are running and ready.

➜  ~ aws eks update-kubeconfig \
--name $(tf output eks_cluster_id | jq . -r) \
--region us-east-1 
Updated context arn:aws:eks:us-east-1:091550601287:cluster/lindarr-demo in /Users/lindarr/.kube/config

# Some node(s) are running in local zone
➜  ~ kubectl get node --label-columns failure-domain.beta.kubernetes.io/zone
NAME                          STATUS   ROLES    AGE     VERSION               ZONE
ip-10-0-11-232.ec2.internal   Ready    <none>   14d     v1.22.6-eks-7d68063   us-east-1b
ip-10-0-15-39.ec2.internal    Ready    <none>   3d17h   v1.22.6-eks-7d68063   us-east-1-bos-1a
...

# AWS LB Controller and External DNS are running 
➜  ~ kubectl get pod -n kube-system
NAME                                                         READY   STATUS      RESTARTS   AGE
aws-load-balancer-controller-75bd4dfcbd-bwdqt                1/1     Running     0          11d
aws-load-balancer-controller-75bd4dfcbd-kx8l5                1/1     Running     0          3d16h
aws-node-bxzkb                                               1/1     Running     0          11d
aws-node-p8bm7                                               1/1     Running     0          3d16h
coredns-7f5998f4c-886lb                                      1/1     Running     0          11d
coredns-7f5998f4c-cv5b8                                      1/1     Running     0          3d16h
ebs-csi-controller-588dffc699-vh8gb                          5/5     Running     0          3d16h
ebs-csi-controller-588dffc699-zkxxh                          5/5     Running     0          3d16h
ebs-csi-node-898nj                                           3/3     Running     0          11d
ebs-csi-node-b4b5r                                           3/3     Running     0          3d16h
efs-csi-controller-9d944546-9s6cz                            3/3     Running     0          11d
efs-csi-controller-9d944546-gtmc9                            3/3     Running     0          3d16h
efs-csi-node-7klzk                                           3/3     Running     0          8d
efs-csi-node-wzwlc                                           3/3     Running     0          3d16h
kube-proxy-n6s4q                                             1/1     Running     0          14d
kube-proxy-vhdrx                                             1/1     Running     0          3d16h
metrics-server-694d47d564-zxfrs 

# External DNS is running 
➜  ~ kubectl get po -n external-dns 
NAME                           READY   STATUS    RESTARTS   AGE
external-dns-96c667c79-88zcv   1/1     Running   0          25d
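To check node placement at a glance, you can summarize the kubectl output per zone. This sketch (count_nodes_by_zone is a hypothetical helper) assumes the output includes the header row, that the zone label is the last column, and that STATUS is the second column, which holds for kubectl get node with a single -L label column. The usage comment uses topology.kubernetes.io/zone, the non-deprecated equivalent of the failure-domain.beta.kubernetes.io/zone label shown above.

```shell
#!/bin/sh
# Hypothetical helper: count nodes whose STATUS is exactly "Ready", per zone.
# Input on stdin: `kubectl get node -L topology.kubernetes.io/zone` output,
# including the header row; the zone is expected in the last column.
count_nodes_by_zone() {
  awk 'NR > 1 && $2 == "Ready" { count[$NF]++ }
       END { for (zone in count) print zone, count[zone] }' | LC_ALL=C sort
}

# Example usage:
# kubectl get node -L topology.kubernetes.io/zone | count_nodes_by_zone
```

If no Ready node is listed for us-east-1-bos-1a, check the self-managed node group before moving on.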

Step 3. Deploy Amazon EFS filesystems and targets

In the following sections, we deploy the AWS resources needed by WordPress, the demonstration application for this post. If you are deploying your own application and want to know the caveats related to Amazon EKS in the Local Zone (especially for ALB Ingress), you can skip ahead to Step 6.

WordPress on Kubernetes requires a persistent volume to store the application and its data. We chose Amazon EFS for this demonstration because the storage must be accessible from both the Local Zone and the AZs. If your application accesses the persistent volume (PV) frequently and requires low latency, consider Amazon EBS instead; in that case, you need another mechanism to replicate or back up the data on the Amazon EBS volumes from the Local Zone to the AZs.

Now create the Amazon EFS resources using Terraform, and note the Amazon EFS file system ID and access point ID, which together form the volumeHandle of the PV.

cd ../03-efs
vim demo.auto.tfvars # Edit the VPC and subnet IDs

tf init 
tf plan 
tf apply -auto-approve
➜  03-efs git:(main) ✗ terraform output
efs_ap_id    = "fsap-03b76858b781b84ff"
efs_id       = "fs-08312777c25f61ee9"
volumeHandle = "fs-08312777c25f61ee9::fsap-03b76858b781b84ff"

Because Amazon EFS mount targets are not supported in the Local Zone, we tweak the Amazon EFS CSI driver so that worker nodes in the Local Zone can resolve the Amazon EFS DNS name to the mount targets in the Region without errors. Run terraform output to get the mount target IP addresses, then use kubectl to patch the efs-csi-node DaemonSet and add a hostAliases entry for each mount target. Replace the file system ID and IP addresses in the example patch with the values from your own output.

➜  03-efs git:(main) ✗ terraform output
efs_ap_id = "fsap-046d60b356c84b394"
efs_id = "fs-024f950b4c448cc67"
efs_mount_target = [
  "10.0.10.87",
  "10.0.11.151",
  "10.0.12.41",
]
volumeHandle = "fs-024f950b4c448cc67::fsap-046d60b356c84b394"


➜  ~ vim efs-node-patch.yaml

spec:
  template:
    spec:
      # Add host Aliases here so that EFS mount points can be resolved on Local Zones
      # Otherwise, DNS resolution will fail if the CoreDNS pod is running on local zone
      # Or fail randomly, if one coredns pod is on AZ and another pod is on Local Zone 
      hostAliases:
      - hostnames:
        - fs-08312777c25f61ee9.efs.us-east-1.amazonaws.com
        ip: 10.0.10.26
      - hostnames:
        - fs-08312777c25f61ee9.efs.us-east-1.amazonaws.com
        ip: 10.0.12.4
      - hostnames:
        - fs-08312777c25f61ee9.efs.us-east-1.amazonaws.com
        ip: 10.0.11.140
➜  03-efs git:(main) ✗ kubectl patch daemonset -n kube-system efs-csi-node --patch-file efs-node-patch.yaml
Warning: spec.template.spec.nodeSelector[beta.kubernetes.io/os]: deprecated since v1.14; use "kubernetes.io/os" instead
daemonset.apps/efs-csi-node patched
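Because the IP addresses in the patch must match your own mount targets, you may prefer to generate efs-node-patch.yaml from the Terraform output instead of editing it by hand. render_efs_host_aliases is a hypothetical helper that reproduces the layout of the patch above, with one hostAliases entry per mount target IP. It assumes the DNS name format <file-system-id>.efs.<region>.amazonaws.com used in the patch, and the usage comment assumes jq is available.

```shell
#!/bin/sh
# Hypothetical helper: print a DaemonSet patch that maps the EFS DNS name
# to each mount target IP via hostAliases.
# Arguments: file system ID, Region, then one or more mount target IPs.
render_efs_host_aliases() {
  local fs_id="$1" region="$2"
  shift 2  # the remaining arguments are mount target IP addresses
  local hostname="${fs_id}.efs.${region}.amazonaws.com"
  printf 'spec:\n  template:\n    spec:\n      hostAliases:\n'
  for ip in "$@"; do
    printf '      - hostnames:\n        - %s\n        ip: %s\n' "$hostname" "$ip"
  done
}

# Example usage, from deploy/03-efs:
# render_efs_host_aliases "$(terraform output -raw efs_id)" us-east-1 \
#   $(terraform output -json efs_mount_target | jq -r '.[]') > efs-node-patch.yaml
# kubectl patch daemonset -n kube-system efs-csi-node --patch-file efs-node-patch.yaml
```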

Step 4. Deploy Amazon EC2 Database Instances, Amazon RDS, and AWS DMS

For the database tier, we run MariaDB on Amazon EC2 in the Local Zone as the primary database and Amazon RDS for MariaDB in the Region as the standby, and we use AWS DMS to replicate tables and records from the Amazon EC2 instance to Amazon RDS. We create the resources using the Terraform module in the 04-database directory.

If you haven’t created the required AWS DMS IAM roles before, set create_iam_roles = true in the demo.auto.tfvars file. If you don’t have an SSH key pair, refer to the Amazon EC2 documentation to create one, and replace my_ssh_key_name in the .tfvars file with its name.

cd ../04-database
vim demo.auto.tfvars # Edit the VPC and subnet IDs
private_subnets = [
  "subnet-01f9037408ae338ad",
  "subnet-0f30e01d3f9addd62",
  "subnet-0096b2f4142dbdae2",
]
private_subnets_local_zone = "subnet-0f19d51410f6167ac"

ssh_key_name   = "my_ssh_key_name" # Replace with the name of your SSH key pair
vpc_cidr_block = "10.0.0.0/16"

vpc_id = "vpc-0a65e88418d47f0ee"

create_iam_roles = true # set to true if the required IAM roles were not created before
terraform init 
terraform plan 
terraform apply -auto-approve

...

➜  04-database git:(main) ✗ tf output
db_ec2_instance_id = "i-019b9172637105e4e"
db_ec2_instance_ip = "10.0.15.200"
ec2_mariadb_password = <sensitive>
rds_endpoint = "demo-test-mariadb-instance.cdyids0dslnl.us-east-1.rds.amazonaws.com:3306"
rds_password = <sensitive>

➜  04-database git:(main) ✗ terraform output rds_password
"bbFVta-ExampleRDSPassword"

You can SSH into the DB instance through a bastion host (details can be found in this blog post), or use AWS Systems Manager Session Manager to log in to the instance and configure the database. Make sure that you have installed the Session Manager plugin mentioned in the documentation.

➜  04-database git:(main) aws ssm start-session \
  --region us-east-1 \
  --target $(tf output db_ec2_instance_id | jq . -r)
   

After the MariaDB server has started, run the commands below to create the wordpress database and the wordpress user that WordPress uses for its post data, and to grant the privileges needed for replication. Replace wordpress99 with your own password for the database user.

bash # Session Manager sessions start in sh; switch to bash

sudo mysql -sfu root -e "CREATE DATABASE IF NOT EXISTS wordpress;"
sudo mysql -sfu root -e "GRANT ALL PRIVILEGES ON wordpress.* to 'wordpress'@'%' IDENTIFIED BY 'wordpress99';"
sudo mysql -sfu root -e "GRANT SUPER, RELOAD, PROCESS, REPLICATION SLAVE, REPLICATION CLIENT ON *.* TO wordpress@'%';"
sudo mysql -sfu root -e "FLUSH PRIVILEGES;"

sudo systemctl stop mariadb

Because we later use AWS DMS to replicate ongoing data changes from MariaDB on Amazon EC2, we need to enable and configure binary logging (binlog) for replication. Here is the configuration change for MariaDB on Amazon EC2:

sudo tee /etc/my.cnf.d/server.cnf<<EOT
[mysqld]
log_bin=/var/lib/mysql/bin-log
log_bin_index=/var/lib/mysql/mysql-bin.index
expire_logs_days= 2
binlog_format= ROW
EOT

sudo systemctl start mariadb

# Ctrl^D twice to exit the shell and session 
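Before starting the replication task, you can confirm that the new settings took effect, since AWS DMS change data capture from a MySQL-compatible source relies on binary logging in ROW format. check_binlog_settings is a hypothetical helper that reads the tab-separated name/value pairs printed by the mysql client in batch mode (-s -N); run the usage line in a Session Manager shell on the DB instance.

```shell
#!/bin/sh
# Hypothetical helper: succeed only if log_bin is ON and binlog_format is ROW.
# Input on stdin: "name<TAB>value" lines from `mysql -sN -e "SHOW VARIABLES ..."`.
check_binlog_settings() {
  awk -F '\t' '
    $1 == "log_bin"       { log_bin = $2 }
    $1 == "binlog_format" { fmt = $2 }
    END {
      if (log_bin == "ON" && fmt == "ROW") { print "ok"; exit 0 }
      printf "unexpected: log_bin=%s binlog_format=%s\n", log_bin, fmt
      exit 1
    }'
}

# Example usage on the DB instance:
# sudo mysql -sN -e "SHOW VARIABLES WHERE Variable_name IN ('log_bin','binlog_format')" \
#   | check_binlog_settings
```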

To achieve high availability for the database tier, we create an Amazon RDS for MariaDB instance in the Region as a standby replica, and use AWS DMS to replicate data from MariaDB on Amazon EC2 to Amazon RDS for MariaDB.

The Terraform module creates the required AWS resources: IAM roles, an AWS DMS replication instance, source and target endpoints, and an AWS DMS replication task. The task uses the full load plus change data capture (CDC) migration type to copy all existing data from MariaDB on Amazon EC2 and then continuously replicate data changes to Amazon RDS for MariaDB.

With the following table-mappings JSON snippet, AWS DMS replicates all tables in the wordpress database from MariaDB on Amazon EC2.

# table-mappings.json
{
    "rules": [
        {
            "rule-type": "selection",
            "rule-id": "1",
            "rule-name": "1",
            "object-locator": {
                "schema-name": "wordpress",
                "table-name": "%"
            },
            "rule-action": "include"
        }
    ]
}

Step 5. Deploy Amazon Route 53

Next, we deploy Amazon Route 53 health checks and CNAME records for both ALBs: the one in the Local Zone and the one in the Region.

The domain demo.primary.lindarren.com points to the ALB in the Local Zone and is registered by the External DNS controller that we installed with the Amazon EKS Blueprints module. Similarly, demo.backup.lindarren.com points to the ALB in the Region.

Health checks run against both sites. When the primary site (Local Zone) is healthy, the DNS record demo.lindarren.com resolves to demo.primary.lindarren.com. When the primary site's health check fails, demo.lindarren.com resolves to demo.backup.lindarren.com.

Here’s a code snippet showing how we set up the health check and failover policy using Terraform.

resource "aws_route53_health_check" "localzone" {
  fqdn              = local.endpoint_local_zone
  resource_path     = "/"
  type              = "HTTPS"
  port              = 443
  failure_threshold = 5
  request_interval  = 30
  tags = {
    Name = "Health Check for Ingress in Local Zone"
  }
}

...

resource "aws_route53_record" "localzone" {
  zone_id         = data.aws_route53_zone.main.zone_id
  name            = "${local.app_name}.${local.domain_name}"
  records         = [local.endpoint_local_zone]
  set_identifier  = "primary"
  type            = "CNAME"
  ttl             = 60
  health_check_id = aws_route53_health_check.localzone.id
  failover_routing_policy {
    type = "PRIMARY"
  }
}
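The health check and record settings above bound how quickly clients move to the backup site. As a back-of-the-envelope estimate, the primary must fail failure_threshold consecutive checks spaced request_interval seconds apart, and resolvers may keep serving the cached answer for up to the record ttl. The sketch below (estimate_failover_seconds is a hypothetical helper) ignores health checker aggregation and propagation delays, so treat the result as a rough estimate rather than a guarantee.

```shell
#!/bin/sh
# Hypothetical helper: rough time (seconds) until clients resolve to the backup site.
# detection = failure_threshold * request_interval, plus up to one record TTL of caching.
estimate_failover_seconds() {
  local failure_threshold="$1" request_interval="$2" ttl="$3"
  echo $(( failure_threshold * request_interval + ttl ))
}

# With the values from the Terraform snippet (5 checks, 30 s interval, 60 s TTL):
# estimate_failover_seconds 5 30 60   # prints 210
```

Route 53 also supports a 10-second request interval at additional cost; with request_interval = 10, the same calculation gives 110 seconds.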

Edit demo.auto.tfvars (or the local values in main.tf) as shown below, then run the Terraform commands to deploy the Amazon Route 53 resources.

cd ../05-route53
vim demo.auto.tfvars 
endpoint_local_zone = "demo.primary.lindarren.com"
endpoint_region     = "demo.backup.lindarren.com"
domain_name         = "lindarren.com."
app_name            = "demo"
terraform init 
terraform apply -auto-approve

Step 6. Deploy Kubernetes Application

After the AWS resources are created, we deploy the Kubernetes resources for our application.

There are two deployments for the WordPress app. The deployment in the Local Zone uses a required nodeAffinity rule that matches topology.kubernetes.io/zone against the Local Zone name, us-east-1-bos-1a (the ZONE value shown by kubectl get node in step 2). The other deployment uses the NotIn operator, so that its pods launch in the Region. Set the WORDPRESS_DB_HOST environment variable to the private IP of your DB instance (db_ec2_instance_ip in the output of step 4).

cd ../06-kubernetes
vim wordpress-deployment.yaml
kind: Deployment
apiVersion: apps/v1
metadata:
  name: wordpress
  ...
spec:
  ...
    spec:
      affinity:
        nodeAffinity:
          requiredDuringSchedulingIgnoredDuringExecution:
            nodeSelectorTerms:
            - matchExpressions:
              - key: topology.kubernetes.io/zone
                operator: In
                values: # Modify the value to the name of your Local Zone
                - us-east-1-bos-1a
      containers:
      - image: wordpress:php7.1-apache
        name: wordpress
        env:
        - name: WORDPRESS_DB_HOST # REPLACE IT WITH THE PRIVATE IP OF DB INSTANCE
          value: "10.0.15.185"

For the shared storage, set volumeHandle to the Amazon EFS file system ID and access point ID (the volumeHandle value in the output of step 3).

# Editing wordpress-deployment.yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: wordpress-efs-pv
spec:
...
  csi:
    driver: efs.csi.aws.com 
    volumeHandle: "fs-08312777c25f61ee9::fsap-0250aa2272226c8d4"

For the ingress in the Local Zone, set the alb.ingress.kubernetes.io/subnets annotation to the public subnet ID in the Local Zone (public_subnets_local_zone in the terraform output of step 1). For the ingress in the Region, the AWS Load Balancer Controller discovers the public subnets by their tags. Set the external-dns.alpha.kubernetes.io/hostname annotation so that the External DNS controller registers the ALB in Amazon Route 53 automatically. We use AWS Certificate Manager (ACM) and attach the certificate to the ALB's HTTPS listener. If you don't have an ACM certificate yet, refer to the ACM documentation, Request a public certificate using the console.

# Editing wordpress-deployment.yaml
 
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: wordpress-ingress
  annotations:
    # public subnet in local zone 
    alb.ingress.kubernetes.io/subnets: "subnet-0b49a2a528a2d2e68"
    alb.ingress.kubernetes.io/scheme: internet-facing
    # provide the AWS ACM Certificate ARN
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:091550601287:certificate/75cad826-f2f2-45e5-8bfe-c9b722d635d7
    # provide the domain name
    external-dns.alpha.kubernetes.io/hostname: demo.primary.lindarren.com

...

For the deployment in the Region, also set the WORDPRESS_DB_HOST environment variable to the DB instance in the Local Zone. This ensures that only one database (currently the DB instance in the Local Zone) is writable; the Amazon RDS endpoint is used only after failover.

vim wordpress-backup.yaml
apiVersion: apps/v1 
kind: Deployment
metadata:
  name: wordpress-region
  labels:
    app: wordpress-region
spec:
  ...
      containers:
      - image: wordpress:php7.1-apache
        name: wordpress
        env:
        - name: WORDPRESS_DB_HOST
          value: "10.0.15.185"

Configure the Ingress in the Region to register an Amazon Route 53 record in the hosted zone and attach the ACM certificate:

# wordpress-backup.yaml
apiVersion: networking.k8s.io/v1
kind: Ingress
metadata:
  name: wordpress-ingress-backup
  annotations:
    # provide the external domain name 
    external-dns.alpha.kubernetes.io/hostname: demo.backup.lindarren.com
    alb.ingress.kubernetes.io/scheme: internet-facing
    # provide the AWS ACM Certificate ARN 
    alb.ingress.kubernetes.io/certificate-arn: arn:aws:acm:us-east-1:091550601287:certificate/75cad826-f2f2-45e5-8bfe-c9b722d635d7

Define the Amazon RDS password and the DB instance user password as secrets in kustomization.yaml:

secretGenerator:
- name: db-instance-pass
  literals:
  - password=wordpress99 
- name: rds-pass
  literals:
  - password=bbFVta-RDSExamplePassword # Replace with RDS Password in Terraform output
- name: mysql-pass
  literals:
  - password=rae2ooQu6uj6AiQu5mie0die4thuZu # This is for the mysql stateful, but not used currently 
resources:
  - wordpress-deployment.yaml  
  - wordpress-backup.yaml  
  - mysql-deployment.yaml  

Deploy the Kubernetes resources with kubectl:

➜  06-kubernetes git:(main) ✗ kubectl apply -k . 

After the resources are created, open the WordPress website in a web browser and complete the initial setup. After the setup is complete, go to https://demo.lindarren.com/ to confirm that WordPress is running.

After the installation is complete, the final step is to start the AWS DMS replication task. In the AWS DMS console, go to Endpoints > Source Connections and rerun the connection test if it failed before.

Use the commands below (or the AWS DMS console) to start or resume the replication.


cd ../04-database 

DMS_REPL_TASK_ARN=$(tf output dms_repication_task_arn | jq . -r)

aws dms start-replication-task \
    --replication-task-arn $DMS_REPL_TASK_ARN \
    --start-replication-task-type start-replication \
    --region us-east-1
    
{
    "ReplicationTask": {
        "ReplicationTaskIdentifier": "demo-localzone-replication-task",
        "SourceEndpointArn": "arn:aws:dms:us-east-1:091550601287:endpoint:WPOUJ3ON74LXPHYKE4DOFIYXDAZFMSNKI3Z3S3Q",
        "TargetEndpointArn": "arn:aws:dms:us-east-1:091550601287:endpoint:YYYCLSMAYTHHLPXTINE3IM4AL4OOJRFBEZEZNKI",
        "ReplicationInstanceArn": "arn:aws:dms:us-east-1:091550601287:rep:RCISNMDF3F7VM5IVMSRVCPFXBRYLUHCXP2BC5SQ",
        "MigrationType": "full-load-and-cdc",
        "TableMappings": "{\n    \"rules\": [\n        {\n            \"rule-type\": \"selection\",\n            \"rule-id\": \"1\",\n            \"rule-name\": \"1\",\n            \"object-locator\": {\n                \"schema-name\": \"wordpress\",\n                \"table-name\": \"%\"\n            },\n            \"rule-action\": \"include\"\n        }\n    ]\n}",
        "ReplicationTaskSettings": "......",
        "Status": "starting",
        "ReplicationTaskCreationDate": "2022-07-22T15:16:56.808000+08:00",
        "ReplicationTaskStartDate": "2022-07-22T15:55:04.720000+08:00",
        "ReplicationTaskArn": "arn:aws:dms:us-east-1:091550601287:task:YDR7LINDFJVSVKNB7IMGKKXFRJBWWOZFMBOGICQ"
    }
}

Check the AWS DMS console to confirm that the tables are being replicated from the MariaDB instance to Amazon RDS.
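Instead of refreshing the console, you can poll the task status from the CLI. wait_until_output is a hypothetical helper that re-runs a command until its output (with trailing newlines removed) equals an expected value. The usage comment polls aws dms describe-replication-tasks for the status running and reuses the dms_repication_task_arn output name from the commands above.

```shell
#!/bin/sh
# Hypothetical helper: poll a command until its stdout equals an expected value.
# Usage: wait_until_output EXPECTED MAX_ATTEMPTS SLEEP_SECONDS CMD [ARGS...]
# Returns 0 on a match, 1 if MAX_ATTEMPTS is reached first.
wait_until_output() {
  local expected="$1" max_attempts="$2" delay="$3"
  shift 3
  local attempt=1 out
  while [ "$attempt" -le "$max_attempts" ]; do
    out="$("$@" 2>/dev/null)"
    if [ "$out" = "$expected" ]; then
      return 0
    fi
    echo "attempt ${attempt}/${max_attempts}: got '${out}'" >&2
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$delay"
    fi
    attempt=$((attempt + 1))
  done
  return 1
}

# Example usage, from deploy/04-database:
# ARN="$(terraform output -raw dms_repication_task_arn)"
# wait_until_output running 30 20 \
#   aws dms describe-replication-tasks --region us-east-1 \
#     --filters "Name=replication-task-arn,Values=${ARN}" \
#     --query 'ReplicationTasks[0].Status' --output text
```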

Check the Amazon Route 53 console. The health checks should show that both the primary site and the backup site are healthy, and the domain should now resolve to the ALB in the Local Zone. You can use dig to verify the DNS resolution.

➜  ~ kubectl get ingress
NAME                       CLASS   HOSTS   ADDRESS                                                                  PORTS   AGE
wordpress-ingress          alb     *       k8s-default-wordpres-ed46143e74-1394360021.us-east-1.elb.amazonaws.com   80      12d
wordpress-ingress-backup   alb     *       k8s-default-wordpres-8d75cd8cec-1858496555.us-east-1.elb.amazonaws.com   80      9d

➜  ~ dig k8s-default-wordpres-ed46143e74-1394360021.us-east-1.elb.amazonaws.com +short
68.66.115.193
68.66.113.46

➜  ~ dig demo.lindarren.com +short
demo.primary.lindarren.com.
68.66.113.46
68.66.115.193
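To make this check repeatable during the failover test, you can turn the dig output into a one-word answer. which_site is a hypothetical helper that compares the first line of dig +short (the CNAME target, with its trailing dot removed) against the primary and backup hostnames used in this post.

```shell
#!/bin/sh
# Hypothetical helper: report which site a name currently resolves to.
# Arguments: primary hostname, backup hostname. Reads `dig +short` output on stdin.
which_site() {
  local primary="$1" backup="$2" first=""
  IFS= read -r first || true
  case "${first%.}" in
    "$primary") echo primary ;;
    "$backup") echo backup ;;
    *) echo "unknown: ${first}"; return 1 ;;
  esac
}

# Example usage:
# dig demo.lindarren.com +short | which_site demo.primary.lindarren.com demo.backup.lindarren.com
```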

Step 7. Failover Test from Local Zone to Region

Now we simulate a failure of the pods in the Local Zone and fail over to the backup site in the Region. Run the command below to disrupt the Kubernetes deployment in the Local Zone.

# Scale the Local Zone deployment to 0 replicas
kubectl scale --replicas=0 deploy/wordpress

Wait a few minutes until the Amazon Route 53 health check reports that the primary site is unhealthy.
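You can also follow the health check from the CLI. Route 53 considers an endpoint healthy when more than 18% of its health checkers report it as healthy; summarize_health is a hypothetical helper that applies that rule to the latest observation from each checker, as returned by aws route53 get-health-check-status, so it is only a rough view of the status shown in the console. HEALTH_CHECK_ID in the usage comment is a placeholder; you can find the ID in the Route 53 console or with aws route53 list-health-checks.

```shell
#!/bin/sh
# Hypothetical helper: summarize health checker observations read on stdin,
# one status string per line ("Success: ..." or "Failure: ...").
# Applies the "more than 18% of checkers report healthy" rule.
summarize_health() {
  awk '
    NF == 0 { next }
    { total++ }
    /^Success/ { ok++ }
    END {
      if (total == 0) { print "no observations"; exit 2 }
      state = (ok * 100 > 18 * total) ? "healthy" : "unhealthy"
      printf "%s (%d/%d checkers report success)\n", state, ok, total
    }'
}

# Example usage:
# aws route53 get-health-check-status --health-check-id "$HEALTH_CHECK_ID" \
#   --query 'HealthCheckObservations[].StatusReport.Status' --output text \
#   | tr '\t' '\n' | summarize_health
```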

Now run dig to resolve the domain again; it should resolve to demo.backup.lindarren.com. The WordPress website is still reachable, because when the Local Zone is unhealthy, requests are directed to the ALB and Amazon EC2 worker nodes in the AZs. The database tier has not failed over at this point.

In this post, the database failover is performed at the application level; it could also be implemented with a Kubernetes Service or a custom domain name. To switch from the DB instance to Amazon RDS, stop the DB instance, then change the database connection settings in the wordpress-region Deployment, which rolls out new pods. Get the generated secret name from kubectl get secret and use it in the WORDPRESS_DB_PASSWORD environment variable of the wordpress-region deployment (defined in wordpress-backup.yaml).

➜  06-kubernetes git:(main) ✗ aws ec2 stop-instances --region us-east-1 \
  --instance-ids $(tf -chdir=../04-database output db_ec2_instance_id | jq . -r) 

➜  06-kubernetes git:(main) ✗ kubectl get secret
NAME                          TYPE                                  DATA   AGE
db-instance-pass-95cd7tdbdf   Opaque                                1      2d7h
default-token-b5q6v           kubernetes.io/service-account-token   3      2d8h
mysql-pass-ft5b2tdk5m         Opaque                                1      2d7h
rds-pass-g48k9fdbhc           Opaque                                1      2d7h

➜  06-kubernetes git:(main) ✗ kubectl edit deploy/wordpress-region

# wordpress-backup.yaml
# get the RDS endpoint 
# tf -chdir=../04-database output rds_endpoint, removing port number 
---
apiVersion: apps/v1 
kind: Deployment
metadata:
  name: wordpress-region
  labels:
    app: wordpress-region
spec:
  ...
      containers:
      - image: wordpress:php7.1-apache
        name: wordpress
        env:
        - name: WORDPRESS_DB_USER
          value: "admin"
        - name: WORDPRESS_DB_HOST
          value: "demo-localzone-test-mariadb-instance.cdyids0dslnl.us-east-1.rds.amazonaws.com"
        - name: WORDPRESS_DB_PASSWORD
          valueFrom:
            secretKeyRef:
              name: rds-pass-g48k9fdbhc
              key: password
...

After saving your changes, connect to the WordPress website again and confirm that it is still working.
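If you prefer a non-interactive alternative to kubectl edit, the same change can be scripted. In this sketch, strip_port and failover_to_rds are hypothetical helpers. failover_to_rds uses kubectl set env, first to set WORDPRESS_DB_HOST and WORDPRESS_DB_USER, then to import the password key of the generated RDS secret with the prefix WORDPRESS_DB_, which kubectl turns into WORDPRESS_DB_PASSWORD. Each kubectl set env call triggers a rollout; verify the result with kubectl get deploy wordpress-region -o yaml before relying on it.

```shell
#!/bin/sh
# Hypothetical helper: drop a trailing ":<port>" from an endpoint,
# e.g. the rds_endpoint Terraform output "host:3306" -> "host".
strip_port() {
  local endpoint="$1"
  printf '%s\n' "${endpoint%:*}"
}

# Hypothetical helper: point wordpress-region at Amazon RDS.
# rds_secret is the generated name from `kubectl get secret` (with hash suffix).
failover_to_rds() {
  local rds_host="$1" rds_secret="$2"
  kubectl set env deployment/wordpress-region \
    WORDPRESS_DB_HOST="$rds_host" WORDPRESS_DB_USER=admin || return 1
  kubectl set env deployment/wordpress-region \
    --from="secret/${rds_secret}" --keys=password --prefix=WORDPRESS_DB_
}

# Example usage, from deploy/06-kubernetes:
# RDS_HOST="$(strip_port "$(terraform -chdir=../04-database output -raw rds_endpoint)")"
# failover_to_rds "$RDS_HOST" rds-pass-g48k9fdbhc
```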

Step 8. Destroy All the Resources

When you finish experimenting with the demo application, destroy all of the resources in reverse order of creation with the commands below to avoid ongoing charges:

cd ../06-kubernetes
kubectl delete -k . 

cd ../05-route53
terraform destroy -auto-approve

cd ../04-database
terraform destroy -auto-approve

cd ../03-efs
terraform destroy -auto-approve

cd ../02-eks
terraform destroy -auto-approve

cd ../01-vpc
terraform destroy -auto-approve
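The same teardown can be run as a single function that stops at the first failure, so that, for example, the VPC is not destroyed while the EKS stack still exists. destroy_all is a hypothetical helper: it assumes you run it from the deploy/ directory of the cloned repository and uses Terraform's -chdir global option instead of cd. Set DRY_RUN=1 to print the commands without running them.

```shell
#!/bin/sh
# Hypothetical helper: destroy the demo stacks in reverse order of creation.
# Run from the repository's deploy/ directory. DRY_RUN=1 only prints the commands.
destroy_all() {
  local prefix=""
  if [ "${DRY_RUN:-0}" = "1" ]; then
    prefix="echo"
  fi
  # Delete the Kubernetes resources first so the controllers remove the ALBs.
  $prefix kubectl delete -k 06-kubernetes || return 1
  for dir in 05-route53 04-database 03-efs 02-eks 01-vpc; do
    $prefix terraform -chdir="$dir" destroy -auto-approve || return 1
  done
}

# Example usage:
# DRY_RUN=1 destroy_all   # review the plan
# destroy_all             # actually destroy
```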

Conclusion

The goal of this post was to show how to architect an application using Amazon EKS in a Local Zone with a failover strategy.

We showed you how to:

  1. Deploy a VPC and an Amazon EKS cluster in the Region and the Local Zone, define a self-managed node group in the Local Zone subnet and a managed node group in the AZ subnets, and install add-ons by setting the respective keys to true in the Amazon EKS Blueprints module.
  2. Deploy an Amazon RDS instance and a database on an Amazon EC2 instance, and use AWS DMS to replicate tables and records from the Local Zone to the Region. The database failover is performed by the customer.
  3. Create an Amazon EFS file system and tweak the Amazon EFS CSI driver so that worker nodes in the Local Zone can mount the Amazon EFS file system without errors.
  4. Create Amazon Route 53 health checks and records with a failover routing policy, so that customers connect to the Local Zone as the primary site and fail over to the backup site when the primary site is unavailable.

Hopefully, you will be able to follow along with this post and are now equipped to build applications for your projects on a Local Zone. For more details, check out the documentation links below. Happy building!

Resource Links