Performing Route 53 health checks on private resources in a VPC with AWS Lambda and Amazon CloudWatch
If you have ever used Amazon Route 53 health checks to monitor resources, you know that monitored resources must have public IP addresses. This is because Route 53 health checkers are public and they can only monitor hosts with IP addresses that are publicly routable on the internet.
You may want to monitor resources that have private IP addresses or private domain names in a VPC. You could then associate the resulting health check with a record in a Route 53 hosted zone (public or private) to achieve a failover scenario when the primary record is unhealthy.
You can use an AWS CloudFormation template to perform TCP, HTTP, and HTTPS health checks for private resources in a VPC. The term private resource refers to any resource in a VPC not accessible over the internet.
This post explains the process. With this solution, you only enter the required parameters, and CloudFormation provisions everything else.
This solution consists of the following services:
1. AWS Lambda: Performs the TCP/HTTP/HTTPS health check and pushes the metric and logs to CloudWatch.
2. Amazon CloudWatch Events: Invokes the Lambda function every minute.
3. Route 53: Creates the health check that monitors the private resource based on the CloudWatch alarm.
4. IAM: Creates a role used by Lambda to perform health checks.
5. CloudFormation: Creates all resources from the stack template. After the stack launches, you see the health check on the Route 53 console.
Each time CloudWatch Events invokes the Lambda function, the function pushes a metric to CloudWatch. The value of this metric indicates whether the resource is healthy or unhealthy.
CloudWatch also receives the logs from the Lambda function. The logs provide more information about the health check status and the reasons for health check failures or success.
CloudWatch creates an alarm used by Route 53 to determine the health status of the private resource.
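You need to select a private subnet for Lambda. This private subnet must have internet access through a NAT gateway or a NAT instance. A public subnet with an internet gateway does not work, because Lambda functions attached to a VPC are not assigned public IP addresses and therefore cannot reach the CloudWatch API endpoints through an internet gateway.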
The monitored resource must allow access from the Lambda subnet. For example, suppose you want to monitor an instance with IP address 10.10.10.5 using HTTP on port 80. Configure the security group and network access control lists (network ACLs) associated with instance 10.10.10.5 to allow traffic from the Lambda subnet CIDR range on the monitored port (port 80, in this case). Without this access, the health check fails.
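Before launching the stack, it can help to confirm that the security group and network ACL rules actually let traffic through. The following is a minimal sketch, assuming you run it from a host inside the Lambda subnet; the function name `tcp_port_reachable` is hypothetical and is not part of the CloudFormation template. It uses the same four-second TCP budget that this post gives for TCP health checks.

```python
import socket


def tcp_port_reachable(host: str, port: int, timeout: float = 4.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Hypothetical helper for checking security group / network ACL rules by hand;
    it is not the Lambda function deployed by the template.
    """
    try:
        # create_connection resolves the name and tries each returned address in turn
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout subclasses OSError)
        # and DNS failures (socket.gaierror subclasses OSError)
        return False
```

For the example above, `tcp_port_reachable("10.10.10.5", 80)` returning `False` from a host in the Lambda subnet suggests that a security group or network ACL rule is still blocking port 80.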
Download the CloudFormation template in JSON format. Launch the CloudFormation stack in the same Region as the monitored resource.
Open the CloudFormation console.
Choose Create Stack, Template is ready.
Under Specify template, choose Upload a template file. On your local computer, select the downloaded CloudFormation template.
The CloudFormation template allows you to add the following parameters:
1. Stack name: Enter a name to identify the CloudFormation stack.
2. Protocol: Enter the health check protocol for the private resource. This can be TCP, HTTP, or HTTPS.
3. IP address or Domain Name: Enter the IP address or the domain name of the private resource to be monitored.
4. Port: Enter the port number that is used to monitor the resource.
5. Path: Optional. For example, in “example.com/test.htm” the path is “test.htm”.
6. Lambda Subnet: This is the subnet where the Lambda function is launched.
7. Lambda VPC: Select the VPC containing the Lambda subnet.
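To see how the Protocol, IP address or Domain Name, Port, and Path parameters fit together, the following sketch combines them into a single check target. This is an illustration under stated assumptions, not the template's own logic: the function `build_target` is hypothetical, and it assumes an IPv4 address or a domain name (IPv6 literals would need brackets in a URL).

```python
from urllib.parse import urlunsplit

SUPPORTED_PROTOCOLS = ("TCP", "HTTP", "HTTPS")


def build_target(protocol: str, host: str, port: int, path: str = "") -> str:
    """Combine the stack parameters into a check target (hypothetical helper)."""
    protocol = protocol.upper()
    if protocol not in SUPPORTED_PROTOCOLS:
        raise ValueError(f"unsupported protocol: {protocol}")
    if not 1 <= port <= 65535:
        raise ValueError(f"port out of range: {port}")
    if protocol == "TCP":
        # A TCP check only needs host and port; the optional Path is ignored
        return f"{host}:{port}"
    # For HTTP/HTTPS, "test.htm" becomes "/test.htm"; an empty path becomes "/"
    return urlunsplit((protocol.lower(), f"{host}:{port}", "/" + path.lstrip("/"), "", ""))
```

With the Path example from the list, `build_target("HTTP", "example.com", 80, "test.htm")` produces `http://example.com:80/test.htm`.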
Choose Next and acknowledge that CloudFormation can create IAM resources in the account. Choose Create Stack.
When the stack is created, the status changes to CREATE_COMPLETE. You can also view the resources that are created by selecting the stack and choosing Resources.
Based on the parameters that you selected, CloudFormation creates a Lambda function and IAM role for the function.
CloudWatch Events invokes the Lambda function every minute. The Lambda function checks the health of the resource and then sends the value representing the health of the resource back to CloudWatch.
- A value of “1” represents a healthy resource.
- A value of “0” represents an unhealthy resource.
The Lambda function publishes this metric to a custom CloudWatch namespace called Route53PrivateHealthCheck, where the history of the health check is stored.
Open the CloudWatch console, choose Metrics, then choose Route53PrivateHealthCheck.
Choose the dimension that matches the protocol for the health check. For example, if you are monitoring the domain “example.com” on HTTP, choose HTTP Health Check. Select the name of the domain or IP address being monitored, and view the metrics.
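The namespace and dimension described above suggest what each published data point looks like. The following sketch builds a request shaped for the CloudWatch `PutMetricData` API. The namespace and the 1/0 values come from this post; the metric name `HealthCheckStatus` is a hypothetical placeholder, and the dimension name pattern (for example, "HTTP Health Check") is an assumption based on the console labels, so the template's actual names may differ.

```python
from datetime import datetime, timezone

NAMESPACE = "Route53PrivateHealthCheck"  # custom namespace named in this post


def build_metric_data(protocol: str, target: str, healthy: bool,
                      metric_name: str = "HealthCheckStatus") -> dict:
    """Build keyword arguments shaped for CloudWatch PutMetricData.

    metric_name is a hypothetical placeholder; the dimension name pattern
    "<PROTOCOL> Health Check" is assumed from the console labels.
    """
    return {
        "Namespace": NAMESPACE,
        "MetricData": [
            {
                "MetricName": metric_name,
                "Dimensions": [{"Name": f"{protocol.upper()} Health Check", "Value": target}],
                "Timestamp": datetime.now(timezone.utc),
                # 1 represents a healthy resource, 0 an unhealthy one
                "Value": 1.0 if healthy else 0.0,
                "Unit": "None",
            }
        ],
    }
```

With the AWS SDK for Python installed, the result could be passed as `boto3.client("cloudwatch").put_metric_data(**build_metric_data("HTTP", "example.com", True))`. Keeping the payload builder separate from the API call makes it easy to inspect without AWS credentials.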
CloudWatch then creates an alarm from the metric stored in the Route53PrivateHealthCheck custom namespace, and uses it to monitor the health check status of the resource.
Route 53 uses this alarm to create a Route 53 health check and determine if the resource is healthy. The health check has the same name as the CloudFormation stack.
The Lambda function sends logs to CloudWatch Logs that report health check failures and successes, including the reason a check failed. These logs help you understand why a health check passes or fails. CloudWatch creates a log group and metric using the name of the CloudFormation stack. For example, if the stack is named “example”, the log group name is /aws/lambda/example.
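The naming rule above, and the kind of failure reason shown in the next screenshot, can be sketched as follows. Both functions are hypothetical illustrations; the exact wording of the messages the template's Lambda function logs may differ.

```python
from http import HTTPStatus


def log_group_name(stack_name: str) -> str:
    """Log group for a Lambda function named after the stack (naming rule from this post)."""
    return f"/aws/lambda/{stack_name}"


def describe_http_failure(status: int) -> str:
    """Human-readable failure reason for a non-2xx/3xx response (hypothetical wording)."""
    try:
        phrase = HTTPStatus(status).phrase  # e.g. 404 -> "Not Found"
    except ValueError:
        phrase = "Unknown status"  # codes not defined in http.HTTPStatus
    return f"Health check failed: server returned HTTP {status} ({phrase})"
```

For a stack named “example”, `log_group_name("example")` gives `/aws/lambda/example`, and `describe_http_failure(404)` gives `Health check failed: server returned HTTP 404 (Not Found)`.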
The following screenshot is an example of a log showing an unhealthy response. The health check did not pass because the web server is returning an HTTP 404 (Not Found) response code.
The following screenshot is an example of a log showing a healthy response. The health check is successful because the server is able to establish a TCP connection.
The following screenshot is an example of a log showing that an incorrect Lambda subnet was selected when launching the CloudFormation template. To fix this, select a private subnet with internet access through a NAT gateway or NAT instance as the Lambda subnet.
You can attach this health check to a Route 53 record set, such as a failover record set. Failover takes between two and three minutes.
- For HTTP and HTTPS health checks, the resource that is monitored must respond with an HTTP status code of 2xx or 3xx within two seconds.
- For TCP health checks, the resource that is monitored must be able to establish a TCP connection within four seconds.
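The HTTP and HTTPS criteria above can be sketched with the Python standard library. This is a minimal illustration, not the template's Lambda code, and the function name `http_health` is hypothetical. Note that the `timeout` argument in `http.client` applies to each blocking socket operation rather than to the request as a whole, so it only approximates a strict two-second deadline.

```python
import http.client


def http_health(host: str, port: int, path: str = "/", use_https: bool = False,
                timeout: float = 2.0) -> int:
    """Return 1 if the server answers with a 2xx or 3xx status, else 0 (hypothetical helper)."""
    conn_cls = http.client.HTTPSConnection if use_https else http.client.HTTPConnection
    conn = conn_cls(host, port, timeout=timeout)
    try:
        conn.request("GET", path)
        # http.client does not follow redirects, so a 3xx is judged as returned
        status = conn.getresponse().status
        return 1 if 200 <= status < 400 else 0
    except (OSError, http.client.HTTPException):
        # Timeouts, refused connections, TLS errors (ssl.SSLError is an OSError)
        # and malformed responses all count as unhealthy
        return 0
    finally:
        conn.close()
```

Returning 1 or 0 rather than a boolean matches the metric values described earlier, so the result could be published directly as the health check value.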
AWS service costs apply to the resources created by the CloudFormation template, which include the following:
- Lambda function
- CloudWatch Events
- CloudWatch metric
- Route 53 health check
Deleting the CloudFormation stack will delete all the resources created to perform the health check, including the Route 53 health check.
In this post, you learned how to monitor resources with private IP addresses or private domain names in VPCs. I showed you how you can set this up using a CloudFormation template. CloudFormation provisions the AWS services (Route 53, Lambda, CloudWatch) that are used in monitoring the resource in the VPC.
You may associate the Route 53 health check with a record set in a Route 53 hosted zone (public or private) to achieve a failover scenario when the primary record is unhealthy or use it in conjunction with other Route 53 routing policies.
About the Author
Chukwuemeka Orunta is a Cloud Support Engineer at AWS. He enjoys working on large scale networks and distributed systems. His technical interests include coding and finding ways to automate complex tasks. When he is not helping customers, he is an active park runner and an unrepentant soccer player with interests in history and philosophy.