Scaling NLB target groups by connections

When workload performance depends on the number of networking connections, traditional load balancing metrics like CPU load or memory utilization do not provide the information you need in order to make scaling decisions. In this post, we explore a solution that automatically scales backend connections of a Network Load Balancer (NLB) target group based on a fixed number of network connections. To do this, we create a custom Amazon CloudWatch metric and use it to monitor and scale target groups. This solution uses Amazon CloudWatch, AWS Lambda, and Amazon Simple Notification Service (Amazon SNS) to automate NLB target scaling.

Solution overview

Our example application, shown in Figure 1, is hosted within a VPC behind an NLB that forwards traffic to a Target Group. The targets (instances) in the Target Group send connection counts to CloudWatch as a custom metric. Our first step is to publish that custom metric and create a CloudWatch alarm on it. When the alarm changes state, it sends a notification to an Amazon SNS topic. Amazon SNS then invokes a Lambda function that deregisters or registers the target in the Target Group according to the payload information provided by Amazon SNS.

Figure 1: Target invokes connection limit alarm

Figure 2: Lambda is invoked to deregister target

As described previously, our connection-based scaling method is based on a custom CloudWatch metric. The following example shows how to set up a sample custom CloudWatch metric from your application server on Amazon Elastic Compute Cloud (Amazon EC2) as part of a Target Group and NLB.

Sending metrics from your application on the EC2 instance to a CloudWatch custom metric

We start by publishing a custom CloudWatch metric named “Connections”. This metric uses “EC2-Connections” as the namespace and “Count” as the unit. Its value is the number of established connections, collected by using the netstat command on the EC2 instance. If your application supports it, this connection count can also be reported by the application itself.

The following example shows how to set the dimension of Connections, our custom CloudWatch metric, to Name=InstanceId. The value of InstanceId is retrieved from the Amazon EC2 instance metadata. For detailed information on publishing custom CloudWatch metrics, review the publish custom metrics documentation.

Figure 3: publishing custom metrics
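
As a rough sketch of the publishing step described above, the following Python script counts established connections from netstat output and publishes the count to the “Connections” metric in the “EC2-Connections” namespace. It is not the script from the figure. It assumes that boto3 is installed on the instance, that the instance role allows cloudwatch:PutMetricData, and that the instance metadata service (IMDSv2) is reachable. The function names are hypothetical.

```python
import subprocess
import urllib.request

NAMESPACE = "EC2-Connections"
METRIC_NAME = "Connections"
IMDS_BASE = "http://169.254.169.254/latest"


def count_established(netstat_output):
    """Count lines of `netstat -ant` output whose last column is ESTABLISHED."""
    count = 0
    for line in netstat_output.splitlines():
        fields = line.split()
        if fields and fields[-1] == "ESTABLISHED":
            count += 1
    return count


def build_metric_datum(instance_id, connections):
    """Build one MetricData entry with InstanceId as the only dimension."""
    return {
        "MetricName": METRIC_NAME,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Unit": "Count",
        "Value": connections,
    }


def get_instance_id():
    """Read the instance ID from EC2 instance metadata (IMDSv2 assumed)."""
    token_request = urllib.request.Request(
        IMDS_BASE + "/api/token",
        method="PUT",
        headers={"X-aws-ec2-metadata-token-ttl-seconds": "60"},
    )
    with urllib.request.urlopen(token_request, timeout=2) as response:
        token = response.read().decode()
    id_request = urllib.request.Request(
        IMDS_BASE + "/meta-data/instance-id",
        headers={"X-aws-ec2-metadata-token": token},
    )
    with urllib.request.urlopen(id_request, timeout=2) as response:
        return response.read().decode()


def publish():
    """Collect the connection count and send it to CloudWatch."""
    import boto3  # imported here so the helpers above work without boto3

    output = subprocess.run(
        ["netstat", "-ant"], capture_output=True, text=True, check=True
    ).stdout
    datum = build_metric_datum(get_instance_id(), count_established(output))
    boto3.client("cloudwatch").put_metric_data(
        Namespace=NAMESPACE, MetricData=[datum]
    )
```

Calling publish() on a schedule (for example, once a minute from cron) would keep the metric current. The interval is an assumption and should match the period of the alarm you create later.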

In the console, verify that the published custom metric is in CloudWatch by selecting CloudWatch > All metrics. You should see it under Custom namespaces.

Figure 4: Viewing custom metrics

This custom CloudWatch metric ‘Connections’ from the ‘EC2-Connections’ namespace drives the alarm whose notifications are the input to our Lambda function.

Lambda function details

This Python Lambda function reads an environment variable named TARGET_GROUP_ARN and parses the OldStateValue, NewStateValue, and target ID from the payload of the Amazon SNS notification. It then checks the alarm state: it deregisters the target if the state changed from OK to ALARM, and registers the target back if the state changed from ALARM to OK.

import boto3
import json
import logging
import os

logger = logging.getLogger()
logger.setLevel(logging.INFO)
target_group_arn = os.environ['TARGET_GROUP_ARN']

def lambda_handler(event, context):
    logger.info("Event: " + str(event))
    message = json.loads(event['Records'][0]['Sns']['Message'])
    logger.info("Message: " + str(message))

    alarm_name = message['AlarmName']
    old_state = message['OldStateValue']
    new_state = message['NewStateValue']
    #reason = message['NewStateReason']
    target_id = message['Trigger']['Dimensions'][0]['value']
    elbv2_client = boto3.client('elbv2')

    # Check Alarm State
    if new_state == 'ALARM' and old_state == 'OK':
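        # Check if the instance is still in the target group. If not, do nothing
        h_response = elbv2_client.describe_target_health(
            TargetGroupArn=target_group_arn,
            Targets=[
                {
                    'Id': target_id
                }
            ]
        )
        health = h_response['TargetHealthDescriptions'][0]['TargetHealth']
        if health.get('Reason') == 'Target.NotRegistered' or health['State'] == 'draining':
            print('Target {} is not registered in target group {}. No action needed.'.format(target_id, target_group_arn))
            return
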
        # Deregister the target from the target group
        response = elbv2_client.deregister_targets(
            TargetGroupArn=target_group_arn,
            Targets=[
                {
                    'Id': target_id
                }
            ]
        )

        # Check if the target was successfully deregistered
        if response['ResponseMetadata']['HTTPStatusCode'] == 200:
            print('Target {} successfully deregistered from target group {}.'.format(target_id,target_group_arn))
        else:
            print('Error: Target {} could not be deregistered from target group {}.'.format(target_id,target_group_arn))

    elif new_state == 'OK' and old_state == 'ALARM':
        # Register the target back to the target group
        r_response = elbv2_client.register_targets(
            TargetGroupArn=target_group_arn,
            Targets=[
                {
                    'Id': target_id
                }
            ]
        )
        # Check if the target was successfully registered
        if r_response['ResponseMetadata']['HTTPStatusCode'] == 200:
            print('Target {} successfully registered to target group {}.'.format(target_id, target_group_arn))
        else:
            print('Error: Target {} could not be registered to target group {}.'.format(target_id, target_group_arn))
    else:
        print('New Alarm State is {}, Old Alarm State is {}. No Action Needed'.format(new_state, old_state))
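
To make the payload handling easier to follow, here is a small sketch that builds a trimmed sample of the event the function receives and extracts the same fields. The sample is an assumption for illustration: real CloudWatch alarm notifications carry more fields, and the instance ID is made up. Unlike the function above, this sketch looks up the dimension by name instead of taking the first entry, which is a variation and not the original behavior. The helper names are hypothetical.

```python
import json

# Trimmed alarm notification. Only the fields read by the function are shown.
SAMPLE_MESSAGE = {
    "AlarmName": "TgRegistrationAlarm",
    "OldStateValue": "OK",
    "NewStateValue": "ALARM",
    "Trigger": {
        "MetricName": "Connections",
        "Namespace": "EC2-Connections",
        "Dimensions": [{"value": "i-0123456789abcdef0", "name": "InstanceId"}],
    },
}

# Amazon SNS delivers the alarm as a JSON string inside Records[0].Sns.Message.
SAMPLE_EVENT = {"Records": [{"Sns": {"Message": json.dumps(SAMPLE_MESSAGE)}}]}


def parse_alarm_event(event, dimension_name="InstanceId"):
    """Return (old_state, new_state, target_id) from an SNS alarm event."""
    message = json.loads(event["Records"][0]["Sns"]["Message"])
    target_id = next(
        d["value"]
        for d in message["Trigger"]["Dimensions"]
        if d["name"] == dimension_name
    )
    return message["OldStateValue"], message["NewStateValue"], target_id


def choose_action(old_state, new_state):
    """Mirror the state checks in the Lambda function above."""
    if old_state == "OK" and new_state == "ALARM":
        return "deregister"
    if old_state == "ALARM" and new_state == "OK":
        return "register"
    return "none"
```

Note that a transition such as INSUFFICIENT_DATA to ALARM falls through to "none", which matches the final else branch of the function.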

Deploying the solution

You can deploy this solution into your AWS account using an AWS CloudFormation template.

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account.
  • An existing NLB and target group with Amazon EC2 targets that have connection limits. For details on how to do this, refer to the Create a target group for your Network Load Balancer entry in our documentation.
  • A custom CloudWatch metric as described previously to provide input to the following CloudFormation template.

Deploying through CloudFormation template

In this section we will deploy the following:

  • AWS Identity and Access Management (IAM) role for Lambda
    • lambda-target-register-role
    • Grants permissions to write to CloudWatch Logs, publish and subscribe to Amazon SNS, and register and deregister targets.
  • Lambda function. Refer to the “Lambda function details” section for the code of the Lambda function.
    • NLBTargetRegister
  • Amazon CloudWatch Alarm to send notifications to the following Amazon SNS Topic:
    • TgRegistrationAlarm
  • Amazon SNS Topic that is subscribed to the previous Lambda function:
    • TGRegistrations
  • Amazon CloudWatch logs
    • target-deregister, with seven days retention

Steps to deploy the CloudFormation template

  1. Download the yaml file.
  2. Navigate to the CloudFormation console in your AWS account.
  3. Choose Create stack.
  4. Choose Template is ready, upload a template file, and navigate to the yaml file that you just downloaded.
  5. Choose Next.
  6. Give the stack a name (max. length 30 characters), and select Next.
  7. For parameter ‘TargetGroupARN’, enter the ARN of the target group that needs to be scaled. For parameter ‘CustomMetricNamespace’, enter the custom namespace that the custom metric is published to. For parameter ‘CustomMetricName’, enter the name of the application connections custom metric in CloudWatch.
  8. Add tags if desired, and select Next.
  9. Scroll to Capabilities at the bottom of the screen, check the box I acknowledge that AWS CloudFormation might create IAM resources with custom names, and then choose Create stack.
  10. Wait for the stack creation to complete.
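
The console steps above could also be scripted. The following is a minimal sketch that uses boto3, assuming that the template has been downloaded locally and that it accepts the three parameters named in step 7. The function names and the template path are hypothetical, and the 30-character check only mirrors the limit mentioned in step 6.

```python
def build_create_stack_args(stack_name, template_body, target_group_arn,
                            namespace, metric_name):
    """Assemble the arguments for the CloudFormation create_stack call."""
    if len(stack_name) > 30:
        raise ValueError("stack name must be at most 30 characters")
    return {
        "StackName": stack_name,
        "TemplateBody": template_body,
        "Parameters": [
            {"ParameterKey": "TargetGroupARN", "ParameterValue": target_group_arn},
            {"ParameterKey": "CustomMetricNamespace", "ParameterValue": namespace},
            {"ParameterKey": "CustomMetricName", "ParameterValue": metric_name},
        ],
        # Equivalent to acknowledging IAM resources with custom names.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def deploy(template_path, stack_name, target_group_arn, namespace, metric_name):
    """Create the stack and block until creation completes."""
    import boto3  # imported here so the builder above works without boto3

    with open(template_path) as template_file:
        template_body = template_file.read()
    args = build_create_stack_args(
        stack_name, template_body, target_group_arn, namespace, metric_name
    )
    cloudformation = boto3.client("cloudformation")
    cloudformation.create_stack(**args)
    cloudformation.get_waiter("stack_create_complete").wait(StackName=stack_name)
```

For example, deploy("template.yaml", "nlb-connection-scaling", target_group_arn, "EC2-Connections", "Connections") would mirror steps 1 through 10, where target_group_arn is the ARN of your own target group.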

Once the stack is deployed, whenever a target in the specified target group exceeds 90% of its maximum allowed connections, the alarm sends a notification to Amazon SNS, which invokes the Lambda function to deregister that target from the target group. In this case, the alarm threshold was set at 90 (90% of the maximum connections that a target can support), but you should choose the value that works for your use case.
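
For readers who want to see what such an alarm involves, the following sketch builds the arguments for a CloudWatch put_metric_alarm call with a threshold of 90. It is not taken from the CloudFormation template, whose exact alarm settings are not shown here. The alarm name, statistic, period, and evaluation periods are assumptions. The topic is listed under both AlarmActions and OKActions, on the assumption that the Lambda function needs the OK notification in order to register the target back.

```python
def build_alarm_args(instance_id, topic_arn, threshold=90.0,
                     namespace="EC2-Connections", metric_name="Connections"):
    """Assemble arguments for a per-instance connection alarm (assumed settings)."""
    return {
        "AlarmName": "connections-" + instance_id,  # hypothetical naming scheme
        "Namespace": namespace,
        "MetricName": metric_name,
        "Dimensions": [{"Name": "InstanceId", "Value": instance_id}],
        "Statistic": "Maximum",        # assumption
        "Period": 60,                  # seconds, assumption
        "EvaluationPeriods": 1,        # assumption
        "Threshold": threshold,
        "ComparisonOperator": "GreaterThanThreshold",
        "AlarmActions": [topic_arn],   # notify on OK -> ALARM
        "OKActions": [topic_arn],      # notify on ALARM -> OK
    }


def create_alarm(instance_id, topic_arn, threshold=90.0):
    """Create or update the alarm in CloudWatch."""
    import boto3  # imported here so the builder above works without boto3

    boto3.client("cloudwatch").put_metric_alarm(
        **build_alarm_args(instance_id, topic_arn, threshold)
    )
```

If a target supports a different maximum, pass the threshold that corresponds to the share of that maximum you want to alarm on.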

If you plan to run this solution for multiple target groups in an AWS account and Region, then review any naming conflicts that could occur in the Lambda, Amazon SNS, IAM and CloudWatch Alarm resources.

At this point, you have created a CloudWatch alarm that notifies an Amazon SNS topic when the connection count exceeds the alarm threshold. The Amazon SNS topic invokes a Lambda function that runs a Python script to deregister the instance from the Elastic Load Balancing (ELB) target group that was specified in the environment variable of the Lambda function.

Cost considerations

This solution uses a Lambda function that makes API calls. It also creates a CloudWatch alarm and Amazon SNS topic to notify on threshold breach. All pricing details are available on the Amazon SNS, Amazon CloudWatch, and AWS Lambda pricing pages.

Cleanup

If you decide that you no longer want to keep this solution and its associated resources, navigate to CloudFormation in the AWS Management Console, choose the stack you deployed earlier, and choose Delete. Once that finishes, all of the resources that the stack created should be deleted. Should you want to add the solution back at any point, you can create the stack again from the CloudFormation yaml.

Conclusion

This solution helps deregister and re-register targets in your NLB target groups based on alarms set for your application connection limits. An alternative way to achieve target group scaling by connections is to run the same Lambda code on your target EC2 instances as a script, so that each instance deregisters and re-registers itself with the target group based on connection limits tracked on the instance.
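
The on-instance alternative could look roughly like the following sketch. It is an illustration, not a tested implementation: the decision rule, the function names, and the way the script tracks whether the instance is currently registered are all assumptions.

```python
def decide_action(connections, registered, limit):
    """Decide what the instance should do given its current connection count."""
    if registered and connections > limit:
        return "deregister"
    if not registered and connections <= limit:
        return "register"
    return "none"


def apply_action(action, target_group_arn, instance_id):
    """Call Elastic Load Balancing to carry out the chosen action."""
    import boto3  # imported here so decide_action works without boto3

    elbv2 = boto3.client("elbv2")
    targets = [{"Id": instance_id}]
    if action == "deregister":
        elbv2.deregister_targets(TargetGroupArn=target_group_arn, Targets=targets)
    elif action == "register":
        elbv2.register_targets(TargetGroupArn=target_group_arn, Targets=targets)
```

A script on the instance would call decide_action with the locally measured connection count on each run and pass the result to apply_action. The instance role would need permission to register and deregister targets.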

About the authors

Jamie Wenzel

Jamie is a Principal SA networking specialist in EC2 Networking. Jamie is part of the application networking organization, contributing to the design of application networking products and services. He is an avid public speaker at re:Invent, re:Inforce, Lofts, Summits, and Twitch. He has been with Amazon for 6+ years and is passionate about helping people and organizations in their cloud journeys.

Scott Chang

Scott Chang is a Solutions Architect at AWS based in San Francisco. He has over 14 years of hands-on experience in networking and is also familiar with security and site reliability engineering. He works with one of the major strategic customers in the West region to design highly scalable, innovative, and secure cloud solutions.

Karthik Chemudupati

Karthik Chemudupati is a Principal Technical Account Manager (TAM) with AWS, focused on helping customers achieve cost optimization and operational excellence. He has more than 18 years of IT experience in software engineering, cloud operations, and automation. Karthik joined AWS in 2016 as a TAM and has worked with more than a dozen enterprise customers across US-West. Outside of work, he enjoys spending time with his family.