Using AWS Lambda with Auto Scaling Lifecycle Hooks

Nathan Mcguirt, AWS Solution Architect

Using automation to extend Auto Scaling functionality

Auto Scaling provides customers a great way to dynamically scale applications, and we frequently meet customers with new and interesting use cases who want to extend Auto Scaling with additional actions. Examples include notifying an auditing system of a new instance launch, or taking an extra action on the instance such as attaching a secondary network interface. To support these types of use cases, Auto Scaling supports adding a hook to the launching and terminating stages of the Auto Scaling instance lifecycle. The hook sends an SNS notification and then holds the instance in a pending state, waiting for a callback to the API. It also includes a user-configurable timeout and a default action to take if your external operation doesn’t complete in a timely fashion or returns an error.

By attaching a Lambda function to the lifecycle hook by way of SNS, we can add a virtually limitless number of custom actions to our Auto Scaling group. For example, we recently had a customer with a requirement for their Auto Scaling instances to have a secondary network interface in an isolated administrative subnet in order to meet their compliance requirements. They achieved this by connecting a lifecycle hook to a Lambda function that created an Elastic Network Interface (ENI) in the appropriate subnet and attached it to the instance. For this use case, it was critical that the instance have the secondary ENI before going into production, and they were able to meet this requirement by configuring the lifecycle hook to abandon the instance after a timeout period or if the Lambda function were to fail.

Demonstration: Adding Secondary Elastic Network Interface to Auto Scaling Instances

In this post, we’ll walk through a demonstration of how to implement the above use case of adding a secondary network interface to Auto Scaling instances. A caveat for console users: the functionality to add a lifecycle hook to an Auto Scaling group is available through the API only, so we’ll be using the AWS CLI for this demonstration rather than the AWS Console.

Prerequisites and Caveats

If you’re going to follow along in your own account, you should have the following completed first:

Your VPC, subnets and launch configuration are prepared and you have their IDs recorded.
Your Auto Scaling group is configured with a desired size of 0 (the hook will only apply to new instances joining the pool).
You have a subnet and security group for the secondary ENI configured (and the subnet is in the same Availability Zone as the one you’re using for your Auto Scaling instances).
You have AWS CLI installed and configured, and are using credentials with sufficient permissions.

Two caveats before we begin. First, the secondary ENI must be in the same Availability Zone the instance was launched in (the same zone as the primary ENI). For the sake of simplicity, the code in this demo isn’t aware of multiple subnets and Availability Zones; it attempts to create the secondary ENI in whatever subnet is passed to it. Second, if you’re going to do this in production, you’ll likely also want to configure a second lifecycle hook on instance termination and a Lambda function to delete the secondary ENI of the terminating instance. Also, pay attention to naming and capitalization as you create resources. The names of Auto Scaling groups and Lambda functions are case sensitive. You can name these resources however you wish, but consistent naming will help you avoid troubleshooting later.
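For the termination-side cleanup, the function first has to work out which interface is the secondary one. The following is a minimal sketch, not part of the original sample: `findSecondaryEni` is a hypothetical helper, and it assumes it receives the `NetworkInterfaces` array from an EC2 DescribeNetworkInterfaces response for the terminating instance, with the secondary ENI attached at device index 1 as in this demo.

```javascript
// Hypothetical helper (not in the original sample): given the NetworkInterfaces
// array from a DescribeNetworkInterfaces response for one instance, return the
// IDs needed to detach and delete the secondary ENI, or null if there is none.
function findSecondaryEni(networkInterfaces) {
  for (const eni of networkInterfaces || []) {
    // Assumption: this demo attaches the secondary ENI at device index 1.
    if (eni.Attachment && eni.Attachment.DeviceIndex === 1) {
      return {
        networkInterfaceId: eni.NetworkInterfaceId,
        attachmentId: eni.Attachment.AttachmentId
      };
    }
  }
  return null;
}
```

A termination-side Lambda function could pass these IDs to the EC2 DetachNetworkInterface and DeleteNetworkInterface actions. Its role would need the matching permissions, which the policy used later in this demo does not grant.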

Part 1: Configure the Notification Topic

First, we’ll need to create the Simple Notification Service (SNS) topic.

$ aws sns create-topic --name ENI-Demo-Topic

In the response, note the ARN of the topic for later use.

{
"TopicArn": "arn:aws:sns:us-west-2:012345678901:ENI-Demo-Topic"
}
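Because this ARN is pasted into several later commands, a quick sanity check can catch copy errors such as an account ID with the wrong number of digits. The sketch below is only an illustration: `parseTopicArn` is a hypothetical helper, not part of any AWS SDK, and it assumes the standard `arn:partition:sns:region:account-id:topic-name` layout.

```javascript
// Split an SNS topic ARN into its parts and sanity-check it.
// parseTopicArn is a hypothetical helper, not part of any AWS SDK.
function parseTopicArn(arn) {
  const parts = String(arn).split(':');
  // Expected shape: arn:<partition>:sns:<region>:<account-id>:<topic-name>
  if (parts.length !== 6 || parts[0] !== 'arn' || parts[2] !== 'sns') {
    throw new Error('Not an SNS topic ARN: ' + arn);
  }
  if (!/^\d{12}$/.test(parts[4])) {
    throw new Error('Account ID should be 12 digits: ' + parts[4]);
  }
  return {
    partition: parts[1],
    region: parts[3],
    accountId: parts[4],
    topicName: parts[5]
  };
}

const topic = parseTopicArn('arn:aws:sns:us-west-2:012345678901:ENI-Demo-Topic');
console.log(topic.region, topic.topicName); // prints "us-west-2 ENI-Demo-Topic"
```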

Part 2: Configure an IAM role to allow posting to the SNS topic

The lifecycle hook uses an IAM role to send messages to SNS, so before we can create the hook we need to have the role prepared. The role needs two policies: a trust policy allowing Auto Scaling to assume the role, and an inline policy giving the role permission to publish to the SNS topic. We’ll create the role in two steps, first creating the role with its trust policy and then applying the inline policy.

With the CLI, it’s sometimes easier to first put the policies into text files to manage them, rather than trying to escape them in your shell, so we’ll do that here. We can specify these files on the command line later. The assume role policy (sometimes called a trust policy) should look as follows, and we’ll save it in a text file called SNS-Role-Trust-Policy.json.

{
  "Version": "2012-10-17",
  "Statement": [ {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "autoscaling.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
  } ]
}

The next policy is the inline policy granting the role access to publish to our SNS topic, and should look as follows. We’ll save it as SNS-Role-Inline-Policy.json for the example. Don’t forget to replace the example ARN with the one from the SNS topic you created in Part 1.

{
  "Version": "2012-10-17",
  "Statement": [ {
      "Effect": "Allow",
      "Resource": "arn:aws:sns:us-west-2:012345678901:ENI-Demo-Topic",
      "Action": [
        "sns:Publish"
      ]
  } ]
}
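If you would rather not hand-edit the ARN into the file, the policy document can be generated instead. This is a small sketch under that assumption: `buildPublishPolicy` is a hypothetical helper, and the output matches the inline policy shown above.

```javascript
// Build the inline policy document for the publisher role from a topic ARN.
// buildPublishPolicy is a hypothetical helper written for this walkthrough.
function buildPublishPolicy(topicArn) {
  return {
    Version: '2012-10-17',
    Statement: [{
      Effect: 'Allow',
      Resource: topicArn,
      Action: ['sns:Publish']
    }]
  };
}

// Print the JSON so it can be redirected into SNS-Role-Inline-Policy.json.
console.log(JSON.stringify(
  buildPublishPolicy('arn:aws:sns:us-west-2:012345678901:ENI-Demo-Topic'),
  null,
  2
));
```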

With these policies ready, we can make the call to create the role as shown below, specifying the text files with the policy documents.

$ aws iam create-role \
--role-name ENI-Demo-Topic-Publisher-Role \
--assume-role-policy-document file://SNS-Role-Trust-Policy.json

And we then apply the inline policy to it.

$ aws iam put-role-policy \
--role-name ENI-Demo-Topic-Publisher-Role \
--policy-name AllowPublishToEniDemoTopic \
--policy-document file://SNS-Role-Inline-Policy.json

Part 3: Configure the Lambda Function’s IAM role

With the publisher role in place, the next step is to prepare for the Lambda function. Lambda functions need an IAM role to give them their execution permissions, so we’ll start there. If you are using the CloudFormation sample, you can skip this step; the IAM role has already been configured by CloudFormation for you.

First, as with the IAM role above, we’ll need some policy documents, and we’ll put them in text files for ease of use. We’ll start with the role’s assume role policy (trust policy), which we’ll save as Lambda-Role-Trust-Policy.json. Its contents should be as follows:

{
  "Version": "2012-10-17",
  "Statement": [ {
      "Sid": "",
      "Effect": "Allow",
      "Principal": {
        "Service": "lambda.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
  } ]
}

Next we need the inline policy that defines what the Lambda function is allowed to do. We’ll save it as Lambda-Role-Inline-Policy.json, and it should look like the below.

{
  "Version": "2012-10-17",
  "Statement": [ {
      "Effect": "Allow",
      "Resource": "*",
      "Action": [
        "ec2:DescribeInstances",
        "ec2:CreateNetworkInterface",
        "ec2:AttachNetworkInterface",
        "ec2:DescribeNetworkInterfaces",
        "autoscaling:CompleteLifecycleAction"
      ]
  },
  {
    "Action": [
      "logs:CreateLogGroup",
      "logs:CreateLogStream",
      "logs:PutLogEvents"
    ],
    "Resource": "arn:aws:logs:*:*:*",
    "Effect": "Allow"
  } ]
}

With the policies prepared, we can create the role, specifying the Assume Role (trust) Policy.

$ aws iam create-role \
--role-name ENI-Demo-Lambda-Role \
--assume-role-policy-document file://Lambda-Role-Trust-Policy.json

Note the ARN of the new role; we’ll need it when we set up the Lambda function. Once the role is created, we apply the inline policy, specifying the file we created earlier.

$ aws iam put-role-policy \
--role-name ENI-Demo-Lambda-Role \
--policy-name CreateAndAttachEnisWithLogging \
--policy-document file://Lambda-Role-Inline-Policy.json

Part 4: Put the lifecycle hook

Next, we’ll put the lifecycle hook on the Auto Scaling group. Lifecycle hooks support a metadata field that can be used to embed information specific to the hook in the message. In this case, we’ll specify the resource IDs of the admin subnet and security groups, which are specific to this Auto Scaling group. This way, our Lambda function can be used across multiple Auto Scaling groups. These metadata items should reflect the subnet and security groups you’d like to use for the secondary ENI. For ease of use on the CLI, we’ll write that metadata to a text file as we did with the IAM policies. It should look like the below, and we’ll save it as Lifecycle-Hook-Metadata.json.

{
  "SubnetId":"subnet-abcdefg0",
  "SecurityGroups":["sg-abcdefg0"]
}
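The Lambda function later receives this metadata as a JSON string in the message’s NotificationMetadata field, so a typo in the file only shows up at launch time. One way to fail early with a clear message is sketched below; `parseHookMetadata` is a hypothetical helper, and the field names are the ones used in the file above.

```javascript
// Parse and validate the NotificationMetadata string carried by the hook message.
// parseHookMetadata is a hypothetical helper; the field names match the
// Lifecycle-Hook-Metadata.json file used in this demo.
function parseHookMetadata(raw) {
  const metadata = JSON.parse(raw); // throws SyntaxError on malformed JSON
  if (!metadata || typeof metadata.SubnetId !== 'string' || metadata.SubnetId === '') {
    throw new Error('NotificationMetadata is missing SubnetId');
  }
  if (!Array.isArray(metadata.SecurityGroups) || metadata.SecurityGroups.length === 0) {
    throw new Error('NotificationMetadata needs at least one entry in SecurityGroups');
  }
  return {
    subnetId: metadata.SubnetId,
    securityGroups: metadata.SecurityGroups
  };
}

const checked = parseHookMetadata('{"SubnetId":"subnet-abcdefg0","SecurityGroups":["sg-abcdefg0"]}');
console.log(checked.subnetId); // prints "subnet-abcdefg0"
```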

Now, we can put the lifecycle hook.

$ aws autoscaling put-lifecycle-hook \
--notification-metadata file://Lifecycle-Hook-Metadata.json \
--lifecycle-hook-name ENI-Demo-Hook \
--auto-scaling-group-name ENI-Demo-ASG \
--notification-target-arn arn:aws:sns:us-west-2:012345678901:ENI-Demo-Topic \
--role-arn arn:aws:iam::012345678901:role/ENI-Demo-Topic-Publisher-Role \
--lifecycle-transition autoscaling:EC2_INSTANCE_LAUNCHING \
--heartbeat-timeout 60

Part 5: Create the Lambda function

With the rest of the building blocks in place, we can create the Lambda function. We’ll include the main function here, but our sample code uses an extra library called Async to make it easier to have Node.js do things like wait for an object to be ready. Lambda supports external packages: add the additional modules alongside your code and package everything in a zip file. Instructions on how to package the code and modules are available in our previous blog post on using packages in AWS Lambda. You’ll need an Async version greater than 1.0.0 for this sample.

Below is the sample code for our Lambda function. It receives the message from SNS as a parameter in JSON format and unpacks the message to get the required parameters from Auto Scaling, such as the instance, subnet, and security group IDs. With this data, the Lambda function can create the network interface. Once the interface has been created, the function polls until the interface state shows ‘available’ and then attaches it to the instance.

// This is sample Node.js code for AWS Lambda, to attach a secondary Elastic 
// Network Interface to an instance. To use this function, create an Auto Scaling
// lifecycle hook on instance creation notifying a SNS topic, and 
// subscribe the lambda function to the SNS topic.
// Sane values for Memory and Timeout are 128MB and 30s respectively.


var AWS = require('aws-sdk');
var ec2 = new AWS.EC2();
var as = new AWS.AutoScaling();

var async = require('async');

exports.handler = function (notification, context) {
  // Log the request
  console.log("INFO: request received.\nDetails:\n", JSON.stringify(notification));
  var message = JSON.parse(notification.Records[0].Sns.Message);
  var metadata = JSON.parse(message.NotificationMetadata);
  console.log("DEBUG: SNS message contents. \nMessage:\n", message);
  console.log("DEBUG: Extracted Message Data\nData:\n", metadata);

  // Pull out metadata
  var instanceId = message.EC2InstanceId;
  var subnetId = metadata.SubnetId;
  var securityGroups = metadata.SecurityGroups;

  //define a closure for easy termination later on
  var terminate = function (success, err) {
    var lifecycleParams = {
      "AutoScalingGroupName" : message.AutoScalingGroupName,
      "LifecycleHookName" : message.LifecycleHookName,
      "LifecycleActionToken" : message.LifecycleActionToken,
      "LifecycleActionResult" : "ABANDON"
    };
    //log that we're terminating and why
    if(!success){
      console.log("ERROR: Lambda function reporting failure to AutoScaling with error:\n", err);
    }else{
      console.log("INFO: Lambda function reporting success to AutoScaling.");
      lifecycleParams.LifecycleActionResult = "CONTINUE";
    }
    //call autoscaling
    completeAsLifecycleAction (lifecycleParams, function lifecycleActionResponseHandler (err){
      if(err){
        context.fail();
      }else{
        //if we successfully notified AutoScaling of the instance status, tell lambda we succeeded
        //even if the operation on the instance failed
        context.succeed();
      }
    });
  }; 
    
  //Create the interface and wait for it to be ready
  createEni(subnetId, securityGroups, function CreateEniCallback(err, eniId){
    if(err){
      console.log("ERROR: Could not create ENI. Errors:\n", err);
      //Stop here: without an ENI ID there is nothing to wait for or attach
      return terminate(false,err);
    }
    //Wait for the ENI to be 'available'
    waitEniReady(eniId, function waitEniReadyCallback (err){
      if(err){
        console.log("ERROR: Failure waiting for ENI to be ready");
        //Stop here so we don't try to attach an ENI that never became ready
        return terminate(false,err);
      }
      //attach it to the instance
      attachNetworkInterface(eniId, instanceId, function attachNetworkInterfaceCallback(err,data){
        if(err){
          console.log("ERROR: Could not attach ENI. Error Data:\n", err);
          terminate(false,err);
        }else{
          console.log("INFO: Successfully attached ENI");
          terminate(true, err);
        }
      });
    });
  });
};

function attachNetworkInterface (networkInterfaceId, instanceId, callback){
  //Attaches an ENI, passes the AttachmentId to callback.
  var nic_params = {
    'NetworkInterfaceId' : networkInterfaceId,
    'InstanceId' : instanceId,
    'DeviceIndex' : 1 // Should be safe to assume index 1 is available
  };
  ec2.attachNetworkInterface(nic_params, function evaluateEniAttachment(err,data) {
    if (err) {
      console.log("ERROR: ENI Attachment failed.\nDetails:\n", err);
      // Return so we don't fall through and read AttachmentId from a null response
      return callback(err, null);
    }
    console.log("INFO: ENI Attached.\nDetails:\n", data);
    callback(null, data.AttachmentId);
  });
}

function createEni(subnetId, securityGroups, callback){
  //Create a network interface, pass the Interface ID to callback 
  var eniCreationParams = {
    "SubnetId":subnetId,
    "Groups":securityGroups
  };
  console.log("DEBUG: CreateEni Params:\n",eniCreationParams);
  ec2.createNetworkInterface(eniCreationParams, function createEniCallback(err, data) {
    if (err) {
      console.log("ERROR: ENI creation failed.\nDetails:\n", err);
      return callback(err, null);
    } 
    console.log("INFO: ENI Created.\nData:\n", data);
    return callback(null, data.NetworkInterface.NetworkInterfaceId);
  });
}

function waitEniReady (eniId, waitEniReadyCallback){
  //Polls until the ENI reports 'available', then calls waitEniReadyCallback(err)
  var getEniParams={
    "NetworkInterfaceIds":[
      eniId
    ]
  };
  console.log("INFO: Waiting on ENI to be ready:", eniId);
  var eniStatus = undefined;
  async.until(
    function isReady () { return eniStatus === "available"; },
    function getEniStatus(getEniStatusCallback){
      ec2.describeNetworkInterfaces(getEniParams,function handleGetEniResponse(err,data){
        if(err){
          //data is null on error, so bail out before reading it
          return getEniStatusCallback(err);
        }
        eniStatus = data.NetworkInterfaces[0].Status;
        console.log("DEBUG: ENI status is:", eniStatus);
        getEniStatusCallback(null);
      });
    },
    function waitEniReadyCallbackClosure(err){
      if(err){
        console.log("ERROR: error waiting for ENI to be ready:\n",err);
      }
      waitEniReadyCallback(err);
    }
  );
}

function completeAsLifecycleAction(lifecycleParams, callback){
  //calls callback(err) on failure or callback(null) on success
  //notifies AutoScaling that it should either continue or abandon the instance
  as.completeLifecycleAction(lifecycleParams, function(err, data){
    if (err) {
      console.log("ERROR: AS lifecycle completion failed.\nDetails:\n", err);
      console.log("DEBUG: CompleteLifecycleAction\nParams:\n", lifecycleParams);
      callback(err);
    } else {
      console.log("INFO: CompleteLifecycleAction Successful.\nReported:\n", data);
      callback(null);
    }
  });
}
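To make the message handling easier to follow in isolation, here is a condensed sketch of the two data steps in the handler: unpacking the SNS event and building the CompleteLifecycleAction parameters. The function names `parseLifecycleEvent` and `buildCompletionParams` are hypothetical, and the sample event is made up for illustration; only the field names come from the sample code above.

```javascript
// Extract the fields the handler needs from an SNS-delivered lifecycle hook event.
function parseLifecycleEvent(event) {
  // The hook message arrives as a JSON string inside the SNS record,
  // and the hook metadata is a second JSON string inside that message.
  const message = JSON.parse(event.Records[0].Sns.Message);
  const metadata = JSON.parse(message.NotificationMetadata);
  return {
    instanceId: message.EC2InstanceId,
    subnetId: metadata.SubnetId,
    securityGroups: metadata.SecurityGroups,
    hook: {
      AutoScalingGroupName: message.AutoScalingGroupName,
      LifecycleHookName: message.LifecycleHookName,
      LifecycleActionToken: message.LifecycleActionToken
    }
  };
}

// Build CompleteLifecycleAction parameters: CONTINUE on success, ABANDON otherwise.
function buildCompletionParams(hook, success) {
  return Object.assign({}, hook, {
    LifecycleActionResult: success ? 'CONTINUE' : 'ABANDON'
  });
}

// Made-up event for illustration; the IDs and token are placeholders.
const sampleEvent = {
  Records: [{
    Sns: {
      Message: JSON.stringify({
        AutoScalingGroupName: 'ENI-Demo-ASG',
        LifecycleHookName: 'ENI-Demo-Hook',
        LifecycleActionToken: 'placeholder-token',
        EC2InstanceId: 'i-0123abcd',
        NotificationMetadata: JSON.stringify({
          SubnetId: 'subnet-abcdefg0',
          SecurityGroups: ['sg-abcdefg0']
        })
      })
    }
  }]
};

const parsed = parseLifecycleEvent(sampleEvent);
console.log(buildCompletionParams(parsed.hook, true).LifecycleActionResult); // prints "CONTINUE"
```

Keeping these steps free of SDK calls makes them easy to exercise locally before uploading the function.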

Below is the CLI command to create the Lambda function. In a later part, we’ll come back and set the permissions that allow SNS to invoke the function, but for now we’re just going to create it. We’ll create this function with the name ENI-Demo-Lambda-Func, configure it to use the IAM role from Part 3, and set the timeout to 30 seconds. The timeout should allow plenty of time for the resources to be created and become ready during execution; in most cases the function will not require that much time. Note the fileb:// prefix on the URI for the zip file.

$ aws lambda create-function \
--function-name ENI-Demo-Lambda-Func \
--zip-file fileb://ENI-Demo-Lambda-1-0.zip \
--runtime nodejs \
--role arn:aws:iam::012345678901:role/ENI-Demo-Lambda-Role \
--handler ENI-Demo-Lambda-1-0.handler \
--timeout 30
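The timeout matters because the sample’s waitEniReady loop polls with no delay and no upper bound, so an interface that never becomes available keeps the function polling until Lambda stops it. One way to bound the wait is sketched below. This is an illustration rather than the sample’s method: `pollUntil` and its options are hypothetical, and the `schedule` option exists only so the delay can be swapped out.

```javascript
// Poll `check` until it reports `wanted`, giving up after `maxAttempts` tries.
// `check(cb)` must call cb(err, status). pollUntil is a hypothetical helper.
function pollUntil(check, wanted, options, done) {
  var maxAttempts = options.maxAttempts || 10;
  var delayMs = options.delayMs || 1000;
  // Assumption: schedule is injectable so callers can replace the timer.
  var schedule = options.schedule || setTimeout;
  var attempts = 0;

  function attempt() {
    attempts += 1;
    check(function (err, status) {
      if (err) { return done(err); }
      if (status === wanted) { return done(null, attempts); }
      if (attempts >= maxAttempts) {
        return done(new Error(
          'Gave up after ' + attempts + ' attempts; last status: ' + status));
      }
      // Not ready yet: wait before asking again.
      schedule(attempt, delayMs);
    });
  }

  attempt();
}
```

In the sample, `check` could wrap the describeNetworkInterfaces call and pass back the interface’s Status. Choosing maxAttempts and delayMs so the total wait stays under the function timeout lets the function report ABANDON itself instead of being cut off.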

Part 6: Subscribe the Lambda function to the SNS topic

Now that the Lambda function has been created, we need to subscribe it to the SNS topic so that it will receive the messages from the lifecycle hook.

To subscribe the Lambda function to the SNS topic, we’ll call the SNS Subscribe action, specifying the ARN for the topic and the Lambda function, with Lambda as the protocol. The command looks like this:

$ aws sns subscribe --protocol lambda \
--topic-arn arn:aws:sns:us-west-2:012345678901:ENI-Demo-Topic \
--notification-endpoint arn:aws:lambda:us-west-2:012345678901:function:ENI-Demo-Lambda-Func

Part 7: Grant permissions on the Lambda function to the SNS topic

Finally, with the function created and subscribed, we can set the permissions on it that allow the SNS topic to invoke the function. This is done with the AddPermission call to Lambda.

$ aws lambda add-permission \
--function-name ENI-Demo-Lambda-Func \
--statement-id 1 \
--action "lambda:InvokeFunction" \
--principal sns.amazonaws.com \
--source-arn arn:aws:sns:us-west-2:012345678901:ENI-Demo-Topic

Testing and Reviewing the Logs

To test, simply edit your Auto Scaling group to increase the desired size, causing an instance to be added to the group. Aside from just waiting to see the secondary ENI, you can monitor and troubleshoot in a few different ways.

From Auto Scaling, you can describe the instance status for the group. New instances will be in a pending state until the function succeeds, fails or times out. If the function succeeds, they’ll show as in service. If it fails or times out, they’ll be abandoned and replaced with a new instance (ABANDON is the hook’s default result, which the command in Part 4 leaves unchanged).

From CloudWatch, you can look at the Invocations and Errors metrics for the function (or view graphs from the Lambda web console). If the Lambda function has CloudWatch Logs access, which is included in the example policy above, it will create a log group for itself, and each function execution will create a new log stream with log events for any output from the function. The example code here is configured for detailed logging of its actions. We’ll demonstrate what that looks like here.

First, we need to find the appropriate log stream. The following command will list the log streams within the Log Group for the function.

$ aws logs describe-log-streams --log-group-name /aws/lambda/ENI-Demo-Lambda-Func

From the output, choose the execution you want to review and run the following command, using the log-stream from the previous step.

$ aws logs get-log-events --log-group-name /aws/lambda/ENI-Demo-Lambda-Func \
--log-stream-name 2015/08/11/[HEAD]0123456789abcdef0123456789abcdef \
--start-from-head

And below is a short sample of what the output looks like:

{
    "ingestionTime": 1439317268902, 
    "timestamp": 1439317254204, 
    "message": "2015-08-11T18:20:54.178Z12345678-0123-0123-0123-0123456789ab\tDEBUG: ENI status is: available\n"
}, 
{
    "ingestionTime": 1439317268902, 
    "timestamp": 1439317254584, 
    "message": "2015-08-11T18:20:54.582Z12345678-0123-0123-0123-0123456789ab\tINFO: ENI Attached.\nDetails:\n { AttachmentId: 'eni-attach-01234567' }\n"
}
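Because the sample code tags each line with INFO, DEBUG or ERROR, the log output is easy to filter once saved. The sketch below assumes the complete JSON output of get-log-events, whose top-level `events` array holds objects like the ones above; `filterLogEvents` is a hypothetical helper and the sample data is made up.

```javascript
// Return the messages of log events that contain the given level tag
// (for example "ERROR"), oldest first. filterLogEvents is a hypothetical helper.
function filterLogEvents(output, level) {
  return (output.events || [])
    .filter(function (e) { return e.message.indexOf(level + ':') !== -1; })
    .sort(function (a, b) { return a.timestamp - b.timestamp; })
    .map(function (e) { return e.message.trim(); });
}

// Made-up input shaped like get-log-events output.
const sampleOutput = {
  events: [
    { timestamp: 2, ingestionTime: 9, message: 'id\tERROR: Could not attach ENI.\n' },
    { timestamp: 1, ingestionTime: 9, message: 'id\tDEBUG: ENI status is: available\n' }
  ]
};

console.log(filterLogEvents(sampleOutput, 'ERROR'));
```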

AWS Compute Blog