AWS Partner Network (APN) Blog

Automatically Delete Terminated Instances in Chef Server with AWS Lambda and CloudWatch Events

If you’re struggling with cleaning up terminated instances registered with third-party services, you can use a combination of several AWS services to help out. A common example is dealing with the remnants of terminated instances from Auto Scaling groups. Throughout this post, I’ll talk about how you can use AWS Lambda and Amazon CloudWatch Events to automatically remove instances from Chef Server when an EC2 instance is terminated. Though I’ll discuss a specific use case around Chef Server, you can also apply this pattern to similar situations with other services.

CloudWatch Events allows us to define a rule to trigger an AWS Lambda function any time an instance enters the “terminated” state.  We can then use the Lambda function to talk to a Chef Server instance and delete that instance so we aren’t keeping track of things that no longer exist. I’ll use AWS Key Management Service (KMS) to encrypt sensitive data, and allow the Lambda function to decrypt it.

This post assumes the reader is already familiar with Chef Server, Lambda, CloudWatch Events, AWS Identity and Access Management (IAM), and KMS. The rest of the post provides a high-level discussion of using these technologies together. To accompany the post, I’ve created a working solution you can reference when building your own.

If you’re not yet a Chef Server user, be sure to check out the Chef Server Quick Start, which lets you get up and running with a free trial of Chef Server in less than an hour.

Before We Begin

The post and reference code assume that you’re using Chef Server version 12. You’ll need to create a user in Chef Server with the ability to query for and delete nodes. You’ll also need the private key (.pem file) for this user, which we will encrypt in the next section. I recommend that you create a new user for this rather than reusing an existing one.

The Lambda function described below, and in the sample code, is expecting that all nodes/instances managed by Chef have an attribute called ec2_instance_id with a value set to the EC2 instance ID (e.g., i-abcde123). Alternatives to this approach are described in the sample code, but ultimately, as long as you’re storing instance IDs somewhere in Chef Server, this approach should work.

You’ll also need to create an IAM role for your Lambda function and add the AWSLambdaBasicExecutionRole AWS managed policy to the role.  You shouldn’t need to add any other policies to the role.
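
As a rough illustration, the role setup described above could be scripted along these lines. This is a minimal sketch, not the sample code: it assumes a boto3 IAM client is passed in as iam_client, and the role name chef-node-cleanup is a hypothetical placeholder.

```python
import json

# Trust policy that lets the Lambda service assume the role.
LAMBDA_TRUST_POLICY = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# ARN of the AWS managed policy named in this post.
BASIC_EXECUTION_POLICY_ARN = (
    "arn:aws:iam::aws:policy/service-role/AWSLambdaBasicExecutionRole"
)


def create_lambda_role(iam_client, role_name="chef-node-cleanup"):
    """Create the role and attach the basic execution policy.

    iam_client is assumed to be boto3.client("iam"); role_name is a
    hypothetical placeholder, not a name used by the sample code.
    """
    response = iam_client.create_role(
        RoleName=role_name,
        AssumeRolePolicyDocument=json.dumps(LAMBDA_TRUST_POLICY),
    )
    iam_client.attach_role_policy(
        RoleName=role_name,
        PolicyArn=BASIC_EXECUTION_POLICY_ARN,
    )
    # The role ARN is what you would later add as a key user in KMS.
    return response["Role"]["Arn"]
```

Passing the client in as a parameter keeps the function easy to exercise with a stub before pointing it at a real account.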

Encrypt the Chef User’s Private Key

Chef Server uses public key cryptography to authenticate API requests. This requires the client to sign a hash of requests using a valid private key. In this example, we’ll use KMS to encrypt a copy of our Chef Server certificate (with its associated private key) and then decrypt it on the fly with the Lambda function as needed. This allows us to safely store our Chef Server credentials at rest in encrypted form without the risk of unauthorized users discovering the decryption key needed to access the credentials.

    1. Create a customer master key (CMK) in KMS and note the keyId that is automatically generated. Make sure that both the IAM user you will use to encrypt the Chef Server certificate and the Lambda role created in the previous section are added as key users in KMS. Your IAM user needs the kms:Encrypt permission to encrypt the certificate, while your Lambda function (via its IAM role) needs the kms:Decrypt permission at runtime to access the certificate.
    2. Encrypt your Chef Server certificate with the CMK you created in step 1.  For example, using the AWS CLI tools, type the command:
aws kms encrypt --key-id KEY_ID_FROM_STEP_1 --plaintext

    3. You will receive a response with a CiphertextBlob if successful. A successful response will look like this:

    "KeyId": "arn:aws:kms:us-east-1:123456789000:key/14d2aba8-

    4. Copy this CiphertextBlob into a new file and store it in the same directory as the Lambda function; this is required so it can be packaged up with the function itself. I’ve used encrypted_pem.txt as the file name in my example, given the encrypted object is a certificate and private key, which is commonly named with the .pem file extension. Note the CiphertextBlob output is base64 encoded by the AWS CLI unless you send the output to a binary file using the fileb:// parameter. See the AWS KMS CLI help for more information on input and output encoding.
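
The encryption steps above can also be sketched in Python rather than with the CLI. The function below is an illustration under stated assumptions: kms_client is assumed to be a boto3 KMS client, and the function name and the pem_path argument are hypothetical. It writes the ciphertext base64 encoded, matching the encrypted_pem.txt convention described in step 4.

```python
import base64


def encrypt_pem_to_file(kms_client, key_id, pem_path,
                        out_path="encrypted_pem.txt"):
    """Encrypt a PEM file with KMS and store the result base64 encoded.

    kms_client is assumed to be boto3.client("kms"); key_id is the CMK
    keyId noted in step 1. The function name is hypothetical.
    """
    with open(pem_path, "rb") as pem_file:
        plaintext = pem_file.read()

    # KMS Encrypt returns the ciphertext as raw bytes in "CiphertextBlob".
    response = kms_client.encrypt(KeyId=key_id, Plaintext=plaintext)

    # Base64 encode so the file holds text, as the CLI output would.
    with open(out_path, "wb") as out_file:
        out_file.write(base64.b64encode(response["CiphertextBlob"]))
    return out_path
```

The output file can then be packaged next to the Lambda function code.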

Considerations for the Lambda Function

Your Lambda function will need to authenticate with your Chef Server in order to query for and delete nodes. The Chef Server authentication documentation describes how to authenticate your client (here, the Lambda function). Note that, as of this post, the recommended Python module (PyChef) does not support Amazon Linux, which Lambda runs on, but you can apply a quick fix before uploading your Lambda function to correct this. See this comment or the sample code for more information. You can also implement your own authentication provider in the language of your choice using the Chef documentation.

To make requests to the Chef Server you’ll need the URL (e.g., https://chefserver.domain/organizations/aws), the username of a user with permissions (described above), and the private key we encrypted with KMS above. Your function should already have the required permissions to decrypt the private key if you followed the previous sections. You will need to add some code to decrypt the private key at runtime. You can find an example of decrypting data with KMS in the KMS documentation, or check out the Slack Integration Blueprints for AWS Lambda for a great example. The only other thing left to do is to query for and delete the nodes from Chef Server. I’ll discuss that next.
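
A minimal sketch of the decryption step might look like the following. It assumes kms_client is a boto3 KMS client and that encrypted_pem.txt holds the base64-encoded CiphertextBlob packaged with the function; the function name decrypt_pem is hypothetical.

```python
import base64


def decrypt_pem(kms_client, encrypted_path="encrypted_pem.txt"):
    """Return the decrypted private key as text.

    kms_client is assumed to be boto3.client("kms"). No key ID is passed:
    for a symmetric CMK, KMS identifies the key from the ciphertext.
    """
    with open(encrypted_path, "rb") as encrypted_file:
        # The file holds base64 text, so decode it back to raw bytes.
        ciphertext = base64.b64decode(encrypted_file.read())

    response = kms_client.decrypt(CiphertextBlob=ciphertext)
    return response["Plaintext"].decode("utf-8")
```

Decrypting inside the function at runtime means the plaintext key only ever exists in memory, never in the deployment package.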

The CloudWatch Events rule, which we will set up last, will give us the instance ID of the terminated instance when triggered. No other unique identifying attributes (e.g., IP addresses or FQDN) will be given, so we need to make sure that the instance ID is present within Chef’s metadata. This is why I previously recommended that you include the ec2_instance_id attribute for each Chef node, if you aren’t already storing the instance IDs elsewhere. Again, there are other approaches, such as using AWS Config, for retrieving metadata about the terminated instance to use to identify a node.

The Lambda function should look at the event object passed in, which will contain data from CloudWatch Events, and parse it to retrieve the instance ID.  In Python, this can be done with event['detail']['instance-id'].  With the instance ID, you can now make your Chef Server request to delete that node.  For example, in PyChef, you can use either the Nodes or the Search interface, as appropriate, to find the nodes and then delete them.
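
The flow described above can be sketched as follows. To stay independent of any particular Chef client, the lookup and delete operations are passed in as callables: find_nodes and delete_node are hypothetical names, and with PyChef they would presumably wrap its Search and Node interfaces. The query assumes the ec2_instance_id attribute recommended earlier.

```python
def get_instance_id(event):
    """Pull the instance ID out of an EC2 state-change event."""
    return event["detail"]["instance-id"]


def build_search_query(instance_id):
    """Build a Chef search query on the ec2_instance_id attribute."""
    return "ec2_instance_id:" + instance_id


def cleanup_terminated_instance(event, find_nodes, delete_node):
    """Delete every Chef node registered for the terminated instance.

    find_nodes(query) should return an iterable of node names and
    delete_node(name) should remove one node; both are hypothetical
    callables that you would implement with your Chef client.
    """
    instance_id = get_instance_id(event)
    node_names = list(find_nodes(build_search_query(instance_id)))
    for name in node_names:
        delete_node(name)
    return node_names


# Trimmed example of the event shape delivered by CloudWatch Events.
SAMPLE_EVENT = {
    "source": "aws.ec2",
    "detail-type": "EC2 Instance State-change Notification",
    "detail": {"instance-id": "i-abcde123", "state": "terminated"},
}
```

Returning the list of deleted node names makes it easy to log what was removed on each invocation.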

To deploy the Lambda function, you can use AWS CloudFormation or your tool of choice. I recommend using a tool like CloudFormation so you can easily deploy the function in multiple regions and automatically version your function as changes are made. The sample code uses HashiCorp’s Terraform to deploy the Lambda function and provision other infrastructure components like IAM roles and the CloudWatch Events rule, which we’ll discuss next.

CloudWatch Event Rule

To tie it all together, we need a CloudWatch Events rule that triggers the Lambda function created above whenever an instance is terminated. From the CloudWatch console:

  1. Select Events, and then choose Create rule.
  2. For event source, choose EC2 instance state change notification. Choose Specific state(s), and then Terminated.  You can leave the Any instance option selected.
  3. In the Targets section, choose Add Target, and then make sure that Lambda function is selected. Make sure the Function box has the name of the Lambda function you created above.  Your configuration should look something like this:

  4. To finish setting up the event rule, choose Configure details, and then give your rule a name and description.
  5. Finally, choose Create rule.
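
If you would rather script the rule than click through the console, the steps above correspond roughly to the sketch below. It assumes events_client is a boto3 CloudWatch Events client; the function name, description text, and target ID are hypothetical placeholders. A rule created this way likely also needs a separate permission on the Lambda function so that CloudWatch Events is allowed to invoke it, which this sketch does not cover.

```python
import json

# Matches EC2 state-change notifications for any instance that
# enters the "terminated" state.
TERMINATED_INSTANCE_PATTERN = {
    "source": ["aws.ec2"],
    "detail-type": ["EC2 Instance State-change Notification"],
    "detail": {"state": ["terminated"]},
}


def create_termination_rule(events_client, rule_name, lambda_arn):
    """Create the rule and point it at the Lambda function.

    events_client is assumed to be boto3.client("events"); the target
    ID below is a hypothetical placeholder.
    """
    rule = events_client.put_rule(
        Name=rule_name,
        EventPattern=json.dumps(TERMINATED_INSTANCE_PATTERN),
        State="ENABLED",
        Description="Remove terminated instances from Chef Server",
    )
    events_client.put_targets(
        Rule=rule_name,
        Targets=[{"Id": "chef-node-cleanup", "Arn": lambda_arn}],
    )
    return rule["RuleArn"]
```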

Now any time an instance is terminated, the Lambda function will run and delete the node from Chef Server, which keeps things neat and tidy.

Wrapping Up

By combining the forces of AWS Lambda, CloudWatch Events, and KMS, we’ve created a simple solution to keep our Chef Server organized and up to date automatically. You can also apply this process to other situations that require automatic cleanup after terminated instances. This is just another example that demonstrates the usefulness of combining AWS services to simplify maintaining your DevOps infrastructure.