AWS Cloud Operations & Migrations Blog

Distributing your AWS OpsWorks for Chef Automate infrastructure

Organizations that manage many nodes across multiple AWS Regions may wish to reduce latency and load between nodes in their AWS OpsWorks for Chef Automate implementation. Distributing nodes across multiple servers, however, introduces the challenge of ensuring that cookbooks and other configurations are deployed consistently across two or more Chef servers residing in one or more Regions. To accomplish this, customers can use several supplemental AWS services to drive the process of distributing cookbooks to one or more Chef Automate instances.

Overview

One situation large-scale Chef users might encounter is that the number of nodes managed by their OpsWorks for Chef Automate (OWCA) server exceeds its capacity. Customers can restore their most recent backup to a larger Amazon EC2 instance type, but even the largest instance type has its limits. Additionally, globally distributed environments may experience communication latency.

These issues can be overcome by distributing node management across multiple OpsWorks for Chef Automate instances. However, this approach introduces the challenge of synchronizing cookbooks across multiple OWCA instances. This can be accomplished with AWS CodeCommit, AWS CodeBuild, AWS CodePipeline, and optionally AWS Lambda. In this blog post, we show you how to scale your Chef-managed infrastructure across two Regions.

By using CodePipeline and CodeCommit, a cookbook developer simply pushes changes to a central repository. CodePipeline triggers on each commit and sends the updated repository contents to CodeBuild for processing. With simple scripting, CodeBuild pulls dependencies from Chef Supermarket and uploads the needed cookbooks to each Chef Automate instance in the account (or accounts). Because the chef-client cookbook is applied, nodes check in automatically on a preconfigured schedule. Additionally, an Invoke stage in the pipeline can use AWS Lambda and AWS Systems Manager to run chef-client on any nodes in the environment. This step is optional, but useful for testing scenarios where it is helpful to deploy changes more rapidly.

Setup

To set this up, we use an AWS CloudFormation template. We will walk through each resource to be added to the template. Before doing so, at least one OpsWorks for Chef Automate instance must be created. The starter kit for each instance contains a private key in the [Starter-Kit]/.chef/ directory. Although this key can be used for authentication from CodeBuild, we recommend that you create a separate user in the Chef Automate console and assign it a public/private key pair. Follow the instructions on Chef.io for reference. At minimum, this user account requires Committer permissions. The private key for this user can be saved in an Amazon S3 bucket in your account, which CodeBuild accesses to authenticate with the Chef Automate server during the cookbook upload process. It’s important to ensure that access to this bucket is tightly controlled with appropriate bucket policies. An alternative to consider is encrypting the keys with AWS Key Management Service (AWS KMS), for example by storing them as SecureString parameters in AWS Systems Manager Parameter Store.
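For example, a bucket policy along the following lines can restrict key downloads to the build role alone. This is a sketch, not a complete policy; the bucket name, account ID, and role name are placeholders to replace with your own values.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "DenyKeyAccessExceptBuildRole",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::[KEY_BUCKET]/*",
            "Condition": {
                "StringNotLike": {
                    "aws:PrincipalArn": "arn:aws:iam::[ACCOUNT_ID]:role/[BUILD_ROLE_NAME]"
                }
            }
        }
    ]
}

The explicit deny with a StringNotLike condition on aws:PrincipalArn blocks every principal except the named role, regardless of what their identity-based policies allow.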

Components

Parameters

This template requires only one input parameter, KeyBucket, which corresponds to the Amazon S3 bucket that contains the private keys for each Chef Automate instance to be synchronized with this cookbook repository.

{
    "Parameters": {
        "KeyBucket": {
            "Type": "String",
            "Description": "Name of S3 bucket which contains the OWCA private keys.",
            "AllowedPattern": "^[a-z0-9][a-z0-9-.]*$",
            "MinLength": 3,
            "MaxLength": 63,
            "ConstraintDescription": "Please provide a valid S3 bucket name."
        }
    }
}
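The AllowedPattern and length constraints above can be checked locally before creating the stack; a minimal sketch in Python (the function name is ours, not part of the template):

```python
import re

# Mirrors the template's AllowedPattern for the KeyBucket parameter.
BUCKET_PATTERN = re.compile(r"^[a-z0-9][a-z0-9-.]*$")

def valid_key_bucket(name):
    """Return True if name satisfies the template's constraints:
    3-63 characters, matching the allowed S3 bucket name pattern."""
    return 3 <= len(name) <= 63 and bool(BUCKET_PATTERN.match(name))
```

A name such as "owca-private-keys" passes, while an uppercase or two-character name is rejected before CloudFormation ever sees it.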

IAM Permissions

Two AWS Identity and Access Management (IAM) roles will be needed for the pipeline to function correctly.

The first IAM role, BuildRole, will be used by CodeBuild during build tasks, and will need the default permissions for CodeBuild containers. Additionally, the role will need access to the Amazon S3 bucket containing the private keys for authentication with each Chef Automate instance (referenced in the Parameters section of the template).

The second IAM role, FunctionRole, will be used by AWS Lambda to execute chef-client on each instance in the environment. In addition to the Amazon CloudWatch Logs permissions required by Lambda execution roles, this role will require the ability to send commands via AWS Systems Manager.

{
    "Resources": {
        "BuildRole": {
            "Type": "AWS::IAM::Role",
            "Properties": {
                "AssumeRolePolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [ {
                        "Effect": "Allow",
                        "Principal": {
                            "Service": [ "codebuild.amazonaws.com" ]
                        },
                        "Action": [ "sts:AssumeRole" ]
                    } ]
                },
                "Path": "/",
                "Policies": [ {
                    "PolicyName": "CodeBuildS3WithCWL",
                    "PolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [
                            {
                                "Effect": "Allow",
                                "Action": [
                                    "s3:Get*",
                                    "s3:List*"
                                ],
                                "Resource": [
                                    { "Fn::GetAtt": [ "ArtifactBucket", "Arn" ] },
                                    {
                                        "Fn::Join": [ "", [
                                            { "Fn::GetAtt": [ "ArtifactBucket", "Arn" ] },
                                            "/*"
                                        ] ]
                                    },
                                    {
                                        "Fn::Join": [ "", [
                                            "arn:aws:s3:::",
                                            { "Ref": "KeyBucket" }
                                        ] ]
                                    },
                                    {
                                        "Fn::Join": [ "", [
                                            "arn:aws:s3:::",
                                            { "Ref": "KeyBucket" },
                                            "/*"
                                        ] ]
                                    }
                                ]
                            },
                            {
                                "Effect": "Allow",
                                "Resource": [
                                    {
                                        "Fn::Join": [ "", [
                                            "arn:aws:logs:",
                                            { "Ref": "AWS::Region" },
                                            ":",
                                            { "Ref": "AWS::AccountId" },
                                            ":log-group:/aws/codebuild/*"
                                        ] ]
                                    },
                                    {
                                        "Fn::Join": [ "", [
                                            "arn:aws:logs:",
                                            { "Ref": "AWS::Region" },
                                            ":",
                                            { "Ref": "AWS::AccountId" },
                                            ":log-group:/aws/codebuild/*:*"
                                        ] ]
                                    }
                                ],
                                "Action": [
                                    "logs:CreateLogGroup",
                                    "logs:CreateLogStream",
                                    "logs:PutLogEvents"
                                ]
                            },
                            {
                                "Effect": "Allow",
                                "Resource": [ "arn:aws:s3:::codepipeline-*" ],
                                "Action": [
                                    "s3:PutObject",
                                    "s3:GetObject",
                                    "s3:GetObjectVersion"
                                ]
                            },
                            {
                                "Effect": "Allow",
                                "Action": [ "ssm:GetParameters" ],
                                "Resource": {
                                    "Fn::Join": [ "", [
                                        "arn:aws:ssm:",
                                        { "Ref": "AWS::Region" },
                                        ":",
                                        { "Ref": "AWS::AccountId" },
                                        ":parameter/CodeBuild/*"
                                    ] ]
                                }
                            }
                        ]
                    }
                } ]
            }
        },
        "FunctionRole": {
            "Type": "AWS::IAM::Role",
            "Properties": {
                "AssumeRolePolicyDocument": {
                    "Version": "2012-10-17",
                    "Statement": [ {
                        "Effect": "Allow",
                        "Principal": {
                            "Service": [ "lambda.amazonaws.com" ]
                        },
                        "Action": [ "sts:AssumeRole" ]
                    } ]
                },
                "Path": "/",
                "Policies": [ {
                    "PolicyName": "LambdaBasicExecutionWithSSM",
                    "PolicyDocument": {
                        "Version": "2012-10-17",
                        "Statement": [
                            {
                                "Effect": "Allow",
                                "Action": [
                                    "ssm:SendCommand",
                                    "ssm:GetCommandInvocation"
                                ],
                                "Resource": "*"
                            },
                            {
                                "Effect": "Allow",
                                "Action": [
                                    "logs:CreateLogGroup",
                                    "logs:CreateLogStream",
                                    "logs:PutLogEvents"
                                ],
                                "Resource": [
                                    {
                                        "Fn::Join": [ "", [
                                            "arn:aws:logs:",
                                            { "Ref": "AWS::Region" },
                                            ":",
                                            { "Ref": "AWS::AccountId" },
                                            ":log-group:/aws/lambda/*"
                                        ] ]
                                    },
                                    {
                                        "Fn::Join": [ "", [
                                            "arn:aws:logs:",
                                            { "Ref": "AWS::Region" },
                                            ":",
                                            { "Ref": "AWS::AccountId" },
                                            ":log-group:/aws/lambda/*:*"
                                        ] ]
                                    }
                                ]
                            },
                            {
                                "Effect": "Allow",
                                "Action": [
                                    "codepipeline:PutJobSuccessResult",
                                    "codepipeline:PutJobFailureResult"
                                ],
                                "Resource": "*"
                            }
                        ]
                    }
                } ]
            }
        }
    }
}

Amazon S3 bucket

The ArtifactBucket resource is created for CodePipeline to store artifacts, and is referenced later when the pipeline itself is created.

{
    "Resources": {
        "ArtifactBucket": {
            "Type": "AWS::S3::Bucket"
        }
    }
}

CodeCommit repository

The CodeCommit repository being created will act as the Chef Repo to store cookbooks. The repository structure should adhere to the following format:

 .
 ├── .chef
 │   ├── knife.rb
 │   ├── ca_certs
 │   │   └── opsworks-cm-ca-2016-root.pem
 │   └── [CHEF_SERVER_NAME]
 │       └── config.yml
 ├── Berksfile
 ├── buildspec.yml
 └── cookbooks/

{
    "Resources": {
        "Repo": {
            "Type": "AWS::CodeCommit::Repository",
            "Properties": {
                "RepositoryDescription": "Cookbook repository for multiple region OWCA deployment.",
                "RepositoryName": "owca-multi-region-repo"
            }
        }
    }
}
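Before the first push to this repository, the layout above can be verified locally; a quick sketch (the helper name is hypothetical):

```python
import os

# Entries the build process expects at the repository root.
REQUIRED_ENTRIES = [
    "Berksfile",
    "buildspec.yml",
    os.path.join(".chef", "knife.rb"),
    os.path.join(".chef", "ca_certs", "opsworks-cm-ca-2016-root.pem"),
]

def missing_entries(repo_root):
    """Return the expected repository entries absent under repo_root."""
    return [entry for entry in REQUIRED_ENTRIES
            if not os.path.exists(os.path.join(repo_root, entry))]
```

An empty result means the repository skeleton is in place; each Chef Automate instance additionally needs its own .chef/[CHEF_SERVER_NAME]/config.yml, described next.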

.chef

The .chef/ directory structure is slightly different from what is normally included in the OpsWorks for Chef Automate starter kit. The modifications that follow allow the knife utility to use environment variables to determine which Chef Automate instance to communicate with and which certificate file to use.

The knife.rb file below uses the yaml gem to parse configuration data from config.yml. The correct configuration file is determined based on the value of the CHEF_SERVER environment variable.

require 'yaml'
 
CHEF_SERVER = ENV['CHEF_SERVER'] || "NONE"
current_dir = File.dirname(__FILE__)
base_dir = File.join(File.dirname(File.expand_path(__FILE__)), '..')
env_config = YAML.load_file("#{current_dir}/#{CHEF_SERVER}/config.yml")

log_level                :info
log_location             STDOUT
node_name                'pivotal'
client_key               "#{current_dir}/#{CHEF_SERVER}/private.pem"
syntax_check_cache_path  File.join(base_dir, '.chef', 'syntax_check_cache')
cookbook_path            [File.join(base_dir, 'cookbooks')]

chef_server_url          env_config["server"]
ssl_ca_file              File.join(base_dir, '.chef', 'ca_certs', 'opsworks-cm-ca-2016-root.pem')
trusted_certs_dir        File.join(base_dir, '.chef', 'ca_certs')

To determine the correct Chef Automate instance to communicate with, each instance should have its own directory underneath .chef/, named after the instance. Within this directory, the config.yml file must follow this format:

server: 'https://[SERVER_FQDN]/organizations/default'

Note that each Chef Automate instance directory does not contain the private key needed to communicate with the server. We do not recommend committing authentication information such as SSL/API keys or passwords to source control systems. Instead, steps have been added to the build process to copy the needed keys from Amazon S3.

Berksfile

Within the Berksfile, any cookbooks contained in the cookbooks/ directory must be referenced in the format that follows. This will indicate to Berkshelf that the cookbook can be found within the local repository.

# Local Cookbooks
cookbook '[COOKBOOK_NAME]', path: 'cookbooks/[COOKBOOK_NAME]'

Cookbooks that are imported from Chef Supermarket can be included as normal.

source 'https://supermarket.chef.io'
 
# Supermarket Cookbooks
cookbook '[COOKBOOK_NAME]'

buildspec.yml

The buildspec.yml file provides instructions to CodeBuild for downloading dependencies and uploading the cookbooks to each Chef Automate instance. To use the berks command, ChefDK must be installed during the build process. In this example, the installation package is downloaded from Chef.io. Alternatively, ChefDK can be packaged and installed ahead of time in a custom build environment to reduce build times. As shown in the following example, these are the major steps of the build process:

  1. Download and install ChefDK.
  2. Copy the private keys to authenticate to two Chef Automate instances from S3.
  3. Run berks install to download and install any dependencies.
  4. Upload cookbooks to both Chef Automate instances.

version: 0.2
 
phases:
  install:
    commands:
      - "wget https://packages.chef.io/files/stable/chefdk/1.5.0/ubuntu/14.04/chefdk_1.5.0-1_amd64.deb"
      - "dpkg -i ./chefdk_1.5.0-1_amd64.deb"
  build:
    commands:
      - "aws s3 cp s3://[KEY_BUCKET]/[CHEF_SERVER_1]/private.pem ./.chef/[CHEF_SERVER_1]/private.pem"
      - "aws s3 cp s3://[KEY_BUCKET]/[CHEF_SERVER_2]/private.pem ./.chef/[CHEF_SERVER_2]/private.pem"
      - "CHEF_SERVER=[CHEF_SERVER_1] berks install"
      - "CHEF_SERVER=[CHEF_SERVER_2] berks install"
      - "CHEF_SERVER=[CHEF_SERVER_1] berks upload --no-ssl-verify"
      - "CHEF_SERVER=[CHEF_SERVER_2] berks upload --no-ssl-verify"
  post_build:
    commands:
      - "echo 'Complete'"

This build specification will be ingested by CodeBuild during the build stage of the pipeline.

{
    "Resources": {
        "BuildProject": {
            "Type": "AWS::CodeBuild::Project",
            "Properties": {
                "Artifacts": { "Type": "CODEPIPELINE" },
                "Description": "Installs cookbook dependencies from Berksfile and uploads to one or more OWCA servers.",
                "Environment": {
                    "ComputeType": "BUILD_GENERAL1_SMALL",
                    "Image": "aws/codebuild/ubuntu-base:14.04",
                    "Type": "LINUX_CONTAINER"
                },
                "ServiceRole": {
                    "Fn::GetAtt": [ "BuildRole", "Arn" ]
                },
                "Source": { "Type": "CODEPIPELINE" }
            }
        }
    }
}

Lambda function

The Lambda function, ChefClientFunction, uses AWS Systems Manager via Boto3 to call sudo chef-client on a list of instances provided in the function code. The example below uses two instance IDs passed in a list. However, Boto3 can be leveraged further to generate lists of nodes by tags or other relevant properties. This is left as an exercise for the reader.
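As a starting point for that exercise, send_command also accepts a Targets filter that selects any SSM-managed nodes carrying a given tag, avoiding hard-coded instance IDs altogether. A sketch, with the tag key and helper names being ours:

```python
def chef_client_targets(tag_key, tag_values):
    """Build an SSM Targets filter that selects managed nodes by tag."""
    return [{"Key": "tag:" + tag_key, "Values": list(tag_values)}]

def run_chef_client_by_tag(tag_key, tag_values):
    # Requires boto3 credentials; nodes must be registered with SSM.
    import boto3
    ssm = boto3.client("ssm")
    return ssm.send_command(
        Targets=chef_client_targets(tag_key, tag_values),
        DocumentName="AWS-RunShellScript",
        Comment="chef-client",
        Parameters={"commands": ["sudo chef-client"]},
    )
```

With this approach, newly launched nodes carrying the tag are picked up automatically on the next pipeline run, with no change to the function code.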

{
    "Resources": {
        "ChefClientFunction": {
            "Type": "AWS::Lambda::Function",
            "Properties": {
                "Code": {
                    "ZipFile": {
                        "Fn::Join": [ "\n", [
                            "from __future__ import print_function",
                            "",
                            "import boto3",
                            "",
                            "ssm = boto3.client('ssm')",
                            "code_pipeline = boto3.client('codepipeline')",
                            "",
                            "def lambda_handler(event,context):",
                            "    job_id = event['CodePipeline.job']['id']",
                            "",
                            "    try:",
                            "        response = ssm.send_command(",
                            "            InstanceIds=[",
                            "                '[INSTANCE_ID_1]',",
                            "                '[INSTANCE_ID_2]'",
                            "            ],",
                            "            DocumentName='AWS-RunShellScript',",
                            "            Comment='chef-client',",
                            "            Parameters={",
                            "                'commands': ['sudo chef-client']",
                            "            }",
                            "        )",
                            "",
                            "        command_id = response['Command']['CommandId']",
                            "        print('SSM Command ID: ' + command_id)",
                            "        print('Command Status: ' + response['Command']['Status'])",
                            "",
                            "        # Include monitoring of job success/failure as needed.",
                            "",
                            "        code_pipeline.put_job_success_result(jobId=job_id)",
                            "",
                            "        return",
                            "    except Exception as e:",
                            "        print(e)",
                            "",
                            "        code_pipeline.put_job_failure_result(jobId=job_id, failureDetails={'message': e.message, 'type': 'JobFailed'})",
                            "",
                            "        raise e"
                        ] ]
                    }
                },
                "Description": "Executes chef-client on specified nodes.",
                "Handler": "index.lambda_handler",
                "Role": {
                    "Fn::GetAtt": [ "FunctionRole", "Arn" ]
                },
                "Runtime": "python2.7",
                "Timeout": "10"
            }
        }
    }
}
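When invoked from the pipeline's Invoke stage, CodePipeline delivers the job ID under the CodePipeline.job key of the event, as the handler above expects. For local testing, that extraction can be exercised with a stub event (the ID below is a placeholder):

```python
def extract_job_id(event):
    """Pull the CodePipeline job ID from an Invoke-stage event,
    as the Lambda handler does before reporting success or failure."""
    return event["CodePipeline.job"]["id"]

# Stub event mimicking what CodePipeline sends to the function.
stub_event = {"CodePipeline.job": {"id": "00000000-0000-0000-0000-000000000000"}}
```

Reporting back with put_job_success_result or put_job_failure_result requires this ID; if the function never calls either, the pipeline stage hangs until it times out.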

Pipeline

Lastly, the pipeline itself ties each of these components together, deploying a cookbook to both Chef Automate servers simultaneously.

{
    "Resources": {
        "OWCAPipeline": {
            "Type": "AWS::CodePipeline::Pipeline",
            "Properties": {
                "ArtifactStore": {
                    "Location": { "Ref": "ArtifactBucket" },
                    "Type": "S3"
                },
                "RoleArn": {
                    "Fn::Join": [ "", [
                        "arn:aws:iam::",
                        { "Ref": "AWS::AccountId" },
                        ":role/AWS-CodePipeline-Service"
                    ] ]
                },
                "Stages": [
                    {
                        "Actions": [
                            {
                                "ActionTypeId": {
                                    "Category": "Source",
                                    "Owner": "AWS",
                                    "Provider": "CodeCommit",
                                    "Version": "1"
                                },
                                "Configuration": {
                                    "BranchName": "master",
                                    "RepositoryName": {
                                        "Fn::GetAtt": [ "Repo", "Name" ]
                                    }
                                },
                                "Name": "ChefRepo",
                                "OutputArtifacts": [
                                    { "Name": "Cookbooks" }
                                ]
                            }
                        ],
                        "Name": "Source"
                    },
                    {
                        "Actions": [
                            {
                               "ActionTypeId": {
                                    "Category": "Build",
                                    "Owner": "AWS",
                                    "Provider": "CodeBuild",
                                    "Version": "1"
                                },
                                "Configuration": {
                                    "ProjectName": { "Ref": "BuildProject" }
                                },
                                "InputArtifacts": [
                                    { "Name": "Cookbooks" }
                                ],
                                "Name": "Berkshelf"
                            }
                        ],
                        "Name": "Build"
                    },
                    {
                        "Actions": [
                            {
                                "ActionTypeId": {
                                    "Category": "Invoke",
                                    "Owner": "AWS",
                                    "Provider": "Lambda",
                                    "Version": "1"
                                },
                                "Configuration": {
                                    "FunctionName": { "Ref": "ChefClientFunction" }
                                },
                                "Name": "LambdaChefClient"
                            }
                        ],
                        "Name": "Invoke"
                    }
                ]
            }
        }
    }
}

Summary

By distributing nodes across multiple OpsWorks for Chef Automate instances, the average workload per server is reduced, allowing management of considerably more nodes as your infrastructure grows. Using several AWS services, it is a simple task to ensure consistency and ease of management of cookbooks across servers, Regions, and even AWS accounts.

About the Author

Nick Alteen is a Lab Development Engineer at Amazon Web Services. In his role, he enjoys process automation and configuration management. He also supports customers with best practices and onboarding to OpsWorks services.