How to Export EC2 Instance Execution Logs to an S3 Bucket Using CloudWatch Logs, Lambda, and CloudFormation

This blog post was updated on December 6, 2023. The AWS CloudFormation template now uses Python 3.11 instead of Python 2.7, and minor changes were made to the AWS Lambda function to accommodate the Python version change.

“We want to get execution logs from our EC2 instances into S3,” my customer said. “Then we can store them and process them later, for optimization, audit, and security review, and so on. We’d like to do it in our CloudFormation stacks, as that’s our execution standard. Can you help us?”

This blog post shows you how to build a solution for this problem. We’ll build it using Amazon CloudWatch Logs, AWS Lambda, and some useful capabilities in AWS CloudFormation for customizing EC2 instances.

How it works

To export the logs, we add some components to the CloudFormation stack that builds the EC2 instance. The following diagram and code samples show how this solution works in a stand-alone fashion. Later, we’ll discuss other ways to integrate these components into your production infrastructure.

Architecture diagram

We use a CloudFormation stack (1) to create the components shown. When everything is up and running, we have an EC2 instance running a CloudWatch Logs agent (2). The agent routes the configured logs to a CloudWatch Logs log group (3). A Lambda function (4) that’s subscribed to the log group picks up each log and writes it to an existing Amazon S3 bucket (5). Note that there’s some delay from the time a log message is created on the EC2 instance to the time it appears in the S3 bucket.

To configure the EC2 instance, we use a neat feature in CloudFormation, the CloudFormation helper scripts. These helper scripts can be combined to install and update a variety of software packages, configure them, start services, and more. We’ll show you how in this blog post.

If you’d like to skip ahead and see the code in action, go to “Running the solution.”

The implementation

The implementation consists of the following four files, which we’ll discuss later:

1. Cwexport-master-template.yaml: This template creates a security group and IAM role for our EC2 instance, and calls two nested CloudFormation templates to do the real work.

2. Cloudwatchlogsexport.yaml: This template creates the CloudWatch log group the logs will be sent to, and defines the Lambda function that will perform the export from the log group to S3. It then creates a CloudWatch Log subscription to automatically send the CloudWatch log streams to the Lambda function.

3. Cloudwatch-log-lambda.zip: This zip file contains the code for the Lambda function, packaged along with its prerequisites.

4. Run-ec2-instance.yaml: This template creates the EC2 instance, installs the CloudWatch Log Agent, configures it to export the desired logs, and performs a specified task on startup (in this case, calculating digits of Pi).

In practice we first set up the CloudWatch log group and export to Amazon S3, and then set up and configure the EC2 instance.

Exporting logs from CloudWatch Logs to S3

In Cloudwatchlogsexport.yaml, we first set up the CloudWatch Logs log group itself ("AWS::Logs::LogGroup").

Next, we define the Lambda function that will perform the actual export. We’ve packaged our Python code into a Lambda deployment package for uploading and deployment by CloudFormation. The function (cloudwatch-log-lambda.py) requires two environment variables, s3BucketName and s3KeyPrefix, which tell it where to export the log files. We specify these variables as inputs to the master CloudFormation template, which passes them to the Lambda function at execution time.

The Lambda function receives logs from CloudWatch. For each invocation, it unzips the received log data, parses it from JSON into a dictionary, then writes it out to S3 using our naming convention (<s3KeyPrefix>/<logGroup>/<logStream>/<timestamp>). You can see the source code of the Lambda here.
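The flow described above can be sketched roughly as follows. This is a minimal illustration, not the packaged function itself: the helper names decode_subscription_event and build_s3_key are hypothetical, and the choice of the first log event's timestamp for the key is an assumption. It relies on the documented shape of a CloudWatch Logs subscription event, in which event["awslogs"]["data"] holds a base64-encoded, gzip-compressed JSON document.

```python
import base64
import gzip
import json
import os


def decode_subscription_event(event):
    """Return the CloudWatch Logs payload carried by a subscription event as a dict."""
    compressed = base64.b64decode(event["awslogs"]["data"])
    return json.loads(gzip.decompress(compressed))


def build_s3_key(prefix, payload):
    """Build a key of the form <s3KeyPrefix>/<logGroup>/<logStream>/<timestamp>."""
    # Assumption: use the timestamp (epoch milliseconds) of the first event in the batch.
    timestamp = payload["logEvents"][0]["timestamp"]
    parts = [
        prefix.strip("/"),
        payload["logGroup"].strip("/"),
        payload["logStream"].strip("/"),
        str(timestamp),
    ]
    return "/".join(parts)


def lambda_handler(event, context):
    # Imported here so the helpers above can be exercised without the AWS SDK installed.
    import boto3

    payload = decode_subscription_event(event)
    if payload.get("messageType") != "DATA_MESSAGE":
        return None  # skip control messages, which carry no application logs

    key = build_s3_key(os.environ["s3KeyPrefix"], payload)
    boto3.client("s3").put_object(
        Bucket=os.environ["s3BucketName"],
        Key=key,
        Body=json.dumps(payload).encode("utf-8"),
    )
    return key
```

Keeping the decoding and key-building steps in separate functions makes them easy to check locally before deploying the handler.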

Then, we associate the Lambda function and the CloudWatch log group via an “AWS::Logs::SubscriptionFilter” resource, which specifies the Lambda function and the log group it’s subscribing to. This subscription triggers the Lambda function when new logs are written to the CloudWatch log group, and passes the new logs to it.

Customizing your EC2 instance

The last CloudFormation template, Run-ec2-instance.yaml, starts our EC2 instance. We use CloudFormation tags to customize the properties of the EC2 instance: specifying the EC2 instance type, AMI, the IAM role, VPC, and so on.

In our case, we also want to automatically install some software packages (notably, the CloudWatch Logs agent); configure some files; and then start some services – all before the actual processing specified in our UserData section is triggered. We can do so by using some capabilities that CloudFormation provides.

First, we use an AWS::CloudFormation::Init tag in the metadata section of our EC2 definition to define customizations. Then we call a set of provided CloudFormation helper scripts from our EC2 instance’s UserData to implement the definitions before we move on to doing the work we’ve set up the EC2 instance to do. We describe the tag and definitions next.

The AWS::CloudFormation::Init tag

We use the AWS::CloudFormation::Init type to include metadata for our Amazon EC2 instance. The metadata can later be accessed by the helper scripts. When we execute the helper scripts from our EC2 instance’s UserData, the script looks for resource metadata specified in the AWS::CloudFormation::Init metadata key and uses that information to customize the instance. Here’s the beginning of our definition:

  EC2Instance:
    Type: 'AWS::EC2::Instance'
    Metadata:
      'AWS::CloudFormation::Init':
        configSets:
          default:
            [config]
        config:
           …

The CloudFormation Init tag supports a wide variety of capabilities, which we encourage you to explore. For this EC2 instance we’ll use three: packages, files, and services.

Using CloudFormation::Init to install software

We can use the packages tag to download and install pre-packaged applications and components. For this blog post, we want to install one yum package: awslogs, the AWS CloudWatch Logs agent. Because the software is installed by the cfn-init helper script, we’re limited to the package formats it supports. Currently, these include apt, msi (on Windows), python, rpm, rubygems, and yum. We specify the package manager, then each package’s name, and a (possibly empty) list of versions. Because we aren’t sensitive to the version here, we’ll leave the version list empty. In this case, cfn-init installs the latest version if the package isn’t already installed, or leaves it at the current version if the package is installed.

        config:
          packages:
            yum:
              awslogs: []
          ...

Configuring files

Next, we want to use CloudFormation to create or modify the CloudWatch Log agent configuration files on the EC2 instance. We can do this by using the “files” key of our AWS::CloudFormation::Init. There are several options for creating the files. Here, we include the desired content for the files directly in the CloudFormation template. We create two CloudWatch agent configuration files and two CloudFormation helper configuration files:

/etc/awslogs/awscli.conf: This short example shows how an entire file is specified within this tag. The file will contain the default Region, and a plugin setting:

          files:
            '/etc/awslogs/awscli.conf':
              content: !Sub |
                [default]
                region = ${AWS::Region}
                [plugins]
                cwlogs = cwlogs
              mode: '000644'
              owner: root
              group: root

/etc/awslogs/awslogs.conf: We specify the local files we want to send to CloudWatch Logs, along with the log stream name to send them to, and other characteristics. The list being exported includes the CloudFormation execution logs. We encourage you to extend the list to include logs for your application or other logs of interest.

We’ve configured each selected EC2 instance local log file to go to a log stream that consists of the stack name, followed by the EC2 instance ID, followed by the name of the file. This way, the files associated with one instance are grouped together and are easily findable in CloudWatch. This naming convention can easily be changed.
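To make the naming convention concrete, here is a small sketch that renders an awslogs.conf with one section per exported file. It is an illustration under assumptions, not the template's actual content: the function name render_awslogs_conf, the "/" separator between the stack name, instance ID, and file name, and the state_file path are all assumed. The {instance_id} placeholder is one of the variables the CloudWatch Logs agent expands in log_stream_name.

```python
import configparser
import io


def render_awslogs_conf(stack_name, log_group, log_files):
    """Render an awslogs.conf with one section per local log file to export."""
    # interpolation=None keeps any '%' characters in values literal.
    conf = configparser.ConfigParser(interpolation=None)
    # Assumption: a typical agent state file location on Amazon Linux.
    conf["general"] = {"state_file": "/var/lib/awslogs/agent-state"}
    for path in log_files:
        file_name = path.rsplit("/", 1)[-1]
        conf[path] = {
            "file": path,
            "log_group_name": log_group,
            # {instance_id} stays a literal placeholder for the agent to expand.
            "log_stream_name": f"{stack_name}/{{instance_id}}/{file_name}",
        }
    out = io.StringIO()
    conf.write(out)
    return out.getvalue()


if __name__ == "__main__":
    print(render_awslogs_conf("my-stack", "my-log-group", ["/var/log/cfn-init.log"]))
```

Changing the naming convention then amounts to changing the single log_stream_name format string.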

The CloudWatch Logs agent lets us specify the size of the batches, the time stamp formats, the encoding, and more. By default, the agent uses gzip HTTP content encoding to send compressed payloads to CloudWatch Logs. This decreases CPU usage, lowers NetworkOut, and decreases put latency. Our Lambda function unzips the data it receives before writing it out.

/etc/cfn/cfn-hup.conf and /etc/cfn/hooks.d/cfn-auto-reloader.conf: These two configuration files are used to configure the cfn-hup daemon. This daemon detects changes in the EC2 resource metadata and runs user-specified actions when a change is detected. This allows you to make configuration updates on your running Amazon EC2 instances through the UpdateStack API action.

Starting and restarting services

The third section in the metadata that we specify is the services key. This key defines which services should be enabled or disabled when the instance is launched. The services key also allows you to specify dependencies on sources, packages and files. If a restart is needed after files have been installed, cfn-init will take care of the service restart. In our example, we use this key to restart two services after we’ve created the config files for awslogs and cfn-hup.

          services:
            sysvinit:
              awslogs:
                enabled: true
                ensureRunning: true
                packages:
                  yum:
                  - awslogs
                files:
                - '/etc/awslogs/awslogs.conf'
                - '/etc/awslogs/awscli.conf'
              cfn-hup:
                enabled: true
                ensureRunning: true
                files:
                - '/etc/cfn/cfn-hup.conf'
                - '/etc/cfn/hooks.d/cfn-auto-reloader.conf'

Finally: Executing the definitions

Having specified the packages to install, config files to set up, and services to be started, we need to ensure that the cfn-init helper script is run to implement these specifications. In the EC2 instance’s UserData section, we first update the helper scripts and ensure that other yum packages are up-to-date. Then, we execute cfn-init, passing it the resource and CloudFormation stack to use as parameters.

Lastly, we call the cfn-signal helper script to let CloudFormation know that our EC2 instance is up and ready. We pass it the exit code of cfn-init, the name of the stack, the Region, and the resource we’re sending a signal for. This signal causes CloudFormation to switch the EC2 instance to “Complete”.

  UserData:
    'Fn::Base64': !Sub |
      #!/bin/bash -x
      # Use the line below to ensure the CloudFormation helper scripts are updated to the latest version
      # Per: http://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/cfn-helper-scripts-reference.html
      yum install -y aws-cfn-bootstrap
      yum update -y
      /opt/aws/bin/cfn-init -v --stack ${AWS::StackName} --resource EC2Instance --configsets default --region ${AWS::Region}
      /opt/aws/bin/cfn-signal -e $? --stack ${AWS::StackName} --region ${AWS::Region} --resource=EC2Instance

Now, we’re ready to actually perform the work of this EC2 instance. In our example, it’s a dummy job that calculates digits of Pi. We’re actually more interested in the log files created.

Running the solution

To see this solution in operation in us-east-1, choose the “Launch Stack” button below.

Choose “Next”, then update the parameters shown below for your environment: TheLogsBucketName, VPCId, VPC Subnet and s3BucketForResults (leave the Template Locations as is). Use S3 buckets that are in us-east-1.

Click “Next” twice, acknowledge that AWS CloudFormation might create IAM resources with custom names, and click “Create”. Then wait for the CloudFormation master stack and its two nested stacks to reach a status of “CREATE_COMPLETE”.

After our EC2 instance is up and running, the files we specified in our CloudWatch Logs agent configuration are exported to CloudWatch. Here’s an example of our log group, as seen in the CloudWatch Logs console:

The Lambda function receives each log from CloudWatch and unzips it. It then writes the event file out to Amazon S3, giving it an S3 key that consists of the given S3 key prefix, followed by the log group, the log stream name (i.e., the filename), and a timestamp.

If the logs don’t appear in Amazon S3 as expected, check the Lambda function’s CloudWatch Logs log group for execution errors. The CloudWatch logs for the Lambda function can be found in a log group named: /aws/lambda/<mainstackname>-CWExportStack-CWEc2LogsLambdaFunction-<id>.
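If you prefer to locate that log group programmatically, something like the following sketch may help. It assumes the naming pattern given above holds for your stack. The function names are hypothetical, and the code reads only the first page of results from the CloudWatch Logs describe_log_groups call.

```python
def lambda_log_group_prefix(main_stack_name):
    """Return the expected log group name prefix for the export Lambda function."""
    return f"/aws/lambda/{main_stack_name}-CWExportStack-CWEc2LogsLambdaFunction-"


def find_lambda_log_groups(main_stack_name, logs_client=None):
    """List log groups whose names start with the expected prefix."""
    if logs_client is None:
        # Imported lazily so the prefix helper works without the AWS SDK installed.
        import boto3

        logs_client = boto3.client("logs")
    response = logs_client.describe_log_groups(
        logGroupNamePrefix=lambda_log_group_prefix(main_stack_name)
    )
    # Only the first page of results is read; enough for a single demo stack.
    return [group["logGroupName"] for group in response["logGroups"]]
```

Passing the client in as a parameter lets you substitute a stub when trying the function without AWS credentials.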

The power of change sets

Earlier, we configured the cfn-hup daemon, which detects changes in resource metadata and runs user-specified actions when a change is detected. As noted, this lets us update the configuration of a running EC2 instance through the UpdateStack API action.

Let’s demonstrate the power of the cfn-hup configuration that’s in our CloudFormation stack. In our files definition, we provided the following configuration:

'/etc/cfn/hooks.d/cfn-auto-reloader.conf':  
    content: !Sub |
        [cfn-auto-reloader-hook]
        triggers=post.update
        path=Resources.EC2Instance.Metadata.AWS::CloudFormation::Init
        action=/opt/aws/bin/cfn-init --verbose --stack=${AWS::StackName} --region=${AWS::Region} --resource=EC2Instance
        runas=root

This configuration tells cfn-hup to re-run cfn-init after the stack has been updated.

In our case, we’ll make two changes to our EC2 instance. First, we’ll add an additional software package to install: jq. Imagine that there’s a part of our software stack that’s only invoked rarely, and we’re either missing a package or wish to upgrade a version in our running system. Second, we’ll add an additional log file export, cloud-init.log, to our CloudWatch Logs configuration.

These changes have been made for you, in another CloudFormation template: run-ec2-instance-changed.yaml. To execute this change through the console, do the following steps.

1. In the CloudFormation console, select your nested “EC2InstanceStack”.
2. Choose Actions -> Update Stack.
3. You’ll receive a warning that “Performing operations directly on a nested stack may result in an unstable state where it is out-of-sync with the root stack to which it belongs.” But we’ll be very careful here, so select the option “Yes, Update.”
4. Next, select the revised template. Enter the following URL: https://aws-bigdata-blog.s3.amazonaws.com/artifacts/cloudwatchlogsexport2s3/run-ec2-instance-changed.yaml, and choose Next.
5. Click through the next two pages that show the existing template parameters. We won’t be modifying any of them.
6. On the next page, CloudFormation shows the results of analyzing the differences between the new CloudFormation template and the currently running one. The following screenshot shows the differences. We are modifying the EC2 instance only. The changes we’ve made result in “Replacement: False”, telling us that the existing EC2 instance will be modified in place. The other possible values are “True,” which means the resource will be replaced, and “Conditional,” which means the resource property will be dynamically evaluated at execution time to determine whether it requires replacement.

7. Choose “Update”.
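The same Replacement information is available through the API. The following sketch assumes you have created a change set for the nested stack instead of updating through the console. The function names are hypothetical, while the response fields (Changes, ResourceChange, LogicalResourceId, Replacement) follow the CloudFormation DescribeChangeSet response.

```python
def summarize_replacements(change_set):
    """Map each changed logical resource ID to its Replacement value."""
    summary = {}
    for change in change_set.get("Changes", []):
        resource = change["ResourceChange"]
        # Replacement ("True", "False", or "Conditional") is reported for Modify actions.
        summary[resource["LogicalResourceId"]] = resource.get("Replacement", "N/A")
    return summary


def fetch_change_set(stack_name, change_set_name):
    """Fetch a change set description; requires boto3 and AWS credentials."""
    import boto3

    return boto3.client("cloudformation").describe_change_set(
        StackName=stack_name, ChangeSetName=change_set_name
    )
```

A script could call summarize_replacements(fetch_change_set(...)) and refuse to proceed if any resource reports "True", which is one way to guard against unintentionally replacing a running instance.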

Then, in the CloudFormation console, you can see the update actions as they occur, and their results, as in the screenshot below.

After waiting a little while for the changes and logs to propagate, you can check the CloudWatch log group. You’ll see a new log stream there, for the cloud-init.log definition we added.

In our CloudWatch log group (or in S3), you can review cfn-hup.log and see the moment it’s notified of the CloudFormation template change, and starts the cfn-auto-reloader-hook action.

Then, in cfn-init.log, you can see the installation of the jq software package, and the rewriting of the other files specified in our CloudFormation template.

Now we’ve successfully updated our running EC2 instance, with additional software and changed configurations. All through our CloudFormation stack. And, we know that the CloudFormation template we now have reflects our running environment. Sweet!

Remember to delete the CloudFormation stack when you’re done, to stop incurring fees. If you delete the stack right away, testing the solution will cost only a few cents.

Adapting for production use

To implement this solution, copy the files from s3://aws-bigdata-blog/artifacts/cloudwatchlogsexport2s3/, and modify them as needed for your environment.

This implementation demonstrates useful capabilities. However, you should consider modifying some details for production use.

In cloudwatchlogsexport.yaml, we’ve created a new log group. For demo purposes, the log group is deleted when the CloudFormation stack is removed, so we’ve left the line “#DeletionPolicy: Retain” commented out at the end of the definition. Because we’re writing the logs to S3, we don’t need to retain the log group. However, you may choose to retain the group for some time after the run before expiring it, for additional validation. Also, because there is a delay in writing out the logs, if you delete the CloudFormation stack as soon as the EC2 instance is terminated, you might remove the Lambda function before the last logs have made it to the S3 bucket. We recommend that you wait a few minutes before deleting the CloudFormation stack.

You can also choose to use one log group for many CloudFormation stacks and EC2 instance executions. Using that approach, you would create the CloudWatch Log group and Lambda function once (run Cloudwatchlogsexport.yaml alone), and leave them in place for the long term. As each EC2 instance gets created (run-ec2-instance.yaml), pass it the name of the log group to use. Since we are using the instance ID in the log stream ID, the different executions will still be clearly identified in CloudWatch and in Amazon S3.

Conclusion

By integrating the CloudWatch Logs agent into your CloudFormation stack, you can easily export your EC2 instance logs to an S3 bucket. After the logs are in S3, you have a myriad of additional options. You can make the EC2 instance logs part of your data lake. You can process them using analytics tools, such as Amazon QuickSight. You can set up S3 lifecycle rules to automatically archive them to Amazon Glacier for audit purposes. Because all the components are automated using a CloudFormation stack, you can ensure that the logs are being exported and stored consistently. For example, you can integrate this solution into requests to provision infrastructure, to ensure that audit trails for all such requests are created in a consistent fashion.
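As one illustration of the lifecycle option, the sketch below builds a rule that transitions exported logs under a key prefix to the GLACIER storage class. The rule ID, the 30-day default, and the function names are assumptions chosen for the example. The request shape follows the S3 put_bucket_lifecycle_configuration API in boto3.

```python
def glacier_lifecycle_configuration(key_prefix, days_before_archive=30):
    """Build a lifecycle configuration that archives objects under key_prefix."""
    return {
        "Rules": [
            {
                "ID": "archive-exported-ec2-logs",  # hypothetical rule name
                "Filter": {"Prefix": key_prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": days_before_archive, "StorageClass": "GLACIER"}
                ],
            }
        ]
    }


def apply_lifecycle(bucket_name, key_prefix, days_before_archive=30):
    """Apply the configuration to a bucket; requires boto3 and AWS credentials."""
    import boto3

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket_name,
        LifecycleConfiguration=glacier_lifecycle_configuration(
            key_prefix, days_before_archive
        ),
    )
```

Note that this call replaces any lifecycle configuration already on the bucket, so merge existing rules first if the bucket has them.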

The same CloudFormation components can be used to install, configure, and customize many other combinations of software. You’re limited only by your imagination. We hope this blog post inspires you to apply these techniques to use cases beyond the ones we’ve described here.

About the Author

Veronika Megler, PhD, is a Senior Consultant, Big Data, Analytics & Data Science, for AWS Professional Services. She enjoys helping customers adopt new technologies to solve new problems and to solve old problems more efficiently and effectively. In her spare time she is passionate about conservation, travel to interesting, beautiful or historic places, expanding her knowledge of arcane subjects, and searching for ultimate expression in Argentine tango.

AWS Cloud Operations & Migrations Blog