AWS Database Blog

Optimize AWS KMS decryption costs for Database Activity Streams

In regulated industries like healthcare and finance, auditing database activity is a top priority. Companies need to record the actions performed by database users and administrators to maintain compliance and security.

AWS offers robust auditing for databases through Database Activity Streams (DAS). Integrated with Amazon Relational Database Service (Amazon RDS) and Amazon Aurora, DAS produces details of all actions performed on your databases, including SQL commands, connection details, and changes to the database schema (refer to Monitoring database activity streams for more information). DAS is integrated with AWS Key Management Service (AWS KMS), which allows the data flowing through the activity stream to be encrypted and decrypted. However, frequent decryption calls can lead to unpredictable costs as your database workloads scale and the volume of audit data grows.

In this post, we describe a potential solution that helps you optimize the costs associated with the auditing functionality of AWS managed databases. You can reduce the number of AWS KMS calls required for decrypting activity logs and minimize the risk of exceeding call rate limits by implementing a caching strategy for encryption keys.

Solution overview

The following diagram shows the high-level architecture of the solution.

The solution uses the following AWS services and features:

  • Amazon Aurora or Amazon RDS with Database Activity Streams enabled
  • Amazon Kinesis Data Streams, which receives the encrypted activity payloads
  • AWS Lambda, which processes the records and decrypts the payloads
  • AWS KMS, which encrypts the activity stream and decrypts the data encryption key
  • Amazon CloudWatch and AWS CloudTrail, which provide the logs used for validation

The solution presented in this post implements logic to cache the data encryption key used to decrypt DAS payloads. By caching the data key, the number of API calls to AWS KMS can be reduced.

Security considerations

Caching encryption keys can significantly reduce cost and optimize your deployment. However, it’s crucial to maintain a robust security posture to prevent unauthorized access to sensitive data. The following are key security best practices to consider:

  • Apply least-privilege IAM policies so only the record processor can decrypt the data key.
  • Set a timeout on cached keys so plaintext data keys don’t persist in memory longer than needed.
  • Monitor decryption activity with AWS CloudTrail to detect unexpected access patterns.

The workflow consists of three phases: auditing, caching, and validation. The steps are as follows:

  1. A user performs changes on the database.
  2. The DB cluster or instance sends encrypted activity payloads (audit log) to Kinesis Data Streams.
  3. A Lambda function acts as a record processor. It calls the kms:Decrypt API to decrypt the data encryption key (DEK) created when the stream was launched. Then, using the AWS Encryption SDK, the function decrypts the activity payload with the decrypted DEK. The function caches the data key for subsequent decryptions until either the function timeout is reached or the cache limit is met.
  4. After the payload is decrypted, the Lambda function can export the logs for further analysis to destinations such as Amazon S3, Amazon OpenSearch Service, and third-party monitoring tools.
  5. The security and operation teams can validate the solution using CloudWatch Logs and AWS CloudTrail to determine the potential savings:
    • CloudWatch shows the Lambda function logs, which include the invocation time and runtime events.
    • CloudTrail records the number of AWS KMS calls.

Prerequisites

DAS has several requirements and limitations when implemented with Amazon Aurora PostgreSQL-Compatible Edition; refer to Monitoring database activity streams for the current list.

Create a KMS key for the database activity stream

The database activity stream requires a KMS key to encrypt and decrypt the logged database activity. For more information, see Creating keys. Complete the following steps to create your KMS key:

  1. On the AWS KMS console, choose Customer managed keys in the navigation pane.
  2. Choose Create key.
  3. For Key type, select Symmetric. For more information about symmetric encryption, refer to Symmetric encryption KMS keys.
  4. For Key usage, select Encrypt and decrypt.
  5. Choose Next.
  6. For Alias, enter PostgreSQL-DAS as the key name. You can modify the alias based on your database engine.
  7. For Description, enter an optional description, such as Key for Amazon Aurora PostgreSQL Database Activity Streaming (DAS).
  8. Choose Next.
  9. Under Define key administrative permissions, choose the IAM users and roles allowed to administer the key.
  10. Choose Next.
  11. Review the key policy and choose Finish.
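
If you prefer to script this step, the following Boto3 sketch creates an equivalent symmetric key. This is a minimal example; the Region, alias, and description are assumptions that mirror the console walkthrough above.

import boto3

kms = boto3.client("kms", region_name="us-east-1")  # assumed Region

# Create a symmetric encryption key for the activity stream
response = kms.create_key(
    Description="Key for Amazon Aurora PostgreSQL Database Activity Streaming (DAS)",
    KeyUsage="ENCRYPT_DECRYPT",
    KeySpec="SYMMETRIC_DEFAULT",
)
key_id = response["KeyMetadata"]["KeyId"]

# Attach the alias used in the console steps
kms.create_alias(AliasName="alias/PostgreSQL-DAS", TargetKeyId=key_id)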

When you create a KMS key, you can specify the key policy for the new KMS key. If you don’t provide one, AWS KMS creates one for you. The default key policy that AWS KMS uses differs depending on whether you create the key in the AWS KMS console or you use the AWS KMS API.

Refer to Permissions for AWS services in key policies.

Start the database activity stream

DAS provides a near-real-time stream of the activity in your database cluster. Refer to How database activity streams work for more information.

To start the database activity stream on Amazon Aurora PostgreSQL, complete the following steps:

  1. On the Amazon RDS console, in the navigation pane, choose Databases.
  2. Select the Aurora cluster on which you want to start an activity stream.
  3. On the Actions menu, choose Start database activity stream.
  4. For AWS KMS key, choose the KMS key you created earlier.
  5. For Database activity stream mode, select synchronous or asynchronous mode based on your requirements.
  6. In the Scheduling section, select Immediately.
  7. Choose Start database activity stream.
  8. On the Kinesis Data Streams console, wait for the status of the data stream to show as Active.
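
Alternatively, you can start the stream programmatically. The following Boto3 sketch starts an asynchronous activity stream with the key created earlier; the cluster ARN is a placeholder for your own.

import boto3

rds = boto3.client("rds", region_name="us-east-1")

# Start an asynchronous activity stream on the Aurora cluster
response = rds.start_activity_stream(
    ResourceArn="arn:aws:rds:YOUR_REGION:YOUR_ACCOUNT_ID:cluster:YOUR_CLUSTER_ID",
    Mode="async",
    KmsKeyId="alias/PostgreSQL-DAS",
    ApplyImmediately=True,
)
print(response["KinesisStreamName"])  # the data stream that DAS writes to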

Create the Lambda execution role

The function’s purpose is to act as a record processor, reading data from the Kinesis data stream. It caches the data key until the configured cache thresholds or the Lambda runtime limit are reached.

Before you launch the Lambda function, you need to create a Lambda execution role with permissions to read records from the Kinesis data stream, write the function’s logs to CloudWatch, and decrypt the data key using the AWS KMS Decrypt API.

Complete the following steps:

  1. On the IAM console, create a new role and name it das-data-processor-execution-role. If the default key policy wasn’t used, you need to add the role as a principal in the KMS key policy. Refer to Permissions for AWS services in key policies.
  2. Add the required permissions. Refer to the following JSON policy as an example.

IAM policies vary based on your setup and security requirements. Refer to Applying the principles of least privilege for best practices.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "kms:Decrypt"
            ],
            "Resource": "arn:aws:kms:YOUR_REGION:YOUR_ACCOUNT_ID:key/YOUR_KEY_ID"
        },
        {
            "Effect": "Allow",
            "Action": [
                "kinesis:DescribeStream",
                "kinesis:GetRecords",
                "kinesis:GetShardIterator"
            ],
            "Resource": "arn:aws:kinesis:YOUR_REGION:YOUR_ACCOUNT_ID:stream/aws-rds-das-cluster-XXXXXXXXXXXXXXXXXXXXXXXXXXXXXI"
        },
        {
            "Effect": "Allow",
            "Action": [
                "logs:CreateLogGroup",
                "logs:CreateLogStream",
                "logs:PutLogEvents"
            ],
            "Resource": "arn:aws:logs:YOUR_REGION:YOUR_ACCOUNT_ID:*"
        }
    ]
}

Change the ARNs to match the resources you created in the previous sections.
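
If you prefer to create the role from code, the following sketch is a minimal example, assuming the permissions policy above is saved locally as policy.json:

import json

import boto3

iam = boto3.client("iam")

# Trust policy that allows Lambda to assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "lambda.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

iam.create_role(
    RoleName="das-data-processor-execution-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)

# Attach the permissions policy shown above as an inline policy
with open("policy.json") as f:
    iam.put_role_policy(
        RoleName="das-data-processor-execution-role",
        PolicyName="das-data-processor-permissions",
        PolicyDocument=f.read(),
    )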

Create the Lambda function

The following is the sample Lambda function using the AWS SDK for Python (Boto3). Complete the following steps to create the Lambda function:

  1. On the Lambda console, in the navigation pane, choose Functions.
  2. Choose Create function.
  3. Select Use a blueprint and choose Process records sent to a Kinesis stream.
  4. For Function name, enter das-data-processor.
  5. For Existing role, choose the role you created in the previous section.
  6. Choose Create function.
  7. Update the function code in the Code source section and deploy the new version. For more details, refer to Lambda console.
  8. Set the function timeout, memory allocation, and other parameters according to your specific needs. See AWS Lambda Pricing and Configuring function timeout (console) for more information.

You can find the function code on the AWS Samples GitHub repo.

Within the code, change the following values based on your specific setup. Follow best practices for using environment variables in Lambda. See Securing environment variables for more information.

REGION_NAME = "us-east-1" # Your Region
RESOURCE_ID = "cluster-XXXXXXXXXXXXXXXYYYYYYYZZZZ"  # Your Aurora Cluster ID
STREAM_NAME = "KINESISSTREAMNAME"  # Your Kinesis Data Stream name

Within the code, the following class manages the cache. In particular, the getDecrypted method checks for the key in the cache before calling the AWS KMS Decrypt API.

class KMSDataKeyCache():

    def __init__(self, session):
        # Initialize the KMS client and a simple dictionary for caching keys
        self.kms_client = session.client('kms', region_name=REGION_NAME)
        self.key_cache = {}

    def getDecrypted(self, data_key_decoded):
        # Attempt to retrieve the decrypted key from cache or decrypt it using KMS
        if data_key_decoded in self.key_cache:
            return self.key_cache[data_key_decoded]
        else:
            # Decrypt the key using KMS and store it in the cache
            data_key_decrypt_result = self.kms_client.decrypt(
                CiphertextBlob=data_key_decoded,
                EncryptionContext={'aws:rds:dbc-id': RESOURCE_ID})
            self.key_cache[data_key_decoded] = data_key_decrypt_result['Plaintext']
            return data_key_decrypt_result['Plaintext']
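
To show how the class fits into the record processor, the following is a minimal handler sketch, not the full sample. The payload fields key (the encrypted data key) and databaseActivityEvents (the encrypted events) follow the DAS record format; the AWS Encryption SDK step is left as a comment and implemented fully in the AWS Samples repo.

import base64
import json

import boto3

# Module-scope cache so the execution environment can reuse it across invocations
session = boto3.session.Session()
key_cache = KMSDataKeyCache(session)

def lambda_handler(event, context):
    for record in event['Records']:
        # Each Kinesis record carries a JSON payload with the encrypted
        # activity events and the encrypted data key
        payload = json.loads(base64.b64decode(record['kinesis']['data']))
        data_key_decoded = base64.b64decode(payload['key'])
        plaintext_key = key_cache.getDecrypted(data_key_decoded)
        # Decrypt payload['databaseActivityEvents'] with the AWS Encryption SDK
        # using plaintext_key (see the AWS Samples repo for the full logic)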

While implementing caching might help save costs and enhance performance, make sure to analyze the security trade-offs. Cached plaintext data keys in memory could be accessed by anyone with runtime access to the Lambda function. Consider adding extra logic to set a timeout on cached keys (one possible approach follows), and restrict Lambda function access with least-privilege policies. Refer to the security considerations section for more information.
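
As one way to implement the timeout idea, the following hypothetical extension of KMSDataKeyCache evicts cached plaintext keys after a configurable TTL. The 3600-second default is an assumption that mirrors the hourly data key rotation discussed in the findings; tune it to your needs.

import time

class TTLKMSDataKeyCache(KMSDataKeyCache):

    def __init__(self, session, ttl_seconds=3600):
        super().__init__(session)
        self.ttl_seconds = ttl_seconds
        self.inserted_at = {}

    def getDecrypted(self, data_key_decoded):
        # Evict the cached entry if it is older than the TTL
        inserted = self.inserted_at.get(data_key_decoded)
        if inserted is not None and time.time() - inserted > self.ttl_seconds:
            del self.key_cache[data_key_decoded]
            del self.inserted_at[data_key_decoded]
        plaintext = super().getDecrypted(data_key_decoded)
        # Record the insertion time for new cache entries
        self.inserted_at.setdefault(data_key_decoded, time.time())
        return plaintext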

Create a Kinesis trigger

Complete the following steps to create a trigger for your Lambda function:

  1. On the Lambda console, navigate to the configuration page of the Lambda function you created.
  2. Choose Add trigger.
  3. Specify the Kinesis data stream related to the database activity stream.
  4. The batch size determines how many records your Lambda function can process per invocation. Change the batch size from the default value to 10,000 so each invocation processes more records, potentially reducing the need for higher concurrency.
  5. Set the batch window, which allows Lambda to wait up to 300 seconds to build a batch before invoking the function. This helps reduce the number of invocations (a scripted version of this trigger follows this list).
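
If you create the trigger from code instead of the console, a minimal Boto3 sketch might look like the following; the stream ARN is a placeholder.

import boto3

lambda_client = boto3.client("lambda", region_name="us-east-1")

# Create the Kinesis trigger with a large batch size and a 300-second batch window
lambda_client.create_event_source_mapping(
    EventSourceArn="arn:aws:kinesis:YOUR_REGION:YOUR_ACCOUNT_ID:stream/YOUR_DAS_STREAM",
    FunctionName="das-data-processor",
    StartingPosition="LATEST",
    BatchSize=10000,
    MaximumBatchingWindowInSeconds=300,
)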

Refer to Quotas and limits and Batching behavior for more information.

Finding the optimal balance between the Kinesis data stream capacity, batch size, and batch window is crucial. In our scenario, the goal is to process the maximum number of records per function invocation to achieve the most cost-effective solution. However, the outcome will vary based on factors such as processing frequency, transaction volume, DB cluster size, and the number of payloads.

For more information about best practices, refer to Best practices for consuming Amazon Kinesis Data Streams using Lambda and Using AWS Lambda with Amazon Kinesis.

Function libraries

When working on the code, make sure you have initialized the necessary Python libraries and packaged them as a .zip file with the correct version. In this case, we used the Python 3.10 runtime, packaged the libraries into a Lambda layer to reduce the size of the function package, and attached the layer to the function we created.

Refer to Working with Lambda layers, Adding layers to functions, and Lambda runtimes for more information.

Validate function logs

After you create the Lambda function and enable the Kinesis trigger, record processing begins and invocations should appear in CloudWatch. Validate the function logs in CloudWatch to determine whether the function successfully processes the decrypted payloads.

Within the payload, an encrypted string represents one or more activity events as a base64 byte array. When you decrypt the string, the result is a record in JSON format. In our case, we see successfully decrypted payloads.

Let’s create a new table called employees in the database and validate the decrypted activity string in the CloudWatch logs. Refer to Connecting to a DB instance running the PostgreSQL database engine for more information.

From the CloudWatch log event view, filter by the table name. You should be able to see the activity string with the details of the operation performed. You can access the function logs from the Monitor tab or directly from the CloudWatch log group’s view. See Using Amazon CloudWatch logs with AWS Lambda for more information.

The following event log shows that the payload has been decrypted, including the audit of the operation performed during the table creation process.

Refer to databaseActivityEventList JSON array for more information.
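
If you prefer to run the table-name check from code rather than the console, the following sketch is a minimal example, assuming the default log group name for the function:

import boto3

logs = boto3.client("logs", region_name="us-east-1")

# Search the decrypted activity events for the new table name
response = logs.filter_log_events(
    logGroupName="/aws/lambda/das-data-processor",
    filterPattern="employees",
)
for log_event in response["events"]:
    print(log_event["message"])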

Test the solution

In this section, we validate the records being processed using CloudWatch and CloudTrail to determine the frequency of AWS KMS decryption calls.

We need to analyze the cache hit ratio. First, we look at CloudWatch metrics to calculate the invocation frequency, then we look at the calls performed to the decryption API to determine if the function runtime coincides with the decrypt operation. For more information, refer to Logging AWS KMS API calls with AWS CloudTrail.

  1. On the Lambda console, navigate to the function.
  2. On the Monitor tab, choose Logs.
  3. Validate the invocation time and the duration in milliseconds, which gives you the maximum runtime of the function.

In the following example, with our data stream capacity configured as provisioned with 8 shards, we experience 8 concurrent Lambda invocations. This is expected because, by default, Lambda runs one concurrent consumer per Kinesis shard.

See Polling and batching streams for more information.

The average function execution time is 5 minutes based on the findings below.

Function logs in Amazon CloudWatch

  4. Navigate to the CloudTrail console and filter by the Decrypt operation. Typically, you can filter by user name, which is the Lambda function name.
  5. Validate the decrypt operations performed between 13:45 and 14:00 (UTC+01:00), as well as the subsequent calls.

Based on this filter, the timing of the Decrypt API calls matches the function runtime. This indicates that the function successfully retains the data key during execution. A fresh API call to retrieve the key is only necessary if the function experiences a timeout or if the cache timeout for the KMS data key is reached.

Refer to Querying AWS CloudTrail logs for more information.
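
The same validation can be scripted. The following sketch is a minimal example that counts recent Decrypt events in CloudTrail and filters them by the function’s user name:

import boto3

cloudtrail = boto3.client("cloudtrail", region_name="us-east-1")

# Look up recent kms:Decrypt calls recorded by CloudTrail
events = cloudtrail.lookup_events(
    LookupAttributes=[
        {"AttributeKey": "EventName", "AttributeValue": "Decrypt"},
    ],
    MaxResults=50,
)
decrypt_calls = [
    e for e in events["Events"] if e.get("Username") == "das-data-processor"
]
print(f"Decrypt calls by the function: {len(decrypt_calls)}")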

Findings

Before caching, every Kinesis data stream payload resulted in a decrypt operation call to AWS KMS. For customers with large volumes of data in addition to strict compliance and regulatory requirements, the solution was not attractive from a cost perspective.

From the encryption perspective, Amazon Aurora retains a single KMS data encryption key per database instance for up to six hours. However, as long as the KMS key is accessible, each data encryption key is rotated every hour. Given this, the cluster size and data volume might influence the data key caching strategy.

An Amazon Aurora cluster with two DB instances means we need to cache two individual encryption keys during the hour before key rotation. The data key caching operation is performed per function invocation, and the Lambda function concurrently processes records from both DB instances. This means the cache hit ratio depends on the function execution time, the number of records processed, and whether the cache timeout is reached.

Sample calculation

Let’s assume that Amazon Aurora, through Database Activity Streams, emits 1,000 records per minute. If the records are evenly distributed across the available Kinesis shards, each of the 8 shards in our setup processes 125 records per minute. See Enable shard-level metrics to determine the exact number of records per shard.

Cache hits: The number of times the Lambda function used a cached decryption key instead of making a new Decrypt API call to AWS KMS.

Cache misses: The number of times the Lambda function had to call the AWS KMS Decrypt API.

Total number of payloads: The total number of records processed by the Lambda function during the time frame in question.

Cache hit ratio

Over a 5-minute window, the function processes 5,000 records. Subtracting the 8 initial cache misses (one per shard) gives 5,000 − 8 = 4,992 cache hits, for a cache hit ratio of 4,992 / 5,000 = 99.84%.
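
As a quick sanity check, a few lines of Python reproduce the arithmetic using the assumptions above:

records_per_minute = 1000
shards = 8
window_minutes = 5

total_payloads = records_per_minute * window_minutes  # 5,000 records
cache_misses = shards                                 # one initial Decrypt call per shard
cache_hits = total_payloads - cache_misses            # 4,992 hits

hit_ratio = cache_hits / total_payloads
print(f"Cache hit ratio: {hit_ratio:.2%}")            # 99.84%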

This calculation is simplified. The actual behavior and efficiency gains from this solution vary based on several factors, such as the Lambda runtime settings, DB cluster size, Kinesis stream capacity, the volume of records and the cache timeout settings.

Refer to AWS Key Management Service Pricing and Data Key caching details for more information.

Clean up

When you’re done testing the solution, make sure you delete the resources you created to avoid unexpected charges:

  1. On the Amazon RDS console, stop the database activity stream. This automatically deletes the associated Kinesis data stream.
  2. Delete the KMS key created for the stream.
  3. On the Lambda console, delete the function.

Note that keeping the function will not incur any additional costs if it doesn’t run.

Conclusion

In this post, we showed how to optimize costs related to your database auditing solution with Aurora or Amazon RDS managed databases. The proposed solution allows highly regulated customers to analyze Aurora database activity streams while implementing security best practices and keeping encryption costs under control.

The following are some of the benefits achieved with this solution:

  • Improved performance by avoiding repeated AWS KMS requests
  • Reduced cost by lowering the number of AWS KMS charges
  • Reduced risk of exceeding AWS KMS service limits as application usage scales

You can expand the solution by incorporating AWS or third-party observability services for deeper visualization and correlation. For more details, refer to How to use AWS Security Hub and Amazon OpenSearch Service for SIEM and Using the Centralized Logging with OpenSearch console.

For any questions or suggestions about this post, leave a comment.


About the Authors

Yeferson Bernal is a Solutions Architect based in Amsterdam. In his role, he works closely with independent software vendors in the Benelux region, providing best practices for architecting scalable solutions on AWS. Yeferson brings a wealth of experience in cloud computing, having worked with startups, partners, and various integrators across Latin America, Canada, and Europe.

Mattia Berlusconi is a Specialist Data Solutions Architect based in Milan, Italy. In his role, he helps customers of any size pursue their business objectives along their cloud journey. He works with customers’ architects to identify the best solution for them, with a specific focus on the most innovative options for getting value out of their data. Mattia brings years of field experience in migrating and innovating databases and data platforms to help customers speed up their cloud adoption.

Sumeet Patel is an AWS software development engineer based in Canada. In his role he builds software solutions at scale using AWS and has been delivering and refining features for RDS Database Activity Streams over the past few years.