Solving federal log retention requirements with AWS account-level subscription filters

Login.gov is a shared technology service within the General Services Administration (GSA) Technology Transformation Services (TTS) that provides authentication and identity verification capabilities to government agencies. As the public’s “one account for government,” Login.gov gives the public the opportunity to use a single account to securely access services across participating government websites. The Login.gov team implemented a robust long-term log retention system that solved multiple architectural challenges while using Amazon Web Services (AWS) account-level subscription filters to provide capabilities that other approaches couldn’t match.

As a federal authentication platform, Login.gov needed to preserve all production logs long-term and establish a single source of truth. The original concept was to maintain “hot logs” retained for a short period in Amazon CloudWatch while immediately archiving to Amazon Simple Storage Service (Amazon S3) for long-term storage. This approach eventually evolved into a larger data warehouse initiative involving AWS services such as Amazon Redshift, but the immediate priority was establishing reliable log archival storage. The requirements extended beyond simple storage—the Login.gov team needed to retain production logs for 10 years to meet federal compliance mandates, capture 100% of logs without any data loss, manage logs across multiple accounts and Regions, maintain existing security integrations, and support automatic log capture for new resources. This project was also a prerequisite for a multi-Region recoverability initiative because the GSA security team required this logging infrastructure to be in place before creating replicas of infrastructure in other Regions to ensure immediate capture of log groups for various resources such as Amazon Aurora databases.

Evaluating solutions

We evaluated multiple approaches, beginning with Amazon Data Firehose, a common recommendation in AWS documentation for log archival to Amazon S3. However, our requirements demanded guaranteed delivery with zero data loss and granular control over S3 partitioning, so that logs are organized by account, Region, log group, and log stream rather than written as consolidated files.

The second approach evaluated was individual CloudWatch subscription filters, which offered more control but raised two concerns. We needed a solution that could coexist with existing subscription filters for security operations center (SOC) integration, and we wanted automatic coverage for any new log groups created in the account. A concrete example involved a log group that already had a subscription filter scanning for decrypt actions on password digest or personally identifiable information (PII) encryption keys. That filter monitored for potentially malicious activity involving customer data and sent alerts accordingly. In production, this log group already had two filters: one for our own CloudWatch alerting and one sending to the SOC’s Elastic stack for log processing. Because CloudWatch Logs allows at most two log-group-level subscription filters per log group, adding a third for archival was not an option. Account-level subscription filters provided a viable path forward because they work alongside these existing filters.

Architecture

The architecture uses several AWS services in a specific configuration. In the source account, we implemented an account-level subscription filter that operates Regionally, using a policy name format of account_ID-region for metadata preservation. The destination account contained an Amazon CloudWatch Logs destination, a single data stream using Amazon Kinesis Data Streams in a selected Region, an AWS Lambda function for processing, and Amazon S3 for long-term storage.

Understanding the cross-account relationship between components is critical. The account-level subscription filter policy is created in the source account, whereas the CloudWatch Logs destination is created in the destination account in the same Region. The CloudWatch Logs destination and Kinesis data stream must be in the same account but can be in different Regions. This meant that a single Kinesis data stream in the selected Region could receive data from destinations in another Region because destinations can send data across Regions within the same account. As a result, we could create one account-level subscription filter policy in each source account where we wanted to enable logging, and one matching CloudWatch Logs destination per source account in the log archive account, while only needing one Kinesis data stream and one Lambda function to do all the processing.

The data flows as follows:

Account-level subscription filters capture all log groups automatically and send them to CloudWatch destinations in the log archive account.
These destinations forward logs to a central Kinesis data stream.
The Kinesis data stream triggers a Lambda function that processes batches of records, extracts metadata from the policy name, and writes organized logs to Amazon S3 in a hierarchical structure.

Implementation details

The account-level subscription filter is implemented using the aws_cloudwatch_log_account_policy resource. The key insight is using the policy name to pass account and Region metadata through the processing pipeline. The following configuration is based on a Terraform documentation pattern for account-level subscription filters. For more information about LogGroupName, refer to put-account-policy:

resource "aws_cloudwatch_log_account_policy" "subscription_filter" {
  policy_name = "${var.account_id}-${var.region}"  # Format used for metadata extraction
  policy_type = "SUBSCRIPTION_FILTER_POLICY"
  
  policy_document = jsonencode({
    DestinationArn = var.cloudwatch_destination_arn
    FilterPattern  = ""  # Empty pattern captures all logs
    Distribution   = "ByLogStream"
  })

  # Exclude Lambda processor log group to prevent infinite loop.
  # length() is used because comparing a list variable to [] is unreliable in Terraform.
  selection_criteria = length(var.excluded_log_groups) > 0 ? (
    "LogGroupName NOT IN ${jsonencode(var.excluded_log_groups)}"
  ) : null
}

Kinesis data stream configuration

The Kinesis data stream is configured with a 24-hour retention period, providing a sufficient buffer for any throttling or delay issues to be resolved. For high-volume log ingestion, enhanced fan-out is required to match the performance level demanded by the data flow. The stream also uses on-demand sharding, automatically expanding capacity as needed without manual intervention.

Lambda function configuration

The Lambda function configuration proved crucial for reliable processing. We developed the Lambda code with our security engineer, and it performs a straightforward function: it decodes incoming records, extracts partition keys and sequence numbers for naming S3 objects, filters out control messages that CloudWatch sends to verify that the subscription destination is reachable, and processes data messages using the account_ID-region policy name to partition logs appropriately in Amazon S3. The following Lambda trigger configuration is based on AWS documentation with example values:

# Lambda event source mapping for Kinesis trigger
# Reference: https://registry.terraform.io/providers/hashicorp/aws/latest/docs/resources/lambda_event_source_mapping

resource "aws_lambda_event_source_mapping" "kinesis_trigger" {
  event_source_arn               = aws_kinesis_stream.log_stream.arn
  function_name                  = aws_lambda_function.log_processor.arn
  starting_position              = "TRIM_HORIZON" # Start from the oldest record still retained in the stream
  batch_size                     = 8              # Records per batch
  parallelization_factor         = 10             # Concurrent batches per shard
  maximum_record_age_in_seconds  = -1             # No age limit - process all records
  maximum_retry_attempts         = -1             # Unlimited retries - never drop data
  bisect_batch_on_function_error = true           # Split a failing batch to isolate bad records
}

The Lambda function for processing CloudWatch logs from Kinesis Data Streams follows this pattern. For more information, refer to Log group-level subscription filters:

import base64
import gzip
import json

import boto3

s3_client = boto3.client('s3')
DESTINATION_BUCKET = 'log-archive-bucket'

def lambda_handler(event, context):
    for record in event['Records']:
        # Decode the Kinesis record. CloudWatch Logs delivers subscription
        # data as base64-encoded, gzip-compressed JSON.
        payload = gzip.decompress(base64.b64decode(record['kinesis']['data']))
        data = json.loads(payload)

        # Skip control messages (subscription validation)
        if data.get('messageType') == 'CONTROL_MESSAGE':
            continue

        # Process data messages only
        if data.get('messageType') == 'DATA_MESSAGE':
            # Extract account and region from policy name
            # Policy name format: "account_id-region"
            # A missing or malformed name raises, so the batch is retried, not dropped
            filter_name = data.get('subscriptionFilters', [''])[0]
            account_id, region = filter_name.split('-', 1)

            # Extract log metadata; drop the leading "/" so the key has
            # exactly one separator between Region and log group
            log_group = data.get('logGroup', '').lstrip('/')
            log_stream = data.get('logStream')

            # Use partition key and sequence for unique object naming
            partition_key = record['kinesis']['partitionKey']
            sequence = record['kinesis']['sequenceNumber']

            # Build hierarchical S3 path
            s3_key = f"cloudwatch-logs/{account_id}/{region}/{log_group}/{log_stream}/{partition_key}-{sequence}.json"

            # Write newline-delimited JSON to S3
            log_events = '\n'.join(json.dumps(e) for e in data['logEvents'])
            s3_client.put_object(
                Bucket=DESTINATION_BUCKET,
                Key=s3_key,
                Body=log_events
            )
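
For local testing, it can help to build a synthetic event shaped like the records CloudWatch Logs delivers through Kinesis: JSON that is gzip-compressed and then base64-encoded in the record's data field. The following is a minimal sketch under that assumption. The helper names make_kinesis_record and decode_kinesis_record and all sample values are hypothetical and not part of the Login.gov code:

```python
import base64
import gzip
import json


def make_kinesis_record(payload: dict, partition_key: str, sequence: str) -> dict:
    """Wrap a CloudWatch Logs payload the way it arrives in a Kinesis record."""
    compressed = gzip.compress(json.dumps(payload).encode("utf-8"))
    return {
        "kinesis": {
            "data": base64.b64encode(compressed).decode("ascii"),
            "partitionKey": partition_key,
            "sequenceNumber": sequence,
        }
    }


def decode_kinesis_record(record: dict) -> dict:
    """Reverse the envelope: base64-decode, gunzip, then parse the JSON."""
    raw = base64.b64decode(record["kinesis"]["data"])
    return json.loads(gzip.decompress(raw))


# Hypothetical sample values for illustration only
sample_payload = {
    "messageType": "DATA_MESSAGE",
    "logGroup": "/aws/lambda/my-function",
    "logStream": "2024/01/15/[$LATEST]abc",
    "subscriptionFilters": ["123456789012-us-west-2"],
    "logEvents": [{"id": "1", "timestamp": 1705276800000, "message": "hello"}],
}

event = {"Records": [make_kinesis_record(sample_payload, "partkey1", "seq001")]}
decoded = decode_kinesis_record(event["Records"][0])
print(decoded["logGroup"], len(decoded["logEvents"]))
```

An event built this way can be passed to a handler in a unit test, with the S3 client replaced by a stub, to check the object key and body before deploying.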

One essential implementation detail is that the Lambda function’s own log group must be excluded from the account-level filter. Without this exclusion, a feedback loop occurs: the function’s execution logs are captured by the filter, which triggers another execution, which creates more logs, and the cycle never ends. AWS documentation specifically recommends this exclusion for any Lambda function processing CloudWatch logs. The Terraform implementation must set the selection_criteria argument so that it lists the excluded log groups and prevents this scenario.
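
The same exclusion string can be assembled outside Terraform, for example in a deployment script. The following is a rough sketch that assumes the LogGroupName NOT IN [...] syntax used in the Terraform configuration earlier. The function name build_selection_criteria and the log group name are hypothetical:

```python
import json


def build_selection_criteria(excluded_log_groups):
    """Return a selection criteria string, or None when nothing is excluded."""
    if not excluded_log_groups:
        return None
    # json.dumps renders the names as a double-quoted, bracketed list
    return f"LogGroupName NOT IN {json.dumps(list(excluded_log_groups))}"


# Hypothetical name for the processor function's own log group
criteria = build_selection_criteria(["/aws/lambda/log-processor"])
print(criteria)
```

Returning None for an empty list mirrors the Terraform conditional, which omits the argument entirely when no exclusions are configured.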

Amazon S3 organization

Our Amazon S3 organization follows a hierarchical structure, organizing logs by account, Region, log group, and log stream. Each S3 object contains 1–30 log messages in newline-delimited JSON format, providing efficient storage while remaining easy for machines to process.

The structure in Amazon S3 resembles the following:

s3://log-archive-bucket/cloudwatch-logs/
  123456789012/                     # Account ID
    us-west-2/                      # Region
      aws/lambda/my-function/       # Log group (/aws/lambda/my-function)
        2024/01/15/[$LATEST]abc/    # Log stream (Lambda stream names include the date)
          partkey1-seq001.json      # 1-30 log messages each
          partkey1-seq002.json
          ...
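
Downstream tools can read these objects back one line at a time. The following is a minimal sketch of parsing one object body, assuming each line is a single JSON-encoded log event with the standard CloudWatch Logs id, timestamp, and message fields. The function name parse_ndjson and the sample data are hypothetical:

```python
import json


def parse_ndjson(body: bytes) -> list:
    """Parse a newline-delimited JSON object body into a list of log events."""
    text = body.decode("utf-8")
    # Skip blank lines so a trailing newline does not cause a parse error
    return [json.loads(line) for line in text.split("\n") if line.strip()]


# Hypothetical object body containing two log events
sample_body = (
    b'{"id": "1", "timestamp": 1705276800000, "message": "START"}\n'
    b'{"id": "2", "timestamp": 1705276800500, "message": "END"}'
)

events = parse_ndjson(sample_body)
print(len(events), events[-1]["message"])
```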

Key discovery: The power of policy names

A key discovery in our implementation was the importance of the account-level subscription filter’s policy name. By structuring the policy name as account_ID-region, we could pass essential metadata through the processing pipeline for proper Amazon S3 partitioning. This was necessary because Kinesis data streams don’t inherently carry information about which account or Region the data originated from. The filter name became our mechanism for preserving this critical metadata throughout the processing pipeline.
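
The parsing step behind this convention can be sketched as follows. AWS account IDs are 12 digits with no hyphens, whereas Region names contain hyphens, so splitting on the first hyphen only is enough to recover both parts. The function name parse_policy_name and the validation rules are assumptions for illustration:

```python
def parse_policy_name(policy_name: str) -> tuple:
    """Split an "account_id-region" policy name into (account_id, region)."""
    # Split on the first hyphen only, because Region names contain hyphens
    account_id, sep, region = policy_name.partition("-")
    if not sep or len(account_id) != 12 or not account_id.isdigit() or not region:
        raise ValueError(f"unexpected policy name format: {policy_name!r}")
    return account_id, region


account_id, region = parse_policy_name("123456789012-us-west-2")
print(account_id, region)
```

Rejecting malformed names early, rather than writing to an unexpected prefix, keeps the S3 hierarchy predictable.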

Results

During testing in sandbox environments, we demonstrated successful handling of high log volumes with zero data loss. The system automatically captures new log groups, maintains efficient organization for machine processing, works alongside existing subscription filters, and provides Regional separation for security compliance. The Kinesis data stream’s on-demand sharding automatically expands capacity as needed, enabling the solution to scale with log volume.

Conclusion

The solution is deployed in production at Login.gov and has proven to be a robust, scalable system that meets federal compliance requirements while maintaining efficient log processing and storage organization. This implementation reflects Login.gov’s specific requirements and draws on actual development and testing experience.

Our journey from evaluating Data Firehose to implementing account-level subscription filters demonstrates how understanding specific compliance and operational requirements leads to selecting the right architectural patterns. By using AWS service capabilities creatively—particularly the policy name field for metadata preservation—we developed a solution that meets current needs while providing a foundation for future scalability.

For organizations facing similar log retention challenges, we recommend reviewing the AWS documentation on CloudWatch account-level subscription filters and evaluating whether this pattern fits your compliance and operational requirements.

AWS Public Sector Blog
