AWS Storage Blog
Amazon S3 audit logging, Part 3: Analyzing S3 Metadata journal tables for object lifecycle tracking
This is Part 3 of our three-part series on Amazon S3 audit logging. In Part 1, we covered server access logs for HTTP-level requests and performance analysis. In Part 2, we covered S3 data events in AWS CloudTrail for identity-focused security investigations.
As data volumes grow and storage costs become a significant line item, organizations need clear visibility into how their stored objects change over time. Are lifecycle policies transitioning objects to cheaper storage tiers on schedule? Are encryption standards being maintained across all objects? Are retention policies being enforced correctly for regulatory compliance? Without a reliable way to track object state changes—storage class transitions, encryption modifications, tag updates, and retention status—teams are left guessing whether their storage optimization strategies are working as intended or silently wasting budget on misconfigured policies.
Amazon S3 Metadata journal tables address this challenge as a fully managed feature that automatically records every metadata change for objects in your S3 buckets. Unlike server access logs (which track HTTP requests) and S3 data events in CloudTrail (which track API calls with identity context), S3 Metadata journal tables focus specifically on object state—capturing storage class transitions, encryption changes, tag modifications, and Object Lock events in query-optimized Parquet format. Combined with Amazon Athena, this provides a ready-to-query analytical foundation with no infrastructure to manage.
In this post, we demonstrate how to enable S3 Metadata journal tables and analyze object lifecycle data using Athena. You will learn how to configure the feature, understand the auto-created table schema, and run common analysis patterns for storage class transition tracking, encryption compliance monitoring, lifecycle policy validation, and object change frequency analysis—giving you the visibility needed to optimize storage costs and enforce governance policies with confidence. Depending on your use case, you might need to optimize performance and cost (discussed in Part 1), ensure security and compliance (discussed in Part 2), or consider storage optimization (discussed in this post). See the decision framework in Part 1 to determine which mechanisms best fit your needs.
What S3 Metadata journal tables capture
Metadata journal tables act as your object lifecycle auditor, recording every state change for objects in your bucket.
What is captured:
- Storage class transitions – Lifecycle state changes (Standard, Standard-IA, Glacier, and so on)
- Encryption changes – Encryption status, AWS Key Management Service (AWS KMS) key changes, algorithm updates
- Object tags – Tag additions, modifications, and removals
- Object Lock and retention – Compliance mode, retention period, legal hold changes
- Object creation and deletion – Complete lifecycle from upload to removal
What is not captured:
- Access information – No data on who accessed objects (use S3 data events in CloudTrail, as discussed in Part 2).
- HTTP request details – No status codes, response times, or performance metrics (use server access logs, as discussed in Part 1).
- User identity – No AWS Identity and Access Management (IAM) user or role context for changes (the requester account or service is recorded, but not the specific IAM principal—use CloudTrail data events for full identity context).
- Replication status – No status of object replication, neither at the source bucket side nor destination bucket side.
Delivery latency – Typically, within 15 minutes of the metadata change, though actual delivery times might vary, format is Parquet (query-optimized). There is no separate charge for journal record generation—you pay only for S3 Tables storage of journal records and Athena query costs when analyzing the data. Refer to S3 Tables pricing for details. Data is partitioned by record_timestamp_day.
Enabling S3 Metadata journal tables
Make sure you have permissions required to create the metadata configuration before proceeding. Complete the following steps:
- On the S3 console, choose Buckets in the navigation pane.
- Choose the source bucket you want to monitor.
- On the Metadata tab, choose Create Metadata configuration.
- In the Journal table configuration section, choose Server-side encryption and choose the encryption mechanism and KMS key if you chose to encrypt journal table data before storing to the S3 bucket.
- For Record expiration, choose Disabled if you don’t want to expire the records, or choose Enabled and specify the number of days after which records will expire.
- For Live inventory, choose Disabled or Enabled as required (not in scope of this post).
- Review the configuration and choose Create metadata configuration.
The journal table status will change from Creating to Active.
- To verify table creation, open the Athena console and choose Query editor in the navigation pane.
- For Catalog¸choose
s3tablescatalog/aws-s3. - For Database, choose
b_<bucket_name>.
Understanding the auto-created table
In this section, we explore the auto-created table’s schema and partition structure. For frequently used queries, consider saving them in Athena and views can also be created for recurring queries.
Table schema
To see what data is stored in journal tables, see S3 Metadata journal tables schema. You can also see this information by describing the journal table:
- On the Athena console, choose Query editor in the navigation pane.
- For Catalog¸choose
s3tablescatalog/aws-s3. - For Database, choose the configured S3 bucket
b_<bucket_name>. - Describe the automatically created table that will show all the columns listed in the journal table schema:
-- View the auto-created table structure
DESCRIBE journal;
The following table shows the partition column.
| field_name | field_transform | column_name |
|---|---|---|
record_timestamp_day |
day |
record_timestamp |
Partition structure
The auto-created table uses single-level date partitioning:
├── record_timestamp_day=2024-03-15/
│ └── part-00000.parquet
├── record_timestamp_day=2024-03-16/
│ └── part-00001.parquet
└── record_timestamp_day=2024-03-17/
└── part-00002.parquet
Always include record_timestamp in the WHERE clause for cost-efficient queries.
Analyze S3 Metadata journal tables
In this section, we present common S3 Metadata journal table analysis patterns. All the dates and date ranges used in the following query patterns are examples and use placeholders; these need to be adjusted based on your requirements.
To run each query, complete the following steps:
- On the Athena console, open the query editor.
- For Catalog¸choose
s3tablescatalog/aws-s3. - For Database, choose the auto-created database (
b_<bucket_name>). - In the query editor, enter your desired query.
- Choose Run or press Ctrl+Enter.
- View results in the Results pane.
Query pattern 1: Track storage class transitions
The following query tracks S3 object storage class transitions over the last 200 days to help evaluate lifecycle
policy effectiveness and optimize storage cost strategies:
-- Track all storage class transitions in the last 200 days
WITH storage_transitions AS (
SELECT
key,
record_timestamp,
storage_class,
size,
LAG(storage_class) OVER (
PARTITION BY key
ORDER BY record_timestamp
) as previous_storage_class,
LAG(record_timestamp) OVER (
PARTITION BY key
ORDER BY record_timestamp
) as previous_change_time
FROM journal
WHERE DATE(record_timestamp) >= (current_date - interval '200' day)
)
SELECT
key,
record_timestamp as transition_time,
previous_change_time as previous_time,
previous_storage_class as from_storage_class,
storage_class as to_storage_class,
size as size_bytes,
DATE_DIFF('day', previous_change_time, record_timestamp) as days_in_previous_tier
FROM storage_transitions
WHERE previous_storage_class IS NOT NULL
AND previous_storage_class != storage_class
ORDER BY record_timestamp DESC
LIMIT 100;
Query pattern 2: Storage class distribution with object size categorization
The following query provides a snapshot of S3 object distribution across storage classes to help identify storage optimization opportunities and right-size lifecycle policies:
-- Quick overview of storage distribution for objects in last 200 days
SELECT storage_class,
CASE WHEN size < 1048576 THEN 'Small (<1MB)'
WHEN size < 104857600 THEN 'Medium (1-100MB)'
WHEN size < 1073741824 THEN 'Large (100MB-1GB)'
ELSE 'Very Large (>1GB)'
END as size_bucket,
COUNT(*) as object_count,
SUM(size) / 1024 / 1024 / 1024 as total_size_gb, AVG(size) / 1024 / 1024 as avg_size_mb,
MIN(size) / 1024 / 1024 as min_size_mb,
MAX(size) / 1024 / 1024 as max_size_mb
FROM journal
WHERE DATE(record_timestamp) >= DATE(current_date - interval '200' day)
AND size IS NOT NULL -- To exclude delete markers
GROUP BY storage_class,
CASE WHEN size < 1048576 THEN 'Small (<1MB)'
WHEN size < 104857600 THEN 'Medium (1-100MB)'
WHEN size < 1073741824 THEN 'Large (100MB-1GB)'
ELSE 'Very Large (>1GB)' END
ORDER BY total_size_gb DESC;
Query pattern 3: Object lifecycle timeline
The following query reconstructs the complete lifecycle history of S3 objects matching a specific key prefix to provide a full audit trail for object-level investigation and lifecycle policy validation:
-- Complete lifecycle timeline for specific object or location
WITH lifecycle_events AS (
SELECT
key,
record_timestamp,
record_type,
storage_class,
size,
encryption_status,
ROW_NUMBER() OVER (
PARTITION BY key
ORDER BY record_timestamp
) as event_sequence
FROM journal
WHERE DATE(record_timestamp) >= DATE('2025-09-01')
AND DATE(record_timestamp) <= DATE('2025-11-12')
AND key LIKE '2025/apr-25%'
)
SELECT
key,
event_sequence,
record_timestamp,
record_type,
storage_class,
size,
encryption_status
FROM lifecycle_events
ORDER BY key, event_sequence;
Query pattern 4: Identify premature storage transitions/early deletions
The following query identifies S3 objects that transitioned from STANDARD to STANDARD_IA before the recommended 30-day minimum, flagging potential cost waste from early lifecycle transitions. It also tracks whether objects were subsequently deleted, helping you fine-tune lifecycle policies and avoid paying minimum storage duration charges.
-- Identify objects that transitioned too early (potential cost waste)
WITH object_lifecycle AS (
SELECT
key,
record_timestamp,
record_type,
storage_class,
LAG(storage_class) OVER (
PARTITION BY key
ORDER BY record_timestamp
) as previous_storage_class,
LAG(record_timestamp) OVER (
PARTITION BY key
ORDER BY record_timestamp
) as previous_timestamp,
LAG(record_type) OVER (
PARTITION BY key
ORDER BY record_timestamp
) as previous_record_type,
LEAD(record_type) OVER (
PARTITION BY key
ORDER BY record_timestamp
) as next_record_type,
LEAD(record_timestamp) OVER (
PARTITION BY key
ORDER BY record_timestamp
) as next_timestamp
FROM journal
WHERE DATE(record_timestamp) >= DATE('2025-09-01')
AND DATE(record_timestamp) <= DATE('2025-11-12')
AND record_type IN ('CREATE', 'UPDATE_METADATA', 'DELETE')
)
SELECT
key,
previous_storage_class,
storage_class as transitioned_to_class,
record_timestamp as transition_time,
previous_timestamp as previous_change_time,
previous_record_type,
next_record_type,
next_timestamp as deletion_time,
date_diff('day',
CAST(previous_timestamp AS DATE),
CAST(record_timestamp AS DATE)
) as days_in_previous_class,
CASE
WHEN next_record_type = 'DELETE'
THEN date_diff('day',
CAST(record_timestamp AS DATE),
CAST(next_timestamp AS DATE)
)
ELSE NULL
END as days_before_deletion
FROM object_lifecycle
WHERE
-- Storage class changed
previous_storage_class IS NOT NULL
AND previous_storage_class != storage_class
-- Transitioned from STANDARD to STANDARD_IA
AND previous_storage_class = 'STANDARD'
AND storage_class = 'STANDARD_IA'
-- Transitioned before 30-day minimum
AND date_diff('day',
CAST(previous_timestamp AS DATE),
CAST(record_timestamp AS DATE)
) < 30
ORDER BY days_in_previous_class
LIMIT 100;
Query pattern 5: Encryption status changes
The following query tracks S3 object encryption changes over a specified period, surfacing any modifications to encryption status or KMS keys to support encryption compliance auditing:
-- Track encryption status changes for audit
WITH encryption_history AS (
SELECT
key,
record_timestamp,
record_type,
encryption_status,
kms_key_arn,
LAG(encryption_status) OVER (
PARTITION BY key
ORDER BY record_timestamp
) as previous_encryption_status,
LAG(kms_key_arn) OVER (
PARTITION BY key
ORDER BY record_timestamp
) as previous_kms_key,
ROW_NUMBER() OVER (
PARTITION BY key
ORDER BY record_timestamp
) as record_number
FROM journal
WHERE record_timestamp >= timestamp '2025-10-01 00:00:00'
AND record_timestamp <= timestamp '2025-11-12 23:59:59'
AND record_type IN ('CREATE', 'UPDATE_METADATA')
)
SELECT
key,
record_timestamp,
record_type,
encryption_status as current_encryption,
previous_encryption_status,
kms_key_arn as current_kms_key,
previous_kms_key
FROM encryption_history
WHERE
-- Must have a previous record (not the first record for this object)
record_number > 1
AND (
-- Encryption status actually changed
encryption_status IS DISTINCT FROM previous_encryption_status
OR
-- KMS key actually changed
kms_key_arn IS DISTINCT FROM previous_kms_key
)
ORDER BY record_timestamp DESC;
Query pattern 6: Objects with frequent metadata changes
The following query identifies S3 objects with abnormally high metadata change activity (10 or more changes) by calculating change counts to detect misconfigured lifecycle rules, application bugs, or potentially suspicious object manipulation patterns:
-- Identify objects with unusual metadata change frequency
WITH change_frequency AS (
SELECT
key,
COUNT(*) as change_count,
COUNT(DISTINCT record_type) as unique_record_types,
MIN(record_timestamp) as first_change,
MAX(record_timestamp) as last_change,
date_diff('day',
MIN(CAST(record_timestamp AS DATE)),
MAX(CAST(record_timestamp AS DATE))
) as days_span
FROM journal
WHERE record_timestamp >= timestamp '2025-09-01 00:00:00'
GROUP BY key
)
SELECT
key,
change_count,
unique_record_types,
first_change,
last_change,
days_span,
CASE
WHEN days_span > 0 THEN CAST(change_count AS DOUBLE) / days_span
ELSE change_count
END as changes_per_day
FROM change_frequency
WHERE change_count > 10 -- More than 10 changes
ORDER BY change_count DESC
LIMIT 50;
Cost optimization best practices
For cost optimization, we recommend using partition pruning and including record_timestamp in every query. Athena charges $5 per TB of data scanned. The following partition pruning techniques directly reduce your Athena costs by minimizing data scanned per query. Refer to Amazon Athena pricing for more details.
The following is an example of an efficient query:
SELECT * FROM journal
WHERE DATE(record_timestamp) = DATE('2024-03-15') -- Partition pruning
AND record_type = 'CREATE';
Expensive query:
-- Missing partition column!
SELECT * FROM journal
WHERE record_type = 'CREATE';
The following query is an example of using date ranges efficiently:
-- Efficient: Specific date range
SELECT * FROM journal
WHERE DATE(record_timestamp) >= DATE('2025-09-15')
AND DATE(record_timestamp) <= DATE('2025-09-17') -- Only 3 days
AND storage_class = 'GLACIER'
LIMIT 1000;
Another best practice for cost optimization is to use LIMIT for one-time exploratory queries. Additionally, use
window functions (LAG, LEAD) wherever possible for efficient change tracking without self-joins.
Troubleshooting common issues
The following are common issues and troubleshooting tips:
- No data in Metadata journal table – Allow 15–30 minutes after enabling for first records to appear. Verify the feature is enabled under the bucket’s metadata tab. For more details, see S3 Metadata tables. Additionally, check if metadata changes are occurring in the source bucket and you are running the query in the right database.
- Query returns no results – Check available dates first:
-- Check what dates have data
SELECT DISTINCT DATE(record_timestamp)
FROM journal
ORDER BY DATE(record_timestamp) DESC
LIMIT 10;
-- Then query with correct date
SELECT * FROM journal
WHERE DATE(record_timestamp) = DATE('2024-03-15') -- Use actual date from above
LIMIT 100;
Conclusion
In this post, you learned how to enable S3 Metadata journal tables and analyze object lifecycle data using Athena. You now have query patterns for storage class transition tracking, storage distribution analysis, encryption compliance monitoring, lifecycle timeline reconstruction, premature transition detection, encryption change auditing, and change frequency analysis—all using the auto-created Parquet table with partition pruning for cost-efficiency.
Across this three-part series, you now have three logging mechanisms that provide visibility into your S3 environment for different use cases, as summarized in the following table.
| Question | Mechanism | Part |
|---|---|---|
| How was data accessed? | S3 server access logs | Part 1 |
| Who accessed it? | S3 data events in CloudTrail | Part 2 |
| What metadata changes happened over time? | S3 Metadata journal tables | Part 3 |
Each mechanism serves a distinct purpose, and the right choice depends on your use case. Use server access logs when you need HTTP-level tracking, performance metrics, and cost attribution. Use CloudTrail S3 data events when user identity, compliance auditing, or security investigations are the priorities. Use Metadata journal tables when you need to track object state changes for storage optimization and governance.
For complete audit coverage, implement all three—server access logs capture request-level details, S3 data events in CloudTrail provide the identity context, and Metadata journal tables track the object lifecycle. Together, they can transform your centralized logging repository into a high-performance analytical engine that turns compliance requirements from an operational burden into a strategic advantage.