AWS Database Blog

Dynata’s journey to lower TCO and faster modernization with AWS Database Savings Plans

Satish Bhonsle — Mon, 06 Jul 2026 16:18:19 +0000

In this post, we show how Dynata simplified database cost optimization and accelerated modernization to AWS Graviton processors by adopting Database Savings Plans. Rather than managing Reserved Instances across multiple database services, Dynata consolidated their cost commitment into a single, flexible pricing model. This reduced operational overhead by 70%, extended cost coverage to Amazon Aurora serverless, and lowered total cost of ownership as their infrastructure evolved.

Dynata is the world’s largest first-party data company, delivering precise, reliable insights that help businesses to make informed decisions at scale. Their platform empowers organizations to activate the right audiences, and measure impact with confidence through industry-leading data accuracy, commitment to continuous improvement, and operational excellence.

The challenge

When Dynata’s database workloads grew, managing Reserved Instances across multiple services and database engines became operationally complex. The team needed a way to:

Modernize to the latest generation instances.
Reduce the operational overhead of managing individual reservations.
Extend cost savings to newer service models like Amazon Aurora serverless.
Maintain predictable costs while their infrastructure evolved.

The solution: AWS Database Savings Plans

Dynata adopted Database Savings Plans, a flexible pricing model. It reduces your database costs by up to 35% when you commit to a consistent amount of usage over a 1-year term. Database Savings Plans automatically apply to eligible serverless and provisioned instance usage regardless of engine, instance family, size, deployment option, or AWS Region, with no upfront payment.

Results

By adopting Database Savings Plans, Dynata achieved three measurable outcomes across modernization, operations, and cost.

Accelerated modernization

Database Savings Plans gave Dynata the freedom to adopt the latest AWS Graviton processors. Dynata transitioned from 5th and 6th generation instances to the latest 7th and 8th generation Graviton processors across Amazon Aurora, Amazon Relational Database Service (Amazon RDS), and Amazon ElastiCache. The flexibility of Database Savings Plans, combined with its broad applicability across data infrastructure services, helped Dynata keep billed hours under control while improving operational costs.

Simplified operations

Database Savings Plans streamline capacity reservations across Amazon RDS, Amazon OpenSearch Service, Amazon DocumentDB (with MongoDB compatibility), Amazon DynamoDB, and Amazon ElastiCache in a single order. This reduced procurement time by 70%. A key operational improvement was the inclusion of Amazon Aurora serverless coverage, a service model that had no equivalent reservation option before. With this coverage, Dynata adopted serverless database architectures while maintaining cost optimization.

Lower total cost of ownership

By consolidating discounts into a single pool that applies broadly across their data infrastructure, Dynata achieved measurable cost reductions. Costs for Amazon OpenSearch Service and Amazon ElastiCache declined after they purchased Database Savings Plans. The large discount pool covering multiple data infrastructure services helped reduce overall operating costs, resulting in fewer billed hours than before adopting Database Savings Plans. With the plan’s flexibility, Dynata can make larger commitments, so their savings grow as they modernize.

Conclusion

By adopting Database Savings Plans, Dynata accelerated their migration from 5th and 6th generation instances to 7th and 8th generation Graviton processors across Amazon Aurora, Amazon RDS, and Amazon ElastiCache. They did this without the financial friction of managing traditional Reserved Instances. Operationally, the team reduced the time spent on capacity reservation management and gained cost coverage for Amazon Aurora serverless, a service model that had no equivalent Reserved Instance option before. The result is a measurable reduction in billed hours and improved total cost of ownership, with savings that scale alongside their modernization efforts rather than working against them.

To learn more about AWS Database Savings Plans, visit the AWS Database Savings Plans page.

About the authors

Data masking in Amazon RDS for Oracle

Jobin Joseph — Thu, 02 Jul 2026 20:51:09 +0000

As organizations migrate their Oracle databases to Amazon Relational Database Service (Amazon RDS) for Oracle, one critical operational practice that must be re-established in the cloud is safely refreshing non-production environments with production-like data. On-premises, this was a standard process: clone production, mask sensitive data, and hand it off to developers and testers. Data masking permanently replaces sensitive information with realistic fictitious values, so your test environments retain the same structure, patterns, and scale that production has. This makes testing meaningful without inadvertent access to customer information.

On Amazon RDS for Oracle, you don’t have the same level of infrastructure control you had on-premises, but the overall data masking approach remains similar. Data masking using the Oracle Data Masking and Subsetting Pack works with Amazon RDS for Oracle, letting you maintain the same data governance workflow in the cloud that you used in your on-premises environment.

In this post, we walk through how to use the Oracle Data Masking and Subsetting Pack with Amazon RDS for Oracle. We cover setting up Data Masking in Oracle Enterprise Manager (OEM) and automation options.

Why data masking matters in the cloud

Compliance doesn’t change in the cloud: GDPR, HIPAA-eligible, and PCI DSS-compliant requirements apply equally to RDS as they did on-premises.
RDS DB snapshot: Restoring an RDS DB snapshot to a lower environment without masking exposes sensitive data in non-production environments. Additionally, development and test databases typically have relaxed security groups, more AWS Identity and Access Management (IAM) roles, and shared credentials. Masking helps prevent real data exposure even if these environments allow inadvertent access.
Third-party development needs safe data: Outsourced development teams need realistic data to work with. Masked RDS clones let you share production-quality datasets without exposing actual customer information.
Preserving on-premises workflows: If you already use Oracle Enterprise Manager for masking on-premises data, you can use a similar approach with RDS for Oracle.

Oracle Data Masking and Subsetting Pack

Oracle licenses the Data Masking and Subsetting Pack as an add-on to Oracle Enterprise Manager (OEM) Cloud Control. It provides data masking capabilities including referential integrity preservation, sensitive data discovery, and flexible deployment. The pack understands Oracle data types, constraints, and relationships natively, making it suited for complex Oracle workloads.

How it works with RDS for Oracle

The Oracle Data Masking and Subsetting Pack operates through Oracle Enterprise Manager (OEM) Cloud Control, connecting to your RDS for Oracle DB instance as managed targets. OEM runs on a separate Amazon Elastic Compute Cloud (Amazon EC2) instance and communicates with RDS over the network. It requires no OS-level access to the RDS host.

RDS DB snapshot and restore: Take an RDS DB snapshot of the production RDS for Oracle DB instance and restore it as a new RDS for Oracle DB instance (the target/clone).
OEM connectivity: OEM Cloud Control on Amazon EC2 connects to the RDS for Oracle DB instance (target/clone). The masking script is generated and downloaded from OEM.
Masking: The masking script is run manually on the target/clone RDS for Oracle DB instance, replacing sensitive data in-place.
Result: The target/clone RDS for Oracle DB instance now contains masked data, ready to be shared with the development and test environments.

Prerequisites

Cost consideration: Consider the cost of running multiple RDS for Oracle DB instances (Enterprise Edition).

Amazon RDS for Oracle: Enterprise Edition (source/production).
Amazon RDS for Oracle: Enterprise Edition (target/clone, restored from production RDS DB snapshot).
Oracle Enterprise Manager Cloud Control (13.5 or 24ai): Installed on an on-prem/Amazon EC2 instance with network connectivity to target/clone RDS for Oracle DB instance. Connectivity via Oracle client to run the masking script on the target/clone RDS for Oracle DB instance.
Oracle Data Masking and Subsetting Pack license.
RDS admin user access on the target RDS for Oracle DB instance.

Step 1: Deploy Oracle Enterprise Manager on EC2

Launch an Amazon EC2 instance in the same virtual private cloud (VPC), or a peered VPC, as your RDS for Oracle DB instances. Recommended: m5.xlarge or larger, 100GB+ Amazon Elastic Block Store storage, Oracle Linux 8 or Red Hat Enterprise Linux (RHEL) 8. Note: Running an m5.xlarge EC2 instance with 100GB+ EBS storage will incur ongoing charges. Use the AWS pricing calculator to determine the compute and storage costs.
Install and configure Oracle Enterprise Manager Cloud Control 24ai (or 13.5).

Step 2: Create and restore RDS DB snapshot

Create a RDS DB snapshot of the RDS DB instance that you want to mask. Alternatively use an automated backup.
Restore the RDS Snapshot as a clone/target RDS DB instance.
Wait for the clone/target restored RDS DB instance status to become ‘available’.

Step 3: Register RDS DB instances as targets in OEM

Set up OEM integration with RDS for Oracle as discussed in Oracle Management Agent for Enterprise Manager Cloud Control using option groups.

You need to register the clone/target RDS for Oracle DB instance as a monitored target in OEM.

Note: If you are using multi-tenant or single-tenant configuration of RDS for Oracle, verify that the tenant database is discovered as a target in OEM and all database related activities must be carried out on the tenant database in the step-by-step instructions.

All the following steps use the Enterprise Manager 24ai console, refer to Data Masking and Subsetting for the steps to follow on Enterprise Manager 13.5.

Step 4: Create an Application Data Model (ADM)

The Application Data Model is the foundation for all masking operations. It maps your schema structure, table relationships, and identifies sensitive columns.

Navigate to Enterprise > Quality Management > Application Data Models.
Select Create.
In the Create Application Data Model screen, specify the following properties:

Enter the Name for the ADM.
Enter the Description for the ADM.
For Target Type, select Database Instance.
For Target Database, select the target/clone RDS for Oracle DB instance.
Select the Database Named Credentials (admin user with role:normal).
Select the Schemas to include.
For Relation Discovery Type, select either Dictionary-Based or Non-Dictionary.

Select Create.
Wait for the job status to change to succeeded. OEM discovers tables, views, primary key/foreign key relationships, and column metadata.

Screenshot of Oracle Enterprise Manager 24ai showing the Create Application Data Model

Step 5: Discover sensitive data

Select the ADM that you created in the previous step.
Select Discover Sensitive Columns and select Schedule.
In the Create Sensitive Column Discovery Job screen, specify the following properties:

Select Database Named Credentials.
For Applications, select the schema.
Select the Sensitive Types (such as Age, Credit Card Number, Date of Birth, and Email ID).
Specify Data Sample Size.

Select Submit.
Wait for the status of the job to change from scheduled to succeeded.
Review the results of the discover sensitive columns job.
Review each discovered column and adjust its sensitivity status as needed by choosing Mark Sensitive, Mark Not Sensitive, or Mark Undefined.

Screenshot of Oracle Enterprise Manager showing the Discover Sensitive Columns results.

Step 6: Create a masking definition

Navigate to Enterprise > Quality Management > Data Masking Definitions.
Select Create.
In the Create Masking Definitions screen, specify the following properties:

Enter the Name for the definition.
Select the Application Data Model.
Select the Associated Database.
Select the Database Named Credentials.

Select Next.
For each sensitive column, complete the following steps:

Select Define Format and Add.
Select or define the masking format.
Select Add Format Entry.
Select Add.

Select Next.
In the Advanced Options screen, select the options that you would like to use.
Select Next.
In the Review screen, select Create.

Step 7: Generate and download the script

Since Amazon RDS for Oracle is a managed service, OEM cannot schedule masking jobs directly against the RDS for Oracle DB instance. The OEM job scheduler requires OS-level credentials on the target host, which are not available on RDS. Instead, you generate and download the masking script from OEM, then run it manually (or through automation) against the target RDS for Oracle clone.

From the Masking Definition page, select the masking definition.
Select Actions.
Select Manage Masking Script.
Select Generate Masking Script.
In the Generate Masking Script screen, specify the following properties:

Select the In-Data Masking Option.
Select the Associated Database.
Select the Database Named Credentials.

Select Generate.
Wait for the status of the job to change to Script Generated.
Select the Masking Definition.
Select Actions.
Select Manage Masking Script.
Select Download In-Database Masking Script.
Review the generated script.

Amazon RDS for Oracle restricts direct ALTER SYSTEM commands for managed service security. The OEM-generated script includes these commands, which will fail on RDS. Replace them with NULL placeholders and use the RDS-specific rdsadmin utility package equivalents at the end of the script. This approach helps the masking script to complete successfully while achieving the same flush and checkpoint operations through RDS-approved methods.

Replace:
EXECUTE IMMEDIATE 'alter system flush shared_pool';
EXECUTE IMMEDIATE 'alter system checkpoint';
With:
NULL;

Add the RDS equivalent commands at the end of the script, before “spool off”.

EXEC rdsadmin.rdsadmin_util.flush_shared_pool;
EXEC rdsadmin.rdsadmin_util.checkpoint;

Step 8: Run the script on the target clone

Since the OEM job scheduler cannot run directly on RDS, run the downloaded masking script manually against the target RDS clone using a SQL client (SQL*Plus, SQLcl, or SQL Developer):

Connect to the target (clone) RDS for Oracle DB instance using SQL*Plus or SQLcl from the OEM/EC2 instance.
Run the downloaded masking script: @/path/to/downloaded/masking_script.sql
Monitor execution progress by reviewing the status messages output as the script processes each table.
For large databases, run the script in a screen or nohup session to avoid disconnection issues.

Example

Before:

SQL query result showing data from demo.customers table with columns EMAIL and DATE_OF_BIRTH, before data masking is applied.

After:

SQL query result showing masked data from demo.customers table with columns EMAIL and DATE_OF_BIRTH, displaying masked email addresses and birth dates.

Automation considerations

OEM Data Masking scripts are interactive by default, requiring manual user input during execution. To automate, wrap the masking script in a shell script that uses SQL*Plus or SQLcl with input redirection. To get an idea of the input that the script requires, first run it manually before implementing the automation.

Example approach

Warning: The script provided is only for demo purposes. Develop the script according to your organization’s coding and security best practices.

run_masking.sh script:

#!/bin/bash
# run_masking.sh
set -e

# Set Oracle environment from /etc/oratab
ORACLE_HOME=$(awk -F: '/^[^#]/{print $2; exit}' /etc/oratab)
export ORACLE_HOME
export PATH=$ORACLE_HOME/bin:$PATH
export LD_LIBRARY_PATH=$ORACLE_HOME/lib:${LD_LIBRARY_PATH:-}

RDS_ENDPOINT="<rds-instance-endpoint>:<port>/<service-name or pdb-name>"
SECRET_ID="<secrets_manager_arn>"

# Run masking script (note: when passing values to the sql script, use a blank line for the value if you are choosing the default)
{
aws secretsmanager get-secret-value \
--secret-id "$SECRET_ID" \
--region eu-central-1 \
--query SecretString --output text | \
jq -r '"CONNECT " + .username + "/" + .password + "@'"$RDS_ENDPOINT"'"'

cat <<SQLEOF
@masking202605251609.sql

N
N
3
USERS
SQLEOF
} | sqlplus -s /nolog

Running the script in the background.

nohup ./run_masking.sh > masking.log 2>&1

For an automated workflow, use Amazon EventBridge Scheduler to start an AWS Step Functions state machine execution. AWS Step Functions orchestrate the workflow end-to-end.

Automation Flow

Architecture diagram showing the automated masking workflow.

1. Trigger: Amazon EventBridge scheduler starts the AWS Step Functions state machine.

2. Restore RDS DB snapshot: Restore production RDS DB snapshot to a temporary RDS for Oracle DB instance. Poll restored RDS for Oracle DB instance status until available.

3. Fetch credentials: Amazon EC2 retrieves Oracle username and password from AWS Secrets Manager.

4. Run masking: Run Command, a capability of AWS Systems Manager, runs the masking script via SQL*Plus/SQLcl on Amazon EC2 against the restored RDS for Oracle DB instance.

5. Create masked RDS DB snapshot: Take RDS DB snapshot of the masked database.

6. Cross-account sharing/restore: Share masked RDS DB snapshot with Dev/UAT accounts and restore RDS DB snapshot.

7. Cleanup: Refer to the cleanup section in step 10.

Additional Considerations:

Implement Amazon Simple Notification Service (Amazon SNS) notifications for success/failure states.
Verify RDS DB snapshot encryption keys are accessible to target accounts.

Step 9: Validate the masked data

1. Connect to the target RDS for Oracle DB instance.

2. Verify that sensitive columns are properly masked, referential integrity is preserved, and row counts match production.

Step 10: Clean up

Warning: Before deleting any resources, verify that:

The masked data in the target RDS for Oracle DB instance meets your requirements.
You are deleting the correct temporary RDS for Oracle DB instance (not production).
Create an optional final snapshot of the RDS for Oracle DB instance.

Delete the temporary RDS for Oracle DB instances that are no longer required.
Delete RDS DB snapshots that are no longer required.

AWS-native alternatives (brief overview)

It’s worth noting that AWS provides cloud-native options for data masking that don’t require the additional Oracle Data Masking and Subsetting Pack:

AWS Glue Visual ETL: The Detect Sensitive Data transform can identify and mask over 250 types of personally identifiable information (PII) during ETL operations between RDS databases. Serverless, no-code, and designed to work with RDS for Oracle editions. For details, see: Copy and mask PII between Amazon RDS databases using visual ETL jobs in AWS Glue Studio
AWS Database Migration Service (AWS DMS) data masking: When replicating data from one RDS for Oracle DB instance to another while protecting sensitive information, AWS DMS provides a streamlined approach. Starting with version 3.5.4, you can use data masking transformation rule actions in your table mapping to obfuscate or alter column values. This applies as data moves from your source RDS for Oracle DB instance to the target. This allows you to produce a sanitized copy of your database which is ideal for non-production environments like Dev or QA, without the overhead of separate masking scripts or tools. For details, see: Using data masking to hide sensitive information

You’ll find these approaches are ideal for straightforward PII redaction scenarios. However, when you need referential integrity preservation, application-aware masking, deterministic masking, or format-preserving transformations, Oracle offers the Data Masking and Subsetting Pack as an add-on to Oracle Enterprise Manager (OEM) Cloud Control.

Best practices

Always mask a copy, never production: Masking is irreversible. Only mask non-production environments.
Test your masking rules first: Make sure masked data doesn’t break your application logic.
Build a complete Application Data Model: A well-defined ADM automatically preserves foreign key relationships and referential integrity.
Use deterministic masking for consistency: If the same customer appears across multiple databases, deterministic masking supports consistent masked values.
Disable triggers before masking: Turn off triggers and materialized view refreshes before you start, then re-enable them after masking completes.
Monitor your tablespace: Masking generates redo and undo. Make sure your temp and undo tablespaces have enough space. Masking operations also require additional storage. Calculate the space requirements for masking operations.
Document what you mask: Keep a record of which columns are masked and how. You’ll need this for audits.
Automate the workflow: Use Amazon EventBridge and Step Functions to automatically trigger masking after RDS DB snapshot restores, creating a repeatable refresh pipeline.
Cleanup: Remove temporary RDS for Oracle DB instances, and RDS DB snapshots that are no longer required.
Protect the unmasked RDS DB snapshot. The RDS DB snapshot contains a full copy of production data. Restrict who can restore or share snapshots using IAM policies and use AWS Key Management Service (AWS KMS) on your RDS DB instance to protect data at rest. Consider using Service Control Policies (SCPs) to help prevent snapshot sharing outside your AWS organization.
Minimize the masking window. Between snapshot restore and masking completion, unmasked production data exists in your target environment. To reduce this risk:
- Automate the end-to-end workflow (restore → register → mask → validate) so it completes without human delay.
- Block non-admin access to the clone RDS DB instance until masking validation passes.
- Use a dedicated security group that only allows connections from the OEM EC2 instance until masking is confirmed.
Secure the OEM EC2 instance. This instance holds stored database credentials and has admin-level connectivity to your RDS DB instances. Harden it by:
- Restricting its security group to only the necessary ports.
- Applying OS patches regularly.
- Turn on IMDSv2 to help prevent inadvertent credential disclosure from server-side request forgery (SSRF) attacks.
- Use AWS Systems Manager Session Manager instead of SSH.
Don’t store database credentials in scripts. The downloaded masking script needs database connectivity, but credentials should come from AWS Secrets Manager at runtime. Don’t hardcode them in the script file. Configure automatic rotation for RDS credentials.
Encrypt connections in transit. Use Oracle native network encryption or Oracle Secure Sockets Layer for all connections between the OEM EC2 instance, SQL client, and RDS DB instances. This helps prevent inadvertent credential or data disclosure within the VPC.

Conclusion

In this post, you learned how to use the Oracle Data Masking and Subsetting Pack with Amazon RDS for Oracle through OEM Cloud Control. You walked through creating an Application Data Model and discovering sensitive data to generating and executing masking scripts on a cloned RDS for Oracle DB instance.

Combining RDS DB snapshot and restores with OEM’s masking capabilities can help you meet compliance requirements when provisioning lower environments. This approach allows you to deliver production-quality test data to development teams safely and automate the pipeline using EventBridge and Step Functions.

If you use RDS for Oracle Standard Edition or lack the Data Masking and Subsetting Pack license, AWS-native alternatives such as AWS Glue and AWS DMS provide cloud-native masking options without requiring the additional Data Masking and Subsetting Pack license.

About the authors

How CRED uses Amazon RDS Blue/Green Deployments at scale

Murali A — Thu, 02 Jul 2026 20:50:09 +0000

CRED is a fintech platform built for India’s most creditworthy individuals. The platform enables users to manage and pay credit card bills, utility bills, rent, and other payments through a single app. CRED manages over 120 production database clusters on Amazon Relational Database Service (Amazon RDS) and Amazon Aurora, with an average switchover time of 2 minutes and a 100 percent operation success rate. Each maintenance window previously meant roughly 3 hours of coordination, risk, and degraded availability. With stringent Service Level Agreements (SLAs) and a growing cluster count, this approach was unsustainable.

In this post, you will learn how CRED built an automated orchestration framework around Amazon RDS blue/green deployments. The framework performs engine upgrades, instance scaling, storage optimization, and Change Data Capture (CDC) pipeline migration across their entire fleet. This approach achieved zero data loss incidents and zero production incidents.

Solution overview

Database engine upgrades are a routine but high-stakes operation for teams running production workloads on Amazon RDS and Amazon Aurora. Major version upgrades, instance class changes, and storage optimization all require careful planning. This is especially true when the fleet is large and the workloads are critical.

Amazon RDS Blue/Green Deployments create a staging environment that copies the production environment. The blue environment is the current production database, and the green environment is the staging environment that stays in sync through replication. You can make changes to the green environment, such as upgrading the major or minor DB engine version or changing database parameters, without affecting production workloads. For a full list of supported modifications and limitations, see Limitations and considerations for Amazon RDS blue/green deployments. When ready, you switch over the environments, typically under a minute with no data loss and no need for application changes.

CRED’s infrastructure spans multiple AWS accounts serving diverse production workloads with varying performance and availability requirements. Traditional database maintenance, in-place modifications with extended maintenance windows, proved inadequate at this scale. The team needed a systematic, automated approach that could handle database engine version upgrades across MySQL variants, instance upscaling and downscaling, storage shrinkage following data archival operations, large table alterations, and active CDC pipelines using AWS Database Migration Service (AWS DMS) and Debezium (an open source change data capture platform).

To address this, CRED built a blue/green deployment architecture with the following components:

Blue environment — the current production database serving live traffic.
Green environment — the new target database with required changes.
Validation framework — an automated testing suite that verifies green environment readiness.
CDC pipeline management — AWS DMS and Debezium checkpoint tracking.
Switchover orchestration — coordinated Domain Name System (DNS), application configuration, and database role transitions.
Rollback mechanism — reverse replication for rapid recovery.

The automation framework

At the center of this architecture is an internal automation framework that uses the AWS API to run parallel operations, manage state transitions, and provide intelligent decision-making throughout the deployment lifecycle. The framework consists of 6 modules:

Parallel AWS API orchestration — Executes blue/green switchover operations in parallel across multiple database instances using the AWS SDK, reducing total operation time from hours to minutes.
Real-time metrics validation — Continuously monitors and validates critical metrics during switchover: replication lag, connection counts, query latency, error rates, and throughput.
Checkpoint management — Captures and stores checkpoint information from blue/green switchover events, including database position markers, replication coordinates, and transaction IDs.
CDC pipeline intelligence — Automatically restarts AWS DMS tasks and Debezium connectors using captured checkpoints, ensuring zero data loss during CDC pipeline migration.
Automated rollback — Establishes reverse replication (native or DMS-based) from green to blue for one-click rollback to the previous state if issues arise post-switchover.
Decision engine — Evaluates readiness criteria, validates preconditions, and determines go/no-go for each switchover phase based on configurable thresholds.

Pre-deployment preparation

Although many discussions of blue/green deployments focus on the switchover moment, CRED’s data across 200 migrations shows that 90 percent of success is determined in the preparation phase. Incidents traced back to preparation gaps, not execution failures. The team identified several preparation areas that proved critical.

Testing on staging environments

CRED mirrors production configurations in their staging environments. Configuration drift between staging and production caused 40 percent of switchover surprises in early operations. The team enforces a strict rule: If stage tests fail, the production migration is blocked. There are no exceptions.

Application team alignment

Communication proved as important as technical execution. Teams surprised by database changes caused 30 percent of post-switchover incidents. To address this, CRED provides green environment endpoints to application teams early, giving them time to test connection pools, timeouts, and failure modes. Each dependent team must provide explicit sign-off before the switchover proceeds.

Major version upgrade validation

Breaking changes are expected in major version upgrades. For example, MySQL 5.x to 8.x changed GROUP BY behavior, authentication mechanisms, and over 50 system variables. The team replays production queries against both blue and green environments, comparing results row by row. This approach catches 80 percent of issues. Although release notes cover major changes, thorough testing reveals edge cases and interactions.

Piloting green in the read path

Before switchover, CRED progressively shifts read traffic to the green environment in stages: 10 percent, then 50 percent, then 100 percent over several days. This validates the green environment under real production load and catches application-specific issues such as connection pool tuning and client library incompatibilities. By switchover time, the green environment has served production reads successfully, and confidence is earned through data rather than assumption.

Operational considerations

Two additional factors required attention. First, application-level parameter caching can cause issues during switchover if services read stale connection strings from AWS Systems Manager Parameter Store. The team enforces parameter refresh intervals under 5 minutes or coordinates application deployments during the switchover window. Second, AWS does not support cross-region replication for native blue/green deployments, so cross-region replicas must be handled separately. If your database uses cross-region read replicas, you must handle them separately outside the blue/green deployment process.

Maintenance mode and switchover orchestration

CRED’s microservices architecture, running on Amazon Elastic Container Service (Amazon ECS), is designed to handle database switchovers through an intelligent maintenance mode mechanism. Before initiating a switchover, services automatically enter maintenance mode, which triggers a controlled connection closure to the existing database instance. After connections are safely closed, the blue/green switchover is initiated. After the switchover completes and the new instance becomes active, services exit maintenance mode and establish fresh connections to the updated database endpoint. This approach verifies that in-flight transactions are handled properly and connection pools are refreshed without manual intervention.

Pre-switchover validation

The automation framework validates the following conditions before proceeding with any switchover:

Replication lag below 1 second for a minimum of 2 minutes.
Checksum validation on critical tables.
All Amazon CloudWatch alarms in OK state.
Application configuration management systems updated.
DNS and load balancer configurations prepared.
Reverse replication operational for rollback capability.

Switchover flow

The following diagram illustrates the 3 phases of a typical switchover. In the pre-switchover state, the blue database serves as the active primary handling 100 percent of write traffic and 50 percent of read traffic, while the green database operates as a standby replica handling the remaining 50 percent of read traffic.

Diagram: Blue/Green Database Switchover Flow — provided by CRED

In the post-switchover state, the green database becomes the active primary handling 100 percent of write and read traffic. The former blue database transitions to standby with reverse replication enabled, ready for rollback if needed.

Post-switchover operations

After switchover, the automation continues with a 30-minute intensive monitoring window. Reverse replication from the new production environment back to the former blue environment is maintained for rollback capability. CDC pipeline health is monitored for AWS DMS task latency and Debezium connector status. If critical thresholds are violated during this window, the framework triggers an automatic rollback.

CDC pipeline migration

One of the most complex aspects of blue/green database deployments is maintaining Change Data Capture (CDC) pipelines. CRED runs active CDC pipelines using AWS DMS and Debezium across their database fleet. During a switchover, these pipelines must be stopped on the blue environment and restarted on the green environment without losing events or introducing duplicates.

CRED’s automation framework handles this through checkpoint-based migration. The process follows 4 steps:

Stop the current CDC task on the blue environment.
Update the AWS DMS task checkpoint using the position captured from the blue/green switchover event. The DMS source endpoint remains unchanged because blue/green switchover preserves database connectivity.
Restart the DMS task pointing to the new primary database post blue/green switchover.
Monitor pipeline health through Amazon CloudWatch metrics for AWS DMS task latency, events, and errors.

Across 120 databases with active CDC pipelines, this checkpoint-based approach achieved the following results:

Zero data loss incidents.
Average CDC pipeline downtime of 30 seconds.
Zero duplicate event rate.
100 percent successful AWS DMS and Debezium migration rate.

Operational lessons and best practices

Over the course of building and operating this framework across 120 database clusters, the CRED team encountered several lessons that shaped their approach.

Test rollback, not just switchover

During a production upgrade, an application compatibility issue emerged 20 minutes after switchover. Because the team had tested the rollback procedure and maintained reverse replication, they reverted to the blue environment in 2.5 minutes with zero data loss. This incident reinforced the importance of treating rollback as a first-class operation rather than an afterthought.

Instrument everything

Early in the journey, subtle performance degradation in a green environment went unnoticed until after switchover. The team now collects and validates hundreds of metrics before, during, and after every switchover. Critical metrics include replication lag at sub-second resolution, connection pool utilization and wait times, query latency percentiles (p50, p90, p95, p99), disk I/O patterns and saturation, CPU and memory utilization, active connection counts and churn rate, and database-specific indicators such as deadlocks and buffer cache hit ratios.

Application connection handling

Applications with poor connection management caused connection storms during switchover, overwhelming the green environment. The team now requires all dependent applications to implement exponential backoff on connection failures, connection pool health checks, circuit breaker patterns for database failures, and graceful degradation when database connectivity issues arise.

Parameter group drift

Green environments inadvertently provisioned with different parameter groups caused performance regressions that were not caught during testing. To prevent this, the automation now validates parameter group consistency between blue and green environments, manages parameter groups as code through infrastructure as code tooling, generates parameter group diff reports before switchover, and alerts on any configuration drift.

Cost monitoring during migrations

Running parallel blue and green environments for extended periods during large migrations increased AWS costs. The team addressed this by integrating with AWS Cost Explorer for cost visibility, setting budget alerts specific to migration operations, automating blue environment decommissioning after validation, and scheduling migrations to minimize the duration of duplicate resources.

Results

Before adopting blue/green deployments, a database change at CRED meant hours of planning, cross-team coordination, and carefully chosen maintenance windows that often spilled into weekends. Every operation carried risk, and even small changes demanded disproportionate effort.

Since adopting this framework, the team has automated MySQL switchovers end-to-end. Across their fleet, CRED achieved the following operational metrics:

2 minutes average switchover time (95th percentile: 2.1 minutes).
100 percent operation success rate.
Zero production incidents.
Zero data loss incidents.
85 percent reduction in operational effort.

Services now coordinate their own maintenance mode during switchovers, eliminating the need for manual intervention, cross-team war rooms, and weekend maintenance staff availability. What once took days of planning and execution now completes in hours.

The risk profile has also changed. With automated validation, built-in rollback, and predictable execution, database operations have gone from being one of the most failure-prone areas to delivering zero production incidents since adoption. Database changes no longer block releases or require lengthy approval processes.

Conclusion

In this post, we described how CRED built an automated orchestration framework around Amazon RDS Blue/Green Deployments to manage database engine upgrades, instance scaling, storage optimization, and CDC pipeline migration across 120 production clusters. By investing heavily in pre-deployment preparation, building intelligent maintenance mode coordination into their Amazon ECS services, and automating CDC checkpoint management for AWS DMS and Debezium, the team achieved an average switchover time of 2 minutes, a 100 percent operation success rate, and zero data loss incidents.

The key takeaways from CRED’s experience are that thorough preparation determines the outcome more than the switchover itself, rollback must be treated as a first-class operation, and instrumentation across hundreds of metrics is necessary to catch subtle regressions before they reach production.

To get started with blue/green deployments for your own database fleet, see Overview of Amazon RDS Blue/Green Deployments.

About the authors

Cross-account and cross-Region monitoring for Amazon RDS and Aurora with Database Insights

Haseena Shaik — Wed, 01 Jul 2026 15:55:11 +0000

This post shows you how to set up centralized cross-account and cross-Region monitoring for Amazon Relational Database Service (Amazon RDS) and Amazon Aurora databases using Amazon CloudWatch Database Insights. It covers all supported Amazon RDS engine types, including MySQL, PostgreSQL, SQL Server, Oracle, and MariaDB, as well as Amazon Aurora MySQL-Compatible and PostgreSQL-Compatible editions. Whether your databases are spread across two AWS accounts or ten, and across one Region or several, this walkthrough gives you a single monitoring account with visibility across your entire database fleet.

What Amazon CloudWatch Database Insights monitors

Amazon CloudWatch Database Insights is the next generation of database performance monitoring on AWS, building on the foundation of Performance Insights with a broader fleet-level view across your entire database fleet rather than one instance at a time. It tracks wait events, resource consumption, connection patterns, and query performance so you have the context to diagnose issues quickly. For supported engine types, see the introduction earlier in this post.

Database Insights captures:

Query performance and top SQL.
Wait events.
CPU, memory, storage, and network utilization.
Database connection counts.
Performance telemetry and logs.

This visibility helps you identify and troubleshoot database performance issues across linked accounts and configured AWS Regions from a single monitoring account.

When to use cross-account monitoring and its operational impact

Most teams running AWS at scale split workloads across separate accounts: production in one, staging in another, dev and shared services elsewhere. That structure makes sense for security and cost control. The problem shows up during troubleshooting, when correlating metrics across these environments means signing in to multiple accounts, switching Regions, and repeating the same steps before you can identify the root cause.

Global workloads make this harder still. If your databases run in us-east-1, eu-west-1, and ap-southeast-2, a single traffic spike can mean checking three Regions before you even know where to focus. Note that the Fleet Health dashboard supports up to three AWS Regions at once when cross-account and cross-Region monitoring is active. For large global setups, plan your monitoring layout with that limit in mind.

Cross-account Database Insights monitoring addresses this by routing visibility through a central monitoring account. That account gets read-only access to database metrics shared by your source accounts. It can view the shared monitoring data, but it can’t modify database resources in those accounts. This means observability only, with no configuration changes and no accidental deletions.

Two terms come up throughout this post, so it is worth defining them upfront:

Key terminology

Monitoring account – A central AWS account that views and interacts with observability data from multiple source accounts. It has read-only access and can’t modify or delete databases in source accounts.

Source account – An AWS account that contains Amazon RDS or Amazon Aurora database instances and generates observability data shared with the monitoring account.

With cross-account and cross-Region monitoring configured, operational teams can view Amazon RDS and Amazon Aurora database performance data from linked accounts and configured AWS Regions through a centralized monitoring account. This helps reduce operational overhead, simplifies troubleshooting workflows, and streamlines identification of performance anomalies without repeated console navigation or account switching. For example, an engineer investigating a slow query across three accounts and two Regions previously required six separate console sessions to gather metrics, wait events, and query data. With a centralized monitoring account, the same investigation runs from a single dashboard, reducing the number of console switches to one. For environments operating across multiple AWS accounts and Regions, this approach can meaningfully reduce mean time to identify database performance issues during incidents.

Prerequisites for cross-account monitoring

Before you start, check that the following are in place. Skipping these might cause the setup to fail partway through.

AWS Identity and Access Management (IAM) permissions

You need permissions to configure CloudWatch cross-account observability, create and manage IAM roles, create AWS CloudFormation stacks, and configure Amazon RDS and Amazon Aurora monitoring settings. See required IAM permissions for cross-account setup for full details on required permissions for cross-account monitoring.

AWS infrastructure

At least two AWS accounts: one monitoring account and one or more source accounts (see Key terminology earlier in this post for definitions).
Permissions to create and manage IAM roles across the monitoring and source accounts.
Amazon RDS or Aurora instances running supported engine versions. For the full list of supported engine versions, instance classes, and AWS Regions, see Amazon RDS for Database Insights and Amazon Aurora for Database Insights.
Database Insights enabled on the instances you want to monitor (either Standard mode or Advanced mode).

Note: Database Insights Advanced mode and Enhanced Monitoring incur additional charges. Review the Amazon CloudWatch pricing page before you enable these features.

CloudWatch configuration covered in this post

CloudWatch cross-account observability setup in monitoring account.
Cross-account cross-Region CloudWatch console access enabled.
Data sharing configured (Logs, Metrics, Traces, and Application Signals).
Appropriate permissions granted in the monitoring account (CloudWatch automatic dashboards and read-only access for Database Insights).

Optional (for advanced features)

Enhanced Monitoring enabled on your databases if you plan to use Advanced mode features like OS-level metrics.

Set up central monitoring

The monitoring account is where you spend most of your time after setup is complete. It’s the central account that aggregates read-only data from linked source accounts. Here’s how to configure it.

Open the Amazon CloudWatch console, select the AWS Region, and choose Settings in the left navigation pane.

This walkthrough uses the Asia Pacific (Sydney) ap-southeast-2 Region. The steps are identical for any other Region you want to configure.

In the Monitoring account configuration section, select Configure.

Configuring your monitoring account

Choose the data types to share with the monitoring account (for example, Logs, Metrics, and Traces and Application Signals).
Paste the source account IDs (SrcAcc1, SrcAcc2, SrcAcc3).
Choose Configure.

A confirmation message, “You have successfully enabled the monitoring account,” appears at the top of the settings page.

Before you link your source account, either download the AWS CloudFormation template, copy the URL, or collect information from the configured monitoring account. This includes the monitoring account sink Amazon Resource Name (ARN), which you use in source account linking.

There are two main ways to link source accounts. You can use AWS Organizations to link accounts in bulk, or link individual accounts one at a time. This walkthrough covers the individual account approach.

On the Resource to link accounts page, you have the following options:

Option 1: The quickest is to download the AWS CloudFormation template or copy the generated URL and share it with whoever manages your source accounts. Both options automate the linking steps on the source side.
Option 2 (manual): If you want to configure manually, expand the Configuration details section and copy the Monitoring account sink ARN and save it. You need this when linking the source account.

Linking the source account to the monitoring account

After you link the source account, enable cross-account cross-Region functionality. Use the CloudWatch console to set up a monitoring account to view cross-account CloudWatch data.

The Amazon CloudWatch cross-Region console setting is global, which means you only need to configure it once per account. It applies across configured Regions automatically after you save.

Under View cross-account cross-Region choose Configure to share your CloudWatch data.

Enabling cross-account and cross-Region monitoring

Choose one of the following options based on your use case and choose Save changes. View cross-account cross-Region is now enabled.

Note: After completing the monitoring account setup, open Database Insights from the Amazon CloudWatch console. In the left panel under Database Views, expand the Filters section and toggle on Enable cross-account cross-Region mode. For Select Region, choose the AWS Regions that you want to monitor. You can select up to three Regions simultaneously.

Configure source accounts

Run through the following steps in each source account that holds the Amazon RDS or Aurora databases that you want visible from the monitoring account.

From the source account, open the Amazon CloudWatch console, select the AWS Region where your Amazon RDS or Aurora databases reside, and choose Settings in the left navigation pane.
Open the Source account configuration section and choose Configure.

Setting up data sharing in the source account

Choose the data to share with the monitoring account (for example, Logs, Metrics, and Traces).
(Optional) To be more granular, filter the Logs and Metrics that you want to share with the monitoring account.
Either enter the monitoring account sink ARN, or use the AWS CloudFormation template or URL that you copied earlier when you configured the monitoring account. Then define a label to identify the source account. In the monitoring account, this label is displayed with data from that source account. The account label appears in charts and search experiences to help you identify account contexts.

Entering monitoring account details

After you choose Link, confirm that the monitoring account receives shared data from the source account. Confirm this action by entering “Confirm” in the confirmation dialog box.

The status changes to Linked in the Source account configuration section of the settings page.

Repeat the steps to add other source accounts from the same Region.

Note: If you have resources across AWS Regions, repeat the steps to configure both monitoring and source in that Region.

After you link the source account, enable cross-account cross-Region functionality. This step tells CloudWatch in the source account to make the configured data available to the monitoring account. Open the Amazon CloudWatch console in the source account and go to Settings.

Under Enable account switching select Configure to share your CloudWatch data.

Enabling account switching for source accounts

For Sharing, choose Specific accounts.
Enter the IDs of the accounts that you want to share data with (the monitoring account ID).
Choose the Permissions accordingly.
Choose Launch AWS CloudFormation template.
In the confirmation screen, enter “Confirm” and choose Launch template.
Select the I acknowledge checkbox.
Choose Create stack.

After the stack is created, the monitoring account has read-only access to the shared CloudWatch data in this source account. Behind the scenes, this deploys an IAM role that permits the monitoring account to view data without being able to make changes. If you later need to extend sharing to an entire AWS organization instead of individual accounts, you can modify that IAM role accordingly.

Sharing CloudWatch data with an entire AWS organization

Note: Cross-Region support is built into the feature automatically. You don’t need to perform additional configuration steps to display metrics from different Regions side by side on the same graph or dashboard. The one exception is alarms. You can’t create an alarm in one Region that monitors a metric in a different Region.

To integrate cross-account functionality with AWS Organizations, see the AWS Organizations integration guidance for CloudWatch cross-account observability.

If you face issues related to “access denied” or “don’t see an account list in the Amazon CloudWatch console,” refer to Troubleshooting your CloudWatch cross-account setup.

One thing to keep in mind for multi-Region setups: the monitoring account configuration has to be active in every AWS Region where you want to see data. The same goes for source accounts. Complete the linking and sharing steps in each Region separately. If a Region is missing from either side, metrics from that Region won’t show up on the dashboard.

Validate the configuration

After the setup is complete, open the Database Insights dashboard in the monitoring account to confirm everything is working. Sign in to the monitoring account, open the Amazon CloudWatch console, and choose Database Insights. You should see database resources from your linked source accounts listed there.

In the Amazon CloudWatch console, the Database Insights dashboard provides a unified view of Amazon RDS and Amazon Aurora database resources across linked AWS accounts and Regions.

Dashboard highlights

Account selector – Switch between linked source accounts from the monitoring account.
Region filter – View database resources across selected AWS Regions.
Fleet Health summary – Identify healthy, warning, or high-utilization database instances.
Performance metrics – Monitor key database metrics such as:
- Database load (average active sessions).
- Top SQL queries.
- Wait events.
- CPU, memory, and I/O utilization.
Database resource view – View Amazon RDS and Amazon Aurora instances from multiple accounts in a single centralized dashboard.

Note: The monitoring account provides read-only visibility into linked source account database resources.

Database Insights Fleet Health dashboard

Advanced Database Insights unlocks the Fleet Health dashboard. It provides a centralized view of database health, database load, resource utilization, and database inventory across linked accounts and configured Regions.

Fleet Health dashboard overview

From the Fleet Health view, you can select an individual database instance to get a more detailed breakdown. You can review database load, wait events, slow queries, and resource metrics through a centralized monitoring view, without ever leaving the monitoring account.

Exploring database details in Database Insights

Clean up resources

If you don’t need cross-account monitoring, here’s how to remove the configuration cleanly without affecting Database Insights in the individual source accounts.

Note: This cleanup removes the cross-account monitoring infrastructure but doesn’t disable Database Insights or Enhanced Monitoring on your database instances. If you enabled Advanced Database Insights or Enhanced Monitoring, those features continue to incur charges until you disable them on each instance.

In the source account, open CloudWatch.
Choose Settings.
Unlink the monitoring account to stop data sharing.
Delete the CloudFormation stack created during setup to remove IAM roles and cross-account permissions.
In the monitoring account, open CloudWatch.
Choose Settings.
Disable cross-account cross-Region viewing if no longer required.
Under Monitoring account configuration, choose Manage.
Remove the sink configuration to fully clean up the monitoring account setup.
Remove unused source account IDs from the monitoring account configuration.
Verify cleanup by confirming that source accounts no longer appear on the dashboard, associated IAM roles are removed, and no residual AWS CloudFormation stacks remain.
If you enabled Advanced Database Insights and want to stop incurring charges, disable Database Insights on each Amazon RDS or Aurora instance: in the Amazon RDS console, select the instance, choose Modify, and set Database Insights to Disabled.

Important: Removing cross-account monitoring doesn’t disable Database Insights within individual source accounts. Only the centralized monitoring view is removed. If you enabled Advanced Database Insights, charges continue until you disable it on each instance.

Conclusion

In this post, you learned how to configure cross-account and cross-Region monitoring for Amazon RDS and Aurora using CloudWatch Database Insights. Instead of switching between accounts during an incident, your team can check fleet health, database load, top SQL, and wait events from one place.

The read-only access model means you get visibility into shared metrics without weakening security controls in linked source accounts. For teams managing databases across many accounts and Regions, that trade-off is worth the setup time.

If you’re running large fleets across several Regions, keep the Fleet Health dashboard three-Region limit in mind when planning your layout. To configure your monitoring account, sign in to the Amazon CloudWatch console and navigate to Settings. For full reference material, see the CloudWatch Database Insights documentation.

About the authors

Amazon RDS log analysis: natural language queries with Kiro and MCP

Sravan Kumar Gogana — Wed, 01 Jul 2026 15:53:58 +0000

Database administrators and developers face a persistent challenge: efficiently analyzing vast amounts of database logs to identify performance bottlenecks, security issues, and operational anomalies. Amazon Relational Database Service (Amazon RDS) generates multiple log types that provide critical insights into database performance, errors, and security events. However, traditional log analysis methods often require complex queries, deep knowledge of log formats, and time-consuming manual investigation through the AWS Management Console or Amazon CloudWatch Log Insights.

In this post, we demonstrate an approach to review RDS logs using Kiro, an AI-powered conversational assistant combined with the Model Context Protocol (MCP) server from awslabs.cloudwatch-mcp-server. This solution transforms log analysis from a technical, query-based process into a natural language conversation, delivering actionable insights instantly.

Understanding RDS log types

Amazon RDS supports multiple database engines, each generating specific log types that you can publish to Amazon CloudWatch for centralized monitoring and analysis. The following table shows the log types available for each engine.

Database engine	Log types
Amazon RDS for MySQL	Error Logs, Slow Query Logs, General Logs, Audit Logs, IAM Database Authentication Error Logs
Amazon Aurora MySQL-Compatible Edition	Error Logs, Slow Query Logs, General Logs, Audit Logs, IAM Database Authentication Error Logs
Amazon RDS for MariaDB	Error Logs, Slow Query Logs, General Logs, Audit Logs
Amazon RDS for SQL Server	Error Logs, Agent Logs
Amazon RDS for PostgreSQL	PostgreSQL log, upgrade Logs, IAM Database Authentication Error Logs
Amazon Aurora PostgreSQL-Compatible Edition	PostgreSQL log, upgrade Logs, IAM Database Authentication Error Logs
Amazon RDS for Oracle	Alert Logs, Trace Logs, Audit Logs, Listener Logs, Oracle Management Agent Logs
Amazon RDS for Db2	Diagnostic Logs, Notify Logs

Solution overview

This solution uses Kiro with MCP to enable natural language querying of Amazon RDS logs stored in Amazon CloudWatch. The architecture consists of three layers. Kiro CLI communicates with Amazon Bedrock (Anthropic’s Claude model) to interpret your natural language input. The CloudWatch MCP server (awslabs.cloudwatch-mcp-server) translates those interpreted requests into AWS API calls. Amazon CloudWatch returns the relevant RDS log data for analysis.

Kiro

Kiro is an AI-powered conversational assistant designed to assist developers and IT professionals, including DBAs, with tasks ranging from coding to troubleshooting. Available on the command-line interface (CLI), Visual Studio Code, JetBrains IDEs, and mobile apps, it supports natural language queries to generate code, optimize resources, and interact with AWS services. With Kiro you can:

Write and optimize SQL queries based on natural language prompts.
Analyze database schemas and generate documentation or diagrams.
Suggest performance optimization by accessing execution plans and metrics.
Integrate with AWS services such as Amazon CloudWatch, Amazon RDS for SQL Server/MySQL/PostgreSQL, or Amazon Aurora.

MCP

The Model Context Protocol (MCP) developed by Anthropic is an open protocol that standardizes how applications provide context to large language models (LLMs). MCP enables Kiro to connect to external data sources and tools through lightweight MCP servers. The Kiro CLI acts as an MCP client, facilitating communication between the LLM and the MCP server to Amazon CloudWatch, as shown in the following figure.

Prerequisites

Estimated time to complete: 30 minutes.

Before you begin, make sure you have the following:

An AWS account. Create one if you don’t already have one.
Access to an AWS account through the AWS Management Console and the AWS Command Line Interface (AWS CLI).
Kiro CLI installed on your system.
The uv utility from Astral or the GitHub README.
Python version 3.14 installed using the uv utility.
```
uv python install 3.14
```
A database instance with logs configured to stream to Amazon CloudWatch for monitoring and analysis.
Note: For more details, see Amazon CloudWatch Logs billing and cost.
This solution works with multiple AI models available through Amazon Bedrock. For this post, we use Anthropic’s models in Amazon Bedrock. For more information, see Add or remove access to Amazon Bedrock foundation models.

Configuring MCP servers

The following steps walk you through configuring the AWS MCP server for use with Kiro CLI.

Set up your AWS profile:

aws configure --profile Profile Name
AWS Access Key ID: [Your access key]
AWS Secret Access Key: [Your secret key]
Default region name: [your-region]
Default output format: json

Verify your Kiro CLI version is 1.9.x or later to use MCP features. You can check the Kiro CLI version by running kiro-cli --version in your terminal.

Configure the Amazon CloudWatch MCP server. Kiro CLI supports two levels of MCP configuration:

Global configuration: ~/.kiro/settings/mcp.json applies across all your projects.
Local configuration: .kiro/settings/mcp.json is specific to the current project.

Choose your preferred configuration level and add the following CloudWatch MCP server configuration to the corresponding mcp.json file. Replace the AWS_PROFILE and AWS_REGION placeholders with your specific AWS profile and Region.

{
    "mcpServers": {
        "awslabs.cloudwatch-mcp-server": {
            "autoApprove": [],
            "disabled": false,
            "command": "uvx",
            "args": [
                "awslabs.cloudwatch-mcp-server@latest"
            ],
            "env": {
                "AWS_PROFILE": "[The AWS Profile Name you used in step 1 for AWS access]",
                "AWS_REGION": "Add your AWS Region",
                "FASTMCP_LOG_LEVEL": "ERROR"
            },
            "transportType": "stdio"
        }
    }
}

Start the conversation with the command kiro-cli. For more information, see Using chat on the command line. Use the /tools command to view available tools and capabilities.

The following screenshot shows Kiro CLI in action, analyzing error logs for an Amazon RDS for MySQL instance.

The following example shows a sample prompt and the response from Kiro CLI analyzing RDS error logs:

> Analyze error logs for RDS instance mysql8032 from last 7 days and identify top
  5 recurring errors with remediation steps

Let me fetch the RDS error logs for your MySQL instance. First, I'll check the
available log files.

Running aws cli command (using tool: aws):

Service name: rds
Operation name: describe-db-log-files
Parameters:
- db-instance-identifier: "mysql8032"
- filename-contains: "error"
Region: us-east-1
Label: List RDS log files for mysql8032

Allow this action? Use 't' to trust (always allow) this tool for the session. [y/n/t]:

> t

- Completed in 4.52s

> Here's the full analysis of RDS instance mysql8032 over the last 7 days.

================================================================================
Top 5 Recurring Errors/Warnings - RDS Instance mysql8032
================================================================================

1. [MY-013068] / [MY-013287] - Deprecated validate_password plugin syntax (4 occurrences)

The syntax 'validate password plugin' is deprecated and will be removed in a
future release. Please use validate_password component instead.

This fires on every shutdown and startup cycle.

- Migrate from the legacy plugin to the component-based architecture:

  UNINSTALL PLUGIN validate_password;
  INSTALL COMPONENT 'file://component_validate_password';

- Update your RDS custom parameter group: set validate_password parameters using
  the component syntax (e.g., validate_password.policy instead of
  validate_password_policy).
- This will become a hard error in a future MySQL release.

================================================================================

2. [MY-000067] - Unknown Group Replication variables (4 occurrences)

unknown variable 'loose-group_replication_enforce_update_everywhere_checks=1'
unknown variable 'loose-group_replication_single_primary_mode=0'

These appear on every startup. The loose- prefix prevents a crash, but they
indicate stale config.

- If you're not using Group Replication (and RDS MySQL doesn't support it),
  remove these from your custom parameter group:
  loose-group_replication_enforce_update_everywhere_checks
  loose-group_replication_single_primary_mode

- These were likely carried over from a migration or template. Clean them up via
  the RDS console or CLI:

  aws rds modify-db-parameter-group --db-parameter-group-name <your-pg> \
    --parameters "ParameterName=loose-group_replication_enforce_update_everywhere_checks,ParameterValue=,ApplyMethod=pending-reboot"

================================================================================

3. [MY-013640] - Self-signed CA certificate (2 occurrences)

CA certificate (rootCAcerts/rds-metadata/server.pem is self-signed)
Fires on every startup.

- This is for RDS-managed TLS. RDS uses its own CA file inside. No action needed
  unless you're enforcing custom CA validation.
- If your clients need to trust the RDS CA, download the appropriate CA bundle
  and configure your connection strings with --ssl-ca.
- Severity: safe to ignore.

================================================================================

4. [MY-013898] - Deprecated --ssl-fips-mode (2 occurrences)

'--ssl-fips-mode' is deprecated and will be removed in a future release.

- This is set by the RDS platform internally, not by your parameter group.
  You cannot directly change it.
- AWS will update this in a future RDS engine version.
- If you're on an older minor version, consider upgrading to the latest
  MySQL 8.0.x patch where this may be resolved.

================================================================================

5. Auto-start after exceeding max stop duration - Forced thread close
   (1 occurrence each, but operationally significant)

DB instance is being started due to it exceeding the maximum allowed time
being stopped.
Closing close of thread / user 'rdsadmin'

The instance was stopped for 7+ days and RDS auto-started it. During the
restart cycle, an rdsadmin thread was forcibly closed.

- RDS auto-starts stopped instances after 7 days. If you need it stopped
  longer, automate re-stopping with a Lambda + EventBridge rule:

  import boto3
  def lambda_handler(event, context):
      boto3.client('rds').stop_db_instance(DBInstanceIdentifier='mysql8032')

- The forced thread close of rdsadmin is benign - it's RDS's internal
  management user being cleaned up during shutdown. No action needed.
- If this instance isn't needed, consider snapshotting and deleting it to
  avoid charges.

================================================================================
Summary
================================================================================

| # | Error Code              | Severity | Occurrences | Action Required                            |
|---|-------------------------|----------|-------------|--------------------------------------------|
| 1 | MY-013068/MY-013287     | Warning  | 4           | Yes - migrate to validate_password component |
| 2 | MY-000067               | Warning  | 4           | Yes - clean up parameter group             |
| 3 | MY-013640               | Info     | 2           | No - expected RDS behavior                 |
| 4 | MY-013898               | Info     | 2           | No - RDS platform managed                  |
| 5 | Auto-start + thread close | Notice | 1 each      | Optional - automate stop or delete instance |

The instance is otherwise healthy - no client-level errors, no InnoDB corruption,
and backups are completing successfully (daily backups running consistently from
May 30-31). The two actionable items are cleaning up the parameter group
(items 1 and 2).

Important: Validate MCP recommendations against your managed service’s supported configuration. Not every suggested action is applicable or executable within its scope.

Example prompts using Kiro CLI

The following are example prompts you can use with Kiro CLI to analyze different RDS log types:

Error log: Analyze error logs for RDS instance [instance name] in [region] from last 7 days and identify top 5 recurring errors with remediation steps.
Audit log: Review audit logs for RDS instance [instance name] in [region] and identify failed logins, unauthorized access, and privilege escalation events.
Slow query logs: Analyze slow query logs for RDS instance [instance name] in [region] exceeding 5 seconds in the last 1 hour.
General log: Examine general logs for RDS instance [instance name] in [region] from last 24 hours and analyze connection patterns and DDL statements.
Agent log: Analyze SQL Server Agent logs for RDS instance [instance name] in [region] and identify failed jobs with optimization recommendations.
Diagnostic log: Review diagnostic logs for RDS instance [instance name] in [region] and identify critical system events with performance tuning recommendations.
Notify log: Analyze notify logs for RDS instance [instance name] in [region] and summarize administrative notifications and operational events.
IAM authentication error log: Analyze IAM authentication error logs for RDS instance [instance name] in [region] and detect any error patterns with security recommendations.
Upgrade log: Review upgrade logs for RDS instance [instance name] in [region] and provide comprehensive upgrade success report with lessons learned.
Alert log: Analyze Oracle alert logs for RDS instance [instance name] in [region] and identify ORA errors with troubleshooting steps.
Trace log: Examine Oracle trace logs for RDS instance [instance name] in [region] and analyze wait events with query optimization recommendations.
Listener log: Analyze Oracle listener logs for RDS instance [instance name] in [region] and track connection attempts with network configuration recommendations.
Oracle Management Agent log: Analyze Oracle Management Agent logs for RDS instance [instance name] in [region] and identify communication failures with troubleshooting steps.

Best practices

The following best practices cover two areas: operational guidance on how to safely act on AI-generated findings, and security guidance on how to protect sensitive log data when using Kiro and MCP for RDS log analysis.

Operational

Do not run any mitigation without first understanding its impact. This is the most important rule in this workflow. Kiro can surface findings, summarize logs, and propose remediation steps, but you must evaluate any suggested action before applying it.

Before executing a change based on an AI-generated recommendation:

Understand what the change does. Read the command, query, or parameter change in full.
Validate the assumption behind the recommendation. AI-assisted analysis is pattern-based. The root cause it suggests may be reasonable but not correct for your environment.
Test in non-production first whenever the change is mutating (for example, parameter group updates, index rebuilds, terminate session commands, schema changes).
Keep a human in the loop for any production action. Automation is a multiplier, not a substitute for judgment, especially on stateful systems like databases.

Security

When you implement natural language queries for RDS log analysis using Kiro and MCP, you must address security considerations to protect sensitive data and maintain compliance. Database logs can contain personally identifiable information (PII), authentication credentials, IP addresses, query patterns revealing business logic, and schema information that could expose vulnerabilities. The current architecture downloads logs to local workstations through CLI calls, creating risks including data breach exposure, lack of audit trails, and potential violations of compliance frameworks such as HIPAA, PCI-DSS, and GDPR.

To mitigate these risks, implement the following security controls:

Infrastructure security

Deploy Kiro on Amazon Elastic Compute Cloud (Amazon EC2) instances within the same VPC as your RDS instances rather than local laptops.
Implement security groups restricting access to only necessary ports and IP ranges.

Access control

Use IAM roles with least privilege permissions instead of long-term access keys.
Grant only rds:DescribeDBInstances, logs:DescribeLogGroups, logs:DescribeLogStreams, logs:FilterLogEvents.
Implement resource-based policies limiting access to specific RDS instances and log groups.

Cleanup

Follow these steps to remove the resources and configurations created during this walkthrough.

Configuration cleanup:

Remove or comment out the MCP server configuration from the mcp.json file.
Delete the AWS profile configurations if you created them specifically for this setup.

Software cleanup:

Uninstall the CloudWatch MCP server: kiro-cli mcp remove cloudwatch.
Remove Kiro CLI if no longer needed.

File cleanup:

Remove configuration files (mcp.json, AWS credentials).
Delete any generated reports or analysis outputs.

Warning: Before deleting analysis outputs, ensure you have archived any reports needed for compliance, auditing, or historical reference. Once deleted, this data cannot be recovered. Consider backing up critical analysis results to Amazon Simple Storage Service (Amazon S3) before deletion.

Although this walkthrough uses an Amazon RDS for MySQL instance, the same approach applies to all RDS and Aurora engine types listed in the log types table.

Conclusion

In this post, you learned how to use Kiro to query Amazon RDS logs through Amazon CloudWatch using natural language. With this approach, you can discover resources (list all RDS log groups), perform time-based analysis (analyze logs between specific date ranges), investigate performance issues (identify slow queries exceeding configurable thresholds such as 10 seconds), diagnose database-specific problems (analyze SQL Server deadlocks), and generate comprehensive error reports for specific instances. This eliminates the need for complex CloudWatch Log Insights queries or manual console navigation, providing actionable insights through conversational commands that target specific instances, time periods, and performance criteria.

To get started, install the Kiro CLI and configure the CloudWatch MCP server. For more information about publishing RDS logs to Amazon CloudWatch, see Publishing database logs to Amazon CloudWatch Logs.

About the authors

Enable self-managed AD Kerberos authentication with Amazon RDS for Db2

Vikram Khatri — Tue, 30 Jun 2026 18:31:25 +0000

Amazon Relational Database Service (Amazon RDS) for Db2 supports Kerberos authentication through your own Active Directory domain, whether it runs on AWS, on-premises, or in another cloud. Connecting Amazon RDS for Db2 directly to a customer-managed Microsoft Active Directory gives you centralized authentication and single sign-on for your database users. The setup hinges on delegating a precise set of AD permissions to a dedicated service account and passing those credentials to RDS securely.

In this post, we show how to configure Windows Active Directory for Amazon RDS for Db2 with Kerberos authentication and how to validate the setup from a domain-joined client. We walk through the end-to-end process published in the aws-samples/sample-rds-db2-tools repository:

Creating a dedicated OU and service account.
Delegating the required AD permissions for Amazon RDS for Db2 to interact with your AD.
Storing credentials in an AWS Key Management Service (AWS KMS)-encrypted secret in AWS Secrets Manager.
Verifying the configuration with a PowerShell script that translates raw GUIDs into human-readable permission names.

We then show how to validate the configuration from a domain-joined Db2 client:

launching an EC2 instance.
joining it to the domain.
installing the Db2 Runtime Client.
connecting with a Kerberos ticket (no password).

We also cover one non-obvious step that catches most people: granting servicePrincipalName read/write on User objects requires ADSI Edit (adsiedit.msc), not the standard Active Directory Users and Computers snap-in.

Solution overview

The following diagram shows the architecture for authenticating Amazon RDS for Db2 against a customer-managed Active Directory.

In this design, RDS for Db2 joins your self-managed AD domain directly. A dedicated AD service account user id and password is stored in AWS Secrets Manager and encrypted with an AWS KMS key. During the domain join, RDS retrieves these credentials from Secrets Manager to register the instance in your directory. A domain-joined Db2 client obtains a Kerberos ticket (TGT) from the AD KDC, then connects to RDS for Db2 using that ticket with no password exchanged. Connectivity between RDS, the client, and the domain controllers relies on the standard AD ports — DNS (53), Kerberos (88, 464), LDAP (389, 3268), and the RPC dynamic range (49152–65535).

What you will build

A dedicated OU and AD service account scoped to RDS for Db2 and a delegated ACLs with the permissions RDS for Db2 requires.
An AWS KMS key and Secrets Manager secret to pass credentials to RDS securely.
An Amazon RDS for Db2 instance configured with self-managed AD variables.
A tested connection using an AL2023 EC2 domain-joined client.

1. Delegate AD permissions

Create a dedicated OU and service account, then grant the necessary access control entries (ACEs) using the Delegation of Control Wizard and ADSI Edit:

Create/Delete User and Computer objects in the OU.
Reset Password, Read/Write msDS-SupportedEncryptionTypes on descendant User objects (through the Delegation Wizard).
Read/Write servicePrincipalName on descendant User objects — use ADSI Edit (adsiedit.msc).

Or run the provided PowerShell script which applies all nine ACEs in one idempotent pass:

.\Grant-ADDomainJoinPrivileges.ps1 `
    -ServiceAccount "CORP\rdsdb2svc" `
    -TargetOU "OU=RDSDb2,DC=company,DC=com" `
    -Verbose

Verify with the included human-readable ACL viewer script:

.\Show-OUDelegation.ps1 -OU "OU=RDSDb2,DC=company,DC=com" -SamAccountName rdsdb2svc

Replace ServiceAccount name with your preferred name or keep it as rdsdb2svc. In TargetOU, change OU name to your preferred name or keep it as RDSDb2 but you must change the DC names to match the domain name of your AD. For example: "OU=RDSDb2,DC=corp,DC=mycompany,DC=com".

The repository documents two equivalent methods for this delegation: a step-by-step UI walkthrough using the ADUC Delegation of Control Wizard and ADSI Edit, and the PowerShell script shown above. Both produce the same ACL, so pick whichever fits your operations model. Before you start, read the repository README to understand the full step-by-step flow end to end, then follow the detailed UI walkthrough (or the PowerShell guide) for the delegation itself.

Full walkthrough in the repo: README-UI.md

2. Create a KMS key and Secrets Manager secret

Create a symmetric KMS key and store the service account credentials as a Secrets Manager secret with a resource policy that allows RDS to read it:

aws kms create-key --key-usage ENCRYPT_DECRYPT --key-spec SYMMETRIC_DEFAULT ...

aws secretsmanager create-secret --name "rds-db2-self-managed-ad-secret" ...

Full walkthrough in the repo: README-KMS-Secret.md

3. Store the service account credentials in AWS Secrets Manager

Store the AD service account username (sAMAccountName only — no DOMAIN\ prefix) and password as a Secrets Manager secret, encrypted with the KMS key from the previous step, and attach a resource policy that lets RDS read it. RDS for Db2 retrieves this secret during the domain join:

aws secretsmanager create-secret --name "rds-db2-self-managed-ad-secret" ...

Full UI and CLI instructions are in the repo: README-KMS-Secret.md

4. Configure the RDS for Db2 instance

Modify or create the DB instance, supplying the Secret ARN in the Directory section. For CLI:

aws rds modify-db-instance \
--db-instance-identifier your-db-instance \
--domain-fqdn "company.com" \
--domain-ou "OU=RDSDb2,DC=company,DC=com" \
--domain-auth-secret-arn "arn:aws:secretsmanager:..." \
--domain-dns-ips "<dc-ip-1>" "<dc-ip-2>" \
...

Full walkthrough in the repo: README-RDS-Db2.md

5. Test the connection from a domain-joined Db2 client

Launch an Amazon Linux 2023 EC2 instance in the same VPC, join it to your AD domain, and install the Db2 Runtime Client:

Step 1 – Join to AD

Installs realmd, sssd, adcli, and krb5-workstation:

curl -sL https://bit.ly/domainjoin | bash
source joindomain.sh

Step 2 – Install Db2 RT client

REGION=<region>
./db2-driver.sh

Step 3 – Configure DSN entries

Auto-detects domain join, writes both local-auth and Kerberos DSNs:

sudo su - db2inst1
REGION=<region> source db2client-configure.sh

Step 4 – Connect

kinit user@COMPANY.COM
db2 terminate
db2 "connect to RDSAKS"  # Kerberos, no password
db2 "connect to RDSAS user admin using '$MASTER_USER_PASSWORD'"  # local auth

Full walkthrough in the repo: README-Db2-Client.md

Important points

ADSI Edit, not ADUC, for SPN on User objects. The ADUC snap-in filters servicePrincipalName from the User objects attribute list. ADSI Edit shows the full schema. This is the most common stumbling block.
Scope matters. Granting SPN read/write on Computer objects instead of User objects produces an ACL that looks correct but fails at runtime. Amazon RDS for Db2 checks permissions on the User object it creates, not a Computer object.
Verify with PowerShell, not the UI. The ADUC Security tab collapses attribute-level ACEs into blank rows. The provided Show-OUDelegation.ps1 script resolves GUIDs to names and makes the ACL readable.
Username format in Secrets Manager. The secret must contain the sAMAccountName only — no DOMAIN\ prefix. Including the domain prefix causes instance creation to fail.

Networking

Multiple networking scenarios exist between AD and RDS. Refer to this guide for the following three network topologies:

AD and RDS in the same VPC.
AD hosted in Microsoft Azure (VPN or AWS Direct Connect + ExpressRoute).
AD in a different VPC or AWS account (VPC Peering or AWS Transit Gateway).

The key ports are DNS (53), Kerberos (88, 464), LDAP (389, 3268), and RPC dynamic ports (49152–65535). Missing the RPC range is the most common cause of intermittent failures after a successful initial join.

Clean up

To avoid ongoing charges and to remove the principals this walkthrough created in your Active Directory, tear the resources down in the reverse order you created them:

Terminate the EC2 Db2 client — leave the domain (realm leave) so it removes its own computer object, then terminate the instance and delete any dedicated IAM instance profile/role.
Remove self-managed AD from RDS for Db2 — either detach the domain with aws rds modify-db-instance --disable-domain (and reboot) to keep the instance, or delete the instance with aws rds delete-db-instance if you no longer need it.
Delete the Secrets Manager secret holding the service account credentials.
Schedule the KMS key for deletion and remove its alias. Do not delete the key while any instance still uses it for storage encryption, or its snapshots become unrecoverable.
Remove the AD objects — delete the service account and the dedicated OU (recursively, to clear any computer/user objects RDS created during domain joins), or reset the OU ACL if you want to keep the OU but drop the delegated permissions.

Step-by-step commands for each of these, including console and CLI options and verification checks, are in the cleanup guide in the GitHub repository.

Conclusion

In this post, we showed how to enable Kerberos authentication for Amazon RDS for Db2 using a self-managed Active Directory. We delegated the exact nine AD permissions to a dedicated service account, passed its credentials to RDS securely through a KMS-encrypted Secrets Manager secret, joined the DB instance to the domain, and verified the result end-to-end from a domain-joined Db2 client connecting with a Kerberos ticket and no password. Along the way we called out the steps that most commonly trip people up: using ADSI Edit for the servicePrincipalName ACE, scoping permissions to User objects, and opening the RPC dynamic port range.

The PowerShell scripts, CLI commands, and verification tooling referenced throughout are available in the aws-samples/sample-rds-db2-tools GitHub repository. Clone the repo, adapt the variables to your domain, and you can have centralized, single sign-on authentication for your RDS for Db2 workloads running quickly. To learn more, see Kerberos authentication for Amazon RDS for Db2 in the Amazon RDS User Guide. If you have questions or suggestions, leave a comment.

About the authors

Manage long-running transactions for AWS DMS performance

Chirantan Pandya — Tue, 30 Jun 2026 16:29:54 +0000

In this post, we show you how long-running transactions affect AWS Database Migration Service (AWS DMS) change data capture (CDC) latency, walk through monitoring approaches for Oracle, PostgreSQL, MySQL, and SQL Server, and provide ready-to-use scripts to identify and resolve problematic transactions before they impact your replication performance.

Long-running transactions are one of the most common and often overlooked causes of rising CDC source latency during AWS Database Migration Service (AWS DMS) migrations. When a database transaction stays open for hours instead of seconds, DMS must hold its replication position, buffering all subsequent changes until that transaction commits. The result is a cascading delay across all pending changes that can stall your migration pipeline right when low latency matters most, during your final cutover window. At that point, every minute of replication delay translates directly to extended downtime and higher migration risk.

By detecting these transactions early, you can avoid unnecessary replication instance scaling, reduce cutover downtime, and keep migration on schedule.

DMS source latency

When working with AWS DMS, source latency (CDCLatencySource) is a key metric. It measures the delay between the commit time of the last event captured from the source endpoint and the current system timestamp of the replication instance. In other words, it measures how far behind the replication instance is from the latest committed change at the source. Lower latency indicates healthier replication performance.

Several factors contribute to elevated source latency: an uncommitted transaction prevents DMS from applying/forwarding changes to the target, network throughput limitations between source and replication instance, source database performance bottlenecks, and heavy workload during peak processing times. These issues often interconnect, compounding their impact on replication performance.

To troubleshoot effectively, analyze latency patterns over time at hourly, daily, and weekly intervals. This helps you distinguish between temporary spikes during expected peak periods and systematic issues that indicate underlying problems.

The path to resolving source latency isn’t always straightforward, but it typically involves investigating six key areas:

High database workload
Network connectivity issues
Insufficient system resources
Large transaction volumes
Long-running transactions
Archival log processing delays

Each of these factors requires a different troubleshooting approach. Pinpointing the exact root cause before initiating remediation is essential for an effective fix. For the scope of this post, we focus on long-running transactions.

Long-running transactions

Long-running transactions are database operations that remain open for minutes or even hours. In AWS DMS, they can cause significant replication delays.

When a transaction stays open, DMS must retain all transaction log entries from the point that transaction began. For example, if a transaction opens at 9:00 AM and does not commit until 11:00 AM, DMS preserves two hours of log data. This increases storage consumption and memory usage on the replication instance.

The open transaction also blocks DMS from advancing its log-reading position. All changes that commit after the open transaction must wait in queue, causing CDC source latency to climb steadily. In high-transaction environments, a single long-running transaction can create a backlog that takes hours to clear after the transaction finally completes.

Common causes include complex queries processing large datasets, batch operations that run longer than expected, application code that holds transactions open unnecessarily, and missing commit or rollback statements. Each cause calls for a different troubleshooting approach.

To reduce the impact, set up automated monitoring to detect open transactions that exceed a defined threshold. Keep transaction duration as short as possible by optimizing application code and committing frequently. Avoid user interaction within open transactions. Also make sure the replication instance has sufficient storage and computing resources to handle temporary backlogs.

Some long-running transactions are unavoidable for certain business operations. The goal is not to eliminate them entirely but to detect them early and prevent them from stalling your replication pipeline.

How long-running transactions impact DMS source latency

Long-running transactions directly impact your AWS DMS CDC source latency. Identifying them is the first step to troubleshooting performance issues.

How DMS processes transactions

When DMS operates in CDC mode, it captures changes from the source database’s transaction logs (such as Oracle redo logs, SQL Server transaction logs, PostgreSQL WAL, or MySQL binlogs). However, DMS can only capture and replicate committed transactions to maintain data consistency.

A bottleneck forms because DMS cannot skip past the open transaction:

CDCLatencySource increases from seconds to potentially hours.
CDCIncomingChanges accumulates.
CDCChangesMemorySource or CDCChangesDiskSource increases as DMS buffers pending changes.

For instance, consider a batch job that starts a transaction at 10:00 AM and runs for 2 hours without committing. Any changes captured during that window cannot be applied to the target until the transaction commits at 12:00 PM. The DMS replication task is the unit of work responsible for reading the source transaction log and applying changes to the target. During this period, the task buffers all subsequent changes it has read, regardless of whether those changes involve related tables. It maintains transactional ordering within its scope, so no captured changes can be applied until the open transaction completes at noon. This creates a significant replication delay that grows proportionally with the transaction’s duration.

Diagnostic approach

Identifying long-running transactions helps you differentiate between DMS configuration issues and database-level problems when diagnosing source latency. This identification process follows a systematic approach:

Initial investigation: Monitor CDCLatencySource to identify when replication is falling behind, then investigate replication instance metrics and source database open transactions to determine the root cause.
Root cause analysis: Use monitoring scripts to identify active long-running transactions and correlate transaction start times with CDCLatencySource spikes. Track specific identifiers such as Oracle SCN (such as 12345678) or SQL Server LSN (such as 0000000A:00000B00:0001).
Targeted solutions: Once identified, you can request a commit or rollback from the application team, optimize the logic to use smaller transaction batches, or implement transaction timeout policies.

Without identifying these long-running transactions, teams often waste time tuning DMS task settings (like MemoryLimitTotal or parallel threads) or scaling up replication instances unnecessarily, when the actual solution requires addressing the source database’s transaction management practices. The monitoring scripts thus serve as a diagnostic bridge between observing DMS performance symptoms and pinpointing their database-level causes.

Queries to identify long-running transactions

To proactively manage this issue, implementing alerts for long-running transactions is essential. Here are the monitoring queries for each database engine:

For Oracle source engine:

SELECT
'SESSION_INFO: ' ||
RPAD('SID: ' || a.sid || ' - ' || a.serial#, 20) ||
RPAD('Username: ' || a.username, 20) ||
RPAD('OS User: ' || a.osuser, 15) ||
RPAD('Program: ' || a.program, 30) ||
RPAD('Machine: ' || a.machine, 25) ||
RPAD('Status: ' || a.status, 15) ||
'Duration: ' || ROUND((SYSDATE - a.logon_time)*24*60, 2) || ' minutes' || CHR(10) ||
'SQL_ID: ' || NVL(a.sql_id, 'N/A') || CHR(10) ||
'WAIT_INFO: ' || NVL(w.event, 'None') || CHR(10) ||
'----------------------------------------'
FROM
v$session a
LEFT JOIN v$session_wait w ON a.sid = w.sid
WHERE
a.type != 'BACKGROUND'
AND a.username != 'RDSADMIN'
AND a.status = 'ACTIVE'
AND (SYSDATE - a.logon_time)*24*60 > 15
ORDER BY
(SYSDATE - a.logon_time) DESC;

For PostgreSQL source engine:

SELECT
pid,
usename,
datname,
client_addr,
state,
ROUND(EXTRACT(EPOCH FROM (now() - xact_start))/60, 2) AS minutes_running,
query_id,
wait_event_type,
wait_event
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
AND EXTRACT(EPOCH FROM (now() - xact_start))/60 > 15
AND pid != pg_backend_pid()
ORDER BY xact_start ASC;

For MySQL source engine:

SELECT
CONCAT('PID: ', p.id,
' | User: ', p.user, '@', p.host,
' | DB: ', IFNULL(p.db, 'None'),
' | Duration: ', p.time, 's',
' | State: ', IFNULL(p.state, 'N/A'),
' | Locked Rows: ', IFNULL(t.trx_rows_locked, 0),
' | Modified Rows: ', IFNULL(t.trx_rows_modified, 0),
' | Query: PID_', p.id)
FROM information_schema.processlist p
LEFT JOIN information_schema.innodb_trx t ON p.id = t.trx_mysql_thread_id
WHERE p.time > 900 -- 15 minutes (THRESHOLD_MINUTES * 60)
AND p.command != 'Sleep'
AND p.command != 'Daemon'
AND p.user NOT IN ('event_scheduler', 'rdsadmin', 'system user')
ORDER BY p.time DESC;

For SQL Server source engine:

SELECT
s.session_id as [SessionID],
db_name(r.database_id) as [Database],
s.login_name as [Login],
s.host_name as [Host],
s.program_name as [Program],
CAST(DATEDIFF(MINUTE, s.last_request_start_time, GETDATE()) as VARCHAR) as [MinutesRunning],
r.status as [Status],
r.command as [Command],
r.wait_type as [WaitType],
CAST(r.percent_complete as VARCHAR) + '%' as [PercentComplete],
r.session_id as [SessionRef]
FROM sys.dm_exec_sessions s
INNER JOIN sys.dm_exec_requests r ON s.session_id = r.session_id
WHERE s.is_user_process = 1
AND DATEDIFF(MINUTE, s.last_request_start_time, GETDATE()) > 15
ORDER BY [MinutesRunning] DESC

Monitoring scripts

We’ve developed monitoring scripts for Oracle, PostgreSQL, MySQL, and SQL Server. These scripts alert when long-running transactions stall DMS replication. They detect and alert on long-running transactions (default: 15 minutes). Each script is designed for its specific database engine while following a consistent monitoring framework.

The full scripts are available on GitHub: sample-dms-long-running-transaction-detection.

Infrastructure requirements

Linux / Unix environment (bash shell required)
Appropriate database client tools (sqlplus, psql, mysql, or sqlcmd)
An Amazon Simple Notification Service (Amazon SNS) topic for receiving alert notifications
AWS Command Line Interface (AWS CLI) configured with IAM permissions for SNS publish
Proper database access permissions
Network connectivity from the monitoring host to the database endpoint

Common features across all scripts

All four scripts share a consistent design pattern with the following features:

Feature	Description
Interactive Prompts	All connection parameters are collected through interactive prompts at runtime. Credentials are securely retrieved from AWS Secrets Manager
TLS/SSL Encryption	All database connections are encrypted by default: TCPS for Oracle, `-N` flag for SQL Server, `sslmode=require` for PostgreSQL, `--ssl-mode=REQUIRED` for MySQL
AMAZON SNS Alerts	Alerts are published to a configurable AMAZON SNS topic, enabling distribution to multiple subscribers (email, SMS, AWS Lambda, and more)
Prerequisite Checks	Each script validates that required tools (AWS CLI, database client) are installed before proceeding
False Positive Filtering	Internal/system processes and client warnings are filtered out to prevent false alerts

Core functions

Each script implements these core functions:

collect_inputs(): Interactive prompt-based parameter collection.
validate_inputs(): Input validation and prerequisite checks.
check_long_sessions() / check_long_transactions(): Core monitoring query execution.
send_alert(): SNS alert publishing with formatted message.
log_message(): Timestamped logging to file.
build_*_connstr() / build_*_args(): TLS/SSL connection string construction.
main(): Orchestration with configuration summary and log directory setup.

Script details by engine

Oracle long-running session monitor

This bash script automatically detects Oracle database sessions running longer than the configured threshold and publishes alerts to Amazon SNS. It connects using the TCPS protocol for TLS/SSL encryption, with optional Oracle Wallet support for certificate-based authentication.

Usage:

bash monitor_long_running_sessions_ORCL.sh

Interactive prompts:

============================================
Oracle Long Running Session Monitor
============================================
AWS Region [us-east-1]:
SNS Topic ARN: arn:aws:sns:us-east-1:123456789012:MyTopic
Oracle SID: ORCL
DB Hostname: mydb.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com
DB Port (TCPS) [2484]:
DB Username: admin
DB Password:
Threshold in minutes [15]:
Configuration:
OS User: ec2-user
Region: us-east-1
Host: mydb.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com:2484 (TCPS/SSL)
SID: ORCL
Threshold: 15 min
Log: /home/ec2-user/Oracle_long_running_session/long_sessions.log
Starting monitoring...

TLS/SSL configuration: The script uses TCPS protocol by default (port 2484). For Oracle Wallet-based SSL, set the following environment variables before running:

export ORACLE_WALLET_DIR=/path/to/wallet
export ORACLE_TNS_ADMIN=/path/to/tns_admin
export ORACLE_SSL_SERVER_DN="CN=mydb.rds.amazonaws.com"

Key filters: Excludes BACKGROUND processes and the RDSADMIN user. Only monitors ACTIVE sessions.

PostgreSQL long-running transaction monitor

This bash script automatically detects PostgreSQL transactions exceeding the configured threshold and publishes alerts to Amazon SNS. It enforces TLS/SSL via the sslmode connection parameter.

Usage:

bash monitor_long_running_transactions_PG.sh

Interactive prompts:

============================================
PostgreSQL Long Running TX Monitor
============================================
AWS Region [us-east-1]:
SNS Topic ARN: arn:aws:sns:us-east-1:123456789012:MyTopic
PostgreSQL hostname: mydb.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com
PostgreSQL port [5432]:
Database name [postgres]:
DB Username: admin
DB Password:
Threshold in minutes [15]:
Configuration:
OS User: ec2-user
Region: us-east-1
Host: mydb.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com:5432 (SSL: require)
Database: postgres
Threshold: 15 min
Log: /home/ec2-user/Postgres_long_running_transaction/long_transactions.log
Starting monitoring...

TLS/SSL configuration: Default sslmode=require. For stricter verification, set environment variables:

export PG_SSLMODE=verify-full
export PG_SSLROOTCERT=/path/to/rds-combined-ca-bundle.pem

False positive prevention: The script separates stderr from stdout to prevent psql client warnings (such as libpq.so version messages) from being treated as transaction data.

MySQL long-running transaction monitor

This bash script automatically detects MySQL transactions running longer than the configured threshold, providing detailed reports and publishing alerts to Amazon SNS. It enforces TLS/SSL via the --ssl-mode parameter.

Usage:

bash monitor_long_running_transactions_MYSQL.sh

Interactive prompts:

============================================
MySQL Long Running TX Monitor
============================================
AWS Region [us-east-1]:
SNS Topic ARN: arn:aws:sns:us-east-1:123456789012:MyTopic
MySQL hostname: mydb.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com
MySQL port [3306]:
Database name [mysql]:
DB Username: admin
DB Password:
Threshold in minutes [15]:
Configuration:
OS User: ec2-user
Region: us-east-1
Host: mydb.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com:3306 (SSL: REQUIRED)
Database: mysql
Threshold: 15 min
Log: /home/ec2-user/MySQL_long_running_transaction/mysql_monitor.log
Starting monitoring...

TLS/SSL configuration: Default --ssl-mode=REQUIRED. For stricter verification with the RDS CA bundle:

export MYSQL_SSL_MODE=VERIFY_IDENTITY
export MYSQL_SSL_CA=/path/to/rds-combined-ca-bundle.pem

False positive prevention: The script filters out MySQL internal processes (event_scheduler, rdsadmin, system user, Daemon commands) and strips the mysql: [Warning] Using a password on the command line interface can be insecure warning from output.

SQL Server long-running transaction monitor

This bash script automatically detects SQL Server transactions exceeding the configured threshold and publishes alerts to Amazon SNS. It enforces TLS/SSL via the -N flag with sqlcmd.

Usage:

bash monitor_long_running_transactions_MSS.sh

Interactive prompts:

============================================
SQL Server Long Running TX Monitor
============================================
AWS Region [us-east-1]:
SNS Topic ARN: arn:aws:sns:us-east-1:123456789012:MyTopic
SQL Server hostname: mydb.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com
SQL Server port [1433]:
Database name [master]:
DB Username: admin
DB Password:
Threshold in minutes [15]:
Configuration:
OS User: ec2-user
Region: us-east-1
Server: mydb.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com:1433 (TLS/SSL)
Database: master
Threshold: 15 min
Log: /home/ec2-user/sql-server-long-running-transction/long_transactions.log
Starting monitoring...

TLS/SSL configuration: The script uses -N (encrypt) and -C (trust server certificate) by default. For strict certificate verification, install the RDS CA bundle into the system trust store and set:

# Install RDS CA bundle into system trust store
wget https://truststore.pki.rds.amazonaws.com/global/global-bundle.pem -O /tmp/rds-global-bundle.pem
sudo cp /tmp/rds-global-bundle.pem /etc/pki/ca-trust/source/anchors/rds-global-bundle.pem
sudo update-ca-trust extract
# Then disable trust server cert
export SSL_TRUST_SERVER_CERT=no

False positive prevention: Filters to is_user_process = 1 only and strips (N rows affected) noise from sqlcmd output. SQL text is truncated to 200 characters.

When a long-running transaction is detected, the script publishes a formatted alert to the configured SNS topic. Here is an example alert from the Oracle monitor:

Oracle Long Running Session Alert
================================
Database: ORCL
Host: mydb.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com
Port: 2484 (TCPS/SSL)
Threshold: 15 minutes
Connection: TLS/SSL Encrypted (TCPS)
Time: Fri Mar 20 18:00:00 UTC 2026
Long Running Sessions:
---------------------
SESSION_INFO: SID: 142 - 12345
Username: APPUSER
OS User: oracle
Program: JDBC Thin Client
Machine: app-server-01
Status: ACTIVE
Duration: 45.23 minutes
SQL_ID: 4qx7m2k5bv1rz
WAIT_INFO: db file sequential read
----------------------------------------
Note: Please investigate these sessions if they are causing any performance issues.

All subscribers to the SNS topic (email, SMS, Lambda, and more) receive this alert, so the operations team can respond quickly.

Sample log output

# Oracle
2026-03-20 18:00:00 - Starting Oracle long-running session check (TLS/SSL: TCPS port=2484)
2026-03-20 18:00:05 - Long running sessions detected and alert published to SNS
2026-03-20 18:00:05 - Alert published to SNS topic: arn:aws:sns:us-east-1:123456789012:MyTopic
2026-03-20 18:00:06 - Completed Oracle long-running session check
# PostgreSQL
2026-03-20 18:00:00 - Starting PostgreSQL long-running transaction check (TLS/SSL: sslmode=require)
2026-03-20 18:00:03 - No long running transactions detected
2026-03-20 18:00:03 - Completed PostgreSQL long-running transaction check
# MySQL
2026-03-20 18:00:00 - Starting MySQL long-running transaction check (TLS/SSL: ssl-mode=REQUIRED)
2026-03-20 18:00:02 - No long running transactions detected
2026-03-20 18:00:02 - Completed MySQL long-running transaction check
# SQL Server
2026-03-20 18:00:00 - Starting SQL Server long-running transaction check (TLS/SSL enabled)
2026-03-20 18:00:04 - No long running transactions detected
2026-03-20 18:00:04 - Completed SQL Server long-running transaction check

Scheduling with crontab

For continuous monitoring, configure the scripts as cron jobs. Since the scripts use interactive prompts, you can create a wrapper script that pipes the inputs, or modify the scripts to retrieve credentials from AWS Secrets Manager. Alternatively, for crontab usage, you can set the variables directly in a non-interactive wrapper:

# Example crontab entry (runs every 15 minutes)
# Edit with: crontab -e
*/15 * * * * /home/ec2-user/run_oracle_monitor.sh >> /home/ec2-user/cron_oracle_monitor.log 2>&1
*/15 * * * * /home/ec2-user/run_pg_monitor.sh >> /home/ec2-user/cron_pg_monitor.log 2>&1
*/15 * * * * /home/ec2-user/run_mysql_monitor.sh >> /home/ec2-user/cron_mysql_monitor.log 2>&1
*/15 * * * * /home/ec2-user/run_mss_monitor.sh >> /home/ec2-user/cron_mss_monitor.log 2>&1

Note: The scripts use bash-specific syntax (such as, [[ =~ ]] for regex matching). Always invoke them with bash script_name.sh rather than sh script_name.sh.

Demonstration

We are going to perform a demo of the script in action by choosing Oracle as our source endpoint and check if the script is indeed able to capture the long-running sessions. For the demo we have used Amazon Relational Database Service (Amazon RDS) for Oracle 19c Enterprise Edition as source:

Create a full load and CDC task to load the EVENTS_LOB table. Table structure:

CREATE TABLE EVENTS_LOB(
EVENTID INTEGER NOT NULL,
SOURCE VARCHAR2(500),
TYPE VARCHAR2(500),
EVENT_DATE DATE,
DESCRIPTION CLOB
);

Schedule the monitoring script in crontab to execute every 15 minutes:

[ec2-user@ip-10-0-0-40 ~]$ crontab -l
*/15 * * * * /home/ec2-user/run_oracle_monitor.sh >> /home/ec2-user/cron_oracle_monitor.log 2>&1

Start the DMS task and have it completed the full load and move on to the CDC phase of the migration task.

Generate high volume of DML in the source with commit and wait for the latency spike to happen:

BEGIN
FOR loop_counter IN 1..80500 LOOP
INSERT INTO "EVENTS_LOB" (EVENTID, SOURCE, TYPE, EVENT_DATE, DESCRIPTION)
VALUES (
loop_counter,
'Field 2 This is dummy data meant for testing purpose alone.',
'Field 3 This is dummy data meant for testing purpose alone.',
TO_DATE(TRUNC(DBMS_RANDOM.VALUE(
TO_CHAR(DATE '2000-01-01','J'),
TO_CHAR(DATE '9999-12-31','J'))), 'J'),
'Field 5 This is dummy data meant for testing purpose alone. Donot drop this table.'
);
DBMS_SESSION.SLEEP(5);
END LOOP;
COMMIT;
END;
/

Observe the DMS task Amazon CloudWatch metrics. You will see an increase in CDCLatencySource as the long-running transaction holds up replication.
Execute the monitoring script. The script detects the long-running session and publishes an alert to the SNS topic with full session details including SID, username, duration, SQL text, and wait events.

SNS subscribers receive the alert. All subscribers to the configured SNS topic (email, SMS, Lambda, and more) receive the formatted alert, so the operations team can investigate and take action. A sample alert is included in the following section for reference:

Oracle Long Running Session Alert
================================
Database: ORCL
Host: database-4.XXXXXXXX.us-east-1.rds.amazonaws.com
Port: 2484 (TCPS/SSL)
Threshold: 2 minutes
Connection: TLS/SSL Encrypted (TCPS)
Time: Tue Jun 9 02:32:05 UTC 2026
Long Running Sessions:
---------------------
SESSION_INFO: SID: 123 - 45678 Username: ADMIN OS User: rdsdb Program: sqlplus@ip-10-0-1-100 Machine: ip-10-0-1-100.ec2 Status: ACTIVE Duration: 3.5 minutes
SQL_ID: 8xk2m3nq9pqrs
WAIT_INFO: SQL*Net message from client
----------------------------------------
Note: Please investigate these sessions if they are causing any performance issues.

Security considerations

The monitoring scripts incorporate several security best practices:

No command-line credentials: All connection parameters are collected via interactive prompts, preventing password exposure in ps output, shell history, or process listings.
TLS/SSL encryption: All database connections are encrypted by default across all four engines.
IAM-based alerting: Alerts use Amazon SNS with IAM authentication, so no email credentials are stored in the scripts.
Password: Credentials are collected via silent interactive prompts (read -s), preventing exposure in shell history or process listings. For automated scheduling, the scripts support --secret-id to retrieve credentials directly from AWS Secrets Manager, eliminating plaintext passwords entirely.
SQL ID only: For security, scripts report only SQL_ID/query_id references, not actual SQL text. Use these identifiers to look up queries in database monitoring views if needed.
Temporary file cleanup: All temporary SQL files and stderr capture files are cleaned up after use.

Conclusion

Long-running transactions can increase your CDC source latency from seconds to hours and cause replication delays that grow until the transaction commits. The monitoring scripts provided for Oracle, PostgreSQL, MySQL, and SQL Server offer a proactive solution to identify these problematic transactions before they become critical issues. As demonstrated in our Oracle example, these tools effectively detect and alert administrators about long-running sessions, enabling timely intervention and optimization.

The scripts enforce TLS/SSL encryption on all database connections, use Amazon SNS for scalable alert distribution, and include false-positive filtering to make sure alerts are actionable. By implementing this monitoring suite, you transform from reactive troubleshooting to proactive management. This post helps you identify potential latency issues before they impact database operations or AWS DMS replication, ensuring smooth, efficient database operations across all your platforms.

About the authors

User authentication and session management with Amazon Aurora DSQL

Chaitanya chary Chatlapally — Tue, 30 Jun 2026 16:28:38 +0000

User authentication and session management are foundational to most backend applications. They demand strong data consistency, low-latency reads, high availability, and secure credential handling. When you build these services on traditional relational databases, you often need to provision instances, configure read replicas, schedule maintenance windows, manage database credentials, and plan capacity for unpredictable login surges. Authentication flows can also be sensitive to replication lag: a user registers and immediately tries to log in, but the account isn’t available yet because the write hasn’t propagated.

Amazon Aurora DSQL, a serverless, PostgreSQL-compatible distributed SQL database, addresses these challenges with strong read-after-write consistency, automatic scaling, and no infrastructure to manage. It also uses AWS Identity and Access Management (AWS IAM)-based authentication for database connections, which streamlines authentication and eliminates security risks associated with traditional user-generated passwords.

In this post, you learn how to design and implement a user authentication service with session management on Amazon Aurora DSQL. You see the full request flow from client to database and back, explore the design considerations specific to Amazon Aurora DSQL, and discover practical lessons from building and testing against a live cluster.

Architecture overview

The following diagram illustrates the architecture of the authentication service.

Architecture diagram of the authentication service on AWS. A client sends HTTPS requests to Amazon ECS Express Mode running in the AWS Cloud. Amazon ECS connects to Amazon Aurora DSQL through an IAM-authenticated connection. AWS IAM provides the task role to Amazon ECS and validates the token presented to Amazon Aurora DSQL.

The service runs on Amazon Elastic Container Service (Amazon ECS) using Express Mode deployment. Express Mode is a deployment mode for Amazon ECS that provisions a production-ready AWS Fargate service with automatic scaling, built-in load balancing, and networking through simplified APIs. Amazon ECS Express Mode works well with Amazon Aurora DSQL because both are serverless: you don’t manage instances for either.

The architecture uses three AWS services, each chosen for a specific reason:

Amazon ECS Express Mode handles the compute layer. It deploys your containerized Node.js application on AWS Fargate with automatic scaling, built-in load balancing, and zero server management.
Amazon Aurora DSQL handles the data layer. It stores user credentials and session records with strong read-after-write consistency, scales automatically, and authenticates connections through IAM rather than database passwords.
AWS IAM handles the security layer. It controls which compute resources can connect to the database using short-lived tokens. The Amazon Aurora DSQL connector generates these tokens automatically from the Amazon ECS task role, so no credentials are stored in code or configuration.

Inside the AWS Fargate task, the Express.js application is structured in layers. Routes handle HTTP requests, middleware validates input and authenticates sessions, services contain business logic (password hashing, token generation), and repositories interact with Amazon Aurora DSQL through parameterized queries. All write paths go through an Optimistic Concurrency Control (OCC) retry wrapper, and all connections go through the Amazon Aurora DSQL connector, which handles IAM token generation and refresh.

The full request flow:

Client sends an HTTPS request (register, login, or authenticated API call).
Amazon ECS Express Mode routes the request through its load balancer to the Express.js application running on AWS Fargate.
Application validates the request, processes business logic (password hashing, session token generation).
Amazon Aurora DSQL stores and retrieves data with strong consistency via a connection authenticated through IAM.
Response returns to the client immediately, with no replication lag or eventual consistency delay.

How Amazon Aurora DSQL simplifies authentication workloads

With Amazon Aurora DSQL, you get several capabilities that simplify building an authentication service:

Zero database infrastructure

There is no need to size instances, provision storage volumes, or configure read replicas. To get started, create a cluster in the Amazon Aurora DSQL console, get an endpoint, and start writing SQL. The database scales automatically based on your workload. For authentication services that start small and grow unpredictably, this removes capacity planning.

Strong consistency by default

When a user registers, their account is immediately available for login. A new session is valid for the very next request. And a revoked session is rejected right away. There’s no eventual consistency window and no replication lag between writes and reads. For authentication flows where timing matters, this is critical. Amazon Aurora DSQL provides this by default with no configuration needed. For more information, see What is Amazon Aurora DSQL?

PostgreSQL compatibility

You can use familiar tools like psql, popular drivers like node-postgres, and existing SQL knowledge with Amazon Aurora DSQL, which supports the PostgreSQL wire protocol. You use familiar SQL. There isn’t a new query language to learn. The official @aws/aurora-dsql-node-postgres-connector extends node-postgres with automatic IAM token handling, so connecting takes minimal setup.

IAM-based authentication

With Amazon Aurora DSQL, you use short-lived IAM authentication tokens instead of traditional database passwords. There aren’t database credentials to store or rotate. The Amazon Aurora DSQL connector generates and refreshes tokens automatically. When running on Amazon ECS Express Mode with a task role, your application connects to the database without secrets in environment variables or configuration files.

Here’s how the IAM chain works in this architecture:

First, create an IAM role with dsql:DbConnect permission on your Amazon Aurora DSQL cluster.
Assign that role to the Amazon ECS Express Mode service as its task role.
At runtime, the Amazon Aurora DSQL connector automatically uses the Amazon ECS task role credentials to generate short-lived auth tokens.
Amazon Aurora DSQL validates the token against IAM before accepting the connection.

The IAM role handles access control for both the compute layer (Amazon ECS Express Mode) and the database layer (Amazon Aurora DSQL), so there’s no database password anywhere in this flow. See Production deployment for the full configuration, including the custom database role the runtime connects as.

No maintenance windows

Schema changes run asynchronously in the background. CREATE INDEX ASYNC builds indexes without blocking reads or writes. There’s no need to schedule downtime for database maintenance. For an authentication service that needs to be highly available, this is a significant operational benefit.

Designing the data model

The authentication service needs two tables: users and sessions. The distributed architecture of Amazon Aurora DSQL introduces a few design considerations that differ from standard PostgreSQL.

Users table

CREATE TABLE IF NOT EXISTS users (
  id UUID PRIMARY KEY,
  email VARCHAR(254) NOT NULL UNIQUE,
  password_hash VARCHAR(72) NOT NULL,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

Sessions table

CREATE TABLE IF NOT EXISTS sessions (
  id UUID PRIMARY KEY,
  user_id UUID NOT NULL,
  token_hash VARCHAR(64) NOT NULL UNIQUE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
  expires_at TIMESTAMPTZ NOT NULL,
  revoked_at TIMESTAMPTZ,
  client_metadata TEXT
);

The PRIMARY KEY and UNIQUE constraints in the two preceding CREATE TABLE statements each create a backing index automatically, for example, sessions_token_hash_key for token_hash. These inline indexes are part of CREATE TABLE itself and are immediately valid. They do not go through the async job manager described later. The async behavior described in the No maintenance windows section applies only to standalone CREATE INDEX ASYNC statements, including the case where you add a UNIQUE constraint to an existing populated table later via ALTER TABLE … ADD CONSTRAINT.

Design considerations for Amazon Aurora DSQL

Universally Unique Identifiers (UUIDs) as primary keys. This implementation uses UUIDs generated in the application layer using crypto.randomUUID(). Random UUIDs distribute writes evenly across the storage partitions in Amazon Aurora DSQL, which helps performance at scale. Amazon Aurora DSQL also supports sequences and identity columns if you prefer monotonically increasing keys. For more information, see Working with primary keys in Amazon Aurora DSQL.

Application-level referential integrity. Amazon Aurora DSQL doesn’t support foreign keys. You enforce the relationship between sessions.user_id and users.id in application code within the same transaction. This is a common pattern in distributed databases and gives you full control over error messages and behavior.

Token hashing. The token_hash column stores a SHA-256 hash of the session token. Your application returns the plaintext token to the client exactly once at creation time and doesn’t persist it. Even if the database is subject to unintended access, the stored hashes provide a layer of protection against impersonation.

Indexes

Amazon Aurora DSQL creates secondary indexes asynchronously, which means they build in the background without blocking reads or writes:

CREATE INDEX ASYNC IF NOT EXISTS idx_sessions_user_id ON sessions (user_id);

The sessions.token_hash column already has a UNIQUE constraint declared in the preceding CREATE TABLE statement, which produces a backing unique index named sessions_token_hash_key. Adding an explicit CREATE INDEX ASYNC on the same column would create a second index, doubling the write cost on every INSERT INTO sessions and consuming an entry from the table’s index quota for no benefit. Use the constraint and skip the explicit index.

Inline indexes from UNIQUE and PRIMARY KEY constraints are created as part of CREATE TABLE itself and are immediately valid. They don’t go through the async job manager. Only standalone CREATE INDEX ASYNC statements do. This matters when you add a UNIQUE constraint to an existing populated table later (ALTER TABLE … ADD CONSTRAINT), where the latency profile is different from CREATE INDEX ASYNC.

Database migrations

Each Data Definition Language (DDL) statement runs in its own transaction. This is an Amazon Aurora DSQL requirement, and it means your migration code runs each CREATE TABLE and CREATE INDEX separately. For more information, see DDL and distributed transactions in Amazon Aurora DSQL.

for (const sql of DDL_STATEMENTS) {
  const client = await pool.connect();
  try {
    await client.query('BEGIN');
    await client.query(sql);
    await client.query('COMMIT');
  } catch (error) {
    await client.query('ROLLBACK');
    throw error;
  } finally {
    client.release();
  }
}

CREATE INDEX ASYNC returns a job_id synchronously while the index continues building in the background. If your migration script finishes and the application starts serving traffic before the index becomes valid, the affected lookups (sessions.user_id here) fall back to a sequential scan for a few seconds to a few minutes. To make migrations deterministic (useful in continuous integration (CI) pipelines or when migrating large existing tables), wait for the job to finish before returning:

// CREATE INDEX ASYNC returns a row containing the job_id of the build job.
const result = await client.query(
  'CREATE INDEX ASYNC IF NOT EXISTS idx_sessions_user_id ON sessions (user_id)',
);
const jobId = result.rows[0]?.job_id;

if (jobId) {
  // Block until the async index is VALID before continuing.
  await client.query('SELECT sys.wait_for_job($1)', [jobId]);
}

See Asynchronous indexes for the sys.jobs and sys.wait_for_job(job_id) reference.

Connecting to Amazon Aurora DSQL

The official Amazon Aurora DSQL connector simplifies connecting. It handles IAM token generation and refresh automatically, extracts the AWS Region from the cluster hostname, and configures the PostgreSQL driver with verify-full Transport Layer Security (TLS) by default, so the application validates the server certificate against the public Amazon root certificate authority (CA) without any extra setup.

import { AuroraDSQLPool } from '@aws/aurora-dsql-node-postgres-connector';

const pool = new AuroraDSQLPool({
  host: process.env.DSQL_ENDPOINT,
  user: 'app_runtime', // non-admin database role; see Production deployment
  database: 'postgres',
  max: 10,
  idleTimeoutMillis: 300_000,
  maxLifetimeSeconds: 3300, // recycle each connection just before the 1-hour DSQL connection cap
});

That’s the entire connection setup. No password, no TLS certificate management, no credential rotation logic. The connector handles it. Amazon Aurora DSQL distinguishes admin and non-admin connections through separate IAM actions (dsql:DbConnectAdmin and dsql:DbConnect). The preceding example connects as app_runtime, the non-admin runtime role you create in the Production deployment section. Use admin only for one-off setup steps such as creating tables, creating the runtime role, and granting privileges. Amazon Aurora DSQL caps any single connection at one hour. Setting maxLifetimeSeconds to 3,300 (55 minutes) lets the pool retire connections proactively, so a request never lands on a connection that the cluster is about to close. See Amazon Aurora DSQL Connector for node-postgres for the full connection options.

Full authentication flow

This section walks through each stage of the authentication flow, from registration to session revocation, with the application code for each step.

Registration

A user registers with an email and password. Your application hashes the password with bcrypt in the application layer (Amazon Aurora DSQL doesn’t support the pgcrypto extension), generates a UUID, and inserts the user record:

const passwordHash = await bcrypt.hash(password, 10);
const id = crypto.randomUUID();

await client.query(
  `INSERT INTO users (id, email, password_hash, created_at)
   VALUES ($1, $2, $3, NOW())`,
  [id, email, passwordHash],
);

With strong consistency from Amazon Aurora DSQL, the user is immediately available for login after registration. The data is available immediately. There’s no replication delay.

Your application looks up the user by email, verifies the password, generates a cryptographically secure session token, and stores its SHA-256 hash:

const token = crypto.randomBytes(32).toString('hex');
const tokenHash = crypto.createHash('sha256').update(token).digest('hex');

await client.query(
  `INSERT INTO sessions (id, user_id, token_hash, created_at, expires_at)
   VALUES ($1, $2, $3, NOW(), $4)`,
  [sessionId, userId, tokenHash, expiresAt],
);

Your application returns the plaintext token to the client once. Your application stores only the hash.

Session validation

On each authenticated request, your application hashes the incoming token and looks it up:

const tokenHash = crypto.createHash('sha256').update(incomingToken).digest('hex');
const session = await client.query(
  `SELECT user_id, expires_at, revoked_at FROM sessions WHERE token_hash = $1`,
  [tokenHash],
);

The strong consistency feature of Amazon Aurora DSQL means a session created during login is immediately available for validation on the very next request.

Session revocation

Revoking a session sets a revoked_at timestamp. Subsequent validation checks reject it:

UPDATE sessions SET revoked_at = NOW() WHERE id = $1 AND revoked_at IS NULL

For bulk revocation (“log out everywhere”), your application batches updates to stay within the 3,000-row transaction limit in Amazon Aurora DSQL. A naive UPDATE sessions SET revoked_at = NOW() WHERE user_id = $1 works fine for typical users with a handful of sessions, but a power user with thousands of active sessions hits the 3,000-row cap and the whole transaction aborts. Filter inside a LIMIT and loop until you stop revoking rows. The revoked_at IS NULL predicate makes each iteration naturally idempotent and resumable, so a partial completion picks up where it left off:

const BATCH_SIZE = 3000;

while (true) {
  const result = await client.query(
    `UPDATE sessions
        SET revoked_at = NOW()
      WHERE id IN (
        SELECT id FROM sessions
         WHERE user_id = $1
           AND revoked_at IS NULL
         LIMIT $2
      )`,
    [userId, BATCH_SIZE],
  );

  if ((result.rowCount ?? 0) < BATCH_SIZE) break;
}

Handling concurrent writes without locks

You get high throughput and no deadlocks with Amazon Aurora DSQL, which uses Optimistic Concurrency Control instead of traditional locking. OCC lets transactions proceed without acquiring locks, and Amazon Aurora DSQL detects conflicts at commit time.

When a conflict occurs, the database returns a serialization error (SQLSTATE 40001). Amazon Aurora DSQL distinguishes two flavors under that SQLSTATE: OC000 for data conflicts (two transactions wrote the same row) and OC001 for schema conflicts (a concurrent DDL change, for example an async index becoming valid mid-transaction). For application code the handling is identical (both are transient and both should be retried), and the connector’s built-in retry helper covers both.

Rather than write your own retry loop, use the OCC retry helper that ships with the @aws/aurora-dsql-node-postgres-connector. AuroraDSQLPool.transaction(callback) runs the callback inside a transaction and retries on serialization errors with exponential backoff and jitter. The callback must be idempotent, it may be invoked multiple times if a conflict is detected at commit time.

import { AuroraDSQLPool } from '@aws/aurora-dsql-node-postgres-connector';

const pool = new AuroraDSQLPool({ /* …config… */ });

// pool.transaction wraps the callback in BEGIN/COMMIT and retries on
// SQLSTATE 40001 with exponential backoff. No custom retry plumbing needed.
await pool.transaction(async (client) => {
  await client.query(
    'UPDATE sessions SET revoked_at = NOW() WHERE id = $1',
    [sessionId],
  );
});

Under the hood the helper does what you’d expect, catches 40001, sleeps with exponential backoff, and re-runs the callback up to a configurable maximum:

for (let attempt = 0; attempt <= maxRetries; attempt++) {
  try {
    return await operation();
  } catch (error) {
    if (!isSerializationError(error)) throw error;
    if (attempt < maxRetries) {
      await sleep(baseDelayMs * Math.pow(2, attempt) + Math.random() * baseDelayMs);
    }
  }
}

For authentication workloads, OC000 data conflicts are rare because different users write to different rows. OC001 schema conflicts can fire transiently across a multi-task ECS deployment whenever a background async index finishes promoting. In both cases the retry wrapper provides a fallback, not a frequent code path.

Security considerations

The service implements several security best practices:

User enumeration prevention. Login returns the same generic error for both “email not found” and “wrong password.”
Token hashing. The service stores session tokens as SHA-256 hashes. The stored hashes don’t reveal usable tokens in an unintended access event.
bcrypt with unique salts. Each password hash uses a unique random salt.
No internal error details. The API returns generic error messages for unhandled exceptions.
IAM database authentication. No database passwords in configuration files or environment variables.

Try it yourself

You can deploy a working proof-of-concept implementing everything described in this post. The source code is available on GitHub.

API endpoints

Method	Endpoint	Description
POST	/api/auth/register	Register a new user
POST	/api/auth/login	Authenticate and receive a session token
GET	/api/auth/me	Retrieve the authenticated user’s profile
GET	/api/sessions	List active sessions
DELETE	/api/sessions/:sessionId	Revoke a specific session
DELETE	/api/sessions	Revoke sessions (optionally exclude current)

Prerequisites

An AWS account with the AmazonAuroraDSQLConsoleFullAccess managed policy.
Node.js 20+ and npm.
AWS Command Line Interface (AWS CLI) configured with valid credentials.

Setting up

Create an Amazon Aurora DSQL cluster following the Getting started with Amazon Aurora DSQL guide, then:

git clone https://github.com/aws-samples/aurora-dsql-samples.git
cd aurora-dsql-samples/sample-amazon-aurora-dsql-auth-session-mgmt
npm install

export DSQL_ENDPOINT="your-cluster-id.dsql.us-east-1.on.aws"
npm run build
npm start

You don’t need to set AWS_REGION separately for the connector, the Amazon Aurora DSQL connector parses the Region out of the cluster hostname (the us-east-1 in the preceding example). If you do set AWS_REGION or pass an explicit region field to the pool, that value takes precedence.

Testing

curl -s -X POST http://localhost:3000/api/auth/register \
  -H "Content-Type: application/json" \
  -d '{"email": "alice@example.com", "password": "secureP@ss1"}' | jq

curl -s -X POST http://localhost:3000/api/auth/login \
  -H "Content-Type: application/json" \
  -d '{"email": "alice@example.com", "password": "secureP@ss1"}' | jq

Retrieve the authenticated user’s profile (use the token from the login response):

curl -s http://localhost:3000/api/auth/me \
  -H "Authorization: Bearer <token>" | jq

List active sessions:

curl -s http://localhost:3000/api/sessions \
  -H "Authorization: Bearer <token>" | jq

Revoke a session (use the session ID from the list response), then verify the token is immediately rejected. This demonstrates the strong consistency of Amazon Aurora DSQL: the revocation is visible on the very next request with no replication delay.

curl -s -X DELETE http://localhost:3000/api/sessions/<session-id> \
  -H "Authorization: Bearer <token>" | jq

curl -s http://localhost:3000/api/auth/me \
  -H "Authorization: Bearer <token>" | jq
# Returns: 401 Invalid session

You can verify the data in the Amazon Aurora DSQL Query Editor:

SELECT id, email, created_at FROM users;
SELECT id, user_id, created_at, expires_at, revoked_at FROM sessions;

Production deployment

For production, deploy the Express.js application on Amazon ECS Express Mode and connect to Amazon Aurora DSQL as a dedicated, least-privilege database role, not the cluster’s admin role. The admin role is unmodifiable, exists for cluster bootstrapping (creating other roles, granting AWS IAM access, schema management), and gives the runtime far more authority than an authentication service needs. The Amazon Aurora DSQL Database roles and IAM authentication docs recommend a custom database role for any production application.

Create a runtime database role once and grant it only the privileges this service uses on users and sessions. Run these statements as admin from a one-off setup script, they are not part of the application’s regular DDL migration:

-- Create a login role for the application runtime.
CREATE ROLE app_runtime WITH LOGIN;

-- Map an IAM principal (the ECS task role's ARN) to that database role.
AWS IAM GRANT app_runtime TO 'arn:aws:iam::111122223333:role/auth-service-task-role';

-- Grant only the privileges the runtime needs. DELETE on sessions lets
-- the same role run the housekeeping job (next section) that purges
-- expired and long-revoked rows. If you prefer stricter separation,
-- split into two roles: app_runtime with SELECT/INSERT/UPDATE only,
-- and app_housekeeping with SELECT/DELETE only, then run housekeeping
-- under separate credentials.
GRANT SELECT, INSERT, UPDATE ON users     TO app_runtime;
GRANT SELECT, INSERT, UPDATE, DELETE ON sessions TO app_runtime;

Run these statements after the application has run its DDL migrations once (so users and sessions exist), since GRANT … ON <table> requires the table to exist.

Then connect the application as app_runtime instead of admin, and attach a task-role IAM policy that grants only dsql:DbConnect (not dsql:DbConnectAdmin):

const pool = new AuroraDSQLPool({
  host: process.env.DSQL_ENDPOINT,
  user: 'app_runtime', // custom least-privilege role, not admin
  database: 'postgres',
  max: 10,
  idleTimeoutMillis: 300_000,
  maxLifetimeSeconds: 3300,
});

Continue using admin (and dsql:DbConnectAdmin) for the one-off setup steps in this post, creating tables, creating the app_runtime role, granting privileges. The runtime path stays narrow.

To roll the role back, revoke its table privileges and the IAM mapping before dropping it. Otherwise DROP ROLE fails with 2BP01 cannot be dropped because some objects depend on it. The table grants and the IAM mapping are independent dependencies. Both must be removed:

REVOKE ALL ON users    FROM app_runtime;
REVOKE ALL ON sessions FROM app_runtime;
AWS IAM REVOKE app_runtime FROM 'arn:aws:iam::111122223333:role/auth-service-task-role';
DROP ROLE app_runtime;

Housekeeping for revoked and expired sessions

Nothing in this post deletes expired or long-revoked rows. If you leave it that way, sessions grows unboundedly and every token_hash lookup pays for the larger index. For a production auth service you need a periodic purge job. Because Amazon Aurora DSQL caps a single transaction at 3,000 rows, the cleanup is a batched, idempotent loop:

const BATCH_SIZE = 3000;

while (true) {
  const result = await client.query(
    `DELETE FROM sessions
      WHERE id IN (
        SELECT id FROM sessions
         WHERE expires_at < NOW() - INTERVAL '30 days'
            OR (revoked_at IS NOT NULL
                AND revoked_at < NOW() - INTERVAL '30 days')
         LIMIT $1
      )`,
    [BATCH_SIZE],
  );

  if ((result.rowCount ?? 0) < BATCH_SIZE) break;
}

Run this on a schedule (a cron job, an Amazon EventBridge-triggered AWS Lambda function, or an Amazon ECS scheduled task). The 30-day grace window lets you debug auth issues against recently revoked tokens without leaking active access. Adjust the threshold to match your retention policy.

Cleaning up

Warning: These cleanup steps permanently delete all user data, sessions, and database resources. This action cannot be undone. Make sure you have backed up any data you need to retain before proceeding.

To avoid incurring charges, delete the resources you created:

Stop the application.

Drop the tables in the Amazon Aurora DSQL Query Editor:

DROP TABLE IF EXISTS sessions;
DROP TABLE IF EXISTS users;

If you deployed to Amazon ECS Express Mode for production, delete the Amazon ECS service in the Amazon ECS console to avoid ongoing charges.
In the IAM console, open the role attached to your Amazon ECS task (or local development credentials).
Detach the inline or managed policy that grants dsql:DbConnect from the role (or dsql:DbConnectAdmin if you only used the setup admin role).
Delete the role to avoid leaving orphaned permissions.
In the Amazon Aurora DSQL console, disable deletion protection for your cluster.
Delete the Amazon Aurora DSQL cluster.

For more information, see Managing Amazon Aurora DSQL clusters.

Conclusion

In this post, we showed how you can simplify building an authentication service with session management using Amazon Aurora DSQL. The strong consistency feature reduces timing issues in registration and login flows. The serverless architecture removes capacity planning and maintenance overhead. Authentication based on IAM removes database passwords from the system. And PostgreSQL compatibility means you can use familiar SQL, drivers, and tools without learning a new query language.

The design considerations specific to Amazon Aurora DSQL, including application-level referential integrity, async index creation, and OCC retry logic, are practical to implement and apply to transactional workloads on Amazon Aurora DSQL.

The full source code for this implementation is available on GitHub. To get started with Amazon Aurora DSQL, visit the Amazon Aurora DSQL overview page or refer to the user guide.

About the authors

Announcing Valkey 9.1 for Amazon ElastiCache

Mas Kubo — Mon, 29 Jun 2026 21:10:32 +0000

Amazon ElastiCache now supports Valkey 9.1, bringing the latest community-driven innovations from the Valkey open source project to customers running latency-sensitive, high-throughput, and operationally demanding in-memory workloads on ElastiCache. In this post, we discuss how Valkey 9.1 helps you get more throughput and memory efficiency from demanding workloads while providing stronger isolation for multi-tenant and shared-cluster deployments. We also cover new commands that simplify common application and operational workflows, new observability features that give operators better visibility into engine behavior, and how ElastiCache continues to deliver the latest Valkey open source innovations in a fully managed service.

The open source Valkey community developed Valkey 9.1, with contributions focused on security, observability, performance, efficiency, and tooling. For deeper technical details on the upstream release, see the Valkey 9.1 community announcement.

Improve price-performance at scale

The Valkey project’s focus on price-performance is critical for customers operating large-scale ElastiCache deployments, where even modest improvements in throughput, latency, or memory efficiency can translate into meaningful savings. Samsung Electronics, for example, achieved about 30% infrastructure cost savings by upgrading to ElastiCache for Valkey, while maintaining the performance, reliability, and developer experience required for its global services. Valkey 9.1 continues this focus with enhancements that help customers process more requests per node, store more data in the same footprint, and operate more predictably as workloads scale.

For throughput-bound workloads, higher per-node performance can help customers absorb traffic growth on existing infrastructure, delay scaling events, or reduce the number of nodes required to serve the same request volume. Valkey 9.1 includes a redesigned I/O threading communication model that improves throughput across a variety of workloads. In upstream Valkey benchmark testing, Valkey 9.1 achieved up to 2.1 million requests per second on a single server using 512-byte payloads, nine I/O threads, and a pipeline depth of 10 commands. You can explore the full results and compare across versions on the Valkey Performance Dashboards. It also includes I/O threading enhancements that improved throughput by up to 17% in upstream tested workloads.

Valkey 9.1 also improves performance for common high-throughput access patterns. These include faster stream range reads with XRANGE and XREVRANGE, higher-throughput string reads for caching workloads, faster sorted set queries for leaderboards and schedulers, and cached COMMAND responses that reduce client initialization overhead.

For memory-bound workloads, better memory efficiency helps customers store more data on the same node size. It can also reduce memory pressure without requiring changes to application behavior. Valkey 9.1 reduces memory usage for STRINGS under 128 bytes by up to 20% and sorted set memory usage by up to 10%. These savings are especially valuable for workloads that store large numbers of small cached values, rankings, schedules, or rate-limit state.

Valkey 9.1 also improves internal rehashing behavior to reduce latency impact during keyspace growth. It additionally pauses unnecessary hash table resizing during bulk delete operations such as SREM, ZREM, and HDEL. Together, these enhancements help customers improve throughput, reduce memory pressure, and increase operational predictability while getting more value from their ElastiCache infrastructure. For businesses, this can mean lower infrastructure costs, more headroom for traffic growth, and greater confidence during periods of peak demand. For end users, it helps deliver more responsive and reliable applications, whether they are loading personalized experiences, viewing real-time rankings, processing events, or interacting with latency-sensitive AI-powered features.

Fine-grained access control for multi-tenant workloads

Teams running multi-tenant or shared-cluster workloads need ways to isolate applications, tenants, and environments without sacrificing cluster mode’s scalability and availability. MoEngage, for example, uses ElastiCache for Valkey to support a customer engagement system serving more than 1,350 global brands and delivering billions of personalized messages daily. Valkey 9.0 addressed this by introducing numbered databases in cluster mode, combining logical separation with horizontal scaling across shards. This lets teams separate tenants or environments within a single cluster while retaining full cluster mode benefits.

As an example, we can create a user and limit their access to databases 0 and 1:

ACL SETUSER app-user on >secretpass +@all ~* db=0,1

After authenticating, that user can interact with database 0:

SELECT 0
OK

SET mykey "hello"
OK

But not database 2:

SELECT 2
(error) NOPERM No permissions to access database

Valkey 9.1 builds on that foundation with database-level access control lists, so you can scope user permissions to specific databases. Previously, access control rules could limit which commands users could run and which keys they could access, but those permissions applied broadly across databases. With Valkey 9.1, you can grant users access only to the databases they need. Together, numbered databases in cluster mode and database-level access controls help expand the range of workloads that can be consolidated onto shared clusters while maintaining stronger isolation and governance.

Simplify common workflows with new commands

As applications scale, teams often rely on multi-step client logic to implement common workflows such as consuming temporary state, setting multiple expiring keys, or scanning data across a cluster. These patterns can add complexity, increase network round trips, and make application code harder to maintain.

Valkey 9.1 introduces new commands that simplify these workflows. HGETDEL atomically retrieves and deletes one or more fields from a hash, which helps you build workflows that consume temporary state, one-time tokens, or queue-like data exactly once. For example, a worker can retrieve and remove job state in a single command while leaving other job metadata in place.

HSET job:42 status "pending" payload '{"action":"send_email"}' retries "3"
HGETDEL job:42 FIELDS 2 status payload

MSETEX sets multiple keys with a shared expiration time in a single command. This simplifies patterns such as writing session keys, cache fragments, rate-limit buckets, or other temporary application state with a consistent TTL.

MSETEX 3 session:abc "user:1" session:def "user:2" session:ghi "user:3" EX 3600

Valkey 9.1 also introduces CLUSTERSCAN, which provides a unified way to scan keys across a cluster. Previously, clients needed to scan each node independently and merge results themselves. CLUSTERSCAN simplifies cluster-aware operational tooling, debugging, inventory jobs, and migration utilities by providing a cluster-wide interface for iterating over keys.

CLUSTERSCAN 0 MATCH "session:*"

Together, these commands help developers and operators express common workflows more directly, reduce client-side complexity, and build applications and tools that work more naturally with large cluster-mode enabled deployments.

Improved observability for large deployments

Valkey 9.1 also adds new visibility improvements that help customers operate large deployments more effectively. New main-thread and I/O-thread usage metrics provide a clearer view into how busy the engine is, helping operators distinguish between actual workload pressure and expected thread behavior.

The release also adds JSON-formatted server logs, making it straightforward to ingest, search, and analyze engine logs in observability platforms without custom parsing logic.

Together, these enhancements help teams tune clusters, investigate performance issues, and manage large ElastiCache deployments with greater confidence.

Conclusion

Valkey 9.1 for Amazon ElastiCache continues the evolution of Valkey from a high-performance cache into a broader foundation for real-time, shared, and operationally demanding applications. For customers running node-based clusters, this release helps improve infrastructure efficiency, strengthen workload isolation, reduce application and tooling complexity, and operate large deployments with greater confidence.

To get started, you can:

Create your first Valkey 9.1 cache: See the ElastiCache Getting Started tutorial to launch a new cache.
Upgrade an existing cluster: Follow the engine version upgrade documentation to upgrade an existing ElastiCache cluster running Valkey or Redis OSS.
Move existing self-hosted workloads to ElastiCache: Use Online Migration for Valkey or Redis OSS to migrate data from self-hosted open-source Valkey or Redis OSS on Amazon EC2 to Amazon ElastiCache.
Explore the open source release: Read the Valkey 9.1 community announcement for deeper technical details.

Valkey 9.1 for Amazon ElastiCache is available at no additional cost in supported AWS Regions.

About the author

Building Financial Hierarchies with Amazon Neptune for Treasury Operations

Steven Li — Mon, 29 Jun 2026 17:22:32 +0000

In this post, we show how Amazon’s Finance Technology (FinTech) team uses Amazon Neptune to model complex corporate treasury structures as a property graph. These structures include the legal entity relationships, intercompany agreements, and bank account associations that govern payment routing and cash management. We walk through an iterative graph modeling approach, Gremlin (the graph traversal language supported by Neptune) query patterns for real-time payment routing, and a Primary/State node versioning pattern (where primary nodes hold current state and state nodes preserve history).

Large enterprises often manage hundreds of legal entities across dozens of countries, each with its own bank account, intercompany agreements, and payment rules. When a central treasury operation needs to determine which bank account to use for a payment, or trace the accounting path between two entities, the answer depends on traversing a hierarchy of relationships. These traversals sometimes span multiple hops with different rules at each level.

At Amazon, FinTech faced exactly this challenge. Amazon’s centralized treasury structure program manages hundreds of legal entities across multiple jurisdictions with dozens of banking partners and needed to resolve payment routing and accounting paths in real time. The relationships between these entities are not simple parent-child trees: they involve directional payment flows, bidirectional cash pooling, and eligibility rules that vary by entity and transaction type.

The business challenge

In a centralized treasury structure, the entity hierarchy connects central entities to member entities. A central entity can make payments or pool cash on behalf of its members. Each legal entity uses a company code (a short alphanumeric identifier), and intercompany cash agreements govern the relationships between them.

This creates several use cases that require real-time hierarchy traversal:

POBO routing: The service determines the correct multi-hop accounting path for each transaction. For example, if CORP-B pays on behalf of SUB-1, the accounting path traverses SUB-1 → CORP-A → CORP-B, crossing treasury center boundaries.
Cash pooling/sweep: Member entities sweep excess cash to their central entity’s bank accounts. Sweeps can flow in both directions and across multiple tiers. For example, entity SUB-4 can sweep directly to its regional central entity CORP-C, which in turn sweeps to global central entity CORP-B. Each sweep is a single bank transaction rather than a route through every intermediate level, which reduces redundant cash movements and transaction costs across the entity structure.
Bank account eligibility: Determine which bank accounts are available for a given entity, filtered by currency and payment method. This includes the entity’s own accounts, treasury center central entity accounts accessible through the hierarchy, and business continuity plan (BCP) backup accounts. Each treasury center central entity bank account carries its own set of supported payment methods (cross-border, WIRE, EFT, RTP). You apply this information to decide whether a treasury center POBO payment is possible or whether to fall back to a different payment instrument held by the entity directly.
Intercompany balance tracking and accounting: When bank statements arrive, the service matches the originating and destination bank accounts for each cash movement. The correct treasury center path between these accounts drives intercompany balance tracking across multi-tier relationships and generates the corresponding accounting entries. For example, a sweep from entity SUB-4’s bank account to CORP-B’s EUR account must identify the path SUB-4 → CORP-C → CORP-B, recording an intercompany balance at each hop along with the associated agreementId values. This path resolution drives both the intercompany balance position and the journal entries for each leg of the movement.

Payment paths and accounting paths can differ for the same transaction. For example, when CORP-B pays on behalf of member entity SUB-4:

Payment path: SUB-4 → CORP-B (direct, with eligible bank accounts from both SUB-4 and CORP-B).
Accounting path: SUB-4 → CORP-C → CORP-B (traverses through intermediate entity CORP-C for proper intercompany accounting).

The same graph must handle both paths, so a single data model must support multiple traversal strategies over the same nodes and edges. These access patterns, multi-hop, directional, filtered by relationship type, are the characteristics that make graph databases well-suited for this type of problem. Additionally, an entity might have an intercompany agreement in place but might not be POBO-eligible. For example, the entity’s country might have POBO restrictions. In this case, the entity can participate in cash sweeps but not POBO payment routing.

The data store must support each of these access patterns at high throughput, particularly bank account eligibility, which handles the highest call volume. It must also remain flexible enough to add new hierarchy types without refactoring the existing model.

Choosing a graph database for hierarchy traversal

To reduce payment routing latency and simplify hierarchy traversal logic, we chose a data store built for graph workloads. In Neptune, a single Gremlin query walks the edge chain from member entity to central entity, filters by label and property at each hop, and returns the full path. The following diagram illustrates the graph structure for a treasury center hierarchy, showing how company code nodes connect through labeled edges.

Multi-hop hierarchy traversals have specific requirements:

Multi-hop paths span multiple levels. Finding the path from SUB-4 to CORP-B (SUB-4 → CORP-C → CORP-B) involves traversing at least two levels, and each additional level adds depth to the traversal.
Nodes and edges carry distinct attributes, and they must be modeled in a way that scales as the graph grows.
Traversals must return node metadata along with the path, not only the connected node identifiers.

The Gremlin examples in this post run on any Neptune cluster with a Neptune notebook attached. For setup instructions, see Getting started with Neptune. You can also use Graph Explorer, an open-source low-code visual exploration tool for querying and visualizing graph connections without writing code.

Solution architecture

The following diagram illustrates the high-level architecture of the hierarchy service.

This solution includes the following components:

We expose the graph resolution service through Amazon API Gateway with AWS Identity and Access Management (IAM) authentication, accepting authenticated API calls from internal clients such as payment planning and intercompany accounting services.
Traffic routes to the graph resolution service through AWS PrivateLink and Elastic Load Balancing. A Network Load Balancer serves as the PrivateLink endpoint target (required for VPC endpoint connectivity), and an Application Load Balancer handles HTTP-level routing to the underlying Amazon Elastic Container Service (Amazon ECS) and AWS Fargate services. The Application Load Balancer routes hierarchy traversal requests and key-based search to separate microservices based on request path. Each microservice runs within an Amazon Virtual Private Cloud (Amazon VPC) for network isolation.
The graph resolution service runs Gremlin traversals against Neptune for hierarchy queries. When API responses require full reference data (such as bank account details and payment methods), the service uses key attributes returned from Neptune to look up details in Amazon DynamoDB. This avoids the complexity of maintaining consistent writes across both data stores.
We store the hierarchy in Neptune as a property graph with labeled nodes (company code or bank account) and labeled directed edges representing relationship types. We keep only the key attributes needed for traversal and filtering (such as isPOBO, isTreasuryCenterCentral, and functionalCurrency) to maintain a minimal graph.
A key-based lookup service on DynamoDB manages the foundational mappings between company codes and their bank accounts (actual ledger) as well as intercompany financial agreements (AGREEMENT ledger). DynamoDB is well-suited for these single-hop, key-value lookups that do not require traversing multiple levels of the hierarchy. For teams that prefer a single-database architecture, these mappings can also be stored as edges in Neptune. We separated them because bank account and agreement attributes change frequently, and isolating high-frequency writes in DynamoDB avoids unnecessary write pressure on the graph.
DynamoDB serves as the source of truth for full entity and bank account reference data. At runtime, the service enriches Neptune traversal results with DynamoDB lookups for detailed attributes like payment methods, bank names, and country codes.

For guidance on designing the DynamoDB table schema for entity and bank account reference data, including partition key strategy and access patterns, see the Amazon DynamoDB Developer Guide.

This architecture reduces payment routing latency and reconciliation time because Neptune resolves multi-hop paths in a single query, and DynamoDB enforces consistency through conditional writes on the source of truth. The service queries Neptune for graph relationships and traversal, and DynamoDB for reference data lookups. Neptune writes occur only when a relationship changes, such as a new member entity joining a treasury center central entity or the addition of a POBO edge. Frequent attribute updates stay in DynamoDB.

Keeping these high-frequency attribute changes in DynamoDB and limiting Neptune to structural relationship changes minimizes write volume to the graph and avoids bidirectional synchronization complexity. For the read-intensive workload, API Gateway caching reduces the number of calls reaching Neptune. Hierarchy structures change infrequently, because new agreements or entity reorganizations happen on a weekly or monthly cadence. As a result, even the default 300-second TTL absorbs the majority of repeated eligibility lookups without serving stale results. This is a reversible decision: the cache can be disabled (TTL=0), resized, or its TTL adjusted purely through API Gateway stage configuration, without requiring any code changes.

Graph modeling: An iterative approach

We started with an initial representation and progressively added complexity as we validated each use case. This iterative approach keeps each query focused on one use case, so new requirements add edges and labels without rewriting existing traversals. Graph databases enable this pattern through their flexible schema, where structure emerges alongside requirements rather than being defined upfront.

The following five steps show how our model evolved from an initial directed graph to a model with labeled edges, bidirectional relationships, node-level eligibility properties, and bank account vertices. Each step addresses a specific use case that the previous iteration could not support.

Step 1: Basic directed graph for POBO

Our first step was to define company codes as vertices and directed edges pointing from member entity to central entity:

In this model, CORP-A and CORP-B are treasury center central entities. US member entities (SUB-1, SUB-2, SUB-3) connect to US central entity CORP-A, while non-US member entities (SUB-4) connect through CORP-C to CORP-B. The cross-central entity edge from CORP-A to CORP-B supports cross-treasury-center POBO payments. This initial graph already identifies accounting paths. For CORP-B paying on behalf of SUB-1, the traversal follows: SUB-1 → CORP-A → CORP-B.

Step 2: Bank accounts as separate vertices

Next, our API responses needed bank account information. We considered two options: storing bank accounts as properties on company code vertices or modeling them as separate vertices.

Separate vertices work well here because bank accounts have their own attributes (currency, payment methods, multi-currency eligibility). The relationship between central entities and bank accounts can carry its own metadata, and this approach scales to support future bank-to-bank relationships. Additionally, a single entity can hold multiple bank accounts across different currencies and payment methods, making a separate vertex per bank account the natural graph representation for this one-to-many relationship.

Step 3: Labeled edges for transaction types

In a property graph, every edge must have a label. The next step replaced generic edges with semantically meaningful labels to distinguish transaction types: POBO for pay-on-behalf-of, SWEEP for cash pooling, and Bank Account for company-code-to-bank-account associations.

This labeling matters for query filtering. When a payment service asks for the POBO path from SUB-1 to CORP-B, the traversal follows only POBO-labeled edges. When a cash pooling service needs the sweep path, it follows SWEEP-labeled edges.

Step 4: Bidirectional edges for cash pooling

POBO is directional: payments flow from member entity to central entity. But cash pooling can flow in either direction. In a labeled property graph, any edge can be traversed in either direction using Gremlin’s both() step regardless of how it was created. We model SWEEP as separate directional edges to make the business semantics explicit. Each edge represents a distinct cash pooling agreement with its own effective dates. To support this, we added reverse SWEEP edges between central entities:

Now the graph supports both use cases:

POBO path (SUB-1 to CORP-B): SUB-1 →[POBO]→ CORP-A →[POBO]→ CORP-B.
Cash sweep (CORP-B to CORP-A): CORP-B →[SWEEP]→ CORP-A.

Step 5: Node properties for eligibility

The final challenge is handling POBO eligibility. An entity might have an intercompany agreement (represented by an edge) but might not be POBO-eligible. To address this, we added an isPOBO property to company code nodes:

Notice the distinction: edge labels imply the semantic direction of a relationship, while the physical edge construction defines its actual direction in the graph. Node properties determine the eligibility of each entity for specific operations. For example, if CORP-C has isPOBO: false, a POBO traversal skips it, but a SWEEP traversal still works because sweep eligibility does not depend on the isPOBO flag.

Final data model

The following tables summarize the node and edge definitions.

Company Code to Company Code:

Element	Label	Key Properties
Node	CompanyCode	isCentral, isPOBO, functionalCurrency
Edge	POBO	agreementId, startDate, endDate
Edge	SWEEP	agreementId, startDate, endDate

Company Code to Bank Account:

Element	Label	Key Properties
Node	BankAccount	accountCurrency, poboEnabled, sweepEnabled
Edge	Bank_Account	startDate, endDate

Place hierarchy-centric properties (agreementId, date ranges) on edges and entity-centric properties (isPOBO, functionalCurrency) on nodes. This distinction aligns with how queries filter and traverse: access patterns are relationship-centric, so edge properties drive traversal logic, while node properties drive eligibility checks.

Gremlin query patterns

With the graph model defined:

Creating the hierarchy (upsert vertices and edges).
Finding POBO and SWEEP paths (multi-hop traversals).
Retrieving eligible bank accounts (filtered by currency and payment method).

The following Gremlin queries implement the core access patterns described earlier. Gremlin is a traversal-based query language where you start at a vertex and walk through the graph step by step. Each chained method (.out(), .has(), .repeat()) adds a filter or movement to the traversal. Neptune also supports openCypher as an alternative query language over the same graph data, so teams already familiar with Cypher from Neo4j can use it interchangeably without migrating their data model.

Building the hierarchy

The following Gremlin query creates the initial treasury center hierarchy:

g.addV("CompanyCode").property(id, "CORP-A")
.property("isCentral", true)
.property("functionalCurrency", "USD")
.property("isPOBO", true).as("CORP-A")
.addV("CompanyCode").property(id, "CORP-B")
.property("isCentral", true)
.property("functionalCurrency", "non-USD")
.property("isPOBO", true).as("CORP-B")
.addV("CompanyCode").property(id, "SUB-1")
.property("isCentral", false)
.property("functionalCurrency", "USD")
.property("isPOBO", true).as("SUB-1")
.addV("BankAccount").property(id, "BANK-ACCT-1")
.property("accountCurrency", "USD")
.property("allowOtherCurrency", false).as("BANK-ACCT-1")
.addE("POBO").from("SUB-1").to("CORP-A")
.addE("SWEEP").from("SUB-1").to("CORP-A")
.addE("POBO").from("CORP-A").to("CORP-B")
.addE("SWEEP").from("CORP-B").to("CORP-A")
.addE("ACTUAL").from("CORP-A").to("BANK-ACCT-1")

When CORP-B pays on behalf of SUB-1, the accounting service needs the ordered path. This traversal follows POBO edges from source to target:

g.V("SUB-1")
.repeat(out("POBO").simplePath()) // simplePath() filters traversers whose path contains repeated objects, preventing cycles in multi-hop traversals
.until(hasId("CORP-B"))
.path()
.by(elementMap("isCentral", "functionalCurrency"))

Cash sweep path resolution

For cash pooling from SUB-4 through multiple hops to CORP-A (SUB-4 → CORP-C → CORP-B → CORP-A), the traversal follows SWEEP edges:

g.V("SUB-4")
.repeat(out("SWEEP").simplePath()) // Filters any path that revisits a previously seen vertex, required here because SWEEP edges are bidirectional
.until(hasId("CORP-A"))
.path()
.by(elementMap("isCentral", "functionalCurrency"))

This traversal follows the full sweep path across multiple hops in a single query.

Getting eligible bank accounts

The highest-volume API, bank account eligibility, is called for every payment. It retrieves eligible bank accounts for a given company code, including treasury center central entity accounts filtered by currency:

g.V("SUB-1")
.repeat(out("POBO")) // No simplePath needed: POBO edges are unidirectional
.until(has("isCentral", true))
.out("ACTUAL")
.has("accountCurrency", "EUR")
.path()
.by(elementMap("isCentral", "functionalCurrency", "accountCurrency", "poboEnabled"))

This traversal returns results in a single query. The traversal returns key attributes from Neptune. Full bank account details (bank name, payment methods, country code) are then added through a DynamoDB lookup using the bank account identifiers.

Primary/State node pattern for versioning

Financial systems require the ability to reconstruct prior states of entity relationships for audit and regulatory review. The hierarchy service uses a Primary/State node pattern in Neptune to maintain version history while supporting efficient real-time queries. This pattern applies to any use case where you need to track changes to graph entities over time.

The pattern works as follows:

Primary nodes represent the current state of each entity. They carry the live hierarchy edges (POBO, SWEEP) and the latest property values. Traversal queries operate on primary nodes.
State nodes represent historical versions. Each time a primary node is updated, a new state node is created with the previous property values, connected via a HAS_STATE edge with startDate and endDate timestamps.

UPSERT implementation with mergeV and mergeE

This pattern uses the Gremlin mergeV() and mergeE() steps, which provide native upsert semantics. The traversal first attempts to find an existing primary node. If found, it updates the properties and terminates the previous state edge. If not found, it creates a new primary node. In both cases, it creates a new state node to preserve the historical version. In this pattern, the primary node lookup uses a property-based filter (has("CompanyCode", "companyCode", "CORP-A")) rather than a vertex ID lookup, because primary nodes carry a separate isPrimaryNode flag that distinguishes them from their state node counterparts sharing the same logical identifier.

// Upsert primary node: find by custom ID, create or update properties
g.mergeV([(T.id): "CORP-A-primary"])
.option(Merge.onCreate, [
(T.label): "CompanyCode",
"companyCode": "CORP-A",
"isPrimaryNode": true,
"isPOBO": true,
"functionalCurrency": "USD"
])
.option(Merge.onMatch, [
"isPOBO": true,
"functionalCurrency": "USD"
])

// Terminate previous state edge
.outE("HAS_STATE").hasNot("endDate")
.property("endDate", "2025-01-15T00:00:00Z")

// Create new state node and connect
.mergeV([(T.id): "CORP-A-state-v2"])
.option(Merge.onCreate, [
(T.label): "CompanyCodeState",
"isPOBO": true,
"functionalCurrency": "USD"
])
.mergeE([(T.label): "HAS_STATE",
(Direction.from): "CORP-A-primary",
(Direction.to): "CORP-A-state-v2"])
.option(Merge.onCreate, [
"startDate": "2025-01-15T00:00:00Z"
])
.id()

This pattern preserves every hierarchy change as a state node, while keeping traversal queries fast by operating only on primary nodes.

Scaling for diverse relationship types

The graph model supports extensibility. We introduce new node labels and edge labels to serve entirely different hierarchy types from the same Neptune cluster while preserving existing query behavior.

The future-state graph adds three capabilities without modifying existing data:

Bank-to-bank sweep relationships: BANK_SWEEP edges connect bank accounts directly (for example, Bank5 → Bank2_USD), allowing physical cash movement tracking across treasury center central entities.
Legal entity hierarchy: LegalEntity nodes connected via EntityParent edges model the corporate ownership structure. EntityMapping edges link company codes to their legal entities.
Additional treasury center member entities: New company codes join with their own POBO/SWEEP edges and bank account associations.

Each hierarchy type uses distinct edge labels, so queries for one type never interfere with another. For example, a legal entity traversal to find the corporate parent chain for company code “SUB-5” (a specific company code identifier in the hierarchy) would look like:

g.V("SUB-5") // Company code identifier
.repeat(out("EntityMapping", "EntityParent")) // Safe: entity hierarchy is acyclic,
// always terminates at central entity
.until(has("isEntityCentral", true))
.path().by(elementMap("companyCode", "entityName"))

This query coexists with treasury center hierarchy queries in the same graph without conflict.

Visualizing the hierarchy in Graph Explorer

The following visualization shows the treasury center hierarchy rendered in Graph Explorer, Neptune’s built-in low-code tool for browsing graph data interactively. The node-edge layout illustrates how company codes connect through labeled POBO and SWEEP edges, with bank accounts attached as separate vertices. This visual representation was generated directly from the Gremlin queries described in this post, without any additional visualization code. Graph Explorer renders query results as interactive diagrams that you can explore by clicking nodes to expand their connections.

Lessons learned

Iterate on the graph model with real use cases. The model evolved through five iterations, each driven by a specific use case the previous model could not support. Starting simple and adding complexity incrementally helps avoid adding unnecessary complexity.
Separate relationship semantics from eligibility semantics. The design pattern here is using edge labels for relationship direction and type, and node properties for eligibility flags. This keeps the model clean and avoids duplicating edges for different eligibility states.
Choose node properties vs. edge properties deliberately. Place hierarchy-centric properties (agreementId, date ranges) on edges, and entity-centric properties (isPOBO, functionalCurrency) on nodes. Neptune supports Single and Set (default) cardinality for properties but not lists cardinality. Design your data model around this constraint.
Focus Neptune on relationships, not reference attributes. Neptune is best suited for the graph structure: which entities connect to which central entities, through what transaction types, and with what eligibility. Properties that do not affect the relationship itself (bank name, payment methods, country codes) stay in DynamoDB. At runtime, enrich traversal results through DynamoDB lookups for these reference attributes. The additional DynamoDB lookup adds minor latency, but it keeps the graph limited to relationships and the attributes that govern traversal.
Use Neptune notebooks and Graph Explorer for visualization. Neptune notebooks render traversal results as interactive node-edge diagrams directly in the notebook cell output. For path queries, use path().by(elementMap()) which returns id, label, and properties in a single flat map. For more visual-based exploration, Graph Explorer provides a low-code interface for browsing and querying graph data without writing Gremlin. Both tools help validate traversal logic and spot modeling errors before deploying to production.
Understand Gremlin’s processing pipeline for performance tuning. Gremlin queries go through three phases: parsing (TinkerPop steps), conversion (Neptune native steps), and optimization. Rewrite queries to eliminate steps that cannot be converted to native steps and move filter steps to the end of traversals for better conversion. The Neptune Gremlin results cache further reduces latency for repeated queries with no I/O costs and can be used if pagination is required.
Plan for extensibility from the start. By using distinct edge labels for each hierarchy type, you build a model that supports multiple coexisting hierarchies in a single graph. We added legal entity hierarchies and bank-to-bank relationships, and existing treasury center queries continued to work unchanged.

Financial industry applications

The hierarchy challenges described here are common across the financial industry. Organizations managing complex entity relationships, whether routing payments, aggregating exposures, or tracing ownership, often face a similar underlying graph problem: multi-hop, directional, rule-filtered traversal over structures that change over time. The patterns in this post apply directly.

Corporate treasury. Fortune 500 companies manage hundreds of legal entities across dozens of jurisdictions. Their treasury teams face the same POBO, cash pooling, and bank account eligibility challenges. Managing these hierarchies at scale requires a data model that supports multi-hop traversals, directional relationships, and real-time path resolution. A graph-based approach standardizes payment routing and intercompany accounting path resolution, reducing manual effort and providing real-time visibility into cash positions across the entity structure.
Regulatory reporting and stress testing. US banking regulators require institutions to aggregate exposures across legal entity hierarchies for stress testing (CCAR, DFAST) and resolution planning. Tracing exposures through complex ownership structures spanning hundreds of entities is fundamentally a graph traversal problem. The same repeat().until() patterns used for POBO path resolution apply directly. The Primary/State versioning pattern is particularly relevant here because regulators can reconstruct point-in-time views of entity structures without maintaining separate snapshot tables.
Exposure aggregation and entity relationship mapping. The Legal Entity Identifier (LEI) system maps ownership relationships between over 2.7 million registered entities globally. Today, this data lives in flat files and relational tables. Modeling it as a property graph would support real-time queries such as identifying which institutions have direct or indirect exposure to a specific entity, or tracing potential impact paths across interconnected financial networks.
Payment networks and correspondent banking. Payment routing through correspondent banking networks is a graph problem. Each payment may traverse multiple intermediary banks with different fee structures, currency capabilities, and processing windows. The labeled edge pattern used in this post to distinguish POBO from SWEEP applies directly: edges represent nostro/vostro relationships or fee corridors, while node properties determine eligibility by currency, payment type, or jurisdiction.
AML and beneficial ownership. Tracing payment flows through complex entity structures is central to anti-money laundering compliance. The same traversal patterns that trace an accounting path can trace transactions through chains of related entities. The eligibility filtering pattern maps to compliance-based screening, and the Primary/State versioning pattern delivers the historical audit trail that regulators require.

In each of these cases, the core technology patterns are the same. Use iterative graph modeling with labeled directed edges, node properties for eligibility filtering, temporal versioning for auditability, and a minimal graph paired with a key-value store for reference data enrichment. The financial hierarchy problem is universal, and the graph solution scales across it.

We built a graph resolution service, an internal microservice that resolves payment routing and accounting paths in real time, using Neptune to model these financial hierarchies as a property graph. In this post, we walk through our iterative graph modeling process, key design decisions including a Primary/State node pattern for versioning, and the Gremlin query patterns that power our treasury operations. If you manage financial hierarchies, organizational structures, or other systems with complex multi-hop relationships, these patterns can help you design an effective graph solution. With this approach, we accomplished the following:

Modeled financial hierarchies as a property graph in Neptune.
Wrote Gremlin queries that resolve multi-hop payment routing and cash pooling paths in a single query.
Implemented a Primary/State node pattern that preserves version history and supports audit trails.
Added new hierarchy types (legal entity structures, bank-to-bank relationships) without refactoring existing queries.

Conclusion

Financial hierarchies are well-suited to graph modeling. The relationships between legal entities, bank accounts, and intercompany agreements are directional, multi-hop, rule-filtered, and change over time. As these hierarchies grow, traversing them requires increasingly complex application logic that becomes brittle with each additional level of depth.

This post shared a working pattern for modeling complex treasury center financial hierarchies as a property graph in Neptune. Our iterative approach started with simple directed edges and progressively added bank account vertices, labeled edges, bidirectional relationships, and node-level eligibility properties. The result is a flexible model that supports POBO payment routing, cash pooling, bank account eligibility, and accounting path resolution, with proof-of-concept latencies that met the target access pattern requirements. A key design decision shaped the outcome: separating relationship semantics (edge labels) from eligibility semantics (node properties) kept the model clean and avoided duplicating edges for different eligibility states.

To get started, model your simplest hierarchy as a directed graph in a Neptune notebook, validate it against your real access patterns, and iterate. The Gremlin patterns and Primary/State versioning approach in this post apply to any domain with complex, multi-hop entity relationships. To discuss how Neptune can support your financial hierarchy workloads, contact your AWS account team or request a consultation with an AWS Solutions Architect. If you have questions about the patterns in this post, leave a comment below.

About the authors

How Securonix reduced cache costs by 20% with Amazon ElastiCache for Valkey

Sanjeev Kishore Yarnapati — Mon, 29 Jun 2026 17:20:38 +0000

When protecting over hundreds of global enterprises from cybersecurity threats and processing millions of security events per second, system performance isn’t just a benchmark, it’s mission critical. Securonix, a six-time Gartner Magic Quadrant Leader for Security Information and Event Management (SIEM), delivers AI-powered security analytics and operations at enterprise scale. The system combines SIEM, User and Entity Behavior Analytics (UEBA), and Security Orchestration, Automation and Response (SOAR) capabilities into a unified, cloud-native solution.

In this post, we share how Securonix migrated hundreds of Amazon ElastiCache clusters from Redis OSS to Valkey, achieving a 20% reduction in caching costs. This amounts to over $100,000 in annualized savings. The migration also improved CPU utilization and overall throughput across Securonix’s global SIEM platform, which processes hundreds of terabyte data volumes daily for enterprise security teams worldwide.

Challenges with scaling constraints and rising costs

Securonix operates a cloud-native SIEM platform that processes multi-terabyte data volumes per day relying heavily on caching to deliver the sub-millisecond response times required for threat detection and security orchestration. With hundreds of ElastiCache clusters across multiple regional deployments, cache throughput requirements were increasing an estimated 20–30% year over year. Their caching infrastructure needed to evolve to keep pace with growing demand. Under bursty, high-ingestion workloads, the team observed increased latency during peak loads, and inefficient resource utilization led to over-provisioning in some clusters to absorb peak demand. Optimizing cost and performance became a priority.

At the scale Securonix was operating, the single-threaded caching layer became a bottleneck. During peak loads, CPU utilization spiked sharply, requiring additional nodes to handle traffic, which increased costs and operational complexity.

Faced with these compounding pressures, Securonix sought a more efficient and cost-effective caching solution that could scale alongside their platform’s growth. This led them to explore Amazon ElastiCache for Valkey as a cost-effective alternative.

Solution overview

Securonix partnered with AWS to evaluate a path forward from their existing Amazon ElastiCache for Redis OSS deployment. The goal was to address their infrastructure challenges while positioning them for continued growth.

The decision to migrate to ElastiCache for Valkey was driven by several factors. As a Linux Foundation-hosted, open source fork of Redis 7.2, Valkey maintains full protocol compatibility, making it a straightforward transition from their current ElastiCache for Redis OSS environment. The managed service model they already relied on would continue to reduce operational overhead at scale, while Valkey’s multi-threaded architecture improves CPU utilization during peak workloads.

Valkey’s multi-threaded architecture delivers up to 230% improved throughput, which reduces the need for additional read replicas during scaling. From a cost perspective, ElastiCache for Valkey offers approximately 20% cost savings on node-based clusters. Additional benefits included up to 20.6% lower memory consumption and full compatibility with existing Jedis client libraries, requiring zero application code changes during migration.

Securonix’s deployment was designed around a per-tenant model with cluster mode disabled for isolated workloads. Other clusters use cluster mode enabled to support microservices at scale, spread across AWS Regions to support their global customer base and data residency requirements. This architecture serves three distinct workload types:

SIEM platform: Caching layer for data ingestion and query performance, storing threat intelligence indicators, entity attributes, and geolocation data for real-time enrichment and correlation.
SOAR workflows: Shared clusters supporting short-lived sessions, JWT token management, distributed logging, workflow state including rate limiting and quotas, and session authentication.
Spark jobs: Enrichment data lookups for each customer, including user metadata, security policies, and detection rules.

Together, the per-tenant clusters handle data-intensive SIEM ingestion and Spark enrichment in isolation, while the shared microservice clusters provide low-latency session and workflow state management across tenants. This enables Securonix to balance tenant isolation with operational efficiency at scale.

The following diagram illustrates the solution architecture.

Executing a phased migration at enterprise scale

The migration was completed in two weeks across 250+ clusters. The first sprint focused on proof-of-concept validation and compatibility testing. After performance benchmarking confirmed readiness, the team executed a phased production rollout without downtime.

Securonix drove the evaluation process, with their engineering team taking the lead on validating the migration path and reviewing cluster configurations. AWS Solutions Architects supported the effort by providing technical expertise from initial proof of concept through final production cutover.

The migration team developed a comprehensive three-phase plan designed to validate performance at each step before proceeding:

Phase 1 (Compatibility validation): The team analyzed their Redis usage patterns against Valkey’s command set. They identified two primary workloads for initial migration: enrichment lookups (read-heavy operations for identity, asset, and threat intelligence data) and correlation windows (read/write operations for analytics rule state management). Valkey supports all Redis data structures used by both workloads, confirming zero application code changes would be required.
Phase 2 (Parameter group configuration and one-click upgrade): After compatibility validation, the team created custom Valkey parameter groups mirroring their existing Redis configurations. Using the AWS Management Console’s one-click upgrade feature, the team executed the migration in the development environment. AWS automatically provisioned new Valkey nodes, migrated data without interruption, and maintained service availability throughout.
Phase 3 (Incremental production cutover): With validation complete, Securonix executed a carefully orchestrated production migration. They began with low-impact customer clusters and progressively rolled out to all 250+ clusters. Each cluster migration was performed as a single in-place upgrade operation through the AWS Management Console, taking approximately 10–20 minutes with zero downtime.

Monitoring and observability

Securonix implemented monitoring using Amazon CloudWatch metrics tailored for ElastiCache for Valkey performance tracking. Valkey’s multi-threaded architecture improved CPU efficiency and allowed each node to handle higher concurrency. Key metrics, including CPU utilization and real-time latency, confirmed that performance was maintained throughout and after the migration, with sub-millisecond response times. One of the most notable post-migration observations was the complete elimination of replication lag that had previously introduced data consistency challenges.

The team also used CloudWatch alarms to monitor cluster health across all regional deployments, providing the operational visibility needed to manage a fleet of 250+ clusters with confidence.

Results

The migration to ElastiCache for Valkey achieved their primary goals: 20% cost reduction while maintaining zero downtime and sub-millisecond latency.

“Our collaboration with AWS reflects a shared commitment to building scalable, secure, and high-performance cloud-native platforms that empower customers to stay ahead of evolving cyber threats. At Securonix, we successfully migrated our Redis-based caching layer to Amazon ElastiCache for Valkey with zero downtime. ElastiCache for Valkey’s low-latency architecture significantly enhanced our high-throughput security analytics pipeline, delivering an over 30% improvement in query performance for real-time threat detection. The migration also drove about 20% overall cost reduction, aided by Graviton-based compute optimization and fully managed clustering with auto-failover. The transition was smooth, eliminating operational bottlenecks while improving resiliency, observability, and scalability across global regions.”

— Sanjeev Kishore Yarnapati, Director of Infrastructure at Securonix

Business impact: The migration delivered a 20% reduction in caching infrastructure costs across all 250+ clusters (approximately $135,000 in annual savings), freeing the team to reinvest in other technical innovations. The zero-downtime migration demonstrated that large-scale infrastructure transitions can be executed without disrupting the enterprise security operations that Securonix’s customers depend on.

Performance enhancements: CPU utilization became more predictable and efficient due to Valkey’s multi-core, asynchronous I/O model. This reduced the need for additional read replicas during peak workloads, allowing each node to handle higher concurrency. Sub-millisecond latency was consistently maintained for enrichment and correlation workloads.

Operational benefits: Full Redis protocol compatibility meant zero application code changes were required across the entire fleet. ElastiCache’s in-place upgrade path delivered consistent, auditable migration across 250+ clusters while preserving operational simplicity for ongoing scaling, patching, backups, and high availability management.

Lessons learned and best practices

Through their migration journey, Securonix gained valuable insights that can benefit other organizations considering a similar transition.

“If we were starting over, we would profile workloads first and invest in thorough development environment validation earlier — it gave us the confidence to move quickly in production” the team reflected. Understanding specific performance requirements and usage patterns upfront saves significant time and reduces risk.

The migration’s success stemmed from three strategies that other organizations can apply.

First, thorough preparation proved essential. Securonix began by conducting workload profiling to understand their specific performance patterns and requirements. This baseline data informed every subsequent decision and helped identify potential challenges before they became problems.

Collaboration with AWS early in the process accelerated the timeline significantly. Rather than attempting the migration independently, Securonix engaged AWS Solutions Architects during the planning phase. This partnership provided access to best practices, technical guidance, and support that streamlined the entire process.

The phased approach minimized risk while maintaining momentum. Starting with development environments allowed the team to validate the migration process, test performance benchmarks, and refine their procedures before touching production systems. Each phase built confidence and refined the process for the next.

Conclusion

In this post, we shared how Securonix migrated to ElastiCache for Valkey to achieve cost savings without sacrificing performance or operational simplicity. Through careful workload profiling, a phased in-place upgrade strategy, and close collaboration with AWS, Securonix reduced annual caching costs by 20% while improving application performance. The team completed the migration of 250+ clusters in only two weeks without service interruption and with no application code changes required.

The successful migration has positioned Securonix for continued innovation. They plan to upgrade to ElastiCache 8.x for additional memory optimizations and to expand automation coverage across their global deployment. The cost savings have strengthened Securonix’s ability to invest in the AI and analytics capabilities that power their next-generation SIEM platform.

If you run caching workloads on AWS, you can use ElastiCache for Valkey to reduce costs while maintaining or improving performance. Migrate your existing ElastiCache for Redis OSS workloads to ElastiCache for Valkey. Contact your AWS account team to discuss migration strategies tailored to your workloads.

If you have questions or suggestions about this post, leave a comment in the comments section.

About the authors

Running pgvector in production on Amazon Aurora PostgreSQL

Stefan Aichholzer — Thu, 25 Jun 2026 18:47:03 +0000

Running pgvector on Amazon Aurora PostgreSQL-Compatible Edition gives you a production-grade vector store on a database you already know, backed by the operational tooling, high availability, and scaling behaviour of Amazon Aurora. That combination is why pgvector has become a common choice for Retrieval Augmented Generation (RAG) workloads moving from proof of concept into production with Service Level Agreements (SLAs). Production traffic does introduce a predictable set of operational considerations: query latency as the corpus grows, recall on filtered vector searches, memory headroom during index builds, and connection behaviour under load. This post is scoped to the database operations that keep the RAG retrieval layer healthy. Model customization through fine-tuning or continued pre-training is out of scope.

In this post, we cover the operational practices that keep a pgvector workload healthy once you depend on it: choosing the right index and distance function, scaling with quantization and partitioning, managing Hierarchical Navigable Small World (HNSW) churn, sizing for memory-resident operation, and the observability signals that catch problems early.

How this post is organized

First, choose an index type (HNSW or Inverted File with Flat Compression) for your dataset and write pattern. Second, set up a baseline schema and query with the AWS-recommended parameters. Third, pick a distance operator that matches your embedding model. Fourth, scale the index to your target dataset size with quantization, parameter tuning, and partitioning. Fifth, plan for churn, capacity, and observability before traffic arrives. Each section builds on the one before.

The examples throughout use a multi-tenant document-store schema on Aurora PostgreSQL-Compatible with Amazon Titan Text Embeddings V2, an Amazon Bedrock foundation model, as the embedding model. Each SQL example is runnable against an Aurora PostgreSQL-Compatible cluster with the vector extension enabled.

This post covers the self-managed path, where you are responsible for the vector store and retrieval pipeline. For a fully managed RAG capability that handles ingestion, embedding, and retrieval for you, use Amazon Bedrock Knowledge Bases, which supports Amazon Aurora PostgreSQL with pgvector as one of its vector store options.

Prerequisites

To follow the examples in this post, you need:

An Aurora PostgreSQL-Compatible cluster running PostgreSQL version 17.4+, 16.8+, 15.12+, 14.17+, or 13.20+, which supports pgvector 0.8.0. pgvector 0.8.0 is available in all AWS commercial Regions and AWS GovCloud (US) Regions, except AWS China Regions. For the launch details, see the pgvector 0.8.0 Aurora announcement. For the current Region list, see the Aurora PostgreSQL extension versions page.
The vector extension enabled on your database.
An embedding model. The examples assume Amazon Titan Text Embeddings V2, a foundation model available through Amazon Bedrock that produces 1024-dimensional embeddings by default.

Choosing the right index strategy

pgvector ships two Approximate Nearest Neighbor (ANN) index types: HNSW and IVFFlat. Both trade perfect accuracy for speed, because a brute-force exact search across millions of vectors is too slow for online queries. The AWS Prescriptive Guidance on vector database options covers where pgvector fits among AWS vector stores. The Self-managed multi-tenant vector search with Amazon Aurora PostgreSQL post walks through the pgvector basics if you want a refresher on the terminology.

For production RAG on Aurora PostgreSQL, HNSW is the default choice for most workloads. But there are two cases where the right answer is no index at all: small datasets, and partitioned datasets where 100% recall is required. The rest of this section explains why HNSW wins for the common case, when IVFFlat still makes sense, and when skipping the index entirely is the correct production decision.

How HNSW works, and why it is the default

HNSW stands for Hierarchical Navigable Small World. It builds a multi-layer proximity graph. Layer 0 contains every vector, densely connected to its nearest neighbors. Each higher layer samples a smaller subset of vectors with longer-range connections. A single vector is the entry point, present in every layer. For a detailed walk-through of both HNSW and IVFFlat internals beyond the summary here, see the deep dive into IVFFlat and HNSW techniques.

A search starts at the entry point in the top layer. Within a layer, pgvector performs a greedy best-first walk along the graph edges, moving to whichever connected neighbor is closest to the query vector. When it can no longer get closer, it descends to the same node in the layer below and continues the walk there. At layer 0 the walk expands into a beam search controlled by the hnsw.ef_search parameter, and the top candidates are returned.

The following diagram shows that descent in action, from the entry point at the top layer down to the beam search at layer 0.

HNSW layered descent. A query enters at the entry point in the top layer, performs a greedy walk within each layer to find the local nearest neighbor, descends to the same node in the layer below, and expands into a beam search at layer 0 to return the top-k results.

This structure makes HNSW fast to query and lets it accept inserts incrementally. You can write new vectors directly into a live index without pausing or batching. Recall and latency are controlled at query time through hnsw.ef_search, so you can tune per workload without rebuilding the index.

The trade-off is build cost. HNSW indexes take longer to build and use more memory than IVFFlat, because the graph itself must be stored alongside the vectors. For the production RAG pattern this post targets, the trade-off is worth it: you build the index once, and queries stay fast provided you manage churn (covered later in the Managing index churn section).

When IVFFlat still makes sense

IVFFlat stands for Inverted File with Flat compression. Instead of a graph, it groups vectors into clusters using k-means. Each cluster has a representative point at its center, called a centroid. At query time, pgvector compares the query vector to the centroids, picks the nearest clusters, and searches only inside those clusters.

The following diagram shows the space partitioned into clusters, with a query compared against the centroids and then against the vectors inside the nearest clusters.

IVFFlat clustering. The vector space is partitioned into clusters with a centroid at the center of each. A query is compared to the centroids first and then searched only inside the nearest clusters.

IVFFlat is cheaper to build than HNSW: clustering is simpler than graph construction, and the index stores no graph, so it uses less memory. However, IVFFlat fixes the centroids at build time. If your data changes after the index is built, new vectors might not match the existing clusters well, and recall drops. The fix is to rebuild the index, which recomputes the centroids against the current data. Centroid retraining means re-running k-means over the full dataset to produce a fresh set of cluster centers.

IVFFlat fits a narrow set of workloads: large, mostly static corpora with an existing batch rebuild schedule. Everything else belongs on HNSW.

When no index is the right choice

There are two production patterns where no index outperforms either ANN option.

The first is the small-dataset case. For tables with roughly 10,000 to 50,000 vectors, a parallel sequential scan is fast enough on its own, and skipping the index avoids build time, maintenance cost, and the recall loss inherent to approximate search. This is the case called out in the pgvector 0.8.0 best practices, and it is the usual starting point for new workloads before the corpus grows.

The second is the partitioned, recall-critical case. If your schema partitions data so that every query touches a bounded subset (for example, per tenant or per user), and your use case requires 100% recall, a brute-force parallel scan within a partition can beat an ANN index. The Ring engineering team runs exactly this pattern in production: 100 to 200 billion embeddings are spread across per-user partitions of roughly 1 GB each. Each partition is scanned with max_parallel_workers_per_gather = 16, with no vector index at all. Removing the index was the change that let PostgreSQL pick parallel sequential scans over single-threaded index scans, which drove EBS throughput from about 50 MB/s to nearly 500 MB/s. Full details are in the Ring billion-scale semantic video search post.

Two conditions must hold for this pattern to work. First, the per-partition scan has to fit the latency budget, which in practice means each partition stays small enough to be served mostly from the buffer cache or local NVMe. Second, your workload must genuinely need 100% recall. If the ANN recall achievable with HNSW is acceptable, HNSW is simpler and faster. When both conditions are met, skipping the index removes an entire class of operational concerns: no build cost, no churn management, no memory sizing for index pages.

Quick decision guide

The following flowchart summarises the decision, starting from dataset size and walking through write pattern, HNSW build cost, and partitioning to a concrete index choice.

Index choice flowchart. Pick an index type based on dataset size, write pattern, HNSW build cost, and partitioning.

The “skip the index” branch covers both cases from the previous section: small tables where a sequential scan is already fast enough, and partitioned schemas where 100% recall is required within a bounded per-partition scan. Ring runs the second pattern in production, with per-customer partitions so every search reads only that customer’s data. For how they designed for it, see the preceding no-index discussion and the Ring billion-scale semantic video search post.

A runnable baseline

Before going deeper into similarity functions and scaling, here is a baseline schema, index, and query that the rest of the post builds on. It reflects the AWS-recommended HNSW parameters from the pgvector 0.8.0 best practices section and the iterative scan default we recommend for production.

The baseline uses HNSW, which fits the common production RAG case: per-tenant partitions are large enough that a sequential scan is not competitive, and ANN recall at the default settings is acceptable. If your workload looks like the no-index pattern covered earlier, small tables, or partitioned schemas where 100% recall is required within a bounded per-partition scan, omit the HNSW index from the following example and keep the rest of the schema as shown.

-- Enable the extension (one-time, per database)
CREATE EXTENSION IF NOT EXISTS vector;

-- A multi-tenant document store with Titan Text Embeddings V2 (1024 dimensions)
CREATE TABLE documents (
    id bigint GENERATED BY DEFAULT AS IDENTITY PRIMARY KEY,
    tenant_id text NOT NULL,
    content text NOT NULL,
    embedding vector(1024) NOT NULL,
    created_at timestamptz NOT NULL DEFAULT now()
);

-- B-tree on the filter column, so the planner has an alternative path
CREATE INDEX documents_tenant_idx ON documents (tenant_id);

-- HNSW with the AWS-recommended starting parameters, cosine distance
CREATE INDEX documents_embedding_hnsw_idx
ON documents USING hnsw (embedding vector_cosine_ops)
WITH (m = 16, ef_construction = 128);

-- Production-ready filtered vector query (iterative scan values are identifiers, not strings)
SET hnsw.iterative_scan = relaxed_order;
SET hnsw.ef_search = 100;

SELECT id, content
FROM documents
WHERE tenant_id = '<your-tenant-id>'
ORDER BY embedding <=> '<your-query-embedding>'
LIMIT 10;

On a populated documents table, this returns up to 10 rows ordered by cosine distance:

id	content
42	Q3 revenue grew 18% year over year, driven by enterprise…
117	The FY26 capital plan was approved at the November board…
203	Customer churn declined to 2.1% following the support model…
…	…
(10 rows)

In production code, wrap the two SET statements and the SELECT in a transaction and use SET LOCAL instead, so the values apply only to that query rather than the whole session. Swap vector_cosine_ops for vector_ip_ops (and the <=> operator for <#>) if your embeddings are unit-normalized. The next section explains when each choice applies.

Similarity functions at scale

pgvector supports several distance operators. The three that matter for text and semantic embeddings are listed in the following table. Each operator is paired with a specific HNSW operator class at index-creation time. If you use the wrong pair, the planner cannot use the index.

Operator	Meaning	HNSW operator class	When to use
<=>	Cosine distance	`vector_cosine_ops`	Safe default for text or semantic embeddings. Ignores vector magnitude.
<#>	Negative inner product	`vector_ip_ops`	Faster than cosine when vectors are already unit-normalized.
<->	L2 (Euclidean) distance	`vector_l2_ops`	Rarely appropriate for semantic search.

A note on <#>: pgvector returns the negative inner product because PostgreSQL can only use ASC-order index scans. This means ORDER BY embedding <#> query ASC returns the most similar vectors first. If you want a positive similarity score in the result, multiply by -1: SELECT (embedding <#> query) * -1 AS similarity.

Picking cosine or inner product

Amazon Titan Text Embeddings V2 normalizes by default through the normalize API parameter, which defaults to true. For Amazon Titan Text Embeddings V2 output with the default settings, inner product (<#>) is the right choice. On normalized vectors, cosine and inner product produce identical rankings, because cosine divides the inner product by the product of the vector norms, and for unit vectors that divisor is always 1. The norms and division still get computed per comparison, but inner product skips that work. If you override normalize to false, or use an embedding model that does not normalize (such as BGE or older Sentence-Transformers models), fall back to cosine.

Verify that your stored vectors are actually unit-normalized before switching operators. Norm should be 1.0 within floating-point tolerance:

SELECT id,
       sqrt(greatest(0, (embedding <#> embedding) * -1)) AS norm
FROM documents
LIMIT 10;

If one or more rows return a norm that is materially different from 1.0, your data is not normalized and the <#> shortcut will produce wrong results. The greatest(0, …) guard protects against sqrt of a tiny negative floating-point residual for vectors that are unit-normalized within tolerance.

Iterative scans for filtered queries

pgvector 0.8.0 introduced iterative index scans, which fix the overfiltering problem. Before 0.8.0, a query that combined a WHERE clause with a vector search often returned fewer results than the LIMIT asked for, because the index returned its top-k candidates before the filter was applied and most candidates were filtered out.

Iterative scans keep pulling candidates from the index until the query satisfies the filter or hits a configurable limit. Three modes are available:

off: the pre-0.8.0 behavior. Fastest, but can under-return.
strict_order: preserves exact distance ordering. Safer, but slower for selective filters.
relaxed_order: returns the correct count with approximate ordering within the result set. We recommend this mode for most production use cases.

Two related parameters bound the scan when you enable iterative mode: hnsw.max_scan_tuples (default 20,000) caps how far the scan walks the index, and hnsw.scan_mem_multiplier (default 1) caps how much memory the scan uses as a multiple of work_mem. Raise scan_mem_multiplier first if max_scan_tuples increases alone do not improve recall for a filtered query.

Scaling to millions of vectors

Three levers matter when the dataset grows beyond a few hundred thousand vectors: quantization, HNSW parameter tuning, and partitioning.

Quantization reduces memory footprint at a small cost to recall. pgvector supports halfvec (16-bit floats) and binary quantization. The AWS pgvector 0.7.0 benchmarks show halfvec cuts memory roughly in half with minimal recall loss, and binary quantization produces large build speed-ups at the cost of recall that you recover with a re-ranking pass, shown in the next subsection. For most workloads, start with halfvec. Move to binary quantization only when you already have a re-rank pipeline in place.

The benchmarks used OpenAI (5M vectors, 1536 dimensions) and Cohere (10M vectors, 768 dimensions) datasets. Amazon Titan Text Embeddings V2 at 1024 dimensions sits between them, so the memory and build-time trade-offs should apply proportionally. Validate against your own data before committing to a quantization strategy.

The following chart shows the memory and recall trade-off across float32, halfvec, and binary representations on those benchmarks.

Memory footprint and recall for float32, halfvec, and binary vector representations, sourced from the AWS pgvector 0.7.0 benchmarks. Halfvec halves memory with minimal recall loss; binary quantization reduces memory further but requires a re-rank pass to recover recall.

Two-stage retrieval with binary quantization

An HNSW index and its full-precision vectors must stay memory-resident for performance. When the working set exceeds what RAM can hold, the choices are to shrink the in-memory representation (quantization), extend effective memory with Aurora Optimized Reads, or both. Binary quantization with re-ranking does the shrinking: coarse candidate selection runs on tiny binary vectors that fit easily, and re-ranking by cosine distance pulls full-precision vectors only for the final top-N.

The following SQL pattern implements that two-stage retrieval: Hamming distance for the coarse pass, cosine distance for the re-rank. The re-rank is what recovers most of the recall lost to quantization.

-- Coarse recall pass (fast, on binary-quantized vectors), then re-rank by cosine
SELECT id, content
FROM (
    SELECT id, content, embedding
    FROM documents
    WHERE tenant_id = '<your-tenant-id>'
    ORDER BY binary_quantize(embedding)::bit(1024)
            <~> binary_quantize('<your-query-embedding>')
    LIMIT 100
) candidates
ORDER BY embedding <=> '<your-query-embedding>'
LIMIT 10;

The outer query sees only the 100 coarse candidates, so the expensive cosine comparison runs on a small set. The inline binary_quantize() cast is shown for clarity. For production at scale, materialize the binary vectors in a stored column and index them with bit_hamming_ops so the coarse pass uses an index rather than computing the quantization at query time. For benchmarks of this pattern on pgvector 0.8.0 and Aurora, see Supercharging vector search performance and relevance with pgvector 0.8.0 on Amazon Aurora PostgreSQL.

HNSW exposes three parameters worth tuning:

m: max connections per layer. Higher values improve recall but increase memory use and build time. AWS recommends m = 16 as the starting point.
ef_construction: dynamic candidate list size during build. Higher values improve index quality at build time. AWS recommends ef_construction = 128. The pgvector default is 64.
ef_search: dynamic candidate list size during query. Higher values improve recall at query latency cost. The pgvector default is 40, which is often too low for production. Tune per workload. Do not hardcode.

Partitioning keeps individual indexes manageable. Partition by tenant for multi-tenant workloads, by time for append-heavy workloads such as event or log data, or by category when queries are scoped to a known subset. Partitioning also supports parallel index builds and per-partition rebuilds.

For bulk loads, defer indexing. Load the data first, then build the index once at the end. Inserting into a live HNSW index is slower than a single post-load build, and the resulting index is often better structured.

When the working set outgrows RAM

RAM residency is the first target, but at some dataset sizes keeping the entire HNSW graph in shared_buffers stops being economical. Aurora Optimized Reads extends effective cache capacity by up to 5x the instance memory using local NVMe storage as a tiered cache, with up to 8x lower read latency for queries that would otherwise fetch from Aurora storage. The feature documentation lists pgvector nearest-neighbor search across millions of vector embeddings as a target use case.

Optimized Reads tiered cache requires an Aurora I/O-Optimized cluster on an NVMe-backed instance family (r6gd, r8gd, or r6id) and is enabled automatically on those instance classes. On an Aurora Standard cluster, Optimized Reads provides only temporary object acceleration, not tiered cache. Monitor the AuroraOptimizedReadsCacheHitRatio Amazon CloudWatch metric alongside BufferCacheHitRatio to see how much of the traffic misses RAM but still hits NVMe rather than storage.

Treat tiered cache as an extension of the memory budget, not a substitute for it: RAM is still faster than NVMe, and NVMe is still faster than Aurora storage. Size the instance so the hot working set stays in RAM, and let Optimized Reads absorb the long tail.

Managing index churn

HNSW indexes degrade with deletes and updates. Each change leaves an invalid entry in the graph. Over time, this hurts recall and inflates index size. There is no in-place compaction.

The following chart illustrates how recall drifts downward between rebuilds and recovers at each scheduled REINDEX CONCURRENTLY.

Illustrative HNSW recall over time. Without scheduled rebuilds, recall drifts downward as invalid graph entries accumulate. With a scheduled REINDEX CONCURRENTLY, recall recovers at each rebuild. Actual curves depend on workload, write mix, and index parameters.

Three patterns work in production:

REINDEX CONCURRENTLY rebuilds the index without blocking writes, but is resource-intensive. Schedule it during low-traffic windows and monitor maintenance_work_mem headroom. For large indexes, plan hours, not minutes.

REINDEX INDEX CONCURRENTLY documents_embedding_hnsw_idx;

Partition-based rebuilds are often cleaner. If you partition by time, you can drop and rebuild an entire partition’s index in one operation rather than churning a monolithic graph. This also caps the scope of a failed rebuild.

Append-only with periodic compaction suits workloads where old data becomes irrelevant. Write new vectors to an active partition, then periodically move or drop older partitions. The active index stays small and fast.

Your write pattern determines which one to pick. Workloads that are mostly inserts with few updates do fine on a scheduled REINDEX CONCURRENTLY. Workloads with heavy updates or deletes are better served by partitioning (by time or by batch) and rebuilding partitions independently. Workloads where relevance is time-bounded, for example a search over only the last 90 days, suit the append-only pattern with periodic compaction of old partitions.

Capacity planning

Churn management determines how often the graph is rebuilt. Capacity planning determines whether each rebuild and each query has the memory it needs. The two go together.

An HNSW index must stay memory-resident. When it spills to disk, search latency degrades, because the graph traversal pattern is random and each miss is a page fault. Plan for the index to fit in RAM with headroom for concurrent queries and maintenance operations.

The HNSW graph consumes more memory than the raw vector data, driven primarily by the m parameter. Size Aurora instances to keep shared_buffers plus connection and work memory below the instance’s available RAM.

Choose a memory-optimized instance class for Aurora PostgreSQL vector workloads, since the HNSW graph must fit in RAM. The Amazon Relational Database Service (Amazon RDS) r-series instance families are sized for memory-bound workloads such as vector search. For the current list of supported instance classes, see the Aurora DB instance classes documentation.

Tune these PostgreSQL parameters:

shared_buffers: Aurora sets a reasonable default, but verify it covers your expected index working set.
effective_cache_size: signals the query planner about available OS-level cache. Set to roughly 75% of instance memory on Aurora.
maintenance_work_mem: critical during index builds. Too low, and CREATE INDEX or REINDEX spill and slow down. pgvector emits a NOTICE when the HNSW graph no longer fits in maintenance_work_mem. If you see it in your build logs, raise the parameter on the instance and re-run the build.
work_mem: per-operation. With many concurrent vector queries, low work_mem causes spills. Too high a value risks memory pressure under load.
max_parallel_maintenance_workers: pgvector 0.7.0 added parallel HNSW builds. Set this to take advantage of larger instances during index creation.

For pricing, see the Amazon Aurora pricing page.

Observability

You cannot tune what you cannot see. Four observability layers matter for pgvector: query-level statistics, instance-level metrics, wait-event analysis, and application-defined custom metrics. Connection handling rounds them out.

Query-level statistics come from pg_stat_statements and the Aurora-specific aurora_stat_statements function. The Aurora variant adds storage I/O and peak memory columns, which matter for vector queries because a spilling search looks different in I/O than a cached one. To find the top vector queries by total time and peak memory:

SELECT substring(query, 1, 80) AS query,
       calls,
       round(total_exec_time::numeric, 1) AS total_ms,
       max_exec_peakmem
FROM aurora_stat_statements(true)
WHERE query ILIKE '%<=>%' OR query ILIKE '%<#>%'
   OR query ILIKE '%<->%' OR query ILIKE '%<~>%'
ORDER BY total_exec_time DESC
LIMIT 10;

total_exec_time is in milliseconds and max_exec_peakmem is in bytes. The peak memory columns require Aurora PostgreSQL 16.3, 15.7, or 14.12 and higher. On earlier minor versions, remove max_exec_peakmem from the SELECT list.

Instance-level metrics in CloudWatch tell you whether the index fits in memory. Watch BufferCacheHitRatio (should stay above 99% for healthy vector workloads), SwapUsage (should be zero), and ReadIOPS (sustained high values on a read-heavy vector workload suggest the index is spilling).

Amazon CloudWatch Database Insights surfaces slow queries with wait event breakdowns. Use it to find vector queries blocked on I/O or lock contention.

Custom metrics close the gap between database health and application health. Two are worth building:

Recall tracking: periodically run a fixed set of evaluation queries whose correct top-k results you know, compare what pgvector returns against the known correct set, and emit the recall percentage to CloudWatch. Drops in recall indicate index drift or query parameter regression.
p99 latency for vector searches, tagged by query type. Tail latency on vector search often moves before CPU or memory metrics do, because a handful of queries that evict the HNSW graph from cache can degrade recall and latency without changing averages.

Connection handling also needs monitoring. Vector queries are memory-heavy, so an over-subscribed connection pool exhausts work_mem quickly. Use Amazon RDS Proxy in front of Aurora and watch DatabaseConnections, ClientConnections, and MaxDatabaseConnectionsAllowed.

Clean up

If you created the baseline documents table and its indexes to follow along with the examples, remove them when you are done to avoid incurring ongoing storage charges for unused data:

DROP INDEX IF EXISTS documents_embedding_hnsw_idx;
DROP INDEX IF EXISTS documents_tenant_idx;
DROP TABLE IF EXISTS documents;

The vector extension does not incur charges on its own, but if you enabled it only for these examples, you can remove it with DROP EXTENSION IF EXISTS vector;. Do not drop the extension if other databases in the cluster are using it.

If you provisioned a dedicated Aurora PostgreSQL-Compatible cluster for testing, delete the cluster and its automated backups with the AWS Command Line Interface (AWS CLI) once you are done. Aurora PostgreSQL-Compatible clusters incur charges for as long as they exist, including storage charges on stopped clusters, so delete the cluster after testing to stop these charges:

# Delete all DB instances in the cluster first
aws rds delete-db-instance \
    --db-instance-identifier <your-db-instance-id> \
    --skip-final-snapshot

# Then delete the cluster
aws rds delete-db-cluster \
    --db-cluster-identifier <your-cluster-id> \
    --skip-final-snapshot \
    --delete-automated-backups

Manual snapshots persist after a cluster is deleted and continue to incur storage charges until you delete them explicitly. If you took any during testing, remove them:

# Remove any manual snapshots you took during testing
aws rds delete-db-cluster-snapshot \
    --db-cluster-snapshot-identifier <your-snapshot-id>

Replace , , and with the values from your test environment. The --skip-final-snapshot flag is appropriate for a disposable test cluster. Do not use it on a cluster that holds data you want to keep.

Key takeaways

Before going live, verify these five:

Plan for HNSW rebuilds from day one. Choose REINDEX CONCURRENTLY on a schedule, partition-based rebuilds, or append-only with compaction based on your write pattern.
Set hnsw.ef_search explicitly at session or query level. The default (40) is often too low for production recall. A value of 100 is a sensible starting point.
Size maintenance_work_mem to cover your largest index build, with headroom. A build that spills to disk produces a lower-quality graph.
Account for concurrent vector searches when sizing connection quotas. Each concurrent query consumes work_mem. Multiply out and use Amazon RDS Proxy in front of the cluster.
Treat BufferCacheHitRatio as the first metric to check. It is the earliest signal that an index has outgrown its instance.

Conclusion

Running pgvector on Amazon Aurora PostgreSQL at production scale rewards teams who design for their workload up front. Corpus size, write mix, filter patterns, and recall targets shape the right index choice, parameter settings, and capacity plan. The sooner you make those decisions deliberately, the smoother the path from proof of concept to SLA-backed production.

In this post, we covered the operational practices that matter: picking the right index and distance function, scaling with quantization and partitioning, managing HNSW churn, sizing for memory-resident operation, and the observability signals that catch problems early. Treat this as a checklist for your first production rollout, and revisit it when traffic or dataset size doubles.

About the authors

Centralized traffic inspection for Oracle Database@AWS

Sameer Malik — Thu, 25 Jun 2026 18:45:53 +0000

In a previous post, Implement network connectivity patterns for Oracle Database@AWS, we covered three connectivity patterns. These are direct peering between an application VPC and the Oracle Database@AWS network, single-Region connectivity using AWS Transit Gateway, and multi-Region connectivity using AWS Cloud WAN. Whether your traffic flows between application VPCs, on-premises networks, and the Oracle Database@AWS network, or heads outbound to the internet, it needs to be inspected. This post walks you through two centralized inspection patterns that route traffic through a dedicated inspection VPC before it reaches its destination: one using AWS Transit Gateway and another using AWS Cloud WAN with service insertion.

Oracle Database@AWS is an offering that provides access to Oracle Exadata Database Service and Oracle Autonomous Database on dedicated Oracle Exadata infrastructure, managed by Oracle Cloud Infrastructure (OCI) within AWS data centers. Organizations in regulated industries like financial services, healthcare, and government often need to inspect network traffic flowing to and from these databases for intrusion detection and prevention (IDS/IPS), data loss prevention (DLP), or domain filtering.

Prerequisites

Before proceeding, you should be familiar with ODB networks, ODB peering, peered CIDRs, and the ODB transit VPC concept from the previous connectivity patterns post. You’ll need an ODB network peered to a transit VPC with connectivity to AWS Transit Gateway or AWS Cloud WAN. You should also be familiar with AWS Cloud WAN service insertion concepts, as well as Amazon Virtual Private Cloud (Amazon VPC) concepts such as route tables, subnets, NAT gateways, and internet gateways.

Important: Today you cannot deploy firewall endpoints directly in the ODB transit VPC. Traffic destined for addresses outside the transit VPC’s own CIDRs is forwarded directly over the transit gateway or AWS Cloud WAN attachment, bypassing subnet route table lookups entirely. Although you can deploy firewall endpoints in individual application VPCs for some use cases, this distributed approach introduces additional route table configuration overhead and must be repeated for every VPC, making centralized inspection the practical choice at scale.

Pattern 1: Centralized traffic inspection with AWS Transit Gateway

As shown in figure 1, this pattern connects the ODB network, an ODB transit VPC peered to it, application VPCs, a centralized inspection VPC (firewall endpoints, NAT gateway, internet gateway), and on-premises networks to a transit gateway as attachments. On-premises connectivity uses AWS Direct Connect or AWS Site-to-Site VPN. The transit gateway uses two route tables to steer traffic. The pre-inspection route table is associated with the spoke attachments (application VPCs, on-premises, ODB transit VPC) and has a default route (0.0.0.0/0) pointing to the inspection VPC attachment. The post-inspection route table is associated with the inspection VPC attachment and has specific routes for application VPC CIDRs, on-premises CIDRs, and the ODB network CIDR pointing to the respective attachments.

Figure 1: East-west and north-south traffic inspection using Transit Gateway with a centralized inspection VPC.

Packet walkthrough for east-west traffic flow

The following steps walk through the east-west traffic flow (labeled ‘a’ / Orange in Figure 1):

An Amazon Elastic Compute Cloud (Amazon EC2) instance in the application VPC resolves the Oracle database hostname and initiates a connection using the resolved IP address within the ODB network CIDR.
The application VPC subnet route table has a route for the ODB network CIDR (or a default route) pointing to the transit gateway. The packet is forwarded to the transit gateway through the application VPC attachment.
The transit gateway performs a route table lookup in the pre-inspection route table (associated with the application VPC attachment). The default route points to the inspection VPC attachment. The packet is forwarded to the inspection VPC.
The packet arrives in the transit gateway attachment subnet of the inspection VPC. The subnet route table routes the packet to the firewall endpoint for inspection.
The inspection appliance processes the packet according to your security policies. If the packet passes inspection, the subnet route table directs post-inspection traffic back to the transit gateway.
The transit gateway looks up the post-inspection route table (associated with the inspection VPC attachment). A specific route for the ODB network CIDR points to the ODB transit VPC attachment, and the packet is forwarded there.
The ODB transit VPC subnet route table routes the packet to the ODB peering connection. The packet traverses ODB peering into the ODB network and reaches the Oracle database.
Response traffic from the ODB network follows the symmetric reverse path.

Packet walkthrough for north-south traffic flow (internet outbound traffic)

If your Oracle database workloads require internet access, you can route this traffic through the centralized inspection VPC before it exits to the internet. Internet egress from ODB@AWS is possible through either the OCI side or the AWS side. This post covers the AWS-side egress pattern. This requires configuring specific destination CIDRs in the peered CIDRs prefix list on the ODB network. See Oracle DB@AWS Internet Outbound Traffic blog for more information.

The following steps walk through the north-south traffic flow (labeled ‘b’ / Green in Figure 1):

Traffic from the ODB network destined for the internet matches a route configured by the peered CIDRs prefix list on the OCI side, which directs the packet to the ODB peering connection as the next hop. The packet crosses into the ODB transit VPC.
The packet is forwarded to the transit gateway through the ODB transit VPC attachment.
The transit gateway performs a route table lookup in the pre-inspection route table. The default route points to the inspection VPC attachment. The packet is forwarded to the inspection VPC.
The packet arrives in the inspection VPC and is routed to the firewall endpoint for inspection.
After inspection, the packet is routed to the NAT gateway, which performs source address translation.
The NAT gateway forwards the packet to the internet gateway, which sends it out to the internet.
Return traffic from the internet follows the reverse path through the internet gateway, NAT gateway, firewall, transit gateway, and back to the ODB network through the ODB transit VPC.

Pattern 2: Centralized traffic inspection with AWS Cloud WAN

As shown in Figure 2, application VPCs and on-premises networks attach to the AWS Cloud WAN core network and are mapped to their respective segments (App VPC and Hybrid). The inspection VPC attachment is mapped to a Network Function Group (NFG). A transit gateway provides connectivity between the ODB transit VPC and the AWS Cloud WAN core network via a TGW route table attachment. A static route for the ODB network CIDR points to the ODB transit VPC attachment on the transit gateway, and this TGW route table is associated with the Cloud WAN TGW route table attachment. This propagates the ODB CIDR dynamically via BGP into the Cloud WAN segment and NFG route tables.

With this in place, Cloud WAN’s service insertion actions steer traffic through the inspection NFG. East-west flows use the send-via action, while north-south flows use the send-to action, as detailed in the following sections.

East-west traffic flow (send-via action)

Figure 2: East-west traffic inspection using AWS Cloud WAN with service insertion.

A send-via action is configured between the ODB Network segment and the App VPC segment (and hybrid segment for on-premises traffic) through the inspection NFG.

The following steps walk through the east-west traffic flow (labeled ‘c’ in Figure 2):

An Amazon EC2 instance in the application VPC resolves the Oracle database hostname and initiates a database connection.
The application VPC subnet route table has a route for the ODB network CIDR pointing to the AWS Cloud WAN core network attachment. The packet is forwarded to the AWS Cloud WAN core network.
A send-via action is configured between the App VPC segment (or hybrid segment for on-premises traffic) and the ODB Network segment. AWS Cloud WAN’s routing tables direct the packet to the inspection VPC through the NFG attachment.
The packet arrives in the inspection VPC and is routed to the firewall endpoint for inspection.
After inspection, the packet exits the inspection VPC and re-enters the AWS Cloud WAN core network through the NFG attachment.
AWS Cloud WAN routes post-inspection traffic to the Transit Gateway route table attachment. The ODB network CIDR is present because it was dynamically propagated through BGP from the transit gateway.
The transit gateway forwards the packet to the ODB transit VPC. The ODB transit VPC subnet route table routes the packet through the ODB peering connection into the ODB network, where it reaches the Oracle database.
Return traffic follows the symmetric reverse path. The send-via action is bidirectional; traffic from the ODB network passes through the same inspection VPC before reaching the application VPC.

On-premises traffic follows the same pattern, entering AWS Cloud WAN through the Direct Connect or AWS Site-to-Site VPN attachment in the hybrid segment and passing through the inspection VPC before reaching the ODB network.

North-south traffic flow (send-to action for internet outbound traffic)

Figure 3: North-south internet outbound traffic inspection using AWS Cloud WAN service insertion.

As shown in Figure 3, a send-to action is configured on the ODB Network segment pointing to the inspection NFG. Internet egress from ODB@AWS is possible through either the OCI side or the AWS side. This post covers the AWS-side egress pattern. Internet-bound traffic from the ODB network is steered to the inspection VPC, inspected, then exits through the NAT gateway and internet gateway. As with Pattern 1, this requires configuring specific destination CIDRs in the peered CIDRs prefix list.

The following steps describe this flow (labeled ‘d’ in Figure 3):

Traffic from the ODB network destined for the internet matches a route configured by the peered CIDRs prefix list on the OCI side, which directs the packet to the ODB peering connection as the next hop. The packet crosses into the ODB transit VPC.
The ODB transit VPC forwards the packet to the transit gateway. The transit gateway then sends the packet to the AWS Cloud WAN core network through the TGW route table attachment.
Because a send-to action is configured on the ODB Network segment, AWS Cloud WAN’s routing tables direct the internet-bound packet to the inspection VPC through the NFG attachment.
The packet arrives in the inspection VPC and is routed to the firewall endpoint for inspection.
After passing inspection, the packet is forwarded to the NAT gateway. The NAT gateway performs source NAT and forwards the packet to an internet gateway, which sends it out to the internet.
Return traffic from the internet arrives at the internet gateway, passes through the NAT gateway for address translation, and is routed back to the firewall for inspection. After inspection, the packet re-enters the AWS Cloud WAN core network through the NFG attachment. The NFG route table has a propagated route for the ODB network CIDR pointing to the TGW route table attachment, which forwards the packet to the transit gateway, ODB transit VPC, and back into the ODB network.

These patterns extend to multi-Region deployments. You can onboard additional Regions by attaching regional VPCs, transit gateways, and inspection infrastructure to the same AWS Cloud WAN core network.

Considerations

When implementing network inspection for Oracle Database@AWS, keep these considerations in mind:

ODB peered CIDRs – The peered CIDRs list defines all CIDRs that can communicate with the ODB network through the peering connection. For east-west traffic, include application VPC CIDRs (primary and secondary) and on-premises network CIDRs. For north-south internet-bound traffic exiting through AWS, include the specific internet destination CIDRs the database needs to reach. As of this writing, the 0.0.0.0/0 entry isn’t allowed, so you must split internet destinations into specific CIDR ranges that comply with the peered CIDR restrictions. You don’t need to include inspection VPC CIDRs because inspection is transparent and preserves the original source IP. For guidance on peered CIDRs configuration and restrictions, see Oracle DB@AWS Internet Outbound Traffic.
Appliance mode for flow symmetry – Turn on appliance mode on the inspection VPC’s transit gateway attachment (Pattern 1) or AWS Cloud WAN attachment (Pattern 2). Without it, forward and return paths may traverse different Availability Zones, breaking stateful inspection.
AWS Network Firewall native Transit Gateway attachment – AWS Network Firewall supports native Transit Gateway attachment for east-west inspection without a dedicated inspection VPC. North-south outbound traffic still requires a separate VPC with a NAT gateway.
DNS traffic path – DNS queries to the ODB network follow the same inspection path as data traffic. Account for this in your firewall rules, or place Amazon Route 53 Resolver outbound endpoints in the ODB transit VPC to bypass inspection.
Multi-Region deployments – Deploy inspection VPCs in each Region to avoid cross-Region latency and data transfer costs. In Pattern 2, AWS Cloud WAN’s feature (single-hop mode) controls which Region’s inspection VPC is used.

Conclusion

In this post, we demonstrated how to implement centralized network traffic inspection for Oracle Database@AWS using AWS Transit Gateway with a centralized inspection VPC for single-Region deployments, or AWS Cloud WAN with service insertion for multi-Region architectures. Both approaches route traffic through a dedicated inspection VPC because the ODB transit VPC doesn’t support inline inspection. For more information, see the AWS Network Firewall documentation, the AWS Cloud WAN service insertion documentation, and the Oracle Database@AWS User Guide.

About the authors

Build a Spring Boot REST API with Amazon Aurora DSQL

Mirron Panicker — Wed, 24 Jun 2026 19:34:40 +0000

In this post, you learn how to build a Spring Boot REST API that integrates with Aurora DSQL. You’ll configure the Aurora DSQL JDBC Connector for IAM authentication, implement optimistic concurrency control, and run the application across two regional nodes to observe active-active behavior.

Aurora DSQL reduces the operational complexity of managing multi-Region replication while Spring Boot provides the familiar framework for building REST APIs. This combination allows you to focus on application logic rather than database infrastructure management.

This post is intended for developers and solutions architects who are familiar with Java, Spring Boot, and relational databases.

Walkthrough

By the end of this walkthrough, you’ll have a working REST API that demonstrates the following:

Setting up Aurora DSQL with the Aurora DSQL JDBC Connector.
Handling optimistic concurrency control with retry logic.
Building a RESTful product inventory API using Spring Boot.
Running the application on two regional nodes and testing concurrent multi-Region writes.

Solution overview

The following diagram illustrates the architecture of the sample application.

Figure 1: Architecture diagram showing two Spring Boot application nodes deployed in AWS Regions us-east-1 and us-west-2. Each node connects through HikariCP connection pooling and the Aurora DSQL JDBC Connector with IAM authentication to regional endpoints of an Aurora DSQL multi-region cluster. The diagram illustrates synchronous cross-region replication between the regional endpoints.

The application uses the following components:

Spring Boot 3.3 – REST API framework.
HikariCP – Connection pooling.
Aurora DSQL JDBC Connector – IAM authentication, token refresh, TLS encryption, and database connectivity.
Application Load Balancer (ALB) – Distributes incoming traffic across Spring Boot nodes and routes around unhealthy instances.

Prerequisites

Before you begin, make sure you have the following:

An AWS account with an Aurora DSQL cluster created (check the Aurora DSQL documentation for current Region availability).
AWS Command Line Interface (AWS CLI) configured with credentials.
Java 17 or higher installed.
Maven 3.6 or higher installed.
An AWS Identity and Access Management (AWS IAM) user or role with the minimum permissions shown below.

AWS IAM Policy (minimum permissions):

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": ["dsql:DbConnectAdmin"],
      "Resource": "arn:aws:dsql:<region>:<account-id>:cluster/<cluster-id>"
    }
  ]
}

Replace <region>, <account-id>, and <cluster-id> with your actual values. For production workloads, follow the principle of least privilege and scope permissions to specific clusters.

Step 1: Create your multi-Region Aurora DSQL cluster

Create the cluster in your primary Region

aws dsql create-cluster \
    --region <PRIMARY_REGION> \
    --multi-region-properties '{"witnessRegion":"<WITNESS_REGION>"}'

Create the cluster in your secondary Region

aws dsql create-cluster \
    --region <SECONDARY_REGION> \
    --multi-region-properties '{"witnessRegion":"<WITNESS_REGION>","clusters":["arn:aws:dsql:<PRIMARY_REGION>:<ACCOUNT_ID>:cluster/<PRIMARY_CLUSTER_ID>"]}'

Peer the primary cluster with the secondary

aws dsql update-cluster \
    --region <PRIMARY_REGION> \
    --identifier <PRIMARY_CLUSTER_ID> \
    --multi-region-properties '{"witnessRegion":"<WITNESS_REGION>","clusters":["arn:aws:dsql:<SECONDARY_REGION>:<ACCOUNT_ID>:cluster/<SECONDARY_CLUSTER_ID>"]}'

Step 2: Set up the project

Clone the sample repository:

git clone https://github.com/aws-samples/aurora-dsql-samples.git
cd aurora-dsql-samples/java/spring_boot

Update src/main/resources/application.properties with your DSQL endpoint:

# The Aurora DSQL JDBC Connector handles IAM auth, token refresh, and SSL automatically
spring.datasource.url=jdbc:aws-dsql:postgresql://<your-endpoint>.dsql.<region>.on.aws
spring.datasource.driver-class-name=software.amazon.dsql.jdbc.DSQLConnector

# AWS IAM user or role name for authentication
spring.datasource.username=<username>

# AWS Region where your DSQL cluster is deployed
spring.cloud.aws.region.static=<region>

The application uses the Aurora DSQL JDBC Connector which automatically handles IAM authentication, token refresh, and TLS encryption.

Aurora DSQL data type support: Aurora DSQL supports a subset of PostgreSQL data types including UUID, VARCHAR, TEXT, INTEGER, BIGINT, DECIMAL, BOOLEAN, TIMESTAMP, DATE, and JSON. The sample application uses these types throughout. For the complete list of supported types and Aurora DSQL-specific limits, see Supported data types in Aurora DSQL.

Step 3: Handle optimistic concurrency

Aurora DSQL uses optimistic concurrency control instead of traditional locking. Optimistic concurrency control allows multiple transactions to proceed without locking resources, checking for conflicts only at commit time. When concurrent transactions conflict, one will receive a 40001 SQL state error. We handle this with retry mechanism.

There are two layers of handling required:

Layer 1: Tell HikariCP not to evict the connection on a 40001 error

public static class DsqlExceptionOverride implements SQLExceptionOverride {

    public SQLExceptionOverride.Override adjudicate(SQLException ex) {
        if ("40001".equals(ex.getSQLState())) {
            return SQLExceptionOverride.Override.DO_NOT_EVICT;
        }
        return SQLExceptionOverride.Override.CONTINUE_EVICT;
    }
}

dataSource.setExceptionOverrideClassName(DsqlExceptionOverride.class.getName());

Layer 2: Retry the transaction with exponential backoff

@Retryable(
    retryFor = OptimisticLockingFailureException.class,
    maxAttemptsExpression = "${dsql.retry.max-attempts:4}",
    backoff = @Backoff(
        delayExpression      = "${dsql.retry.initial-delay-ms:100}",
        multiplierExpression = "${dsql.retry.multiplier:2.0}",
        maxDelayExpression   = "${dsql.retry.max-delay-ms:2000}",
        random = true
    ))

You can tune via application.properties without changing code:

dsql.retry.max-attempts=4
dsql.retry.initial-delay-ms=100
dsql.retry.multiplier=2.0
dsql.retry.max-delay-ms=2000

Step 4: Build the REST API

The sample application includes a product inventory API that provides standard CRUD operations:

@RestController
@RequestMapping("/api/products")
public class ProductController {

    @PostMapping
    public ResponseEntity<Product> createProduct(@RequestBody Product product) {
        Product created = productService.createProduct(product);
        return ResponseEntity.status(HttpStatus.CREATED).body(created);
    }

    @GetMapping
    public ResponseEntity<List<Product>> getAllProducts() {
        return ResponseEntity.ok(productService.getAllProducts());
    }

    @PatchMapping("/{id}/stock")
    public ResponseEntity<Map<String, String>> updateStock(
            @PathVariable UUID id,
            @RequestParam int quantity) {
        productService.updateStock(id, quantity);
        return ResponseEntity.ok(Map.of("message", "Stock updated"));
    }
}

Step 5: Run and test the application

Connect to the EC2 instance running the application:

Make sure its Security Group allows outbound traffic on port 5432 to the Aurora DSQL cluster endpoint:

ssh -i <YOUR_KEY.pem> ec2-user@<EC2_PUBLIC_IP>

For detailed steps on connecting to an EC2 instance, refer to the Connect to your EC2 Instance.

Build and run the application:

mvn clean install
mvn spring-boot:run

Initialize the database schema:

curl -X POST http://<EC2_INSTANCE_IP>:8080/api/products/init

Expected response: HTTP/1.1 200 OK

Create a product:

curl -X POST http://<EC2_INSTANCE_IP>:8080/api/products \
    -H "Content-Type: application/json" \
    -d '{
  "name": "Sample Product",
  "description": "A sample product for testing",
  "price": 29.99,
  "stock": 100
}'

Expected response:

{
  "id": "550e8400-e29b-41d4-a716-446655440000",
  "name": "Sample Product",
  "description": "A sample product for testing",
  "price": 29.99,
  "stock": 100
}

Retrieve the products:

curl http://<EC2_INSTANCE_IP>:8080/api/products

Expected response:

[
  {
    "id": "550e8400-e29b-41d4-a716-446655440000",
    "name": "Sample Product",
    "description": "A sample product for testing",
    "price": 29.99,
    "stock": 100
  }
]

Update stock:

curl -X PATCH "http://<EC2_INSTANCE_IP>:8080/api/products/<product-id>/stock?quantity=50"

Expected response:

{
  "message": "Stock updated"
}

Step 6: Testing concurrent multi-Region writes

Update src/main/resources/application.properties with your regional DSQL endpoints.

# Node 1, Region: us-east-1
spring.datasource.url=jdbc:aws-dsql:postgresql://<us-east-1-endpoint>.dsql.us-east-1.on.aws
spring.datasource.driver-class-name=software.amazon.dsql.jdbc.DSQLConnector
spring.datasource.username=<username>
spring.cloud.aws.region.static=us-east-1

# Node 2, Region: us-west-2
spring.datasource.url=jdbc:aws-dsql:postgresql://<us-west-2-endpoint>.dsql.us-west-2.on.aws
spring.datasource.driver-class-name=software.amazon.dsql.jdbc.DSQLConnector
spring.datasource.username=<username>
spring.cloud.aws.region.static=us-west-2

Build the application.
```
mvn clean install
```
Start the application on each node.
```
mvn spring-boot:run
```

Update src/main/java/com/example/controller to add the concurrent write endpoint.

@RestController
@RequestMapping("/api/products")
public class LoadTestController {

    private final ProductService productService;
    private final ExecutorService executor = Executors.newFixedThreadPool(20);

    public LoadTestController(ProductService productService) {
        this.productService = productService;
    }

    @PostMapping("/load-test")
    public ResponseEntity<LoadTestResult> runLoadTest(
            @RequestParam UUID productId,
            @RequestParam int count,
            @RequestParam int delta) {
        long start = System.currentTimeMillis();
        AtomicInteger succeeded = new AtomicInteger();
        AtomicInteger failed = new AtomicInteger();

        List<CompletableFuture<Void>> futures = IntStream.range(0, count)
            .mapToObj(i -> CompletableFuture.runAsync(() -> {
                try {
                    productService.updateStock(productId, delta);
                    succeeded.incrementAndGet();
                } catch (Exception e) {
                    failed.incrementAndGet();
                }
            }, executor))
            .collect(Collectors.toList());

        CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
        Product product = productService.getProduct(productId);

        return ResponseEntity.ok(new LoadTestResult(
            productId, count,
            succeeded.get(), failed.get(),
            System.currentTimeMillis() - start,
            product.getStock()
        ));
    }

    @PreDestroy
    public void shutdown() { executor.shutdown(); }
}

public record LoadTestResult(
    UUID productId,
    int requested,
    int succeeded,
    int failed,
    long elapsedMs,
    int finalStock
) {}

Run the multi-Region concurrent write test

Example 1: Creating Products Concurrently

# On Node 1: create a product with stock 1000 & note the returned id
curl -X POST http://<node1-host>:8080/api/products \
    -H "Content-Type: application/json" \
    -d '{"name":"Multi-Region Test","description":"Active-active write test","price":9.99,"stock":1000}'

# Trigger both load tests at the same time
# Node 1 (us-east-1): 500 concurrent +1 updates
curl -X POST "http://<node1-host>:8080/api/products/load-test?productId=<id>&count=500&delta=1"

# Node 2 (us-west-2): 500 concurrent -1 updates
curl -X POST "http://<node2-host>:8080/api/products/load-test?productId=<id>&count=500&delta=-1"

# Verify the final stock from either node:
curl http://<node1-host>:8080/api/products/<id>

Expected results:

Node 1:

{
  "productId": "<id>",
  "requested": 500,
  "succeeded": 500,
  "failed": 0,
  "elapsedMs": 3240,
  "finalStock": 1000
}

Node 2:

{
  "productId": "<id>",
  "requested": 500,
  "succeeded": 500,
  "failed": 0,
  "elapsedMs": 3180,
  "finalStock": 1000
}

finalStock returns to 1000 because all 500 increments from us-east-1 and 500 decrements from us-west-2 are accounted.

Example 2: Single-entry concurrent UPDATE example

# Node 1 (us-east-1)
curl -X PUT "http://<EC2_INSTANCE_IP>:8080/api/products/$PRODUCT_ID" \
    -H "Content-Type: application/json" \
    -d '{"name":"Updated by Node 1","description":"us-east-1 write","price":19.99,"stock":200}' &

# Node 2 (us-west-2) --- at the same time
curl -X PUT "http://<EC2_INSTANCE_IP>:8081/api/products/$PRODUCT_ID" \
    -H "Content-Type: application/json" \
    -d '{"name":"Updated by Node 2","description":"us-west-2 write","price":29.99,"stock":300}'

One of the two requests will succeed immediately. The other will receive a 40001 OCC conflict from Aurora DSQL, and @Retryable will transparently retry it. Both requests return HTTP 200.

Check the Spring Retry debug logs to see the conflict and retry:

DEBUG RetryTemplate : Retry: count=0; for: 'ProductService.updateProduct'
DEBUG ExponentialRandomBackOffPolicy : Sleeping for 73
DEBUG RetryTemplate : Retry: count=1; for: 'ProductService.updateProduct'

Verify the final state from either node:

curl -s http://<EC2_INSTANCE_IP>:8080/api/products/$PRODUCT_ID | jq '{name, stock}'

Clean up

To avoid incurring future charges, delete the resources you created:

Delete the multi-Region Aurora DSQL clusters.

# Disable deletion protection for primary cluster
aws dsql update-cluster \
    --region <PRIMARY_REGION> \
    --identifier <PRIMARY_CLUSTER_ID> \
    --no-deletion-protection-enabled

# Delete primary cluster
aws dsql delete-cluster \
    --region <PRIMARY_REGION> \
    --identifier <PRIMARY_CLUSTER_ID>

# Disable deletion protection for secondary cluster
aws dsql update-cluster \
    --region <SECONDARY_REGION> \
    --identifier <SECONDARY_CLUSTER_ID> \
    --no-deletion-protection-enabled

# Delete secondary cluster
aws dsql delete-cluster \
    --region <SECONDARY_REGION> \
    --identifier <SECONDARY_CLUSTER_ID>

Remove the Application Load Balancer.

aws elbv2 delete-load-balancer \
    --load-balancer-arn <LOAD_BALANCER_ARN>

Stop and terminate the EC2 Instance running the Spring Boot application.

# To stop the instance
aws ec2 stop-instances \
    --instance-ids <INSTANCE_ID>

# To terminate the instance
aws ec2 terminate-instances \
    --instance-ids <INSTANCE_ID>

Key takeaways

The following are key considerations when building applications with Aurora DSQL:

Aurora DSQL JDBC Connector – The Aurora DSQL JDBC Connector handles IAM-based authentication, automatic token refresh, and TLS encryption. This can reduce the need for manual password management, token rotation logic, and SSL configuration.
Optimistic concurrency – Aurora DSQL uses optimistic concurrency control, which requires implementing retry logic but enables better scalability than pessimistic locking. When two users update the same row simultaneously, one receives a 40001 error. Two layers of handling are required: DsqlExceptionOverride tells HikariCP not to evict the connection (keeping the pool healthy under load), and @Retryable transparently retries the transaction with exponential backoff.
Connection pooling – Proper HikariCP configuration supports efficient connection reuse and optimal performance.

Security considerations

When deploying this application in a production environment, consider the following security best practices:

Follow the principle of least privilege by scoping your AWS IAM policy to the specific Aurora DSQL cluster ARN. Use dsql:DbConnect instead of dsql:DbConnectAdmin for application users that do not need administrative access. For secrets management, verify that your AWS credentials are managed through AWS IAM roles for Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Service (Amazon ECS) task roles, or AWS IAM Roles Anywhere rather than long-lived access keys.

At the regional level, place an Application Load Balancer in front of your Spring Boot nodes. Configure Amazon Route 53 health checks on the ALB endpoints in each region. If a regional endpoint becomes unavailable and the ALB in that region starts failing health checks, Amazon Route 53 automatically shifts DNS traffic to the ALB in the surviving region. Because each node runs identical application code and connects to its nearest regional endpoint via configuration only, no application code changes are required during this failover.

Production considerations

The patterns in this post provide a foundation for production applications. For a production deployment, also consider the following:

For observability, add Amazon CloudWatch metrics for connection pool utilization, token refresh success and failure rates, and retry counts. Use structured logging with correlation IDs for request tracing. Implement a Spring Boot Actuator health indicator that verifies database connectivity, so your load balancer can detect unhealthy instances. Adjust HikariCP’s maximumPoolSize, minimumIdle, and connectionTimeout based on your expected concurrency, and monitor pool metrics to right-size these values. Be aware that Aurora DSQL uses Distributed Processing Units (DPUs) for pricing, which measure database activity including compute resources, I/O operations, and SQL workload execution, so review the Aurora DSQL pricing page to understand cost implications for your workload.

For regional failover, Aurora DSQL’s multi-Region cluster provides built-in high availability. If a regional endpoint becomes unavailable, the cluster remains operational through the secondary regional endpoint.

Conclusion

In this post, we demonstrated how you can build a Spring Boot REST API that integrates with Amazon Aurora DSQL. By using the Aurora DSQL JDBC Connector for IAM authentication and TLS encryption, and implementing optimistic concurrency control with Spring Retry, you can build scalable, globally distributed applications without the operational overhead of traditional databases.

The sample code provides foundational patterns for authentication, concurrency control, and error handling that you can adapt to your own applications. Whether you’re building a new application or evaluating Aurora DSQL for an existing workload, these patterns help you take advantage of the serverless, multi-Region capabilities of Aurora DSQL.

To provide feedback or contribute, visit the GitHub repository.

About the authors

Automating cross-account refresh for Amazon RDS Multi-AZ DB clusters

Bijay Ranabhat — Wed, 24 Jun 2026 19:29:45 +0000

Keeping non-production environments current with production data is a common operational need. In this post, you learn how to automate cross-account environment refresh for Amazon Relational Database Service (Amazon RDS) Multi-AZ DB clusters (available for PostgreSQL and MySQL) using a serverless pipeline that runs with a single trigger.

Amazon RDS supports snapshot sharing for DB instances but not for Multi-AZ DB clusters, therefore you can’t share Multi-AZ DB clusters between accounts by using cluster snapshots. This solution implements cross-account cluster sharing by using an intermediate instance snapshot, then uses AWS Lambda, AWS Step Functions, and Amazon EventBridge to orchestrate seven steps spanning two AWS accounts.

Solution overview

The pipeline automates environment refresh through seven steps spanning two AWS accounts. The architecture creates snapshots, restores intermediate instances, shares snapshots across accounts, and restores the final Multi-AZ DB cluster in the destination, all without manual intervention.

The following diagram illustrates the solution architecture:

The workflow consists of the following steps.

Source account

1. Create a cluster snapshot. A Lambda function creates a manual snapshot of the Multi-AZ DB cluster using the create-db-cluster-snapshot API. Step Functions manages the wait loop, polling the snapshot status every 30 seconds until it becomes available.

2. Restore to a temporary single-AZ instance. The Lambda function restores the cluster snapshot to a temporary single-AZ DB instance using the restore-db-instance-from-db-snapshot API with the DBClusterSnapshotIdentifier parameter. The RDS API supports restoring a cluster snapshot to a standalone single-AZ instance, which produces an instance from which you can create a shareable snapshot.

3. Create an instance snapshot. After the temporary instance becomes available, the Lambda function creates a standard DB instance snapshot using the create-db-snapshot API. Unlike cluster snapshots, instance snapshots can be shared across accounts.

4. Share the instance snapshot. The Lambda function shares the instance snapshot with the destination account using the modify-db-snapshot-attribute API and grants the necessary AWS Key Management Service (AWS KMS) permissions.

5. Clean up the temporary instance. After snapshot sharing completes, the pipeline deletes the temporary single-AZ instance with SkipFinalSnapshot=True. This instance was only needed to produce a shareable snapshot and is no longer required.

Cross-account handoff

When the source Step Functions workflow succeeds, an EventBridge rule in the source account forwards a success event to the destination account’s default event bus. The default event bus is the primary EventBridge endpoint in each AWS account that receives events from AWS services. A rule in the destination account catches this event and invokes a Starter Lambda function. The Starter Lambda function extracts the snapshot ARN from the Step Functions output and starts the destination workflow.

Destination account

6. Copy the shared snapshot. A Lambda function in the destination account copies the shared snapshot locally using the copy-db-snapshot API, re-encrypting it with the destination account’s customer-managed AWS KMS key. The pipeline references the shared snapshot by its full ARN (passed from the Starter Lambda function), which allows the copy operation to proceed without needing to list or discover the snapshot. Step Functions manages the wait loop until the copy completes.

7. Restore as a Multi-AZ DB cluster. The Lambda function restores the copied snapshot as a Multi-AZ DB cluster using the restore-db-cluster-from-snapshot API. The pipeline uses gp3 storage by default for cost efficiency, but you can change this to match your requirements. Step Functions waits until the cluster reaches the available state.

Through this architecture, the pipeline runs without intervention after you start it. The following sections walk through the implementation steps.

Prerequisites

Before you begin, verify that you have the following:

Two AWS accounts (source and destination) with permissions to create AWS Identity and Access Management (IAM) roles, AWS Lambda functions, Step Functions workflows, Amazon Simple Notification Service (Amazon SNS) topics, Amazon EventBridge rules, and AWS KMS keys
An Amazon RDS Multi-AZ DB cluster encrypted with a customer-managed AWS KMS key in the source account
Experience deploying AWS CloudFormation stacks and configuring IAM roles
The AWS Command Line Interface (AWS CLI) configured with named profiles for both accounts

Important: Your source Multi-AZ DB cluster must be encrypted with a customer-managed AWS KMS key. You cannot share snapshots encrypted with the default AWS managed key across accounts, and you cannot encrypt an unencrypted Multi-AZ DB cluster after creation.

Note: The prerequisite AWS CloudFormation templates automatically configure the cross-account AWS KMS key policies. If your production cluster already uses a customer-managed AWS KMS key, this solution provides a helper script to patch the existing key policy instead of creating a new key.

Implementation

You deploy AWS CloudFormation stacks in order: prerequisites in each account first, then the pipeline stacks. The prerequisite templates source-prereqs.yaml and destination-prereqs.yaml create supporting resources with the correct cross-account AWS KMS policies. The unified pipeline template rds-refresh-stack.yaml uses a DeploymentMode parameter to control which resources you create in each account.

The complete source code, AWS CloudFormation templates, and helper scripts are available in the accompanying GitHub repository.

Deploy prerequisites in the source account

The source prerequisites template creates a customer-managed AWS KMS key with a cross-account policy, a DB subnet group for the temporary instance, and an Amazon Simple Storage Service (Amazon S3) bucket for the Lambda deployment package. The Amazon S3 bucket is created with server-side encryption, public access blocking, and versioning enabled.

If your production cluster already uses a customer-managed AWS KMS key, you can grant the destination account access using one of the following approaches:

A. Console (recommended for existing clusters): Open the AWS KMS console in the source account, navigate to Customer managed keys, find the key used by your cluster, choose the Key policy tab, and choose Edit. Add a policy statement granting the destination account kms:Decrypt, kms:DescribeKey, kms:CreateGrant, and kms:ReEncryptFrom permissions with a kms:GrantIsForAWSResource condition.

B. Helper script (idempotent, safe to run multiple times):

./scripts/patch_kms_policy.sh <SourceKmsKeyId> <DestinationAccountId> --profile source-account

C. New key via CloudFormation (greenfield setup): Deploy the full prerequisites stack, which creates a new KMS key with the cross-account policy, a DB subnet group, and an S3 bucket:

aws cloudformation deploy \
  --profile source-account \
  --template-file iac/source-prereqs.yaml \
  --stack-name rds-refresh-prereqs \
  --parameter-overrides \
  DestinationAccountId=<DestinationAccountId> \
  SubnetIds=<SubnetA1>,<SubnetA2>,<SubnetA3>

Deploy prerequisites in the destination account

The destination prerequisites template creates a customer-managed AWS KMS key with the source account granted access, a DB subnet group spanning at least three Availability Zones, an Amazon Virtual Private Cloud (Amazon VPC) security group, a DB cluster parameter group with SSL enforcement turned on, and an Amazon S3 bucket with encryption and public access blocking.

aws cloudformation deploy \
  --profile destination-account \
  --template-file iac/destination-prereqs.yaml \
  --stack-name rds-refresh-prereqs \
  --parameter-overrides \
  VpcId=<VpcId> \
  SubnetIds=<SubnetB1>,<SubnetB2>,<SubnetB3> \
  SourceAccountId=<SourceAccountId>

Deploy the pipeline in the source account

In source mode, the unified template creates the following resources:

A Lambda function that handles cluster snapshot creation, temporary instance restore, instance snapshot creation, and cross-account sharing
A Step Functions workflow that orchestrates the Lambda function with wait loops, status checks, and error handling
An Amazon SNS topic that publishes notifications at each step for success and failure
An EventBridge rule that forwards the Step Functions workflow success event to the destination account’s default event bus
IAM execution roles with scoped permissions for Lambda, Step Functions, and EventBridge

Package and deploy the source pipeline:

aws cloudformation package \
  --profile source-account \
  --template-file iac/rds-refresh-stack.yaml \
  --s3-bucket <SourceS3BucketName> \
  --output-template-file iac/packaged-source.yaml

aws cloudformation deploy \
  --profile source-account \
  --template-file iac/packaged-source.yaml \
  --stack-name rds-refresh \
  --capabilities CAPABILITY_IAM \
  --parameter-overrides \
  DeploymentMode=source \
  PeerAccountId=<DestinationAccountId> \
  KmsKeyId=<SourceKmsKeyId> \
  SourceClusterIdentifier=<SourceClusterName> \
  TempInstanceSubnetGroup=<TempSubnetGroupName>

After the stack is created, note the NotificationTopicArn and SourceStateMachineArn from the stack outputs. You need both for the destination deployment and for triggering the pipeline:

aws cloudformation describe-stacks \
  --profile source-account \
  --stack-name rds-refresh \
  --query "Stacks[0].Outputs" \
  --output table

Deploy the pipeline in the destination account

In destination mode, the same template creates the following resources:

A Lambda function that handles shared snapshot copy with re-encryption and Multi-AZ DB cluster restore
A Starter Lambda function that forwards EventBridge events to the destination Step Functions workflow by extracting the snapshot ARN from the source workflow output
A Step Functions workflow that orchestrates the destination Lambda function with wait loops, status checks, and error handling
An EventBridge rule and bus policy that accepts events from the source account and invokes the Starter Lambda function
An Amazon SNS subscription that subscribes to the source account’s notification topic for cross-account alerts
IAM execution roles with scoped permissions for Lambda, Step Functions, and EventBridge

Package and deploy the destination pipeline:

aws cloudformation package \
  --profile destination-account \
  --template-file iac/rds-refresh-stack.yaml \
  --s3-bucket <DestinationS3BucketName> \
  --output-template-file iac/packaged-dest.yaml

aws cloudformation deploy \
  --profile destination-account \
  --template-file iac/packaged-dest.yaml \
  --stack-name rds-refresh \
  --capabilities CAPABILITY_IAM \
  --parameter-overrides \
  DeploymentMode=destination \
  PeerAccountId=<SourceAccountId> \
  KmsKeyId=<DestinationKmsKeyId> \
  SourceKmsKeyArn=arn:aws:kms:<Region>:<SourceAccountId>:key/<SourceKmsKeyId> \
  SourceSnsTopicArn=arn:aws:sns:<Region>:<SourceAccountId>:rds-refresh-notifications \
  DbSubnetGroup=<DestinationSubnetGroupName> \
  VpcSecurityGroups=<VpcSecurityGroupId> \
  DbClusterParameterGroup=<DbClusterParameterGroupName> \
  DbInstanceClass=<DbInstanceClass>

Test the solution

After you deploy both stacks, test the solution by triggering the pipeline with a single CLI command.

Start the source workflow:

aws stepfunctions start-execution \
  --profile source-account \
  --state-machine-arn arn:aws:states:<Region>:<SourceAccountId>:stateMachine:rds-refresh-source \
  --input '{"source_cluster_id": "<SourceClusterName>"}'

Replace <SourceClusterName> with the identifier of the Multi-AZ DB cluster you want to use. You can find this value in the Amazon RDS console under Databases or by running aws rds describe-db-clusters --profile source-account. The source_cluster_id in the input payload overrides the default cluster configured in the stack, so you can target a different Multi-AZ DB cluster without redeploying.

The command returns an execution ARN:

{
  "executionArn": "arn:aws:states:<Region>:<SourceAccountId>:execution:rds-refresh-source:<ExecutionName>",
  "startDate": "2025-04-10T14:30:00.000Z"
}

From here, the pipeline runs end-to-end:

The source workflow creates a cluster snapshot, restores a temporary single-AZ instance, creates an instance snapshot, shares it with the destination account, and cleans up the temporary instance.
When the source workflow succeeds, EventBridge forwards the event to the destination account.
The Starter Lambda function in the destination account extracts the snapshot ARN and starts the destination workflow.
The destination workflow copies the shared snapshot (re-encrypting with the destination account’s AWS KMS key) and restores it as a new Multi-AZ DB cluster.

Monitor the execution in the Step Functions console, or use the CLI:

aws stepfunctions describe-execution \
  --execution-arn arn:aws:states:<Region>:<SourceAccountId>:execution:rds-refresh-source:<ExecutionName> \
  --profile source-account

To verify the pipeline completed successfully, check the Amazon RDS events in the destination account:

aws rds describe-events \
  --source-type db-cluster \
  --duration 1440 \
  --profile destination-account

The output shows events confirming the following sequence:

Snapshot copy completed in the destination account
Multi-AZ DB cluster restore initiated
DB instances created within the restored cluster
Cluster status changed to available

Amazon SNS notifications are published at each step for both success and failure.

Clean up

Delete the following resources if you no longer need them.

The pipeline handles most cleanup of runtime resources automatically. After the instance snapshot is created and shared, the source workflow deletes the temporary single-AZ instance with SkipFinalSnapshot=True. Intermediate snapshots (the cluster snapshot and instance snapshot in the source account) are conditionally deleted based on retention policy parameters in the stack configuration.

In the destination account, the copied snapshot persists after the cluster is restored. Delete it manually after the restored cluster is running, or retain it as a point-in-time reference.

To remove the solution, delete the stacks in reverse order. Start with the pipeline stacks, then delete the prerequisite stacks:

# Destination account: delete pipeline stack, then prerequisites
aws cloudformation delete-stack --stack-name rds-refresh --profile destination-account
aws cloudformation delete-stack --stack-name rds-refresh-prereqs --profile destination-account

# Source account: delete pipeline stack, then prerequisites
aws cloudformation delete-stack --stack-name rds-refresh --profile source-account
aws cloudformation delete-stack --stack-name rds-refresh-prereqs --profile source-account

After you delete the stacks, check for remaining runtime resources. List snapshots in both accounts and delete snapshots that you no longer need:

# List remaining snapshots in the source account
aws rds describe-db-snapshots --profile source-account \
  --query "DBSnapshots[?contains(DBSnapshotIdentifier, 'rds-refresh')]"

# List remaining snapshots in the destination account
aws rds describe-db-snapshots --profile destination-account \
  --query "DBSnapshots[?contains(DBSnapshotIdentifier, 'rds-refresh')]"

Note: AWS CloudFormation does not manage resources created by the pipeline at runtime. Delete remaining Amazon RDS snapshots and the restored cluster in the destination account manually.

(Optional) If you created a test Multi-AZ DB cluster specifically for this solution, delete each member instance first, then delete the cluster. Multi-AZ DB clusters require each member instance to be deleted before you can remove the cluster itself:

# Delete the member instances
aws rds delete-db-instance --db-instance-identifier <ClusterName>-instance-1 --skip-final-snapshot --profile source-account
aws rds delete-db-instance --db-instance-identifier <ClusterName>-instance-2 --skip-final-snapshot --profile source-account
aws rds delete-db-instance --db-instance-identifier <ClusterName>-instance-3 --skip-final-snapshot --profile source-account

# Wait for each instance to be deleted, then delete the cluster
aws rds delete-db-cluster --db-cluster-identifier <ClusterName> --skip-final-snapshot --profile source-account

Conclusion

In this post, you learned how to build an automated cross-account environment refresh pipeline for Amazon RDS Multi-AZ DB clusters. The serverless architecture works around the snapshot sharing constraint by creating an intermediate single-AZ instance, providing a pipeline that runs without intervention after a single trigger.

To extend this solution, consider the following options based on your use case:

Scheduled refreshes. Add Amazon EventBridge Scheduler for automated weekly or monthly refreshes.
Team notifications. Integrate with Amazon Q Developer for Slack notifications on pipeline success or failure.
Data privacy. Add data masking or anonymization as a post-restore step for non-production environments.

The source code, AWS CloudFormation templates, and helper scripts are available in the accompanying GitHub repository. To report issues or suggest improvements, open an issue in the repository.

About the authors

PostgreSQL 18 on Amazon Aurora and Amazon RDS: Performance enhancements

Nazneen Jafri — Wed, 17 Jun 2026 22:29:13 +0000

Managing query performance across composite indexes, diagnosing memory spills in materialized CTEs, and upgrading major versions without plan regressions are everyday challenges for PostgreSQL users. PostgreSQL 18 addresses each of these directly with skip scan optimization that removes redundant indexes, EXPLAIN enhancements that expose storage behavior, and optimizer statistics that survive major version upgrades. You can try these features today on both Amazon Aurora PostgreSQL-Compatible Edition and Amazon Relational Database Service (Amazon RDS) for PostgreSQL.

This is Part 1 of a two-part series covering the key features in PostgreSQL 18. In this post, we focus on performance enhancements: skip scan optimization for multicolumn indexes, enhanced EXPLAIN output, automatic removal of unnecessary self-joins, and several vacuum and autovacuum improvements that help keep your database running efficiently. In Part 2, we cover security, monitoring, developer, and logical replication enhancements. For more information, see the PostgreSQL 18 release notes.

Prerequisites

You can read this post without running any commands, but if you want to try the examples, you must have these prerequisites:

Create an Aurora PostgreSQL cluster or RDS for PostgreSQL instance if you don’t already have one. For instructions, see Create an Amazon Aurora PostgreSQL-Compatible DB cluster or Create a PostgreSQL DB instance, respectively.
Create an Amazon Elastic Compute Cloud (Amazon EC2) instance to install the PostgreSQL client to access the Aurora PostgreSQL or RDS for PostgreSQL instance. For instructions, see Create your EC2 resources and launch your EC2 instance. Or you can set up connectivity between your RDS database and EC2 compute instance in 1-click.
Install the PostgreSQL client. On Amazon Linux 2023, you can use the following commands to download the psql command line tool:
Install all required build dependencies:
```
sudo dnf install -y gcc readline-devel libicu-devel zlib-devel openssl-devel make bison flex perl-core
```
Download PostgreSQL 18 source code. Replace 18.x in the commands below with the current minor version (for example, 18.4):
```
wget https://ftp.postgresql.org/pub/source/v18.x/postgresql-18.x.tar.gz
```
Extract the source:
```
tar -xvzf postgresql-18.x.tar.gz
```
Navigate to the source directory:
```
cd postgresql-18.x
```
Configure the build (set bindir to /usr/bin for system-wide installation):
```
./configure --bindir=/usr/bin --with-openssl
```
Build and install only the client binaries, libraries, and headers:
```
sudo make -C src/bin install
sudo make -C src/include install
sudo make -C src/interfaces install
```
Verify the installation:
```
psql --version
```

Skip scan optimization for multicolumn indexes

PostgreSQL 18 introduces skip scan optimization, which allows the query planner to use multicolumn B-tree indexes more efficiently when the leading columns are not specified in the WHERE clause. This optimization can improve query performance in scenarios where you have composite indexes, but queries don’t always filter the leading columns.

Understanding the skip scan optimization

In previous PostgreSQL versions, a multicolumn index was used efficiently if the query included the leading column(s) in the WHERE clause. For example, with an index on (status, created_date), a query filtering only on created_date could not use the index.

Skip scan allows PostgreSQL to skip through the distinct values of the leading column(s) and use the index for subsequent columns, effectively treating the multicolumn index as multiple single-column indexes.

Example scenario

Let’s create a table with a multicolumn index and observe the skip scan optimization:

-- Create a table for customer orders
CREATE TABLE customer_orders (
    order_id serial PRIMARY KEY,
    status varchar(20),
    created_date date,
    customer_id integer,
    amount numeric(10,2)
);

-- Create a multicolumn index
CREATE INDEX idx_orders_status_date ON customer_orders(status, created_date);

-- Insert sample data with a few distinct status values
INSERT INTO customer_orders (status, created_date, customer_id, amount)
SELECT
    CASE (random() * 4)::integer
        WHEN 0 THEN 'pending'
        WHEN 1 THEN 'processing'
        WHEN 2 THEN 'shipped'
        WHEN 3 THEN 'delivered'
        ELSE 'cancelled'
    END,
    current_date - (random() * 365)::integer,
    (random() * 10000)::integer,
    (random() * 1000)::numeric(10,2)
FROM generate_series(1, 1000000);

-- Analyze the table
ANALYZE customer_orders;

Now let’s run a query that filters only on created_date (not the leading column):

-- Query filtering on created_date only
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM customer_orders
WHERE created_date = '2026-03-15';

PostgreSQL 18 output (with skip scan):

Bitmap Heap Scan on customer_orders  (cost=47.66..5422.68 rows=2726 width=27) (actual time=0.581..3.223 rows=2659 loops=1)
   Recheck Cond: (created_date = '2026-03-15'::date)
   Heap Blocks: exact=2229
   Buffers: shared hit=2266
   ->  Bitmap Index Scan on idx_orders_status_date  (cost=0.00..46.98 rows=2726 width=0) (actual time=0.305..0.305 rows=2659 loops=1)
         Index Cond: (created_date = '2026-03-15'::date)
         Index Searches: 11
         Buffers: shared hit=37
Planning Time: 0.065 ms
Execution Time: 3.374 ms

In PostgreSQL 17 and earlier, this query would likely use a sequential scan or require a separate index on created_date:

PostgreSQL 17 output (without skip scan):

Gather  (cost=1000.00..13833.43 rows=2721 width=27) (actual time=0.995..72.942 rows=2700 loops=1)
   Workers Planned: 2
   Workers Launched: 2
   Buffers: shared hit=7353
   ->  Parallel Seq Scan on customer_orders  (cost=0.00..12561.33 rows=1134 width=27) (actual time=0.213..61.421 rows=900 loops=3)
         Filter: (created_date = '2026-03-15'::date)
         Rows Removed by Filter: 332433
Planning Time: 0.091 ms
Execution Time: 73.200 ms

When skip scan is beneficial

Skip scan optimization is most effective when:

The leading columns have low cardinality (few distinct values).
The query filters on non-leading columns.
The selectivity of the non-leading column filter is high.

Performance comparison

In our testing with Aurora PostgreSQL 17.10 and Aurora PostgreSQL 18.4, both on db.r6g.large with 1 million rows and 5 distinct values in the leading status column (warm buffer cache):

Without skip scan (Aurora PostgreSQL 17), the planner cannot use the composite index on (status, created_date) because the leading column is not in the WHERE clause. It falls back to a Parallel Seq Scan, reading all 7,353 blocks and finishing in 73.2 ms.

With skip scan (Aurora PostgreSQL 18), the planner uses the existing composite index with 11 index searches, reading only 2,266 blocks and finishing in 3.4 ms. That’s a 21x improvement with no additional index required.

The skip scan provides performance comparable to a dedicated single-column index without the storage overhead and maintenance cost of maintaining one.

Enhanced EXPLAIN

PostgreSQL 18 brings substantial improvements to the EXPLAIN command, making query performance analysis more intuitive and comprehensive. These enhancements provide deeper insights into query execution, resource usage, and optimization opportunities, helping developers and database administrators identify performance bottlenecks more effectively.

Automatic buffer statistics in EXPLAIN ANALYZE

OneOne of the significant usability improvements in PostgreSQL 18 is that EXPLAIN ANALYZE now includes buffer statistics by default, removing the need to explicitly specify the BUFFERS option. This change addresses a long-standing pain point where critical I/O information was hidden unless you remember to add the BUFFERS flag.

Prior to PostgreSQL 18, analyzing query performance required this syntax:

-- PostgreSQL 17 and earlier: BUFFERS option required
EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM customer_orders
WHERE created_date >= '2025-06-01'
  AND created_date < '2025-07-01';

Without the BUFFERS option, you would see execution time and row counts but miss information about cache hits, disk reads, and I/O patterns. In PostgreSQL 18, buffer statistics are included automatically:

-- PostgreSQL 18: Buffer statistics included by default
EXPLAIN ANALYZE
SELECT * FROM customer_orders
WHERE created_date >= '2025-06-01'
  AND created_date < '2025-07-01';

The output now shows:

Seq Scan on customer_orders  (cost=0.00..22353.00 rows=84546 width=27)
  (actual time=0.009..53.400 rows=82676 loops=1)
  Filter: ((created_date >= '2025-06-01'::date) AND (created_date < '2025-07-01'::date))
  Rows Removed by Filter: 917324
  Buffers: shared hit=18234 read=1245
Planning Time: 0.073 ms
Execution Time: 56.664 ms

The buffer statistics immediately reveal that most data was served from cache (18,234 shared buffer hits) with only 1,245 blocks read from disk. This helps you understand whether performance issues stem from I/O bottlenecks or other factors.

Memory and disk usage for materialized nodes

PostgreSQL 18 introduces storage visibility for materialized nodes in query plans. When the executor materializes a CTE for repeated access, the plan now reports whether the intermediate results were held in memory or spilled to disk, along with the peak storage consumed. This information was completely invisible in PostgreSQL 17 and earlier. You had to guess based on work_mem settings and row width estimates.

Example 1: Memory storage (default)

EXPLAIN (ANALYZE, BUFFERS)
WITH top_products AS MATERIALIZED (
    SELECT product_id, product_name, price
    FROM products
    WHERE price > 500
)
SELECT * FROM top_products
ORDER BY price DESC
LIMIT 10;

Output:

 Limit  (cost=394.97..394.99 rows=10 width=238) (actual time=3.028..3.031 rows=10.00 loops=1)
   Buffers: shared hit=64
   CTE top_products
     ->  Seq Scan on products  (cost=0.00..189.00 rows=4950 width=22) (actual time=0.012..1.535 rows=4948.00 loops=1)
           Filter: (price > '500'::numeric)
           Rows Removed by Filter: 5052
           Buffers: shared hit=64
   ->  Sort  (cost=205.97..218.34 rows=4950 width=238) (actual time=3.027..3.028 rows=10.00 loops=1)
         Sort Key: top_products.price DESC
         Sort Method: top-N heapsort  Memory: 26kB
         Buffers: shared hit=64
         ->  CTE Scan on top_products  (cost=0.00..99.00 rows=4950 width=238) (actual time=0.013..2.273 rows=4948.00 loops=1)
               Storage: Memory  Maximum Storage: 296kB
               Buffers: shared hit=64
Planning:
   Buffers: shared hit=36
Planning Time: 0.193 ms
Execution Time: 3.057 ms
(18 rows)

The line to focus on is Storage: Memory Maximum Storage: 296kB on the CTE Scan node. This tells you the entire materialized result set, all 4,948 rows, was held in RAM with a peak footprint of 296kB. No temporary files were created. No disk I/O was needed beyond reading the source table from shared buffers.

Example 2: Disk storage (low work_mem)

Now let’s see what happens when we artificially constrain work_mem so the materialized CTE can no longer fit in memory:

-- Set low work_mem to force disk storage
SET work_mem = '64kB';

EXPLAIN (ANALYZE, BUFFERS)
WITH top_products AS MATERIALIZED (
    SELECT product_id, product_name, price
    FROM products
    WHERE price > 500
)
SELECT * FROM top_products
ORDER BY price DESC
LIMIT 10;

Output:

 Limit  (cost=394.97..394.99 rows=10 width=238) (actual time=3.184..3.187 rows=10.00 loops=1)
   Buffers: shared hit=64, temp written=20
   I/O Timings: temp write=0.127
   CTE top_products
     ->  Seq Scan on products  (cost=0.00..189.00 rows=4950 width=22) (actual time=0.013..1.391 rows=4948.00 loops=1)
           Filter: (price > '500'::numeric)
           Rows Removed by Filter: 5052
           Buffers: shared hit=64
   ->  Sort  (cost=205.97..218.34 rows=4950 width=238) (actual time=3.183..3.184 rows=10.00 loops=1)
         Sort Key: top_products.price DESC
         Sort Method: top-N heapsort  Memory: 26kB
         Buffers: shared hit=64, temp written=20
         I/O Timings: temp write=0.127
         ->  CTE Scan on top_products  (cost=0.00..99.00 rows=4950 width=238) (actual time=0.015..2.411 rows=4948.00 loops=1)
               Storage: Disk  Maximum Storage: 160kB
               Buffers: shared hit=64, temp written=20
               I/O Timings: temp write=0.127
Planning Time: 0.102 ms
Execution Time: 3.268 ms
(19 rows)

Notice the differences. Storage: Disk replaces Storage: Memory, and several new indicators appear. temp written=20 shows temporary file I/O occurred, and I/O Timings: temp write=0.127 quantifies the time spent writing to those temp files. Execution time increased from 3.057 ms to 3.268 ms, modest here because the dataset is small. On larger result sets, the penalty from disk spills compounds significantly.

Why this matters in practice

This storage reporting transforms how you diagnose and tune materialized CTEs. When a query that normally runs in milliseconds suddenly slows down, checking whether a previously in-memory CTE has flipped to Storage: Disk gives you an immediate explanation. The dataset grew past the memory threshold. The Maximum Storage value tells you exactly how much space the materialization needed, so you can make a targeted work_mem adjustment rather than guessing.

That said, the right response isn’t always to increase work_mem. Remember that work_mem is allocated per-operation and per-query within each session, so setting it too high on a system with many concurrent connections can lead to memory pressure. A better approach is to use the Maximum Storage value as a guide. If a CTE needs 296kB and your work_mem is 4 MB, you have plenty of room. If it needs 3.9 MB on a 4 MB setting, a modest bump to 8 MB keeps it in memory without being reckless. For queries where materialization overhead is the bottleneck, you might also consider whether the MATERIALIZED hint is truly necessary. Sometimes, letting the planner inline the CTE produces a better plan altogether.

The combination of Storage, Maximum Storage, temp written, and I/O Timings gives you a full picture of materialization cost that was unavailable before PostgreSQL 18. This turns what used to be an educated guess into a data-driven tuning decision.

Index searches metric in EXPLAIN ANALYZE

PostgreSQL 18 introduces a new metric in EXPLAIN ANALYZE output: Index Searches. This metric shows how many times the database traversed the index tree during query execution, providing valuable insights into index usage patterns.

What is the Index Searches counter?

The Index Searches counter appears in Index Scan, Bitmap Index Scan, and Index-Only Scan nodes. It reports the total number of index tree traversals across all node executions and loops.

Example: IN clause with multiple values

Consider a query searching for products with specific prices:

EXPLAIN (ANALYZE, BUFFERS)
SELECT * FROM products
WHERE price IN (100.50, 500.75, 750.25, 999.99);

Output:

Bitmap Heap Scan on products  (cost=17.17..30.51 rows=4 width=29)
                              (actual time=0.046..0.047 rows=1.00 loops=1)
  Recheck Cond: (price = ANY ('{100.50,500.75,750.25,999.99}'::numeric[]))
  Heap Blocks: exact=1
  Buffers: shared hit=4 read=5
  ->  Bitmap Index Scan on idx_products_price  (cost=0.00..17.17 rows=4 width=0)
                                                (actual time=0.039..0.039 rows=1.00 loops=1)
        Index Cond: (price = ANY ('{100.50,500.75,750.25,999.99}'::numeric[]))
        Index Searches: 4
        Buffers: shared hit=3 read=5
Planning Time: 1.907 ms
Execution Time: 0.064 ms

Notice Index Searches: 4 in the output. The query searches for 4 different price values, and PostgreSQL performed 4 separate index tree traversals, one for each value in the IN clause. Each search starts from the index root and navigates to the appropriate leaf page.

Why this matters

Before PostgreSQL 18, you saw that an index was used but not how many times the executor descended into the tree. Now you can. A high search count on an IN clause tells you each value triggered a separate root-to-leaf traversal. If those values happen to be contiguous, rewriting the predicate as a BETWEEN range might cut the number of searches down. In nested loop joins, the metric shows the total searches across all loop iterations. You can quickly tell whether the inner index is being hammered and whether a different join strategy would be cheaper.

The three enhancements covered here (automatic buffer statistics, materialized CTE storage and memory reporting, and index search counts) are only a subset of the EXPLAIN improvements shipped in PostgreSQL 18. For the full list of EXPLAIN-related enhancements, see the PostgreSQL 18 release notes.

Automatic removal of unnecessary self-joins

PostgreSQL 18 introduces Self-Join Elimination (SJE), a query planner optimization that automatically detects and removes inner joins of a table to itself when the join is probably redundant. This applies to SELECT queries, where the planner can safely determine that one side of the self-join contributes no additional rows or columns to the result and eliminate it entirely.

When a self-join is unnecessary

A self-join can be eliminated when two conditions are met:

The join condition is on a unique or primary key column. This means at most one matching row per outer row, so the join can’t produce duplicates or filter rows.
No columns from the inner table alias are referenced in the SELECT list or WHERE clause that couldn’t be satisfied by the outer alias alone.

When these conditions hold, the planner rewrites the query to scan the table once, removing the join entirely.

Example: ORM-generated self-join

Self-joins on primary keys are commonly generated by ORMs and query builders that compose queries from reusable fragments without awareness of redundancy. Consider an employees table with a primary key on id:

CREATE TABLE employees (
    id     serial PRIMARY KEY,
    name   text,
    dept   text,
    salary numeric
);

A query joining the table to itself on the primary key:

EXPLAIN (ANALYZE, COSTS OFF)
SELECT e1.id, e1.name, e1.salary
FROM employees e1
JOIN employees e2 ON e1.id = e2.id;

PostgreSQL 17 and earlier (executes a real join):

Hash Join
  Hash Cond: (e1.id = e2.id)
  ->  Seq Scan on employees e1
  ->  Hash
        ->  Seq Scan on employees e2

PostgreSQL 18 (join is eliminated, single scan):

Seq Scan on employees e2  (actual time=0.017..8.749 rows=100000 loops=1)
  Buffers: shared hit=770
Planning Time: 0.318 ms
Execution Time: 12.768 ms

The planner recognizes that joining employees to itself on the primary key id is redundant. Every row matches exactly one row in the other alias, and that row is the same physical row. The join is rewritten to a simple scan.

With a WHERE clause

SJE also works when additional filters are present:

EXPLAIN (ANALYZE, COSTS OFF)
SELECT e1.id, e1.name, e1.dept, e1.salary
FROM employees e1
JOIN employees e2 ON e1.id = e2.id
WHERE e1.dept = 'Engineering';

Seq Scan on employees e2  (actual time=0.031..6.985 rows=33333 loops=1)
  Filter: (dept = 'Engineering'::text)
  Rows Removed by Filter: 66667
  Buffers: shared hit=770
Planning Time: 0.357 ms
Execution Time: 8.486 ms

Again, a single scan. The self-join is eliminated and the filter is applied directly.

When self-join elimination does not apply

SJE requires a unique or primary key join condition. A self-join on a non-unique column, such as finding colleagues in the same department, can’t be eliminated because multiple rows can match:

EXPLAIN (ANALYZE, COSTS OFF)
SELECT e1.id, e1.name, e2.name AS colleague
FROM employees e1
JOIN employees e2 ON e1.dept = e2.dept
WHERE e1.id = 1
LIMIT 5;

Nested Loop
  Join Filter: (e1.dept = e2.dept)
  ->  Index Scan using employees_pkey on employees e1
        Index Cond: (id = 1)
  ->  Seq Scan on employees e2

Here the join is preserved. dept is not unique, so one employee can match many colleagues, and the join genuinely changes the result set.

Scope: SELECT only

The PostgreSQL 18 implementation applies SJE only to SELECT queries. Self-joins in UPDATE, DELETE, and MERGE statements aren’t eliminated. This is intentional. DML queries involve EvalPlanQual (EPQ) re-evaluation for concurrent row updates, where the two table aliases might need to reference different row versions (snapshot compared to latest). Eliminating the join in that context would produce incorrect results. The PostgreSQL community has scoped the initial implementation conservatively, with DML support planned for a future release.

Fine-tuned autovacuum control with autovacuum_vacuum_max_threshold

PostgreSQL 18 introduces autovacuum_vacuum_max_threshold, a new parameter that caps the number of dead tuples that can accumulate before autovacuum triggers a VACUUM, regardless of table size. This addresses a long-standing problem with very large tables where the scale-factor-based formula can delay autovacuum for far too long.

The problem with large tables

Autovacuum decides when to vacuum a table by comparing the number of updated or deleted tuples against a threshold calculated as:

vacuum threshold = autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × table_rows

With the defaults (threshold = 50, scale_factor = 0.2), this works well for small and medium tables. But for very large tables, the formula produces thresholds that allow enormous amounts of bloat to accumulate before autovacuum fires:

Table size	Dead tuples needed to trigger autovacuum (before PG18)
1 million rows	200,050
100 million rows	20,000,050
500 million rows	100,000,050
1 billion rows	200,000,050

On a 1 billion row table, autovacuum waits for 200 million dead tuples before running. That’s significant table bloat, degraded query performance, and increased risk of transaction ID wraparound pressure.

The workaround before PG18 was to manually set autovacuum_vacuum_scale_factor = 0 and autovacuum_vacuum_threshold = N per table, but this required identifying every large table and maintaining those settings as tables grew.

How autovacuum_vacuum_max_threshold works

PostgreSQL 18 changes the formula to:

vacuum threshold = MIN(autovacuum_vacuum_max_threshold,
                       autovacuum_vacuum_threshold + autovacuum_vacuum_scale_factor × table_rows)

The new parameter acts as a ceiling. When the scale-factor calculation exceeds the max threshold, autovacuum fires at the cap instead. The default is 100,000,000 tuples, meaning no table will accumulate more than 100 million dead tuples before autovacuum runs, regardless of how large the table is.

SHOW autovacuum_vacuum_max_threshold;

 autovacuum_vacuum_max_threshold
---------------------------------
 100000000
(1 row)

For the 1 billion row table example, the effective threshold is now capped at 100 million instead of 200 million.

Configuring in Amazon RDS for PostgreSQL and Aurora PostgreSQL

autovacuum_vacuum_max_threshold is a dynamic parameter (sighup-level). It can be changed in your parameter group and applied without a reboot.

Amazon RDS for PostgreSQL: Modify the DB instance parameter group.

Aurora PostgreSQL: Modify the DB cluster parameter group (applies to all instances in the cluster).

In both cases, the parameter change is applied without a reboot. You can verify using SHOW autovacuum_vacuum_max_threshold; from any connected session.

Lower the global cap for all tables:

-- In parameter group: set to 10 million dead tuples max
autovacuum_vacuum_max_threshold = 10000000

Override per table:

-- Cap a specific high-churn table at 500,000 dead tuples
ALTER TABLE orders SET (autovacuum_vacuum_max_threshold = 500000);

-- Disable the cap for a specific table (use scale factor only)
ALTER TABLE archive_data SET (autovacuum_vacuum_max_threshold = -1);

Disable the global cap entirely:

-- In parameter group: revert to pre-PG18 behavior
autovacuum_vacuum_max_threshold = -1

Choosing the right value

The default of 100 million is a reasonable safety net for most workloads. For databases with large tables with high churn across both old and new rows, lowering it to 10 million can help autovacuum stay ahead of dead tuple accumulation while balancing for freeze activity. For append-only or read-heavy tables where dead tuples accumulate slowly, the default or even -1 (disabled) may be appropriate.

You can check the current effective threshold for any table by querying pg_stat_user_tables alongside pg_class:

SELECT
    relname,
    n_dead_tup,
    n_live_tup,
    LEAST(
        current_setting('autovacuum_vacuum_max_threshold')::bigint,
        current_setting('autovacuum_vacuum_threshold')::bigint +
        (current_setting('autovacuum_vacuum_scale_factor')::float * reltuples)::bigint
    ) AS effective_vacuum_threshold
FROM pg_stat_user_tables
JOIN pg_class ON relname = pg_class.relname
WHERE schemaname = 'public'
ORDER BY n_dead_tup DESC
LIMIT 10;

vacuum_truncate server variable

PostgreSQL 18 introduces vacuum_truncate as a server-wide GUC (Grand Unified Configuration) parameter. This gives database administrators a single control point to disable VACUUM’s file truncation behavior across all tables, without modifying individual table storage options.

Understanding VACUUM truncation

When VACUUM runs, it goes through multiple phases, including scanning the heap, vacuuming indexes and the heap, and eventually truncating the heap. The truncation phase removes empty pages from the end of a table data file and returns that disk space to the operating system. While this sounds beneficial, it comes with an important cost: truncation requires an AccessExclusiveLock on the table, the same lock level used by ALTER TABLE, DROP TABLE, and similar DDL operations.

During the first three phases, VACUUM holds only a ShareUpdateExclusiveLock, which allows concurrent reads and writes. But when it enters the truncation phase, it escalates to AccessExclusiveLock, briefly blocking all other access to the table. On busy OLTP systems with large tables, this lock escalation can cause noticeable query latency spikes.

Impact on read replicas

The truncation lock has a particularly significant effect on Aurora PostgreSQL and RDS for PostgreSQL read replicas. When the primary instance truncates a table, it writes an XLOG_STANDBY_LOCK record to the WAL stream. The replica’s WAL replay process must acquire the same AccessExclusiveLock on the relation before it can apply this record. If a long-running query on the replica is holding a conflicting lock, the replica will cancel that query after a 30-second wait in order to proceed with WAL replay. Users on the read replica see:

ERROR:  canceling statement due to conflict with recovery
DETAIL:  User was holding a relation lock for too long.

This is a common source of unexpected query cancellations on read replicas, particularly when autovacuum is actively truncating large tables on the primary.

Before PostgreSQL 18: per-table control only

Prior to PostgreSQL 18, vacuum_truncate existed only as a storage-level parameter set per table:

-- Disable truncation for a specific table (PostgreSQL 12+)
ALTER TABLE orders SET (vacuum_truncate = off);

This approach required identifying every table where truncation was causing issues and modifying each one individually, a maintenance burden in databases with hundreds of tables.

PostgreSQL 18: server-wide GUC

PostgreSQL 18 promotes vacuum_truncate to a server-wide GUC, so you can control truncation behavior globally through your parameter group. The default value remains on (truncation enabled), preserving backward compatibility.

Check the current setting:

SHOW vacuum_truncate;

 vacuum_truncate
-----------------
 on
(1 row)

Configuring vacuum_truncate in Amazon RDS for PostgreSQL

To disable truncation globally, create a custom parameter group with vacuum_truncate = off and apply it to your instance. This is a dynamic parameter and no reboot is required. You can verify the setting after the reboot:

SHOW vacuum_truncate;

 vacuum_truncate
-----------------
 off
(1 row)

Demonstrating the behavior difference

You can observe the effect of vacuum_truncate directly in the VACUUM verbose output, specifically how many pages were removed, how many remain, and whether a truncation line appears.

With vacuum_truncate = on (default):

VACUUM VERBOSE vacuum_demo;

INFO:  vacuuming "postgres.public.vacuum_demo"
INFO:  table "vacuum_demo": truncated 4425 to 45 pages
INFO:  finished vacuuming "postgres.public.vacuum_demo": index scans: 0
pages: 4380 removed, 45 remain, 4425 scanned (100.00% of total)
tuples: 990000 removed, 10000 remain, 0 are dead but not yet removable

The line truncated 4425 to 45 pages confirms that VACUUM removed 4,380 empty pages and returned the space to the OS. This truncation requires an AccessExclusiveLock.

With vacuum_truncate = off:

VACUUM VERBOSE vacuum_demo;

INFO:  vacuuming "postgres.public.vacuum_demo"
INFO:  finished vacuuming "postgres.public.vacuum_demo": index scans: 0
pages: 0 removed, 4425 remain, 4425 scanned (100.00% of total)
tuples: 990000 removed, 10000 remain, 0 are dead but not yet removable

The truncation line is completely absent. Dead tuples are still removed (990,000 rows cleaned up), but the empty pages remain in the file. No AccessExclusiveLock is acquired, and no WAL truncation record is sent to replicas.

When to use vacuum_truncate = off

Disabling truncation is appropriate when:

Read replicas experience frequent query cancellations because of canceling statement due to conflict with recovery errors caused by VACUUM truncation on the primary.
Tables have high write churn where deleted space will quickly be reused by new inserts. Truncating and then re-extending the file wastes I/O.
Large tables on busy OLTP systems where even a brief AccessExclusiveLock causes visible latency spikes.

Keep truncation enabled (the default) when:

Tables are periodically bulk-deleted and the space genuinely won’t be reused, truncation returns disk space to the OS.
Storage costs matter and you want VACUUM to actively reclaim space.
No read replicas are present, removing the WAL conflict concern.

Per-table override

The server-wide GUC can still be overridden at the table level. If you set vacuum_truncate = off globally but want truncation enabled for a specific table:

ALTER TABLE archive_data SET (vacuum_truncate = on);

Conversely, if the global setting is on but a specific high-traffic table needs truncation disabled:

ALTER TABLE orders SET (vacuum_truncate = off);

This layered control (server default using parameter group, per-table override using storage options) gives you precise control over truncation behavior across your database.

autovacuum_worker_slots

PostgreSQL 18 introduces autovacuum_worker_slots, a new parameter that separates the concept of allocating autovacuum worker slots from using them. This separation allows autovacuum_max_workers to be adjusted at runtime without a server restart, a significant operational improvement for tuning autovacuum under changing workloads.

The problem before PostgreSQL 18

Before PostgreSQL 18, autovacuum_max_workers was the only parameter controlling autovacuum concurrency. Because it directly controlled how many backend slots were reserved at startup, it had a postmaster context, meaning any change required a full server restart. If you needed to temporarily increase autovacuum workers to catch up on table bloat during a high-write period, you had no choice but to restart the instance.

How it works in PostgreSQL 18

PostgreSQL 18 introduces a two-level model:

autovacuum_worker_slots sets the maximum number of backend slots reserved for autovacuum workers at startup. This is a postmaster-level parameter and still requires a restart to change. It acts as a hard ceiling.
autovacuum_max_workers controls how many of those slots are actually used at any given time. It’s now a dynamic parameter (sighup context), meaning it can be changed at runtime by reloading the configuration, and no restart required.

SELECT name, setting, context,
       CASE context
           WHEN 'postmaster' THEN 'Requires restart'
           WHEN 'sighup'     THEN 'Runtime (no restart)'
       END AS change_method
FROM pg_settings
WHERE name IN ('autovacuum_worker_slots', 'autovacuum_max_workers')
ORDER BY name;

          name           | setting |  context   |    change_method
-------------------------+---------+------------+----------------------
 autovacuum_max_workers  | 3       | sighup     | Runtime (no restart)
 autovacuum_worker_slots | 16      | postmaster | Requires restart
(2 rows)

By default, autovacuum_worker_slots is 16 and autovacuum_max_workers is 3. You can increase autovacuum_max_workers up to 16 at any time without restarting the instance.

Configuring in Amazon RDS for PostgreSQL

In RDS and Aurora PostgreSQL, you can update autovacuum_max_workers in your parameter group with Apply immediately, no reboot needed. To increase the ceiling (autovacuum_worker_slots), update the parameter group and reboot the instance.

Example: temporarily increase autovacuum workers during a high-write period

Update autovacuum_max_workers in your parameter group from 3 to 6, apply immediately, and verify:

SHOW autovacuum_max_workers;

 autovacuum_max_workers
------------------------
 6
(1 row)

Revert it back when the workload normalizes.

Important: autovacuum_max_workers can’t exceed autovacuum_worker_slots. If you attempt to set it higher, PostgreSQL caps it at the slots value. To allow higher autovacuum_max_workers, first increase autovacuum_worker_slots (which requires a restart), then adjust autovacuum_max_workers freely within the new ceiling.

Per-table vacuum and analyze timing

PostgreSQL 18 adds four new columns to pg_stat_all_tables (and its variants pg_stat_user_tables, pg_stat_sys_tables) that track the cumulative time spent on maintenance operations per table:

Column	Description
`total_vacuum_time`	Cumulative time spent in manual VACUUM operations (milliseconds)
`total_autovacuum_time`	Cumulative time spent in autovacuum operations (milliseconds)
`total_analyze_time`	Cumulative time spent in manual ANALYZE operations (milliseconds)
`total_autoanalyze_time`	Cumulative time spent in autoanalyze operations (milliseconds)

Before PostgreSQL 18, pg_stat_user_tables showed when the last vacuum or analyze ran (last_vacuum, last_autovacuum) but gave no indication of how long it took. Identifying which tables consumed the most maintenance time required either parsing log files or using pg_stat_progress_vacuum during an active operation. These new columns make that information permanently available and cumulative.

Example

Create a table with 5 million rows, delete half of them, then run VACUUM VERBOSE and ANALYZE VERBOSE:

CREATE TABLE timing_demo (id serial, val text);
INSERT INTO timing_demo SELECT i, md5(i::text) FROM generate_series(1, 5000000) i;
DELETE FROM timing_demo WHERE id % 2 = 0;

VACUUM VERBOSE timing_demo;

INFO:  vacuuming "postgres.public.timing_demo"
INFO:  finished vacuuming "postgres.public.timing_demo": index scans: 0
pages: 0 removed, 41667 remain, 41667 scanned (100.00% of total)
tuples: 2500000 removed, 2500000 remain, 0 are dead but not yet removable
...
system usage: CPU: user: 0.18 s, system: 0.00 s, elapsed: 0.18 s

ANALYZE VERBOSE timing_demo;

INFO:  analyzing "public.timing_demo"
INFO:  "timing_demo": scanned 30000 of 41667 pages, containing 1799980 live rows and 0 dead rows
INFO:  finished analyzing table "postgres.public.timing_demo"
...
system usage: CPU: user: 0.10 s, system: 0.00 s, elapsed: 0.10 s

Now query the new timing columns:

SELECT relname,
       round(total_vacuum_time::numeric, 2)     AS total_vacuum_ms,
       round(total_analyze_time::numeric, 2)    AS total_analyze_ms
FROM pg_stat_user_tables
WHERE relname = 'timing_demo';

   relname   | total_vacuum_ms | total_analyze_ms
-------------+-----------------+------------------
 timing_demo |          189.00 |           107.00
(1 row)

The elapsed: 0.18 s from VACUUM VERBOSE maps directly to 189ms in total_vacuum_time, and elapsed: 0.10 s from ANALYZE VERBOSE maps to 107ms in total_analyze_time. The view accumulates these values across every manual and automatic operation, giving you a running total of maintenance time per table.

Because these columns are cumulative, running VACUUM and ANALYZE again adds to the totals, giving you a running picture of maintenance overhead over the lifetime of the table.

Practical use cases

Identify tables with high maintenance overhead

SELECT relname,
       round(total_vacuum_time::numeric, 2)      AS vacuum_ms,
       round(total_autovacuum_time::numeric, 2)  AS autovacuum_ms,
       round(total_analyze_time::numeric, 2)     AS analyze_ms,
       round(total_autoanalyze_time::numeric, 2) AS autoanalyze_ms
FROM pg_stat_user_tables
ORDER BY (total_vacuum_time + total_autovacuum_time) DESC
LIMIT 10;

Compare manual and automatic maintenance time

Tables where total_autovacuum_time is very high relative to total_vacuum_time might benefit from tuning autovacuum thresholds. Tables where both are high are candidates for investigating bloat, index health, or write patterns.

Clean up resources

If you created database objects while following along with this post, run the following commands to remove them:

-- Remove tables created in this post
DROP TABLE IF EXISTS customer_orders;
DROP TABLE IF EXISTS employees;
DROP TABLE IF EXISTS products;
DROP TABLE IF EXISTS vacuum_demo;
DROP TABLE IF EXISTS timing_demo;

-- Reset any session-level parameter changes
RESET work_mem;

If you created an Aurora PostgreSQL cluster or RDS for PostgreSQL instance specifically for this walkthrough and no longer need it, delete the instance to avoid ongoing charges. For instructions, refer to Deleting a DB instance or Deleting an Aurora DB cluster. Similarly, terminate any Amazon EC2 instance that you created solely for the PostgreSQL client connection.

Conclusion

With skip scan, your existing composite indexes now handle queries they previously could not without maintaining additional indexes. Enhanced EXPLAIN turns CTE tuning from guesswork into data-driven decisions. Self-join elimination automatically fixes a class of ORM-generated inefficiency. And the new autovacuum controls help keep your largest tables healthy without manual per-table configuration. These improvements are available today on both Aurora PostgreSQL-Compatible Edition and Amazon RDS for PostgreSQL.

In Part 2, we cover security enhancements such as MD5 password support, monitoring improvements for parallel workers and replication, the new uuidv7() function for developers, optimizer statistics preservation during upgrades, and logical replication enhancements. To get started, create an Aurora PostgreSQL DB cluster or an RDS for PostgreSQL DB instance on PostgreSQL 18 and try these features today.

About the authors

PostgreSQL 18 on Amazon Aurora and Amazon RDS: Security, monitoring, and developer enhancements

Nazneen Jafri — Wed, 17 Jun 2026 22:29:02 +0000

In Part 1 of this series, we explored the performance enhancements in PostgreSQL 18, including skip scan optimization, enhanced EXPLAIN output, automatic self-join removal, and vacuum/autovacuum improvements. In this second part, we focus on security, monitoring, developer productivity, and logical replication enhancements that improve operational efficiency and the overall developer experience.

Deprecation of MD5 password

PostgreSQL 18 deprecates MD5 password authentication in favor of more secure authentication methods. MD5 authentication still works in PostgreSQL 18, but it will be removed in a future release. We recommend migrating to SCRAM-SHA-256.

If you use Amazon Relational Database Service (Amazon RDS) or Amazon Aurora PostgreSQL-Compatible Edition (Aurora PostgreSQL), the default password_encryption is already set to scram-sha-256. However, if you’re still using MD5, or if existing users still have MD5-hashed passwords from before the default was updated, you need to act. You can modify the password_encryption parameter through your DB parameter group: create a new custom parameter group or modify your existing one, set password_encryption to scram-sha-256, then apply the parameter group to your instance. Because this is a dynamic parameter, the change takes effect immediately without a reboot. Then have affected users reset their passwords so that new SCRAM hashes are generated.

PostgreSQL 18 also introduces a new GUC parameter called md5_password_warnings, which is enabled (on) by default. It’s a dynamic parameter (PostgreSQL context: superuser) and can be enabled or disabled in your parameter group without a reboot. When active, it causes CREATE ROLE and ALTER ROLE to emit a deprecation warning in the engine logs whenever a password is stored using MD5 encryption. Use the warnings as a signal to identify which roles still have MD5 passwords and migrate those roles to SCRAM-SHA-256. After the migration finishes, the warnings naturally disappear because there are no more MD5 passwords to warn about, at which point turning the parameter off has no effect.

Enhanced monitoring of parallel worker activity

PostgreSQL 18 adds two new columns to both pg_stat_database and pg_stat_statements that track parallel worker activity:

Column	Description
`parallel_workers_to_launch`	Number of parallel workers the planner intended to launch
`parallel_workers_launched`	Number of parallel workers actually launched

The gap between these two values is the key signal. When the planner requests workers but the system can’t provide them because max_parallel_workers or max_worker_processes is exhausted, queries silently fall back to fewer workers or serial execution. Before PostgreSQL 18, this was invisible. You could see that parallelism was happening using IPC wait events in Performance Insights, but you had no way to quantify how often workers failed to launch or which queries were affected.

Note: for Aurora PostgreSQL users: In Aurora PostgreSQL 18, the default value of max_parallel_workers_per_gather parameter has been changed to 0, disabling parallel query by default. In Aurora PostgreSQL 17 and earlier, this parameter was set to 2. You can modify it to re-enable parallel queries to support your specific workload requirements. For more information, see Parallel queries for Aurora PostgreSQL or Parallel queries for RDS for PostgreSQL.

Database-level monitoring

The pg_stat_database view accumulates parallel worker activity across queries for the database. A persistent worker_shortage indicates that you’re consistently hitting max_parallel_workers or max_worker_processes limits:

SELECT datname,
       parallel_workers_to_launch,
       parallel_workers_launched,
       parallel_workers_to_launch - parallel_workers_launched AS worker_shortage
FROM pg_stat_database
WHERE datname = current_database();

 datname  | parallel_workers_to_launch | parallel_workers_launched | worker_shortage
----------+----------------------------+---------------------------+-----------------
 postgres |                         66 |                        20 |              46
(1 row)

This database has requested 66 workers total but only launched 20, with 46 workers failing to launch. This is a strong signal to increase max_parallel_workers, or to disable parallelism for some queries to reduce contention.

Query-level monitoring

pg_stat_statements shows the same metrics per query, so you can identify which specific queries are most affected:

SELECT LEFT(query, 60) AS query,
       calls,
       parallel_workers_to_launch,
       parallel_workers_launched,
       ROUND(100.0 * parallel_workers_launched / NULLIF(parallel_workers_to_launch, 0), 2) AS launch_success_rate
FROM pg_stat_statements
WHERE parallel_workers_to_launch > 0
ORDER BY parallel_workers_to_launch DESC
LIMIT 5;

                       query                        | calls | parallel_workers_to_launch | parallel_workers_launched | launch_success_rate
-----------------------------------------------------+-------+----------------------------+---------------------------+---------------------
 SELECT COUNT(*) FROM timing_demo WHERE id % $1 = $2 |     2 |                          4 |                         4 |              100.00
 SELECT COUNT(*) FROM timing_demo                    |     1 |                          2 |                         2 |              100.00
(2 rows)

A launch_success_rate below 100 percent means the system is consistently unable to provide the workers the planner requested. For example, a query running 13 times that requests 2 workers per call (26 total) but gets only 1 per call (13 total) shows a 50 percent success rate. Half the time, a worker couldn’t be launched because of resource contention.

Finding problematic queries

To find queries where workers consistently fail to launch, use the following query. For affected queries, consider either increasing max_parallel_workers to provide more worker capacity, or disabling parallelism for those specific queries with pg_hint_plan or session-level SET max_parallel_workers_per_gather = 0:

SELECT LEFT(query, 80) AS query_preview,
       calls,
       parallel_workers_to_launch,
       parallel_workers_launched,
       parallel_workers_to_launch - parallel_workers_launched AS workers_failed
FROM pg_stat_statements
WHERE parallel_workers_to_launch > 0
  AND parallel_workers_launched < parallel_workers_to_launch
ORDER BY workers_failed DESC
LIMIT 10;

These metrics complement the IPC wait events visible in Amazon RDS Performance Insights (IPC:ExecuteGather, IPC:BgWorkerStartup, IPC:ParallelFinish). Wait events tell you that parallelism is happening. The new columns tell you whether the system is delivering the workers the planner requested.

Replication monitoring improvements in pg_stat_subscription_stats

PostgreSQL 18 adds new columns to pg_stat_subscription_stats that track specific conflict types encountered during logical replication apply. Previously, the view only reported aggregate error counts. Now it breaks down conflicts by category, making it much easier to diagnose replication issues.

New conflict columns

SELECT column_name
FROM information_schema.columns
WHERE table_name = 'pg_stat_subscription_stats'
ORDER BY ordinal_position;

           column_name
---------------------------------
 subid
 subname
 apply_error_count
 sync_error_count
 confl_insert_exists
 confl_update_origin_differs
 confl_update_exists
 confl_update_missing
 confl_delete_origin_differs
 confl_delete_missing
 confl_multiple_unique_conflicts
 stats_reset

The seven new confl_* columns track:

Column	Conflict scenario
`confl_insert_exists`	INSERT violates a unique constraint (row already exists) on subscriber
`confl_update_origin_differs`	UPDATE on a row modified by a different replication origin
`confl_update_exists`	UPDATE conflicts with existing data
`confl_update_missing`	UPDATE target row is missing on the subscriber
`confl_delete_origin_differs`	DELETE on a row modified by a different origin
`confl_delete_missing`	DELETE target row is missing on the subscriber
`confl_multiple_unique_conflicts`	Multiple unique constraint violations in a single operation

Querying conflict statistics

SELECT subname,
       apply_error_count,
       confl_insert_exists,
       confl_update_missing,
       confl_delete_missing
FROM pg_stat_subscription_stats;

Why these matter

Before PostgreSQL 18, when logical replication stopped because of a conflict, the only information available was the total apply_error_count and the error message in the logs. Determining the type of conflict required parsing log files.

The new columns let you identify conflict patterns at a glance:

High confl_insert_exists – The subscriber has rows the publisher is trying to insert, likely caused by data being written directly to the subscriber, or a failed initial sync.
High confl_update_missing or confl_delete_missing – Rows exist on the publisher but not the subscriber. This is common after manual data manipulation on the subscriber.
High confl_update_origin_differs – Rows are being modified by multiple origins. This is relevant in bidirectional replication setups.

Optimizer statistics preservation during pg_upgrade

Every major PostgreSQL upgrade previously ignored optimizer statistics. After upgrading, the query planner had no information about table sizes, column distributions, or index selectivity. Every query ran with default estimates until ANALYZE finished across the database. With pg_upgrade (the fast upgrade path), the impact was particularly severe. The upgrade finishes in minutes, but running ANALYZE on a large database could take hours, during which query performance was degraded.

What changes in PostgreSQL 18

pg_upgrade now automatically transfers optimizer statistics from the old cluster to the new one as part of the upgrade process. No manual intervention is required. Two new catalog functions handle the import: pg_restore_relation_stats() and pg_restore_attribute_stats().

For dump/restore upgrades, pg_dump --with-statistics exports statistics explicitly.

After a pg_upgrade to PostgreSQL 18, the new cluster has the same planner statistics as the old one. Queries run with good plans from the first connection.

What is not preserved

Extended statistics (created with CREATE STATISTICS) are not preserved. Only the object definition is carried over, not the computed data. After upgrading, run:

vacuumdb --all --analyze-only --missing-stats-only

Note: The --all flag applies this command to each database in the cluster.

The new --missing-stats-only flag (also added in PostgreSQL 18) collects only statistics that are absent. In practice, only the extended statistics after an upgrade. This is far faster than a full ANALYZE of the entire database.

Monitoring statistics (pg_stat_* views) are also not preserved, so autovacuum and autoanalyze lose their memory of which tables were last processed. For busy, large tables, consider running a manual VACUUM (ANALYZE) shortly after the upgrade.

For Amazon RDS for PostgreSQL

For RDS major version upgrades (such as PostgreSQL 16 to 18), the upgrade process uses pg_upgrade internally. With PostgreSQL 18 as the target, optimizer statistics are preserved automatically. After the upgrade finishes, verify statistics are present:

SELECT relname, reltuples, relpages
FROM pg_class
WHERE relkind = 'r'
  AND reltuples > 0
ORDER BY reltuples DESC
LIMIT 5;

If extended statistics exist on any tables, identify them and run the targeted analyze:

-- Find tables with extended statistics that need re-analyzing
SELECT n.nspname AS schema,
       c.relname AS table,
       s.stxname
FROM pg_statistic_ext s
JOIN pg_class c ON c.oid = s.stxrelid
JOIN pg_namespace n ON n.oid = c.relnamespace;

Generate timestamp-ordered UUIDs with uuidv7()

PostgreSQL 18 introduces uuidv7(), a native function for generating UUID version 7 as specified in RFC 9562. UUIDv7 combines a Unix timestamp in milliseconds with random bits, producing UUIDs that are both globally unique and naturally sortable by creation time.

The problem with UUIDv4

The existing gen_random_uuid() function (now also aliased as uuidv4()) generates fully random UUIDs. While globally unique, random UUIDs cause a well-known B-tree index problem: every new row inserts at a random position in the index, causing frequent page splits and cache misses. On high-insert workloads with UUID primary keys, this leads to index bloat and degraded write performance.

-- UUIDv4: fully random, no ordering
SELECT gen_random_uuid() AS uuidv4 FROM generate_series(1,3);

                uuidv4
--------------------------------------
 a0b20c40-3cb4-4a0c-8f41-2aba1ba9cfb0
 cace1513-5687-4847-9d92-5fb80fd71e33
 882aa09d-39d2-4acc-886c-5fe91730b490

No pattern, each UUID is scattered randomly.

UUIDv7: Timestamp-ordered

SELECT uuidv7() FROM generate_series(1,5);

                uuidv7
--------------------------------------
 019d7445-07d1-7797-832d-ffcf72dcd3c3
 019d7445-07d1-77b2-a049-bf0a3c7b9fa4
 019d7445-07d1-77b5-ab69-3d75d1416736
 019d7445-07d1-77b7-8e90-d195313b70e6
 019d7445-07d1-77b9-8108-3e70dfdd7e6d

The first 12 characters (019d7445-07d1) encode the millisecond timestamp. They’re identical for UUIDs generated within the same millisecond. The remaining bits provide uniqueness and sub-millisecond monotonicity. New rows always insert near the end of the B-tree index, eliminating random page splits.

Extracting the timestamp

PostgreSQL 18 extends uuid_extract_timestamp() to support UUIDv7:

SELECT uuid_extract_timestamp(uuidv7()) AS extracted, now() AS current_time;

          extracted          |         current_time
-----------------------------+-------------------------------
 2026-04-09 22:02:54.289+00  | 2026-04-09 22:02:54.287568+00

The extracted timestamp matches now() to millisecond precision. You can recover when any UUIDv7 was generated directly from the value itself.

Time-shifted UUIDs

uuidv7() accepts an optional interval to shift the embedded timestamp:

SELECT uuidv7('-1 hour') AS uuid,
       uuid_extract_timestamp(uuidv7('-1 hour')) AS timestamp;

                 uuid                 |          timestamp
--------------------------------------+----------------------------
 019d740e-1951-7b76-befc-691ed8d8e6ae | 2026-04-09 21:02:54.289+00

Using uuidv7() as a primary key

CREATE TABLE events (
    id uuid DEFAULT uuidv7() PRIMARY KEY,
    payload text,
    created_at timestamptz DEFAULT now()
);

Because UUIDv7 values are monotonically increasing, ORDER BY id returns rows in insertion order. The timestamp is embedded in the key itself, removing the need for a separate sort column in many use cases.

Default streaming changed to parallel

PostgreSQL 18 changes the default value of the streaming option for CREATE SUBSCRIPTION from off to parallel. This makes parallel apply the default behavior for logical replication, improving throughput and reducing lag for large transactions.

Background

Logical replication applies changes on the subscriber by replaying transactions from the publisher’s WAL stream. Before PostgreSQL 16, large transactions were buffered entirely on the subscriber before being applied. This caused replication lag to spike during bulk operations and required significant memory or temporary file usage.

PostgreSQL 16 introduced streaming = parallel as an option, allowing large transactions to start streaming to the subscriber before they commit on the publisher, with multiple apply workers processing changes simultaneously. However, it was not the default. DBAs had to explicitly opt in.

PostgreSQL 18 makes parallel the default, reflecting its maturity and the performance benefits it provides for most workloads.

What changes

Before PostgreSQL 18, streaming defaulted to off:

CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=publisher dbname=mydb'
    PUBLICATION my_pub;
-- equivalent to: streaming = off

PostgreSQL 18, streaming defaults to parallel:

CREATE SUBSCRIPTION my_sub
    CONNECTION 'host=publisher dbname=mydb'
    PUBLICATION my_pub;
-- equivalent to: streaming = parallel

With streaming = parallel:

Large transactions begin replicating before they commit on the publisher.
Multiple apply workers process changes simultaneously.
Parallel apply reduces replication lag for bulk operations.
The subscriber uses less memory because it doesn’t buffer full transactions.

Impact on existing subscriptions

Existing subscriptions created before upgrading to PostgreSQL 18 retain their previous streaming setting. Check the current setting:

SELECT subname, substream FROM pg_subscription;

The substream column shows f (off), t (on), or p (parallel).

To update an existing subscription:

ALTER SUBSCRIPTION my_sub SET (streaming = parallel);

Parallel streaming requires logical replication protocol version 4 or higher, supported between PostgreSQL 16+ publishers and subscribers.

Inactive replication slot auto-invalidation

PostgreSQL 18 introduces idle_replication_slot_timeout, a new parameter that automatically invalidates replication slots that have been inactive for longer than a specified duration. This addresses one of the most common operational hazards with logical replication: abandoned slots that silently accumulate WAL and eventually cause disk exhaustion.

The problem with inactive replication slots

Replication slots prevent the WAL sender from discarding WAL segments that haven’t been consumed by the slot’s subscriber. When a subscriber disconnects because of a network failure, application crash, or misconfiguration, the slot remains active on the primary, holding back WAL indefinitely. On busy systems, this can fill the pg_wal directory within hours, causing the primary to stop accepting writes.

Before PostgreSQL 18, the only protection was manual monitoring: periodically querying pg_replication_slots for slots with active = false and a stale confirmed_flush_lsn, then manually dropping them.

The new parameter

idle_replication_slot_timeout specifies how long a slot can remain inactive before it’s automatically invalidated.

The default is 0 (disabled):

SHOW idle_replication_slot_timeout;

 idle_replication_slot_timeout
-------------------------------
 0
(1 row)

When set to a non-zero value, PostgreSQL automatically invalidates any slot that has had no active connection for longer than the specified duration. The slot is marked as invalid rather than dropped. It remains visible in pg_replication_slots with invalidation_reason = 'idle_timeout', so you can identify and clean it up.

Configuring in Amazon RDS for PostgreSQL

Set the parameter in your parameter group (no reboot required):

idle_replication_slot_timeout = 1d

Monitor for invalidated slots:

SELECT slot_name,
       slot_type,
       active,
       inactive_since,
       invalidation_reason
FROM pg_replication_slots
WHERE invalidation_reason IS NOT NULL;

Drop invalidated slots:

SELECT pg_drop_replication_slot(slot_name)
FROM pg_replication_slots
WHERE invalidation_reason = 'idle_timeout';

For Aurora PostgreSQL, where WAL retention directly affects storage costs and cluster availability, we recommend enabling this parameter for any cluster using logical replication.

COPY command enhancements

PostgreSQL 18 adds two improvements to the COPY command that give you more control over error handling during bulk data loads: REJECT_LIMIT and a new LOG_VERBOSITY level called silent.

REJECT_LIMIT for controlled error tolerance:

Before PostgreSQL 18, the ON_ERROR = 'ignore' option for COPY FROM would skip all rows with data type conversion errors, with no upper bound. This was useful for best-effort loading, but dangerous in production: a corrupted file could silently discard thousands of rows with no safeguard. You had to choose between failing on the first error (the default) or tolerating an unlimited number of errors. PostgreSQL 18 introduces REJECT_LIMIT, which sets a maximum number of errors COPY FROM will tolerate before aborting. After the error count exceeds the specified value, the command fails, even with ON_ERROR = 'ignore' set.

-- Load a CSV file, skip rows with type conversion errors, but fail if more than 100 rows are bad
COPY sensor_readings FROM '/data/readings.csv'
WITH (FORMAT csv, HEADER true, ON_ERROR ignore, REJECT_LIMIT 100);

If the file contains 50 malformed rows, the load completes and reports 50 rows skipped. If it contains 101 malformed rows, the command fails at the 101st error. This gives you a safety net: you accept that real-world data has occasional issues, but you catch situations where the source file is fundamentally broken.

LOG_VERBOSITY silent:

When ON_ERROR = 'ignore' is active, PostgreSQL emits a NOTICE for each discarded row (at verbose level) or a summary count at the end (at default level). For large loads where you expect and accept a known error rate, these messages add noise to your logs without providing actionable information. PostgreSQL 18 adds a third LOG_VERBOSITY level: silent. This suppresses all messages about discarded rows, including the final summary count.

COPY sensor_readings FROM '/data/readings.csv'
WITH (FORMAT csv, HEADER true, ON_ERROR ignore, REJECT_LIMIT 100, LOG_VERBOSITY silent);

Combine REJECT_LIMIT with LOG_VERBOSITY silent for production ETL pipelines where you want a hard cap on errors but do not need per-row logging.

OLD and NEW in RETURNING clauses (INSERT, UPDATE, DELETE, and MERGE)

PostgreSQL 18 introduces OLD and NEW aliases in the RETURNING clause of INSERT, UPDATE, DELETE, and MERGE commands. This allows a single DML statement to return both the previous and current state of modified rows, eliminating the need for separate queries or trigger-based workarounds to capture before-and-after values.

Previously, RETURNING had fixed behavior depending on the command type: INSERT returned the newly inserted row, UPDATE returned the row after modification, and DELETE returned the row as it existed before deletion. There was no way to get both old and new values from a single statement.

-- UPDATE: return both old and new prices
UPDATE products SET price = price * 1.10
  WHERE price <= 99.99
  RETURNING name,
            old.price AS old_price,
            new.price AS new_price,
            new.price - old.price AS price_change;

   name    | old_price | new_price | price_change
-----------+-----------+-----------+--------------
 Widget A  |     49.99 |     54.99 |         5.00
 Widget B  |     99.99 |    109.99 |        10.00

The same syntax works with MERGE, where combining OLD/NEW with merge_action() gives you a full change report from a single statement:

MERGE INTO products p USING staging s ON p.product_no = s.product_no
  WHEN MATCHED THEN UPDATE SET price = s.price, name = s.name
  WHEN NOT MATCHED THEN INSERT VALUES (s.product_no, s.name, s.price)
  RETURNING merge_action(),
            p.product_no,
            old.price AS old_price,
            new.price AS new_price;

These features require no configuration changes. On Amazon RDS for PostgreSQL and Aurora PostgreSQL running PostgreSQL 18, OLD and NEW work in RETURNING clauses for all DML commands. Applications that previously relied on triggers or multi-statement transactions for change capture can consolidate into single statements, reducing round trips and simplifying application code.

Conclusion

In this second part of our series, we covered the security, monitoring, developer, and logical replication enhancements in PostgreSQL 18. These include the deprecation of MD5 passwords in favor of SCRAM-SHA-256, improved parallel worker monitoring, and the new uuidv7() function for timestamp-ordered UUIDs. Together, these features strengthen the operational and development experience on Aurora PostgreSQL and Amazon RDS for PostgreSQL.

Together with the performance enhancements covered in Part 1, PostgreSQL 18 delivers improvements across performance, security, observability, and developer productivity.

Get started today: Upgrade your Aurora PostgreSQL cluster or Amazon RDS for PostgreSQL instance to version 18. For more information, see the Aurora PostgreSQL upgrade documentation or the Amazon RDS for PostgreSQL upgrade guide. If you have questions or feedback, reach out through AWS re:Post or leave a comment.

About the authors

Deep dive into Amazon Aurora PostgreSQL lock analysis with CloudWatch Database Insights

Kanhaiya Lal — Wed, 17 Jun 2026 16:06:43 +0000

Lock contention is one of the common performance challenges in relational databases such as Amazon Aurora PostgreSQL and Amazon Relational Database Service (Amazon RDS) for PostgreSQL. While PostgreSQL relies on locks to enforce ACID (Atomicity, Consistency, Isolation, and Durability) guarantees and coordinate concurrent access to shared data, excessive locking can become a bottleneck under high concurrency, increasing latency, reducing transaction throughput, and in severe cases, causing deadlocks.

In this post, we show you how to use Amazon CloudWatch Database Insights for lock analysis in Amazon Aurora PostgreSQL. You learn how to enable the feature, interpret lock tree visualizations, resolve common lock-related issues, and maintain optimal database performance. This lock tree analysis feature also applies to Amazon RDS for PostgreSQL.

Solution overview

Amazon CloudWatch Database Insights is a database observability tool that aggregates logs, metrics, and events from Amazon Aurora databases and other Amazon RDS databases into a centralized dashboard. The tool helps you monitor performance, optimize databases, and troubleshoot issues. For Amazon Aurora PostgreSQL, CloudWatch Database Insights offers an advanced mode feature, lock tree analysis, that provides a hierarchical visualization of real-time and historical lock contention. With this feature, you can identify blocking and waiting sessions, understand lock contention patterns, and trace the root causes of transaction delays.

Key capabilities include:

Lock tree visualization – Displays the relationships between blocking and waiting sessions, with detailed metrics like session IDs, queries, and wait times.
Real-time monitoring – Captures snapshots every 15 seconds, and flags high locking when contention persists for 15 minutes.
Historical analysis – Retains locking data for 15 months in Advanced Mode, for both Aurora PostgreSQL and RDS for PostgreSQL, so you can perform trend analysis and post-incident reviews. Lock tree analysis isn’t available in Standard Mode.

Understanding PostgreSQL lock fundamentals in Amazon Aurora

Before exploring CloudWatch Database Insights, it’s important to establish a solid understanding of how PostgreSQL manages locking, the foundation for diagnosing and resolving concurrency issues.

PostgreSQL lock management

PostgreSQL provides a sophisticated locking framework that operates at multiple levels of detail, including table-level, row-level, and page-level locks. The database engine applies the least restrictive lock necessary for any given operation. This fine-grained approach minimizes contention by locking only the specific resource being accessed, rather than broader structures. These locks are particularly valuable for application-controlled locking scenarios where Multi-Version Concurrency Control (MVCC) alone is insufficient. In most cases, PostgreSQL automatically acquires the appropriate lock mode at the minimal required level of detail to enforce the requested transaction isolation level and maintain ACID compliance. This prevents referenced tables from being dropped or modified in ways that would conflict with an active transaction.

Prerequisites

Amazon CloudWatch Database Insights helps with troubleshooting performance issues for all Amazon Aurora PostgreSQL versions and provides features such as lock analysis, execution plan analysis, and fleet-wide monitoring. To analyze lock trees, you must have CloudWatch Database Insights enabled in Advanced Mode. For more information, see Turning on the Advanced mode of Database Insights for Amazon Aurora.

Lock tree analysis is also supported for Amazon RDS for PostgreSQL. To enable Advanced Mode for RDS, see Turning on the Advanced mode of Database Insights for Amazon RDS. There is a cost associated with enabling Advanced Mode. Review the CloudWatch pricing page for more details.

Accessing lock analysis

After you enable CloudWatch Database Insights Advanced Mode, you can access lock analysis through the Amazon CloudWatch Database Insights console:

Sign in to the AWS Management Console and open the Amazon CloudWatch console.
In the navigation pane, choose Insights, then Database Insights.
Select Database Instance on the left-hand panel.
Choose your Amazon Aurora PostgreSQL DB instance.
In the DB Load Analysis tab, select the Lock analysis tab.

The Lock analysis tab displays lock contention information. Use the Sliced by menu to filter by dimensions such as Blocking object, Blocking session, or Blocking SQL. These dimensions help identify top blockers contributing to database load over time. The following screenshot shows CloudWatch Database Insights.

Understanding lock tree analysis

A lock tree is a hierarchical visualization of locking relationships, illustrating:

Blocking sessions: Sessions holding locks that prevent others from proceeding.
Waiting sessions: Sessions waiting for locks to be released.
Lock modes: Lock mode held or requested.

Viewing the lock tree

Select a snapshot time from the list of available snapshots. Snapshots are captured every 15 seconds, with high locking flagged if contention persists for 15 consecutive minutes.
Expand nodes by choosing the arrow just below the session id to view the hierarchy of blocking and waiting sessions.

Each node provides detailed metrics, customizable through the Settings icon:

Column name	Definition	Default column
`session_id`	Unique session ID, derived from `HEX(pg_stat_activity.backend_start).HEX(pg_locks.pid)`	Yes
`pid`	Process ID of the backend, from `pg_locks.pid`	Yes
`blocked_sessions_count`	Number of sessions blocked by this lock	Yes
`last_query_executed`	Last query executed, from `pg_stat_activity.query`	Yes
`wait_event`	Wait event name if waiting, NULL otherwise, from `pg_stat_activity.wait_event`	Yes
`blocking_time_(In Seconds)`	Time since lock start in seconds, from `pg_locks.waitstart`	Yes
`blocking_mode`	Lock mode held by blocking session, from `pg_locks.mode`	No
`application`	Application name connected, from `pg_stat_activity.application_name`	No

For more column definitions, see the Lock snapshot data documentation. To troubleshoot performance issues caused by locks, you can analyze lock trees for Amazon Aurora PostgreSQL databases with Amazon CloudWatch Database Insights using the preceding metrics.

Real-world scenario examples

In this section, we explore two scenarios that demonstrate how lock tree analysis helps you resolve common lock-related issues in Amazon Aurora PostgreSQL.

Scenario 1: E-commerce order processing — identifying and resolving lock contention

Background: During a Black Friday sales surge, an ecommerce platform running on Aurora PostgreSQL (version 17.5, db.r7g.4xlarge) began experiencing order processing delays. Users reported that order confirmations were hanging, and the support team suspected a database issue.

Problem identification: Using CloudWatch Database Insights, the DBA navigated to the Lock analysis tab and immediately identified the source of the problem. The Database load chart, when sliced by Blocking SQL, revealed that a single transaction was dominating the DB load. This is a clear indicator of lock contention.

Transaction A:

BEGIN;
UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 12345;
UPDATE orders SET status = 'confirmed' WHERE order_id = 67890;

Transaction A executed both statements but hasn’t yet committed. The transaction is open and holding locks on both tables.

Transaction B (concurrently):

UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 12345;

Transaction B attempts to update the same inventory row (product_id 12345) but can’t proceed because Transaction A holds a conflicting lock.

Lock tree analysis: The lock tree visualization shows the blocking relationship clearly:

Pid 28560 (Transaction A) is the root blocker. It’s blocking 1 session. Its wait event is ClientRead, which means the database has finished executing the transaction’s statements but is waiting for the client application to send the next command (such as COMMIT or ROLLBACK). This is the telltale sign that the application failed to properly close its transaction.
Pid 28573 (Transaction B) is the blocked session. It’s waiting on the TransactionId wait event with ExclusiveLock blocking mode. This confirms that it’s waiting for Transaction A’s transaction to either commit or roll back before it can acquire its own lock on the same row.

Important note: The SQL displayed under the column Last query executed might not be the blocked or blocking SQL statement. It represents the last SQL statement executed by the blocked or blocking session. Alternatively, you can run the following query to find the exact blocking or blocked SQL statement:

-- Analyze lock conflicts
SELECT
    blocked_locks.pid AS blocked_pid,
    blocked_activity.usename AS blocked_user,
    blocking_locks.pid AS blocking_pid,
    blocking_activity.usename AS blocking_user,
    blocked_activity.query AS blocked_statement,
    blocking_activity.query AS blocking_statement
FROM pg_catalog.pg_locks blocked_locks
JOIN pg_catalog.pg_stat_activity blocked_activity
    ON blocked_activity.pid = blocked_locks.pid
JOIN pg_catalog.pg_locks blocking_locks
    ON blocking_locks.locktype = blocked_locks.locktype
    AND blocking_locks.database IS NOT DISTINCT FROM blocked_locks.database
    AND blocking_locks.relation IS NOT DISTINCT FROM blocked_locks.relation
JOIN pg_catalog.pg_stat_activity blocking_activity
    ON blocking_activity.pid = blocking_locks.pid
WHERE NOT blocked_locks.granted;

blocked_pid | blocked_user | blocking_pid | blocking_user | blocked_statement 							     | blocking_statement
------------+--------------+--------------+---------------+------------------------------------------------------------------------+-------------------------------------
      10468 | postgres     |       10480  | postgres      | UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 12345; | UPDATE orders SET status = 'confirmed' WHERE order_id = 67890;
      10468 | postgres     |       10468  | postgres      | UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 12345; | UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 12345;
      10468 | postgres     |       10468  | postgres      | UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 12345; | UPDATE inventory SET quantity = quantity - 1 WHERE product_id = 12345;
(3 rows)

Resolution strategy:

In this case, terminating the blocking session (Pid 28560) is the appropriate action using pg_terminate_backend():

SELECT pg_terminate_backend(28560);

Note: Before you terminate transactions, evaluate the potential effect that each transaction has on the state of your database and your application.

pg_cancel_backend – Use pg_cancel_backend(pid) to send a SIGINT signal that cancels the current long-running query while keeping the database connection active. The backend can continue processing other queries after gracefully terminating the current operation.
pg_terminate_backend – Use pg_terminate_backend(pid) to send a SIGTERM signal that forcefully terminates the connection, ending the query and rolling back any open transactions or held locks associated with the process.

Also, consider the following:

Lock ordering: Enforce consistent lock acquisition. For example, always process the inventory table before the orders table.
Catch and retry logic: Implement exponential backoff retries in the application.
Transaction scope reduction: Preprocess data outside transactions to minimize lock duration.

Scenario 2: Idle-in-transaction

Database concurrency and performance can degrade when sessions remain idle in a transaction state. This is typically caused by missing COMMIT statements or improper exception handling (such as a missing ROLLBACK). These sessions continue to hold locks even when no active work is being performed, blocking other sessions that need to modify the same data.

Background: A telecommunications application’s billing module acquires a ROW EXCLUSIVE lock while updating call costs for call detail record (cdr_id 9001) in the call_detail_records table. Because the transaction is missing a COMMIT or ROLLBACK statement, it remains open and continues to hold the lock. When the Fraud Detection System subsequently attempts to update the same call detail record for scoring, it is blocked, stalling processing for the call_detail_records table.

Problem analysis: The billing module calculates and applies the call cost for cdr_id 9001, completing its intended work. However, because of a bug in the application, neither COMMIT; nor ROLLBACK; is run after the update. As a result, the transaction remains open and the session enters an idle in transaction state, continuing to hold the ROW EXCLUSIVE lock on the record.

BEGIN;
UPDATE call_detail_records SET rate_status = 'RATED', billing_cost = 0.50 WHERE cdr_id = 9001;
-- (Do nothing, leave the session open indefinitely)

Concurrently, the Fraud Detection System attempts to update the same call detail record (cdr_id 9001) as a high-risk record. This session hangs indefinitely (and never returns) because the previous session holds a conflicting lock.

UPDATE call_detail_records SET fraud_score = 99.50 WHERE cdr_id = 9001;

Lock dependency visualization: The lock tree shows PostgreSQL process id (Pid 13835, the Fraud Detection System) as the blocked session. It’s blocked by the PostgreSQL process id (Pid 14193, the billing module session) with a wait on the TransactionId wait event. This indicates that Pid 13835 is waiting for Pid 14193 to release the transaction lock before it can proceed.

Pid 14193, the main blocking session, is in a ClientRead wait event. It has completed its work (calculating and applying the call cost) and is now waiting for the client to send the next command. This is the expected state for an idle in transaction session: the database has finished running the update, but the transaction remains open because the application hasn’t issued a COMMIT; or ROLLBACK.

Resolution strategy:

In this case, canceling the blocking process (Pid 14193) is the option using pg_terminate_backend().

Note: Before you terminate transactions, evaluate the potential effect that each transaction has on the state of your database and your application.

SELECT pg_terminate_backend(14193);

Best practices to prevent lock contention and idle-in-transaction issues

In this section, we cover key practices across application design, database configuration, and monitoring that help you minimize lock contention and avoid idle-in-transaction issues in your PostgreSQL environments.

1. Application design and code practices

Always use explicit transaction boundaries (BEGIN / COMMIT / ROLLBACK). Every code path, including exception handlers, must close the transaction.
Keep transactions short to minimize the duration locks are held.
Implement robust error handling with try-catch blocks, placing COMMIT or ROLLBACK in a finally block.
Avoid long-running SELECT...FOR UPDATE operations that hold row-level locks for extended periods.
Use row-level locking instead of table-level locking wherever possible to reduce contention.

2. Database configuration

Set idle_in_transaction_session_timeout (recommended: 30 seconds–5 minuted) to automatically terminate sessions that remain idle within an open transaction.
Configure lock_timeout for DDL operations to prevent long waits on schema changes.
Set statement_timeout to enforce query execution limits and prevent runaway queries.
Access shared resources in a consistent order across transactions to remove circular wait conditions that cause deadlocks.

3. Monitoring and observability

Use pg_stat_activity to identify sessions in an idle-in-transaction state:

SELECT now(), pid, usename, datname, client_addr, wait_event, state, now() - state_change AS state_time, query
FROM pg_stat_activity
WHERE state = 'idle in transaction'
ORDER BY state_change DESC;

Use pg_locks to investigate blocking and waiting sessions.
Implement CloudWatch alarms for long-running transactions and lock wait events.
Monitor the Lock:transactionid wait event in CloudWatch Database Insights to detect sessions blocked by open transactions.

Conclusion

In this post, we showed you how to use Amazon CloudWatch Database Insights to diagnose locking issues in Aurora PostgreSQL. By combining lock tree analysis with sound database design and proactive monitoring, you can improve the performance and reliability of your databases. We encourage you to explore these capabilities in your environments and share your experiences with the AWS community.

For more information, see the CloudWatch Database Insights documentation and the AWS Database Blog.

About the authors

Converting an RDS for SQL Server instance from license included to Bring Your Own Media (BYOM)

Mesgana Gormley — Tue, 16 Jun 2026 20:52:01 +0000

Amazon Relational Database Service (Amazon RDS) for SQL Server recently launched Bring Your Own Media (BYOM), so you can use your existing SQL Server licenses with fully managed RDS instances (see Unlock license mobility with Bring Your Own Media on fully managed Amazon RDS for SQL Server). This is particularly valuable if you have existing Microsoft licensing agreements and want to optimize your cloud spending by using those investments on AWS.

If you’re already running RDS for SQL Server with the license-included (LI) model, you can now convert those instances to BYOM in place, no database migration required. With this feature, you can transition your RDS for SQL Server instances from licensing provided by AWS to using your own SQL Server media while retaining the benefits of fully managed RDS infrastructure. To use BYOM, SQL Server Enterprise or Standard Edition with license mobility is required. For more information about license mobility, refer to License Mobility.

To simplify the setup process, we have also streamlined the BYOM engine version experience. You upload your SQL Server installation media to Amazon Simple Storage Service (Amazon S3) and register it as a BYOM engine version. Amazon RDS automatically creates new BYOM engine versions when minor version upgrades become available, so you can stay up to date. When you choose to upgrade, the new version is ready for you without additional manual steps.

In this post, we walk you through the end-to-end conversion process: preparing your installation media, creating a BYOM engine version, and performing the in-place license model change.

Solution overview

Follow these steps to convert your existing license-included RDS for SQL Server instance to BYOM:

Prepare and upload your SQL Server installation files to Amazon S3.
Create a BYOM engine version that Amazon RDS uses to build your specific database engine configuration.
Modify your existing license-included instance to use the BYOM licensing model with the BYOM engine version.

Prerequisites

Before you begin, verify that you meet the following prerequisites:

An AWS Identity and Access Management (IAM) principal (role or user) with AmazonRDSFullAccess (AWS managed policy for RDS operations) and s3:GetObject, s3:CreateBucket, and s3:PutObject permissions to create, upload, and access SQL Server installation files in Amazon S3.
An S3 bucket to store installation files. All installation files must be stored in the same S3 bucket and in the same AWS Region where your RDS instance resides.
An existing RDS for SQL Server instance with the license-included model that you want to convert.
SQL Server installation media (ISO file) that matches the engine version of your existing instance.

Important considerations before you begin

Converting from BYOM back to license-included isn’t supported. Plan your conversion carefully because this change isn’t reversible.
BYOM engine versions can’t be shared across Regions or AWS accounts. However, the installation media can be shared. You must create a BYOM engine version in each Region and AWS account where you want to use BYOM.
BYOM is supported on Enterprise and Standard editions of SQL Server 2019 and 2022.
Option groups that include SQL Server Reporting Service (SSRS) or SQL Server Analysis Service (SSAS) aren’t supported with BYOM.
If your instance is configured for Multi-AZ, the conversion applies to both the primary and standby instances.
During the conversion, the instance enters a modifying state. For Multi-AZ instances, a rolling in-place conversion is performed where the standby is modified first, then a failover occurs (typically 30–60 seconds of failover time), and the old primary is updated. For Single-AZ instances, the instance shuts down during the conversion and restarts when the conversion is complete. Schedule the conversion during a maintenance window for production workloads.
If your instance has read replicas, remove them before converting, then recreate them after. The new replicas will automatically inherit the BYOM licensing model.
After conversion, you are no longer charged for the SQL Server license through AWS. You’re responsible for confirming compliance with your own Microsoft licensing agreements.
To create a cross-Region read replica, the BYOM engine version must be created in the target Region first.
If you want to upgrade the minor version as part of your license-included to BYOM conversion, you must first create a BYOM engine version for the target higher minor version.

Step 1: Prepare and upload installation files

To get started, you need the SQL Server installation media file that matches the engine version running on your existing RDS instance. You can obtain this from your Microsoft Volume Licensing Service Center (VLSC) or Visual Studio subscription.

Identify your current engine version

First, check the engine version of your existing license-included instance:

aws rds describe-db-instances \
  --db-instance-identifier my-li-instance \
  --query "DBInstances[0].{EngineVersion:EngineVersion,LicenseModel:LicenseModel}" \
  --region <your-region> \
  --output table

Sample output:

Create an S3 bucket and upload your installation files

Create an S3 bucket in the same Region as your RDS instance:
```
aws s3 mb s3://amzn-s3-demo-bucket --region <your-region>
```

Upload your SQL Server ISO file (replace the following media file name and S3 bucket name according to your environment):

aws s3 cp \
  SW_DVD9_NTRL_SQL_Svr_Ent_Core_2022_64Bit_English_OEM_VL_X23-28404.ISO \
  s3://amzn-s3-demo-bucket/sqlserver-2022-ee/ \
  --region <your-region>

If you manage multiple AWS accounts, you can store the SQL Server installation media in a central S3 bucket and allow access to all accounts within your AWS Organization. This avoids uploading the same ISO file to each account individually.

Add the following bucket policy to your central S3 bucket to allow access from all accounts in your organization: Refer Adding a bucket policy by using the Amazon S3 Console.

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "CrossAccountAccess",
      "Effect": "Allow",
      "Principal": {
        "AWS": "arn:aws:iam::TARGET-ACCOUNT-ID:role/YOUR-IAM-ROLE"
      },
      "Action": [
        "s3:GetObject",
        "s3:ListBucket"
      ],
      "Resource": [
        "arn:aws:s3:::YOUR-BUCKET-NAME",
        "arn:aws:s3:::YOUR-BUCKET-NAME/*"
      ],
      "Condition": {
        "StringEquals": {
          "aws:PrincipalOrgID": "o-xxxxxxxxxxxx"
        }
      }
    }
  ]
}

Replace the placeholders with your actual values: o-xxxxxxxxxxxx with your AWS Organizations ID, TARGET-ACCOUNT-ID with the AWS account ID that needs access, YOUR-IAM-ROLE with the IAM role used to access the installation media, and YOUR-BUCKET-NAME with your S3 bucket name.

Each account still needs to create its own BYOM engine version (since BYOM engine versions can’t be shared across accounts) but can reference the installation media from the central bucket.

Note: The S3 bucket must be in the same Region as the RDS instance where you’re creating the BYOM engine version.

Step 2: Create a BYOM engine version

Amazon RDS uses BYOM engine version to validate and register your SQL Server installation files packaged into a reusable template. The BYOM engine version must use the same engine version as your existing instance.

Create the BYOM engine version using the AWS CLI

Execute the following command to create the BYOM engine version using the uploaded media file:

aws rds create-custom-db-engine-version \
  --engine sqlserver-ee \
  --engine-version 16.00.4225.2.v1 \
  --database-installation-files-s3-bucket amzn-s3-demo-bucket \
  --database-installation-files SW_DVD9_NTRL_SQL_Svr_Ent_Core_2022_64Bit_English_OEM_VL_X23-28404.ISO \
  --region <your-region>

You can also create the BYOM engine version using the RDS Console, as shown in the following screenshot.

Monitor BYOM engine version creation status

BYOM engine version creation typically takes a few minutes. Monitor the status until it shows available:

aws rds describe-db-engine-versions \
  --engine sqlserver-ee \
  --engine-version 16.00.4225.2.v1 \
  --include-all \
  --region <your-region> \
  --query "DBEngineVersions[?DBEngineVersionArn!=null].{Version:EngineVersion,Status:Status}" \
  --output table

Sample output:

Note: You must wait until the status changes from Validating/Creating to Available before proceeding to Step 3.

Step 3: Convert the instance from license-included to BYOM

After your BYOM engine version reaches available status, you can modify your existing license-included instance to use the BYOM licensing model. This operation changes the licensing model in place without requiring you to recreate the instance or migrate data.

If your instance is a Multi-AZ instance, a rolling in-place conversion is performed to minimize downtime. For a Single-AZ instance, downtime is required. We recommend performing this task during your maintenance window.

Note: Converting from BYOM back to license-included isn’t supported. Plan your conversion carefully.

Use the AWS CLI to modify the instance

aws rds modify-db-instance \
  --db-instance-identifier my-li-instance \
  --engine-version 16.00.4225.2.v1 \
  --license-model bring-your-own-media \
  --apply-immediately \
  --region <your-region>

You can also convert the license model by modifying the instance using the RDS Console, as shown in the following screenshots.

Note: If your selected BYOM engine version has a higher minor version than the instance’s current version, RDS performs the minor version upgrade along with the license model change in a single operation.

Verify the conversion

After the modification is complete, verify that the license model has changed:

aws rds describe-db-instances \
  --db-instance-identifier my-li-instance \
  --query "DBInstances[0].{ID:DBInstanceIdentifier,Status:DBInstanceStatus,LicenseModel:LicenseModel,EngineVersion:EngineVersion}" \
  --output table \
  --region <your-region>

List existing DB instances and license model

To list all existing SQL Server instances still using the license-included model:

aws rds describe-db-instances \
  --region <your-region> \
  --query "DBInstances[?starts_with(Engine,'sqlserver') && LicenseModel=='license-included'].{ID:DBInstanceIdentifier,Engine:Engine,License:LicenseModel,MultiAZ:MultiAZ,ReadReplica:ReadReplicaSourceDBInstanceIdentifier}" \
  --output table

Track license consumption

You can track the license consumption of your RDS instances using AWS License Manager. Follow these steps to enable and track your license usage:

Step 1: Onboard AWS License Manager

Before you can track the instances, License Manager needs a service-linked role that gives it permission to read your account’s resource inventory. If you’ve opened the License Manager console before and allowed the one-time setup, this role already exists and you can move straight to Step 2.

If this is your first time, the AWS Management Console creates it automatically the moment you visit the service page, or you can create it explicitly using the AWS CLI:

aws iam create-service-linked-role --aws-service-name license-manager.amazonaws.com

Step 2: Create a self-managed license for your RDS BYOM instances

A self-managed license tells License Manager what to look for and how to count it. You will point it at Amazon RDS resources running SQL Server and choose vCPU as the counting unit. Create a JSON file called rds-byom-license-config.json and the following content. This example tracks a self-managed SQL Server Enterprise Edition license for Amazon RDS. To track SQL Server Standard Edition license, create a similar JSON file and change the value of ProductInformationFilterValue to sqlserver-se.

{
  "Name": "RDS-SQLServer-BYOM-EE",
  "Description": "Self-managed license for RDS SQL Server Enterprise Edition BYOM",
  "LicenseCountingType": "vCPU",
  "LicenseCountHardLimit": false,
  "ProductInformationList": [
    {
      "ResourceType": "RDS",
      "ProductInformationFilterList": [
        {
          "ProductInformationFilterName": "Engine Edition",
          "ProductInformationFilterValue": ["sqlserver-ee"],
          "ProductInformationFilterComparator": "EQUALS"
        }
      ]
    }
  ]
}

Run the following command to create the license configuration using the JSON file.

aws license-manager create-license-configuration \
  --region <your-region> \
  --cli-input-json file://rds-byom-license-config.json

Sample output:

Note the license configuration ARN in the output. You will need it in the next step.

You can also create the license configuration through the AWS License Manager console by navigating to Self-managed licenses and choosing Create self-managed license.

Step 3: Verify that your RDS instances are being tracked

With the license in place, it will take some time to discover and report your BYOM instances. You can either inspect the tracked resources using the command below:

aws license-manager list-usage-for-license-configuration \
  --region <your-region> \
  --license-configuration-arn <license-configuration-arn>

Considerations

License Manager tracks usage per Region. For multi-Region RDS BYOM deployments, configure License Manager independently in each Region.
Cross-Region read replicas are tracked by License Manager in the Region where the replica resides.

Clean up

To avoid ongoing charges, clean up the resources you created when you no longer need them.

Delete the RDS instance:

aws rds delete-db-instance \
  --db-instance-identifier convert-li-to-byom-demo \
  --skip-final-snapshot \
  --region <your-region>

Delete the BYOM engine version (only after all associated instances and snapshots are removed):

aws rds delete-custom-db-engine-version \
  --engine sqlserver-ee \
  --engine-version 16.00.4225.2.v1 \
  --region <your-region>

Delete the S3 bucket:

aws s3 rb s3://amzn-s3-demo-bucket --force

License configuration can’t be deleted while resources are still associated with it. The RDS BYOM instances it tracks must be deleted first.
```
aws license-manager delete-license-configuration \
  --region <your-region> \
  --license-configuration-arn <license-configuration-arn>
```

Conclusion

In this post, we showed you how to convert an existing license-included RDS for SQL Server instance to the Bring Your Own Media (BYOM) licensing model. The process involves uploading your SQL Server installation media to Amazon S3, creating a BYOM engine version, and modifying your existing instance to use the new licensing model.

If you want to optimize your SQL Server licensing costs on AWS, this conversion gives you flexibility. By using existing Microsoft licensing agreements through BYOM, you can reduce your overall cloud spending while maintaining the fully managed benefits of Amazon RDS, including automated backups, patching, monitoring, and high availability.

For more information about BYOM on Amazon RDS for SQL Server, see Amazon RDS for SQL Server documentation.

About the authors

Similarweb’s migration from HBase to Amazon DynamoDB

Idan Lahav — Tue, 16 Jun 2026 18:35:15 +0000

Managing massive data volumes at scale presents significant operational challenges. At Similarweb we faced these challenges with Apache HBase and found a solution in Amazon DynamoDB.

Similarweb is a digital intelligence platform that provides AI-powered insights into website traffic, app usage, and market trends to help businesses benchmark competitors and optimize growth strategies.

We faced growing scalability and operational complexity issues with our existing Apache HBase infrastructure, which prompted us to explore more flexible and efficient alternatives. This post walks you through our journey migrating our data storage from Apache HBase to DynamoDB. We discuss the technical challenges, migration approach, data modeling strategies, cost optimization techniques, and key benefits achieved along the way. By migrating to DynamoDB, we enhanced performance and scalability and reduced maintenance overhead, so our teams can innovate more rather than focus on infrastructure management. We explore the lessons learned, and the impact this migration had on our business operations.

Background

In our web applications, serving large volumes of data to users requires a robust database solution. That solution must store data effectively, retrieve it quickly, and perform operations such as aggregations, grouping, and sorting before it delivers results to the user interface. To achieve this, we employ a dual approach based on the nature of the data and query patterns, designed to keep the web application responsive and interactive.

Our first approach relates to dynamic queries that depend on user input, or involve complex calculations, such as computing data for power sets of site groups. This can result in an extremely large number of combinations and therefore cannot be feasibly pre-calculated. Therefore, we use cloud analytics databases (OLAP systems), such as Firebolt, for these types of queries. We also use Firebolt for complex query patterns including JOINs, GROUP BY, and other SQL-like query operations. These databases excel at on-the-fly processing of large datasets that aren’t suitable for ETL-based pre-aggregation.

Conversely, our second approach, for features with predictable, pre-calculable access patterns, we rely on key-value (KV) stores such as DynamoDB. By pre-computing data through periodic ETL processes, DynamoDB provides fast and straightforward access to aggregated information, which gives users a responsive experience while balancing performance and scalability.

Requirements for retrieving traffic and engagement data

To illustrate how we use pre-calculated data, let’s walk through a classic use case for website traffic and engagement over time that we show on our platform’s UI.

Figure 1: In the Similarweb Traffic and Engagement report, users can select the site to analyze (1), choose a date range (2), and set the geographic scope such as worldwide or a specific country (3). After making these selections, the graph displays key metrics like visits and unique users over the chosen time range.

We store visit statistics for a website, say, example.com, captured with details such as site name, date, country (ISO code), and number of visits.

Site	Country (ISO)	Date	Visits	TimeOnSite	BounceRate	…
example.com	840	2026-01-22	7	15.2	7
example.com	840	2026-01-23	11	11.7	1
example.com	840	2026-01-24	3	20.3	4
example.com	840	2026-01-25	12	13.2	5
example.com	840	2026-01-26	9	19.1	7

Our access patterns might involve retrieving all visit data for example.com over a specific date range across all countries or querying the visit data for a specific country, such as the U.S. (840). In this scenario, by pre-computing and storing visit counts for each combination of date, country, and site during the ETL process, we can achieve rapid response times for these common access patterns while minimizing computational overhead at query time.

These pre-computed metrics look simple, but at Similarweb’s scale the underlying write load and query diversity pushed our existing HBase cluster to its limits. In our case, instead of being continuously written, the data is ingested in large batches at scheduled intervals, such as daily, weekly, or monthly, using Spark jobs.

Massive scale and flexible access

At Similarweb, our traffic and engagement datasets are massive, totaling over 255 terabytes (TB). Unlike transactional applications that trickle data in continuously, our analytics pipeline breathes in huge bursts. We ingest approximately 7 billion records per table in tight, scheduled batches that must finish within a couple of hours to keep our data fresh.

However, writing the data is only half the battle. After it’s stored, this data must be instantly available for diverse and complex read patterns. Our users perform more than simple key lookups. They slice and dice data across dimensions to benchmark competitors and analyze trends.

The following figure shows our high-level data flow before the migration.

Figure 2 (Before Migration): Raw events land in the data lake, Spark ETL computes daily aggregates, and bulk writes store the results in HBase. The .NET backend reads site and country metrics by key (plus date) and serves them to the Traffic and Engagement UI. All arrows represent data flow.

Beyond simple key-value lookups

We needed a database solution capable of handling massive write spikes while serving single-digit millisecond reads for specific filtering patterns. Specifically, we needed to support these four access patterns for any given site:

Site-level aggregation: Retrieving all traffic for a site.
```
SELECT * WHERE Site = {site}
```
Specific country breakdown: Drilling down into a specific country.
```
SELECT * WHERE Site = {site} AND Country = {country}
```

Time-based trending: Fetching history over a specific window.

SELECT * WHERE Site = {site} AND Date BETWEEN {start} AND {end}

Complex combination: Filtering by both country and date range.

SELECT * WHERE Site = {site} AND Country = {country} AND Date BETWEEN {start} AND {end}

To achieve this, our ideal database had to meet four strict criteria:

High write throughput: Ingest billions of records in hours.
Versatile query support: Handle the dimension-based queries listed in the preceding section without scanning full tables.
Scalability with performance: Maintain high read performance even during peak write times.
Cost transparency: Provide a predictable pricing model that allows us to estimate costs before execution, rather than being surprised by the bill.

Why HBase hit a wall

HBase served us well for years, but at Similarweb’s scale it gradually shifted from “a database we use” into “a system we operate”. The core limitation was not raw capability. It was the operational risk and instability that appeared after we combined very large batch writes with strict expectations for always-on reads.

1. RegionServer instability became an on-call driver
A recurring source of incidents was HBase RegionServers getting out of sync with the rest of the cluster, or being down. When a RegionServer drifted or misbehaved, it could affect availability and latency for the regions it hosted. Even when recovery was possible, it was unsettling and time-consuming, and it happened often enough to become a real operational burden.

2. Storage and disk upgrades were a nightmare
Managing and upgrading disks in a large HBase footprint was consistently high friction. Disk changes are not isolated events in a distributed system. They ripple through performance, stability, and operational procedures. What should have been routine infrastructure work frequently turned into multi-step maintenance with real risk, especially when we were also trying to protect ingestion windows and read SLAs.

3. The architecture benefits did not match our access patterns
HBase’s wide-column model shines when you read a subset of columns from large rows. In our case, access patterns often required reading the full set of metrics for a key, so we were paying the operational cost of the system without consistently benefiting from its strongest design advantages.

4. High costs because the database was sized for peak
Batch loads in the terabyte range created short, intense write peaks, while the product still needed fast, predictable reads. Keeping an always-on cluster sized for peak meant paying for capacity that we did not actually use most of the day.

These challenges did not show up as a single catastrophic failure. They showed up as accumulated operational toil: more night pages, more time spent on cluster health, and more risk in routine maintenance. That compounding overhead is what pushed us to look for a fully managed alternative.

How DynamoDB addresses these challenges

DynamoDB allowed us to:

Remove cluster operations from the critical path. With DynamoDB, there are no region servers to provision, no rebalancing, and no manual capacity planning at the infrastructure layer. That directly reduced day-to-day operational work and the number of failure modes tied to database health.
Scale for batch ingestion without permanent overprovisioning. Our write pattern is predictable: we know how many records we will write and the time window to finish. DynamoDB lets us treat capacity as a dial: we scale provisioned write capacity units (WCUs) up immediately before an ingestion run and scale it back down as soon as it completes. This aligned costs with the batch window instead of forcing us to keep a large cluster running 24/7.
Keep read performance stable while writes spike. The access patterns behind the UI are mostly key-based lookups and date-range queries. The partitioned architecture of DynamoDB plus Query operations on partition and sort keys let us serve those reads with consistently low latency, even as tables grow very large and ingestion jobs run in parallel.
Make cost behavior explicit and predictable. Because DynamoDB capacity and request patterns map cleanly to our workload, we can estimate write and read cost from record counts, item sizes, and the expected query shape. That made cost modeling part of the design, instead of an after-the-fact surprise.
Improve resilience and disaster recovery options. DynamoDB gives managed backups and recovery primitives out of the box. For multi-Region needs, DynamoDB Global Tables can replicate data across Regions so reads can be served locally and recovery does not depend on rebuilding large clusters under pressure.

With this foundation in place, we could focus on the parts that are specific to our workload: how we ingest efficiently at high throughput, and how we model keys to satisfy our access patterns at low cost. The following figure shows our data flow after the migration.

Figure 3 (After Migration): Precomputed traffic and engagement metrics are served through two paths: a key-value lane backed by DynamoDB for predictable, low-latency access, and an analytics lane using Firebolt for dynamic, ad hoc queries. The .NET backend routes requests accordingly. Historically, Lane A used HBase. We replaced it with DynamoDB.

Data modeling

To unlock the full performance and cost benefits of DynamoDB, it was critical to design our data model around our actual query patterns. Our goal was to create efficient access paths that minimize both response times and read/write capacity usage.

Let’s revisit our Traffic and Engagement example, where the core access patterns include retrieving visit data for a site across date ranges, optionally filtered by country.

Naive approach: partition key

An initial approach might be to create a flat, unique partition key like:

PK = {site}_{country}_{date}

Primary key	Visits	TimeOnSite	BounceRate
PK
example.com_840_2026-01-21	4	24	12.31

To retrieve a month’s worth of data, such as visits to example.com in the US (country code 840) during January 2026, we’d generate 31 individual keys and issue a BatchGetItem request:

example.com_840_2026-01-01
example.com_840_2026-01-02
...
example.com_840_2026-01-31

While this design works, it becomes inefficient and costly at scale. Within a single BatchGetItem request, each item retrieval is counted as a separate read operation, consuming one read capacity unit per item, even when the payload size is small.

Optimal design: composite keys with partition and sort keys

A more scalable model uses composite primary keys with a partition key and a sort key:

Partition key (PK): {site}_{country}
Sort key (SK): {date}

Primary key		Visits	TimeOnSite	BounceRate
PK	SK
example.com_840	2026-01-21	4	24	12.31

With this setup, you can query all records for a site-country pair over a date range using the efficient Query API in DynamoDB:

Query(PK="example.com_840", SK BETWEEN "2026-01-01" AND "2026-01-31")

This reduces the number of API calls and dramatically cuts down on read costs by using range queries on sort keys.

Cost comparison: Query vs BatchGetItem

Consider a use case where you want to retrieve 500 days of data per site-country combination, each entry being ~200 bytes:

Note: RCUs scale with item size in 4 KB chunks. Eventually consistent reads would halve RCUs.

Approach	Read API	RCU Calculation	Estimated Cost
Flat PK	BatchGetItem	500 items × 1 RCU = 500 RCU	$0.065
Composite PK + SK	Query	100 KB / 4 KB = 25 RCU	$0.00325

Savings: Over 20x cheaper using a composite key design.

Worldwide use case

While our composite key model, PK = {site}_{country}, SK = {date}, efficiently supports common queries filtered by site, country, and date range, it introduces a challenge when we need to query visit data across all countries. For example, retrieving all visits to example.com worldwide, either for all time or within a specific date range:

SELECT * WHERE Site = 'example.com'
SELECT * WHERE Site = 'example.com' AND Date BETWEEN '2026-01-01' AND '2026-01-31'

In our existing schema, the country code is embedded in the partition key, which is essential for evenly distributing write and read load across partitions. However, this also means that you must know the country in order to query the data, something we don’t want for global aggregation use cases.

A straightforward but inefficient solution is to fan out Query API calls across every country-specific partition:

# Fan-out Query: retrieve visits for a site across all countries
results = []
for country in country_codes:  # ~200 ISO 3166 codes
    pk = f"example.com_{country}"
    response = query(
        TableName='TrafficTable',
        KeyConditionExpression="PK = :pk AND SK BETWEEN :start AND :end",
        ExpressionAttributeValues={
            ":pk": pk,
            ":start": "2026-01-01",
            ":end": "2026-01-31"
        }
    )
    results.extend(response['Items'])

# Aggregate country-level records into a single worldwide view
worldwide = {}
for item in results:
    date = item['SK']
    visits = item['visits']
    worldwide[date] = worldwide.get(date, 0) + visits

# worldwide = {"2026-01-01": 148200, "2026-01-02": 136400, ...}

While functionally correct, this approach has several drawbacks:

High cost: Over 200 Query requests per site/date query.
Increased latency: Querying and aggregating results across 200 queries can significantly increase response time.
No BatchQueryItem: Unlike BatchGetItem, there’s no native way to batch multiple Query requests into a single API call.
Operational overhead: Managing 200+ parallel queries can put load on the application and increase risk of throttling.

To support efficient global queries without incurring the cost and complexity of fan-out, we introduced a special synthetic country code (for example, 999) during the ETL process. In practice, we precompute and store aggregated worldwide metrics as part of the ETL pipeline and write them to a dedicated “global” partition. This is done by using a designated partition key for worldwide data, where PK = {site}_999.

Primary key		Visits	TimeOnSite	BounceRate
PK	SK	Visits	TimeOnSite	BounceRate
example.com_840	2026-01-21	4	24	12.31
example.com_999	2026-01-23	33	17	16.5

This enables querying worldwide data with a single Query request:

Query(PK="example.com_999", SK BETWEEN "2026-01-01" AND "2026-01-31")

This way we don’t have any performance overhead while reading, while also incurring less cost because we use one Query request instead of multiple.

Of course, the “999” approach is not without cost. It adds extra ETL complexity because we now need to compute an additional worldwide rollup per site and date, and it also increases storage because we persist an extra item alongside every site-country record. Still, when we look at the system end to end, it’s a clear win. We shift work from read time to write time, eliminate the need for 200+ fan-out queries, reduce application-side orchestration, and get consistently faster worldwide reads. In practice, the added ETL and storage cost is outweighed by the savings in query cost and latency, so the overall solution ends up cheaper and faster.

In the next section, we explore how we further use DynamoDB features to save time and costs during initial data migration and monthly data ingestion.

Writing to DynamoDB

Batch ingestion is the heartbeat of our analytics pipeline. Instead of a continuous stream, we rely on scheduled Apache Airflow Directed Acyclic Graphs (DAGs) that trigger ETL Spark jobs on Databricks at daily, weekly, or monthly cadences, each tuned to the freshness requirements of its downstream feature. Every run pushes billions of items, often several terabytes, to DynamoDB in a short time window so the tables are ready for read traffic as soon as possible.

DynamoDB offers two distinct capacity modes to fit different workload patterns. The on-demand mode is a serverless, pay‑per‑request model that automatically scales to meet traffic demands, with no capacity planning required and you pay only for what you use. In contrast, the provisioned mode requires you to specify the read and write throughput that you want ahead of time, and billing is based on this provisioned capacity whether fully used or not. In our case, because we already know the total number of records to write and the time window for ingestion, we can precisely calculate and set the write capacity units needed. This makes provisioned mode significantly more cost‑effective than on‑demand for our scheduled batch loads.

End-to-end batch workflow

Airflow schedules the load
- DAG parameters include table name and the date ranges for reading from the source.
- Jobs are staggered to avoid overlapping peaks.
Databricks Spark runs the ETL
- Spark partitions align with the DynamoDB partition key to maximize parallelism.
- We use the DynamoDB Connector for Apache Spark, which batches writes and handles retry logic with exponential back-off.
Capacity is scaled up just-in-time
- Before the first write to the target DynamoDB table, our infrastructure script calls UpdateTable to raise the table’s provisioned write capacity, or write capacity units (WCU), to the calculated peak. The script sets this level automatically based on the target duration and the record count.
Data is written in parallel
- We use the DynamoDB Connector for Apache Spark to write data at a throughput closely aligned with our provisioned capacity. Typically, we target about 1.1 times the provisioned write capacity, accepting a controlled level of throttling to achieve optimal utilization without underutilizing our resources.
Capacity is scaled back down automatically
- As soon as the ETL finishes writing all records, we schedule a follow-up task that calls UpdateTable again to lower the provisioned capacity write level.

def run_etl(table_name, records_count, target_duration_hours=1):
    # Step 3 -- Calculate and scale up capacity just-in-time
    desired_wcu = ceil(records_count / (target_duration_hours * 3600))
    desired_wcu = clamp(desired_wcu, MIN_WCU, MAX_WCU)
    wait_until_table_is_active(table_name)
    current_wcu = describe_table(table_name).provisioned_write_capacity
    update_table(table_name, wcu=current_wcu + desired_wcu)  # UpdateTable API
    wait_until_table_is_active(table_name)
    try:
        # Step 4 -- Write data in parallel via the Spark DynamoDB Connector
        spark.write(target_table=table_name,
                    write_throughput_ratio=1.1)  # ~110% of provisioned WCU
    finally:
        # Step 5 -- Scale capacity back down automatically
        update_table(table_name, wcu=current_wcu)  # UpdateTable API
        wait_until_table_is_active(table_name)

Python pseudocode of calculating and scaling up DynamoDB table’s provisioned capacity

Using Import Table from Amazon Simple Storage Service (Amazon S3)

To further reduce the time and cost of writes to DynamoDB, we used the DynamoDB Import from S3 feature to import data directly from Amazon Simple Storage Service (Amazon S3) when migrating from HBase and for some periodic writes. This eliminates the need for per-record writes and avoids consuming write capacity units during ingestion.

Benefits

Up to 90% cost reduction for bulk ingestion.
No Databricks compute usage required for ETL ingestion.
Simplified operations with retry logic hidden and operational overhead removed by the native import.
Faster ingestion compared to our conventional Spark jobs.

Since the Import Table feature charges based on the total volume of data ingested rather than per-item write operations, it offers significant cost savings, especially when migrating large tables with small items, like in our case.

Import from S3 workflow

As part of the pre-ETL automation, data is read from the specified S3 path and converted into the DYNAMODB_JSON format supported by the Import from S3 feature.
We invoke the ImportTable API with the S3 path of the formatted data and table definition.
A monitoring task tracks the progress until the import completes.

When to store data in separate tables by period

DynamoDB Import from S3 feature is a game changer for large backfills, but it comes with an important constraint: it can only import into a new table. That limitation becomes an opportunity if your dataset is naturally time-partitioned and primarily accessed by recent periods.

For monthly datasets, we adopted a deliberate design: create one table per month, import that month’s data using Import from S3, and manage the lifecycle of those tables as the data ages.

Why this works well for monthly data

This approach fits monthly workloads especially well because:

Bulk loads are discrete: each month’s dataset is typically produced as a complete batch, making it a clean unit for import.
Operational simplicity at query time: applications can route queries to the relevant period table(s) instead of mixing cold and hot data in one large table.
Retention becomes trivial: instead of deleting old items, you can drop entire tables when they expire.
Cost optimization is easier: older monthly tables can move to Standard-IA table class as they age, reducing storage cost without changing application logic.

A practical naming convention makes this easy to automate, for example: {table_name}_2026-01.

How we run it end to end

For each month, we generate the dataset, convert it to DynamoDB JSON, one of the supported import formats, and run ImportTable into a new monthly table.
After import, we apply a retention policy that removes tables once they are no longer needed.
We transition aging monthly tables to Table Class Standard-IA to save on storage costs.
When a requested date range spans multiple months, the API fans out queries across the relevant monthly tables and merges results.

When not to use period-based tables

This pattern is powerful, but it is not universal.

For daily datasets, creating a table per day would explode the number of tables and introduce unnecessary operational overhead. In those cases, it’s better to keep a single long-lived table and continue with the standard ingestion path: Spark writes + provisioned capacity scaling.

Rule of thumb

Prefer separate tables by period when the period is coarse (monthly or larger), the data is loaded in bulk, and retention can be enforced at the table level.
Prefer a single table when periods are too fine-grained (daily), or when you need continuous incremental writes into the same physical table.

Conclusion

In this post, we showed how Similarweb migrated from Apache HBase to DynamoDB. This transition was driven by the need to simplify operations, scale efficiently, and reduce infrastructure overhead while continuing to deliver fast, reliable insights at scale.

Our legacy HBase setup, while powerful, struggled to meet the growing demands of our batch ingestion workflows and dynamic query requirements. Challenges like stability, operational maintenance, and scaling limitations prompted us to seek a more modern, serverless alternative.

By adopting DynamoDB, we achieved:

High-performance batch ingestion using ETL jobs with intelligent write provisioning.
Flexible and cost-efficient data modeling that supports diverse query patterns at scale.
Reduced operational burden through the fully managed and serverless DynamoDB architecture.
Low operational burden and fast historical data migration by importing directly from Amazon S3.
Improved system reliability and scalability without the need for manual cluster management.

This migration enhanced the performance and stability of our data infrastructure and freed up our engineering teams to focus on building features and driving innovation. DynamoDB has proven to be a resilient and cost-effective foundation for our analytics pipeline, supporting Similarweb’s mission to deliver timely, actionable digital insights to our customers.