AWS Database Blog
Working with identity columns and sequences in Aurora DSQL
Amazon Aurora DSQL now supports PostgreSQL-compatible identity columns and sequence objects, so developers can generate unique integer identifiers with configurable performance characteristics optimized for distributed workloads.
In distributed database environments, generating unique, sequential identifiers is a fundamental challenge: coordinating across multiple nodes creates performance bottlenecks, especially under high concurrency workloads. In this post, we show you how to create and manage identity columns for auto-incrementing IDs, selecting between identity columns and standalone sequence objects, and improving cache settings while choosing between UUIDs and integer sequences for your workload requirements.
Solution overview
Aurora DSQL implements identity columns and sequence objects as distributed coordination primitives that generate unique integer values across the cluster. Identity columns provide streamlined syntax for the common auto-incrementing ID use case, while standalone sequences offer advanced control for complex scenarios.
Prerequisites
You must have the following prerequisites to follow along with this post.
AWS account requirements:
- Active AWS account with Aurora DSQL cluster provisioned
- Aurora DSQL cluster running
- Aurora DSQL requires all connections to use Transport Layer Security (TLS) encryption. To establish secure connections, your client system must trust the Amazon Root Certificate Authority (Amazon Root CA 1). This certificate is pre-installed on many operating systems.
Required permissions:
CREATEpermission on schema for creating sequences and tablesUSAGEorUPDATEpermission on sequences for generating valuesALTERpermission for modifying existing sequences or identity columns
Tools and software needed:
- PostgreSQL-compatible client (psql, DBeaver, or application database driver)
- SQL client configured to connect to Aurora DSQL endpoint
Create a table with an identity column
Selecting an appropriate cache value is an important part of using sequences and identity columns effectively. The cache setting determines how identifier allocation behaves under load, influencing both system throughput and how closely values reflect allocation order. Higher cache values allow for higher throughput by serving sequence numbers from locally pre-allocated ranges, but this efficiency comes with a trade-off: any reserved values that go unused are lost, making gaps and out-of-order allocation more apparent.
To maximize achievable throughput under concurrency, use a larger cache size, for example:
To minimize gaps in the sequence ordering, use a cache size of 1, for example:
Key configuration decisions:
- GENERATED ALWAYS: System controls all values; users can’t override
- GENERATED BY DEFAULT: Users can provide explicit value when needed
- CACHE 70000: Optimized for high-volume concurrent inserts
- CACHE 1: Closer to sequential ordering, lower throughput
Inserting data using identity columns
In this section, we show some examples of how identity columns work during INSERT statements.
Automatic value generation example
Multiple inserts example
Using the DEFAULT keyword explicitly example
Alternatively, the keyword DEFAULT can be specified in place of a value to explicitly request the sequence-generated value.
Overriding GENERATED ALWAYS (when necessary) example
Only works with OVERRIDING SYSTEM VALUE clause.
Create and use standalone SEQUENCE objects
While identity columns provide a convenient built-in solution for auto-incrementing values, there are situations where creating standalone sequences offers greater flexibility and control. Standalone sequences are particularly useful in three scenarios:
- When you need to share a single sequence across multiple tables to maintain a unified numbering system
- When you require ID generation outside of INSERT operations (such as pre-generating identifiers for batch processing or external systems)
- When you have advanced sequence manipulation requirements that go beyond auto-increment functionality
Create a sequence with a large cache for high concurrency example
Use Sequence in INSERT example
Generate value in application logic example
Query sequence and identity column state
DSQL provides several system views and functions to monitor sequence behavior and inspect their current state. You can check sequence parameters, view current values, and examine identity column configurations to understand how your auto-increment mechanisms are functioning.
View current sequence values, as an example
Check sequence parameters and current state.
Get current value in this session
First get the nextval and then get currval.
View all sequences.
Inspect identity column properties
View identity column configuration
Modify identity columns and sequences
After creating sequences or identity columns, you might need to adjust their behavior to meet changing requirements. With Aurora DSQL, you can modify various sequence properties including generation behavior, starting values, cache sizes, and increment steps.
You can use the ALTER TABLE/ALTER COLUMN commands to modify identity columns or ALTER SEQUENCE for standalone sequences.
Change identity column behavior example:
Switch from ALWAYS to BY DEFAULT to allow manual ID insertion during data migration or bulk imports.
Restart identity sequence at a specific value example
Reset the sequence after data cleanup or to reserve ID ranges for different purposes (for example, starting production IDs at 50000 after test data).
Modify cache size of underlying sequence example
Increase cache size to improve INSERT performance in high-throughput applications by reducing sequence-related I/O operations.
Modify standalone sequences examples
Change cache size to optimize performance for frequently accessed sequences in write-heavy workloads.
Change cache size
Restart sequence
Restart sequence after data archival or to align numbering with business requirements (for example, starting the new fiscal year at 10000).
Change increment
Change increment to create gaps between IDs for organizational purposes (for example, reserving ranges for different regions or departments).
Validation and testing
After configuring identity columns and sequences, it’s important to verify that they’re generating values as expected. You can test the automatic ID generation by inserting records and using the RETURNING clause to see the generated values immediately.
Verify identity column behavior
Insert test records and observe generated IDs to confirm the sequence is working correctly and starting from the expected value.
Verify sequential generation within session to ensure IDs increment properly with each subsequent insert.
Test concurrent behavior (requires multiple sessions) examples
When multiple database sessions insert records simultaneously, sequence caching can affect the order in which IDs are generated. This example simulates concurrent inserts from two different sessions to demonstrate how cache settings impact ID generation patterns. Understanding this behavior is crucial for applications that require predictable ID ordering or need to optimize for high-concurrency workloads.
Session 1:
Take note of the returned ID.
Session 2 (simultaneously):
Compare with Session 1 ID.
Expected behavior (with CACHE >= 65536):
Sequence values are unique but might not be generated in strictly increasing order across sessions, and gaps can occur when cached values go unconsumed. Consistent with PostgreSQL semantics for cached sequences under concurrent use, where both systems ensure distinct values without providing sequential ordering.
Expected behavior (with CACHE =1):
Sequence values follow allocation order more closely over time.
Consistent with PostgreSQL’s non-cached sequence behavior. Each session fetches one value at a time, so cross-session ordering is more predictable. To preserve strictly sequential ID vending you must set CACHE = 1 and concurrent requests must be avoided.
Implement workload-specific patterns
The recommended cache size for sequences depends on your specific application requirements. High-throughput systems benefit from larger caches to reduce database round-trips, while applications requiring strict sequential ordering need minimal or no caching. This section demonstrates three common patterns with different cache strategies.
High-volume event logging (CACHE >= 65536) example
Event logging systems prioritize write performance over sequential ordering because events are typically queried by timestamp rather than ID. A large cache size minimizes sequence-related overhead during burst traffic periods.
Account number generation (CACHE = 1) example
Financial or customer-facing account numbers often require predictable, gap-free sequential numbering for auditing, compliance, or customer service purposes. Setting CACHE = 1 allocates IDs in order, though at the cost of performance.
Hybrid approach (UUID primary key + sequence display ID) example:
This pattern combines the benefits of both approaches: UUIDs provide globally unique, distributed, system-friendly primary keys, while a cached sequence generates user-friendly display numbers for customer-facing interfaces like order tracking.
Best practices
Choosing CACHE size:
- Use
CACHE >= 65536when:- Identifiers generated at high frequency (thousands per second)
- Many concurrent sessions performing inserts
- Workload tolerates gaps in identifier values. The database always vends IDs in sequential, ascending order—apparent out-of-order allocation is a client-side observation, not a database behavior (this is also true of standard PostgreSQL, where client submission order does not determine ID assignment)
- Examples: IoT telemetry, event logs, job run IDs
- Use
CACHE = 1when:- Allocation rates are moderate (hundreds per second or less)
- Closer sequential ordering is important
- Minimizing gaps is a priority
- Examples: invoice numbers, account IDs, reference numbers
Identity columns compared to standalone sequences:
- Prefer
IDENTITYcolumns for: Single-table auto-incrementing IDs (most common use case) - Use standalone sequences for: Sharing sequences across tables, generating IDs outside
INSERToperations
UUIDs compared to integer sequences:
- Default to UUIDs for: Primary keys in high-scale distributed workloads
- Use integer sequences for: Human-readable display IDs, reporting, external integrations
- Consider a hybrid approach: UUID primary key + sequence-based display number
Managing gaps:
- Gaps are inevitable due to transaction rollbacks, cached value loss, and concurrent allocation
- Applications should be designed to handle non-contiguous ID values. The database always vends IDs in sequential, ascending order so gaps arise from external factors (rollbacks, cache loss, concurrency) rather than out-of-order vending
- Never rely on sequences for gapless numbering
Performance improvement:
- Larger cache sizes reduce coordination overhead but increase visible gaps
- Use explicit transactions when ordering matters within a single session
- Monitor sequence allocation patterns under production load
Troubleshooting examples
Issue: “Permission denied for sequence”
Grant necessary permissions.
Issue: “Cannot insert explicit value for identity column”
Use OVERRIDING SYSTEM VALUE for GENERATED ALWAYS columns.
Or change to GENERATED BY DEFAULT.
Issue: Large gaps in sequence values.
- Expected behavior with
CACHE >= 65536 - Occurs when cached values aren’t fully consumed before the session ends
- Consider reducing cache size if gaps are problematic for your use case
Issue: Sequence values not strictly increasing across sessions
- Expected behavior with
CACHE >= 65536in distributed systems - Use
CACHE = 1if strict ordering is required - Use explicit transactions for more predictable ordering within sessions
Clean up
If you created resources to follow along with identity column tutorials or need to clean up your database schema, this guide walks you through the process of removing identity column behavior from your database objects. Whether you need to drop identity properties while preserving your data, remove sequences, or clean up entire tables, you’ll find the SQL commands and syntax you need here. We’ll also cover important cost implications and system limits to help you manage your database resources efficiently.
- Remove identity column behavior:
- Drop identity property (keeps column and data)
- Drop sequences:
- Drop standalone sequence
- Drop sequence and dependent objects
- Drop tables with identity columns
- Automatically drops associated sequence
Cost implications:
- No additional charges for sequences or identity columns
- Dropping unused sequences reduces metadata overhead (5,000 sequence limit per cluster)
Conclusion
In this post, we explored how identity columns provide PostgreSQL-compatible auto-incrementing IDs improved for distributed workloads. We demonstrated that standalone sequences offer advanced control for complex scenarios, while UUIDs remain the recommended choice for primary keys, with integer sequences better suited for display IDs and human-readable identifiers. Finally, we discussed how gaps and ordering effects are inherent characteristics of distributed sequence allocation that developers should account for in their application design.
Ready to experiment with identity columns and sequences? Visit the Aurora DSQL Playground to run these examples in a live environment. You can test different cache settings, observe concurrent behavior, and explore how sequences work in real-time without setting up your own database.