AWS Database Blog

Amazon Keyspaces now supports logged batches for atomic, multi-statement operations

Today, we are announcing Amazon Keyspaces (for Apache Cassandra) support for logged batches, a powerful feature that brings atomic, all-or-nothing write operations to your Apache Cassandra-compatible workloads.

This simplifies the migration path to Amazon Keyspaces for organizations running self-managed Apache Cassandra workloads that depend on logged batches.

This enhancement represents another step forward in our commitment to bringing Amazon Keyspaces closer to full feature parity with Apache Cassandra, while maintaining the benefits of a fully managed service.

In this post, we explore the benefits of logged batches, demonstrate how to set up and run them, and discuss important considerations when using logged batches with Amazon Keyspaces.

Benefits of logged batches

With logged batches, you can group multiple write operations—including INSERT, UPDATE, and DELETE statements—into a single atomic unit that either succeeds completely or fails entirely.

This all-or-nothing approach maintains durability and data consistency across a batch of operations. This is especially important when your batch spans multiple partitions or multiple tables.
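
For example, a logged batch wraps multiple statements between BEGIN BATCH and APPLY BATCH. The following minimal sketch is illustrative; the keyspace, table, and column names are placeholders:

BEGIN BATCH
    INSERT INTO my_keyspace.table_a (id, value) VALUES ('1', 'a');
    UPDATE my_keyspace.table_b SET value = 'a' WHERE id = '1';
APPLY BATCH;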

Solution overview

Consider a common ecommerce scenario: you have two separate tables serving two different ways of viewing order data. This data is denormalized and duplicated across these tables for performance reasons.

Denormalization optimizes online transaction processing (OLTP) read and write performance by storing pre-computed data structures tailored to specific access patterns. Duplicating data across multiple tables limits the need for expensive multi-table lookups or full scans, and enables low-latency read and write operations that scale linearly.

Logged batches remove the risk of an insert or delete succeeding in one table while failing in the other by making sure that either all operations succeed or none are applied.

The following diagram illustrates this process.

You can run the following steps on the Amazon Keyspaces console using the CQL editor, from the command line, or in an integrated development environment (IDE). This walkthrough uses cqlsh with the cqlsh-expansion extension.

Prerequisites

To proceed, you need an AWS account with the appropriate permissions for Amazon Keyspaces. Refer to the documentation for the required permissions.

We have pre-configured the IAM permissions for cqlsh-expansion.
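
If you don't have the cqlsh-expansion utility installed, it is distributed as a Python package. The following commands install it and initialize its default configuration; this sketch assumes a local Python 3 environment with pip available:

$ pip3 install --user cqlsh-expansion
$ cqlsh-expansion.init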

Create keyspace

To create your keyspace, complete the following steps:

  1. Authenticate to your AWS account.
  2. Connect to Amazon Keyspaces using the cqlsh command line and run the following command:
    $ cqlsh-expansion cassandra.us-east-1.amazonaws.com 9142 --ssl
    
    Connected to Amazon Keyspaces at cassandra.us-east-1.amazonaws.com:9142
    [cqlsh 6.1.0 | Cassandra 3.11.2 | CQL spec 3.4.4 | Native protocol v4]
    Use HELP for help.
    cqlsh current consistency level is ONE.
    cqlsh>
  3. Change the consistency level to LOCAL_QUORUM:
    CONSISTENCY LOCAL_QUORUM;
  4. Create a keyspace called online_shop:
    CREATE KEYSPACE online_shop WITH replication = {'class': 'SingleRegionStrategy'} AND TAGS = {'blog':'logged-batches'};

Create tables

Now that you have created a keyspace, you can create tables within it. For this post, we create three tables. The order_by_order_id and order_by_customer_name tables contain the same data but serve different access patterns; we will use logged batches to keep them in sync.

  1. Create the customer table:
    -- customer Table
    CREATE TABLE online_shop.customer (
        customer_id TEXT PRIMARY KEY,
        customer_name TEXT,
        customer_email TEXT,
        customer_address TEXT
    );
  2. Create the order_by_order_id table:
    -- order_by_order_id Table 
    CREATE TABLE online_shop.order_by_order_id(
        order_id TEXT,
        order_date TIMEUUID,
        customer_name TEXT,
        product_name TEXT,
        order_item_count INT,
        order_total DECIMAL,
        PRIMARY KEY(order_id)
    );
  3. Create the order_by_customer_name table:
    -- order_by_customer_name Table 
    CREATE TABLE online_shop.order_by_customer_name(
        customer_name TEXT,
        order_date TIMEUUID,
        order_id TEXT,
        product_name TEXT,
        order_item_count INT,
        order_total DECIMAL,
        PRIMARY KEY((customer_name), order_id)
    );

Perform logged batch operations

Complete the following steps to run logged batch operations:

  1. Insert three records into the customer table:
    BEGIN BATCH
        INSERT INTO online_shop.customer (customer_id, customer_name, customer_email, customer_address)
        VALUES ('1', 'Jane Doe', 'jane.doe@example.com', '123 Main St, Anytown, ST 12345');  
        INSERT INTO online_shop.customer (customer_id, customer_name, customer_email, customer_address)
        VALUES ('2', 'John Smith', 'john.smith@example.com', '456 Oak Ave, Somewhere, ST 67890');
        INSERT INTO online_shop.customer (customer_id, customer_name, customer_email, customer_address)
        VALUES ('3', 'Mike Johnson', 'mike.johnson@example.com', '789 Pine Rd, Elsewhere, ST 54321');
    APPLY BATCH;
    1. Validate that the data was inserted correctly:
      SELECT * FROM online_shop.customer;

  2. Insert an order into the order_by_order_id and order_by_customer_name tables; this keeps the two tables and their access patterns in sync:
    BEGIN BATCH
        INSERT INTO online_shop.order_by_order_id (order_id, order_date, customer_name, product_name, order_item_count, order_total)
        VALUES ('ORD-001', NOW(), 'John Smith', 'Laptop', 1, 999.99);    
        INSERT INTO online_shop.order_by_customer_name (customer_name, order_date, order_id, product_name, order_item_count, order_total)
        VALUES ('John Smith', NOW(), 'ORD-001', 'Laptop', 1, 999.99);
    APPLY BATCH;
    1. Validate that the data was inserted correctly:
      SELECT * FROM online_shop.order_by_order_id;

      SELECT * FROM online_shop.order_by_customer_name;

  3. Run some conditional logic with different DML operations:
    • Insert an order if it doesn’t exist.
    • Update the order_item_count for another order.
    BEGIN BATCH
        INSERT INTO online_shop.order_by_order_id (order_id, order_date, customer_name, product_name, order_item_count, order_total)
        VALUES ('ORD-002', NOW(), 'Jane Doe', 'Smartphone', 2, 1299.98) IF NOT EXISTS;
        INSERT INTO online_shop.order_by_customer_name (customer_name, order_date, order_id, product_name, order_item_count, order_total)
        VALUES ('Jane Doe', NOW(), 'ORD-002', 'Smartphone', 2, 1299.98) IF NOT EXISTS;
    
        UPDATE online_shop.order_by_order_id SET order_item_count = 3 WHERE order_id = 'ORD-001';
        UPDATE online_shop.order_by_customer_name SET order_item_count = 3 WHERE customer_name = 'John Smith' AND order_id = 'ORD-001';
    APPLY BATCH;
    1. Validate that the data was inserted and updated correctly:
      SELECT * FROM online_shop.order_by_order_id;

      SELECT * FROM online_shop.order_by_customer_name;

  4. Run the following statements with an invalid value (invalid-ORD-003) without using a logged batch. In this case, the first INSERT statement fails and the second executes.
    INSERT INTO online_shop.order_by_order_id (order_id, order_date, customer_name, product_name, order_item_count, order_total)
    VALUES (invalid-ORD-003, NOW(), 'Mike Johnson', 'Headphones', 3, 149.97);
    
    INSERT INTO online_shop.order_by_customer_name (customer_name, order_date, order_id, product_name, order_item_count, order_total)
    VALUES ('Mike Johnson', NOW(), 'ORD-003', 'Headphones', 3, 149.97);
    

    1. Verify that the order exists only in the order_by_customer_name table:
      SELECT * FROM online_shop.order_by_order_id WHERE order_id = 'invalid-ORD-003';

      SELECT * FROM online_shop.order_by_customer_name WHERE customer_name = 'Mike Johnson';

    The result is a mismatch in the data between the two tables. The access patterns that use these tables will now present different versions of the truth, meaning a customer might not see their orders, or might see an order that doesn't actually exist.

  5. Clean up the order that was just inserted by removing it:
    DELETE FROM online_shop.order_by_customer_name WHERE customer_name = 'Mike Johnson';
    1. Validate that the data was deleted:
      SELECT * FROM online_shop.order_by_customer_name WHERE customer_name = 'Mike Johnson';

  6. Run the same statements as a logged batch; this time, the invalid value causes the entire batch to fail with an error:
    BEGIN BATCH
        INSERT INTO online_shop.order_by_order_id (order_id, order_date, customer_name, product_name, order_item_count, order_total)
        VALUES (invalid-ORD-003, NOW(), 'Mike Johnson', 'Headphones', 3, 149.97);
        
        INSERT INTO online_shop.order_by_customer_name (customer_name, order_date, order_id, product_name, order_item_count, order_total)
        VALUES ('Mike Johnson', NOW(), 'ORD-003', 'Headphones', 3, 149.97);
    APPLY BATCH;

    1. The following queries show that there is no order in either table:
      SELECT * FROM online_shop.order_by_order_id WHERE order_id = 'invalid-ORD-003';

      SELECT * FROM online_shop.order_by_customer_name WHERE customer_name = 'Mike Johnson';

    When one statement in the batch encounters an error, the logged batch makes sure that none of the statements in the batch are applied.

Clean up

To clean up your resources, drop the keyspace:

DROP KEYSPACE online_shop;

Additional information

In this section, we discuss additional features, considerations, and limitations of logged batches in Amazon Keyspaces.

Feature support

Logged batches in Amazon Keyspaces support the following operations and features:

  • INSERT, UPDATE, and DELETE statements
  • The equality operator in the WHERE clause
  • Conditional logic operations: IF EXISTS and IF NOT EXISTS clauses for conditional writes
  • Batch operations can span multiple tables
  • Advanced data types: user-defined types (UDTs), static columns, and frozen columns
  • Client-side timestamps and Time To Live (TTL)
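
For example, TTL can be applied per statement inside a logged batch. The following sketch reuses the walkthrough tables with an arbitrary one-day TTL, and assumes TTL is enabled on the target tables:

BEGIN BATCH
    INSERT INTO online_shop.order_by_order_id (order_id, order_date, customer_name, product_name, order_item_count, order_total)
    VALUES ('ORD-004', NOW(), 'Jane Doe', 'Charger', 1, 19.99) USING TTL 86400;
    INSERT INTO online_shop.order_by_customer_name (customer_name, order_date, order_id, product_name, order_item_count, order_total)
    VALUES ('Jane Doe', NOW(), 'ORD-004', 'Charger', 1, 19.99) USING TTL 86400;
APPLY BATCH;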

Consistency and concurrency control

Whereas Cassandra uses a last-write-wins approach for concurrent operations, Amazon Keyspaces implements a lightweight locking mechanism for isolation.

When concurrent batches attempt to modify the same rows, Amazon Keyspaces helps prevent conflicts by failing concurrent operations on rows involved in an ongoing batch.

This approach helps provide robust consistency assurances, though it represents a behavioral difference from Cassandra that developers should be aware of when migrating applications.

Conditional logic operations

Amazon Keyspaces implements conditional logic operations through Cassandra’s lightweight transactions (LWT) API. These operations let you modify data only when specific conditions are met.

The same implementation also applies when using logged batches.
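
For example, the following sketch updates an order only if the corresponding rows exist; if either condition fails, none of the statements are applied. It reuses the walkthrough tables and is illustrative only:

BEGIN BATCH
    UPDATE online_shop.order_by_order_id SET order_item_count = 4 WHERE order_id = 'ORD-001' IF EXISTS;
    UPDATE online_shop.order_by_customer_name SET order_item_count = 4 WHERE customer_name = 'John Smith' AND order_id = 'ORD-001' IF EXISTS;
APPLY BATCH;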

Multi-Region deployments

For multi-Region deployments, logged batches provide atomicity and isolation in the source AWS Region, and replicated writes are asynchronous and treated as single statements.

The queries in the batch are executed in the destination Regions, but they won’t be isolated or batched. This means that a client might read some of a logged batch operation in a secondary Region, while the rest of the batch is in the process of being applied.

Monitoring and observability

Amazon Keyspaces extends its Amazon CloudWatch metrics to include specific monitoring capabilities for batch operations. You can track the following:

  • Batch operation success and failure rates
  • Latency metrics for batch operations
  • Throughput and utilization patterns
  • ConsumedWriteCapacityUnits and ConsumedReadCapacityUnits
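
For example, you can retrieve the write capacity consumed by one of the walkthrough tables with the AWS CLI. This sketch assumes the AWS/Cassandra CloudWatch namespace and the Keyspace and TableName dimensions that Amazon Keyspaces publishes metrics under, with placeholder start and end times:

$ aws cloudwatch get-metric-statistics \
    --namespace "AWS/Cassandra" \
    --metric-name ConsumedWriteCapacityUnits \
    --dimensions Name=Keyspace,Value=online_shop Name=TableName,Value=order_by_order_id \
    --start-time 2025-01-01T00:00:00Z \
    --end-time 2025-01-02T00:00:00Z \
    --period 300 \
    --statistics Sum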

Additionally, Amazon CloudTrail provides audit logging for Amazon Keyspaces batch operations to facilitate compliance and security monitoring.

Pricing and capacity

There is no additional cost for enabling logged batches; you pay only for the capacity units that are part of your logged batch.

Amazon Keyspaces performs two writes for every row in the logged batch: one to prepare the row and one to commit it. If the logged batch fails due to a query condition failure or throttling, the consumed capacity is based on the capacity used to prepare the batch before the failure occurred.
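
As an illustrative calculation, assume each row in a batch is 1 KB or smaller, so each write consumes 1 WCU. A logged batch of 10 INSERT statements would then consume about 10 × 2 = 20 WCUs: 10 to prepare the rows and 10 to commit them.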

You can view the consumed write capacity for your logged batch operations in Amazon CloudWatch metrics.

Performance considerations and best practices

Although logged batches provide consistency, they come with some performance trade-offs compared to individual writes or unlogged batches. The atomic nature of these operations requires additional coordination, which can result in slightly higher latency.
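
For comparison, when you don't need atomicity, you can group the same kind of statements in an unlogged batch, which skips the batch log coordination. A minimal sketch, assuming unlogged batch syntax as in Apache Cassandra:

BEGIN UNLOGGED BATCH
    INSERT INTO online_shop.order_by_order_id (order_id, order_date, customer_name, product_name, order_item_count, order_total)
    VALUES ('ORD-005', NOW(), 'Mike Johnson', 'Keyboard', 1, 49.99);
    INSERT INTO online_shop.order_by_customer_name (customer_name, order_date, order_id, product_name, order_item_count, order_total)
    VALUES ('Mike Johnson', NOW(), 'ORD-005', 'Keyboard', 1, 49.99);
APPLY BATCH;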

To get the best performance from logged batches, consider the following best practices:

  • Use them judiciously – Only use logged batches when you truly need atomicity across multiple operations
  • Keep batches small – Smaller batches generally perform better and are less likely to encounter conflicts
  • Avoid hotspots – Distribute your batch operations across different partitions when possible
  • Monitor performance – Use CloudWatch metrics to understand the impact of batch operations on your overall application performance

Key limitations and restrictions

Amazon Keyspaces imposes the following limits on logged batch operations:

  • Up to 100 operations per batch (50 when static columns are involved)
  • Maximum payload of 4 MB per batch
  • Each batch can only contain operations on distinct items—no duplicate row operations within a single batch
  • Multiple concurrent in-flight batches on the same rows will conflict and fail
  • The IN operator is not supported
  • Logged batches only support LOCAL_QUORUM for consistency—this mirrors what Amazon Keyspaces supports for other DML operations
  • Range operators (>, <, >=, <=) are not supported, nor are the LIKE and BETWEEN operators—this mirrors Apache Cassandra logged batches
  • Counter updates are not supported—this mirrors Apache Cassandra logged batches

The batch will fail if one of these conditions is detected.
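
For example, the following batch would fail because both statements target the same row; the example is illustrative and reuses the walkthrough tables:

BEGIN BATCH
    UPDATE online_shop.order_by_order_id SET order_item_count = 5 WHERE order_id = 'ORD-001';
    DELETE FROM online_shop.order_by_order_id WHERE order_id = 'ORD-001';
APPLY BATCH;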

Conclusion

In this post, we introduced and demonstrated logged batches for Amazon Keyspaces. This release simplifies maintaining data consistency across logical units of work involving multiple DML operations, multiple tables, and multiple partitions.

You can start using logged batches through a Cassandra-compatible driver without additional configuration.

Refer to the documentation to learn more.


About the authors

Jonathan Kerr

Jonathan is a Senior Solutions Architect at AWS. He specialises in providing strategic technical guidance and applying cutting-edge technology and innovative solutions to some of the most complex business problems.