Achieving sub-10ms latency and 94% cost savings with Diskless Kafka using AutoMQ and Amazon FSx for NetApp ONTAP

As more organizations adopt Diskless Kafka—a cloud-based messaging queue that builds entirely on object storage like Amazon Simple Storage Service (Amazon S3)—they gain significant cost and operational advantages. But for latency-sensitive workloads, this architecture faces a key question: how to keep millisecond-level write latency while preserving the cost benefits of S3?

This is the core trade-off between Diskless Kafka and object storage-backed services. If you abandon local disks and write all data synchronously to object storage, you lose one of Kafka’s most important capabilities: low latency. Object storage is designed for low-cost, high-durability WORM storage workloads, not for sub-millisecond write latency; S3 I/O latency is typically tens of milliseconds. One solution might be to put an Amazon Elastic Block Store (Amazon EBS) volume in the middle tier as a buffer to address the latency and cost problems. But in practice, this new tier introduces other problems, because S3 is a Regional service whereas EBS volumes are zonal. How do you protect EBS volumes against an Availability Zone-level failure? How do you reduce cross-AZ data transfer cost between brokers?

As a fully managed, Multi-AZ shared file system, Amazon FSx for NetApp ONTAP delivers the low-latency, cross-AZ durable storage that Diskless Kafka needs. AutoMQ now supports FSx for ONTAP as a storage option, helping you achieve local disk-class latency without sacrificing the benefits of an S3 based architecture.

AutoMQ is a fully Apache Kafka-compatible, S3-backed Diskless Kafka. Its storage engine combines a Write-Ahead Log (WAL) with object storage. The WAL can run on different cloud storage types, such as block storage, object storage, and file storage, helping users balance cost and performance for their workload. This design preserves Kafka’s semantics while significantly reducing storage cost and simplifying operations, and has been broadly adopted.

For latency-sensitive workloads like microservice call chains, exchange matching engines, risk decisioning, and real-time risk control, high latency is unacceptable. AutoMQ clusters are designed so that all nodes in a cluster share the same storage. An EBS volume used as the middle tier for the WAL can’t be shared across Availability Zones, so all AutoMQ nodes must run in the same Availability Zone, which reduces architectural resilience.

To get cross-AZ availability and low latency at the same time, FSx for ONTAP is a good choice. FSx for ONTAP is a fully managed enterprise-grade file storage service that brings NetApp’s ONTAP file system capabilities to AWS. It delivers high availability through a Multi-AZ deployment architecture, automatically replicating data across multiple Availability Zones to protect against single-AZ failures and support business continuity. In this post, we discuss the benefits of running AutoMQ with FSx for ONTAP, including cross-AZ resilience for fault tolerance, zero cross-AZ data transfer costs, and low I/O latency for AutoMQ WAL storage.

Latency challenges for Diskless Kafka

Since AutoMQ introduced its S3-based shared storage architecture for Kafka in 2023, Diskless Kafka has become a new architecture pattern on the cloud. It provides compute-storage decoupling, elastic scaling, and significant cost savings.

Importantly, AutoMQ reduces cross-AZ data transfer cost through shared storage. On AWS, a Multi-AZ Kafka cluster can save thousands to tens of thousands of dollars per month in network cost alone. Kafka users running on AWS have seen these savings already, and they are becoming a primary driver for migrating to Diskless Kafka. However, the higher I/O latency of S3 limits a Diskless Kafka that writes directly to S3 to workloads such as log ingestion and near real-time analytics, where end-to-end latency requirements are less stringent.

In 2023, AutoMQ took a different approach: instead of writing directly to S3, it places a low-latency WAL in front of object storage. Writes land on the WAL first, then flush to S3 asynchronously in batches. This preserves Kafka semantics while keeping the hot write and read paths on fast storage, delivering a low-latency Diskless Kafka without giving up the cost benefits of S3.

This architecture has two key benefits. First, the WAL serves writes at local storage-class speeds, so both write and read performance improve dramatically. Second, by aggregating writes in the WAL and flushing in batches, it reduces the number of S3 API calls, improving throughput and reducing costs. However, AWS doesn’t offer a Multi-AZ shared block storage service, so the WAL layer needs a different solution, which is where FSx for ONTAP comes in. Figure 1 illustrates the high-level architecture for this solution.

Figure 1: High-level solution architecture

On AWS, building a high-performance Diskless Kafka means choosing a WAL middle layer that delivers both Multi-AZ resilience and low latency. Historically, architects faced a difficult choice:

Amazon EBS as the WAL – Low latency, but cross-AZ replication costs and operational complexity remain.
S3 directly as the WAL – No cross-AZ data transfer costs, but latency is too high for latency-sensitive workloads.

In other words, Diskless Kafka on AWS was either affordable but slow, or fast but expensive. After evaluating multiple shared storage options on AWS, AutoMQ chose FSx for ONTAP as the WAL layer. FSx for ONTAP is a Multi-AZ shared file storage service that delivers sub-millisecond write latencies on the SSD storage tier, and its pricing model doesn’t charge for cross-AZ traffic. This combination meets Diskless Kafka’s requirements for low latency, shared storage, and Multi-AZ availability.

Because AutoMQ’s WAL is a fixed-size circular buffer, you only need a small amount of FSx for ONTAP capacity. Writes persist to FSx for ONTAP first, then flush to S3 in batches. This results in the following benefits:

All Diskless Kafka benefits, including compute-storage separation, elastic scaling, and S3-level cost.
Near-zero cross-AZ data transfer costs with Multi-AZ fault tolerance, because brokers exchange only metadata at the Amazon Elastic Compute Cloud (Amazon EC2) level.
Write and read latency close to that of local disk Kafka.

This makes AutoMQ one of the Diskless Kafka solutions on AWS that combines low latency, cross-AZ resilience, and low cost. It also makes Diskless Kafka feasible for latency-sensitive production workloads on AWS.
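The write-then-flush behavior described above can be pictured with a toy model. The following Python sketch is illustrative only and is not AutoMQ's implementation: the class name `FixedSizeWal`, its methods, and the list standing in for S3 are all hypothetical, and the flush is synchronous here for simplicity, whereas the post describes it as asynchronous. It models only the bounded capacity and the batching, not the on-disk ring layout.

```python
class FixedSizeWal:
    """Toy fixed-capacity WAL that hands records to object storage in batches."""

    def __init__(self, capacity_bytes, flush_threshold_bytes):
        self.capacity_bytes = capacity_bytes
        self.flush_threshold_bytes = flush_threshold_bytes
        self.pending = []         # records written to the WAL but not yet flushed
        self.pending_bytes = 0
        self.object_store = []    # hypothetical stand-in for S3: one entry per upload

    def append(self, record):
        if len(record) > self.capacity_bytes:
            raise ValueError("record is larger than the WAL capacity")
        # A full WAL must flush before it can accept more data.
        if self.pending_bytes + len(record) > self.capacity_bytes:
            self.flush()
        self.pending.append(record)
        self.pending_bytes += len(record)
        # Flush once enough data has accumulated to form a large batch.
        if self.pending_bytes >= self.flush_threshold_bytes:
            self.flush()

    def flush(self):
        if self.pending:
            self.object_store.append(b"".join(self.pending))  # one batched upload
            self.pending = []
            self.pending_bytes = 0


wal = FixedSizeWal(capacity_bytes=1024, flush_threshold_bytes=512)
for _ in range(100):
    wal.append(b"x" * 64)  # 100 small appends of 64 bytes each

# 12 batched uploads instead of 100 individual ones; 256 bytes still pending.
print(len(wal.object_store), wal.pending_bytes)
```

In this model the WAL never holds more than its configured capacity, no matter how much data passes through it, which is why the required capacity can stay small.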

How FSx for ONTAP avoids cross‑AZ data transfer costs

To understand how FSx for ONTAP helps AutoMQ avoid cross-AZ data transfer costs, we need to explore which layer of Kafka changes. Then we can examine what FSx for ONTAP does in this new architecture.

Kafka can be split into three layers:

Network layer – Handles KafkaApis requests.
Compute layer – Implements core logic such as transactions, compression, deduplication, and LogCleaner. This is where most of Kafka’s code lives.
Storage layer – This layer is at the bottom, where LocalLog and LogSegment persist the unbounded log to the local file system.

AutoMQ keeps Kafka’s network and compute layers intact. The only change is at the storage layer: the LogSegment. At this layer, AutoMQ replaces local disks with a shared storage engine built on S3 and low-latency WAL (FSx for ONTAP). On top of the network layer, AutoMQ adds a Zone-routing interceptor. With this design, FSx for ONTAP acts as a Multi-AZ shared volume that provides durable WAL storage. Writes are first appended sequentially to FSx for ONTAP, then asynchronously flushed to S3. Figure 2 illustrates this workflow.

Figure 2: Solution workflow

In a Multi-AZ deployment, traditional Kafka incurs cross-AZ data transfer charges from three main sources: triplicate replication, cross-AZ consumption, and cross-AZ writes.

AutoMQ addresses the first two directly. With a single replica plus cloud storage (S3 and FSx for ONTAP) for durability and Multi-AZ availability, the cluster no longer needs three in-cluster replicas, avoiding replication traffic between brokers. Combined with rack-aware scheduling, consumers read from the nearest broker, avoiding cross-AZ traffic on the read path.

The biggest challenge is producer write traffic across Availability Zones. This is where FSx for ONTAP plays a critical role. As a shared WAL, FSx for ONTAP allows brokers in different Availability Zones to append to the same log without replicating data between them. The Availability Zone-routing interceptor localizes cross-AZ writes: when a producer writes to a broker in another Availability Zone, the interceptor transparently proxies that write to a broker in the producer’s local Availability Zone. Only a small amount of control metadata crosses Availability Zone boundaries; the actual data blocks are written to FSx for ONTAP from within the same Availability Zone and eventually persisted to S3. Figure 3 shows the end-to-end write and read paths.

Figure 3: End-to-end write and read paths

The workflow consists of four steps, with the write path comprising Steps 1 and 2, and the read path comprising Steps 3 and 4:

Producers send data to the broker, which persists it to the WAL storage. With FSx for ONTAP as the WAL, AutoMQ achieves write latency within 10 milliseconds without cross-AZ fees.
The broker asynchronously flushes data from the WAL to S3 in large, optimized batches. S3 remains the primary, cost-effective tier for long-term retention, and FSx for ONTAP handles only the hot tail of the log.
For real-time consumers, data is served directly from the hot data cache (in-memory), delivering the lowest end-to-end latency.
For historical data processing, the broker retrieves data from S3 and populates the cold data cache (backed by local NVMe SSDs). This way, catch-up reads don’t contend with the WAL or the hot write path.

With this design, AutoMQ maintains full Kafka protocol compatibility and cross-AZ high availability, while driving cross-AZ data plane traffic down to near its theoretical minimum.
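As a rough illustration of the zone-routing idea described earlier in this section, the sketch below decides which broker should handle a produce request. It is a simplified model, not AutoMQ's interceptor: the `Broker` and `Route` types, the `route_write` function, the broker and zone names, and the fallback when no local broker exists are all hypothetical.

```python
from typing import NamedTuple, Sequence


class Broker(NamedTuple):
    name: str
    zone: str


class Route(NamedTuple):
    handler: Broker            # broker that receives the record data
    data_crosses_az: bool      # True if record data leaves the producer's zone
    metadata_crosses_az: bool  # True if only control metadata crosses zones


def route_write(producer_zone: str, owner: Broker, brokers: Sequence[Broker]) -> Route:
    # The partition owner is already local: nothing crosses zones.
    if owner.zone == producer_zone:
        return Route(owner, False, False)
    # Otherwise proxy through a broker in the producer's own zone; that broker
    # writes the data to the shared WAL and sends only metadata to the owner.
    for broker in brokers:
        if broker.zone == producer_zone:
            return Route(broker, False, True)
    # Assumed fallback: no local broker exists, so data goes to the owner directly.
    return Route(owner, True, False)


brokers = [Broker("b0", "az-a"), Broker("b1", "az-b"), Broker("b2", "az-c")]
route = route_write("az-a", owner=brokers[1], brokers=brokers)
print(route.handler.name, route.data_crosses_az, route.metadata_crosses_az)
```

The example producer sits in `az-a` while the partition owner sits in `az-b`, so the write is handled by `b0` and only metadata crosses the zone boundary.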

To learn more, refer to How does AutoMQ implement a sub-10ms latency Diskless Kafka?

Benefits

With FSx for ONTAP, AutoMQ’s Diskless architecture on AWS no longer trades low latency for low cost. It preserves compute-storage separation, near-zero cross-AZ data plane traffic, and S3 backed storage cost. Additionally, a small, fixed-size FSx for ONTAP volume serving as a Region-wide WAL brings end-to-end latency down to a level that is suitable for demanding real-time workloads such as microservices and financial trading systems. The following sections cover performance and cost in detail.

Performance analysis

Using AutoMQ with FSx for ONTAP addresses a specific issue: in a cross-AZ high availability setup, how do you retain local disk-class latency without cross-AZ replication traffic?

AutoMQ deploys FSx for ONTAP in Multi-AZ mode: within a single Region, FSx for ONTAP runs a high availability pair across two Availability Zones and exposes the pair as a single Region-wide shared file system. Brokers mount this file system as their WAL device. With this shared WAL layer in place, the system balances availability, elasticity, and network cost:

FSx for ONTAP delivers random I/O with latency close to Amazon EBS, while replicating across multiple Availability Zones to satisfy cross-AZ high availability requirements.
AutoMQ brokers remain stateless compute nodes that scale elastically when workloads increase. Hot writes go to FSx for ONTAP and asynchronously flush to S3.
Because data is no longer replicated between brokers, cross-AZ data plane traffic is avoided; only control plane communication remains.

We benchmarked end-to-end performance in the us-east-1 Region with a typical high-throughput workload:

Environment – 3x m7g.4xlarge brokers, FSx for ONTAP in Multi-AZ, Generation 2, configured with 1,024 GiB capacity, 3,072 provisioned IOPS, and 736 MBps throughput.
Workload – 4:1 read/write ratio, 64 KB message size, sustaining 300 MBps writes and 1.2 GiBps reads, simulating large-scale microservices and real-time workloads.
Results – Write latency averaged 5.98 milliseconds with P99 at 12.87 milliseconds; end-to-end latency averaged 7.79 milliseconds with P99 at 18.04 milliseconds.

Figure 4 illustrates these results.

Figure 4: Latency benchmarking

While still providing cross-AZ disaster recovery, full compute-storage separation, and S3 as the primary storage tier, AutoMQ uses a fixed-size FSx for ONTAP WAL layer to reduce Diskless Kafka’s average write latency from hundreds of milliseconds to single-digit milliseconds, approaching traditional local disk Kafka performance.

This means latency-sensitive workloads such as microservice call chains, risk control, and order matching can run on AutoMQ and FSx for ONTAP with stable, predictable latency.

Cost analysis

AutoMQ’s cost structure differs from traditional Kafka:

FSx for ONTAP serves as the durable, low-latency WAL—it stores only the most recent log entries, not long-term data.
S3 stores historical data and scales capacity independently. Storage costs stay at object storage level.
Because FSx for ONTAP and S3 provide built-in redundancy, AutoMQ avoids inter-broker replication, reducing storage and cross-AZ traffic costs.

In a typical scenario with 1 GBps ingestion/consumption and a 3-day retention period, you only need 45 m7g.large instances and 2 FSx for ONTAP file systems, each with 1,536 MBps throughput. Although FSx for ONTAP has a higher per-GiB cost, the required WAL capacity is small and fixed: unlike traditional Kafka’s replica storage, it doesn’t grow with the retention period or data volume. By avoiding inter-broker replication and most cross-AZ data plane traffic, AutoMQ avoids the network and storage costs that traditional Kafka incurs in Multi-AZ deployments. The overall TCO is dominated by S3 storage and on-demand compute, not block storage and cross-AZ bandwidth.

The following table compares monthly costs under the same latency target (P99 write latency less than 10 milliseconds).

Cost analysis

Traditional Kafka requires a large number of instances, three-replica storage, and cross-AZ replication to meet this target, costing roughly USD $317,000 per month, most of which comes from block storage, resource over-provisioning, and cross-AZ traffic. AutoMQ BYOC with FSx for ONTAP achieves the same single-digit millisecond write latency at roughly USD $18,345 per month. That is roughly 17 times lower cloud resource cost, or about 94% savings.
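The two monthly figures quoted above can be turned into a cost ratio and a percentage saving with a few lines of arithmetic. The sketch below uses only the two rounded totals from this post; the variable names are illustrative.

```python
traditional_kafka_usd = 317_000  # approximate monthly cost quoted above
automq_fsx_usd = 18_345          # approximate monthly cost quoted above

cost_ratio = traditional_kafka_usd / automq_fsx_usd
savings_pct = (1 - automq_fsx_usd / traditional_kafka_usd) * 100

# prints: 17.3x lower, 94% savings
print(f"{cost_ratio:.1f}x lower, {savings_pct:.0f}% savings")
```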

For AutoMQ BYOC pricing, see the AutoMQ calculator.

AutoMQ BYOC and FSx for ONTAP: Free trial from AWS Marketplace

Figure 5 shows the deployment architecture of AutoMQ BYOC on AWS. You can subscribe to AutoMQ BYOC from AWS Marketplace and deploy it into your own virtual private cloud (VPC).

Figure 5: Deployment architecture of AutoMQ BYOC on AWS

Install AutoMQ BYOC console

For instructions to install the AutoMQ console from AWS Marketplace, refer to Install AutoMQ on AWS.

Create instance

To create an instance in AutoMQ, complete the following steps (Figure 6):

1) Log in to the AutoMQ console and choose Instances in the navigation pane.

2) Choose Create Instance.

Figure 6: Create new instance

3) In the Network Specs section, select Three Zones for your Availability Zone deployment.

For single-AZ deployments on AWS, EBS WAL is recommended for the best performance and cost efficiency. For Multi-AZ deployments, you can choose between S3 WAL and FSx for ONTAP WAL, depending on your latency requirements (Figure 7). For a detailed comparison, see WAL Storage.

Figure 7: Select Availability Zone deployment

4) Select FSWAL for WAL (Write-Ahead Log) Type.

When you choose EBS WAL or S3 WAL (Figure 8), capacity planning is simplified to a single parameter: AKU (AutoMQ Kafka Unit). You don’t need to select EC2 instance types or quantities—AutoMQ automatically selects a combination benchmarked for optimal performance and cost. For example, 3 AKU delivers 60 MBps write throughput, 60 MBps read throughput, 2,400 RPS, and at least 3,375 partitions. For more information about capacity planning, see Prepaid Billing.

In this example, we choose FSx for ONTAP WAL.

In addition to AKU, we also select the FSx for ONTAP instance specification and quantity.

AutoMQ has benchmarked FSx for ONTAP performance across specifications, so we can estimate the required instances based on target write throughput. In this example, we choose 3 AKU (60 MBps read/write) with 1 FSx for ONTAP instance at 384 MBps, which is sufficient for the WAL write requirements. The available options are as follows:

FSx for ONTAP 384 MBps specification provides 150 MiBps Kafka write throughput
FSx for ONTAP 768 MBps specification provides 300 MiBps Kafka write throughput
FSx for ONTAP 1,536 MBps specification provides 600 MiBps Kafka write throughput

Figure 8: Choose WAL type and AKU
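The throughput figures listed above lend themselves to a simple lookup. The sketch below is a hypothetical helper, not part of AutoMQ's tooling: the function name `pick_fsx_spec` and the policy of choosing the smallest single specification that covers the target are assumptions for illustration. It takes a target Kafka write throughput in MiBps and returns the smallest listed FSx for ONTAP specification that meets it.

```python
# FSx for ONTAP throughput specification (MBps) -> Kafka write throughput (MiBps),
# taken from the list of options above.
FSX_SPECS = {384: 150, 768: 300, 1536: 600}


def pick_fsx_spec(target_write_mibps):
    """Return the smallest listed spec that covers the target, or None."""
    for spec_mbps in sorted(FSX_SPECS):
        if FSX_SPECS[spec_mbps] >= target_write_mibps:
            return spec_mbps
    return None  # the target exceeds what a single listed specification provides


# The 3 AKU example targets about 60 MBps of writes, well within the smallest spec.
print(pick_fsx_spec(60))
```

For the 3 AKU example in this post, the helper returns the 384 MBps specification, which matches the choice made above.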

Scaling and testing

After you create the cluster, you can view its details and adjust capacity elastically (Figure 9):

FSx for ONTAP – Primary data is persisted in S3; FSx for ONTAP is only used for WAL. You can scale by adding or removing FSx for ONTAP instances—no partition migrations or data movement are required, so capacity changes don’t affect running workloads.
AKU – You can independently adjust the AKU count to match compute capacity to FSx for ONTAP throughput, scaling compute and storage separately.

Figure 9: Update AKU

In this example, we use AutoMQ’s perf tool (based on OpenMessaging) to run a performance test from an EC2 instance in the same VPC:

Bash
KAFKA_HEAP_OPTS="-Xmx2g -Xms2g" ./bin/automq-perf-test.sh \
--bootstrap-server 0.kf-t1rf19ju6yrtl9fh.fsx-test-wanshao.automq.private:9092,1.kf-t1rf19ju6yrtl9fh.fsx-test-wanshao.automq.private:9092,2.kf-t1rf19ju6yrtl9fh.fsx-test-wanshao.automq.private:9092 \
--producer-configs batch.size=0 \
--consumer-configs fetch.max.wait.ms=1000 \
--topics 10 \
--partitions-per-topic 128 \
--producers-per-topic 1 \
--groups-per-topic 1 \
--consumers-per-group 1 \
--record-size 52224 \
--send-rate 160 \
--warmup-duration 1 \
--test-duration 5 \
--reset

The results shown in Figure 10 confirm that FSx for ONTAP WAL delivers write latency similar to traditional local disk Kafka, meeting the requirements of most latency-sensitive event streaming and real-time processing scenarios.

Figure 10: Example output

Summary

In this post, we showed how AutoMQ uses FSx for ONTAP as the WAL layer on AWS to bring end-to-end latency down to levels suitable for real-time production workloads, while preserving the benefits of the Diskless Kafka architecture. The FSx for ONTAP and S3 shared storage design delivers compute-storage separation, Multi-AZ high availability, and near-zero cross-AZ data plane traffic. A small, fixed-size FSx for ONTAP WAL keeps the hot write and read paths on low-latency shared storage, with asynchronous flushing to S3. In our benchmark, AutoMQ with FSx for ONTAP delivered an average write latency of 5.98 milliseconds and an average end-to-end latency of 7.79 milliseconds, while maintaining S3-level storage cost and the elastic scaling of stateless brokers. Try this solution out for your own use case, and share your feedback in the comments.

AWS Storage Blog
