AWS Database Blog
4.7 times better write query price-performance with AWS Graviton4 R8g instances using Amazon Neptune v1.4.5
Amazon Neptune version 1.4.5 introduces engine improvements and support for AWS Graviton-based r8g instances. In this post, we show you how these updates can improve your graph database performance and reduce costs. We walk you through the benchmark results for Gremlin and openCypher comparing Neptune v1.4.5 on r8g instances against previous versions. You’ll see performance improvements of up to 4.7x for write throughput and 3.7x for read throughput, along with the cost implications.
Amazon Neptune Database is a fast, reliable, and fully managed graph database that makes it straightforward to build and run applications using highly connected datasets. You can build applications using Apache TinkerPop Gremlin or openCypher on the Property Graph model, or using the SPARQL query language on W3C Resource Description Framework (RDF).
Improvements in Neptune version 1.4.5
As part of the 1.4.5 release, Neptune delivered improvements targeting higher throughput and reduced P99 latencies compared to previous versions. According to our internal benchmarks, these improvements result in performance gains of up to 20% for Gremlin workloads and up to 30% for openCypher workloads on the same hardware. The 1.4.5 release also contains further improvements for openCypher users, with optimizations resulting in up to 10 times faster CREATE queries and up to 3 times faster MERGE queries on 12xlarge instances.
With this release, Neptune now supports r8g and r7g instance types, providing a 16% cost reduction compared to the previous lowest-priced option, the r6g.
The r7g instances are powered by AWS Graviton3 processors and are designed for memory-intensive workloads. They offer up to 25% better performance than the sixth-generation AWS Graviton2-based r6g instances. r7g instances feature Double Data Rate 5 (DDR5) memory, which provides 50% higher memory bandwidth than DDR4 memory to enable high-speed access to data in memory.
r8g instances are powered by the latest-generation AWS Graviton4 processors and provide the best price-performance for memory-intensive workloads. They offer up to 30% better performance and larger instance sizes, with up to three times more vCPUs and memory than the seventh-generation AWS Graviton3-based r7g instances. Combined, these improvements translate to price-performance benefits for all users, with openCypher workloads seeing up to 3.7 times better price-performance for read workloads and 4.7 times better price-performance for write workloads.
Roy Reznik, Co-Founder & VP R&D at Wiz, shares their experience with Neptune’s performance improvements and r8g instances:
“The Wiz Security Graph visualizes your cloud stack to identify the risks in each layer and deliver actionable insights. It makes the complex simple by surfacing the relationships between cloud components as first-class citizens. To power the Wiz Security Graph, we needed a graph database that was globally available and could scale to 100s of billions of nodes and relationships to identify security risks in real-time. By updating to Graviton4-based r8g instances, we’ve reduced our graph database costs by 20%. Amazon Neptune’s combination of high-throughput graph processing, open-source graph models like Apache TinkerPop, and the cost-efficiency of the latest r8g instances allows us to scale the Wiz Security Graph at cloud speed.”
Next, we show you our performance benchmark results.
Performance benchmark using Locust
For this benchmarking exercise, we used Locust.io, an open source load testing tool that simulates real-world OLTP workloads and lets you create rich, complex, and distributed load testing scenarios. For this test, we ran both read and write workloads on version 1.4.4.0 on an r5 instance and on version 1.4.5.0 on an r8g instance. The tests were performed using a custom Locust user class (NeptuneUser) that submits queries through the Boto3 SDK and is available on GitHub. The read and mutation workloads represented common graph query patterns, such as retrieving egocentric neighborhoods, traversing multi-hop paths, inserting nodes and edges, and upserting nodes and edges. Comparable queries were run for both openCypher and Gremlin across a variety of client thread configurations.
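To give a sense of how such a test harness fits together, the following is a minimal sketch of a Locust user class that submits openCypher queries through the Boto3 neptunedata client. The endpoint, query, and task shown here are illustrative placeholders under assumed defaults, not the exact benchmark code (the full implementation is available on GitHub, as noted above).

```python
import time

import boto3
from locust import User, task, between


class NeptuneUser(User):
    """Minimal sketch of a Locust user that drives Neptune through the Boto3
    neptunedata client. The endpoint and query below are placeholders."""

    wait_time = between(0.1, 0.5)

    def on_start(self):
        # Replace with your cluster endpoint; 8182 is the default Neptune port.
        self.neptune = boto3.client(
            "neptunedata",
            endpoint_url="https://<your-neptune-endpoint>:8182",
        )

    @task
    def point_lookup(self):
        query = "MATCH (p {id: $id}) RETURN p"  # illustrative openCypher query
        start = time.perf_counter()
        exception = None
        try:
            self.neptune.execute_open_cypher_query(
                openCypherQuery=query,
                parameters='{"id": "node-1"}',
            )
        except Exception as err:  # report failures to Locust instead of raising
            exception = err
        # Record the sample so it appears in Locust's throughput/latency stats.
        self.environment.events.request.fire(
            request_type="openCypher",
            name="point_lookup",
            response_time=(time.perf_counter() - start) * 1000,
            response_length=0,
            exception=exception,
        )
```

Running this file with the locust CLI and ramping up the number of simulated users is how the different client concurrency configurations were exercised against each engine version.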
Read workloads
For read workloads, Neptune 1.4.5 on r8g instances delivered up to 2.77 times more queries per second than version 1.4.4.0 on r5 instances, with 62% lower P99 latency for openCypher queries. Similarly, Gremlin workloads showed up to 1.89 times more queries per second with a 58% reduction in P99 latency.
For openCypher read queries, we saw the following results.
Workload Type | Improvement in Throughput (large) | Reduction in P99 Latency (large) | Improvement in Throughput (12xlarge) | Reduction in P99 Latency (12xlarge)
Egocentric neighborhood | 2.28x | -61% | 1.56x | -47.42%
Multi-hop path traversal | 2.77x | -62% | 1.67x | -54.53%
Point lookup | 1.95x | -58% | 1.52x | -46.81%
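To make these read workload categories concrete, the sketch below shows what openCypher queries in each category might look like. The labels, property keys, and hop counts are illustrative assumptions, not the exact benchmark queries.

```python
# Illustrative openCypher query shapes for each read workload category.
# Labels, property keys, and hop counts are assumptions for demonstration.
READ_QUERIES = {
    # Point lookup: fetch a single vertex by a known property value
    "point_lookup": """
        MATCH (p:person {id: $id})
        RETURN p
    """,
    # Egocentric neighborhood: all vertices directly connected to a start vertex
    "egocentric_neighborhood": """
        MATCH (p:person {id: $id})--(neighbor)
        RETURN neighbor
    """,
    # Multi-hop path traversal: follow relationships a bounded number of hops out
    "multi_hop_traversal": """
        MATCH (p:person {id: $id})-[:knows*1..3]->(other)
        RETURN DISTINCT other
    """,
}
```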
For Gremlin read queries, we saw the following results.
Workload Type | Improvement in Throughput (large) | Reduction in P99 Latency (large) | Improvement in Throughput (12xlarge) | Reduction in P99 Latency (12xlarge)
Egocentric neighborhood | 1.69x | -56% | 1.57x | -27.29%
Multi-hop path traversal | 1.89x | -59% | 1.48x | -22.15%
Point lookup | 1.58x | -58% | 1.37x | -16.87%
Mutation workloads
For mutation workloads, Neptune 1.4.5 on r8g instances delivered up to 2.78 times more queries per second than version 1.4.4.0 on r5 instances, with 77% lower P99 latency for openCypher queries. Gremlin workloads showed similar improvements: 1.99 times higher throughput with 53% lower P99 latency.
For openCypher mutation queries, we saw the following results.
Workload Type | Improvement in Throughput (large) | Reduction in P99 Latency (large) | Improvement in Throughput (12xlarge) | Reduction in P99 Latency (12xlarge)
Create node | 2.13x | -76% | 11.9x | -53.48%
Create edge | 2.01x | -68% | 10.7x | -52.25%
Merge two nodes with one edge | 2.78x | -77% | 3.94x | -83.39%
For Gremlin mutation queries, we saw the following results.
Workload Type | Improvement in Throughput (large) | Reduction in P99 Latency (large) | Improvement in Throughput (12xlarge) | Reduction in P99 Latency (12xlarge)
Create node | 1.86x | -53% | 1.29x | 0%
Create edge | 1.91x | -51% | 1.18x | 0%
Merge two nodes with one edge | 1.98x | -51% | 1.40x | -18.24%
In version 1.4.5, we prioritized optimizing openCypher write queries, which show improvements of up to 11.9 times over the previous version.
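For reference, the sketch below shows what openCypher mutation queries in each of the table's categories might look like. The labels, property keys, and parameter names are illustrative assumptions, not the exact benchmark queries.

```python
# Illustrative openCypher mutation query shapes for each workload category.
# Labels, property keys, and parameters are assumptions for demonstration.
MUTATION_QUERIES = {
    # Create node
    "create_node": """
        CREATE (p:person {id: $id, name: $name})
    """,
    # Create edge between two existing nodes
    "create_edge": """
        MATCH (a:person {id: $fromId}), (b:person {id: $toId})
        CREATE (a)-[:knows]->(b)
    """,
    # Merge (upsert) two nodes and the edge between them
    "merge_two_nodes_with_one_edge": """
        MERGE (a:person {id: $fromId})
        MERGE (b:person {id: $toId})
        MERGE (a)-[:knows]->(b)
    """,
}
```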
Price-performance
To assess the combined impact of the engine improvements, new instance types, and cost reductions, we used a price-performance metric of cost per one million queries (Cost/1MM Queries). By this metric, we saw price-performance improvements of up to 3.7 times for read workloads and up to 4.7 times for write workloads.
Workload Type | Cost/1MM Queries (1.4.4, r5, large) | Cost/1MM Queries (1.4.5, r8g, large) | Price-Performance Improvement (large) | Cost/1MM Queries (1.4.4, r5, 12xlarge) | Cost/1MM Queries (1.4.5, r8g, 12xlarge) | Price-Performance Improvement (12xlarge)
Read | $0.81 | $0.22 | 3.7x | $0.91 | $0.39 | 2.3x
Write | $3.02 | $0.64 | 4.7x | $4.33 | $0.87 | 4.9x
The cost was calculated using US East (N. Virginia) Amazon Neptune prices.
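The metric itself is straightforward to reproduce. The minimal sketch below derives cost per one million queries from an instance's hourly price and its sustained query throughput; the hourly price and queries-per-second values are hypothetical placeholders, not published Neptune prices or benchmark numbers.

```python
def cost_per_million_queries(hourly_price_usd: float, queries_per_second: float) -> float:
    """Cost to serve one million queries at a sustained rate, given the
    instance's on-demand hourly price."""
    queries_per_hour = queries_per_second * 3600
    return hourly_price_usd / queries_per_hour * 1_000_000


# Hypothetical example: an instance billed at $0.35/hour sustaining 500 queries/second
print(round(cost_per_million_queries(0.35, 500), 2))  # ~0.19 USD per 1MM queries
```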
Conclusion
Engine version 1.4.5 is available in all AWS Regions where Neptune is available. To benefit from faster, higher-throughput openCypher queries, upgrade your existing cluster to version 1.4.5 or later, or create a new Neptune cluster. To learn more about the new release, see the 1.4.5 release notes.