Exploring the feature-packed 1.2.1.0 release for Amazon Neptune
Amazon Neptune is a fast, reliable, and fully managed graph database service for building and running applications with highly connected datasets, such as knowledge graphs, fraud graphs, identity graphs, and security graphs. Neptune provides developers the most choice for building graph applications with three open graph query languages: openCypher, Apache TinkerPop Gremlin, and the World Wide Web Consortium’s (W3C) SPARQL 1.1.
Neptune announced the general availability of engine release 1.2.1.0 on 13th June 2023. With this release, you can benefit from a variety of new features and improvements, including support for Apache TinkerPop 3.6.2, access to R6i instances, a graph summary API, slow query logging capabilities, additional openCypher language functions, and improved performance.
Apache TinkerPop 3.6.2
Neptune support for Apache TinkerPop 3.6.x introduces new Gremlin steps, as well as updates to existing options and modulators.
mergeV() and mergeE()
For workloads that use upsert-like functionality for vertices and edges, prior to TinkerPop 3.6.x you would use the fold().coalesce(unfold(), ...) pattern to determine if an object exists, create it if it doesn’t, then update it as necessary. The mergeV() and mergeE() steps simplify this process for create-if-not-exists type queries.
As an example of mergeV(), consider the fold-coalesce-unfold pattern for upserting a Vertex. The following query checks the existence of a vertex labelled with ‘airport’ with a ‘code’ property value of ATL, and creates it if it doesn’t exist:
Consider the same example using mergeV() in 3.6.x:
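As a minimal sketch of the two forms side by side, the Python below holds each upsert as a Gremlin string and wraps it in the JSON body accepted by the HTTP Gremlin endpoint (/gremlin). The constant and function names are illustrative assumptions, and the queries are hand-written approximations of the patterns described above rather than the exact queries from the release.

```python
import json

# Pre-3.6 upsert: look the vertex up, fold the result into a list,
# then either unfold the existing vertex or add a new one.
FOLD_COALESCE_UPSERT = (
    "g.V().has('airport', 'code', 'ATL')"
    ".fold()"
    ".coalesce(unfold(), addV('airport').property('code', 'ATL'))"
)

# TinkerPop 3.6.x: mergeV() takes a map to match on and creates the
# vertex from that same map when nothing matches.
MERGE_V_UPSERT = "g.mergeV([(T.label): 'airport', code: 'ATL'])"


def gremlin_request_body(query: str) -> str:
    """Return the JSON body for an HTTP POST to the /gremlin endpoint."""
    return json.dumps({"gremlin": query})


if __name__ == "__main__":
    for query in (FOLD_COALESCE_UPSERT, MERGE_V_UPSERT):
        print(gremlin_request_body(query))
```

The mergeV() form states the match criteria once, whereas the older pattern repeats the label and the code value in both the lookup and the addV() branch.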
This makes your code easier to both read and write, and allows Neptune to better optimize for mutation performance. Multi-label support for the mergeV() step has also been added, simplifying the process of applying multiple labels to graph objects during object creation and updates.
If you need to set different properties at different points in an object’s lifecycle, for example, when it is created or when it is updated, you can also use the mergeV() and mergeE() steps combined with the onCreate and onMatch options. The following is an example using the mergeV() step to set the create_date property when the object is first created, and the update_date property when it is updated.
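One way to sketch such a query is to build the Gremlin text in Python. The helper name merge_airport_query, the airport label, and the use of plain string timestamps are assumptions made for illustration; the option(Merge.onCreate, ...) and option(Merge.onMatch, ...) modulators are the TinkerPop 3.6.x mechanism for lifecycle-specific properties.

```python
def merge_airport_query(code: str, timestamp: str) -> str:
    """Build a mergeV() upsert that stamps create_date or update_date."""
    # Values are interpolated into the query text, so this sketch only
    # accepts alphanumerics and basic timestamp punctuation.
    for value in (code, timestamp):
        if not value or not all(ch.isalnum() or ch in "-:." for ch in value):
            raise ValueError(f"unsupported value: {value!r}")
    match = f"(T.label): 'airport', code: '{code}'"
    return (
        f"g.mergeV([{match}])"
        # onCreate repeats the match keys so the new vertex is complete
        # however the engine combines the two maps.
        f".option(Merge.onCreate, [{match}, create_date: '{timestamp}'])"
        # onMatch only runs when the vertex already exists.
        f".option(Merge.onMatch, [update_date: '{timestamp}'])"
    )


if __name__ == "__main__":
    print(merge_airport_query("ATL", "2023-06-13T00:00:00Z"))
```

Running the same query twice would be expected to set create_date on the first call and update_date on the second, since the first call creates the vertex and the second matches it.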
This is an example of using the mergeE() step to update the distance property between creation and updating:
The element() step allows you to traverse from a property back to its parent element, be that a Vertex or Edge. For more information, refer to the reference documentation on element. The following is an example of using the element() step:
The advantage of using the element() step is that it provides a convenient way of traversing back to the parent object rather than having to manually track it, resulting in more readable queries.
You can use the new fail() step if you need to halt your query when a specific condition is met. fail() immediately stops the traversal and throws an exception with a provided message. This is useful when debugging queries, or for providing better exception reporting when an unknown scenario has been reached. The following is an example of using the fail() step to throw an exception when the vertex for ‘Kevin’ doesn’t exist:
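A sketch of such a traversal, built as a Gremlin string in Python, might look like the following. The helper name require_vertex_query, the person label, and the name property key are assumptions for illustration; fail() with a message argument is the TinkerPop 3.6.x step.

```python
def require_vertex_query(label: str, name: str) -> str:
    """Build a traversal that raises an error when no vertex matches."""
    # Keep interpolated values to plain alphanumerics in this sketch.
    for value in (label, name):
        if not value.isalnum():
            raise ValueError(f"unsupported value: {value!r}")
    return (
        f"g.V().has('{label}', 'name', '{name}')"
        # fold() turns "no results" into an empty list so that
        # coalesce() still has a traverser to work with.
        ".fold()"
        # unfold() yields the vertex if present; otherwise fail() fires.
        f".coalesce(unfold(), fail('No {label} vertex found for {name}'))"
    )


if __name__ == "__main__":
    print(require_vertex_query("person", "Kevin"))
```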
The regex predicate was added to TextP to provide a mechanism for building predicates that filter on string values using regular expressions.
The following is an example using the TextP.regex predicate to find airport vertices with a code property starting with the letter L:
In addition to supporting the regex() predicate, Neptune also supports notRegex(). This determines if a string value has no match with the specified regular expression pattern:
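To illustrate both predicates, the sketch below pairs the Gremlin text for each with a local check using Python's re module against a small, made-up list of airport codes. The local check is only an illustration of which values a simple pattern selects; the database evaluates the pattern with its own regular expression engine, and the sample codes and names here are assumptions.

```python
import re

# Hypothetical sample data used only for the local illustration.
AIRPORT_CODES = ["ATL", "LAX", "LHR", "SEA"]
STARTS_WITH_L = "^L.*"

REGEX_QUERY = (
    "g.V().hasLabel('airport')"
    f".has('code', TextP.regex('{STARTS_WITH_L}')).values('code')"
)
NOT_REGEX_QUERY = (
    "g.V().hasLabel('airport')"
    f".has('code', TextP.notRegex('{STARTS_WITH_L}')).values('code')"
)


def local_matches(pattern: str, values: list) -> list:
    """Values the pattern matches, mirroring what regex() would keep."""
    return [v for v in values if re.search(pattern, v)]


def local_non_matches(pattern: str, values: list) -> list:
    """Values the pattern does not match, mirroring notRegex()."""
    return [v for v in values if not re.search(pattern, v)]


if __name__ == "__main__":
    print(REGEX_QUERY, local_matches(STARTS_WITH_L, AIRPORT_CODES))
    print(NOT_REGEX_QUERY, local_non_matches(STARTS_WITH_L, AIRPORT_CODES))
```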
For more examples of using regular expressions, refer to the online documentation.
Prior to the Apache TinkerPop 3.6 release, updating properties in Gremlin required chaining multiple steps together. The following is an example of this pattern:
In many cases, applications send updates to the database in the form of a collection of property updates, or a Map. To update an object with all the given properties, the collection would need to be iterated over, either in code or within Gremlin itself. With the introduction of support for providing Map collections to the property step, you can now write more readable code without the need to manually iterate over the collection prior to updating:
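The difference can be sketched by generating both forms from the same Python dict. The function names, the airport lookup, and the city and country properties are assumptions for illustration, and the sketch assumes keys that are simple identifiers and string values without quote characters.

```python
BASE = "g.V().has('airport', 'code', 'ATL')"


def chained_property_steps(props: dict) -> str:
    """Pre-3.6 style: iterate in code and emit one property() per key."""
    steps = "".join(f".property('{k}', '{v}')" for k, v in props.items())
    return BASE + steps


def property_map_step(props: dict) -> str:
    """3.6.x style: hand the whole collection to a single property() step."""
    entries = ", ".join(f"{k}: '{v}'" for k, v in props.items())
    return f"{BASE}.property([{entries}])"


if __name__ == "__main__":
    updates = {"city": "Atlanta", "country": "US"}
    print(chained_property_steps(updates))
    print(property_map_step(updates))
```

With the Map form the number of steps in the traversal stays constant no matter how many properties the update carries.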
For more information on these updates and upgrade considerations, refer to the Exploring new features in Apache TinkerPop 3.6.x in Amazon Neptune blog post by Stephen Mallette, a long-time contributor to the Apache TinkerPop project and member of the Amazon Neptune service team.
R6i instances
Continuing our goal of delivering better price performance for customers, engine version 1.2.x.x now supports R6i instances. R6i instances are powered by 3rd generation Intel Xeon Scalable processors, and are the 6th generation of Amazon Elastic Compute Cloud (Amazon EC2) memory optimized instances, designed for memory-intensive workloads.
R6i instances provide up to 50 Gbps of networking speed, twice that of existing R5 instances, and up to 20% higher memory bandwidth per vCPU compared to R5 instances. The R6i instances also deliver up to 15% better price-performance when compared to previous generation R5 instances, to help power your graph use cases. They are also priced at parity with the R5 instances.
Graph summary API
Customers asked us for a quick and simple way to retrieve the metadata about their Neptune graphs, such as a list of distinct vertex labels and distinct edge labels for property graphs or a count of subjects and predicates for their RDF graphs. This information is useful for providing a high-level view of the domain information to users, estimating the size of a graph, efficiently indexing data when running ETL (extract, transform, and load) jobs, or plugging into visualization and business intelligence (BI) applications powered by Neptune. With the graph summary API, you can send HTTP GET requests to get a report from the following endpoints:
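As a hedged sketch, the summary URLs can be assembled with the Python standard library. It assumes the property graph summary is served at /propertygraph/statistics/summary and the RDF summary at /rdf/statistics/summary, with an optional mode=detailed query parameter; the function name, the default port of 8182, and the example host are illustrative.

```python
from urllib.parse import urlencode, urlunsplit

# Assumed summary paths for each graph model.
SUMMARY_PATHS = {
    "propertygraph": "/propertygraph/statistics/summary",
    "rdf": "/rdf/statistics/summary",
}


def graph_summary_url(host: str, graph_type: str,
                      detailed: bool = False, port: int = 8182) -> str:
    """Build the URL for an HTTP GET against the graph summary API."""
    if graph_type not in SUMMARY_PATHS:
        raise ValueError(f"unknown graph type: {graph_type!r}")
    query = urlencode({"mode": "detailed"}) if detailed else ""
    return urlunsplit(
        ("https", f"{host}:{port}", SUMMARY_PATHS[graph_type], query, "")
    )


if __name__ == "__main__":
    # Hypothetical cluster endpoint, for illustration only.
    host = "my-cluster.example.com"
    print(graph_summary_url(host, "propertygraph"))
    print(graph_summary_url(host, "rdf", detailed=True))
```

If IAM authentication is enabled on the cluster, the GET request would additionally need to be signed, which this sketch leaves out.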
Slow query logging
Neptune customers run millions of queries every day to derive insights from connections in their data. Many customers have applications that generate queries dynamically in response to user interactions. In such cases, customers asked us for increased visibility into query performance for queries that are taking longer than expected. To meet this requirement, we added support for slow query logs. You can now identify slow-running queries and log runtime details for these queries’ key performance indicators such as query runtime, waiting time in queue, index scan details, memory stats, and response codes to Amazon CloudWatch Logs.
Slow query logs are disabled by default, so to enable this functionality you must update the neptune_enable_slow_query_log database cluster parameter. To do so, change this setting to info or debug. The info setting logs a few useful attributes of each slow-running query, whereas the debug setting logs all available attributes.
To set the threshold that is used to identify slow-running queries, you must set the neptune_slow_query_log_threshold database cluster parameter. This is the number of milliseconds after which a running query is considered slow and is then logged. The default value is 5000 milliseconds (5 seconds).
The neptune_enable_slow_query_log and neptune_slow_query_log_threshold database cluster parameters are both dynamic parameters, so changes are applied to your Neptune database almost immediately after they’re made, without requiring a reboot.
The following is an example of how to update a custom database cluster parameter group using the AWS CLI:
The following is an example of how to modify an existing database cluster to enable publishing of the slow query logs to CloudWatch using the AWS CLI:
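The two AWS CLI invocations can be sketched as argument lists built in Python, which avoids shell quoting mistakes when the values come from code. The function names, the parameter group name, and the cluster identifier are hypothetical, and the slowquery log type is an assumption about the value expected by the CloudWatch export configuration.

```python
import json
import shlex


def slow_query_parameter_cmd(group_name: str, level: str = "info",
                             threshold_ms: int = 5000) -> list:
    """Arguments that set both slow query log parameters on a custom group."""
    if level not in ("info", "debug"):
        raise ValueError("level must be 'info' or 'debug'")
    return [
        "aws", "neptune", "modify-db-cluster-parameter-group",
        "--db-cluster-parameter-group-name", group_name,
        "--parameters",
        # Both parameters are dynamic, so they can be applied immediately.
        f"ParameterName=neptune_enable_slow_query_log,"
        f"ParameterValue={level},ApplyMethod=immediate",
        f"ParameterName=neptune_slow_query_log_threshold,"
        f"ParameterValue={threshold_ms},ApplyMethod=immediate",
    ]


def publish_slow_query_logs_cmd(cluster_id: str) -> list:
    """Arguments that enable publishing slow query logs to CloudWatch Logs."""
    export_config = json.dumps({"EnableLogTypes": ["slowquery"]})
    return [
        "aws", "neptune", "modify-db-cluster",
        "--db-cluster-identifier", cluster_id,
        "--cloudwatch-logs-export-configuration", export_config,
    ]


if __name__ == "__main__":
    # Hypothetical resource names, for illustration only.
    print(shlex.join(slow_query_parameter_cmd("my-neptune-params", "debug", 2000)))
    print(shlex.join(publish_slow_query_logs_cmd("my-neptune-cluster")))
```

Either list can be passed to subprocess.run, or printed with shlex.join as shown to get a command line that is safe to paste into a shell.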
After enabling publishing of slow query logs to CloudWatch, the database cluster will be in a pending maintenance state. The following AWS CLI command applies the changes immediately:
The following is an example of a slow query log using the debug setting:
For more information on each of the attributes included within the query log report, refer to Query attributes logged in debug mode.
As part of this release, a new enableInterContainerTrafficEncryption parameter has been added to all Neptune ML APIs, which you can use to enable or disable inter-container traffic encryption in training and hyper-parameter tuning jobs.
Further improvements and bug fixes, focusing primarily on openCypher, have also been made to improve language parity with the openCypher v9 specification. In addition, we have added new functions and improved performance. Support was added for aggregation functions such as percentile, with percentileDisc(), and standard deviation, with stDev(), as well as trigonometric functions. The randomUUID() function, used to generate random UUIDs, and the epochMillis() function, used to convert a datetime to epoch milliseconds, were also added.
The following is an example of how to use the new epochMillis function, converting a property value stored as datetime:
The following is an example of using the randomUUID function to create a random id for a vertex:
Using randomUUID() to generate the object ID, rather than relying on Neptune to generate it, means you can use the property value in subsequent parts of your query.
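A minimal sketch of both functions follows, with each openCypher query held as a Python string and encoded as the form body for an HTTP POST to the openCypher endpoint (/openCypher). The airport label, the opened property (assumed to be stored as a datetime), the code value XYZ, and the use of a custom ~id property are all assumptions for illustration.

```python
from urllib.parse import parse_qs, urlencode

# Convert a datetime property (hypothetical "opened") to epoch milliseconds.
EPOCH_MILLIS_QUERY = (
    "MATCH (a:airport {code: 'ATL'}) "
    "RETURN epochMillis(a.opened) AS opened_ms"
)

# Generate the vertex ID in the query itself, so the value is available
# to later clauses; assumes the engine accepts a custom `~id` on CREATE.
RANDOM_UUID_QUERY = (
    "WITH randomUUID() AS new_id "
    "CREATE (a:airport {`~id`: new_id, code: 'XYZ'}) "
    "RETURN new_id"
)


def opencypher_form_body(query: str) -> str:
    """URL-encode a query as the 'query' form field for /openCypher."""
    return urlencode({"query": query})


if __name__ == "__main__":
    for query in (EPOCH_MILLIS_QUERY, RANDOM_UUID_QUERY):
        print(opencypher_form_body(query))
```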
Further improvements have been made to how Neptune processes openCypher queries, as well as how it optimizes CPU usage during query execution. Specific query patterns have also seen performance improvements such as:
- Queries containing multiple update clauses
- Queries that use parameterization for map or list properties
- Queries containing the
- Queries with filter
The following are examples of queries containing multiple update clauses:
The following is an example of a query with a filter using multi-hop patterns containing cycles:
The following is an example of a query with list/map injection using parameterization:
This update also includes SPARQL performance improvements for Concise Bounded Description (CBD) queries, and queries containing numerous static inputs provided in the VALUES clause. For example:
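As an illustrative sketch of the VALUES pattern, the helper below builds a SPARQL query that binds a list of static subject IRIs in a VALUES clause. The function name and the example.org IRIs are hypothetical, and the query shape is an assumption about the kind of query this improvement targets.

```python
def values_query(subject_iris: list) -> str:
    """Build a SELECT that looks up all triples for a static set of subjects."""
    if not subject_iris:
        raise ValueError("at least one IRI is required")
    for iri in subject_iris:
        # Reject characters that are not allowed inside an IRI reference.
        if any(ch in iri for ch in '<>"{}|^`\\ '):
            raise ValueError(f"invalid IRI: {iri!r}")
    values = " ".join(f"<{iri}>" for iri in subject_iris)
    return f"SELECT ?s ?p ?o WHERE {{ VALUES ?s {{ {values} }} ?s ?p ?o }}"


if __name__ == "__main__":
    print(values_query([
        "http://example.org/airport/ATL",
        "http://example.org/airport/LHR",
    ]))
```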
At AWS, we always work backwards from the customer, and this latest release from Amazon Neptune delivers numerous customer-requested enhancements for building graph applications. Beyond the features listed, you can find a complete list of improvements and fixes in the release notes. Here are a few ways to get started with this release:
- Create your first Neptune cluster as part of the AWS Free Tier
- Upgrade your existing Neptune cluster to take advantage of the latest features
- Use the open source graph-explorer application to quickly visualize and explore graphs on Neptune
- Run the open source graph-notebook library on Jupyter or JupyterLab notebooks to interactively query and build graph applications on Neptune
Leave your questions in the comments section.
About the authors
Joy Wang has been a Senior Product Manager on the Amazon Neptune team since 2020. She is passionate about making graph databases easy to learn and use, and empowering users to get the most insight out of their highly connected data.
Andrea Nassisi is a Principal Product Manager on the Amazon Neptune team. His passion is democratizing technologies. His focus in the team is the openCypher language implementation, and enabling any developer to use machine learning on graphs.
Kevin Phillips is a Neptune Specialist Solutions Architect working in the UK at Amazon Web Services. He has 18 years of development and solutions architecture experience, which he uses to help support and guide customers.