Amazon Neptune now supports TinkerPop 3.4 features
Amazon Neptune now supports the Apache TinkerPop 3.4.1 release. In this post, you will find examples of new features in the Gremlin query and traversal language, such as text predicates, changes to valueMap, nested repeat steps, named repeat steps, non-numerical comparisons, and changes to the order step. TinkerPop 3.4 also has a few important differences from TinkerPop 3.3, so be sure to review the compatibility notes in the engine releases documentation.
All of the latest features and improvements in the engine are documented on the Amazon Neptune Releases page.
Setting up a test cluster
You can try out the examples in this post by following the steps below. This post builds upon two prior posts, Analyze Amazon Neptune Graphs using Amazon SageMaker Jupyter Notebooks and Let Me Graph That For You – Part 1 – Air Routes, and again uses the air-routes dataset. The air-routes data used in this post is available on GitHub.
The examples shown below require Gremlin Python 3.4 or later. If you used the AWS CloudFormation templates from our previous posts to create a set of notebooks and an Amazon SageMaker instance, you must upgrade Gremlin Python using pip, either from a Terminal window inside the notebook or directly from a notebook cell.
You only need to do this if you kept an instance from a previous post. If you re-run the AWS CloudFormation template, it now installs the latest Gremlin Python libraries for you. Next, let’s import the required classes and establish a connection to the Neptune cluster.
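If you are unsure which Gremlin Python release a kept instance has, a quick check like the one below may help. This is a minimal sketch, assuming Python 3.8 or later (for importlib.metadata) and that the library is installed under its PyPI name, gremlinpython; the helper names are our own, not part of any library.

```python
import re
from importlib import metadata


def parse_version(text):
    """Return the leading numeric part of a version string as a tuple of ints.

    '3.4.1' -> (3, 4, 1); pre-release suffixes such as 'rc1' are ignored.
    """
    match = re.match(r"\d+(?:\.\d+)*", text)
    if match is None:
        return ()
    return tuple(int(part) for part in match.group(0).split("."))


def meets_minimum(version_text, minimum=(3, 4)):
    """True if version_text is at least `minimum`, compared component by component."""
    return parse_version(version_text)[: len(minimum)] >= minimum


def installed_gremlinpython_version():
    """Return the installed gremlinpython version string, or None if it is missing."""
    try:
        return metadata.version("gremlinpython")
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_gremlinpython_version()
    if version is None:
        print("gremlinpython is not installed")
    elif meets_minimum(version):
        print(f"gremlinpython {version} is new enough")
    else:
        print(f"gremlinpython {version} is older than 3.4; upgrade it with pip")
```

Comparing tuples of integers rather than raw strings avoids the classic trap where "3.10" would sort before "3.4".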
Import the key GremlinPython classes
Before we can connect to our Neptune instance from our Python code, we must import some classes from those libraries. Python has a number of reserved words that have the same names as Gremlin steps, so when using Gremlin Python those steps must be suffixed with an underscore character. For example, the Gremlin in() step is written as in_().
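To make the underscore rule concrete, here is a small plain-Python helper (hypothetical, not part of gremlinpython) that applies it to step names that are Python keywords, such as in, not, and, or, as, and is. Gremlin Python also renames a few steps whose names clash with Python built-in functions; this sketch does not cover those, so check the gremlinpython documentation for the full list.

```python
import keyword


def python_step_name(gremlin_step):
    """Map a Gremlin step name to its Gremlin Python spelling.

    Only the Python-keyword case is handled here: keywords get a trailing
    underscore, every other name is returned unchanged.
    """
    if keyword.iskeyword(gremlin_step):
        return gremlin_step + "_"
    return gremlin_step


for step in ("out", "in", "as", "not", "has"):
    print(step, "->", python_step_name(step))
```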
Establish access to our Neptune instance
Before we can work with our graph, we must establish a connection to it. This is done using the DriverRemoteConnection capability defined by Apache TinkerPop and supported by Gremlin Python. Once this cell has been run, we can use the variable g to refer to our graph in Gremlin queries in subsequent cells. Neptune uses port 8182 by default, and that is the port we connect to below. If you configured your Neptune instance to use a different port, replace 8182 in the example below with the port you configured.
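A minimal sketch of that connection step follows, assuming gremlinpython 3.4 and Neptune's standard WebSocket path of /gremlin. The function names and the endpoint placeholder are hypothetical. The gremlinpython imports sit inside connect() so that the URL helper can be reused on its own.

```python
def neptune_endpoint_url(cluster_endpoint, port=8182):
    """Build the WebSocket URL of a Neptune Gremlin endpoint (8182 is Neptune's default port)."""
    return f"wss://{cluster_endpoint}:{port}/gremlin"


def connect(cluster_endpoint, port=8182):
    """Return (g, connection) for a Neptune cluster using DriverRemoteConnection."""
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.structure.graph import Graph

    # 'g' is the name of the traversal source the server exposes.
    connection = DriverRemoteConnection(neptune_endpoint_url(cluster_endpoint, port), "g")
    g = Graph().traversal().withRemote(connection)
    return g, connection
```

For example, g, connection = connect("your-cluster-endpoint") gives you a traversal source to query with, and connection.close() releases the WebSocket when you are finished.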
Testing our connection
Let’s start with a simple query to make sure our connection to Neptune is working. The queries below look at all of the vertices and edges in the graph and create two maps that summarize the graph. Because we are using the air-routes dataset, the values returned relate to airports and routes. You can use these values to help verify the results of other examples in this post.
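We are assuming the two summary maps are per-label counts of the kind g.V().groupCount().by(T.label).next() returns in Gremlin Python (with T imported from gremlin_python.process.traversal). The plain-Python sketch below only illustrates how such a count is formed, using a few hypothetical vertices rather than real air-routes data.

```python
from collections import Counter


def group_count_by_label(elements):
    """Count elements per label, the shape of map that groupCount().by(T.label) returns."""
    return dict(Counter(element["label"] for element in elements))


# Hypothetical sample vertices, not taken from air-routes.
sample_vertices = [
    {"label": "airport", "code": "AUS"},
    {"label": "airport", "code": "DFW"},
    {"label": "country", "code": "US"},
]
print(group_count_by_label(sample_vertices))  # {'airport': 2, 'country': 1}
```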
Text predicates
Many customers are excited about the new text predicates in TinkerPop 3.4, which enable more focused text searches. They work well when you are building an application and want your traversals to take advantage of the text values of your properties, for example, to find all the cities whose names start with “Dal”. In total, six new predicates have been added: startingWith, endingWith, containing, and their inverses notStartingWith, notEndingWith, and notContaining.
All of these new predicates are case-sensitive.
The example below looks for any cities with names starting with “Dal”. A dedup step removes any duplicate names.
Because the text predicates are case-sensitive, if we look for cities with names starting with “dal” we will not find any.
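In Gremlin Python, the first query takes roughly the form g.V().hasLabel('airport').has('city', TextP.startingWith('Dal')).values('city').dedup().toList(), with TextP imported from gremlin_python.process.traversal. To show what the six predicates test, here is a plain-Python model of them (our own stand-ins, not the TextP implementation) run over a few hypothetical city names. Python's string methods are case-sensitive in the same way.

```python
# Plain-Python stand-ins for Gremlin's six TextP predicates. Each factory
# returns a function that tests a single string value.


def starting_with(prefix):
    return lambda value: value.startswith(prefix)


def ending_with(suffix):
    return lambda value: value.endswith(suffix)


def containing(fragment):
    return lambda value: fragment in value


def negate(predicate):
    return lambda value: not predicate(value)


def not_starting_with(prefix):
    return negate(starting_with(prefix))


def not_ending_with(suffix):
    return negate(ending_with(suffix))


def not_containing(fragment):
    return negate(containing(fragment))


def matching(values, predicate):
    """Keep values that satisfy predicate, dropping repeats the way dedup() does."""
    return list(dict.fromkeys(value for value in values if predicate(value)))


cities = ["Dallas", "Austin", "Dallas", "Houston"]  # hypothetical sample values
print(matching(cities, starting_with("Dal")))  # ['Dallas']
print(matching(cities, starting_with("dal")))  # []
```

Note that not_containing("a") keeps "Austin": its capital A is not the lowercase a being tested for.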
If you want to check for either “Dal” or “dal”, you can do so using an or step and two has steps, as shown below.
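In Gremlin Python that check looks something like g.V().hasLabel('airport').or_(__.has('city', TextP.startingWith('Dal')), __.has('city', TextP.startingWith('dal'))).values('city').dedup().toList(), with __ imported from gremlin_python.process.graph_traversal. The plain-Python sketch below, over hypothetical names, shows the logic: a value passes if any one of the prefixes matches.

```python
def starts_with_any(value, prefixes):
    """Mirror or_() over several has(..., startingWith(p)) steps: any one match is enough."""
    return any(value.startswith(prefix) for prefix in prefixes)


cities = ["Dallas", "dalton", "Austin"]  # hypothetical values; "dalton" is lowercase on purpose
print([city for city in cities if starts_with_any(city, ("Dal", "dal"))])  # ['Dallas', 'dalton']
```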
Each of the text predicates has an inverse. We can use the notStartingWith predicate to look for city names that do not start with “Dal”.
The example above returns the same results we would get by negating a startingWith predicate with a not step, as shown below.
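The two forms agree as long as every vertex being tested has a city property. They can diverge otherwise: has('city', TextP.notStartingWith('Dal')) only keeps vertices that have a city property, while not_(__.has('city', TextP.startingWith('Dal'))) also keeps vertices that lack one. The plain-Python sketch below models both filters over hypothetical vertex dictionaries.

```python
def has_not_starting_with(vertices, key, prefix):
    """Like has(key, notStartingWith(prefix)): the property must exist and not start with prefix."""
    return [v for v in vertices if key in v and not v[key].startswith(prefix)]


def not_has_starting_with(vertices, key, prefix):
    """Like not(has(key, startingWith(prefix))): also keeps vertices without the property."""
    return [v for v in vertices if not (key in v and v[key].startswith(prefix))]


airports = [{"code": "DFW", "city": "Dallas"}, {"code": "AUS", "city": "Austin"}]  # hypothetical
print(has_not_starting_with(airports, "city", "Dal") == not_has_starting_with(airports, "city", "Dal"))  # True

mixed = airports + [{"code": "XXX"}]  # hypothetical vertex with no city property
print(len(has_not_starting_with(mixed, "city", "Dal")))  # 1
print(len(not_has_starting_with(mixed, "city", "Dal")))  # 2
```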
The example below looks for any city names ending with the characters “zhi”.
Using notEndingWith, we can easily find cities whose names do not end with “zhi”.
We can also look for cities whose names contain a certain string. The example below looks for any cities with the string “gzh” in their name.
The example below chains together several has steps using notContaining predicates to find cities whose names contain none of the lowercase vowels a, e, i, o, or u.
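Chained has steps act as a logical AND: a name survives only if it passes every notContaining check. The plain-Python sketch below models that over hypothetical names, and the comment shows one assumed way to build the same chain in Gremlin Python with a loop.

```python
VOWELS = "aeiou"

# Assumed Gremlin Python equivalent (needs a live connection, so shown as a comment):
#   t = g.V().hasLabel('airport')
#   for vowel in 'aeiou':
#       t = t.has('city', TextP.notContaining(vowel))
#   t.values('city').toList()


def has_no_lowercase_vowels(value):
    """Every notContaining check must pass; uppercase vowels are not tested."""
    return all(vowel not in value for vowel in VOWELS)


names = ["Dallas", "Lynn", "Ely"]  # hypothetical sample values
print([name for name in names if has_no_lowercase_vowels(name)])  # ['Lynn', 'Ely']
```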
Changes to valueMap
Apache TinkerPop 3.4 changed the way the valueMap step works and added new capabilities. In general, a valueMap step returns a set of key-value pairs, as shown below. By default, every value is returned as a member of a list, which is the same behavior as in earlier versions of TinkerPop.
TinkerPop 3.4 added the ability to return the results of a valueMap step without wrapping the values in lists, by using the new by modulator together with an unfold step.
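In Gremlin Python this takes roughly the form g.V().has('code', 'AUS').valueMap().by(__.unfold()).next(). The plain-Python sketch below models the effect on a hypothetical list-wrapped map. As we understand by() modulation, it uses the first result of its traversal, so a property holding several values would be reduced to one.

```python
def unfold_value_map(value_map):
    """Mimic valueMap().by(unfold()): replace each list of values with its first item."""
    return {key: values[0] for key, values in value_map.items()}


raw = {"code": ["AUS"], "city": ["Austin"]}  # list-wrapped, as valueMap() returns it
print(unfold_value_map(raw))  # {'code': 'AUS', 'city': 'Austin'}
```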
As in prior releases, you can be more specific about the property keys you are interested in and unfold the results. Requesting only the property keys you need is a best practice for traversal performance.
You can also
select specific keys to return just the values without their associated key names.
Before Apache TinkerPop 3.4, to include the ID and label of a vertex or edge in valueMap results, you would use the valueMap(true) construction as shown below.
The use of valueMap(true) is now deprecated. Instead, the new with step (with_ in Gremlin Python) allows us to specify what we want returned using the WithOptions.tokens option.
The results can be unfolded as in the prior examples.
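In Gremlin Python the new form looks something like g.V().has('code', 'AUS').valueMap().with_(WithOptions.tokens).next(), with WithOptions imported from gremlin_python.process.traversal. The plain-Python sketch below models the result shape. Gremlin keys the extra entries with the T.id and T.label tokens; plain strings stand in for them here, and the ID value is hypothetical.

```python
def value_map_with_tokens(element_id, label, properties, unfold=False):
    """Mimic valueMap().with_(WithOptions.tokens), optionally followed by by(unfold())."""
    result = {"id": element_id, "label": label}  # stand-ins for T.id and T.label
    for key, values in properties.items():
        result[key] = values[0] if unfold else list(values)
    return result


props = {"code": ["AUS"], "city": ["Austin"]}
print(value_map_with_tokens("3", "airport", props))
print(value_map_with_tokens("3", "airport", props, unfold=True))
```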
Adding a numerical index to a collection
The index step allows anything that is a collection, such as the results of a fold step, to have a numerical index value associated with each entry in the collection. Index values start at zero and increase by one.
The with step can be used to control the type of index that is created. The default is list, but you can also ask for the indexed values to be returned as a map, in which the index values are the keys and the original values are mapped to those keys.
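In Gremlin Python the two modes look roughly like ...fold().index() and ...fold().index().with_(WithOptions.indexer, WithOptions.map). The plain-Python sketch below models both result shapes over hypothetical airport codes: the list indexer pairs each item with its index as [item, index], and the map indexer keys each item by its index.

```python
def index_as_list(items):
    """Default list indexer: each entry becomes [item, index]."""
    return [[item, position] for position, item in enumerate(items)]


def index_as_map(items):
    """Map indexer: the index is the key and the original item is the value."""
    return {position: item for position, item in enumerate(items)}


codes = ["AUS", "DFW", "IAH"]  # hypothetical sample values
print(index_as_list(codes))  # [['AUS', 0], ['DFW', 1], ['IAH', 2]]
print(index_as_map(codes))   # {0: 'AUS', 1: 'DFW', 2: 'IAH'}
```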
Although list is the default indexing mode, you can explicitly request it using a with step.
The index values can be accessed from a query. The example below uses the index value to return the results in reverse order.
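One way to picture the reverse-order trick is the plain-Python sketch below: attach indexes as the list indexer does, then sort on the index in descending order and keep only the items. This is our own model of the idea over hypothetical values, not a transcription of the original query.

```python
def reverse_via_index(items):
    """Attach [item, index] pairs as index() does, then sort by index, highest first."""
    indexed = [[item, position] for position, item in enumerate(items)]
    indexed.sort(key=lambda pair: pair[1], reverse=True)
    return [item for item, _ in indexed]


print(reverse_via_index(["AUS", "DFW", "IAH"]))  # ['IAH', 'DFW', 'AUS']
```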
The example below applies an index step to the results generated by a
Nested repeat steps
In TinkerPop 3.4, repeat steps can be nested inside other repeat steps or inside until steps. The example below starts at the Austin airport, traverses out one time, and, for each airport found, follows incoming routes to a depth of two.
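A traversal of that shape might be written in Gremlin Python as g.V().has('code', 'AUS').repeat(__.out().repeat(__.in_()).times(2)).times(1).path().by('code'); treat that as an assumed sketch rather than the original query. The plain-Python model below walks a small hypothetical route map the same way: one outgoing hop, then two incoming hops, dropping any path that runs out of edges, just as Gremlin traversers with nowhere to go are dropped.

```python
def build_incoming(routes):
    """Invert an outgoing adjacency map so edges can be followed backwards (like in_())."""
    incoming = {}
    for source, targets in routes.items():
        for target in targets:
            incoming.setdefault(target, []).append(source)
    return incoming


def nested_repeat_paths(routes, start, inner_times=2):
    """Model repeat(out().repeat(in_()).times(inner_times)).times(1).path()."""
    incoming = build_incoming(routes)
    # Outer repeat, one iteration: a single outgoing hop.
    paths = [[start, nxt] for nxt in routes.get(start, [])]
    # Inner repeat: follow incoming edges inner_times times.
    for _ in range(inner_times):
        paths = [path + [prev] for path in paths for prev in incoming.get(path[-1], [])]
    return paths


# Hypothetical route map, not taken from air-routes.
routes = {"A": ["B"], "C": ["B"], "D": ["C"]}
print(nested_repeat_paths(routes, "A"))  # [['A', 'B', 'C', 'D']]
```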
Named repeat steps
As well as being nested, each repeat step can now be given an optional name, which a loops step can then refer to. The example below shows named repeat steps in use. In this particular case the names could have been omitted, but the example demonstrates the capability. The ability to name repeat steps is intended for cases where those steps are also nested.
Non-numerical comparisons
Before TinkerPop 3.4, the min and max steps could only be applied to numeric values. They can now be applied to anything that is comparable, such as text strings. This is simpler than ordering a result set and selecting the first or last value.
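The plain-Python sketch below, over hypothetical city names, shows that taking the minimum and maximum directly gives the same answer as the older order-then-pick workaround. Python orders strings by code point, which matches Java's string comparison for ordinary text but may differ for characters outside the Basic Multilingual Plane.

```python
def smallest_and_largest(values):
    """Return (min, max) under natural ordering, as min and max now allow for strings."""
    return min(values), max(values)


def via_sorting(values):
    """The older workaround: order the values and take the first and last."""
    ordered = sorted(values)
    return ordered[0], ordered[-1]


cities = ["Dallas", "Austin", "Houston"]  # hypothetical sample values
print(smallest_and_largest(cities))  # ('Austin', 'Houston')
```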
Changes to order
The Order.incr and Order.decr enumerations are now deprecated in favor of Order.asc and Order.desc. This change makes Gremlin’s terminology more consistent with other database query languages. These changes were released before TinkerPop 3.4 but are now also supported by Amazon Neptune.
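In Gremlin Python a descending sort now reads roughly g.V().hasLabel('airport').values('city').order().by(Order.desc), with Order imported from gremlin_python.process.traversal. The plain-Python sketch below models the two directions and translates the deprecated names, purely to illustrate the rename; the function is hypothetical.

```python
# Deprecated Order tokens and their replacements.
DEPRECATED_ORDER_TOKENS = {"incr": "asc", "decr": "desc"}


def order_values(values, direction="asc"):
    """Sort values the way order().by(Order.asc) or order().by(Order.desc) would."""
    direction = DEPRECATED_ORDER_TOKENS.get(direction, direction)
    if direction not in ("asc", "desc"):
        raise ValueError(f"unknown order direction: {direction!r}")
    return sorted(values, reverse=(direction == "desc"))


print(order_values(["Dallas", "Austin", "Houston"], "desc"))  # ['Houston', 'Dallas', 'Austin']
```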
Changes to bulkSet
TinkerPop 3.4 adds BulkSet as a GraphSON type instead of coercing it to a List type. Before TinkerPop 3.4, such query results were serialized as flattened lists. Older Gremlin clients may not recognize the new type. The TinkerPop 3.4 BulkSet documentation describes the details.
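A BulkSet stores each distinct value once together with a count (its "bulk"), whereas the older serialization expanded it into a list with repeated entries. The plain-Python sketch below uses a Counter as a stand-in for a BulkSet and shows the flattening older clients would have seen; the values are hypothetical.

```python
from collections import Counter


def flatten_bulk_set(bulk_set):
    """Expand a {value: bulk} mapping into the flat list pre-3.4 serialization produced."""
    flat = []
    for value, bulk in bulk_set.items():
        flat.extend([value] * bulk)
    return flat


bulk = Counter(["US", "CA", "US", "US"])  # hypothetical values: {'US': 3, 'CA': 1}
print(flatten_bulk_set(bulk))  # ['US', 'US', 'US', 'CA']
```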
We are excited to support the Apache TinkerPop 3.4 release in Amazon Neptune and encourage you to create a cluster, as described in the steps above, and run through the examples. Let us know your feedback in the comments on this post or in the Amazon Neptune Discussion Forum.
About the Author
Kelvin Lawrence is a Principal Data Architect in the Database Services Customer Advisory Team focused on Amazon Neptune and many other related services. He has been working with graph databases for many years, is the author of the book “Practical Gremlin” and is a committer on the Apache TinkerPop project.