AWS Database Blog
Amazon Neptune now supports TinkerPop 3.4 features
Amazon Neptune now supports the Apache TinkerPop 3.4.1 release. In this post, you will find examples of new features in the Gremlin query and traversal language such as text predicates, changes to valueMap, nested repeat steps, named repeat steps, non-numerical comparisons, and changes to the order step. It is worth pointing out that TinkerPop 3.4 has a few important differences from TinkerPop 3.3. Be sure to review the compatibility notes in the engine releases documentation.
All of the latest features and improvements in the engine are documented on the Amazon Neptune Releases page.
Setting up a test cluster
You can try out the examples in this post by following the steps below. This post builds upon two prior posts, Analyze Amazon Neptune Graphs using Amazon SageMaker Jupyter Notebooks and Let Me Graph That For You – Part 1 – Air Routes, and again takes advantage of the air-routes dataset.
The air-routes data used in this example is available on GitHub.
The examples shown below require that Gremlin Python be at the 3.4 level or higher. If you used the AWS CloudFormation templates from our previous posts to generate a set of notebooks and an Amazon SageMaker instance, you must update Gremlin Python by running the upgrade command from a Terminal window (inside the notebook) or from a notebook cell prefixed with %%bash.
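The exact command depends on how the notebook environment was built. As a hedged sketch, the Python below checks the installed version of the gremlinpython package (its PyPI name) and prints a generic pip upgrade command. The helper names are hypothetical, and the command is an assumption rather than the one used by the original templates.

```python
import re
import sys
from importlib.metadata import PackageNotFoundError, version

MINIMUM = (3, 4)  # the examples in this post need Gremlin Python 3.4 or higher


def parse_version(text):
    """Turn a version string such as '3.4.1' into a tuple of ints, (3, 4, 1)."""
    parts = []
    for piece in text.split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break  # stop at the first non-numeric component
        parts.append(int(match.group()))
    return tuple(parts)


def needs_upgrade(installed):
    """True when the installed version string is older than MINIMUM."""
    return parse_version(installed) < MINIMUM


def installed_gremlin_python():
    """Return the installed gremlinpython version string, or None if it is absent."""
    try:
        return version("gremlinpython")
    except PackageNotFoundError:
        return None


def upgrade_command():
    """A generic pip invocation; adjust it to match your notebook environment."""
    return [sys.executable, "-m", "pip", "install", "--upgrade", "gremlinpython"]


current = installed_gremlin_python()
if current is None or needs_upgrade(current):
    print("Upgrade needed, for example:", " ".join(upgrade_command()))
else:
    print("Gremlin Python", current, "is recent enough")
```

If your Neptune engine release calls for a specific client level, pin that version in the pip command instead of taking the latest.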
You only need to do this if you kept an instance from before. If you re-run the AWS CloudFormation script, it now installs the latest Gremlin Python libraries for you. Next, let’s import the required classes and establish a connection to the Neptune cluster.
Import the key GremlinPython classes
We must now import some classes from those libraries before we can connect to our Neptune instance from our Python code. Python has a number of reserved words that have the same name as Gremlin query steps, so when needed, those steps must be suffixed with an underscore character when using Gremlin Python. For example, the Gremlin in() step is written as in_().
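The naming rule can be modeled with Python's standard keyword module. This is only a sketch of the rule described above, with a hypothetical helper name, and not a description of how Gremlin Python generates its method names.

```python
import keyword


def python_step_name(gremlin_step):
    """Return the Gremlin Python spelling of a Gremlin step name."""
    # Python keywords such as `in`, `as`, or `not` cannot be used as
    # method names, so they get a trailing underscore.
    if keyword.iskeyword(gremlin_step):
        return gremlin_step + "_"
    return gremlin_step


for step in ("in", "out", "as", "not", "has"):
    print(step, "->", python_step_name(step))
```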
Establish access to our Neptune instance
Before we can work with our graph, we must establish a connection to it. This is done using the DriverRemoteConnection capability as defined by Apache TinkerPop and supported by GremlinPython. Once this cell has been run, we are able to use the variable g to refer to our graph in Gremlin queries in subsequent cells. By default Neptune uses port 8182 and that is what we connect to below. When you configure your own Neptune instance, you can choose a different port number. In that case, you would replace port 8182 in the example below with the port you configured.
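A minimal connection sketch follows. It assumes the endpoint accepts TLS WebSocket connections (wss) on the /gremlin path; the host name in the usage comment is a placeholder and the helper names are hypothetical.

```python
def neptune_url(host, port=8182):
    """Build the WebSocket URL for a Neptune Gremlin endpoint (TLS assumed)."""
    return f"wss://{host}:{port}/gremlin"


def connect(host, port=8182):
    """Open a remote connection and return (g, connection)."""
    # Imported here so the URL helper above works even before
    # gremlinpython has been installed or upgraded.
    from gremlin_python.driver.driver_remote_connection import DriverRemoteConnection
    from gremlin_python.structure.graph import Graph

    connection = DriverRemoteConnection(neptune_url(host, port), "g")
    g = Graph().traversal().withRemote(connection)
    return g, connection


# Usage (the endpoint name is a placeholder):
#   g, connection = connect("your-neptune-endpoint")
#   ... run queries with g ...
#   connection.close()
```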
Testing our connection
Let’s start off with a simple query just to make sure our connection to Neptune is working. The queries below look at all of the vertices and edges in the graph and create two maps that show an aggregation of the graph. As we are using the air-routes dataset, the values returned are related to airports and routes. You can use these values to help verify the results of other examples contained in this post.
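One way to produce the two maps is to count vertices and edges by label. This is a sketch that assumes g is an open traversal source; it is not necessarily the query used in the original notebook.

```python
def summarize(g):
    """Count vertices and edges by label, returning two dicts."""
    vertex_counts = g.V().label().groupCount().next()
    edge_counts = g.E().label().groupCount().next()
    return vertex_counts, edge_counts


# Usage, given an open traversal source g:
#   vertices, edges = summarize(g)
#   print(vertices)
#   print(edges)
```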
Text predicates
Many customers are excited about TinkerPop 3.4’s new “predicates” feature to enable more focused text searches. This works well when you are building an application and want your traversals to take advantage of the text values of your properties. For example, find me all the cities that start with “Dal”. In total, six new predicates have been added.
- startingWith
- endingWith
- containing
- notStartingWith
- notEndingWith
- notContaining
All of these new predicates are case-sensitive.
startingWith
The example below looks for any cities with names starting with “Dal”. A dedup step is used to get rid of any duplicate names.
As the text predicates are case-sensitive, if we look for cities that have names starting with “dal” we will not find any.
If you want to check for either “Dal” or “dal”, you can do that using an or step and two has steps as shown below.
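The startingWith queries described in this section might be written as follows in Gremlin Python. This is a sketch: the airport label and city property are assumed from the air-routes dataset, the function names are hypothetical, and the second function is a plain-Python model of the same case-sensitive test for local checks.

```python
def cities_starting_with(g, *prefixes):
    """Distinct city names that start with any of the given prefixes."""
    from gremlin_python.process.graph_traversal import __
    from gremlin_python.process.traversal import TextP

    # One has() per prefix, combined with an or step (or_ in Python).
    branches = [__.has("city", TextP.startingWith(p)) for p in prefixes]
    return g.V().hasLabel("airport").or_(*branches).values("city").dedup().toList()


def starts_with_any(city, *prefixes):
    """Plain-Python model of the same case-sensitive test."""
    return any(city.startswith(p) for p in prefixes)


# Usage, given an open traversal source g:
#   cities_starting_with(g, "Dal")          # a single prefix
#   cities_starting_with(g, "Dal", "dal")   # either spelling
print(starts_with_any("Dallas", "Dal"))
print(starts_with_any("Dallas", "dal"))
```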
notStartingWith
Each of the text predicates has an inverse predicate. We can use the notStartingWith predicate to look for city names that do not start with “Dal”.
The example above returns the same results we would get if we were to negate a startingWith predicate as shown below.
endingWith
The example below looks for any city names ending with the characters “zhi”.
notEndingWith
Using notEndingWith we can easily find cities whose names do not end with “zhi”.
containing
We can also look for cities whose names contain a certain string. The example below looks for any cities with the string “gzh” in their name.
notContaining
The example below chains together a number of has steps using notContaining predicates to find cities whose names contain none of the basic lowercase vowels commonly used in the English language.
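The chained form can be sketched as a loop that adds one has step per excluded fragment. The airport label and city property are assumed from the air-routes dataset, and the function names are hypothetical; the second function models the same test in plain Python.

```python
def cities_without(g, *fragments):
    """Distinct city names containing none of the given fragments."""
    from gremlin_python.process.traversal import TextP

    t = g.V().hasLabel("airport")
    for fragment in fragments:
        # Each additional has() narrows the result further.
        t = t.has("city", TextP.notContaining(fragment))
    return t.values("city").dedup().toList()


def contains_none(city, *fragments):
    """Plain-Python model of the chained notContaining tests."""
    return all(fragment not in city for fragment in fragments)


# Usage, given an open traversal source g:
#   cities_without(g, "a", "e", "i", "o", "u")
print(contains_none("Dallas", "a", "e", "i", "o", "u"))
```

Because the predicates are case-sensitive, a name such as “Umm” would still pass this test: its only vowel is uppercase.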
Changes to valueMap
Apache TinkerPop 3.4 introduced changes and new capabilities to the way that a valueMap step is used. In general, a valueMap step returns a set of key-value pairs as shown below. By default all values are shown as members of lists. This is the same behavior found in earlier versions of TinkerPop.
TinkerPop 3.4 added the ability to have the results of a valueMap step returned without the values presented in lists by adding a by modulator, as in valueMap().by(unfold()).
As in prior releases, you can be more specific about the property keys you are interested in and unfold the results. Requesting only the keys you need is a best practice for traversal performance.
You can also select specific keys to return just the values without their associated key names.
Before Apache TinkerPop 3.4, in order to have the ID and label of a vertex or edge included in valueMap results, you would use the valueMap(true) construction as shown below.
The use of valueMap(true) is now deprecated. Instead, the new with step allows us to specify what we want returned using the WithOptions enumeration.
The results can be unfolded as in the prior examples.
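The valueMap variations discussed in this section can be sketched in Gremlin Python as follows. The code property and the AUS value are assumed from the air-routes dataset, and the function names are hypothetical. The last function is a plain-Python model of what by(unfold()) does to single-valued properties.

```python
def flat_value_map(g, code):
    """valueMap with each property value unwrapped from its list."""
    from gremlin_python.process.graph_traversal import __

    return g.V().has("code", code).valueMap().by(__.unfold()).next()


def value_map_with_tokens(g, code):
    """valueMap that also includes the ID and label, replacing valueMap(true)."""
    from gremlin_python.process.traversal import WithOptions

    return g.V().has("code", code).valueMap().with_(WithOptions.tokens).next()


def unwrap_locally(value_map):
    """Plain-Python model of by(unfold()) for single-valued properties."""
    return {key: values[0] for key, values in value_map.items()}


# Usage, given an open traversal source g:
#   flat_value_map(g, "AUS")
#   value_map_with_tokens(g, "AUS")
print(unwrap_locally({"code": ["AUS"], "city": ["Austin"]}))
# prints {'code': 'AUS', 'city': 'Austin'}
```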
Adding a numerical index to a collection
The new index step allows anything that is a collection, such as the results of a fold step, to have a numerical index value associated with each entry in the collection. The first index value is always zero and the increment is always 1.
A with step can be used to control the type of index that is created. The default is list, but you can also ask for the indexed values to be returned as a map. The index is the map’s key and the original values are mapped against those keys.
While list is the default indexing mode, you can explicitly request it using a with step.
The index values can be accessed from a query. The example below uses the index value to return the results in reverse order.
The example below applies an index step to the results generated by a group step.
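As a sketch of the index step in Gremlin Python, the function below folds a set of airport codes into a list and indexes it, optionally as a map. The region and code properties are assumed from the air-routes dataset and the function names are hypothetical. The second function models the two result shapes locally: value and index pairs for the list mode, index keys for the map mode.

```python
def indexed_codes(g, region, as_map=False):
    """Index the airport codes of one region, as a list or as a map."""
    from gremlin_python.process.traversal import WithOptions

    t = g.V().has("region", region).values("code").fold().index()
    if as_map:
        t = t.with_(WithOptions.indexer, WithOptions.map)
    return t.next()


def index_locally(values, as_map=False):
    """Plain-Python model of the two indexing modes."""
    if as_map:
        return dict(enumerate(values))
    return [[value, position] for position, value in enumerate(values)]


# Usage, given an open traversal source g:
#   indexed_codes(g, "US-TX")
#   indexed_codes(g, "US-TX", as_map=True)
print(index_locally(["AUS", "DFW"]))
# prints [['AUS', 0], ['DFW', 1]]
print(index_locally(["AUS", "DFW"], as_map=True))
# prints {0: 'AUS', 1: 'DFW'}
```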
Nested repeat steps
Gremlin repeat steps can now be nested inside other repeat steps or inside emit and until steps. The example below starts at the Austin airport, traverses out one time, and for each airport found looks at the incoming routes to a depth of two.
Named repeat steps
As well as being nested, each repeat step can now be given an optional name. This allows it to be referred to later inside of a loops step. The example below shows named repeat steps being used. In this particular case the naming could have been omitted, but this demonstrates the capability. The ability to name repeat steps is intended for cases where those steps are also being nested.
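Nested and named repeat steps might look like this in Gremlin Python. This is a sketch, not the original notebook code: the code property and route edge label are assumed from the air-routes dataset, the name outer is arbitrary, and the function names are hypothetical. The last function models the nested traversal over a small in-memory dictionary of routes.

```python
def nested_repeat_paths(g, code="AUS", limit=5):
    """Out one hop, then follow incoming routes to a depth of two."""
    from gremlin_python.process.graph_traversal import __

    inner = __.out("route").repeat(__.in_("route")).times(2)
    return (g.V().has("code", code)
             .repeat(inner).times(1)
             .path().by("code").limit(limit).toList())


def named_repeat_paths(g, code="AUS", limit=5):
    """Two hops out, stopping when the named loop counter reaches 2."""
    from gremlin_python.process.graph_traversal import __

    return (g.V().has("code", code)
             .repeat("outer", __.out("route").simplePath())
             .until(__.loops("outer").is_(2))
             .path().by("code").limit(limit).toList())


def nested_repeat_locally(routes, start):
    """Model of nested_repeat_paths; routes maps origin -> destinations."""
    incoming = {}
    for origin, destinations in routes.items():
        for destination in destinations:
            incoming.setdefault(destination, []).append(origin)
    paths = []
    for first in routes.get(start, []):              # outer repeat: out, once
        for second in incoming.get(first, []):       # inner repeat, loop 1
            for third in incoming.get(second, []):   # inner repeat, loop 2
                paths.append([start, first, second, third])
    return paths


sample = {"AUS": ["DFW"], "LAX": ["DFW"], "SFO": ["LAX"]}
print(nested_repeat_locally(sample, "AUS"))
# prints [['AUS', 'DFW', 'LAX', 'SFO']]
```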
Non-numeric comparisons
Before TinkerPop 3.4, the min and max steps could only be applied to numeric values. They can now be applied to anything that is considered “comparable” such as text strings. This is a little simpler than having to order a result set and select the first or last value.
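For instance, the alphabetically first and last city names could be fetched as sketched below. The step names are spelled as in Gremlin Python 3.4, the airport label and city property are assumed from the air-routes dataset, and the function name is hypothetical.

```python
def first_and_last_city(g):
    """Return the smallest and largest city name by string comparison."""
    first = g.V().hasLabel("airport").values("city").min().next()
    last = g.V().hasLabel("airport").values("city").max().next()
    return first, last


# Usage, given an open traversal source g:
#   first, last = first_and_last_city(g)
```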
Changes to order
The previous Order.incr and Order.decr enumerations are now deprecated in favor of Order.asc and Order.desc. This change makes Gremlin’s terminology more consistent with other database query languages. These changes were released before TinkerPop 3.4, but are now also supported by Amazon Neptune.
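A sketch of ordering with the new names follows. The fallback helper is a hypothetical convenience for code that may still run against an older client where only Order.decr exists. The airport label and the runways and code properties are assumed from the air-routes dataset.

```python
def descending(order_enum):
    """Prefer Order.desc, falling back to the deprecated Order.decr."""
    desc = getattr(order_enum, "desc", None)
    return desc if desc is not None else order_enum.decr


def airports_by_runways(g, limit=5):
    """Codes of the airports with the most runways, highest first."""
    from gremlin_python.process.traversal import Order

    return (g.V().hasLabel("airport")
             .order().by("runways", descending(Order))
             .limit(limit).values("code").toList())


# Usage, given an open traversal source g:
#   airports_by_runways(g)
```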
Changes to bulkSet
TinkerPop 3.4 adds bulkSet as a GraphSON type instead of coercing it to a List type. Before TinkerPop 3.4, the query results were serialized as flattened lists. Older Gremlin clients may not be able to recognize the change. The TinkerPop 3.4 BulkSet documentation also calls out the details.
Conclusion
We are excited to support the Apache TinkerPop 3.4 release in Amazon Neptune and highly encourage you to create a cluster, as mentioned in the steps above, and run through the examples. Let us know your feedback through the comments in this post or through our Amazon Neptune Discussion Forum.
About the Author
Kelvin Lawrence is a Principal Data Architect in the Database Services Customer Advisory Team focused on Amazon Neptune and many other related services. He has been working with graph databases for many years, is the author of the book “Practical Gremlin” and is a committer on the Apache TinkerPop project.