Amazon Neptune releases Streams, SPARQL federated query for graphs and more

The latest Amazon Neptune release brings together a host of capabilities that enhance developer productivity with graphs. This post summarizes the key features we have rolled out and points to where you can find more details.

Getting started

This new engine release will not be automatically applied to your existing cluster. You can choose to upgrade an existing cluster by following the instructions in the release notes. Or you can create a clone of an existing cluster that will receive the latest engine version.

If you do not already have an Amazon Neptune cluster, create a database with engine version 1.0.1.0.200463.0. Before testing Neptune Streams, you must enable it in lab mode. Based on customer feedback, Neptune is releasing early features in preview so that customers can try them out and validate their use cases. These preview features remain in lab mode until they are productized in a future release. Lab mode lets you enable (or disable) experimental features in the Neptune engine using the neptune_lab_mode cluster parameter. You can configure lab mode using the AWS console or the AWS CLI.

AWS console

From the AWS console for Neptune, choose Parameter groups from the console left navigation menu. Navigate to the cluster parameter group associated with the Neptune cluster and choose Edit parameters. For the neptune_lab_mode parameter, add the following to enable Streams and Transaction semantics.

Streams=enabled, ReadWriteConflictDetection=enabled

AWS CLI

In addition to the console, you can use the CLI to check the value of the cluster parameter group and set the lab mode parameter. The following two commands can be used respectively:

aws neptune describe-db-cluster-parameters --db-cluster-parameter-group-name my-test-param-group --region us-east-1

aws neptune modify-db-cluster-parameter-group --db-cluster-parameter-group-name my-test-param-group --region us-east-1 --parameters "ParameterName=neptune_lab_mode,ParameterValue=\"Streams=enabled, ReadWriteConflictDetection=enabled\",ApplyMethod=pending-reboot"

Confirm that the changes have been made to the parameter group:

aws neptune describe-db-cluster-parameters --db-cluster-parameter-group-name my-test-param-group --region us-east-1

Output:
PARAMETERS      0,1     pending-reboot  static  boolean Toggle audit logging for Neptune        True    neptune_enable_audit_log        0       user
PARAMETERS              pending-reboot  static  string  Enables Neptune engine experimental features    True    neptune_lab_mode        Streams=enabled, ReadWriteConflictDetection=enabled     user
PARAMETERS      10-2147483647   pending-reboot  static  integer Graph query timeout (ms).       True    neptune_query_timeout   120000  engine-default

Note 1: If the Neptune cluster is associated with the default cluster parameter group (usually default.neptune1), you must create a new cluster parameter group and then associate it with the Neptune cluster.

Note 2: Whenever the lab mode cluster parameter changes, you must reboot the instances in the cluster before using the preview features.
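
The lab mode value is a comma-separated list of feature=enabled pairs, which is easy to mistype by hand. The following is a minimal Python sketch, not an official tool, that builds the same value and the parameter payload used in the CLI example above. The helper names (lab_mode_value, lab_mode_parameters, apply_lab_mode) are hypothetical, and the sketch assumes that a feature is switched off with =disabled. The last helper assumes boto3 is installed and AWS credentials are configured; it is defined but not called here.

```python
def lab_mode_value(features):
    """Join {"Streams": True, ...} into "Streams=enabled, ..." (assumed syntax)."""
    return ", ".join(
        f"{name}={'enabled' if on else 'disabled'}" for name, on in features.items()
    )


def lab_mode_parameters(features):
    """Payload mirroring the --parameters argument in the CLI example."""
    return [
        {
            "ParameterName": "neptune_lab_mode",
            "ParameterValue": lab_mode_value(features),
            "ApplyMethod": "pending-reboot",
        }
    ]


def apply_lab_mode(group_name, features, region="us-east-1"):
    """Send the change with boto3 (assumes boto3 and AWS credentials are available)."""
    import boto3  # imported lazily so the helpers above work without it

    client = boto3.client("neptune", region_name=region)
    return client.modify_db_cluster_parameter_group(
        DBClusterParameterGroupName=group_name,
        Parameters=lab_mode_parameters(features),
    )


params = lab_mode_parameters({"Streams": True, "ReadWriteConflictDetection": True})
print(params[0]["ParameterValue"])
```

Calling apply_lab_mode("my-test-param-group", {...}) would be the programmatic counterpart of the modify-db-cluster-parameter-group command; the instances still need a reboot afterwards.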

Neptune Streams preview

Neptune Streams is an easy way to capture changes in your graph. When enabled in preview, Neptune Streams logs changes to your graph as they happen. Streams are available using the REST API https://Neptune-DNS:8182/sparql/stream or https://Neptune-DNS:8182/gremlin/stream for SPARQL and Gremlin respectively.

For example, if you add a vertex for person with name as the property, querying the REST endpoint for Gremlin returns the list of changes to the graph in JSON format. A similar query for SPARQL returns the N-Quads format.

gremlin> g.addV('person').property('name','karthik').next()
==>v[feb6e536-a9c4-d9fd-57aa-dbb8f94328d6]

The stream response below records the operations on the vertex and its property.

curl -s "http://myneptune:8182/gremlin/stream?limit=3&commitNum=1&opNum=1&iteratorType=AT_SEQUENCE_NUMBER"

{
    "lastEventId": {
        "commitNum": 1,
        "opNum": 2
    },
    "lastTrxTimestamp": 1571059225504,
    "format": "GREMLIN_JSON",
    "records": [
        {
            "eventId": {
                "commitNum": 1,
                "opNum": 1
            },
            "data": {
                "id": "feb6e536-a9c4-d9fd-57aa-dbb8f94328d6",
                "type": "vl",
                "key": "label",
                "value": {
                    "value": "person",
                    "dataType": "String"
                }
            },
            "op": "ADD"
        },
        {
            "eventId": {
                "commitNum": 1,
                "opNum": 2
            },
            "data": {
                "id": "feb6e536-a9c4-d9fd-57aa-dbb8f94328d6",
                "type": "vp",
                "key": "name",
                "value": {
                    "value": "karthik",
                    "dataType": "String"
                }
            },
            "op": "ADD"
        }
    ],
    "totalRecords": 2
}
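
A consumer of the stream typically reads the records and then asks for the changes that follow lastEventId. The Python sketch below illustrates that idea on a copy of the response above. The function names summarize and next_page_query are hypothetical, and the use of the AFTER_SEQUENCE_NUMBER iterator type for the follow-up call is an assumption to verify against the Neptune Streams documentation.

```python
import json
from urllib.parse import urlencode

# Copy of the stream response shown above, reformatted compactly.
response_text = """
{
  "lastEventId": {"commitNum": 1, "opNum": 2},
  "lastTrxTimestamp": 1571059225504,
  "format": "GREMLIN_JSON",
  "records": [
    {"eventId": {"commitNum": 1, "opNum": 1},
     "data": {"id": "feb6e536-a9c4-d9fd-57aa-dbb8f94328d6", "type": "vl",
              "key": "label", "value": {"value": "person", "dataType": "String"}},
     "op": "ADD"},
    {"eventId": {"commitNum": 1, "opNum": 2},
     "data": {"id": "feb6e536-a9c4-d9fd-57aa-dbb8f94328d6", "type": "vp",
              "key": "name", "value": {"value": "karthik", "dataType": "String"}},
     "op": "ADD"}
  ],
  "totalRecords": 2
}
"""


def summarize(record):
    """One-line description of a change record."""
    data = record["data"]
    return f'{record["op"]} {data["type"]} {data["key"]}={data["value"]["value"]} on {data["id"]}'


def next_page_query(response, limit=3):
    """Query string for the call that continues after the last event seen.

    AFTER_SEQUENCE_NUMBER is an assumed iterator type; check the Streams docs.
    """
    last = response["lastEventId"]
    return urlencode({
        "limit": limit,
        "commitNum": last["commitNum"],
        "opNum": last["opNum"],
        "iteratorType": "AFTER_SEQUENCE_NUMBER",
    })


response = json.loads(response_text)
summaries = [summarize(record) for record in response["records"]]
for line in summaries:
    print(line)
print(next_page_query(response))
```

Appending the printed query string to the /gremlin/stream endpoint would give the next polling URL.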

SPARQL federated query

Neptune now supports SPARQL 1.1 federated query. Using the SPARQL 1.1 SERVICE keyword, customers can execute portions of a query against different SPARQL endpoints within their Virtual Private Cloud (VPC), then combine the results and return them to the user.

For example, if a sample query on the person dataset in one cluster retrieves persons with first name “John” and the second query on another cluster retrieves persons with last name “Abercrombie”, you can federate queries using SPARQL 1.1 and retrieve persons with first name “John” and last name “Abercrombie”.

Here is an example using the RDF4J console, running against neptune_cluster_1 and federating part of the query to neptune_cluster_2:

> open neptune
Opened repository 'neptune'

neptune> sparql
Enter multi-line SPARQL query (terminate with line containing single '.')

PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person WHERE {
   ?person foaf:givenname "JOHN" .
   SERVICE <http://neptune_cluster_2:8182/sparql>
   {
      ?person foaf:surname "ABERCROMBIE" .
   }
}
.

Note: If more than one Neptune cluster is in the same VPC, you must ensure that the security group for each cluster allows the Neptune clusters to talk to one another over port 8182 (or the configured port).
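
The same federated query can also be sent from application code over HTTP. The Python sketch below only builds the request and does not send it. It assumes that neptune_cluster_1 exposes its SPARQL endpoint at http://neptune_cluster_1:8182/sparql and accepts the query as a form-encoded query parameter, and that foaf refers to the standard FOAF namespace. The function name build_sparql_request is hypothetical.

```python
from urllib.parse import urlencode
from urllib.request import Request

# The query from the console example, with the foaf prefix declared explicitly.
FEDERATED_QUERY = """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person WHERE {
  ?person foaf:givenname "JOHN" .
  SERVICE <http://neptune_cluster_2:8182/sparql> {
    ?person foaf:surname "ABERCROMBIE" .
  }
}
"""


def build_sparql_request(endpoint, query):
    """Build (but do not send) a form-encoded POST carrying a SPARQL query."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        endpoint,
        data=body,
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
        method="POST",
    )


# Assumed endpoint for the first cluster; adjust host, scheme, and port as needed.
request = build_sparql_request("http://neptune_cluster_1:8182/sparql", FEDERATED_QUERY)
print(request.get_method(), request.full_url)
```

From a host inside the VPC, passing the request to urllib.request.urlopen would execute it; the first cluster then contacts the second one for the SERVICE block.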

Gremlin sessions

Sessions in Gremlin define the start and end of a transaction. Transactions are automatically committed only when the session is closed. This is different from the sessionless approach where Neptune automatically manages the transaction boundary for each request. To use sessions, pass the session parameter to the remote connection. The table below illustrates the behavior of two clients connected to the same Neptune cluster. Notice that the updates from the session are not visible until the session is closed.

Step  |Gremlin connection 1 without Session     |Gremlin connection 2 with Session 
---------------------------------------------------------------------------------------
 1    |remote connect tinkerpop.server          |
      |conf/neptune-remote.yaml                 |
---------------------------------------------------------------------------------------
 2    |g.addV('person').property(id, 'vertex    |
      |without session').next()                 |
      |==>v[vertex without session]             |
---------------------------------------------------------------------------------------
 3    |                                         |remote connect tinkerpop.server 
      |                                         |conf/neptune-remote.yaml session 
---------------------------------------------------------------------------------------
 4    |g.V()                                    |g.V() 
      |==>v[vertex without session]             |==>v[vertex without session] 
---------------------------------------------------------------------------------------
 5    |                                         |g.addV('person').property(id, 'vertex
      |                                         |added in session').next() 
      |                                         |==>v[vertex added in session] 
---------------------------------------------------------------------------------------
 6    |g.V()                                    |
      |==>v[vertex without session]             |
---------------------------------------------------------------------------------------
 7    |g.addV('person').property(id, 'another   |
      |vertex without session').next()          |
      |==> fails due to conflicting             |
      |concurrent operation                     |
---------------------------------------------------------------------------------------
 8    |                                         |g.V() 
      |                                         |==>v[vertex added in session] 
      |                                         |==>v[vertex without session] 
---------------------------------------------------------------------------------------
 9    |                                         |remote close 
      |                                         |==> Removed - Gremlin Server 
---------------------------------------------------------------------------------------
10    |g.V()                                    |
      |==>v[vertex added in session]            |
      |==>v[vertex without session]             |

If you are using Java to access Neptune, use the session name in the Cluster.connect call to create a session. This session will last until the client is closed.

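import org.apache.tinkerpop.gremlin.driver.Client;
import org.apache.tinkerpop.gremlin.driver.Cluster;

Cluster cluster = Cluster.open(config); // config: the driver configuration (for example, a YAML file path)
Client client = cluster.connect("mysession1"); // creates the Client.SessionedClient
...
client.submit(gremlin_query); // queries as above
...
client.close(); // session closed
cluster.close(); // release the driver's resources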
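
To make the visibility rules in the session table easier to reason about, here is a toy in-memory model in Python. It is an illustration only and says nothing about how Neptune implements sessions: the class names ToyGraph and ToySession are hypothetical, writes are simply buffered until close, and the conflict shown in step 7 of the table is not modelled.

```python
class ToyGraph:
    """Holds the committed vertex ids that every connection can see."""

    def __init__(self):
        self.committed = set()

    def add_vertex(self, vertex_id):
        # A sessionless request commits as soon as it completes.
        self.committed.add(vertex_id)

    def vertices(self):
        return sorted(self.committed)


class ToySession:
    """Buffers writes until close(), mimicking commit-on-close."""

    def __init__(self, graph):
        self.graph = graph
        self.pending = set()

    def add_vertex(self, vertex_id):
        self.pending.add(vertex_id)

    def vertices(self):
        # The session sees committed data plus its own uncommitted writes.
        return sorted(self.graph.committed | self.pending)

    def close(self):
        # Closing the session publishes its writes to everyone.
        self.graph.committed |= self.pending
        self.pending.clear()


graph = ToyGraph()
graph.add_vertex("vertex without session")       # step 2
session = ToySession(graph)                       # step 3
session.add_vertex("vertex added in session")     # step 5
seen_outside_before = graph.vertices()            # step 6
seen_inside = session.vertices()                  # step 8
session.close()                                   # step 9
seen_outside_after = graph.vertices()             # step 10
print(seen_outside_before, seen_inside, seen_outside_after)
```

The three lists correspond to steps 6, 8, and 10 of the table: the sessionless connection sees the new vertex only after the session closes.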

Gremlin explain query

You can use the Gremlin explain query to understand the query execution plan and identify any bottlenecks. Gremlin explain is available as a REST call on /gremlin/explain, with the query supplied in the gremlin field of the request body as shown below.

curl -X POST https://myneptune:8182/gremlin/explain -d '{"gremlin":"g.E()"}'

*******************************************************
                Neptune Gremlin Explain
*******************************************************

Query String
============
g.E()

Original Traversal
==================
[GraphStep(edge,[])]

Converted Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Edge) {
        JoinGroupNode {
            PatternNode[(?3, ?2, ?4, ?1) . project ?1 . IsEdgeIdFilter(?1) .]
        }, annotations={path=[Edge(?1):GraphStep], maxVarId=5}
    },
    NeptuneTraverserConverterStep
]

Optimized Traversal
===================
Neptune steps:
[
    NeptuneGraphQueryStep(Edge) {
        JoinGroupNode {
            PatternNode[(?3, ?2, ?4, ?1) . project ?1 . IsEdgeIdFilter(?1) .], {estimatedCardinality=INFINITY}
        }, annotations={path=[Edge(?1):GraphStep], maxVarId=5}
    },
    NeptuneTraverserConverterStep
]

Predicates
==========
# of predicates: 1
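
Queries that contain quotes are awkward to embed by hand in the JSON body of the explain call. The Python sketch below builds the same kind of request as the curl command above and lets json.dumps handle the escaping. It only constructs the request and does not send it. The function name build_explain_request, the sample traversal, and the Content-Type header are assumptions made for illustration.

```python
import json
from urllib.request import Request


def build_explain_request(endpoint, gremlin):
    """Build (but do not send) a POST to /gremlin/explain for one traversal."""
    body = json.dumps({"gremlin": gremlin}).encode("utf-8")
    return Request(
        f"{endpoint}/gremlin/explain",
        data=body,
        headers={"Content-Type": "application/json"},  # assumed header
        method="POST",
    )


# Hypothetical traversal with embedded quotes; json.dumps escapes them.
request = build_explain_request("https://myneptune:8182", 'g.V().has("name", "karthik")')
print(request.full_url)
print(request.data.decode("utf-8"))
```

Passing the request to urllib.request.urlopen from a host that can reach the cluster would return the plain-text explain report.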

Summary

That was a quick tour of the features in the current release (1.0.1.0.200463.0) of Neptune. We encourage you to spin up a test cluster and try out the features in the latest release. More details about each capability are in the documentation. We will also cover many of these features in more depth in upcoming posts. Stay tuned.

We are always eager for your feedback. Please reach out using the comments below or via the support forums.

About the Author

Karthik Bharathy leads product for Amazon Neptune.