AWS Database Blog

Resetting your graph data in Amazon Neptune in seconds

As an enterprise application developer building graph applications with Amazon Neptune, you may want to delete and reload your graph data on a regular basis to make sure you’re working with the latest changes in your data, such as new relationships between nodes, or to replace test data with production data. In the past, you either had to determine the difference between large datasets and insert incremental data into your graphs, or delete the entire graph database and create new Neptune clusters to load your data. These processes are time-consuming and require additional development overhead, such as client changes and policy configurations for the new cluster. Now, with Neptune engine release 1.0.4.0, you can automate the process of deleting old graph data and loading new data into Neptune.

Today, deleting graphs in Amazon Neptune can be slow because the process is transactional in nature. If you’re using Resource Description Framework (RDF) to delete data, you must use a cascading delete in SPARQL. If you’re using property graphs, you might start with the command g.V().drop() on the Gremlin console. However, depending on the size of your graph database, the drop command might time out or fail with an out-of-memory error, leaving you stuck in a cycle of increasing query timeouts and retrying. You may end up rewriting your queries to drop your graph database in smaller chunks (either by dropping edges first and then vertices, or by writing a multi-threaded Gremlin query).

With the new database reset capability in Neptune, you can remove all data from the graph database using REST APIs or built-in commands which are available via Neptune Workbench. Database reset works for both property graphs and RDF graphs. In this post, we discuss the database reset in detail and what happens when it is invoked.

Solution overview

Database reset in Neptune is a two-step process. The first step issues a time-bound token (valid for 60 minutes). The second step uses the issued token as input to perform the actual reset. This two-step approach offers a protection against accidental deletions.


REST APIs

Amazon Neptune exposes a new /system endpoint for performing database reset in two steps: initiateDatabaseReset and performDatabaseReset.

initiateDatabaseReset

You must perform this step on the /system endpoint using the curl command. See the following code:

curl -X POST \
-H 'Content-Type: application/json' \
https://neptune-writer-endpoint:8182/system \
-d '{ "action" : "initiateDatabaseReset" }'

# or

curl -X POST \
-H 'Content-Type: application/x-www-form-urlencoded' \
https://neptune-writer-endpoint:8182/system \
-d 'action=initiateDatabaseReset'

The response in JSON format provides a reset token:

{
"status" : "200 OK",
"payload" : {
"token" : "ef478d76-d9da-4d94-8ff1-08d9d4863aa5"
}
}
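The same request can be issued from application code. The following is a minimal Python sketch using only the standard library, assuming the placeholder writer endpoint from the curl examples and a cluster without IAM authentication; the function names (build_system_request, extract_token, initiate_database_reset) are illustrative and not part of any Neptune SDK.

```python
import json
import urllib.request

# Placeholder taken from the curl examples; substitute your writer endpoint.
SYSTEM_URL = "https://neptune-writer-endpoint:8182/system"


def build_system_request(action, token=None, url=SYSTEM_URL):
    """Build (but do not send) a JSON POST for the /system endpoint."""
    payload = {"action": action}
    if token is not None:
        payload["token"] = token
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def extract_token(response_body):
    """Return the reset token from an initiateDatabaseReset response body."""
    return json.loads(response_body)["payload"]["token"]


def initiate_database_reset(url=SYSTEM_URL):
    """Send initiateDatabaseReset and return the issued token."""
    request = build_system_request("initiateDatabaseReset", url=url)
    with urllib.request.urlopen(request) as response:
        return extract_token(response.read())
```

Keeping request construction and response parsing separate from the network call makes both easy to check without touching a live cluster.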

performDatabaseReset

The second step uses the token from the initiateDatabaseReset command as input. This step works on the /system endpoint using the curl command. See the following code:

curl -X POST -H 'Content-Type: application/x-www-form-urlencoded' \
https://neptune-writer-endpoint:8182/system \
-d 'action=performDatabaseReset&token=ef478d76-d9da-4d94-8ff1-08d9d4863aa5'

# or

curl -X POST -H 'Content-Type: application/json' \
https://neptune-writer-endpoint:8182/system \
-d '{
"action" : "performDatabaseReset",
"token" : "ef478d76-d9da-4d94-8ff1-08d9d4863aa5"
}'

The response in JSON format confirms that the reset request was accepted:

{
"status" : "200 OK"
}
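The two steps can be chained in a small helper. The sketch below assumes a hypothetical post callable that sends a payload to the /system endpoint and returns the decoded JSON response as a dictionary; injecting it keeps the control flow independent of any particular HTTP client.

```python
def reset_database(post):
    """Run initiateDatabaseReset, then performDatabaseReset with its token.

    `post` is a hypothetical callable: it takes a payload dict, sends it to
    the /system endpoint, and returns the decoded JSON response as a dict.
    """
    issued = post({"action": "initiateDatabaseReset"})
    token = issued["payload"]["token"]

    result = post({"action": "performDatabaseReset", "token": token})
    if result.get("status") != "200 OK":
        # Anything other than the success response is treated as a failure.
        raise RuntimeError(f"Database reset was not accepted: {result}")
    return result
```

In practice you may prefer to keep the two calls apart, for example by requiring a human confirmation between them, because the token step exists to guard against accidental deletions.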

The token is valid for 60 minutes. If the token has expired or doesn't match the one that was issued, you get an error response:

{"code":"InvalidParameterException","requestId":"4cb9c101-07bc-4317-d897-187978fbc270",
"detailedMessage":"System command parameter 'token': '4cb9c101-07bc-4317-d897-187978fbc270' does not match database reset token"}

Workbench magic commands

You can also initiate a database reset on the Neptune workbench using this two-step process. 

db_reset --generate-token

The db_reset --generate-token command generates a time-sensitive token:

%db_reset --generate-token

{
"status" : "200 OK",
"payload" : {
"token" : "ef478d76-d9da-4d94-8ff1-08d9d4863aa5"
}
}

db_reset --token

The db_reset --token command takes the token generated by the previous command as input and triggers the reset:

%db_reset --token "ef478d76-d9da-4d94-8ff1-08d9d4863aa5"

{
"status" : "200 OK"
} 

How Neptune performs a database reset

Amazon Neptune performs the following series of steps to complete a database reset:

  • The cluster marks the reset status in the database.
  • A JSON response with 200 status is sent to the client.
  • The cluster stops accepting new incoming requests.
  • The cluster tries to cancel any queries that are currently in the queue.
  • The cluster restarts.
  • The cluster drops the existing database and recreates a blank database.
  • The cluster is ready to accept incoming read/write requests.

The operation can take 60–90 seconds before the server is ready to accept new read and write requests. The client application should use the best practice of gracefully handling connection disruptions and reconnecting as necessary prior to submitting any read or write requests. 
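One way to handle the restart window is to poll until the cluster responds again before sending any queries. The following is a minimal sketch rather than a prescribed approach: probe is a hypothetical callable that returns True once the cluster answers (for example, a lightweight request against your endpoint), and the timeout and interval defaults are assumptions chosen to comfortably cover the 60–90 second window.

```python
import time


def wait_until_ready(probe, timeout_s=180.0, interval_s=5.0,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll `probe` until it returns True or `timeout_s` elapses.

    `probe` is a hypothetical callable that returns True once the cluster
    answers requests. Connection-level errors (OSError and its subclasses,
    which include urllib's URLError) are treated as "not ready yet".
    """
    deadline = clock() + timeout_s
    while True:
        try:
            if probe():
                return True
        except OSError:
            pass  # Refused or reset connections are expected during the restart.
        if clock() >= deadline:
            return False
        sleep(interval_s)
```

The sleep and clock parameters exist only so the loop can be exercised without real delays; callers can leave them at their defaults.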

Conclusion

The newly released database reset capability allows you to remove all data from the graph. This accelerates graph application development because developers can use REST APIs or the built-in commands available via Neptune Workbench to delete all data quickly and easily. You no longer need to spend time tuning query timeouts or writing complex, multi-threaded queries to delete edges and vertices. Try it out on your Neptune cluster today.


About the Authors

Niraj Jetly is a Software Development Manager, Neptune. Before joining AWS, Niraj led several product and engineering teams as CTO, VP-Engineering, and Head of Product Management for over 15 years. Niraj is a recipient of over 15 innovation awards, including being named CIO of the year in 2014 and a top 100 CIO in 2013 and 2016. A frequent speaker at several conferences, he has been quoted in NPR, WSJ, and The Boston Globe.

 

Navtanay Sinha is a Senior Product Manager for Amazon Neptune at AWS.