Amazon Neptune Documentation

Introduction

With Amazon Neptune, you can create graph applications that query billions of relationships in milliseconds. Neptune supports the popular graph query languages Apache TinkerPop Gremlin, W3C's SPARQL, and openCypher, so you can run powerful queries that are easy to write and perform well on connected data. This significantly reduces code complexity and lets you more quickly create applications that process relationships.

Neptune increases database performance and availability by tightly integrating the database engine with an SSD-backed virtualized storage layer purpose-built for database workloads. Neptune's storage is designed to be fault-tolerant and self-healing, and disk failures are repaired in the background without loss of database availability. Neptune is designed to automatically detect database crashes and restart without the need for crash recovery or to rebuild the database cache. If the entire instance fails, Neptune is designed to automatically fail over to one of up to 15 read replicas.

You can quickly launch a Neptune database instance with a few steps in the Neptune console. Neptune scales storage automatically, growing storage and rebalancing I/Os to provide consistent performance without the need for overprovisioning.

Performance and Scalability

Serverless option

Amazon Neptune Serverless is an on-demand deployment option that automatically adjusts database capacity based on an application’s needs. It can scale graph database workloads instantly to hundreds of thousands of queries, and you pay only for consumed capacity.

Throughput and Latency for Graph Queries

Amazon Neptune stores and navigates graph data, and uses a scale-up, in-memory optimized architecture to allow for fast query evaluation over large graphs. With Neptune, you can use Gremlin, openCypher, or SPARQL to execute powerful queries.

Scaling of Database Compute Resources

Through the AWS Management Console, you can scale the compute and memory resources powering your production cluster up or down by creating new replica instances of the desired size, or by removing instances. 

Storage that Scales

Amazon Neptune is designed to grow the size of your database volume as your database storage needs grow. You don't need to provision excess storage for your database to handle future growth.

Low Latency Read Replicas

Increase read throughput to support high volume application requests by creating up to 15 database read replicas. Amazon Neptune replicas share the same underlying storage as the source instance, avoiding the need to perform writes at the replica nodes. This can help free up more processing power to serve read requests and reduces the replica lag time. Neptune also provides a single endpoint for read queries so the application can connect without having to keep track of replicas as they are added and removed.
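As a minimal sketch of how an application might use the single read endpoint, the snippet below routes read-only queries to a reader endpoint and everything else to the writer endpoint. The class name and both host names are hypothetical placeholders, not values from this documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterEndpoints:
    """Holds the two endpoints an application needs to know about.

    Both host names are supplied by the caller; the ones used below are
    made-up examples.
    """

    writer: str
    reader: str

    def for_query(self, read_only: bool) -> str:
        # Read-only traffic goes to the shared reader endpoint, so the
        # application never tracks individual replicas.
        return self.reader if read_only else self.writer


endpoints = ClusterEndpoints(
    writer="example-writer.neptune.example.com",  # hypothetical
    reader="example-reader.neptune.example.com",  # hypothetical
)
read_host = endpoints.for_query(read_only=True)
write_host = endpoints.for_query(read_only=False)
```

Because replicas sit behind one reader endpoint, adding or removing a replica does not change either value in the application's configuration.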

High Availability and Durability

Instance Monitoring and Repair

The health of your Amazon Neptune database and its underlying EC2 instance is continuously monitored. Amazon Neptune is designed so that if the instance powering your database fails, the database and associated processes are restarted. Neptune recovery does not require the potentially lengthy replay of database redo logs to restart. It also isolates the database buffer cache from database processes.

Multi-AZ Deployments with Read Replicas

On instance failure, Amazon Neptune is designed to fail over to one of up to 15 Neptune replicas you have created in any of three Availability Zones. If no Neptune replicas have been provisioned when a failure occurs, Neptune will attempt to create a new database instance for you.

Fault-tolerant and Self-healing Storage

Amazon Neptune is designed to replicate each 10GB chunk of your database volume six ways, across three Availability Zones. Amazon Neptune uses fault-tolerant storage that is designed to transparently handle the loss of up to two copies of data without affecting database write availability and up to three copies without affecting read availability. Neptune’s storage is also designed to be self-healing; data blocks and disks are continuously scanned for errors and replaced.
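The tolerances above can be checked with simple arithmetic. The sketch below is illustrative only: the quorum sizes are inferred from the figures in this section (six copies, two lost copies tolerated for writes, three for reads), and the even split of two copies per Availability Zone is an assumption.

```python
# Illustrative arithmetic only. The quorum sizes are inferred from the
# tolerances quoted in the text, not taken from a Neptune specification.
COPIES = 6                 # copies of each chunk
AVAILABILITY_ZONES = 3
COPIES_PER_AZ = COPIES // AVAILABILITY_ZONES  # assumed even split

WRITE_QUORUM = COPIES - 2  # writes tolerate the loss of two copies
READ_QUORUM = COPIES - 3   # reads tolerate the loss of three copies


def can_write(lost_copies: int) -> bool:
    """True if enough copies remain to accept writes."""
    return COPIES - lost_copies >= WRITE_QUORUM


def can_read(lost_copies: int) -> bool:
    """True if enough copies remain to serve reads."""
    return COPIES - lost_copies >= READ_QUORUM


# Losing one whole Availability Zone removes COPIES_PER_AZ copies.
az_loss_still_writable = can_write(COPIES_PER_AZ)
# Losing one Availability Zone plus one more copy removes three copies.
az_plus_one_still_readable = can_read(COPIES_PER_AZ + 1)
az_plus_one_still_writable = can_write(COPIES_PER_AZ + 1)
```

Under these assumptions, the loss of a full Availability Zone leaves the volume writable, and the loss of a zone plus one additional copy leaves it readable but not writable.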

Backups and Point-in-time Restore

Amazon Neptune's backup capability is designed to enable point-in-time recovery for your instance. Your backup retention period can be configured up to thirty-five days. Automated backups are stored in Amazon S3.

Database Snapshots

Database Snapshots are user-initiated backups of your instance stored in Amazon S3 that are kept until you delete them. They are incremental, which helps reduce the time and storage required. You can create a new instance from a Database Snapshot at any time.

Global database

Amazon Neptune Global Database is designed for globally distributed applications, allowing a single Neptune database to span multiple AWS Regions. It replicates graph data with little impact on database performance, enables fast local reads with low latency in each Region, and provides disaster recovery in case of Region-wide outages.

Open graph APIs

Supports Apache TinkerPop Gremlin for property graph

Property Graphs are popular because they are familiar to developers who are used to relational models. The Gremlin traversal language provides a way to quickly traverse Property Graphs. Amazon Neptune supports the Property Graph model using the open source Apache TinkerPop Gremlin traversal language and provides a Gremlin WebSocket server that supports TinkerPop version 3.3. With Neptune, you can quickly build fast Gremlin traversals over property graphs. Existing Gremlin applications can use Neptune by changing the Gremlin service configuration to point to a Neptune instance.
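As a rough sketch of what pointing a Gremlin application at Neptune involves, the snippet below builds the WebSocket URL a Gremlin client would connect to. The host name is a hypothetical placeholder, and port 8182 with the /gremlin path is assumed to match your cluster's configuration.

```python
# Hypothetical endpoint; substitute the endpoint of your own cluster.
NEPTUNE_HOST = "example-cluster.neptune.example.com"
NEPTUNE_PORT = 8182  # assumed default port


def gremlin_websocket_url(host: str, port: int = NEPTUNE_PORT) -> str:
    """Return the secure WebSocket URL for the Gremlin server."""
    return f"wss://{host}:{port}/gremlin"


url = gremlin_websocket_url(NEPTUNE_HOST)
# With the gremlinpython driver installed, this URL would replace the
# previous server's URL, for example in DriverRemoteConnection(url, "g").
```

Only the connection target changes; the traversals themselves stay as written.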

Supports W3C's Resource Description Framework (RDF) 1.1 and SPARQL 1.1

RDF provides flexibility for modeling complex information domains. There are a number of existing free or public datasets available in RDF including Wikidata and PubChem, a database of chemical molecules. Amazon Neptune supports the W3C's Semantic Web standards of RDF 1.1 and SPARQL 1.1 (Query and Update), and provides an HTTP REST endpoint that implements the SPARQL Protocol 1.1. With Neptune, you can easily use the SPARQL endpoint for both existing and new graph applications.
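To illustrate the HTTP endpoint, here is a minimal sketch that builds (but does not send) a SPARQL Protocol 1.1 query request as a URL-encoded POST. The host name and function name are hypothetical, and port 8182 with the /sparql path is an assumption about the cluster's configuration.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Hypothetical endpoint; substitute the endpoint of your own cluster.
NEPTUNE_HOST = "example-cluster.neptune.example.com"
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"


def build_sparql_query_request(host: str, query: str, port: int = 8182) -> Request:
    """Build a SPARQL Protocol 'query via URL-encoded POST' request."""
    body = urlencode({"query": query}).encode("utf-8")
    return Request(
        url=f"https://{host}:{port}/sparql",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/x-www-form-urlencoded",
            "Accept": "application/sparql-results+json",
        },
    )


request = build_sparql_query_request(NEPTUNE_HOST, QUERY)
# Passing `request` to urllib.request.urlopen would send it; that step is
# omitted here because it needs network access to a real cluster.
```

SPARQL Update requests follow the same pattern with an `update` form field in place of `query`.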

Supports openCypher v9 for property graph

Neptune supports building graph applications using openCypher. Developers, business analysts, and data scientists like openCypher's SQL-inspired syntax because it provides a familiar structure to compose queries for graph applications. The openCypher and Gremlin query languages can be used together over the same property graph data. Neptune's openCypher support is compatible with the Bolt protocol, so applications that use Bolt to connect can continue to run against Neptune.

Machine Learning

Security

Network Isolation

Amazon Neptune runs in Amazon VPC, which allows you to isolate your database in your own virtual network, and connect to your on-premises IT infrastructure using encrypted IPsec VPNs. Neptune’s VPC configuration is designed to help you configure firewall settings and control network access to your database instances.

Resource-Level Permissions

Amazon Neptune is integrated with AWS Identity and Access Management (IAM) and provides you the ability to control the actions that your AWS IAM users and groups can take on specific Neptune resources including Database Instances, Database Snapshots, Database Parameter Groups, Database Event Subscriptions, and Database Options Groups. In addition, you can tag your Neptune resources, and control the actions that your IAM users and groups can take on groups of resources that have the same tag (and tag value). 

Fine-grained access control

Neptune provides fine-grained access control for users calling Neptune data plane APIs, using AWS Identity and Access Management (IAM). Policies can cover graph-data actions such as reading, writing, and deleting data from the graph, and non-graph-data actions such as starting and monitoring Amazon Neptune ML activities and checking the status of ongoing data plane activities.
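A policy of this kind might look like the sketch below, which builds a read-only or read-write data plane policy document. The function name and all identifiers passed to it are hypothetical, and the action names and resource ARN layout are assumptions that should be checked against the current Neptune IAM reference before use.

```python
import json


def data_plane_policy(region: str, account_id: str,
                      cluster_resource_id: str,
                      allow_write: bool = False) -> dict:
    """Build an IAM policy document for graph-data actions.

    Action names and ARN layout are assumptions; verify them against the
    Neptune IAM documentation.
    """
    actions = ["neptune-db:ReadDataViaQuery"]
    if allow_write:
        actions += [
            "neptune-db:WriteDataViaQuery",
            "neptune-db:DeleteDataViaQuery",
        ]
    resource = (
        f"arn:aws:neptune-db:{region}:{account_id}:{cluster_resource_id}/*"
    )
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": actions, "Resource": resource}
        ],
    }


# Hypothetical identifiers, for illustration only.
read_only = data_plane_policy("us-east-1", "123456789012", "cluster-EXAMPLE")
read_only_json = json.dumps(read_only, indent=2)
```

Keeping read and write actions separate lets a reporting role query the graph without being able to change it.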

Encryption

Amazon Neptune allows you to encrypt your databases using keys you create and control through AWS Key Management Service (KMS). 

Advanced Auditing

Amazon Neptune is designed to allow you to log database events with minimal impact on database performance. Logs can later be analyzed for database management, security, governance, regulatory compliance and other purposes. You can also monitor activity by sending audit logs to Amazon CloudWatch. 

Managed

Usability

Neptune database instances are pre-configured with parameters and settings for the database instance class you have selected. You can launch a database instance and connect your application without additional configuration. Database Parameter Groups provide granular control and fine-tuning of your database.

Operability

With Amazon Neptune, you do not need to create custom indexes over your graph data. Neptune is designed to provide timeout and memory usage limitations to reduce the impact of queries that consume too many resources.

Monitoring and Metrics

Amazon Neptune provides Amazon CloudWatch metrics for your database instances. You can use the AWS Management Console to view key operational metrics for your database instances, including compute, memory, storage, query throughput, and active connections.

Software Patching

Amazon Neptune is designed to keep your database up-to-date with the latest patches. You can control if and when your instance is patched via Database Engine Version Management.

Database Event Notifications

Amazon Neptune is designed to notify you via email or SMS of important database events like failover. You can use the AWS Management Console to subscribe to different database events associated with your Amazon Neptune databases.

Database Cloning

Amazon Neptune supports cloning database clusters. Cloning is useful for a number of purposes including application development, testing, database updates, and running analytical queries. You can clone an Amazon Neptune database in the Management Console. The clone can be distributed and replicated across three Availability Zones.

Parallel Bulk Data Loading

Property Graph Bulk Loading

Amazon Neptune supports parallel bulk loading for Property Graph data that is stored in S3. You can use a REST interface to specify the S3 location for the data. The loader reads comma-separated values (CSV) files to load nodes and edges.
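As an illustration of the CSV input, the sketch below writes one node file and one edge file in memory. The system column headers (~id, ~label, ~from, ~to) and the typed property headers follow the Gremlin load format as understood here and are not spelled out in this section; the sample data and helper name are made up.

```python
import csv
import io


def to_csv(header: list, rows: list) -> str:
    """Serialize a header and rows to CSV text with newline terminators."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Node file: one row per node. Column names are assumptions (see above).
nodes_csv = to_csv(
    ["~id", "~label", "name:String"],
    [["p1", "person", "Alice"], ["p2", "person", "Bob"]],
)

# Edge file: one row per edge, referencing node IDs in ~from and ~to.
edges_csv = to_csv(
    ["~id", "~from", "~to", "~label", "since:Int"],
    [["e1", "p1", "p2", "knows", 2020]],
)
```

Files like these would be uploaded to S3, and the S3 location passed to the loader's REST interface.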

RDF Bulk Loading

Amazon Neptune supports parallel bulk loading for RDF data that is stored in S3. You can use a REST interface to specify the S3 location for the data. The N-Triples (NT), N-Quads (NQ), RDF/XML, and Turtle RDF 1.1 serializations are supported. 
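A request to the REST interface could be assembled as in the sketch below, which builds (but does not send) a load request for an RDF file in S3. The host, bucket, and role ARN are hypothetical, and the /loader path, port 8182, payload field names, and format identifiers are assumptions to verify against the Neptune loader reference.

```python
import json
from urllib.request import Request

# Assumed mapping from serialization name to loader format identifier.
RDF_FORMATS = {
    "N-Triples": "ntriples",
    "N-Quads": "nquads",
    "RDF/XML": "rdfxml",
    "Turtle": "turtle",
}


def build_loader_request(host: str, s3_uri: str, serialization: str,
                         iam_role_arn: str, region: str,
                         port: int = 8182) -> Request:
    """Build a POST request asking the loader to ingest data from S3."""
    payload = {
        "source": s3_uri,
        "format": RDF_FORMATS[serialization],
        "iamRoleArn": iam_role_arn,
        "region": region,
        "failOnError": "FALSE",
    }
    return Request(
        url=f"https://{host}:{port}/loader",
        data=json.dumps(payload).encode("utf-8"),
        method="POST",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical values, for illustration only.
loader_request = build_loader_request(
    host="example-cluster.neptune.example.com",
    s3_uri="s3://example-bucket/data/graph.ttl",
    serialization="Turtle",
    iam_role_arn="arn:aws:iam::123456789012:role/ExampleLoadRole",
    region="us-east-1",
)
```

An unsupported serialization name raises a `KeyError` before any request is built, which keeps format mistakes out of the load job.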

Additional Information

For additional information about service controls, security features and functionalities, including, as applicable, information about storing, retrieving, modifying, restricting, and deleting data, please see https://docs.aws.amazon.com/index.html. This additional information does not form part of the Documentation for purposes of the AWS Customer Agreement available at http://aws.amazon.com/agreement, or other agreement between you and AWS governing your use of AWS’s services.