Amazon Neptune Documentation

Introduction

With Amazon Neptune, you can create sophisticated, interactive graph applications. SQL queries for highly connected data are complex and hard to tune for performance. Instead, Amazon Neptune allows you to use the popular graph query languages Apache TinkerPop Gremlin and W3C’s SPARQL to execute powerful queries that perform well on connected data. Amazon Neptune is designed to enhance database performance and availability by tightly integrating the database engine with an SSD-backed virtualized storage layer purpose-built for database workloads. Neptune's storage is designed to be fault-tolerant and self-healing, and disk failures are repaired in the background. Neptune is designed to detect database crashes and restart without requiring crash recovery or a rebuild of the database cache. You can launch an Amazon Neptune database instance in the Neptune Management Console. Neptune scales storage and rebalances I/O to provide consistent performance without the need for over-provisioning.

Performance and Scalability

Serverless Option

Amazon Neptune Serverless is an on-demand deployment option that automatically adjusts database capacity based on an application’s needs. It can scale graph database workloads instantly to hundreds of thousands of queries, and you pay only for consumed capacity.

Throughput and Latency for Graph Queries

Amazon Neptune stores and navigates graph data, and uses a scale-up, in-memory optimized architecture to allow for fast query evaluation over large graphs. With Neptune, you can use either Gremlin or SPARQL to execute powerful queries.
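
As a rough illustration of the two query languages, the following sketch builds (but does not send) one HTTP request per language for the same friends-of-friends question. The host name, the sample data model (person vertices, knows edges, FOAF predicates), and the helper function names are hypothetical. The port 8182, the /gremlin and /sparql paths, and the request body shapes are assumptions that you should verify against your own cluster configuration.

```python
import json
import urllib.parse
import urllib.request

# Hypothetical cluster endpoint; replace with your own.
NEPTUNE_ENDPOINT = "https://my-neptune-cluster.example.com:8182"


def build_gremlin_request(traversal: str) -> urllib.request.Request:
    """Build a POST request that carries a Gremlin traversal as JSON."""
    body = json.dumps({"gremlin": traversal}).encode("utf-8")
    return urllib.request.Request(
        url=f"{NEPTUNE_ENDPOINT}/gremlin",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_sparql_request(query: str) -> urllib.request.Request:
    """Build a POST request that carries a SPARQL query as form data."""
    body = urllib.parse.urlencode({"query": query}).encode("utf-8")
    return urllib.request.Request(
        url=f"{NEPTUNE_ENDPOINT}/sparql",
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Property graph version: names of people two "knows" hops away from alice.
gremlin_req = build_gremlin_request(
    "g.V().has('person', 'name', 'alice')"
    ".out('knows').out('knows').dedup().values('name')"
)

# RDF version of the same question, using a SPARQL 1.1 property path.
sparql_req = build_sparql_request(
    """
    PREFIX foaf: <http://xmlns.com/foaf/0.1/>
    SELECT DISTINCT ?name WHERE {
      ?a foaf:name "alice" .
      ?a foaf:knows/foaf:knows ?fof .
      ?fof foaf:name ?name .
    }
    """
)
```

Either request could then be sent with urllib.request.urlopen from a host that has network access to the cluster.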

Scaling of Database Compute Resources

Through the AWS Management Console, you can scale the compute and memory resources powering your production cluster up or down by creating new replica instances of the desired size, or by removing instances. 

Storage that Scales

Amazon Neptune is designed to grow the size of your database volume as your database storage needs grow. You don't need to provision excess storage for your database to handle future growth.

Low Latency Read Replicas

Increase read throughput to support high-volume application requests by creating up to 15 database read replicas. Amazon Neptune replicas share the same underlying storage as the source instance, avoiding the need to perform writes at the replica nodes. This can help free up more processing power to serve read requests and reduce replica lag time. Neptune also provides a single endpoint for read queries so the application can connect without having to keep track of replicas as they are added and removed.
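
One way an application might take advantage of the single read endpoint is to route read-only queries to it and send everything else to the primary. The sketch below is a minimal illustration of that idea; the ClusterEndpoints type, the endpoint_for function, and both host names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterEndpoints:
    """Host names for one cluster (hypothetical values supplied by the caller)."""
    writer: str  # endpoint that accepts writes
    reader: str  # single endpoint that fronts all read replicas


def endpoint_for(endpoints: ClusterEndpoints, read_only: bool) -> str:
    """Pick the reader endpoint for read-only work, the writer otherwise."""
    return endpoints.reader if read_only else endpoints.writer


endpoints = ClusterEndpoints(
    writer="my-cluster.cluster-example.us-east-1.neptune.amazonaws.com",
    reader="my-cluster.cluster-ro-example.us-east-1.neptune.amazonaws.com",
)

read_host = endpoint_for(endpoints, read_only=True)
write_host = endpoint_for(endpoints, read_only=False)
```

Because the application only ever sees the two host names, replicas can be added or removed without a configuration change on the client side.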

High Availability and Durability

Instance Monitoring and Repair

The health of your Amazon Neptune database and its underlying EC2 instance is continuously monitored. Amazon Neptune is designed so that if the instance powering your database fails, the database and associated processes are restarted. Neptune recovery does not require the potentially lengthy replay of database redo logs to restart. It also isolates the database buffer cache from database processes.

Multi-AZ Deployments with Read Replicas

On instance failure, Amazon Neptune is designed to fail over to one of up to 15 Neptune replicas you have created in any of three Availability Zones. If no Neptune replicas have been provisioned when a failure occurs, Neptune will attempt to create a new database instance for you.

Fault-tolerant and Self-healing Storage

Amazon Neptune is designed to replicate each 10 GB chunk of your database volume six ways, across three Availability Zones. Amazon Neptune uses fault-tolerant storage that is designed to transparently handle the loss of up to two copies of data without affecting database write availability and up to three copies without affecting read availability. Neptune’s storage is also designed to be self-healing; data blocks and disks are continuously scanned for errors and replaced when faulty.
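
The replication figures above can be worked through with a small calculation. The following sketch only encodes the numbers stated in this section (10 GB chunks, six copies, tolerance of two lost copies for writes and three for reads); the function names are illustrative, and it does not model how the storage layer is actually implemented.

```python
import math

CHUNK_SIZE_GB = 10        # size of each replicated chunk
COPIES_PER_CHUNK = 6      # six-way replication across three Availability Zones
MAX_LOST_FOR_WRITES = 2   # writes stay available with up to two copies lost
MAX_LOST_FOR_READS = 3    # reads stay available with up to three copies lost


def chunk_count(volume_gb: float) -> int:
    """Number of 10 GB chunks needed to hold a volume of the given size."""
    return math.ceil(volume_gb / CHUNK_SIZE_GB)


def availability(lost_copies: int) -> dict:
    """Report read and write availability of a chunk after losing some copies."""
    if not 0 <= lost_copies <= COPIES_PER_CHUNK:
        raise ValueError(f"lost_copies must be between 0 and {COPIES_PER_CHUNK}")
    return {
        "surviving_copies": COPIES_PER_CHUNK - lost_copies,
        "writes_available": lost_copies <= MAX_LOST_FOR_WRITES,
        "reads_available": lost_copies <= MAX_LOST_FOR_READS,
    }


# A 95 GB volume occupies 10 chunks, stored as 60 chunk copies in total.
chunks = chunk_count(95)
total_copies = chunks * COPIES_PER_CHUNK

# Losing three copies of a chunk blocks writes to it but not reads.
after_three_losses = availability(3)
```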

Backups and Point-in-time Restore

Amazon Neptune's backup capability is designed to enable point-in-time recovery for your instance. Your backup retention period can be configured up to thirty-five days. Automated backups are stored in Amazon S3.
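
To make the retention window concrete, the sketch below checks whether a requested restore time falls inside a configured retention period. It is a simplified model: the one-day minimum and the function names are assumptions, and the latest restorable time reported by the service may lag slightly behind the current time.

```python
from datetime import datetime, timedelta, timezone

MAX_RETENTION_DAYS = 35  # upper limit stated above
MIN_RETENTION_DAYS = 1   # assumed lower limit for this sketch


def restore_window_start(now: datetime, retention_days: int) -> datetime:
    """Earliest point in time covered by the given retention period."""
    if not MIN_RETENTION_DAYS <= retention_days <= MAX_RETENTION_DAYS:
        raise ValueError(
            f"retention_days must be between {MIN_RETENTION_DAYS} "
            f"and {MAX_RETENTION_DAYS}"
        )
    return now - timedelta(days=retention_days)


def is_restorable(target: datetime, now: datetime, retention_days: int) -> bool:
    """True if the target time lies inside the retention window."""
    return restore_window_start(now, retention_days) <= target <= now


now = datetime(2024, 3, 31, 12, 0, tzinfo=timezone.utc)
inside = is_restorable(datetime(2024, 3, 25, 9, 30, tzinfo=timezone.utc), now, 7)
outside = is_restorable(datetime(2024, 3, 20, 9, 30, tzinfo=timezone.utc), now, 7)
```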

Database Snapshots

Database Snapshots are user-initiated backups of your instance stored in Amazon S3 that will be kept until you delete them. They use incremental snapshots to help reduce the time and storage required. You can create a new instance from a Database Snapshot at any time.

Global Database

Amazon Neptune Global Database is designed for globally distributed applications, allowing a single Neptune database to span multiple AWS Regions. It replicates graph data with little impact on database performance, enables fast local reads with low latency in each Region, and provides disaster recovery in case of Region-wide outages.

Security

Network Isolation

Amazon Neptune runs in Amazon VPC, which allows you to isolate your database in your own virtual network, and connect to your on-premises IT infrastructure using encrypted IPsec VPNs. Neptune’s VPC configuration is designed to help you configure firewall settings and control network access to your database instances.

Resource-Level Permissions

Amazon Neptune is integrated with AWS Identity and Access Management (IAM) and lets you control the actions that your AWS IAM users and groups can take on specific Neptune resources, including Database Instances, Database Snapshots, Database Parameter Groups, Database Event Subscriptions, and Database Option Groups. In addition, you can tag your Neptune resources, and control the actions that your IAM users and groups can take on groups of resources that have the same tag (and tag value).
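
A tag-scoped policy of the kind described above might look roughly like the document that the following sketch generates. The helper name and the Environment tag are hypothetical. The rds: action prefix and the aws:ResourceTag condition key are assumptions about how Neptune management actions are expressed in IAM, so check them against the IAM reference for Neptune before use.

```python
import json


def tag_scoped_policy(actions, tag_key: str, tag_value: str) -> dict:
    """Build an IAM policy that allows actions only on resources with a given tag."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": list(actions),
                "Resource": "*",
                # Assumed condition key; restricts the statement to tagged resources.
                "Condition": {
                    "StringEquals": {f"aws:ResourceTag/{tag_key}": tag_value}
                },
            }
        ],
    }


# Allow rebooting and modifying only instances tagged Environment=test.
policy = tag_scoped_policy(
    actions=["rds:RebootDBInstance", "rds:ModifyDBInstance"],  # assumed action names
    tag_key="Environment",
    tag_value="test",
)
policy_json = json.dumps(policy, indent=2)
```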

Encryption

Amazon Neptune allows you to encrypt your databases using keys you create and control through AWS Key Management Service (KMS). 

Advanced Auditing

Amazon Neptune is designed to allow you to log database events with minimal impact on database performance. Logs can later be analyzed for database management, security, governance, regulatory compliance and other purposes. You can also monitor activity by sending audit logs to Amazon CloudWatch. 

Managed

Usability

Neptune database instances are pre-configured with parameters and settings for the database instance class you have selected. You can launch a database instance and connect your application without additional configuration. Database Parameter Groups provide granular control and fine-tuning of your database.

Operability

With Amazon Neptune, you do not need to create custom indexes over your graph data. Neptune is designed to provide timeout and memory usage limitations to reduce the impact of queries that consume too many resources.

Monitoring and Metrics

Amazon Neptune provides Amazon CloudWatch metrics for your database instances. You can use the AWS Management Console to view key operational metrics for your database instances, including compute, memory, storage, query throughput, and active connections.
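
Metrics can also be retrieved programmatically. The sketch below only assembles the parameters for a CloudWatch statistics query and does not call AWS. The AWS/Neptune namespace, the CPUUtilization metric name, and the DBInstanceIdentifier dimension are assumptions to confirm in the CloudWatch console, and the instance name is hypothetical.

```python
from datetime import datetime, timedelta, timezone


def cpu_metric_query(instance_id: str, end: datetime, minutes: int = 60) -> dict:
    """Parameters for an average-CPU query over the trailing time window."""
    return {
        "Namespace": "AWS/Neptune",       # assumed metric namespace
        "MetricName": "CPUUtilization",   # assumed metric name
        "Dimensions": [
            {"Name": "DBInstanceIdentifier", "Value": instance_id},
        ],
        "StartTime": end - timedelta(minutes=minutes),
        "EndTime": end,
        "Period": 300,                    # one data point per five minutes
        "Statistics": ["Average"],
    }


params = cpu_metric_query(
    "my-neptune-instance",  # hypothetical instance identifier
    end=datetime(2024, 3, 31, 12, 0, tzinfo=timezone.utc),
)

# With boto3 installed and credentials configured, the query could be run as:
#   boto3.client("cloudwatch").get_metric_statistics(**params)
```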

Software Patching

Amazon Neptune is designed to keep your database up to date with the latest patches. You can control if and when your instance is patched via Database Engine Version Management.

Database Event Notifications

Amazon Neptune is designed to notify you via email or SMS of important database events like failover. You can use the AWS Management Console to subscribe to different database events associated with your Amazon Neptune databases.

Database Cloning

Amazon Neptune supports cloning of database clusters. Cloning is useful for a number of purposes including application development, testing, database updates, and running analytical queries. You can clone an Amazon Neptune database in the Management Console. The clone can be distributed and replicated across three Availability Zones.

Parallel Bulk Data Loading

Property Graph Bulk Loading

Amazon Neptune supports parallel bulk loading for Property Graph data that is stored in S3. You can use a REST interface to specify the S3 location for the data. The loader reads comma-separated values (CSV) files to load data into nodes and edges.
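
The following sketch shows what a small pair of node and edge files, and the body of a load request, might look like. The column headers (~id, ~label, ~from, ~to, and typed property columns), the JSON field names, and the "csv" format value are assumptions about the loader's input format to verify against the loader reference. The bucket, role ARN, and sample data are hypothetical.

```python
import csv
import io
import json


def to_csv(header, rows) -> str:
    """Serialize a header and rows to CSV text with newline line endings."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Node file: one row per node, with an ID, a label, and a typed property.
vertices = to_csv(
    ["~id", "~label", "name:String"],
    [["p1", "person", "alice"], ["p2", "person", "bob"]],
)

# Edge file: one row per edge, referencing node IDs in ~from and ~to.
edges = to_csv(
    ["~id", "~from", "~to", "~label", "since:Int"],
    [["e1", "p1", "p2", "knows", 2019]],
)


def loader_request_body(s3_uri: str, iam_role_arn: str, region: str) -> str:
    """JSON body that points the REST loader interface at files in S3."""
    return json.dumps(
        {
            "source": s3_uri,
            "format": "csv",
            "iamRoleArn": iam_role_arn,
            "region": region,
            "failOnError": "FALSE",
        }
    )


body = loader_request_body(
    "s3://my-example-bucket/graph-data/",                  # hypothetical bucket
    "arn:aws:iam::123456789012:role/NeptuneLoadFromS3",    # hypothetical role
    "us-east-1",
)
```

After the two files are uploaded to the S3 prefix, the body would be sent as an HTTP POST to the cluster's loader endpoint.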

RDF Bulk Loading

Amazon Neptune supports parallel bulk loading for RDF data that is stored in S3. You can use a REST interface to specify the S3 location for the data. The N-Triples (NT), N-Quads (NQ), RDF/XML, and Turtle RDF 1.1 serializations are supported. 
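
When a load job is prepared for RDF files, the serialization has to be identified in the request. The sketch below maps conventional file extensions to a format name. The format strings (ntriples, nquads, rdfxml, turtle) are assumed loader parameter values, and the mapping by extension and the function name are illustrative conventions rather than loader behavior.

```python
from pathlib import PurePosixPath

# Conventional file extension -> assumed loader format value.
RDF_FORMATS = {
    ".nt": "ntriples",   # N-Triples
    ".nq": "nquads",     # N-Quads
    ".rdf": "rdfxml",    # RDF/XML
    ".ttl": "turtle",    # Turtle
}


def rdf_format_for(s3_key: str) -> str:
    """Return the format name for an RDF file, based on its extension."""
    suffix = PurePosixPath(s3_key).suffix.lower()
    try:
        return RDF_FORMATS[suffix]
    except KeyError:
        raise ValueError(f"unrecognized RDF file extension: {suffix!r}") from None


turtle_format = rdf_format_for("graph-data/people.ttl")
ntriples_format = rdf_format_for("graph-data/EDGES.NT")
```

Compressed files such as people.nt.gz are not handled by this sketch; their final extension is .gz, so the function raises ValueError.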

Amazon Neptune ML

Amazon Neptune ML is a new capability of Neptune that uses Graph Neural Networks (GNNs), a machine learning technique purpose-built for graphs, and is designed to make more accurate predictions using graph data. Neptune ML can help improve the accuracy of most predictions for graphs when compared to making predictions using non-graph methods.

Making accurate predictions on graphs with billions of relationships can be difficult and time consuming. Existing ML approaches such as XGBoost can’t operate effectively on graphs because they are designed for tabular data. As a result, using these methods on graphs can take time, require specialized skills from developers, and produce sub-optimal predictions.

Neptune ML uses the Deep Graph Library (DGL), an open-source library to which AWS contributes and that helps users apply deep learning to graph data. Neptune ML is designed to select and train the best ML model for graph data, and lets users run machine learning on their graph directly using Neptune APIs and queries.

Additional Information

For additional information about service controls, security features and functionalities, including, as applicable, information about storing, retrieving, modifying, restricting, and deleting data, please see https://docs.aws.amazon.com/index.html. This additional information does not form part of the Documentation for purposes of the AWS Customer Agreement available at http://aws.amazon.com/agreement, or other agreement between you and AWS governing your use of AWS’s services.