What Is a Graph Database?

The graph database defined

Graph databases are purpose-built to store and navigate relationships. Relationships are first-class citizens in graph databases, and most of the value of graph databases is derived from these relationships. Graph databases use nodes to store data entities, and edges to store relationships between entities. An edge always has a start node, end node, type, and direction, and an edge can describe parent-child relationships, actions, ownership, and the like. There is no limit to the number and kind of relationships a node can have.
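The node-and-edge model described above can be sketched in a few lines of Python. This is only an illustrative in-memory toy, not how any particular graph database stores data; the class names (`Graph`, `Edge`), the field names, and the sample nodes are all assumptions made for the example.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Edge:
    """A directed, typed relationship between two nodes."""
    start: str     # id of the start node
    end: str       # id of the end node
    rel_type: str  # relationship type, e.g. "OWNS" or "PARENT_OF"


@dataclass
class Graph:
    """Toy in-memory graph (illustrative only)."""
    nodes: dict = field(default_factory=dict)  # node id -> properties
    edges: list = field(default_factory=list)  # edges in insertion order

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, start, end, rel_type):
        # An edge always needs an existing start node and end node.
        if start not in self.nodes or end not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        edge = Edge(start, end, rel_type)
        self.edges.append(edge)
        return edge


g = Graph()
g.add_node("alice", kind="person")
g.add_node("bob", kind="person")
g.add_node("car1", kind="vehicle")

# One node can take part in any number and kind of relationships.
g.add_edge("alice", "bob", "PARENT_OF")  # parent-child
g.add_edge("alice", "car1", "OWNS")      # ownership
g.add_edge("bob", "car1", "DRIVES")      # action
```

The direction of each edge is implied by the order of `start` and `end`, so "alice OWNS car1" and "car1 OWNS alice" would be two different edges.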

A graph in a graph database can be traversed along specific edge types or across the entire graph. Traversing relationships is very fast because the relationships between nodes are persisted in the database rather than calculated at query time. Graph databases are advantageous for use cases such as social networking, recommendation engines, and fraud detection, where you need to create relationships between data and query those relationships quickly.
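As a rough illustration of traversing along specific edge types versus the whole graph, the sketch below keeps each node's outgoing edges alongside the node, so a traversal only follows stored links instead of computing matches at query time. The function names, edge type labels, and sample data are hypothetical, and a real graph database would use its own storage and query language.

```python
from collections import defaultdict, deque

# Stored adjacency: node -> list of (edge_type, neighbor) pairs.
adjacency = defaultdict(list)


def add_edge(start, edge_type, end):
    adjacency[start].append((edge_type, end))


def reachable(start, edge_types=None):
    """Breadth-first traversal from `start`.

    With edge_types=None every edge is followed; otherwise only edges
    whose type is in the given set are followed.
    """
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for edge_type, neighbor in adjacency.get(node, ()):
            if edge_types is not None and edge_type not in edge_types:
                continue
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    seen.discard(start)
    return seen


add_edge("alice", "FRIEND", "bob")
add_edge("bob", "FRIEND", "carol")
add_edge("alice", "PURCHASED", "book")
add_edge("carol", "PURCHASED", "lamp")

print(sorted(reachable("alice", {"FRIEND"})))  # ['bob', 'carol']
print(sorted(reachable("alice")))              # ['bob', 'book', 'carol', 'lamp']
```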

The following figure shows an example of a social network graph. Given the people (nodes) and their relationships (edges), you can find out who the "friends of friends" of a particular person are, for example, the friends of Howard's friends.

An example of a social network graph
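A "friends of friends" lookup can be sketched as a two-hop traversal. Only the name Howard comes from the example above; the other names and all of the friendships below are invented for illustration and are not taken from the figure.

```python
# Undirected friendships stored as an adjacency map (hypothetical data).
friends = {
    "Howard": {"Jack", "Bill"},
    "Jack": {"Howard", "Annie"},
    "Bill": {"Howard", "Annie", "Sara"},
    "Annie": {"Jack", "Bill"},
    "Sara": {"Bill"},
}


def friends_of_friends(person):
    """People exactly two hops away: friends of friends who are neither
    the person themselves nor already a direct friend."""
    direct = friends.get(person, set())
    two_hops = set()
    for friend in direct:
        two_hops |= friends.get(friend, set())
    return two_hops - direct - {person}


print(sorted(friends_of_friends("Howard")))  # ['Annie', 'Sara']
```

Whether direct friends should be excluded from the result is a design choice; this sketch excludes them so the result contains only people Howard is not yet connected to.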

Use cases

Fraud detection

Graph databases are capable of sophisticated fraud prevention. With graph databases, you can use relationships to process financial and purchase transactions in near-real time. With fast graph queries, you can detect, for example, that a potential purchaser is using the same email address and credit card as those in a known fraud case. Graph databases can also help you detect relationship patterns such as multiple people associated with the same personal email address, or multiple people sharing the same IP address but residing at different physical addresses.
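The two patterns mentioned above can be sketched by treating each email address, card, and IP address as a node that accounts link to. Everything below is an assumption made for illustration: the account records, the identifiers, and the function names are invented, and a production system would run equivalent queries inside the graph database itself.

```python
from collections import defaultdict

# Hypothetical accounts; every value is made up for this example.
accounts = {
    "acct1": {"email": "a@example.com", "card": "card-111",
              "ip": "203.0.113.5", "address": "1 Oak St"},
    "acct2": {"email": "a@example.com", "card": "card-111",
              "ip": "198.51.100.7", "address": "9 Elm St"},
    "acct3": {"email": "c@example.com", "card": "card-333",
              "ip": "203.0.113.5", "address": "4 Pine Rd"},
    "acct4": {"email": "d@example.com", "card": "card-444",
              "ip": "192.0.2.10", "address": "7 Ash Ln"},
}
known_fraud = {"acct1"}

# Edges from accounts to shared attribute nodes:
# (attribute kind, value) -> set of accounts linked to that value.
linked = defaultdict(set)
for acct, attrs in accounts.items():
    for kind in ("email", "card", "ip"):
        linked[(kind, attrs[kind])].add(acct)


def shares_email_and_card_with_fraud(acct):
    """True if another, known-fraudulent account uses the same email
    address and the same card as `acct`."""
    attrs = accounts[acct]
    same_email = linked[("email", attrs["email"])]
    same_card = linked[("card", attrs["card"])]
    return bool((same_email & same_card & known_fraud) - {acct})


def shared_ips_with_different_addresses():
    """IP addresses used by several accounts whose physical addresses differ."""
    result = {}
    for (kind, value), accts in linked.items():
        if kind != "ip" or len(accts) < 2:
            continue
        if len({accounts[a]["address"] for a in accts}) > 1:
            result[value] = sorted(accts)
    return result


print(shares_email_and_card_with_fraud("acct2"))  # True
print(shared_ips_with_different_addresses())      # {'203.0.113.5': ['acct1', 'acct3']}
```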

Recommendation engines

Graph databases are a good choice for recommendation applications. With graph databases, you can store relationships between information categories such as customer interests, friends, and purchase history in a graph. You can use a highly available graph database to make product recommendations to a user based on which products are purchased by others who follow the same sport and have similar purchase history. Or, you can identify people who have a friend in common but don’t yet know each other, and then make a friendship recommendation.
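A product recommendation of the kind described above might be sketched as follows. The users, sports, products, and the scoring rule (weighting each candidate product by how many purchases the two users have in common) are all assumptions chosen for illustration, not a prescribed algorithm.

```python
from collections import Counter

# Hypothetical "follows sport" and "purchased" relationships.
follows = {
    "ana": {"cycling"},
    "ben": {"cycling"},
    "cy": {"cycling", "golf"},
    "dee": {"golf"},
}
purchased = {
    "ana": {"helmet", "gloves"},
    "ben": {"helmet", "gloves", "pump"},
    "cy": {"helmet", "pump", "lights"},
    "dee": {"clubs"},
}


def recommend_products(user, limit=3):
    """Suggest products bought by users who follow a sport in common with
    `user` and whose purchase history overlaps with the user's."""
    scores = Counter()
    for other in purchased:
        if other == user or not (follows[user] & follows[other]):
            continue
        overlap = len(purchased[user] & purchased[other])
        if overlap == 0:
            continue
        # Products the other user bought that `user` does not have yet.
        for product in purchased[other] - purchased[user]:
            scores[product] += overlap
    # Highest score first; ties broken alphabetically for stable output.
    ranked = sorted(scores.items(), key=lambda item: (-item[1], item[0]))
    return [product for product, _ in ranked[:limit]]


print(recommend_products("ana"))  # ['pump', 'lights']
```

The friendship recommendation mentioned above follows the same shape: replace the purchase overlap with the number of friends two people have in common, and suggest pairs that are not yet directly connected.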

Popular graph databases

Amazon Neptune

Amazon Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latency. Neptune supports two popular graph models, property graph and W3C's Resource Description Framework (RDF), along with their respective query languages, Apache TinkerPop Gremlin and SPARQL, so that you can build queries that efficiently navigate highly connected datasets.

Neptune is highly available, with read replicas, point-in-time recovery, continuous backup to Amazon S3, and replication across Availability Zones. Neptune is secure, with support for encryption at rest. Neptune is fully managed, so you no longer need to worry about database management tasks such as hardware provisioning, software patching, setup, configuration, or backups.

Amazon Neptune announcement at AWS re:Invent 2017

Neo4j

Neo4j is an open-source, nonrelational, native graph database that provides an ACID-compliant transactional backend for your applications. It is called a native graph database because it implements the property graph model efficiently down to the storage level. Neo4j also provides full database characteristics, including cluster support and runtime failover. Neo4j supports its own Cypher query language as well as Gremlin.

To get started using Neo4j, see the AWS Marketplace.