A graph database is a systematic collection of data that emphasizes the relationships between the different data entities. It is a type of NoSQL database that uses concepts from mathematical graph theory to represent data connections. Unlike relational databases, which store data in rigid table structures, graph databases store data as a network of entities and relationships. As a result, these databases often provide better performance and flexibility for highly connected data, because they are better suited to modeling real-world scenarios.
What is a graph
The term “graph” comes from the field of mathematics. A graph contains a collection of nodes and edges.
Nodes are vertices that store the data objects. Each node can have an unlimited number and variety of relationships.
Edges represent relationships between nodes. For example, edges can describe parent-child relationships, actions, or ownership. They can represent both one-to-many and many-to-many relationships. An edge always has a start node, end node, type, and direction.
Each node has properties or attributes that describe it. In some cases, edges have properties as well. Graphs with properties are also called property graphs.
The following property graph shows an example of a social network graph. Given the people (nodes) and their relationships (edges), you can find out who the "friends of friends" of a particular person are—for example, the friends of Howard's friends.
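The "friends of friends" lookup can be sketched in a few lines of Python. This is only an illustration of the idea, not how a graph database stores or queries data internally. Apart from Howard, the names, the `age` property, and the `FRIEND` edge type are invented for the example, and friendship is assumed to be mutual.

```python
from collections import defaultdict

# Hypothetical sample data: node properties keyed by person name.
nodes = {
    "Howard": {"age": 34},
    "Jack": {"age": 29},
    "Bill": {"age": 41},
    "Sarah": {"age": 37},
    "Annie": {"age": 25},
}

# Each edge has a start node, an end node, and a type.
edges = [
    ("Howard", "Jack", "FRIEND"),
    ("Howard", "Bill", "FRIEND"),
    ("Jack", "Sarah", "FRIEND"),
    ("Bill", "Annie", "FRIEND"),
    ("Bill", "Jack", "FRIEND"),
]


def build_adjacency(edges, edge_type):
    """Index edges of one type by node, treating the relationship as mutual."""
    adjacency = defaultdict(set)
    for start, end, kind in edges:
        if kind == edge_type:
            adjacency[start].add(end)
            adjacency[end].add(start)
    return adjacency


def friends_of_friends(adjacency, person):
    """Return people two hops away who are not already direct friends."""
    direct = adjacency[person]
    two_hops = set()
    for friend in direct:
        two_hops |= adjacency[friend]
    return two_hops - direct - {person}


friends = build_adjacency(edges, "FRIEND")
print(sorted(friends_of_friends(friends, "Howard")))  # ['Annie', 'Sarah']
```

Because each node keeps a direct reference to its neighbors, the lookup only touches Howard's friends and their friends, not the whole dataset.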
What are the use cases of graph databases
Graph databases are well suited to use cases such as social networking, recommendation engines, and fraud detection, where you need to create relationships between data and query those relationships quickly.
Graph databases are capable of sophisticated fraud prevention. For example, you can use relationships in graph databases to process financial transactions in near-real time. With fast graph queries, you can detect that a potential purchaser is using the same email address and credit card included in a known fraud case. Graph databases can also help you detect fraud through relationship patterns, such as multiple people associated with a personal email address or multiple people sharing the same IP address but residing in different physical locations.
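As a rough sketch of the relationship patterns described above, the following Python groups accounts by a shared attribute such as an email address or IP address. The account records, field layout, and the `shared_attribute_groups` helper are all hypothetical. A real fraud system would run an equivalent query inside the graph database rather than in application code.

```python
from collections import defaultdict

# Hypothetical account records: (account_id, email, ip_address, city).
accounts = [
    ("acct-1", "a@example.com", "203.0.113.5", "Seattle"),
    ("acct-2", "a@example.com", "198.51.100.7", "Austin"),
    ("acct-3", "c@example.com", "203.0.113.5", "Dublin"),
    ("acct-4", "d@example.com", "192.0.2.44", "Tokyo"),
]


def shared_attribute_groups(accounts, attribute_index):
    """Group account IDs by one attribute and keep values used by more than one account."""
    groups = defaultdict(set)
    for record in accounts:
        groups[record[attribute_index]].add(record[0])
    return {value: ids for value, ids in groups.items() if len(ids) > 1}


shared_emails = shared_attribute_groups(accounts, 1)  # index 1 is the email
shared_ips = shared_attribute_groups(accounts, 2)     # index 2 is the IP address

print(shared_emails)
print(shared_ips)
```

Each shared value acts like a node that connects otherwise unrelated accounts, which is the kind of pattern a graph query can surface directly.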
The graph model is a good choice for applications that provide recommendations. You can store graph relationships between information categories such as customer interests, friends, and purchase history. You can use a highly available graph database to make product recommendations to a user based on which products are purchased by others who have similar interests and purchase histories. You can also identify people who have a mutual friend but don’t yet know each other and then make a friendship recommendation.
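One simple way to picture this kind of recommendation is to score products by how many purchases their buyers share with the target customer. The sketch below assumes a tiny, made-up purchase history. The `recommend` function and its scoring rule are illustrative choices, not a method prescribed by any particular graph database.

```python
from collections import Counter

# Hypothetical purchase edges: customer -> set of purchased products.
purchases = {
    "alice": {"tent", "stove", "lantern"},
    "bob": {"tent", "stove", "backpack"},
    "carol": {"tent", "boots"},
    "dave": {"novel"},
}


def recommend(purchases, customer, limit=3):
    """Rank products bought by customers who share purchases with `customer`."""
    owned = purchases[customer]
    scores = Counter()
    for other, items in purchases.items():
        if other == customer:
            continue
        overlap = len(owned & items)  # shared purchases act as a similarity weight
        if overlap == 0:
            continue
        for product in items - owned:
            scores[product] += overlap
    # Sort by score (highest first), then by name for a stable order.
    ranked = sorted(scores.items(), key=lambda pair: (-pair[1], pair[0]))
    return [product for product, _ in ranked[:limit]]


print(recommend(purchases, "alice"))  # ['backpack', 'boots']
```

In a graph database, the same idea becomes a two-hop traversal: from the customer to their products, then to other customers who bought those products, then to what else those customers bought.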
Route optimization problems involve analyzing a dataset and finding values that best suit a particular scenario. For example, you can use a graph database to find the following:
- The shortest route from point A to B on a map by considering various paths.
- The right employee for a particular shift by analyzing varied availabilities, locations, and skills.
- The optimum machinery for operations by considering parameters like cost and life of the equipment.
Graph queries can analyze these situations much faster because they can count and compare the number of links between two nodes.
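The shortest-route case can be illustrated with Dijkstra's algorithm, a standard technique for weighted graphs. The road map and distances below are invented, and the article does not state which algorithm any given graph database uses, so treat this as a sketch of the general idea.

```python
import heapq

# Hypothetical road map: node -> list of (neighbor, distance) pairs.
roads = {
    "A": [("C", 2), ("D", 5)],
    "C": [("A", 2), ("D", 1), ("B", 7)],
    "D": [("A", 5), ("C", 1), ("B", 3)],
    "B": [("C", 7), ("D", 3)],
}


def shortest_route(graph, start, goal):
    """Return (total_distance, path) using Dijkstra's algorithm, or None if unreachable."""
    queue = [(0, start, [start])]  # priority queue ordered by distance so far
    visited = set()
    while queue:
        distance, node, path = heapq.heappop(queue)
        if node == goal:
            return distance, path
        if node in visited:
            continue
        visited.add(node)
        for neighbor, weight in graph.get(node, []):
            if neighbor not in visited:
                heapq.heappush(queue, (distance + weight, neighbor, path + [neighbor]))
    return None


print(shortest_route(roads, "A", "B"))  # (6, ['A', 'C', 'D', 'B'])
```

The direct-looking route A to D to B costs 8, while the route through C costs 6, which is why the search has to compare several paths rather than take the first one found.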
Graph databases are well suited for discovering complex relationships and hidden patterns in data. For instance, a social media company uses a graph database to distinguish between bot accounts and real accounts. It analyzes account activity to discover connections between account interactions and bot activity.
Graph databases offer techniques for data integration, linked data, and information sharing. They represent complex metadata or domain concepts in a standardized format and provide rich semantics for natural language processing. You can also use these databases for knowledge graphs and master data management. For example, machine learning algorithms distinguish between the Amazon rainforest and the Amazon brand using graph models.
What are the advantages of graph databases
A graph database is custom-built to manage highly connected data. As the connectedness and volume of modern data increase, graph databases present an opportunity to utilize and analyze the data cost-effectively. Here are the three main advantages of graph analytics.
The schema and structure of graph models can change with your applications. Data analysts can add or modify existing graph structures without impacting existing functions. There is no requirement to model domains in advance.
Relational database models become less optimal as the volume and depth of relationships increase. This results in data duplication and redundancy, and queries must process multiple tables to produce results. In contrast, graph databases can be several orders of magnitude faster when querying relationships. Traversal performance tends to stay consistent even when graph data volume increases, because a query only touches the connected part of the graph.
Graph queries are often shorter and more efficient at generating the same reports than equivalent relational queries. Graph technologies take advantage of linked nodes. Traversing relationships is a very fast process, because the relationships between nodes are not calculated at query time but are persisted in the database.
How do graph analytics and graph databases work
Graph databases work using graph query languages and graph algorithms.
Graph query languages
Graph query languages are used to interact with a graph database. Similar to SQL, these languages have features to add, edit, and query data. However, they take advantage of the underlying graph structures to process complex queries efficiently. They provide an interface so you can ask questions about the following:
- Number of hops between nodes
- Longest path/shortest path/optimal paths
- Value of nodes
Apache TinkerPop Gremlin, SPARQL, and openCypher are popular graph query languages.
Graph algorithms
Graph algorithms are operations that analyze relationships and behaviors in interconnected data. For instance, they explore the distance and paths between nodes or analyze incoming edges and neighbor nodes to generate reports. The algorithms can identify common patterns, anomalies, communities, and paths that connect the data elements. Some examples of graph algorithms include:
Applications like image processing, statistics, and data mining use clustering to group nodes based on common characteristics. Clustering can be done on both inter-cluster differences and intra-cluster similarities.
You can partition or cut graphs at the node with the fewest edges. Applications such as network testing use partitioning to find weak spots in the network.
Graph searches or traversals can be one of two types: breadth-first or depth-first. Breadth-first search visits all the neighbors of a node before moving on to nodes that are farther away, which makes it useful in optimal path discovery. Depth-first search follows a single branch as far as it can before backtracking, which is useful for finding all relations of a particular node.
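The difference between the two traversal types shows up in the order that nodes are visited. The following minimal Python sketch uses a small, made-up directed graph. The function names are illustrative only.

```python
from collections import deque

# Hypothetical directed graph: node -> list of neighbors.
graph = {
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}


def bfs_order(graph, start):
    """Visit nodes level by level, nearest neighbors first."""
    visited = [start]
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbor in graph[node]:
            if neighbor not in visited:
                visited.append(neighbor)
                queue.append(neighbor)
    return visited


def dfs_order(graph, start, visited=None):
    """Follow one branch to its end before backtracking to the next."""
    if visited is None:
        visited = []
    visited.append(start)
    for neighbor in graph[start]:
        if neighbor not in visited:
            dfs_order(graph, neighbor, visited)
    return visited


print(bfs_order(graph, "A"))  # ['A', 'B', 'C', 'D']
print(dfs_order(graph, "A"))  # ['A', 'B', 'D', 'C']
```

Breadth-first search reaches both of A's neighbors before D, while depth-first search goes from B straight down to D before returning to C.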
When are graph databases not suitable
A dedicated graph database provides the most value for highly connected datasets and any analyses that require searching for hidden and apparent relationships. If this doesn’t fit your use case, other database types may be better suited.
For example, imagine a scenario where you need to record product inventory by item. You only need to store details like item name and available units. Since you don’t need to retain additional information, the columns on the table will not change. Because this data is tabular and has no relationships to traverse, a relational database is better suited for it.
It is also important not to use graph databases simply as key-value stores. Looking up a value by a known key does not take advantage of what graph databases were designed to do.
How can AWS support your graph database requirements
Amazon Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latency. Neptune supports two popular graph models: property graph and W3C's Resource Description Framework (RDF). It also supports their respective query languages, Apache TinkerPop Gremlin and SPARQL, so you can build queries that efficiently navigate highly connected datasets. The top features of Neptune include:
- Serverless—enabling you to instantly scale graph workloads in fine-grained increments and save up to 90% on database costs vs. provisioning for peak capacity.
- Highly available—including Amazon Neptune Global Database for globally distributed applications supporting fast local read performance.
- Decoupled storage and compute so you can increase read performance with up to 15 read replicas that share the same underlying storage, without having to perform writes at the replica nodes.
- Highly reliable and durable with fault-tolerant and self-healing storage, point-in-time recovery, continuous backups, and more. Amazon Neptune makes your data durable across three AZs within a Region by replicating new writes six ways while you only pay for one copy.
- Highly secure with default encryption at rest, network isolation, and advanced auditing, while providing the ability to control resource-level permissions with fine-grained access.
- Broad compliance coverage, from FedRAMP (Moderate and High) to SOC (1, 2, and 3), plus HIPAA eligibility.
- Fully managed, so you no longer need to worry about database management tasks such as hardware provisioning, software patching, setup, configuration, or backups.
Get started with graph databases on AWS by creating a free account today.