Knowledge Graphs on AWS
Build a knowledge graph in Amazon Neptune by structuring and organizing information for easier access and understanding.
What is a knowledge graph?
If you searched the web for the longest river in the world, browsed through a list of recommended movies to watch on a Friday, or checked your daily schedule with Alexa, chances are you were interacting with a knowledge graph. Simply put, a knowledge graph is a means of structuring and organizing information for easier access and understanding. It “democratizes” data in an organization by allowing more people to understand and access data.
Graphs are a natural way to model and represent information about the world. The idea is not new, but scalable graph databases have made it far more practical. Unlike traditional ways of managing data, such as relational databases, graph modeling is very flexible and accommodates the real-world diversity and heterogeneity of data. This lets us model complex subject matter.
A knowledge graph captures the semantics of a particular domain using a set of definitions of concepts, their properties, relations between them, and logical constraints that are expected to hold. Logic built into such a model allows us to reason about a graph and information contained within, and to make implicit information in the graph explicitly accessible.
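As a rough illustration of making implicit information explicit, the sketch below stores facts as (subject, predicate, object) tuples and applies a single RDFS-style rule: if an entity has type C and C is a subclass of D, then the entity also has type D. The entity and class names are invented for the example, and real reasoners (including OWL-based ones) support far richer logic than this one rule.

```python
from typing import Set, Tuple

Triple = Tuple[str, str, str]


def infer_types(triples: Set[Triple]) -> Set[Triple]:
    """Repeat one rule until nothing new is derived:
    (x, type, C) and (C, subClassOf, D)  =>  (x, type, D)."""
    facts = set(triples)
    changed = True
    while changed:
        changed = False
        subclass = {(s, o) for s, p, o in facts if p == "subClassOf"}
        typed = {(s, o) for s, p, o in facts if p == "type"}
        for entity, cls in typed:
            for sub, sup in subclass:
                if cls == sub and (entity, "type", sup) not in facts:
                    facts.add((entity, "type", sup))
                    changed = True
    return facts


# Hypothetical facts stated explicitly in the graph.
graph = {
    ("Nile", "type", "River"),
    ("River", "subClassOf", "BodyOfWater"),
    ("BodyOfWater", "subClassOf", "GeographicFeature"),
}

# Facts that were only implicit before reasoning.
inferred = infer_types(graph) - graph
print(sorted(inferred))
```

Nothing in the input says that the Nile is a body of water or a geographic feature; both statements are derived from the class hierarchy.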
Knowledge graphs consolidate and integrate an organization’s information assets and make them more readily available to all members of the organization, which enables many applications and use cases. Information from disparate data sources can be linked and made accessible to answer questions you may not even have thought of yet. Information and entities can be extracted not only from structured sources (e.g., relational databases) but also from semi-structured sources (e.g., media metadata, spreadsheets) and unstructured sources (e.g., text documents, email, news articles).
Technologies for building knowledge graphs
The most effective way to build and store a knowledge graph is to use a graph model and a graph database. Graph databases are purpose-built to store and navigate relationships. Graph databases make it easier to model and manage highly connected data, treat relationships as “first class citizens,” have flexible schemas, and provide higher performance for graph traversal queries. There are different graph technologies that you may consider.
The World Wide Web Consortium (W3C) has created a set of technical specifications for representing graphs, querying them, and building graph schemas. The specifications for Resource Description Framework (RDF) graphs, the SPARQL query language, and the Web Ontology Language (OWL) collectively make up what is known as the Semantic Web, a vision of how knowledge could be structured and managed in a distributed fashion. Work on the Semantic Web in the early 2000s laid the groundwork for modern knowledge graphs; the seminal article on the topic was published in Scientific American in 2001.
There are also other, de facto industry standards for graphs. Property graphs offer developers a different way to structure their graph model. Neptune supports both RDF and property graphs, the latter through the Gremlin query language from the Apache TinkerPop open source project.
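To make the difference between the two models concrete, here is a minimal sketch that holds a tiny property graph in plain Python objects and flattens it into RDF-style triples. The `Vertex` and `Edge` classes, the `http://example.org/` namespace, and the sample data are all assumptions made for illustration; they are not Neptune or TinkerPop APIs.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class Vertex:
    id: str
    label: str
    properties: Dict[str, str] = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    label: str
    target: str
    properties: Dict[str, str] = field(default_factory=dict)


def to_triples(vertices: List[Vertex], edges: List[Edge],
               base: str = "http://example.org/") -> List[Tuple[str, str, str]]:
    """Flatten a property graph into (subject, predicate, object) triples."""
    triples = []
    for v in vertices:
        triples.append((base + v.id, "rdf:type", base + v.label))
        for key, value in v.properties.items():
            # Vertex properties become triples with literal objects.
            triples.append((base + v.id, base + key, f'"{value}"'))
    for e in edges:
        # A plain triple has no slot for edge properties, so this
        # simplified conversion drops them.
        triples.append((base + e.source, base + e.label, base + e.target))
    return triples


vertices = [
    Vertex("nile", "River", {"name": "Nile"}),
    Vertex("egypt", "Country", {"name": "Egypt"}),
]
edges = [Edge("nile", "flowsThrough", "egypt", {"source": "atlas"})]

triples = to_triples(vertices, edges)
for triple in triples:
    print(triple)
```

In the property graph, the `flowsThrough` edge carries its own `source` property. In the triple form every statement is a uniform three-part fact, which is why this simple conversion loses the edge property.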
Enterprise knowledge graphs
While the Semantic Web was intended to be a vast effort to make lots of knowledge accessible on the Web, an enterprise knowledge graph can use the graph technologies in the context of a single organization or company. An enterprise knowledge graph can integrate information from different parts of the company and between proverbial “data silos”, turning the data an organization collects into a valuable and useful asset. This promotes information sharing and reuse. Federated query capabilities built into the Amazon Neptune graph database allow you to build enterprise knowledge graphs that can also access public data on the Web to augment the knowledge contained therein.
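A federated query of this kind uses the `SERVICE` keyword from SPARQL 1.1 Federated Query to send part of the pattern to a remote endpoint. The sketch below only builds the query text. The `ex:` vocabulary, the use of `owl:sameAs` as the link to external entities, and the endpoint URL are placeholders assumed for the example, not part of any real dataset.

```python
def build_federated_query(remote_endpoint: str, limit: int = 10) -> str:
    """Return a SPARQL 1.1 query that joins local data with labels
    fetched from a remote SPARQL endpoint via a SERVICE clause."""
    return f"""
PREFIX ex: <http://example.org/>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX owl: <http://www.w3.org/2002/07/owl#>

SELECT ?product ?externalLabel
WHERE {{
  # Evaluated against the local graph.
  ?product a ex:Product ;
           owl:sameAs ?externalEntity .
  # Evaluated by the remote endpoint.
  SERVICE <{remote_endpoint}> {{
    ?externalEntity rdfs:label ?externalLabel .
  }}
}}
LIMIT {limit}
""".strip()


# Placeholder endpoint; substitute a SPARQL 1.1 endpoint you can reach.
query = build_federated_query("https://example.org/sparql")
print(query)
```

The resulting string could be submitted to a SPARQL endpoint with any HTTP client. Whether the remote service is reachable from the database depends on the network configuration, which this sketch does not cover.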
Using Amazon Neptune to build an enterprise knowledge graph
Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. It is purpose-built for storing billions of relationships and querying the graph with millisecond latency. Neptune is compatible with open graph APIs and supports two popular graph models, Property Graph and W3C's RDF, along with their respective query languages, Apache TinkerPop Gremlin and SPARQL. Graph databases usually require extensive hardware management, provisioning, and manual scaling; because Amazon Neptune is fully managed, you no longer have to worry about these database management tasks. You can be up and running with an Amazon Neptune graph cluster in minutes, with a few clicks in the AWS Management Console or with the AWS CLI.
Example - Steps to build an enterprise knowledge graph
1
Data integration
You can integrate data from Amazon S3 directly into Amazon Neptune using the bulk loader. You can also integrate data from relational databases using AWS DMS.
2
Data extraction
Extract entities from unstructured text using Amazon Comprehend, Amazon Comprehend Medical, and Amazon Textract. You can extract metadata from audio using Amazon Transcribe and from video using Amazon Rekognition. You can then write the extracted entities into Amazon Neptune.
3
Graph database
Build your knowledge graph in Amazon Neptune, using the triple store model to store your knowledge base. Amazon Neptune is compatible with open graph APIs and supports two popular graph models, Property Graph and W3C's RDF, along with their respective query languages, Apache TinkerPop Gremlin and SPARQL. You can take advantage of Neptune Streams to detect changes to a graph in real time and trigger AWS Lambda functions.
4
Federated query
You may want to express queries across diverse data sources. Neptune implements W3C SPARQL 1.1 Federated Query, so a single query can span multiple Neptune clusters or pull data from other SPARQL 1.1 endpoints. You can also use Amazon Athena Federated Query to write data into Amazon S3 and then load it into Neptune.
5
Visualization
You can use the Neptune Workbench to visualize your graph data using Jupyter Notebooks.
6
Search
You can search unstructured data using Amazon OpenSearch Service and knowledge graphs using Amazon Kendra.
7
End-user application
You can build applications such as chatbots, drug discovery tools, investment analysis tools, and supply chain dashboards on top of the enterprise knowledge graph.
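Steps 1 and 2 above can be tied together in a small sketch: entities in the shape returned by Amazon Comprehend's `detect_entities` call (a list of dicts with `Text`, `Type`, and `Score`) are turned into a vertex file in the Gremlin CSV load format, and a request body is prepared for the Neptune bulk loader's `/loader` endpoint. The sample entities, the ID scheme, the bucket name, the role ARN, and the region are all made up for illustration, and nothing here calls AWS.

```python
import csv
import io
import json
from typing import Dict, List


def entities_to_vertex_csv(entities: List[Dict]) -> str:
    """Render extracted entities as a Gremlin-format vertex CSV."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    # ~id and ~label are system columns; the others are typed properties.
    writer.writerow(["~id", "~label", "name:String", "score:Double"])
    seen = set()
    for entity in entities:
        # Hypothetical ID scheme: entity type plus normalized text.
        vertex_id = f"{entity['Type']}:{entity['Text']}".lower().replace(" ", "_")
        if vertex_id in seen:
            continue  # keep the first mention of each entity only
        seen.add(vertex_id)
        writer.writerow([vertex_id, entity["Type"], entity["Text"], entity["Score"]])
    return buffer.getvalue()


def build_loader_request(s3_uri: str, iam_role_arn: str, region: str) -> Dict[str, str]:
    """Build the JSON body for a POST to https://<cluster-endpoint>:8182/loader."""
    return {
        "source": s3_uri,
        "format": "csv",
        "iamRoleArn": iam_role_arn,
        "region": region,
        "failOnError": "FALSE",
    }


# Sample data shaped like a detect_entities response (values invented).
entities = [
    {"Text": "Siemens", "Type": "ORGANIZATION", "Score": 0.99},
    {"Text": "Munich", "Type": "LOCATION", "Score": 0.98},
    {"Text": "Siemens", "Type": "ORGANIZATION", "Score": 0.97},
]

csv_text = entities_to_vertex_csv(entities)
loader_request = build_loader_request(
    "s3://example-bucket/entities/",                     # placeholder bucket
    "arn:aws:iam::123456789012:role/ExampleLoaderRole",  # placeholder role
    "us-east-1",
)
print(csv_text)
print(json.dumps(loader_request, indent=2))
```

In a real pipeline the CSV would first be uploaded to the S3 location named in `source`, and the request body would then be posted to the cluster's loader endpoint by a client running inside the cluster's VPC.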
Benefits of Amazon Neptune for knowledge graphs
Highly scalable and available
Fully-managed and cost-effective
Query federation
Customers
-
Siemens
Siemens is a global powerhouse focusing on the areas of electrification, automation, and digitalization. The company faced isolated data silos across departments, which resulted in inaccessible data, inefficient workflows, and low data quality. It built industrial knowledge graphs to capture Siemens domain knowledge and to provide knowledge graphs as a service.
-
ADP
ADP built its next generation HCM, within its Lifion digital transformation unit, using Amazon Neptune. Neptune makes it easy to build queries that efficiently navigate highly connected datasets, enabling the next generation HCM team to build applications that use ADP’s wealth of data to answer complex workplace questions for a variety of use cases. With the fully managed Amazon Neptune, ADP eliminated database licensing, reduced Amazon EC2 costs, and enabled its team to focus on core business operations rather than database maintenance. Most importantly, the purpose-built graph database let ADP power complex queries and deliver advanced HR applications to its customers that it otherwise could not have built.
“We felt that Amazon Neptune was a slam dunk because our application was already using these open standards” — Zaid Masud, Chief Architect, ADP's next gen HCM
-
Audible for Business
Audible for Business needed to enable its enterprise customers’ administrators to maintain their own sets of end users and required a database that would scale to seamlessly manage the complex network of relationships. The company used Amazon Neptune, a managed graph database, to provide automated reporting and an increased self-service experience that could scale to support hundreds of thousands of end users.
“Amazon Neptune helped give our team more flexibility, so we could take our product to the next level for our customers.”
Kristina Flora, Audible for Business product marketing manager
-
Amazon Neptune
Amazon Neptune is a key part of the toolkit we use to continually expand Alexa’s knowledge graph for our tens of millions of Alexa customers — it is just Day 1 and we are excited to continue our work with the AWS team to deliver even better experiences for our customers
David Hardcastle, Director of Amazon Alexa.