Knowledge Graphs on AWS
Build a knowledge graph in Amazon Neptune by structuring and organizing information for easier access and understanding.
What is a knowledge graph?
If you searched the web for the longest river in the world, browsed through a list of recommended movies to watch on a Friday, or checked your daily schedule with Alexa, chances are you were interacting with a knowledge graph. Simply put, a knowledge graph is a means of structuring and organizing information for easier access and understanding. It “democratizes” data in an organization by allowing more people to understand and access data.
Graphs are a natural way to model and represent information about the world. This idea is not new, but it has become more viable with the introduction of scalable graph databases. Unlike traditional ways of managing data, such as relational databases, graph modeling is very flexible and allows for the real-world diversity and heterogeneity of data. This lets us model complex subject matter.
A knowledge graph captures the semantics of a particular domain using a set of definitions of concepts, their properties, relations between them, and logical constraints that are expected to hold. Logic built into such a model allows us to reason about a graph and information contained within, and to make implicit information in the graph explicitly accessible.
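As a rough illustration of how built-in logic can make implicit information explicit, the sketch below stores a graph as (subject, predicate, object) triples and applies a single rule: if an entity has a type, it also has every superclass of that type. The `ex:` names and the `infer_types` helper are invented for this example; a production system would typically rely on an RDF or OWL reasoner rather than hand-written rules.

```python
# Minimal sketch: a knowledge graph as a set of (subject, predicate, object)
# triples plus one inference rule. All ex: names are made up for illustration.

def infer_types(triples):
    """Return the input triples plus rdf:type facts implied by rdfs:subClassOf."""
    facts = set(triples)
    while True:
        derived = set()
        for subject, predicate, cls in facts:
            if predicate != "rdf:type":
                continue
            # Rule: (x rdf:type C) and (C rdfs:subClassOf D) implies (x rdf:type D)
            for sub, pred, sup in facts:
                if pred == "rdfs:subClassOf" and sub == cls:
                    derived.add((subject, "rdf:type", sup))
        if derived <= facts:
            return facts  # nothing new was derived, so we have reached a fixpoint
        facts |= derived


triples = {
    ("ex:Nile", "rdf:type", "ex:River"),
    ("ex:River", "rdfs:subClassOf", "ex:BodyOfWater"),
    ("ex:BodyOfWater", "rdfs:subClassOf", "ex:GeographicFeature"),
}

inferred = infer_types(triples)
# The graph never states that the Nile is a geographic feature,
# but the rule makes that fact explicitly available.
print(("ex:Nile", "rdf:type", "ex:GeographicFeature") in inferred)
```

The loop repeats until no new facts appear, so chains of subclass relations of any length are followed.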
Knowledge graphs consolidate and integrate an organization’s information assets and make them more readily available to all members of the organization. There are many applications and use cases that are enabled by knowledge graphs. Information from disparate data sources can be linked and made accessible to answer questions you may not even have thought of yet. Information and entities can be extracted not only from structured sources (e.g., relational databases) but also from semi-structured sources (e.g., media metadata, spreadsheets) and unstructured sources (e.g., text documents, email, news articles).
Technologies for building knowledge graphs
The most effective way to build and store a knowledge graph is to use a graph model and a graph database. Graph databases are purpose-built to store and navigate relationships. Graph databases make it easier to model and manage highly connected data, treat relationships as “first class citizens,” have flexible schemas, and provide higher performance for graph traversal queries. There are different graph technologies that you may consider.
The World Wide Web Consortium (W3C) has created a set of technical specifications for representing graphs, querying them, and building graph schemas. The specifications for the Resource Description Framework (RDF), the SPARQL query language, and the Web Ontology Language (OWL) collectively make up what is known as the Semantic Web, a vision of how knowledge could be structured and managed in a distributed fashion. Work on the Semantic Web in the early 2000s laid the groundwork for modern knowledge graphs. The seminal article on this topic was published in Scientific American in 2001.
There are also other, de facto industry standards for graphs. Property graphs offer developers a different way to structure their graph model. Neptune supports both RDF and property graphs, the latter through the Gremlin query language from the Apache TinkerPop open source project.
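To make the difference between the two models more concrete, the sketch below writes the same small fact set in both shapes using plain Python data structures. The vertex and edge layout, the `ex:` prefix, and the `to_triples` helper are assumptions made for illustration; they are not Neptune's storage format or API.

```python
# RDF view: every fact is one (subject, predicate, object) triple.
rdf_triples = [
    ("ex:alice", "rdf:type", "ex:Person"),
    ("ex:alice", "ex:name", "Alice"),
    ("ex:alice", "ex:worksOn", "ex:projectX"),
]

# Property graph view: vertices and edges are records that carry a label
# and their own key/value properties.
vertices = {
    "alice": {"label": "Person", "properties": {"name": "Alice"}},
    "projectX": {"label": "Project", "properties": {}},
}
edges = [
    {"label": "worksOn", "from": "alice", "to": "projectX",
     "properties": {"since": 2020}},
]


def to_triples(vertices, edges, prefix="ex:"):
    """Flatten a property graph into plain triples (hypothetical helper)."""
    triples = []
    for vertex_id, vertex in vertices.items():
        triples.append((prefix + vertex_id, "rdf:type", prefix + vertex["label"]))
        for key, value in vertex["properties"].items():
            triples.append((prefix + vertex_id, prefix + key, value))
    for edge in edges:
        # Edge properties such as "since" are dropped here: a single plain
        # triple has no place to attach them.
        triples.append((prefix + edge["from"], prefix + edge["label"],
                        prefix + edge["to"]))
    return triples


flattened = to_triples(vertices, edges)
print(flattened)
```

The dropped `since` property shows one practical difference: property graphs attach attributes directly to relationships, while plain triples need extra modeling to say something about a relationship.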
Enterprise knowledge graphs
While the Semantic Web was intended to be a vast effort to make lots of knowledge accessible on the Web, an enterprise knowledge graph can use the graph technologies in the context of a single organization or company. An enterprise knowledge graph can integrate information from different parts of the company and between proverbial “data silos”, turning the data an organization collects into a valuable and useful asset. This promotes information sharing and reuse. Federated query capabilities built into the Amazon Neptune graph database allow you to build enterprise knowledge graphs that can also access public data on the Web to augment the knowledge contained therein.
Why do you need a knowledge graph?
Analyze unstructured and structured data together
Data, documents, and processes may be stored across teams and tools. By linking relevant pieces of data, you can build systems that can recommend people to projects, connect related projects, or centralize access to avoid duplicate efforts. You can extract entities from text-heavy content such as emails, Word documents, PDFs, and spreadsheets, or metadata from video, audio, and photos, to build a knowledge graph. You can augment this knowledge graph with structured data from CRM and ERP systems to get a comprehensive view of a product.
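As one possible sketch of recommending people to projects, the example below links people and projects through shared skill nodes and ranks people by how many of a project's required skills they hold. The edge labels (`hasSkill`, `requiresSkill`), the sample data, and the `recommend_people` function are all hypothetical; in Neptune the same idea would be expressed as a Gremlin traversal or SPARQL query.

```python
from collections import Counter

# Hypothetical edges linking people and projects through skill nodes.
edges = [
    ("alice", "hasSkill", "graph-modeling"),
    ("alice", "hasSkill", "sparql"),
    ("bob", "hasSkill", "sparql"),
    ("carol", "hasSkill", "java"),
    ("search-revamp", "requiresSkill", "sparql"),
    ("search-revamp", "requiresSkill", "graph-modeling"),
]


def recommend_people(edges, project):
    """Rank people by the number of skills they share with the project."""
    required = {obj for subj, pred, obj in edges
                if subj == project and pred == "requiresSkill"}
    scores = Counter(subj for subj, pred, obj in edges
                     if pred == "hasSkill" and obj in required)
    return scores.most_common()  # highest overlap first


recommendations = recommend_people(edges, "search-revamp")
print(recommendations)
```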
Improve process efficiency and analyze dependencies
In manufacturing, you can use a knowledge graph to track the different stages of building and delivering a product, from changes in inventory levels to store shipments. In life sciences, you can use a knowledge graph to track experiments, trials, and characteristics of drugs. In financial services, you can build a knowledge graph that links a security, the holding company of the security, and the beneficial holding. You can augment this graph with relations recorded from social media and industry events to provide insights into dependencies between firms.
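A dependency question such as "which firms are affected if this one fails?" amounts to a graph traversal. The sketch below answers it with a breadth-first search over a small, made-up dependency map; the firm names and the `affected_by` function are illustrative assumptions only.

```python
from collections import deque

# Hypothetical dependency edges: each firm maps to the firms it depends on.
depends_on = {
    "RetailCo": ["LogisticsCo", "PaymentsCo"],
    "LogisticsCo": ["FuelCo"],
    "PaymentsCo": ["CloudCo"],
    "FuelCo": [],
    "CloudCo": [],
}


def affected_by(depends_on, failed):
    """Return every firm that depends, directly or indirectly, on `failed`."""
    # Invert the edges so we can walk from a firm to its dependents.
    dependents = {}
    for firm, dependencies in depends_on.items():
        for dependency in dependencies:
            dependents.setdefault(dependency, []).append(firm)

    seen = {failed}
    queue = deque([failed])
    while queue:
        current = queue.popleft()
        for firm in dependents.get(current, []):
            if firm not in seen:  # guards against cycles in the graph
                seen.add(firm)
                queue.append(firm)
    return seen - {failed}


impacted = affected_by(depends_on, "FuelCo")
print(sorted(impacted))
```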
Build virtual assistants, chatbots or question-answering systems
Build context-aware systems that can arrive at an answer based on queries and a vast knowledge base. As an example, sales teams can use a question-answering system to analyze customer requirements and available collateral, and connect with experts to provide the best purchase action to customers.
Insights with Machine Learning
You can use machine learning services with knowledge graphs for better decision making and knowledge discovery. You can use machine learning services to build a knowledge graph from unstructured data such as text, audio or video. You can also use the knowledge graph as input to machine learning to build smarter systems to detect fraud or recommend a product.
Using Amazon Neptune to build an enterprise knowledge graph
Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. Amazon Neptune is purpose-built for storing billions of relationships and querying the graph with millisecond latency. Amazon Neptune is compatible with open graph APIs, and supports the popular graph models Property Graph and W3C's RDF, and their respective query languages Apache TinkerPop Gremlin and SPARQL. While graph databases usually require extensive hardware management, provisioning, and manual scaling, Amazon Neptune is a fully managed service, so you no longer have to worry about database management tasks. You can be up and running with an Amazon Neptune graph cluster in a matter of minutes with a few clicks in the AWS Management Console or with the AWS CLI.
Example - Steps to build an enterprise knowledge graph
- Data integration: You can integrate data from Amazon S3 directly into Amazon Neptune using the bulk loader. You can also integrate data from relational databases using AWS DMS.
- Data extraction: Extract entities from unstructured text using Amazon Comprehend, Amazon Comprehend Medical, and Amazon Textract. You can extract metadata from audio using Amazon Transcribe and from video using Amazon Rekognition. You can write the extracted entities into Amazon Neptune.
- Graph database: Build your knowledge graph in Amazon Neptune, using the triple store model to store your knowledge base. Amazon Neptune is compatible with open graph APIs, and supports the popular graph models Property Graph and W3C's RDF, and their respective query languages Apache TinkerPop Gremlin and SPARQL. You can take advantage of Neptune Streams to detect real-time changes to a graph and trigger AWS Lambda functions.
- Federated query: You may want to express queries across diverse data sources. Neptune implements W3C SPARQL 1.1 Federated Query, so a single query can span multiple Neptune clusters or pull source data from other SPARQL 1.1 endpoints. You can also use Athena Federated Query to write data into S3 and then load it into Neptune.
- Visualization: You can use the Neptune Workbench to visualize your graph data using Jupyter Notebooks.
- Search: You can search unstructured data using Amazon Elasticsearch Service and knowledge graphs using Amazon Kendra.
- End-user application: You can build web applications such as chatbots, drug discovery tools, investment analysis tools, and supply chain dashboards using the enterprise knowledge graph.
Sample steps to build a knowledge graph using Amazon Neptune
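The extraction and integration steps above can be sketched roughly as follows: extracted entities are turned into RDF N-Triples lines, and a request body is prepared for the Neptune bulk loader. The entity dictionary shape, the `http://example.org/kg/` namespace, the predicate names, and the S3 path, role ARN, and region values are placeholders assumed for this example. The sketch only builds the data; it does not upload anything to S3 or call Neptune.

```python
import json

BASE = "http://example.org/kg/"  # hypothetical namespace for this sketch


def escape_literal(text):
    """Escape characters that are not allowed raw inside an N-Triples literal."""
    return (text.replace("\\", "\\\\")
                .replace('"', '\\"')
                .replace("\n", "\\n")
                .replace("\r", "\\r"))


def entities_to_ntriples(doc_id, entities):
    """Turn extracted entities into N-Triples lines.

    `entities` is a simplified, assumed shape: [{"text": ..., "type": ...}].
    `doc_id` is assumed to be safe to embed in an IRI.
    """
    lines = []
    for index, entity in enumerate(entities):
        subject = f"<{BASE}{doc_id}/entity/{index}>"
        lines.append(f"{subject} <{BASE}mentionedIn> <{BASE}{doc_id}> .")
        lines.append(f'{subject} <{BASE}entityType> "{escape_literal(entity["type"])}" .')
        lines.append(f'{subject} <{BASE}text> "{escape_literal(entity["text"])}" .')
    return "\n".join(lines)


def bulk_load_request(s3_uri, iam_role_arn, region):
    """Build the JSON body for a bulk load of N-Triples data from S3."""
    return {
        "source": s3_uri,
        "format": "ntriples",
        "iamRoleArn": iam_role_arn,
        "region": region,
        "failOnError": "FALSE",
    }


ntriples = entities_to_ntriples(
    "doc1", [{"text": 'Amazon "Neptune"', "type": "ORGANIZATION"}]
)
payload = bulk_load_request(
    "s3://example-bucket/kg/doc1.nt",                  # placeholder bucket and key
    "arn:aws:iam::123456789012:role/ExampleLoadRole",  # placeholder role ARN
    "us-east-1",                                       # placeholder region
)
print(ntriples)
print(json.dumps(payload, indent=2))
```

In practice the N-Triples text would be written to the S3 location named in `source`, and the JSON body would be posted to the loader endpoint of the Neptune cluster.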
Benefits of Amazon Neptune for knowledge graphs
Highly scalable and available
Large knowledge graphs can store and process billions or even trillions of facts and relationships. With Amazon Neptune, you can scale the compute and memory resources powering your production graph cluster up or down by creating new replica instances of the desired size, or by removing instances. Based on your database usage, your Amazon Neptune storage will automatically grow up to 64 TB, in 10 GB increments, with no impact on database performance. There is no need to provision storage in advance. Amazon Neptune is highly available, with read replicas, point-in-time recovery, continuous backup, and replication across Availability Zones (AZs).
Fully-managed and cost-effective
Amazon Neptune reduces the cost of managing your graph database by eliminating the need for hardware and software investments and reducing operational burden. A knowledge graph built on Amazon Neptune will enable you to build a cost-effective, scalable, secure, and highly available customer data platform with your own proprietary business rules to respond to customer signals in real time and inform your advertising and marketing journey orchestration workflows.
Data for knowledge graphs can be sourced from within an organization or from public endpoints. Further, data can be partitioned across several Neptune clusters for performance or security reasons. With Amazon Neptune, you can use SPARQL 1.1 Federated Query to express queries across diverse data sources, whether the data is stored natively as RDF or viewed as RDF via middleware. For example, using a single SPARQL query you can combine data from Neptune with data coming from DBpedia.
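A federated query of this kind uses the SPARQL 1.1 `SERVICE` keyword to send part of the graph pattern to a remote endpoint. The sketch below only assembles such a query as a string. The `build_federated_query` helper, the choice of `owl:sameAs` links in the local data, and the specific DBpedia property are assumptions for illustration, and nothing is sent over the network.

```python
def build_federated_query(variables, local_pattern, remote_endpoint, remote_pattern):
    """Assemble a SPARQL 1.1 query whose SERVICE block runs on a remote endpoint."""
    select_clause = " ".join(variables)
    return (
        f"SELECT {select_clause} WHERE {{\n"
        f"  {local_pattern}\n"
        f"  SERVICE <{remote_endpoint}> {{\n"
        f"    {remote_pattern}\n"
        f"  }}\n"
        f"}}"
    )


# Assumption: the local graph links its own city resources to DBpedia
# resources with owl:sameAs. The remote pattern then fetches a population
# figure for the linked resource from DBpedia's public endpoint.
query = build_federated_query(
    variables=["?city", "?population"],
    local_pattern="?city <http://www.w3.org/2002/07/owl#sameAs> ?dbpediaCity .",
    remote_endpoint="https://dbpedia.org/sparql",
    remote_pattern=(
        "?dbpediaCity <http://dbpedia.org/ontology/populationTotal> ?population ."
    ),
)
print(query)
```

The pattern outside the `SERVICE` block is evaluated against the local graph, while the pattern inside it is evaluated by the remote endpoint, and the results are joined on the shared variable `?dbpediaCity`.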
Customer case studies
Siemens is a global powerhouse focusing on the areas of electrification, automation, and digitalization. They were faced with isolated data silos from different departments that resulted in data inaccessibility, inefficient workflows, and low data quality. They built an industrial knowledge graph to capture Siemens domain knowledge and to provide knowledge graphs as a service.
ADP built its next generation HCM, within its Lifion digital transformation unit, using Amazon Neptune. Neptune makes it easy to build queries that efficiently navigate highly connected datasets, enabling the next generation HCM team to build applications that use ADP’s wealth of data to answer complex workplace questions for a variety of use cases. With the fully managed Amazon Neptune, ADP eliminated database licensing, reduced Amazon EC2 costs, and enabled its team to focus on core business operations rather than database maintenance. But most importantly, it was able to use the purpose-built graph database to power complex queries and deliver to its customers advanced HR applications that it wouldn’t have been able to otherwise.
“We felt that Amazon Neptune was a slam dunk because our application was already using these open standards” — Zaid Masud, Chief Architect, ADP's next gen HCM
Audible for Business needed to enable its enterprise customers’ administrators to maintain their own sets of end users and required a database that would scale to seamlessly manage the complex network of relationships. The company used Amazon Neptune, a managed graph database, to provide automated reporting and an increased self-service experience that could scale to support hundreds of thousands of end users.
“Amazon Neptune helped give our team more flexibility, so we could take our product to the next level for our customers.” — Kristina Flora, Audible for Business product marketing manager
“Amazon Neptune is a key part of the toolkit we use to continually expand Alexa’s knowledge graph for our tens of millions of Alexa customers — it is just Day 1 and we are excited to continue our work with the AWS team to deliver even better experiences for our customers,” — David Hardcastle, Director of Amazon Alexa.
Get started with Amazon Neptune, a fully managed graph database
Amazon Neptune is a fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets. The core of Amazon Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latency. Amazon Neptune supports the popular graph models Property Graph and W3C's RDF, and their respective query languages Apache TinkerPop Gremlin and SPARQL, allowing you to easily build queries that efficiently navigate highly connected datasets. Neptune powers graph use cases such as knowledge graphs, identity graphs, recommendation engines, fraud detection, drug discovery, and network security.
- Knowledge Graph Conference 2020 Talk: Are Knowledge Graphs a good thing? What does it take to build one
- Thermo Fisher Scientific use case - Complement Commercial Intelligence by Building a Knowledge Graph out of a Data Warehouse with Amazon Neptune