AWS Database Blog

Discover more insights in your graphs with new features from Amazon Neptune ML

Amazon Neptune ML is a feature of Amazon Neptune that brings state-of-the-art graph neural network (GNN) models to all graph developers. You can use Neptune ML for tasks like node classification, node regression, and link prediction, training GNN models powered by the Deep Graph Library (DGL) to extract insights from graph data in applications ranging from fraud detection to recommendations. Since its launch at AWS re:Invent 2020, Neptune ML has been available to AWS customers as a lab mode feature of Neptune. Starting with Neptune engine version 1.0.5.0, Neptune ML is available to all Neptune customers without the need to enable lab mode.

In addition to being generally available to all Neptune customers, Amazon Neptune ML now includes more capabilities that make the process of machine learning (ML) on graphs easier, and allow you to discover even more insights in your graph data:

  • Support for a new class of graph learning tasks: edge classification and regression
  • Getting predictions on new nodes and edges in the graph without retraining the model
  • Simplifying and enhancing graph feature processing
  • Faster training across all tasks
  • Greater customization of model training and hyperparameter search

This post is the first of a three-part series on the new features available in Neptune ML:

  • In this post, we introduce the graph learning tasks of edge classification and regression now supported by Neptune ML
  • In Part 2, you learn how to use Neptune ML to apply a model trained on a previous snapshot of a graph to provide predictions for a new snapshot of the graph
  • In Part 3, we detail Neptune ML capabilities for enhanced feature processing, and faster and easier training

Be sure to read through the other posts in the series to get more details about the new capabilities.

Edge classification and regression

Neptune ML now supports edge classification and edge regression tasks. These tasks are used when your graph edges have properties, and you want to infer the property values for edges that have missing or unknown values. In this setting, you can use the known edge property values as the supervision targets to train a Neptune ML model. The trained model learns to generate representations for the nodes participating in an edge, and uses the nodes’ representations to predict the missing property value for the edge.
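To make the last step concrete, here is a toy sketch of predicting an edge class from the representations of the two nodes on the edge. This post does not describe Neptune ML's actual decoder, so the concatenate-then-score scheme, the function name predict_edge_class, and all numbers below are assumptions chosen for illustration only.

```python
# Illustrative only: a toy edge classifier over two node embeddings.
# The scoring scheme (concatenate, then one weight vector per class) is an
# assumption, not Neptune ML's documented implementation.

def predict_edge_class(src_emb, dst_emb, class_weights):
    """Return the index of the highest-scoring class for one edge."""
    # Represent the edge by concatenating its two node representations.
    edge_features = list(src_emb) + list(dst_emb)
    # One linear score per candidate class.
    scores = [
        sum(w * x for w, x in zip(weights, edge_features))
        for weights in class_weights
    ]
    return max(range(len(scores)), key=scores.__getitem__)


# Hypothetical 2-dimensional embeddings and three candidate classes.
user_emb = [1.0, 0.0]
movie_emb = [0.0, 1.0]
weights = [
    [1.0, 0.0, 0.0, 0.0],  # class 0 scores 1.0
    [0.0, 0.0, 0.0, 2.0],  # class 1 scores 2.0
    [0.5, 0.5, 0.5, 0.5],  # class 2 scores 1.0
]
predicted = predict_edge_class(user_emb, movie_emb, weights)  # index 1
```

In a trained model, both the node representations and the scoring weights are learned from the known edge property values rather than written by hand.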

For example, in a social network, edges exist between users, and have an edge property to denote the type of the relationship: friend, family, colleague, and so on. You can use edge classification to predict a relationship class for edges that are missing this property. You can also use edge classification on the same graph to make binary predictions, such as whether two users connected by the edge live in the same household.

Edge regression lets you train a model to predict real-valued properties for edges in your graph. For example, in a social network graph, you can predict a numerical quantity on an edge, such as how long a friendship has lasted. Another use case is predicting ratings in a user-item interaction graph, where each rating is a property on the edge between a user and an item.

The following diagram represents a graph data model for the MovieLens dataset from GroupLens. A potential ML task on this dataset is to predict the rating property on the rated edge between a user and a movie. This can be cast either as a classification task on the edge property, where each rating from 1 to 5 is one of five possible classes, or as a regression task, where the objective is to predict the real-valued rating, such as 3.8.
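The difference between the two framings shows up in how predictions are scored. The sketch below uses made-up ratings and predictions (none of them come from MovieLens or from a trained model) and two standard metrics, accuracy for classification and root mean squared error for regression. The function names are hypothetical.

```python
import math

# The five rating values, treated as class labels in the classification framing.
RATING_CLASSES = [1, 2, 3, 4, 5]


def accuracy(true_ratings, predicted_classes):
    """Fraction of edges whose predicted class matches the true rating exactly."""
    hits = sum(t == p for t, p in zip(true_ratings, predicted_classes))
    return hits / len(true_ratings)


def rmse(true_ratings, predicted_values):
    """Root mean squared error between true ratings and real-valued predictions."""
    squared = [(t - p) ** 2 for t, p in zip(true_ratings, predicted_values)]
    return math.sqrt(sum(squared) / len(squared))


# Made-up data for four rated edges.
true_ratings = [4, 3, 5, 2]
class_predictions = [4, 3, 4, 2]               # each is one of RATING_CLASSES
regression_predictions = [3.8, 3.1, 4.6, 2.5]  # any real value is allowed

class_accuracy = accuracy(true_ratings, class_predictions)     # 0.75
regression_rmse = rmse(true_ratings, regression_predictions)   # about 0.339
```

Classification treats a predicted 4 for a true 5 as simply wrong, while regression records it as an error of size 1, so the regression framing keeps the ordering of ratings.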

To train a Neptune ML model for edge classification or regression, you can simply follow the same training workflow as with node classification or regression. However, when exporting your data from Neptune, you must specify that you want to train an edge classification or regression model as well as the edge type and property you want the model trained on. For an example of how to do this during Neptune data export, see Parameters for the Neptune ML export process.

In the following example, for edge classification, we specify the rating property on the rated edge as our target for prediction by including these values in the targets subparameter of the additionalParams object:

"additionalParams": {
        "neptune_ml": {
          "targets": [
            {
              "edge": ["user", "rated", "movie"],
              "property": "rating",
              "type": "classification"
            }
          ],
          ....

For edge regression, you can set target type to regression:

"additionalParams": {
        "neptune_ml": {
          "targets": [
            {
              "edge": ["user", "rated", "movie"],
              "property": "rating",
              "type": "regression"
            }
          ],
          ....
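If you assemble the export request in code, the two fragments above differ only in the type value. The following sketch builds just the targets portion shown in this post. The helper names edge_target and neptune_ml_params are hypothetical, and any fields elided by the .... in the fragments above are deliberately left out rather than guessed.

```python
import json


def edge_target(src_type, edge_label, dst_type, prop, task):
    """Build one entry for the 'targets' list of an edge-level task."""
    if task not in ("classification", "regression"):
        raise ValueError("task must be 'classification' or 'regression'")
    return {
        "edge": [src_type, edge_label, dst_type],
        "property": prop,
        "type": task,
    }


def neptune_ml_params(targets):
    """Wrap targets in the additionalParams structure shown in this post.

    Only the 'targets' key is populated; other export settings are omitted.
    """
    return {"additionalParams": {"neptune_ml": {"targets": list(targets)}}}


params = neptune_ml_params(
    [edge_target("user", "rated", "movie", "rating", "regression")]
)
print(json.dumps(params, indent=2))
```

Validating the task name up front catches a typo before the export job is submitted.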

For more information, see Using Amazon Neptune ML for machine learning on graphs. The following notebooks walk through an example of using Neptune ML for edge classification and edge regression on the MovieLens dataset.

Inference queries for edge classification and regression

After you train your edge classification or edge regression model, you can make predictions on your Neptune property graph directly using the Gremlin query language. Similar to node classification and link prediction, Neptune ML provides query extensions to the Apache TinkerPop Gremlin query language that allow you to get inferred values from a trained and deployed Neptune ML model by running Gremlin queries on your property graph.

For example, the following query uses the trained model, hosted on an Amazon SageMaker endpoint, to obtain a classification rating prediction for an edge between user user_1 and the movie Apollo 13 (1995):

%%gremlin
g.with("Neptune#ml.endpoint", '${endpoint}').
    V('user_1').outE('rated').where(inV().has('title', 'Apollo 13 (1995)')).
    properties('rating').with("Neptune#ml.classification").value()

The same query written for an edge regression model endpoint looks like the following:

%%gremlin
g.with("Neptune#ml.endpoint", '${endpoint}').
    V('user_1').outE('rated').where(inV().has('title', 'Apollo 13 (1995)')).
    properties('rating').with("Neptune#ml.regression").value()

The predicted values are read from the inference endpoint serving the classification or regression model. The predicted values can also be explicitly stored in the property graph using Gremlin.

You can also use the edge classification or edge regression queries to predict values on hypothetical edges that don’t exist in the graph. The source node and destination node of the hypothetical edge must be of the same types as those of the real edges used to train the model. For example, you can create an edge between the user user_1 and the movie Batman Forever (1995) and then use Gremlin to query the endpoint for the predicted rating of the movie by user_1.
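The type constraint on hypothetical edges can be checked on the client before you send a query. This is a minimal sketch, assuming the model was trained on the ["user", "rated", "movie"] target used earlier in this post. The function name and the idea of a client-side check are illustrative and are not part of Neptune ML.

```python
# Edge type the model was trained on: (source type, edge label, destination type).
TRAINED_EDGE_TYPE = ("user", "rated", "movie")


def can_score_hypothetical_edge(src_type, dst_type, trained=TRAINED_EDGE_TYPE):
    """Return True if a hypothetical edge has the same endpoint types as the training edges."""
    trained_src, _edge_label, trained_dst = trained
    return src_type == trained_src and dst_type == trained_dst


# A user-to-movie edge matches the trained edge type; a user-to-user edge does not.
ok = can_score_hypothetical_edge("user", "movie")
not_ok = can_score_hypothetical_edge("user", "user")
```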

You can examine the notebook examples for edge classification and edge regression to see more examples of inference queries.

Conclusion

In this post, you saw how to use Neptune ML to train edge classification and edge regression models, as well as how to use the Neptune ML Gremlin query extensions to invoke a deployed edge classification or regression model to obtain predictions about edge properties in your graph.

To get started using Neptune ML, see the quick start guide. To learn more about additional features in Neptune ML, go to Part 2 of this series, which demonstrates how to use a Neptune ML trained model to get predictions for evolving graph data.


About the Author

Soji Adeshina is an Applied Scientist at AWS where he develops graph neural network-based models for machine learning on graphs tasks with applications to fraud & abuse, knowledge graphs, recommender systems, and life sciences. In his spare time, he enjoys reading and cooking.