AWS Database Blog

Get predictions for evolving graph data faster with Amazon Neptune ML

If you're an application developer building graph applications with Amazon Neptune, your graph data may evolve on a regular basis, with new nodes or new relationships between nodes added to the graph to reflect the latest changes in your underlying business data. Amazon Neptune ML now supports incremental model predictions on graph data so that you can quickly obtain machine learning (ML) insights on new data that has been added to your graph.

This post is the second in a series of posts detailing new features of Neptune ML:

  • In Part 1, you saw how to use the new tasks supported by Neptune ML: edge classification and edge regression
  • In this post, you see how to use a Neptune ML trained model to get predictions for evolving graph data
  • In Part 3, you learn about the enhanced feature processing and greater customizations for faster training that Neptune ML provides

Incremental model predictions

When you train a Neptune ML model for a particular task like node classification or link prediction, the trained model and the predictions generated by the model are tied to the snapshot of the graph that was used during model training. However, as new data comes in, you may need to get predictions for nodes or edges that weren’t present in the graph during training. Neptune ML now provides a pipeline for incremental model predictions, which lets you get predictions for new nodes and edges added to the graph without retraining the underlying Neptune ML model.

Consider the following scenario in which you have a property graph that you use for fraud detection on user account nodes. After a week, some new user accounts have been created that have some connections to existing user accounts in the graph. Previously, after the new user nodes were loaded into the Neptune graph, you needed to retrain the model to be able to obtain predictions for the new nodes. Now, to obtain model predictions on new data, you only need to run the following steps:

  1. Export the new graph snapshot from Neptune.
  2. Use the Neptune ML data processing API to process the exported graph snapshot and generate the input features.
  3. Run the model transform API to precompute model predictions and node representations.

This step doesn’t retrain the model. For more information, see the model transform documentation.

  4. Update the Neptune ML model endpoint so that it can serve predictions for the new data.

The following diagram illustrates this workflow.

Introducing the model transform API

The key to using an existing model to get predictions for new data is the new Neptune ML model transform API. The model transform API allows you to compute model artifacts, such as node embeddings, on newly processed graph data using pre-trained model parameters. The pre-trained model parameters are saved during the initial training phase on an earlier snapshot of the data.

You can call the model transform API by directly using the Neptune ML /ml/ endpoints, as in the following code:

curl -k -X POST \
    -H 'Content-Type: application/json' \
    https://localhost:8182/ml/modeltransform -d \
    '{
      "id" : "<job-id>",
      "dataProcessingJobId": "Completed data processing job id",
      "mlModelTrainingJobId": "Completed ml model training job id",
      "modelTransformOutputS3Location" : "s3://<bucket-name>/neptune-model-transform/"
    }'

Or you can use the graph notebook cell magic:

transform_params = f"""--id {job_id}
                       --dataProcessingJobId {dataProcessingJobId}
                       --mlModelTrainingJobId {oldTrainingJobId}
                       --modelTransformOutputS3Location {outputModelLocation}
                    """
    
%neptune_ml modeltransform start --wait --store-to transform_results {transform_params}

To learn more about what parameters are accepted by a model transform job and what model artifacts are used by Neptune ML for predictions, see the ML management API reference and model transform page.
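
A model transform job runs asynchronously, so you typically check its progress before moving on. The following is a minimal sketch that assumes the management API exposes the job status through a GET request on the same /ml/modeltransform path used to start the job, which is the pattern the other Neptune ML job APIs follow; substitute the job ID you supplied when starting the transform:

# Check the status of a model transform job (assumed GET pattern; see the API reference)
curl -k -s https://localhost:8182/ml/modeltransform/<job-id>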

Obtain incremental predictions on new data

In this section, we show the overall workflow to get incremental predictions on a new graph snapshot. The following steps assume that you have already gone through the main Neptune ML workflow to train and deploy a model for an ML task and have a model endpoint running. To update the model artifacts and inference endpoint to give predictions for new data in the graph, follow these steps in order:

  1. Run the Neptune export command for incremental data, which is identical to the export command for a new model training task.

This exports the current snapshot of the graph training data to Amazon Simple Storage Service (Amazon S3). For instructions, see the documentation page on Using the Neptune-Export service to export data from Neptune for Neptune ML.
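
If you use the Neptune-Export service, the request for this incremental export looks the same as the one you used before the original training run. The following is a minimal sketch with placeholder values; the exact fields you need (in particular any neptune_ml settings under additionalParams) depend on your task and setup, so follow the linked documentation for the full request format:

curl -X POST (your Neptune-Export service API URI) \
    -H 'Content-Type: application/json' \
    -d '{
          "command" : "export-pg",
          "outputS3Path" : "s3://(S3 bucket name)/(path to your export folder)",
          "params" : { "endpoint" : "(your Neptune endpoint)" }
        }'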

To perform data processing for incremental predictions, you need to ensure that Neptune ML uses the same feature encoding methods that were used during the original training workflow. Additionally, nodes that were present in the original training graph snapshot should retain their existing mapping to node IDs in the DGL graph for incremental predictions. Neptune ML allows you to do this by copying the configuration of the original data processing job to the new data processing job for incremental predictions.

  2. Pass in the parameter previousDataProcessingJobId in the create data processing job API call, as shown in the following code:
curl -X POST https://(your Neptune endpoint)/ml/dataprocessing \
    -H 'Content-Type: application/json' \
    -d '{
          "inputDataS3Location" : "s3://(Amazon S3 bucket name)/(path to your input folder)",
          "id" : "(a job ID for the new job)",
          "processedDataS3Location" : "s3://(S3 bucket name)/(path to your output folder)",
          "previousDataProcessingJobId" : "(the job ID of the previous data processing job)"
        }'

Alternatively, you can use the graph notebook Neptune ML Jupyter magic. This makes sure that the new graph data snapshot is processed using the same techniques as the previous snapshot. For more information, see the page on processing the graph data exported from Neptune for training.
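
For example, a data processing cell magic that mirrors the modeltransform magic shown earlier might look like the following sketch. The flag names here are assumptions modeled on the data processing API parameters above, and the variables are placeholders; check the %neptune_ml dataprocessing help output in your notebook for the exact options it supports:

# Flag names below mirror the data processing API parameters and are assumptions;
# verify them against the %neptune_ml dataprocessing help in your notebook.
processing_params = f"""--id {new_processing_job_id}
                        --inputDataS3Location {exported_data_s3_location}
                        --processedDataS3Location {processed_data_s3_location}
                        --previousDataProcessingJobId {previous_processing_job_id}
                     """

%neptune_ml dataprocessing start --wait --store-to processing_results {processing_params}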

After the incremental data processing step is complete, you can use the model transform API to apply the previously trained model to the newly processed data.

  3. Invoke the model transform API with both the incremental data processing job ID and the previous model training job ID (as shown in the example earlier, repeated here):
curl -k -X POST \
    -H 'Content-Type: application/json' \
    https://(your Neptune endpoint)/ml/modeltransform -d \
    '{
      "id" : "( a job id for the transform job)",
      "dataProcessingJobId": "(Completed data processing job id from this workflow)",
      "mlModelTrainingJobId": "(Completed ml model training job id from previous workflow)",
      "modelTransformOutputS3Location" : "s3://<bucket-name>/neptune-model-transform/"
    }'

The model transform job creates the necessary model artifacts to provide predictions for the new data using the saved model parameters from the original training job.

Finally, you can update the existing model endpoint to use the new model artifacts created by the transform job.

  4. Invoke the endpoint API with the existing endpoint ID and the model transform job ID, as in the following template:
curl -k -X POST \
    -H 'Content-Type: application/json' \
    https://(your Neptune endpoint)/ml/endpoints -d \
    '{
      "id" : "<existing-endpoint-id>",
      "mlModelTransformJobId": "<job-id of the completed model transform job>"
    }'

The endpoint is temporarily unavailable while the update is in progress. When the update is complete, the endpoint comes back online and is ready to serve predictions for the new data.
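
To confirm that the update has finished, you can check the endpoint status and then issue an inference query against it. The following sketch assumes the endpoint status is available through a GET request on the /ml/endpoints path with the endpoint ID, and the Gremlin query uses a hypothetical node label (user) and property (isFraud) from the fraud detection scenario earlier; adjust both to match your own graph and ML task:

# Check the status of the inference endpoint after the update (assumed GET pattern)
curl -k -s https://(your Neptune endpoint)/ml/endpoints/<existing-endpoint-id>

# Once the endpoint is back in service, fetch predictions for the new nodes.
# The label "user" and property "isFraud" are hypothetical placeholders.
curl -k -s -X POST https://(your Neptune endpoint)/gremlin \
    -d '{"gremlin": "g.with(\"Neptune#ml.endpoint\", \"<existing-endpoint-id>\").V().hasLabel(\"user\").properties(\"isFraud\").with(\"Neptune#ml.classification\").value()"}'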

The notebook example on the Neptune workbench contains a complete walkthrough that demonstrates these capabilities on a sample dataset. You can run it by using the Neptune ML AWS CloudFormation quick start to launch a Neptune cluster with the Neptune workbench and Neptune ML enabled.

Conclusion

In this post, you learned how to use the Neptune ML incremental prediction feature to obtain predictions for new graph data without retraining your model and how the new Neptune ML model transform API makes this capability possible.

This post is part of a three-post series on new Neptune ML features. Part 1 focuses on the new tasks of edge classification and edge regression. Part 3 details some additional capabilities in data processing and model training that make Neptune ML models more accurate and faster to train.


About the Author

Soji Adeshina is an Applied Scientist at AWS, where he develops graph neural network-based models for machine learning tasks on graphs, with applications to fraud & abuse, knowledge graphs, recommender systems, and life sciences. In his spare time, he enjoys reading and cooking.