AWS for Industries

Unleashing the power of Graphs: operating 5G networks with GNN and generative AI on AWS

Abstract: Networks are inherently graph-structured, so a ‘network graph’ representation can be built and leveraged to address network operations problems. This post introduces how to use graphs and graph-related techniques (machine learning (ML) and generative AI) with AWS services to transform 5G network operations. We use an open-source network dataset and focus on the Next-Cell prediction use case to show how to 1) build a network graph, 2) query it with graph query languages and generative AI, and 3) make predictions with graph ML to identify the mobile cells with the most connected users and anticipate resource allocation.

Our next post will focus on Root Cause Analysis with Spatio-Temporal network features.

Why Graph, Graph Neural Networks, and Generative AI for networks

Graph databases such as Amazon Neptune are specifically designed to store complex models of interconnected data at massive scale. Graph data is made up of nodes, edges, and their properties. Nodes are connected by edges, and both nodes and edges have their own individual properties. Graphs are used to gain insights into intricate patterns, discover hidden structures, and understand the dynamics of interconnected systems.

Graph Neural Networks and ML

Graph Neural Networks (GNNs) are a class of neural networks designed for processing graph-structured data, and capturing relationships between entities. They extend deep learning techniques to handle graph data by iteratively aggregating information from neighbouring nodes. GNNs enable deep learning models to operate on non-Euclidean domains, such as social networks and molecular structures, by embedding nodes in a high-dimensional space. This allows GNNs to learn representations that capture both local and global graph properties, facilitating tasks such as node classification and link prediction within deep learning frameworks. Then, GNNs can be used to make multiple types of inferences. Neptune ML uses GNNs to enable the following inference types (predictions):

Table 1 ML tasks on Graphs

Graph ML tasks Description
Node Classification Predicting the categorical feature of a vertex property.
Node Regression Predicting a numerical property of a vertex.
Edge Classification Predicting the categorical feature of an edge property.
Edge Regression Predicting a numerical property of an edge.
Link Prediction Predicting the most likely destination nodes for a particular source node and outgoing edge, or the most likely source nodes for a given destination node and incoming edge.

Neptune ML has two learning modes for making predictions: transductive inference and inductive inference.

Table 2 Inductive and Transductive learnings

Learning mode Description
Transductive inference Returns predictions that were pre-computed at training time, for entities that were present in the graph during training.
Inductive inference Applies the trained model at query time, so it can return predictions for data added to the graph after training.

For more detail about Transductive and Inductive reasoning, visit the Neptune ML documentation.

Network Next-Cell prediction use cases

Telecommunication networks are natively graphs that comprehensively capture the interdependencies among network nodes and their attributes. A network graph is a heterogeneous structure accommodating various node and edge types, known as the network topology. Nodes encompass a spectrum of entities such as users, locations, and diverse network components such as base stations, sites, and routers. Edges denote connections such as wireless links between network nodes, network interfaces (such as N1 in 5G networks), or user-cell connectivity. The network is dynamic and evolves over time: new network nodes, new subscriptions, and new offers with temporal key performance indicators. Therefore, a graph representation of a network is dynamic, with temporal and topological changes, such as the network topology enriched with measurements at the node and edge levels. In this post, we focus on the Next-Cell prediction use case.

Figure 1 simplified view of 5G RAN with UE-Cell-gNB

Mobility prediction is a foundational tool for network operators to manage network performance. In particular, with Next-Cell prediction, network operators can predict to which mobile cells users will be connected. Next-Cell prediction is a foundational problem for network operators; solving it allows them to optimize network performance, manage cell-to-cell handover, identify congested cells, and realize energy savings.

We frame the Next-Cell prediction use case as a link prediction task between the “user” and “cell” nodes in the network graph data. As described previously, link prediction is a graph ML task where the GNN, in a self-supervised way, uses the existing edges in the graph to predict new edges with a high likelihood of being formed.
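Conceptually, once a GNN has produced embeddings for the user and cell nodes, link prediction amounts to scoring candidate user-cell pairs and ranking them. The following minimal sketch uses toy embeddings and dot-product scoring (Neptune ML uses learned decoders, so this is purely illustrative; all names and values here are hypothetical):

```python
def rank_candidate_cells(user_emb, cell_embs):
    """Score each candidate cell for one user with a dot product and
    return cell ids ranked from most to least likely link."""
    def dot(u, v):
        return sum(a * b for a, b in zip(u, v))
    scores = {cell_id: dot(user_emb, emb) for cell_id, emb in cell_embs.items()}
    return sorted(scores, key=scores.get, reverse=True)

# Toy embeddings: user_0 points in roughly the same direction as cell_62000
user_emb = [1.0, 0.2, 0.0]
cell_embs = {
    "cell_62000": [0.9, 0.1, 0.0],
    "cell_10001": [0.0, 1.0, 0.5],
    "cell_20002": [-0.5, 0.3, 0.2],
}
print(rank_candidate_cells(user_emb, cell_embs))  # cell_62000 ranked first
```

The highest-scoring pairs that are not already edges in the graph are the predicted new links.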

Beyond the use case we showcase in this post, link prediction can be applied to other problems in telecommunication networks. For example, it can be used to predict the likelihood of a link between a gNodeB and its cells serving a UE. This information can be used to improve the performance of 5G networks:

  • Optimize the placement of gNodeBs: network operators can optimize the placement of gNodeBs to improve the performance of the network.
  • Predict the performance of the network: network operators can predict the performance of the network under different load conditions.
  • Identify potential problems: network operators can identify potential problems, such as overloaded gNodeBs or low-quality links.
  • Energy efficiency: track unused or under-used cells/gNodeBs.

Solution overview

In this post we describe how to use graphs, GNNs, and generative AI with AWS services for the Next-Cell prediction use case, and we showcase a graph-data and AI pipeline. We use open-source data (reference: https://activityinequality.stanford.edu/data/telecom-graph.html) to enable the reproducibility of the Proof-of-Concept. The dataset includes the relationships between users and mobile network cells in the telecom network, as well as other nodes. The data includes properties for each node, which are anonymized and normalized. The nodes are “User” and “Cell”, and the edges between them are labelled “User-live-Cell”.

The dataset is augmented with some additional nodes and edges for the gNodeB for visualization purposes, but the ML model is trained using the initial Stanford dataset (the user and cell nodes). The dataset does not include temporal properties. To build the GNN and generative AI pipeline for the network graph (user-cell-gNodeB), we perform the following steps:

1) Load the public dataset into an Amazon Simple Storage Service (Amazon S3) bucket and transform it into the Neptune format with an AWS Glue ETL job (Step 0 in the following figure).

2) Load the transformed graph data from Amazon S3 into Neptune.

3) Query the loaded network graph with:

a) a graph query language (Gremlin queries)

b) natural language, using generative AI LLMs for graphs (Step 7)

4) Export, process, train, and run inference with GNNs on the network graph data (Steps 1, 2, 3, 4, and 5).

Figure 2 Graph, GNN, and Generative AI solution architecture

Environment preparation

1) The quickest way to set up the environment for this post is to use the Neptune ML AWS CloudFormation quick start. Select your preferred Region and choose Launch Stack. The CloudFormation stack installs the Neptune cluster and the notebook resources needed for this post.

2) This install takes roughly 20 minutes.

You also need to create or select an S3 bucket. The Neptune and SageMaker roles need permissions on this S3 bucket; you can follow these instructions to configure it. Do the same for Amazon Bedrock: connect to the console and activate model access.

3) Clone the code from the GitHub repo. The main folder is “04-Telco-Networks”

Data preparation and loading to Neptune

To load the graph data into Neptune, the raw data files are transformed with an AWS Glue job into the Neptune Gremlin CSV load format. The headers for the public dataset on users, cells, and their edges are shown in the following figures.

Figure 3 Example of Neptune Gremlin CSV file format for a Node

Figure 4 Example of the Neptune format for an edge

The ETL Python script that transforms the raw data into graph data compatible with the Neptune format can be found in the GitHub repo.
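The core of that transformation is adding the Neptune Gremlin load headers (~id and ~label for vertices; ~id, ~from, ~to, and ~label for edges). The following is a minimal stand-in for the Glue job, with hypothetical raw column names; the actual script in the repo handles the full schema:

```python
def to_vertex_rows(raw_rows, label):
    """Convert raw records into Neptune Gremlin CSV vertex rows,
    prefixing the id with the label and keeping the rest as properties."""
    out = []
    for row in raw_rows:
        vertex = {"~id": f"{label}_{row['id']}", "~label": label}
        vertex.update({k: v for k, v in row.items() if k != "id"})
        out.append(vertex)
    return out

def to_edge_rows(raw_edges, label, from_label, to_label):
    """Convert raw (src, dst) records into Neptune Gremlin CSV edge rows."""
    return [{"~id": f"{label}_{i}",
             "~from": f"{from_label}_{e['src']}",
             "~to": f"{to_label}_{e['dst']}",
             "~label": label}
            for i, e in enumerate(raw_edges)]

# Example: one user vertex and one user_live_cell edge (hypothetical columns)
users = to_vertex_rows([{"id": "0", "feat_0": "0.42"}], "user")
edges = to_edge_rows([{"src": "0", "dst": "62000"}],
                     "user_live_cell", "user", "cell")
```

Each returned dict maps directly onto one row of the Gremlin load CSV shown in the preceding figures.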

Then, the transformed data saved in Amazon S3 can be loaded into Neptune using the Neptune bulk loader. You can use the Neptune workbench %load magic command to run the bulk loader in the notebook, as shown in the following figure. You can also modify the “Graph_init.ipynb” notebook to load the data into your Neptune database.

Figure 5 Load UI to load data to Neptune
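The load UI ultimately submits a request to the cluster’s bulk loader HTTP endpoint (POST to https://&lt;cluster-endpoint&gt;:8182/loader). If you prefer to script the load, you can build the same request body yourself; the bucket path and IAM role ARN below are placeholders:

```python
import json

def build_loader_request(source_s3_uri, iam_role_arn, region):
    """Build the JSON body for a Neptune bulk loader request,
    matching what the workbench load UI submits."""
    return {
        "source": source_s3_uri,
        "format": "csv",  # Gremlin CSV load format
        "iamRoleArn": iam_role_arn,
        "region": region,
        "failOnError": "FALSE",
        "parallelism": "MEDIUM",
        "updateSingleCardinalityProperties": "FALSE",
    }

body = build_loader_request(
    "s3://<your-bucket>/neptune-format/",                  # placeholder
    "arn:aws:iam::<account-id>:role/<neptune-load-role>",  # placeholder
    "us-east-1",
)
print(json.dumps(body, indent=2))
```

The role ARN must be the Neptune cluster role that has read access to the S3 bucket configured earlier.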

Network visualization and Graph discovery

Once the graph data is loaded into the database, Neptune supports a number of tools and integrations for visualizing the graph. This post uses the open-source Graph Explorer tool.

Figure 6 Graph Explorer UI to visualize the network graph

Graph discovery
We use the Gremlin query language to query the network graph as well as the predictions. Furthermore, given the advances of LLMs on graphs and the associated LangChain agents, we also show how to conduct graph discovery with generative AI.

The following notebooks are shown:

  • BasicGraphStatistics.ipynb: this notebook shows how to use Gremlin to query the network graph.
  • AnalyticswithLLM.ipynb: this notebook shows how to use an LLM to query the network graph.
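As an illustration of the kind of query in the first notebook, the cells with the most connected users (incoming “user_live_cell” edges) can be retrieved with a Gremlin cell like the following. This is a sketch; the exact traversals in the notebook may differ:

```
%%gremlin
// Top 5 cells by number of connected users
g.V().hasLabel('cell').
  order().by(__.in('user_live_cell').count(), desc).
  limit(5).id()
```

Such counts are a quick way to spot the most loaded cells before running any ML.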

The following figure shows examples of queries on the network graph.

Figure 7 LLM to conduct graph discovery

Neptune ML workflow and Next-Cell predictions

The Neptune ML workflow consists of the steps described in this section. Every step could be automated with AWS Step Functions. We showcase them step by step within the notebooks to help readers reproduce the post. As a reminder, the network dataset used in this post is fully anonymized, with two types of nodes (cell and user), hundreds of numerical properties, and no temporal features.

Network data export and configuration – enabled through the Neptune-Export service, which exports data from Neptune into Amazon S3 in CSV form. This step lets you specify the subset of data for ML training as well as the graph ML task type, here link prediction. It automatically generates a configuration file, training-data-configuration.json, that specifies how the exported data is loaded into a trainable graph and processed for training.

Network data pre-processing – the exported network dataset is pre-processed for training using standard feature-encoding techniques. Feature normalization can be performed for numeric data, and text features can be encoded using word2vec. A Deep Graph Library (DGL) graph is generated from the exported dataset for the model training step to use. This step is performed by an Amazon SageMaker processing job in your account, and the resulting data is stored in an Amazon S3 location that you have specified. Further feature-engineering techniques can be explored depending on the data format, frequencies, and distributions, as well as other modelling approaches for spatio-temporal data. A SageMaker hyperparameter tuning job can be used to accelerate and automate those steps.
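As a simple illustration of the numeric encoding performed in this step, min-max feature normalization can be sketched as follows. This is a simplified stand-in; Neptune ML selects the actual encoding per feature from the training-data configuration:

```python
def min_max_normalize(values):
    """Scale a numeric node property into [0, 1]; constant columns map to 0.0."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# e.g. a signal-strength-like property across three nodes (hypothetical values)
print(min_max_normalize([-110.0, -95.0, -80.0]))  # [0.0, 0.5, 1.0]
```

Normalizing numeric properties this way keeps features with different ranges comparable during GNN training.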

Graph Model training – the model training step trains the ML model that is used for predictions. Model training is done in two stages:

  • The first stage uses an Amazon SageMaker processing job to generate a model training strategy configuration set that specifies the type of model and model hyperparameter ranges to be used for the model training.
  • The second stage uses the Amazon SageMaker model tuning job to try different hyperparameter configurations and select the training job that produced the best-performing model. The tuning job runs a pre-specified number of model training job trials on the processed data. At the end of this stage, the trained model parameters of the best training job are used to generate model artifacts for inference.

Inference endpoint creation in Amazon SageMaker – the inference endpoint is an Amazon SageMaker endpoint instance that is launched with the model artifacts produced by the best training job. Each model is tied to a single endpoint. The endpoint can accept incoming requests from the graph database and return the model predictions for inputs in the requests. After you have created the endpoint, it stays active until you delete it.

Next-Cell predictions with Neptune ML
In the following section we apply the previously described steps to the link prediction ML task for user-cell predictions. The graph ML process with Neptune ML and SageMaker, as described previously, starts by defining the ML task (link prediction) during the export step.

export_params = {
    "command": "export-pg",
    "params": {
        "endpoint": neptune_ml.get_host(),
        "profile": "neptune_ml",
        "useIamAuth": neptune_ml.get_iam(),
        "cloneCluster": False,
        "nodeLabels": ["user", "cell"],
        "edgeLabels": ["user_live_cell"]
    },
    "outputS3Path": f"{s3_bucket_uri}/neptune-export",
    "additionalParams": {
        "neptune_ml": {
            "version": "v2.0",
            "targets": [
                {
                    "edge": ["user", "user_live_cell", "cell"],
                    "type": "link_prediction",
                    "split_rate": [0.8, 0.1, 0.1]
                }
            ]
        }
    },
    "jobSize": "xlarge"
}
export_params

Next, you can start the Data processing, which performs feature engineering and constructs the DGL graph.

# The training_job_name can be set to a unique value below, otherwise one will be auto generated
training_job_name = neptune_ml.get_training_job_name('link-prediction')

processing_params = f"""
--config-file-name training-data-configuration.json
--job-id {training_job_name}
--instance-type ml.r5.16xlarge
--s3-input-uri "ADD HERE YOUR S3 Bucket from previous step"
--s3-processed-uri {str(s3_bucket_uri)}/preloading """
%neptune_ml dataprocessing start --wait --store-to processing_results {processing_params}

You can start the Model training by passing the ID of the data processing job as the input:

training_params=f"""
--job-id {training_job_name}
--data-processing-id {training_job_name}
--instance-type ml.g4dn.16xlarge
--s3-output-uri {str(s3_bucket_uri)}/training
--max-hpo-number 4
--max-hpo-parallel 2 """
%neptune_ml training start --wait --store-to training_results {training_params}

The data processing creates a model-hpo-configuration.json file with the range of hyperparameters and model configurations for training. This file can be modified to change the settings of the model training before running the previous model training command.

For example, to use edge features during GNN message passing, you can simply change the default value to true for the use-edge-features parameter, as shown in the following image.

edge features during GNN message passing (parts 1 and 2)
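The entry to change sits in the hyperparameter section of model-hpo-configuration.json. Depending on the Neptune ML version, the relevant fragment looks roughly like the following sketch (the surrounding file layout may differ; only the parameter name comes from the generated file):

```
{
    "param": "use-edge-features",
    "type": "bool",
    "default": true
}
```

Save the modified file back to its Amazon S3 location before starting the training command.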

Note that setting this parameter to true makes the model support only the transductive prediction mode, as this setting is not yet supported for inductive predictions. Once model training is complete, you can examine the output Amazon S3 folder for the training metrics. To deploy the model to an endpoint, simply pass the ID of the model training job as the input to the endpoint command.

endpoint_params=f"""
--id {training_job_name}
--model-training-job-id {training_job_name}
"""
%neptune_ml endpoint create --wait --store-to endpoint_results {endpoint_params}

In the following image you can see an example query that predicts the users likely to be connected to cell_62000. The user user_0 appears in the list of predicted users, even though the connection between this user and the cell was originally dropped.

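The shape of that query, assuming the inference endpoint created above and the Neptune ML Gremlin inference syntax, is roughly the following sketch (the ${endpoint} value is a placeholder; see the notebook for the exact version):

```
%%gremlin
g.with("Neptune#ml.endpoint", "${endpoint}").
  with("Neptune#ml.limit", 10).
  V("cell_62000").
  in("user_live_cell").with("Neptune#ml.prediction").
  hasLabel("user").id()
```

The with("Neptune#ml.prediction") modulator tells Neptune to answer the traversal step from the ML endpoint rather than from the stored edges.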

You can follow the steps in the “TransductiveMode-CellPrediction-.ipynb” notebook to obtain the same results, and each inductive learning step is described in the “InductiveModeCellPrediction.ipynb” notebook.

The following example from the same notebook shows a different query, predicting cells that a given user is not yet connected to but is likely to connect to:

Key messages

In this post we illustrated how to use graphs, GNNs, and generative AI together with AWS services, and we described in detail a graph-data and AI pipeline for a well-known network problem, Next-Cell prediction.

We showed the following:

1) Initiate a network graph from raw network data.

2) Use GNNs to perform predictions, in this case predicting which cells a given user connects to. We performed link prediction on the network graph to complement and discover missing links and derive insights.

3) Use Graph Explorer for visualization.

4) Generative AI through Amazon Bedrock was used to show how to navigate, query, and discover the network graph with LLMs, accelerating the adoption of graphs to solve real-world use cases in domains such as telecommunication networks. Anthropic Claude 2 was the model used in this Proof-of-Concept.

In our next post, we will cover Spatio-Temporal Graph ML and analytics for a mobile network knowledge graph to support efficient and straightforward Root Cause Analysis.

Imen Grida Ben Yahya

Imen is a Principal AI/ML Specialist Architect for Telco at Amazon Web Services. She leads and builds cloud-native AI and ML, including generative AI, on AWS services with key telco players to accelerate the move towards smarter, greener, and cloudified networks. She holds a PhD in Cognitive Networks from Telecom SudParis, Institut Polytechnique de Paris (2008). She has co-authored several research papers on AI for networks (6G, 5G, software networks, cloud-native networks). Imen has participated in several conferences as an organizer and member of technical program committees, and is a public speaker.

Charles Ivie

Charles Ivie is a Senior Graph Architect with the Amazon Neptune team at AWS. As a highly respected expert within the knowledge graph community, he has been designing, leading, and implementing graph solutions for over 15 years.

Ross McWalter

With over 20 years of experience in the telecom industry across technical and leadership roles, Ross McWalter brings deep industry expertise. He leads the Telecom Applications team in the Telecom Industry Business Unit at AWS, where he has worked for over 7 years. Ross’ team helps Communication Service Providers transform their businesses using data and AI/ML. Previously, as Head of Big Data and Customer Experience at Ericsson North America, Ross focused on leveraging data to improve customer experiences.

Soji Adeshina

Soji Adeshina is an Applied Scientist at AWS where he develops GNN-based models for ML on graph tasks with applications for fraud and abuse, knowledge graphs, recommender systems, and life sciences. In his spare time, he enjoys reading and cooking.