AWS Quantum Technologies Blog

Community Detection using Hybrid Quantum Annealing on Amazon Braket – Part 2

Many customers are facing the challenge of efficiently extracting information hidden within complex network structures. For example, a healthcare insurance company needs to identify fraudulent claims through detecting abnormal relationships between patients and providers, a finance company needs an anti-money laundering tool that can detect abnormal transactions between different entities, or a marketing company needs to segment its audience for targeted marketing campaigns. All such problems are related to extracting network entity relationships and are known as community detection problems.

Approaches to solving community detection problems have been evolving rapidly over the past decade, from greedy algorithms such as Girvan-Newman [1] and Louvain [2], to nature-inspired algorithms like extremal optimization [3], and to deep learning models [4]. Recently Negre et. al. [5] explored and demonstrated the ability of a quantum computer, in particular a quantum annealer, to solve community detection as an optimization problem.

This post is the second of a two-post series about solving community detection using a hybrid classical-quantum annealing algorithm on Amazon Braket.

  • In Part 1, we provided a step-by-step guide on how to formulate community detection as a Quadratic Unconstrained Binary Optimization (QUBO) problem, similar to the work done by Negre et. al. [5]. We then demonstrate how to use the open source QBSolv library, which provides quantum-classical hybrid solvers for QUBO problems using a combination of classical compute resources and D-Wave quantum annealers, to solve community detection problems on Amazon Braket.
  • In this post, we demonstrate using Amazon Braket Hybrid Jobs to seamlessly solve community detection problems for real-world networks at different scales. We measure the performance of the annealer-based solution by the modularity metric and show they are comparable to the published state of the art (SOTA) results from heuristic methods.

This post assumes knowledge of how community detection problems are formulated and how solutions to these problems are typically evaluated (for example, modularity), as well as an understanding of how to set up D-Wave’s QBSolv library to solve these problems on Amazon Braket. For a review of these topics, see Part 1.

The detailed code implementation and tutorial notebook of using QBSolv for community detection in complex networks can be found in our AWS GitHub code repository.

Using Amazon Braket Hybrid Jobs for community detection at scale

Community detection for real-world networks can be a computationally intensive task that requires setting up an infrastructure with the right computing power and managing workloads with potentially long run times of up to 10 hours. To simplify the process of running hybrid quantum-classical algorithms, Amazon Braket Hybrid Jobs provides fully managed execution of hybrid quantum-classical workloads. It automates infrastructure setup, monitoring and logging, and provides prioritized access to the selected QPU for faster run times of your algorithm.

In this blog, we use the Hybrid Jobs feature to solve community detection for real-world networks at different scales and compare the results against classical SOTA approaches. Real-world networks can vary significantly in size and will require different types of classical compute resources to run the community detection algorithm. For example, small networks like the Zachary karate club network with 4 communities has 136 variables and can run on EC2 using an m5.xlarge instance, but large networks like the Cora network with 7 communities has 18,956 variables and will require using a larger instance like ml.m5.4xlarge to provide sufficient memory. To support range of requirements, Braket lets you select the most appropriate classical compute resources from a range of instance types. You can also pass hyperparameters to your algorithm when you create the job, in order to fine tune your approach without having to change the code at each run.

Hybrid Jobs can be created either through the Amazon Braket console (as shown in this example) or using the Amazon Braket SDK. Here we walk through the 3 major steps for preparing and submitting a job using the Amazon Braket SDK from a Jupyter notebook.

Step 1: Creating an algorithm script

In our AWS GitHub code repository, we have created a generic algorithm script for community detection using either the QBSolv classical solver or the QBSolv hybrid solver that uses D-Wave quantum annealers. Then you can use Braket to run your algorithm script in a pre-built Docker container, log job status and metrics into Amazon CloudWatch, and save the results to Amazon S3.

More examples of constructing algorithm scripts for hybrid quantum-classical algorithms can be found in the Amazon Braket examples GitHub repository.

Step 2: Specifying hyperparameters

When you create a job, you can specify hyperparameters that will be used in your algorithm at runtime. This is useful to iterate on the fine tuning of your algorithm, without having to change the code of your algorithm script.

The following code is an example of hyperparameters required by our community detection algorithm:

hyperparams = {
	"input_graph_file": "Zachary.weighted.edgelist", # str, the file name of the input graph
	"num_community": 4, # int, the number of communities to detect
	"solver_mode": "hybrid", # str, must be either 'classical' or 'hybrid'. Determines whether the classical or hybrid solver will be called
	"solver_limit": 100, # int, the maximum number of variables (n) for sub-QUBOs
	"num_repeats": 1, # int, the maximum iterations to repeat QBSolv solver execution to discover a new best solution
	"num_reads": 1000, # int, how many times the annealing is performed
	"seed": 1, # int, random seed
	"alpha": 5, # int, the penalty coefficient to enforce assigning only one community to each node
	}

# JSON encode hyperparameters as required by Amazon Braket Hybrid Jobs
hyperparams = {str(k): json.dumps(v) for (k, v) in hyperparams.items()}

Step 3: Submitting the job

You then configure the job by providing the following arguments:

  • device: The Amazon Resource Name (ARN) of the D-Wave QPUs that can be found in the list of Amazon Braket supported devices or in the Amazon Braket console. It will be stored as an environment variable for the algorithm script. Here we use D-Wave Advantage System 4.1.
  • instance_config (optional): The configuration of classical resources, such as instance type and data storage volume, to run the algorithm script. The script uses the ml.m5.xlarge instance type by default.
  • source_module: The path to a file or a Python module that contains your algorithm script. It will be uploaded to the container for running the job.
  • job_name: A unique string to identify the job. It appears in the Amazon Braket console and in the job ARN.
  • entry point: The path relative to the source_module. It points to the piece of code to be run when the job starts.
  • hyperparameters: The Python dictionary containing the hyperparameter names and values (as strings).
  • input_data: A dictionary that maps channel names to either a file location in the local environment or a path to your S3 bucket. We can also specify only a file location, in which case the channel name is treated as “input”.
  • wait_until_complete (optional): If True, the function call will wait until the job is completed, and will additionally print logs to the local console. Otherwise, it will run asynchronously. Defaults to False.

Now you can submit the job using the following code:

device_arn = 'arn:aws:braket:::device/qpu/d-wave/Advantage_system4' # D-Wave QPU Device ARN as the target QPU
aws_job = AwsQuantumJob.create(device=device_arn,
	instance_config=InstanceConfig(instanceType="ml.m5.2xlarge"),
	source_module="src",
	job_name="job-"+graph_name+"-" + str(int(time.time())),
	entry_point="src.braket_job_community_detection",
	hyperparameters=hyperparams,
	input_data={"input-graph": input_graph_path},
	wait_until_complete = True
	)

To monitor the job status in real time, you can access its Amazon CloudWatch Logs under the ‘Jobs’ tab in the Amazon Braket console. Because D-Wave QPUs are only available in the AWS ‘us-west-2’ Region, you will need to set the AWS Management Console Region to ‘us-west-2’ in order to see the status of your jobs.

Once the job is completed, you can view the results by downloading the result file from your S3 bucket or by printing the job results within the notebook:

print(aws_job.result())
{‘community_results’: “{‘modularity’: 0.41978961209730437, ‘num_comm’: 4}

Because the job execution is fully managed by Braket, you can submit and monitor multiple jobs simultaneously through the same notebook. These jobs will be automatically queued and run based on the QPU availability. You can use this feature to conduct hyperparameter tuning for a given network and benchmark this quantum annealing-based solution for different real-world networks as presented in the result section.

Here we show an example to submit four jobs for hyperparameter tuning by scanning the number of communities from 2 to 5 for the Zachary karate club network. We iterate over setting hyperparameter values and calling the job creation function within the following code:

for num_community in range(2, 6):
	name = f"hyper-param-job-{graph_name}-{num_community}-" + str(int(time.time()))
	
	# assign the number of communities to detect
	hyperparams["num_community"] = str(num_community)
	
	# submit a Braket job
	tmp_job = AwsQuantumJob.create(
		device=device_arn,
		source_module="src",
		job_name=name,
		entry_point="src.braket_job_community_detection",
		hyperparameters=hyperparams,
		input_data={"input-graph": input_graph_path},
		)

By running this code, you create four jobs simultaneously and can check their status under the ‘Jobs’ tab in the Amazon Braket console in the AWS ‘us-west-2’ region, as shown in Figure 1.

Figure 1: The Amazon Braket console shows you the status of your jobs. Iterating over the number of communities from 2 to 5, the first job (num_community=2) has completed, the second job (num_community=3) is currently running, and the final two jobs (num_community=4 and 5) are waiting in the queue to run.

Results of real-world network community detection

In this section, we present the results of real-world networks obtained using the Hybrid Jobs feature. The focus of this post is on a pedagogical introduction to the active research field in community detection, we do not attempt to achieve better results than published records for real-world networks listed in Table 1. If you want to benchmark the quantum annealing method to other existing methods for community detection [6], you can use our tutorial notebooks to, for example, conduct hyperparameter optimization or experiment with broader networks.

We consider seven real-world networks in this post: a Zachary’s karate club social network (Zachary), a social network of dolphins (Dolphins), a coappearances network of characters in the novel Les Miserables (LesMiserables), a collaboration network of Jazz musicians (Jazz), a metabolic network of the nematode C. Elegans (C. Elegans), a university email network (E-mail), and a scientific publication citation network (Cora). These networks are common benchmarks in community detection literature, for which we have known SOTA results to compare against, as per [5, 7, 8, 9]. The datasets of these networks can be downloaded from the website of network repository [10].

As with Part 1, we use modularity as our measure of solution quality. Recall from Part 1, Eq. 1 modularity is defined as:

Where:

 –   the Kronecker-delta δ(ci, cj) is 1 if node i and node j are in the same community (ci = cj) and 0 otherwise,

 –   Αij is the adjacency matrix for the graph,

 –   gi = ∑jΑij is the degree of node i,

 –   m = 1/2 ∑igi is the total number of weights in the graph.

The goal is to maximize the modularity M through optimizing community assignments ci for every node in the graph.

In Table 1, we show modularity values of real-world networks obtained from QBSolv classical solver and the hybrid solver, and compare them to published SOTA results from heuristic methods [7, 8, 9] and quantum annealing [5]. The QBSolv solvers were set with solver_limit = 100 as the upper limit size of sub-QUBO problems. The annealing time was 20 microseconds and numbers of shots are 1000, which are the default values. We set the maximum iterations to repeat QBSolv solver execution to discover a new best solution to one (i.e., num_repeats = 1) in order to minimize QPU cost. These hyperparameter settings are not necessarily optimal for all problems. Experimenting with different solver_limit values and increasing the num_repeats can result in better solutions.

Table 1: Modularity values M for real-world networks with N nodes, E edges, and k communities from published results (classical method and quantum annealing method), QBSolv classical solver solutions, and hybrid solver solutions. The best modularity values are highlighted in bold.

For small graphs like Zachary, Dolphins, LesMiserables, Jazz, and C. Elegans, the modularity results between QBSolv classical solver and hybrid solver are comparable and are also competitive with the quantum annealing results reported by Negre et. al. [5], which are themselves comparable to the cited classical SOTA [7, 8].

For larger networks like E-mail and Cora, which have more than 1000 nodes, we are not aware of any published quantum-based results; therefore, we only compare the QBSolv results to classical optimization methods: the extremal optimization (EO) algorithm [8] for the E-mail network, and the expectation-maximization (EM) algorithm [9] for the Cora network. For the Cora network, we found the QBSolv classical solver outperforms the EM algorithm, but the hybrid solver modularity value is slightly lower than the reported EM result. For the E-mail network, which has larger community numbers, both QBSolv classical and hybrid solutions have lower modularity value than the EO algorithm. The lower performance of QBSolv, especially for large graphs with a large number of communities (for example, ≥ 15), may come from the suboptimal decomposition of QUBO problems into subproblems in the hybrid quantum annealing algorithm. Further research on the scalability of the hybrid quantum annealing method and the development of QPUs with more qubits available will be needed to further improve the performance of this quantum annealing-based community detection technique.

Let’s have a look at the cost of running a job on Amazon Braket. Amazon Braket Hybrid Jobs have 2 pricing components: charges for the use of classical compute resource to run the container and charges for the use of QPUs to run quantum tasks. In most cases, the overall cost will be dominated by the QPU cost. Let’s look at a median size graph C. Elegans as an example, which has 2265 variables for the QUBO matrix based on 453 nodes and 5 communities. For the classical resource cost, we used an ml.m5.xlarge instance with 70 minutes run time, which incurred $0.27 cost based on the current job instance pricing of $0.00383/minute. For the QPU cost, the QBSolv hybrid job on D-Wave Advantage System 4.1 resulted in 180 tasks with 1000 shots per task and incurred a cost of $88.20 based on the current Amazon Braket pricing with $0.30/task and $0.00019/shot. It is hard to estimate the exact number of tasks for a graph ahead of a run because this not only depends on the solver_limit setting for subQUBOs, but also on the specific graph structure for embedding between source graph and D-Wave QPU topology. We recommend that you experiment and assess the potential QPU cost for QBSolv hybrid solver starting with a small instance of the problem before submitting a large job.

Conclusion

In this post, we showed how to run quantum annealing-based community detection tasks using Amazon Braket Hybrid Jobs. We assessed the solution quality and scalability by solving real-world community detection problems, which have applications in business areas like social network analysis, healthcare network analysis, and collusive fraud detection. We found that the QBSolv results are comparable to published SOTA results for small and large real-world graphs when the number of communities is relatively small (k < 15). For networks with large number of communities (k ≥ 15), we observed lower performance in this hybrid quantum-classical method, which might be due to the suboptimal decomposition of a large QUBO problem into sub-QUBO problems. The notebook and scripts provided in our AWS GitHub repository can be used as an example to study quantum annealing algorithms and benchmark solution performance for community detection problems using Amazon Braket Hybrid Jobs.

References:

  1. Girvan, M. and Newman, M. E. J (2002) Community structure in social and biological networks. PNAS, 99 (12) 7821-7826
  1. Traag, V.A., Waltman, L. & van Eck, N.J. From Louvain to Leiden (2019) Guaranteeing well-connected communities. Sci Rep 9, 5233
  1. Duch, Jordi and Arenas, Alex (2005) Community detection in complex networks using extremal optimization. Phys. Rev. E 72, 027104
  1. Liu, Fanzhen and Xue, Shan et. al. (2020) Deep Learning for Community Detection: Progress, Challenges and Opportunities. IJCAI, 4981 – 4987
  1. Negre, CFA and Ushijima-Mwesigwa H, Mniszewski SM (2020) Detecting multiple communities using quantum annealing on the D-Wave system. PLOS ONE 15(2): e0227538
  1. Sobolevsky, Stanislav and Campari, Riccardo and Belyi, Alexander and Ratti, Carlo. (2014) General optimization technique for high-quality community detection in complex networks. Phys. Rev. E 90, 012811
  1. Fu, YH and Huang, CY and Sun, CT. (2017) A community detection algorithm using network topologies and rule-based hierarchical arc-merging strategies. PLOS ONE. 12(11): e0187603
  1. Duch, Jordi and Arenas, Alex (2005) Community detection in complex networks using extremal optimization. Phys. Rev. E 72, 027104
  1. Yang, Tianbao and Chi, Yun and Zhu, Shenghuo and Gong, Yihong and Ji, Rong (2010) Directed Network Community Detection: A Popularity and Productivity Link Model. Proceedings of the 2010 SIAM International Conference on Data Mining (SDM), 742-753
  1. Rossi, Ryan A. and Ahmed, Nesreen K. (2015) The Network Data Repository with Interactive Graph Analytics and Visualization. AAAI https://networkrepository.com