## AWS Spatial Computing Blog

# Modeling Spread of Infectious Disease Using Spatial Simulations on AWS

Spatial simulations compute the motion and behavior of dynamic entities across 2D and 3D virtual environments. These simulations allow users to explore how the behaviors of individual actors, and the interactions among a group of actors, can lead to emergent spatial patterns. The authors see customers using spatial simulations for use cases such as urban planning, emergency evacuation, crowd simulation, defense synthetic training, and gaming. In this blog, the authors combine high-resolution spatial simulations run on Amazon Web Services (AWS) with the Delineo Disease Modeling Project (DMP), developed by the Malone Center for Engineering in Healthcare and the Institute for Assured Autonomy at Johns Hopkins University (JHU), to generate insights into disease propagation metrics and forecast various scenarios at the community level.

In the following sections, the authors will walk through the steps of building and scaling spatial simulations on AWS for the Delineo DMP, present the insights provided by such a model, and demonstrate scaling up to 240,000 simulated people (dynamic entities) using AWS SimSpace Weaver.

### Introduction to Infectious Disease Modeling using Spatial Simulations

Mitigating the impact of infectious diseases (such as COVID-19) requires a firm understanding of how transmission occurs within and across communities with different demographics and geographical spread. Disease propagation models span a range of fidelity and computational complexity: from simple compartment models represented by ordinary differential equations, such as the SIR (susceptible, infectious, recovered) model, through more complex network models, to higher-fidelity spatial models that account for each individual entity’s behavior (e.g., a person’s movement).
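For readers unfamiliar with compartment models, the sketch below integrates the textbook SIR equations with a forward Euler step. It is only an illustration of the low-fidelity end of the spectrum described above and is not part of DMP; the function name `simulate_sir` and the values of `beta` (transmission rate) and `gamma` (recovery rate) are assumptions chosen for the example.

```python
def simulate_sir(beta, gamma, s0, i0, r0, days, dt=0.1):
    """Integrate the classic SIR model with forward Euler.

    dS/dt = -beta*S*I/N, dI/dt = beta*S*I/N - gamma*I, dR/dt = gamma*I
    Returns a list of (time, S, I, R) tuples.
    """
    s, i, r = float(s0), float(i0), float(r0)
    n = s + i + r  # total population stays constant
    history = [(0.0, s, i, r)]
    steps = int(round(days / dt))
    for k in range(1, steps + 1):
        new_infections = beta * s * i / n * dt
        new_recoveries = gamma * i * dt
        s -= new_infections
        i += new_infections - new_recoveries
        r += new_recoveries
        history.append((k * dt, s, i, r))
    return history


# Illustrative parameters only (not taken from the study).
history = simulate_sir(beta=0.3, gamma=0.1, s0=9900, i0=100, r0=0, days=100)
```

A model like this tracks only three aggregate counts, which is why it cannot reproduce location-dependent effects that the spatial simulation captures.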

The Delineo DMP, developed by researchers at Johns Hopkins University, is a software simulator that provides insight into the spread of diseases within an urban, suburban, or small-town environment. In particular, DMP uses demographic information, pre-existing conditions, and the distribution of people across facilities to compute the rate of spread of infectious diseases using the Wells-Riley model. The synthetic population, including relevant underlying medical conditions, age, gender, and other demographics, is created using the Python package SynthPops.

On its own, DMP uses statistics-based machine learning algorithms to determine how many people are in each facility at a given time, but it does not model the actual movement of people. Combining DMP with a spatial simulation, in which people actually move from one facility to another, captures the mechanisms of disease spread with high fidelity and makes it possible to study the impact of countermeasures such as mask wearing or movement restrictions. However, spatial simulations incorporating hundreds of thousands of people are computationally expensive, particularly because many simulations (more than 20) must be run to obtain statistically valid results. In the following sections, the authors will walk through the steps of building and scaling spatial simulations on AWS for the Delineo DMP.

### Building Spatial Simulation for Infectious Disease Modeling

*Step 1 – Creating and initializing entities (people)*: As part of the initialization, people are assigned randomly to households of varying sizes. Every member of the population visits different facilities (probabilistically assigned) each day during the 100 days of the simulation. Their chance of becoming infected is calculated from the Wells-Riley disease propagation equation for confined indoor environments and their characteristics [1]:

$$P = 1 - \exp\left(-\frac{i\,a\,q\,t}{Q}\right)$$

where *P* represents the probability of infection, *i* represents the number of infecting individuals within the confined space, *a* represents the average breathing rate of individuals within the space, *q* is the quanta generation rate of the infection, *t* is the exposure time, and *Q* is the airflow rate from the HVAC system of the facility.
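As a minimal sketch, the probability can be evaluated directly from these symbols, assuming the standard form of the Wells-Riley equation, P = 1 - exp(-i·a·q·t/Q). The function name and the numeric values below are illustrative assumptions, not parameters from DMP; the only requirement is that the units of `breathing_rate`, `exposure_time`, and `airflow_rate` are mutually consistent.

```python
import math


def wells_riley_probability(infectors, breathing_rate, quanta_rate,
                            exposure_time, airflow_rate):
    """Infection probability from the standard Wells-Riley form.

    P = 1 - exp(-i * a * q * t / Q)
    """
    if airflow_rate <= 0:
        raise ValueError("airflow_rate must be positive")
    dose = infectors * breathing_rate * quanta_rate * exposure_time / airflow_rate
    return 1.0 - math.exp(-dose)


# Illustrative values only: one infector, two hours of exposure.
p = wells_riley_probability(infectors=1, breathing_rate=0.5, quanta_rate=20.0,
                            exposure_time=2.0, airflow_rate=100.0)
p_no_infectors = wells_riley_probability(0, 0.5, 20.0, 2.0, 100.0)
```

Doubling the airflow rate or halving the exposure time halves the exponent, which is how facility ventilation and dwell time enter the model.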

Each person’s infection state is classified on a six-point scale: uninfected, asymptomatic, mild, severe, critical, and finally recovered. Facilities are grouped into categories, such as restaurants or hospitals, and each category has its own parameters for the Wells-Riley equation, which affect its infection rate.

There are different interventions in the simulation, such as mask wearing, percentage of population vaccinated and stay-at-home mandates. The user is able to adjust all of these parameters and view the effects on disease spread. At the end of the simulation, it is possible to see how many people were infected in each facility and when they were infected.

*Step 2 – Creating behavior of the entities (people)*: To capture movement patterns, entities are assigned waypoints corresponding to individual parks, facilities, and so on, and these waypoints are classified into categories such as restaurants, parks, stores, and gyms. Each category has bounds on the phenomenological constants associated with the Wells-Riley model, and individual instances within a category exhibit slight variations. To model perpetual motion within a city, an entity that reaches its destination facility and spends the required amount of time there is randomly assigned a new waypoint.
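The waypoint behavior described above can be sketched as a small per-person state machine. This is a simplified illustration, not the authors' implementation: the `Facility` and `Person` classes, the `step_person` function, the fixed `dwell_time`, and the straight-line motion toward the waypoint are all assumptions made for the example (the actual simulation follows a flow field, as described next).

```python
import math
import random
from dataclasses import dataclass


@dataclass
class Facility:
    name: str
    category: str    # e.g. "restaurant", "park", "store", "gym"
    position: tuple  # (x, y) waypoint on the 2D map


@dataclass
class Person:
    position: tuple
    destination: Facility
    inside: bool = False
    dwell_remaining: float = 0.0


def step_person(person, facilities, rng, dt=1.0, speed=1.0, dwell_time=1.0):
    """Advance one person by one timestep of length dt."""
    if person.inside:
        # Wait out the required time, then pick a different waypoint at random.
        person.dwell_remaining -= dt
        if person.dwell_remaining <= 0:
            others = [f for f in facilities if f is not person.destination]
            person.destination = rng.choice(others)
            person.inside = False
        return
    (x, y), (tx, ty) = person.position, person.destination.position
    dx, dy = tx - x, ty - y
    dist = math.hypot(dx, dy)
    step = speed * dt
    if dist <= step:
        # Arrived: enter the facility and start the dwell timer.
        person.position = (tx, ty)
        person.inside = True
        person.dwell_remaining = dwell_time
    else:
        person.position = (x + dx / dist * step, y + dy / dist * step)


rng = random.Random(42)
facilities = [Facility("cafe", "restaurant", (3.0, 0.0)),
              Facility("green", "park", (0.0, 4.0))]
person = Person(position=(0.0, 0.0), destination=facilities[0])
for _ in range(3):
    step_person(person, facilities, rng)
```

After three unit steps the person has reached the first facility; one more step exhausts the dwell time and assigns the other facility as the next waypoint.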

The spatial motion of people is modeled through a particle simulation on a 2D plane, representative of an urban downtown area. Instead of modeling an actual city, a generalized layout with geographical regions grouped together as facilities is used. The *Eikonal* equation is solved using the fast-marching method to compute velocity updates from the underlying flow field, and an efficient *kD-Tree* search algorithm finds nearest neighbors for collision avoidance [2]. In its standard form, the Eikonal equation reads:

$$|\nabla \varphi(x)| = C(x), \quad x \in \Omega$$

Here, *x* represents the spatial coordinates in 2D, *∇* represents the gradient operator, *φ* represents the weighted field of distance between an entity and its destination, *C* is the cost function, and *Ω* is the geographical area of the domain. The boundaries of the domain and internal obstructions are defined as a collision mesh, while the walking paths within the domain are treated as navigation meshes [3] used to compute the flow fields.
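To make the fast-marching step concrete, here is a minimal first-order solver for |∇φ| = C on a uniform grid with unit spacing, where φ = 0 at the destination cells. It is a sketch under stated assumptions rather than the authors' solver: the function name `fast_marching` is hypothetical, the grid stands in for the navigation mesh, and obstacles are marked with an infinite cost.

```python
import heapq
import math


def fast_marching(cost, sources):
    """First-order fast marching for |grad phi| = cost on a unit grid.

    cost: 2D list of positive floats; math.inf marks an obstacle cell.
    sources: list of (row, col) destination cells where phi = 0.
    """
    rows, cols = len(cost), len(cost[0])
    phi = [[math.inf] * cols for _ in range(rows)]
    frozen = [[False] * cols for _ in range(rows)]
    heap = []
    for r, c in sources:
        phi[r][c] = 0.0
        heapq.heappush(heap, (0.0, r, c))

    def axis_min(r, c, dr, dc):
        # Smallest accepted value among the two neighbors along one axis.
        best = math.inf
        for sign in (-1, 1):
            rr, cc = r + sign * dr, c + sign * dc
            if 0 <= rr < rows and 0 <= cc < cols and frozen[rr][cc]:
                best = min(best, phi[rr][cc])
        return best

    def upwind_update(r, c):
        a = axis_min(r, c, 1, 0)
        b = axis_min(r, c, 0, 1)
        f = cost[r][c]
        lo, hi = min(a, b), max(a, b)
        if math.isinf(hi) or hi - lo >= f:
            return lo + f  # one-sided update
        # Two-sided update: solve (phi - a)^2 + (phi - b)^2 = f^2.
        return 0.5 * (a + b + math.sqrt(2.0 * f * f - (a - b) ** 2))

    while heap:
        _, r, c = heapq.heappop(heap)
        if frozen[r][c]:
            continue  # stale heap entry
        frozen[r][c] = True
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if (0 <= rr < rows and 0 <= cc < cols
                    and not frozen[rr][cc] and math.isfinite(cost[rr][cc])):
                candidate = upwind_update(rr, cc)
                if candidate < phi[rr][cc]:
                    phi[rr][cc] = candidate
                    heapq.heappush(heap, (candidate, rr, cc))
    return phi


# Uniform cost on a 3x3 grid with the destination at the center.
phi = fast_marching([[1.0] * 3 for _ in range(3)], sources=[(1, 1)])
# A wall (infinite cost) blocks the only path in a single-row grid.
phi_blocked = fast_marching([[1.0, math.inf, 1.0]], sources=[(0, 0)])
```

An entity would then move in the direction of steepest descent of `phi` to approach its destination while routing around obstacles; a separate nearest-neighbor search would handle collision avoidance between entities.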

*Step 3 – Initializing the simulation*: A 2D map is generated with a random distribution of facilities, and people are distributed statistically across the map. One percent of the population is seeded in an infected state. People physically move across the map towards their assigned destinations at each timestep of the simulation. Within facilities, disease propagates between individuals in close proximity. The total number of individuals in each of the six states (uninfected, asymptomatic, mild, severe, critical, and recovered) is plotted over time to observe macroscopic trends, while microscopic clusters can be identified by observing the movement patterns of individuals together with their disease state.
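The seeding and per-state bookkeeping can be sketched as follows. The six states come from the text; everything else is an assumption for illustration, including the names `DiseaseState`, `seed_population`, and `count_states`, and in particular the choice to start seeded individuals in the asymptomatic state, which the source does not specify.

```python
import random
from collections import Counter
from enum import Enum


class DiseaseState(Enum):
    UNINFECTED = 0
    ASYMPTOMATIC = 1
    MILD = 2
    SEVERE = 3
    CRITICAL = 4
    RECOVERED = 5


def seed_population(size, infected_fraction=0.01, rng=None):
    """Return a list of states with a random fraction seeded as infected.

    Assumption: seeded individuals start as ASYMPTOMATIC.
    """
    rng = rng or random.Random()
    states = [DiseaseState.UNINFECTED] * size
    n_infected = max(1, round(size * infected_fraction))
    for idx in rng.sample(range(size), n_infected):
        states[idx] = DiseaseState.ASYMPTOMATIC
    return states


def count_states(states):
    """Count people in each of the six states (zero for empty states)."""
    counts = Counter(states)
    return {state: counts.get(state, 0) for state in DiseaseState}


states = seed_population(10_000, rng=random.Random(0))
counts = count_states(states)
```

Calling `count_states` once per timestep and appending the result to a list yields the per-state time series that is plotted to show macroscopic trends.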

### Forecasting Spread of Disease

When forecasting the spread of disease, best- and worst-case scenarios need to be evaluated by running ensemble simulations with randomized initializations of people’s locations, initial disease states, and demographics. For this study, the initial locations of people and their initial destinations (waypoints) are varied while all other factors, such as demographics, the number of facilities, and the map, are held constant. This allows researchers to understand the probabilistic characteristics of the propagation curves across a wide range of initial conditions. The simulation results presented in this section are based on 10,000 entities (people). The computation was performed on a single Amazon Elastic Compute Cloud (Amazon EC2) instance.

The results of the representative simulation are shown in Fig. 1, which plots the number of people in each disease state against time. The bold curves indicate the mean across the ensemble of simulations, while the shaded envelopes around each curve indicate the best- and worst-case bounds for the given set of conditions.
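One simple way to compute such a mean curve and envelope from an ensemble is sketched below. It assumes the bounds are the pointwise minimum and maximum across runs, which may differ from how the authors derived their envelopes; the function name `ensemble_envelope` and the sample data are hypothetical.

```python
from statistics import mean


def ensemble_envelope(runs):
    """Return (lower, average, upper) curves across simulation runs.

    runs: list of equal-length time series, one per simulation, each
    holding the count of people in a given disease state per timestep.
    """
    if not runs:
        raise ValueError("at least one run is required")
    length = len(runs[0])
    if any(len(run) != length for run in runs):
        raise ValueError("all runs must have the same number of timesteps")
    per_step = list(zip(*runs))  # values of every run at each timestep
    lower = [min(values) for values in per_step]
    average = [mean(values) for values in per_step]
    upper = [max(values) for values in per_step]
    return lower, average, upper


# Three made-up runs of infected counts over three timesteps.
runs = [[0, 10, 20], [0, 20, 40], [0, 30, 30]]
lower, average, upper = ensemble_envelope(runs)
```

Plotting `average` as a bold line and filling between `lower` and `upper` gives the kind of shaded envelope described for Fig. 1.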

The simulations capture a number of insights into the macroscopic characteristics of spread for this synthetic population. The total infection curve shows four inflection points or plateaus where the rate of infection decreases temporarily before increasing again. These indicate cycles of infection, an emergent phenomenon that is seen more clearly in the number of people who are critically ill from the infection.

Interestingly, the cycles of infection appear to be insensitive to the random initialization of people’s locations, as the cycles occur at approximately the same time in each of the simulations. The model implies that the cycles and the approximate peak of total infections depend on a few factors: 1) the nature of the geographical map and facilities, 2) commuter statistics, and 3) demographics. However, the cycles do not depend on the physical distribution of people at a given time or on their preferred places to visit within the map.

Fig. 2 shows a top-down view of spatial entities, represented by cones colored by disease state, at different times during the simulation. To further understand how facility locations statistically affect the total disease spread, the authors varied the total number of facilities while keeping the demographics constant.

Fig. 3 illustrates the scaling study, in which the number of facilities within the map is changed and the disease spread characteristics are observed. Notably, while each curve has complex characteristics, the curves converge as the number of facilities increases from 30 to 120. The simulation model predicts specific characteristic curves of spread that are independent of people’s initial locations, preferred places to visit, and the total number of facilities. Instead, the results depend strongly on the geographical map and demographics, and implicitly on commuters’ paths and distances and the average time spent at facilities.

### Scaling Up Spatial Simulations with AWS SimSpace Weaver

AWS SimSpace Weaver is a managed service for scaling and running spatial simulations in the cloud. SimSpace Weaver allows developers to scale spatial simulations across multiple Amazon EC2 instances by handling the underlying infrastructure, networking, and data management. By using SimSpace Weaver, the number of entities (people) simulated within the JHU Delineo DMP simulation can be increased from 10,000 to greater than 200,000, allowing for higher resolution simulations to understand emergent behavior within a larger population.

There are several steps required to scale up a spatial simulation with SimSpace Weaver (Fig. 4). SimSpace Weaver handles entity management through the SimSpace Weaver State Fabric, which allows users to read/write data similar to a high-speed in-memory database, transfer information about entities crossing spatial divisions, and handle data replication. The spatial application logic is then able to compute the locations of entities in the next simulation timestep.

SimSpace Weaver divides the simulation area into partitions (spatial divisions), each of which runs on an individual compute resource unit in parallel. Data on entity locations and other fields is transferred between partitions, either from one block to another within an Amazon EC2 compute instance or across compute instances (discussed in a previous blog, New AWS SimSpace Weaver–Run Large-Scale Spatial Simulations in the Cloud). In this study, the JHU Delineo DMP application is deployed across 10 c5.24xlarge instances, with each instance representing 1/10 of the map. A GPU instance, such as an Amazon EC2 G4 or G5, can be used to render the simulation with game engines such as Unreal Engine 5 or Unity. The rendered simulation can be viewed on a remote client by streaming from a NICE DCV server, either as a web stream or through the NICE DCV client, as shown in Fig. 5. Finally, the spread of infectious disease across more than 200,000 people simulated by JHU Delineo DMP and SimSpace Weaver can be visualized at the macro level, or zoomed in to the micro level for a specific area of interest.
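The idea of spatial partitioning can be illustrated with a few lines of plain Python. This is a conceptual sketch only and does not use or represent the SimSpace Weaver API, which handles partition assignment and entity hand-off internally. The function names, the map dimensions, and the 5 x 2 grid layout for the 10 partitions are all assumptions made for the example.

```python
def partition_index(x, y, width, height, cols, rows):
    """Return the index of the grid partition that contains point (x, y)."""
    if not (0 <= x <= width and 0 <= y <= height):
        raise ValueError("point lies outside the simulation area")
    # Points on the far edge are clamped into the last column or row.
    col = min(int(x / width * cols), cols - 1)
    row = min(int(y / height * rows), rows - 1)
    return row * cols + col


def entities_crossing(old_positions, new_positions, **grid):
    """Yield (entity_id, old_partition, new_partition) for each entity
    whose position moved into a different partition this timestep."""
    for entity_id, (x, y) in new_positions.items():
        old = partition_index(*old_positions[entity_id], **grid)
        new = partition_index(x, y, **grid)
        if old != new:
            yield entity_id, old, new


# Hypothetical map split into 10 partitions laid out as 5 columns x 2 rows.
grid = dict(width=1000.0, height=400.0, cols=5, rows=2)
old = {"p1": (199.0, 50.0), "p2": (500.0, 300.0)}
new = {"p1": (201.0, 50.0), "p2": (510.0, 300.0)}
handoffs = list(entities_crossing(old, new, **grid))
```

Only entities that appear in `handoffs` need their state transferred to another partition, which is why keeping most movement local to a partition keeps cross-instance traffic low.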

Fig. 6 shows the development of disease spread for the scaled population of approximately 212,000. Simulating a population at this scale for the first time revealed some emergent patterns. The disease spreading cycle shortened to approximately a week because of the large population inside a relatively confined geospatial area. Because spatial simulation models include collision avoidance logic that prevents entities from colliding with each other, entities attempted to find alternative empty pathways, which were difficult to find. As a result, locally isolated groups formed, and the insufficient exchange of population between facilities confined the spread of disease. Approximately 35% of the population was never infected before the rest of the population recovered. While this result demonstrates the feasibility of running larger-scale simulations, it raises new questions about how different levels of mobility affect the spreading cycle of a disease. With this large-scale simulation capability, the authors will address these questions in future studies.

### Summary

In this blog, the authors have shared how spatial simulations on AWS allow for modeling the spread of disease in urban environments. Insights into macroscopic trends, such as waves of infection, as well as forecast curves of lower and upper bounds on disease propagation, can be obtained through such studies, together with microscopic observations on clustering regions and patterns. Such insights can be used to inform public policy decisions and observe their effect within synthetic environments. Additionally, by using AWS SimSpace Weaver, such simulations can be scaled up to hundreds of thousands of individuals.

### References

[1] Riley, E. C., G. Murphy, and R. L. Riley. Airborne spread of measles in a suburban elementary school. American Journal of Epidemiology 107.5 (1978): 421-432.

[2] Maury, Bertrand, and Sylvain Faure. Crowds in Equations: An Introduction to the Microscopic Modeling of Crowds. World Scientific, 2018.

[3] Van Toll, W., Triesscheijn, R., Kallmann, M., Oliva, R., Pelechano, N., Pettré, J., & Geraerts, R. (2016, October). A comparative study of navigation meshes. In Proceedings of the 9th International Conference on Motion in Games (pp. 91-100).