Accelerating green-urban planning simulations with AWS Batch

Despite the increasing risk of environmental threats to cities across the world, and the critical role the natural world plays in protecting cities and their residents from climate extremes, effective green urban planning receives little representation in city-planning and infrastructure decisions.

We know that urban forests [1][2] have a wide range of economic and social benefits, and are one of the most effective ways to reduce climate risk in cities.

In this blog post, we’ll explore how the Netherlands-based organization Lucidminds AI is addressing this challenge through their Green Urban Scenarios simulator (GUS). The team initially built GUS to power the TreesAI project, a collaboration between Lucidminds and Dark Matter Labs. It’s a tool built on AWS so that urban planners, researchers, green portfolio managers, and others can explore the impact of green infrastructure on the urban environment through the power of digital twins and numerical simulations scaled using AWS Batch.

Trees – a natural city defender

Trees are natural carbon sinks. As trees grow, they absorb the greenhouse gas CO2 and sequester carbon in their fibers until an event like combustion or decomposition releases the carbon back into the atmosphere as CO2. Even if a tree is harvested for wood, the carbon remains locked away and out of the atmosphere. Trees provide essential habitat for wildlife like birds and their canopies provide shade and shelter which reduces the amount of energy needed to cool buildings. Their roots help prevent soil erosion, and they improve air quality by removing and breaking down pollutants such as sulfur dioxide.

In short, planting trees is one of the most effective ways to combat climate change and promote biodiversity in cities. But urban planners and decision makers need tools to understand what the best strategies are for building or strengthening their urban forests, such as where trees should be placed, what types of trees to plant, and the optimal level of tree maintenance required. Critically, it’s also important to understand the impacts of proposed urban forest plans, in particular the estimated amount of carbon sequestered under different scenarios.

Nature-based solutions that use digital technologies, like Green Urban Scenarios (GUS), are making it easier for cities to understand how trees can benefit their communities. With a better understanding of how trees reduce emissions, governments and businesses can make more informed decisions about which trees to plant where, and how best to care for them.

How the “Green Urban Simulator” works

To accurately simulate hypothetical green urban planning projects for a city, the simulations must reflect the unique characteristics of the city in question. Every city has its own built environment and geography that defines its associated urban forest.

Figure 1 – Green Urban Scenarios simulator (GUS) is a tool built on AWS so that urban planners, researchers, green portfolio managers, and others can explore the impact of green infrastructure on the urban environment.

For simulations to be useful, they must be initialized with data that reflects the real environment. Therefore, the first step in the process of green urban simulation is to capture and digitize the unique aspects of a city’s urban forest, namely the spatial coordinates, species, diameter, height, and current condition of individual trees across a city.

The Lucidminds team uses a combination of available datasets, including satellite imagery, street view images, and field surveys, to build digital urban forests. This data is collected into a CSV file in which each line represents a single tree and each column corresponds to one of the tree characteristics described above. The CSV file is then fed into the simulation modules to initialize the computational domain of a green urban simulation.
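As a rough illustration of that one-tree-per-line layout, here is a minimal Python sketch that parses such a file into records. The column names (`id`, `lat`, `lon`, `species`, `dbh_cm`, `height_m`, `condition`) and the two sample rows are assumptions made for this example; they are not the actual schema used by GUS.

```python
import csv
import io
from dataclasses import dataclass


@dataclass
class Tree:
    """One row of a digital urban forest (field names are illustrative)."""
    tree_id: int
    lat: float
    lon: float
    species: str
    dbh_cm: float      # trunk diameter
    height_m: float
    condition: str


def load_trees(csv_file):
    """Parse a CSV stream into Tree records, one per line."""
    reader = csv.DictReader(csv_file)
    return [
        Tree(
            tree_id=int(row["id"]),
            lat=float(row["lat"]),
            lon=float(row["lon"]),
            species=row["species"],
            dbh_cm=float(row["dbh_cm"]),
            height_m=float(row["height_m"]),
            condition=row["condition"],
        )
        for row in reader
    ]


# Two made-up rows standing in for a real tree inventory file.
SAMPLE_CSV = """id,lat,lon,species,dbh_cm,height_m,condition
1,52.3702,4.8952,Ulmus x hollandica,42.5,18.0,good
2,52.3680,4.9036,Tilia cordata,31.0,14.5,fair
"""

trees = load_trees(io.StringIO(SAMPLE_CSV))
print(len(trees), trees[0].species)
```

Parsing into typed records up front means a malformed row fails at load time rather than partway through a long simulation.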

Figure 2 – Characteristics of Urban Forests Complex Systems consider specificity, heterogeneity and dynamic interaction between trees and their environment

To illustrate the capabilities of GUS, the Lucidminds team performed simulations of the city of Amsterdam, the Dutch capital, that incorporated over 250,000 individual trees. That is just the trees currently growing in Amsterdam. One of the powerful features of simulations is the ability to examine hypothetical scenarios – such as the city of Amsterdam planting another 100,000 trees – to understand how their urban forest might evolve and sequester carbon over the next 30 years.

There’s an endless number of possible arrangements of trees that can be simulated for any city in the world. Of course, as the number of trees in a simulation grows, so does the computational expense required to complete it. To ensure that their simulations could scale along with their ambitions, the Lucidminds team looked to AWS’s massive computing capacity and HPC services like AWS Batch.

Running GUS on AWS

The GUS engine is written in Python and is available as an open source Python package called ‘pygus’.

Like many simulations, GUS represents phenomena that behave probabilistically, such as the chance on any individual simulated day that a storm materializes, resulting in precipitation that affects some or all of the trees in the city.

If you’re not familiar with simulations with probabilistic mechanisms, the important thing to understand is that if probabilities are embedded in a simulation engine, a user can run the exact same simulation code with the exact same simulation inputs (like tree data) and the two simulations will almost certainly produce different results. This is a perfectly acceptable outcome, one that comes with the territory of simulating scenarios where we can’t know for sure every single sub-event of a simulation, but we can assign a probability distribution to the occurrence of those events and draw from that distribution during the simulation.

The drawback to incorporating probabilities in simulations, however, is that it becomes necessary to run the simulations many times over to understand all the different possible outcomes. This mathematical technique is called Monte Carlo simulation and is widely used in simulations across the fields of physics, finance, biology, and engineering, to name a few. Depending on the complexity of the simulation and the number of probabilistic variables/events, hundreds or thousands of simulations might be necessary to understand the emerging behavior. And that’s just for one scenario or set of input parameters. If you want to simulate a different scenario, say where you replace a few dozen trees near the city center with trees of a different species, then you might need to run those thousands of simulations again.
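To make the idea concrete, here is a toy Monte Carlo sketch in Python. It is not the GUS model: the daily storm probability, the one-year horizon, and the function names are all assumptions chosen for illustration. Each iteration draws from the same distribution but with its own random seed, so the ensemble shows the spread of outcomes that identical inputs can produce.

```python
import random
import statistics

STORM_PROBABILITY = 0.1   # assumed daily chance of a storm (illustrative)
DAYS = 365                # assumed simulation horizon


def simulate_storm_days(seed=None):
    """One iteration: count the days on which a storm occurs."""
    rng = random.Random(seed)
    return sum(1 for _ in range(DAYS) if rng.random() < STORM_PROBABILITY)


def run_ensemble(n_iterations, base_seed=0):
    """Run many independent iterations, each with its own seed."""
    return [simulate_storm_days(base_seed + i) for i in range(n_iterations)]


results = run_ensemble(1000)
print("mean storm days:", statistics.mean(results))
print("std deviation:  ", statistics.stdev(results))
```

Seeding each iteration explicitly keeps individual runs reproducible while the ensemble as a whole still samples the range of possible outcomes.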

If a single simulation iteration (a term used in this context to describe just one run of a simulation) takes a fraction of a second, then it’s not a big deal to run a thousand simulations one after the other to get an ensemble. That would only take a few minutes and you’d have a robust set of results to examine. Many simulations, however, take much longer to run. For instance, a single GUS iteration for Amsterdam’s 250k trees takes a little over an hour (~75 minutes) on a typical laptop or equivalent EC2 instance and the number of iterations needed to confidently simulate a single scenario is around a thousand.

Running each iteration back-to-back on a single laptop or a similar EC2 instance would take about 2 months, which is obviously not a feasible time frame to make these simulations useful for decision makers, especially considering that there are likely dozens or hundreds of scenarios that should be simulated when developing an urban planning project.

The solution to this problem is to embrace the highly parallel nature of Monte Carlo-based simulation. For the 1,000-simulation set described above, each iteration is independent of the other 999, meaning that we can split up the 1,000 simulations, run them separately and even at the same time, and then combine the results at the end. So, in theory we could spin up 1,000 EC2 instances, run our simulation code with the same inputs on each instance, dump the results to a shared storage location (say, an Amazon S3 bucket), and complete all of that in roughly the same 75 minutes that it would take to run a single simulation iteration.
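The arithmetic behind these estimates can be checked with a short Python sketch. It uses the figures quoted above (75 minutes per iteration, 1,000 iterations) and assumes an idealized setting with no scheduling, startup, or data-transfer overhead, so real wall-clock times will be somewhat longer.

```python
MINUTES_PER_ITERATION = 75   # from the Amsterdam example above
ITERATIONS = 1000


def wall_clock_days(iterations, minutes_each, workers):
    """Idealized wall-clock time in days, ignoring all overhead."""
    batches = -(-iterations // workers)   # ceiling division
    return batches * minutes_each / (60 * 24)


serial_days = wall_clock_days(ITERATIONS, MINUTES_PER_ITERATION, workers=1)
parallel_days = wall_clock_days(ITERATIONS, MINUTES_PER_ITERATION, workers=1000)

print(f"1 worker:      {serial_days:.1f} days")
print(f"1,000 workers: {parallel_days * 24 * 60:.0f} minutes")
```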

But how do we go about orchestrating all those compute instances and getting our simulation code on each one? Enter AWS Batch.

AWS Batch is a fully managed batch computing service that plans, schedules, and runs containerized batch or machine learning (ML) workloads across the full range of AWS compute offerings, such as Amazon ECS, Amazon EKS, and AWS Fargate, using Spot or On-Demand Amazon EC2 instances.

Batch is a great fit for Monte Carlo simulations because it handles much of the undifferentiated heavy lifting of setting up the compute resources needed to run them, leaving you more time to focus on developing the actual simulation code and analyzing its results.

To run GUS for Amsterdam, Lucidminds took simplicity a step further by using AWS Fargate in combination with AWS Batch. AWS Fargate is a serverless, pay-as-you-go compute engine that lets you focus on building applications without managing servers or choosing instance types. With Fargate you simply package your application in containers, specify the CPU and memory requirements, define networking and IAM policies, and launch the application. Fargate then launches and scales the compute to closely match the resource requirements that you specify for the container.

To illustrate what running Monte Carlo simulations using AWS Batch looks like in practice, let’s look at the steps that Lucidminds took to get their Python-based simulation application running at scale using AWS resources. We’ll break down the steps into two distinct phases: 1) containerizing and testing the simulation application; and 2) scaling the simulation application using AWS Batch.

As a helpful guide to understand how the various AWS services used in this post (simulation application and scaling) fit together, refer to the reference architecture diagram in Figure 3.

Figure 3 – Reference architecture diagram for running Monte Carlo simulations.

Phase 1: containerize, set up, and test simulation application on Amazon Elastic Container Service (Amazon ECS)

Containerize the simulation application – AWS Batch runs jobs using Docker container images. In this post we’re not going to cover how to containerize applications; we only note that it is part of the process. If you’re new to containers, check out this quick primer on the subject: AWS: What is Containerization? You can containerize applications using Docker technology on your local machine or using AWS services like AWS Cloud9, depending on where you are developing your applications. Lucidminds built and tested their containers on their local machines.
Push container image to Amazon Elastic Container Registry (Amazon ECR) – Once you have a containerized application, you need to get it into an image repository. Lucidminds used ECR, which is a fully managed container registry offering high-performance hosting, so you can reliably deploy application images and artifacts anywhere. The first step is to create an image repository, which can be done via the AWS Command Line Interface (CLI) or using the AWS console. If you’re unsure about what commands are needed to tag and push images to an ECR repository, simply create a repository through the AWS console, navigate to the repository, and click View push commands to view the steps to push an image to your new repository.

Figure 4 – Creating an Image repository for your containerized application on Amazon Elastic Container Registry
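For readers who prefer to script the repository step, the following Python sketch shows one way it could look. The account ID, Region, and repository name are placeholders invented for this example. The `create_repository` helper is not called here; it assumes the boto3 SDK is installed and AWS credentials are configured. Building and pushing the image itself still happens with your container tooling, as shown by View push commands.

```python
def ecr_image_uri(account_id, region, repository, tag="latest"):
    """Build the URI that identifies an image in a private ECR repository."""
    return f"{account_id}.dkr.ecr.{region}.amazonaws.com/{repository}:{tag}"


def create_repository(repository, region):
    """Create the ECR repository (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily so the helper above works without it

    ecr = boto3.client("ecr", region_name=region)
    response = ecr.create_repository(repositoryName=repository)
    return response["repository"]["repositoryUri"]


# Placeholder account ID, Region, and repository name.
uri = ecr_image_uri("123456789012", "eu-west-1", "gus-simulation", tag="v1")
print(uri)
```

The resulting URI is the value a job definition later uses to tell AWS Batch which image to run.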

Set up databases that connect to the simulation container – The GUS application connects to two databases: one to store data needed during the simulations (Amazon MemoryDB for Redis) and one to store outputs of simulation runs for caching purposes (Amazon RDS for PostgreSQL). Caching simulation outputs allows Lucidminds to reduce their need for compute when users enter a set of simulation parameters that have already been computed in the past. Not all simulations require a database connection, and it is not a requirement for running Monte Carlo simulations using AWS Batch; that’s just how the Green Urban Simulator application works. GUS also writes simulation results to an S3 bucket. It’s strongly recommended that container instances are launched within a Virtual Private Cloud (VPC), and required when using managed compute instances (discussed later), so for the simulation containers to output results to S3 buckets, you’ll need a VPC endpoint. For more information, see Gateway endpoints for Amazon S3.
Test application on Amazon Elastic Container Service (ECS) cluster – Make sure to test that the application is working as anticipated before trying to scale. Lucidminds tested their simulation application using an ECS cluster, to ensure that the container could be loaded and run, and that the application could successfully connect to and read/write results to the associated data stores. Once GUS was performing as expected, it was time to get running with AWS Batch.
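The caching idea mentioned in the database step can be sketched in a few lines of Python. This is an assumption-laden illustration, not how GUS implements it: the parameter names, the SHA-256 key scheme, and the in-memory dictionary (standing in for a real database) are all invented for the example.

```python
import hashlib
import json


def cache_key(params):
    """Derive a stable key from a dict of simulation parameters."""
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class ResultCache:
    """In-memory stand-in for a database-backed results cache."""

    def __init__(self):
        self._store = {}
        self.misses = 0

    def get_or_compute(self, params, compute):
        """Return cached output for params, computing it only on a miss."""
        key = cache_key(params)
        if key not in self._store:
            self.misses += 1
            self._store[key] = compute(params)
        return self._store[key]


def fake_simulation(params):
    # Placeholder for an expensive simulation run.
    return {"trees": params["n_trees"], "years": params["years"]}


cache = ResultCache()
first = cache.get_or_compute({"n_trees": 250000, "years": 30}, fake_simulation)
second = cache.get_or_compute({"years": 30, "n_trees": 250000}, fake_simulation)
print(cache.misses)
```

Serializing the parameters with sorted keys makes the cache key independent of the order in which a user supplies them, so the second request above is served from the cache.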

Phase 2: set up and run simulations using AWS Batch

Create a compute environment (CE) – A compute environment is a collection of compute resources used to run batch jobs. Before you can run jobs in AWS Batch, you need a CE. You can set up a managed CE, which is managed by AWS, or an unmanaged CE that you manage yourself. You can configure managed CEs to use Fargate or Amazon EC2 instances. For either option you can choose Spot capacity at a deep discount; however, Spot capacity can be reclaimed at any time with a 2-minute warning, in which case interrupted jobs need to be restarted (potentially from a checkpoint). If you’re not familiar with Spot Instances, check out the Amazon EC2 Spot Instances product page. For the simulations of Amsterdam, Lucidminds chose to use a managed CE, configured with AWS Fargate. During setup of a Fargate-configured CE, you choose a name for the environment, select whether you want to run with Spot capacity, set the maximum number of vCPUs the environment can use, and set up the network configuration by choosing (or creating) a VPC, the subnets into which resources are launched, and the security groups to be associated with the launched resources.
Create a job queue, associate to CE – Job queues are where submitted jobs wait until they are scheduled to run, and each queue is associated with one or more CEs. You can set up multiple job queues and set a priority order to determine which jobs run first in an environment where multiple types of jobs are being submitted. For example, you can create a queue that uses Amazon EC2 On-Demand instances for high-priority jobs and another queue that uses Amazon EC2 Spot Instances for low-priority jobs. For additional details on setting up job queues, see the AWS Batch job queues documentation.
Create and register a job definition – Job definitions describe the job to be executed, including parameters, environment variables, and compute requirements. Job definitions are how you tell AWS Batch the location of the container image you’re running, the number of vCPUs and amount of memory the container should use, the IAM roles the job might need, and any commands the container should run when starting. Once you create a job definition, it can be reused or shared by multiple jobs. In the Lucidminds case for Amsterdam, the GUS container used 1 vCPU (it’s a single-threaded application) and 2 GB of memory.
Submit an AWS Batch array job – Now that all AWS Batch resources are set up, we’re ready to run simulations. You can submit jobs using the AWS Console or the AWS CLI. When submitting a job, you select the job queue (which determines the CE the job runs in) and the job definition. If needed, you can override many of the parameters specified in the job definition at runtime. One additional parameter that is particularly useful in the case of GUS (and Monte Carlo simulations in general) is Array Size. This parameter can be set between 2 and 10,000 and is used to run array jobs: jobs that share common parameters, like the job definition, vCPUs, and memory. These jobs run as a collection of related, yet separate, basic jobs that might be distributed across multiple hosts and might run concurrently. Array jobs are the most efficient way to run extremely parallel jobs like Monte Carlo simulations or parametric sweeps.
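The four steps above can also be scripted. The Python sketch below builds the request arguments one might pass to the boto3 AWS Batch client calls `create_compute_environment`, `create_job_queue`, `register_job_definition`, and `submit_job`. Every resource name, ARN, subnet ID, and security group ID is a placeholder, and the configuration is deliberately minimal (a real Fargate job may need further settings, such as network configuration). It is a sketch under those assumptions, not the configuration Lucidminds used. Only the last helper talks to AWS; it is not called here, and it assumes boto3 and credentials are available and that the CE and queue already exist and are valid.

```python
def compute_environment_request(name, subnets, security_groups, max_vcpus, spot=False):
    """Arguments for create_compute_environment: a managed Fargate CE."""
    return {
        "computeEnvironmentName": name,
        "type": "MANAGED",
        "state": "ENABLED",
        "computeResources": {
            "type": "FARGATE_SPOT" if spot else "FARGATE",
            "maxvCpus": max_vcpus,
            "subnets": subnets,
            "securityGroupIds": security_groups,
        },
    }


def job_queue_request(name, compute_environment, priority=1):
    """Arguments for create_job_queue, attached to a single CE."""
    return {
        "jobQueueName": name,
        "state": "ENABLED",
        "priority": priority,
        "computeEnvironmentOrder": [
            {"order": 1, "computeEnvironment": compute_environment}
        ],
    }


def job_definition_request(name, image, execution_role_arn):
    """Arguments for register_job_definition: 1 vCPU and 2 GB on Fargate."""
    return {
        "jobDefinitionName": name,
        "type": "container",
        "platformCapabilities": ["FARGATE"],
        "containerProperties": {
            "image": image,
            "executionRoleArn": execution_role_arn,
            "resourceRequirements": [
                {"type": "VCPU", "value": "1"},
                {"type": "MEMORY", "value": "2048"},   # MiB
            ],
        },
    }


def array_job_request(name, queue, definition, array_size):
    """Arguments for submit_job with an array of identical child jobs."""
    if not 2 <= array_size <= 10000:
        raise ValueError("array size must be between 2 and 10,000")
    return {
        "jobName": name,
        "jobQueue": queue,
        "jobDefinition": definition,
        "arrayProperties": {"size": array_size},
    }


def submit_array_job(request, region):
    """Submit the array job (requires boto3 and AWS credentials)."""
    import boto3  # imported lazily; the request builders do not need it

    batch = boto3.client("batch", region_name=region)
    return batch.submit_job(**request)["jobId"]


# Placeholder identifiers; replace with your own resources.
image = "123456789012.dkr.ecr.eu-west-1.amazonaws.com/gus-simulation:v1"
role = "arn:aws:iam::123456789012:role/ExampleBatchExecutionRole"
ce = compute_environment_request("gus-ce", ["subnet-EXAMPLE"], ["sg-EXAMPLE"], 1000)
queue = job_queue_request("gus-queue", "gus-ce")
definition = job_definition_request("gus-job", image, role)
job = array_job_request("gus-amsterdam", "gus-queue", "gus-job", array_size=1000)
print(job)
```

Keeping the request builders separate from the AWS call makes the configuration easy to review and test before anything is created in an account.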

As we mentioned earlier, running an ensemble of simulations for a green urban scenario does not require any simulation input parameters to be different, because the inherent probabilistic nature of the simulations provides different results for the same set of inputs. If you have an application or simulation to which you’d like to pass a sequence of different input parameters, such as for parameter sweeps or grid search cases, you can use the AWS_BATCH_JOB_ARRAY_INDEX environment variable to differentiate the child jobs. For a quick, simple, and yet highly-illustrative tutorial that demonstrates this concept, see Tutorial: Using the array job index to control job differentiation.
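As a small illustration of job differentiation, the Python sketch below reads `AWS_BATCH_JOB_ARRAY_INDEX` inside a child job and maps it to a random seed and a scenario. The scenario list and the seed-equals-index convention are assumptions made for this example; the environment variable itself is set by AWS Batch for each child of an array job, starting at 0.

```python
import os

# Hypothetical parameter sweep: one maintenance level per child job.
SCENARIOS = ["low", "medium", "high"]


def child_job_settings(environ=os.environ):
    """Pick this child job's seed and scenario from its array index."""
    index = int(environ.get("AWS_BATCH_JOB_ARRAY_INDEX", "0"))
    return {
        "index": index,
        "seed": index,                                  # distinct seed per child
        "scenario": SCENARIOS[index % len(SCENARIOS)],  # cycle through scenarios
    }


# Simulate what child job 4 would see inside its container.
settings = child_job_settings({"AWS_BATCH_JOB_ARRAY_INDEX": "4"})
print(settings)
```

Falling back to index 0 when the variable is absent lets the same entry point run unchanged on a laptop during development.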

Once you submit your job, you can track it on the AWS Batch dashboard. In Figure 5, we’ve shown a screenshot of an example array job with 100 child jobs submitted and mid-run. As the jobs get scheduled and processed, they move across the dashboard from Submitted to either Succeeded or Failed.

Figure 5 – Reviewing results from individual jobs on Amazon CloudWatch

Consolidate and view results – Once the job completes, it’s time to review the results. You can navigate into individual jobs and view the logs via CloudWatch. In the array job screenshot shown in Figure 5, you’d get there by clicking one of the hyperlinked numbers in the column Job Index and then clicking the link under Log stream name. If your application writes results to an Amazon S3 bucket as GUS does, you might need to include an extra step where the results are consolidated to make for easier analysis.
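One possible shape for that consolidation step is sketched below in Python. It assumes each child job wrote a small JSON document containing its array index and a result value; that layout, the field names `array_index` and `carbon_kg`, and the sample values are invented for this example and are not the GUS output format. The S3 reader is not called here and assumes boto3 and AWS credentials are available.

```python
import csv
import io
import json


def consolidate(result_documents):
    """Merge per-iteration JSON documents into one list, sorted by index."""
    rows = [json.loads(doc) for doc in result_documents]
    return sorted(rows, key=lambda row: row["array_index"])


def to_csv(rows):
    """Render consolidated rows as CSV text for easier analysis."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=sorted(rows[0]), lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def read_results_from_s3(bucket, prefix):
    """Yield the body of every object under a prefix (needs boto3 + credentials)."""
    import boto3  # imported lazily; the functions above do not need it

    s3 = boto3.client("s3")
    paginator = s3.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            body = s3.get_object(Bucket=bucket, Key=obj["Key"])["Body"]
            yield body.read().decode("utf-8")


# Made-up documents standing in for objects read from the bucket.
documents = [
    '{"array_index": 1, "carbon_kg": 120.5}',
    '{"array_index": 0, "carbon_kg": 118.0}',
]
rows = consolidate(documents)
print(to_csv(rows))
```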

GUS in action

Lucidminds’ pilot project with the City of Glasgow, aimed at supporting the city’s climate targets by growing canopy cover in deprived neighborhoods, has yielded promising results. Soon the city of Stuttgart (in Germany) will adopt the GUS Framework. With GUS, these cities can run city-scale simulations and explore multiple scenarios to identify optimal projects that align with their net-zero climate targets. By leveraging the GUS framework, they can harness the numerous benefits of trees, including carbon sequestration and storage, storm-water retention, and mitigating heatwave effects.

The GUS framework, consisting of a set of meticulously designed microservices, serves as a powerful tool for various organizations worldwide. VAIV, a South Korean company specializing in AI and big data solutions, will utilize the GUS API to create digital twins of their projects, specifically leveraging the heatwave effects module to address their unique use case. This demonstrates the versatility and adaptability of the GUS framework, catering to diverse requirements across different regions and sectors. Stuttgart is another example, where GUS is used for long-term urban forest planning and decision making.

A GUS demo is now available for anyone who’d like to run city-scale simulations and explore multiple scenarios, gaining valuable insights into the impact of your decisions on urban environments. You can visit run.greenurbanscenarios.com/, set up an example scenario, and click “run on AWS” to run a simulation and view the results. The science behind GUS is explained in a peer-reviewed journal article [3].

Figure 6 – GUS is available to run online

Figure 7 – Insights generated by GUS simulations run on AWS HPC. Impact Analysis Dashboard for Carbon, Water retention, Air Quality and Canopy Cover

The GUS dashboard’s standardized outputs are generated through Monte Carlo experiments, considering the probabilistic nature of climate and tree growth dynamics influenced by their surroundings.

GUS leverages AWS computing services to perform Monte Carlo experiments. In Figure 8, we illustrate a simulation ensemble of fifty different simulations for three different maintenance scenarios (high, medium, and low maintenance). The solid line represents the mean carbon sequestration of the fifty forest growth trajectories. The high-maintenance scenario assumes state-of-the-art pruning, replanting, and disease treatment, while lower maintenance scenarios involve reduced or nonexistent care provisions.

These simulation results would tell a decision maker the amount of carbon a proposed green urban project could be expected to sequester over time, and the decision maker could use that information in combination with cost estimates for each maintenance scenario specific to their area to better understand the cost-benefit tradeoffs associated with green urban planning.

Figure 8 – Monte Carlo simulations of sequestration showing that the higher the tree maintenance, the more carbon will be captured over time.
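The mean line in a plot like Figure 8 is an average taken time step by time step across the ensemble. A minimal Python sketch of that calculation follows; the numbers are made up (three short runs per scenario, arbitrary units) and are not GUS results.

```python
import statistics


def mean_trajectory(trajectories):
    """Average an ensemble of equal-length trajectories, step by step."""
    return [statistics.mean(step) for step in zip(*trajectories)]


# Made-up cumulative carbon values for three runs per scenario.
ensemble = {
    "high": [[0, 10, 22], [0, 12, 24], [0, 11, 20]],
    "low": [[0, 6, 10], [0, 5, 9], [0, 7, 11]],
}

for scenario, runs in ensemble.items():
    print(scenario, mean_trajectory(runs))
```

The same pattern extends to other summary statistics, such as a per-step standard deviation, to show the spread around the mean.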

Conclusion

Urban forests are an important part of the fight against climate change. Digital technologies such as scenario analysis, digital twins, agent-based modeling, and high performance computing on AWS using services like AWS Batch can help us simulate urban forests and examine how different potential green urban projects can have an impact on our cities.

In this post, we’ve not only examined why simulation techniques like Lucidminds’ Green Urban Simulator (GUS) are important tools for the future of urban planning and green cities, but also how AWS Batch can be used to scale simulations and drastically reduce the amount of time to get results.

The content and opinions in this blog are those of the third-party author and AWS is not responsible for the content or accuracy of this blog.

References

[1] John F. Dwyer, E. Gregory McPherson, Herbert W. Schroeder, and Rowan A. Rowntree. 1992. Assessing the Benefits and Costs of the Urban Forest. Journal of Arboriculture 18(5): 227.

[2] David J. Nowak and John F. Dwyer. Understanding the Benefits and Costs of Urban Forest Ecosystems. Urban and Community Forestry in the Northeast. DOI: 10.1007/978-1-4020-4289-8_2.

[3] Bulent Ozel and Marko Petrovic. 2023. Green Urban Scenarios: A framework for digital twin representation and simulation for urban forests and their impact analysis. Journal of Arboriculture & Urban Forestry (forthcoming).

AWS HPC Blog