AWS HPC Blog

Harnessing the power of large language models for agent-based model development

In the realm of computational modeling, agent-based models (ABMs) have emerged as powerful tools for simulating complex systems and exploring their dynamics. ABMs allow researchers to model the behavior of individual entities (agents) and their interactions within a given environment, enabling a bottom-up approach to understanding emergent phenomena. However, developing accurate and realistic ABMs can be a challenging task, especially when dealing with domains where expert knowledge is limited or scattered across multiple sources.

Large language models (LLMs) are a cutting-edge technology that have the potential to revolutionize the way we approach agent-based modeling. LLMs are language models trained on vast amounts of textual data, enabling them to understand and generate human-like text with remarkable fluency and coherence. By leveraging the power of LLMs, researchers can gain access to a wealth of knowledge and accelerate the process of developing ABMs, even in domains where they lack extensive expertise.

In this blog post, we will explore how an LLM, specifically Claude 3 Sonnet available in Amazon Bedrock, can be used in conjunction with other technologies to augment an ABM developer’s workflow, enabling the rapid development of models on topics they might not be experts in, for users like financial portfolio managers, insurance risk assessors, and more. We will walk through a real-world example of using Claude 3 Sonnet to create an ABM simulating wildfire propagation, demonstrating the power of this approach and the potential it holds for future research endeavors.

Application background

In this post we’ll focus on the hypothetical scenario of an asset manager assessing risks from forest wildfires in a specific geographic location.

The evolution of a wildfire depends on vegetation type, moisture content, wind direction, slope, the presence of non-burnable material such as rocks or water, and the probability of flame extinguishment. We could develop a rather complicated statistical or machine-learning (ML) model to incorporate all of these factors and predict the probability of an asset being destroyed.

However, a major shortcoming of using ML is the lack of sizable data sets on which to train models. For a given geographic location we would most likely have only a handful of historical wildfires to train on (on the order of hundreds of data points, which may or may not include the variety of features listed above). We could pool wildfire data from around the world, but this dramatically increases the input feature set, since the ML model must now understand 2D spatial relationships in a variety of climates.

The use of ABM completely removes these limitations by allowing us to generate as much synthetic data as needed. It also no longer requires an all-encompassing general model, but instead can create models for specific entities like the probability of fire igniting surrounding vegetation. In addition, depending on the time required to run the ABM simulation, we may not need the speed of a trained ML algorithm and instead can rely solely on the ABM simulation for risk analysis.

In what follows here, we’ve used an LLM to create an ABM in which the agents represent individual fire elements that can propagate or extinguish. We’ve based this on probabilities derived from existing academic literature.

These fire agents move within a 2D spatial grid, where the properties of each cell determine the likelihood of spawning new fire agents or extinguishing existing ones.
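To make this concrete, the sketch below shows one way the per-cell environment properties could be represented in Python; the field names are our own illustration, not taken from the generated model.

```python
from dataclasses import dataclass

@dataclass
class CellProperties:
    """Hypothetical per-cell properties that drive ignition and extinguishment."""
    vegetation: str    # e.g. "forest", "grass", "shrub", "water", "urban"
    moisture: float    # fractional moisture content, 0.0 to 1.0
    slope: float       # local terrain slope in degrees
    burnable: bool     # False for non-burnable material such as rock or water
    fuel_load: float   # relative amount of combustible material in the cell
```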

From knowledge gathering to model generation

Conceptually, we are following the workflow shown in Figure 1. However, for our demonstration we’re not comparing with true historical data and thus we’re not including the last row of the workflow in this blog post. Let’s review the details of each step.

Figure 1 Conceptual workflow for working with an LLM to develop an ABM

Knowledge acquisition with Retrieval-Augmented Generation (RAG)

The first step in developing an ABM is gathering relevant knowledge from various sources. Our goal was to find papers that had been peer-reviewed and demonstrated to have validity for this specific type of ABM, i.e. we required guidance on the structure and equations needed to set up the wildfire ABM. In this case, we manually curated a set of journal papers on wildfire propagation.

In step 2, we inserted this manually curated content into an Amazon Bedrock knowledge base. The knowledge base implements Retrieval-Augmented Generation (RAG), an AI technique that combines information retrieval from external sources with language model generation to produce more accurate and up-to-date responses. The RAG data store uses Amazon OpenSearch Serverless, which handles the provisioning of infrastructure and rapid retrieval of documents. We used the Faiss vector index engine to determine the relevant documents for retrieval from the knowledge base.

The Bedrock knowledge base first passes every ingested document through Amazon Titan (a suite of models for text generation, summarization, semantic search, and image generation) to generate language embeddings of the documents for retrieval. Embeddings are numerical representations of text that capture semantic meaning, allowing the AI to understand and compare the content of different documents efficiently.
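Once the knowledge base is populated, it can be queried programmatically as well as through the console. The snippet below is a minimal sketch using boto3; the knowledge base ID, Region, and query text are placeholders you would replace with your own.

```python
import boto3

# Runtime client for querying an existing Amazon Bedrock knowledge base
client = boto3.client("bedrock-agent-runtime")

response = client.retrieve_and_generate(
    input={"text": "Which equations describe the probability of fire spreading between vegetation cells?"},
    retrieveAndGenerateConfiguration={
        "type": "KNOWLEDGE_BASE",
        "knowledgeBaseConfiguration": {
            "knowledgeBaseId": "YOUR_KNOWLEDGE_BASE_ID",  # placeholder
            "modelArn": "arn:aws:bedrock:us-east-1::foundation-model/anthropic.claude-3-sonnet-20240229-v1:0",
        },
    },
)

print(response["output"]["text"])        # generated answer grounded in the retrieved papers
for citation in response.get("citations", []):
    print(citation)                      # pointers back to the source passages
```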

Iterative collaboration with Claude 3 Sonnet to generate ABM code

With the knowledge base in place, our third step was to engage in a collaborative process with Claude 3 Sonnet, reviewing the relevant journal papers and iteratively refining the ABM code (step 4) for the wildfire simulation. Figure 2 shows the prompts we sent to Claude 3 Sonnet (in bold text) and the LLM responses.

Claude 3 Sonnet’s ability to understand and synthesize complex information allowed us to rapidly explore different modeling approaches and incorporate domain-specific insights from the literature. Our main goal was to develop a prompt that defines the overarching ABM structure and a few light details, so that we could generate code from the prompt.

The result of this step should be an ABM algorithm. Claude 3 Sonnet determined that the fire itself should be an agent and started to define the input properties needed to define the environment.

Due to token limitations, and the general observation that an LLM will produce the minimum required to answer the prompt, we shouldn’t expect an LLM to produce 10,000 lines of code in “one shot”.

Figure 2 Initial interaction with Claude 3 Sonnet on ABM method development

Using these prompts, Claude 3 Sonnet generated a Python-based implementation of the model using the Mesa ABM framework (Figure 3). By providing Claude 3 Sonnet with the conceptual model and our desired features, we were able to quickly translate the textual prompts into executable code.

In short, we first used the LLM to assess strategy with verifiable references, and then converted that into code.
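Figure 3 shows the actual output; as a point of reference, a heavily simplified skeleton of a Mesa-based wildfire model might look like the sketch below. This is our own illustration assuming the Mesa 2.x API, with placeholder probabilities rather than the literature-derived logic.

```python
import mesa

class FireAgent(mesa.Agent):
    """A single fire element occupying one grid cell."""

    def step(self):
        # Try to ignite neighboring cells, then possibly extinguish this one.
        neighbors = self.model.grid.get_neighborhood(self.pos, moore=True, include_center=False)
        for pos in neighbors:
            if self.model.random.random() < self.model.ignition_prob(pos):
                self.model.spawn_fire(pos)
        if self.model.random.random() < self.model.extinguish_prob(self.pos):
            self.model.grid.remove_agent(self)
            self.model.schedule.remove(self)

class WildfireModel(mesa.Model):
    def __init__(self, width=50, height=50):
        super().__init__()
        self.grid = mesa.space.MultiGrid(width, height, torus=False)
        self.schedule = mesa.time.RandomActivation(self)
        self.spawn_fire((width // 2, height // 2))  # seed a single fire in the center

    def ignition_prob(self, pos):
        return 0.2  # placeholder; the real logic depends on the cell's properties

    def extinguish_prob(self, pos):
        return 0.1  # placeholder

    def spawn_fire(self, pos):
        if self.grid.is_cell_empty(pos):
            agent = FireAgent(self.next_id(), self)
            self.grid.place_agent(agent, pos)
            self.schedule.add(agent)

    def step(self):
        self.schedule.step()

# Run the model for a number of simulated time steps
model = WildfireModel()
for _ in range(100):
    model.step()
```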

Figure 3 Output snippet showing the code produced based on the developed prompts

The initial code generated by Claude 3 Sonnet served as a starting point for further refinement. Typically, an LLM will produce code in which certain functions contain little or no logic and require further prompting, as shown in Figure 4.

Figure 4 Example of LLM output with spots that require iterative prompts to fill in missing logic or equations.

At step 5, these missing sections can be filled in through an iterative process with the LLM: reviewing the available knowledge base, discussing potential improvements, and incorporating feedback. This included adding features like wind speed, wind direction, moisture levels, and material types – all of which impact the probability of ignition and propagation of wildfires.
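As an illustration of the kind of logic these iterations fill in, the helper below combines the factors multiplicatively; the functional form and coefficients are placeholders of our own, not the equations taken from the literature.

```python
import math

def ignition_probability(base_prob, wind_speed, wind_alignment, moisture, burnable):
    """Illustrative probability that a fire agent ignites a neighboring cell.

    base_prob      -- nominal spread probability for the target cell's vegetation type
    wind_speed     -- wind speed in m/s
    wind_alignment -- cosine of the angle between the wind and the spread direction
    moisture       -- fractional moisture content of the target cell (0 to 1)
    burnable       -- False for water, rock, and other non-burnable cells
    """
    if not burnable:
        return 0.0
    wind_factor = math.exp(0.05 * wind_speed * wind_alignment)  # placeholder coefficient
    moisture_factor = max(0.0, 1.0 - moisture)                  # drier cells ignite more easily
    return min(1.0, base_prob * wind_factor * moisture_factor)
```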

As a further example, in Figure 5, we selected a location in the LLM-produced code that did not have sufficient logic. We (the developer) first asked the LLM what we should do for preheating, then verified the reference and converted the response into usable code. Currently we’re manually reviewing the code to identify locations that require refinement. The reflection strategy for LLMs (where you ask the LLM to review its own code for improvements) is a possible avenue for automating this process in the future.

Figure 5 Example iteration about specific portions of the ABM code to improve and insert updates

Manually reviewing the sources is important at this stage for enhancing the accuracy of the LLM. Figure 6 presents an equation that encompasses sub-equations and intricate exponential expressions. By reviewing the source (the Bedrock implementation furnished the precise lines), we were able to check the information the LLM provided, and we found the equation in Figure 6 to be inaccurate. This error likely occurred because the RAG workflow parses a sequence of images containing equations with varying font sizes and formatting before producing its output.

That kind of image parsing is a challenging task. That’s because, when using RAG, the process involves converting PDFs to tokens that the LLM can understand. This conversion step, or the subsequent image parsing by the LLM, could potentially introduce errors.

However, despite this inaccuracy, the LLM guided us to the appropriate equation and its location within the knowledge base. This highlights that while LLMs are powerful, they are not infallible and may sometimes produce incorrect outputs, especially when dealing with complex tasks like image parsing or PDF conversion.

Figure 6 Example discussing probabilistic equations to use with references

Integrating geospatial data and computer vision

To further enhance the realism of the model, we incorporated geospatial data and computer vision techniques. We used satellite imagery of San Diego, a region prone to wildfires. We processed this imagery using computer vision methods, converting it into pixel clusters (see Figure 7) and associating forest properties based on the identified clusters.

We then used this to create a grid representation of the environment, allowing the ABM to simulate wildfires in a realistic spatial context. The source journal papers provided the properties of different cell types like forests, grass, shrubs, water, and urban. Note that this data is only good for demo purposes because we cannot actually know the true properties of land based solely on simple pixel clustering – and for this reason we’ve skipped steps 6 through 8 in this blog post. For example, we would need to collect data to find the true mapping of pixel intensity to plant type, slopes, and fuel potentials.
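A rough sketch of that clustering step is shown below, assuming scikit-learn’s k-means on the RGB pixels of a locally stored satellite image; the cluster-to-land-cover mapping is purely illustrative, which is exactly the calibration gap noted above.

```python
import numpy as np
from PIL import Image
from sklearn.cluster import KMeans

# Load the satellite image and flatten it into a list of RGB pixels
image = np.asarray(Image.open("san_diego_satellite.png").convert("RGB"))  # placeholder file
pixels = image.reshape(-1, 3)

# Cluster the pixels into a handful of land-cover classes
kmeans = KMeans(n_clusters=5, n_init=10, random_state=0).fit(pixels)
labels = kmeans.labels_.reshape(image.shape[:2])

# Illustrative mapping from cluster id to land cover; real use would require calibration data
cluster_to_cover = {0: "forest", 1: "grass", 2: "shrub", 3: "water", 4: "urban"}
land_cover_grid = np.vectorize(cluster_to_cover.get)(labels)
```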

Figure 7 Satellite imagery clustered for local pixel categorization

Simulation, visualization, validation

With the ABM fully developed and integrated with geospatial data, we proceeded to run simulations spanning multiple simulated days. The simulations modeled the propagation of wildfires through the grid, with each agent representing a fire element that spread based on the environmental conditions and properties of the surrounding area.

Figure 8 shows an example simulation with a forced westward wind, providing a dynamic representation of the wildfire’s progression over time. The demo proves the overall concept, but for true ABM use we must collect data to calibrate and validate the model. Often agents will leverage data-driven models or rely on semi-empirical equations, and determining their coefficients requires calibrating the ABM to historical data. In addition, with a calibrated model, the results of the ABM have to be validated to ensure acceptable accuracy for the use case.

Once a user has a validated ABM, they’re ready to use the simulation results. Since an ABM is intentionally stochastic, the output is expected to change with each execution. Hence, we can generate a set of replicates that enables assessment of the probability of different outcomes. We can do this by re-running the ABM simulation thousands of times in parallel with AWS Batch to obtain a distribution of output metrics.

A hypothetical scenario might be to determine the proportion of ABM runs that resulted in destruction of assets. This concept can inform the user about risk to existing structures, but sampling the ABM can also be used to assess placement of a new asset. For example, if a company has the choice of 5 different plots of land around San Diego, the ABM can be used to assess the probability of a wildfire reaching each location, enabling the company to compare the cost of the land against the risk of asset damage. It may be worth selecting a more expensive plot of land that exhibits a lower risk of damage.
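A small sketch of that post-processing is shown below; the per-run output format is hypothetical and would in practice be whatever metric files the ABM container writes to S3.

```python
import glob
import json

# Each replicate is assumed to have written a small JSON result file,
# e.g. {"asset_destroyed": true, "burned_area_km2": 12.3}
results = []
for path in glob.glob("replicates/*.json"):
    with open(path) as f:
        results.append(json.load(f))

n_runs = len(results)
n_destroyed = sum(1 for r in results if r["asset_destroyed"])

print(f"P(asset destroyed) = {n_destroyed / n_runs:.3f} over {n_runs} replicates")
```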

Figure 8 Movie of the resulting ABM simulation rapidly developed through iteration with an LLM

AWS architecture

This blog post demonstrates how an LLM can augment development, enabling the authors to spend more time thinking about the equations being used and less time on the actual creation of the code and methods needed to run the ABM. Consequently, the AWS architecture is rather simple.

Figure 9 shows that in step 1 we first upload our reference documents to an Amazon S3 Bucket. In step 2, we initiate an Amazon Bedrock knowledge base on the bucket using the AWS console. This automatically indexes and embeds the data. At step 3, we iteratively query/converse with an LLM selected in Amazon Bedrock to quickly develop our model without any concern about infrastructure provisioning, GPU setup, or neural network model loading and inferencing.

Any code generated by the LLM is immediately tested on a development Amazon Elastic Compute Cloud (Amazon EC2) instance to determine what must improve – or whether any bugs appear. Users can either converse and test LLM code using the AWS console, copying and pasting code from the console into a script, or use the AWS SDK for Python (boto3) to work directly with a Bedrock LLM from an Amazon EC2 instance.
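For the SDK route, a single-turn exchange with Claude 3 Sonnet from an EC2 instance can be as simple as the sketch below; the prompt text is illustrative, and the instance needs an IAM role with Bedrock access.

```python
import json
import boto3

bedrock = boto3.client("bedrock-runtime")

body = {
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 2048,
    "messages": [
        {
            "role": "user",
            "content": "Propose the agent and environment structure for a wildfire propagation ABM.",
        }
    ],
}

response = bedrock.invoke_model(
    modelId="anthropic.claude-3-sonnet-20240229-v1:0",
    body=json.dumps(body),
)

result = json.loads(response["body"].read())
print(result["content"][0]["text"])  # the model's reply, ready to paste into a script for testing
```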

Figure 9 AWS cloud architecture used to perform iterative ABM development with LLMs in Amazon Bedrock

Once we have a validated ABM ready to run many simulations, we can deploy it in a cloud HPC environment using AWS Batch. In Figure 10, at step 1 a user packages their ABM inside a container and pushes it to Amazon Elastic Container Registry (Amazon ECR) to enable access from other AWS services. At step 2, we use AWS Batch to run thousands of replicates in parallel, with the output saved to an S3 bucket (at step 3).

This step leverages cloud-based high-performance computing (HPC) by enabling elastic parallelization. Running ABM simulations on a single or a few machines would necessitate serial processing, leading to extended overall runtime. However, by elastically executing all simulations concurrently, the overall runtime is reduced to the duration of a single simulation. This parallel processing approach, facilitated by AWS Batch, significantly enhances computational efficiency and accelerates the completion of complex ABM simulations.
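Submitting the replicates as a single array job keeps the orchestration simple. The sketch below assumes a job queue and job definition (pointing at the ECR container) have already been created; those names, and the output bucket, are placeholders.

```python
import boto3

batch = boto3.client("batch")

response = batch.submit_job(
    jobName="wildfire-abm-replicates",
    jobQueue="abm-job-queue",          # placeholder job queue
    jobDefinition="wildfire-abm:1",    # placeholder job definition backed by the ECR image
    arrayProperties={"size": 1000},    # one child job per replicate
    containerOverrides={
        "environment": [
            {"name": "OUTPUT_BUCKET", "value": "my-abm-results-bucket"},  # placeholder bucket
        ]
    },
)

print("Submitted array job:", response["jobId"])
# Each child job can read the AWS_BATCH_JOB_ARRAY_INDEX environment variable
# to seed its random number generator, so every replicate differs.
```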

At step 4, an AWS Lambda function monitors AWS Batch for task completion. Once Batch has completed its tasks, the Lambda function can process the data in the Amazon S3 bucket, generating the probability distributions and assessing the risk to any asset that may be in the path of the wildfire.

Figure 10 Using AWS Batch to run 1000s of ABM simulations for risk assessments

Summary

The integration of large language models (LLMs) like Claude 3 Sonnet can revolutionize agent-based modeling (ABM) workflows by overcoming limited domain expertise and enabling rapid development of accurate models.

While end-to-end automation requires further vetting, LLMs can augment human expertise through iterative collaboration, gathering knowledge, generating code, and incorporating realistic factors.

In a wildfire ABM demonstration, Claude 3 Sonnet accelerated our model development while ensuring the final product aligned with the latest research. As LLMs evolve, their impact on computational modeling will surely grow, enabling breakthroughs in understanding complex systems across various domains through human-AI collaboration.

If you want to request a proof of concept or if you have feedback on the AWS tools, please reach out to us at ask-hpc@amazon.com.

Ross Pivovar

Ross has over 15 years of experience in a combination of numerical and statistical method development for both physics simulations and machine learning. Ross is a Senior Solutions Architect at AWS focusing on development of self-learning digital twins, multi-agent simulations, and physics ML surrogate modeling.

Fei Chen

Fei Chen has 15 years of industry experience leading teams in developing and productizing AI and machine learning at scale. At AWS, she leads the worldwide solution teams in advanced compute, including AI accelerators, HPC, IoT, Visual & Spatial Compute, and emerging technology focusing on technical innovations in AI and generative AI.

Ilan Gleiser

Ilan Gleiser is a Principal Emerging Technologies Specialist at AWS WWSO Advanced Computing team focusing on Circular Economy, Agent-Based Simulation and Climate Risk. He is an Expert Advisor of Digital Technologies for Circular Economy with United Nations Environmental Programme. Ilan’s background is in Quant Finance and Machine Learning.