AWS HPC Blog

Enhancing Equity Strategy Backtesting with Synthetic Data: An Agent-Based Model Approach

This post was contributed by Ilan Gleiser (AWS), Apostolos Psaros (Fidelity Investments), Igor Halperin (Fidelity Investments), Jim Gerard (Fidelity Investments), and Ross Pivovar (AWS).

Financial professionals continually seek ways to develop and test profitable investment strategies. While backtesting serves as a crucial tool, the limited availability of historical data often constrains its effectiveness. This comprehensive two-part guide explores how synthetic data, generated through agent-based models (ABMs), can enhance backtesting capabilities.

Part One (this post) will establish the theoretical foundations of our approach. We’ll explore the core principles of synthetic data generation, examine the framework for agent-based modeling, and detail our methodology for creating realistic market simulations. This section will particularly interest readers who want to understand the underlying concepts and mathematical foundations of our approach.

Part Two will focus on practical implementation and results. We’ll dive into detailed simulation outcomes, showcase real-world applications, and provide a complete technical guide for implementing these models using AWS infrastructure. This section will especially benefit practitioners looking to apply these concepts in their own work.

Throughout both parts, we’ll demonstrate how synthetic data can help overcome common backtesting challenges, from insufficient historical data to the need for diverse market scenarios. Let’s begin with the theoretical framework that underpins our approach.

The challenge of insufficient daily data

Backtesting requires a comprehensive historical dataset, including price movements, volume, and other market factors like news events or economic indicators. For many markets or securities, particularly newer or less liquid ones, gathering enough daily data to cover various market conditions proves challenging. Limited data can lead to several issues, including overfitting, where strategies may perform well during historical periods but fail to adapt to future conditions. Additionally, a scarcity of data can result in a lack of representation of different market phases, such as bull and bear markets or periods of volatility clustering. Relying solely on historical data can also introduce survivorship bias, offering an overly optimistic view of historical returns.

The solution: synthetic data through agent-based models

Synthetic data, especially that produced by ABMs of the market, presents an effective solution. Agent-based modeling simulates the interactions of agents, such as individual traders or institutions, according to a set of rules, thereby replicating the complex dynamics of financial markets. These models generate artificial yet plausible data, providing several advantages for backtesting equity strategies.

To work with an agent-based market simulator, it's first necessary to calibrate it to the fundamental statistical characteristics of the chosen investment universe, including average returns, daily trading volumes, volatility, and kurtosis. Once calibrated, the simulator can generate synthetic data that replicates these characteristics for scenario analysis.
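As a concrete illustration of that calibration step, the sketch below computes the summary statistics a simulator would typically be matched against (mean daily return, volatility, excess kurtosis, and average volume) from a daily price and volume series. The input data and function name here are purely illustrative assumptions, not part of our actual pipeline:

```python
import numpy as np

def calibration_targets(prices, volumes):
    """Summary statistics an ABM is typically calibrated against:
    mean daily return, volatility, excess kurtosis, average volume."""
    returns = np.diff(np.log(prices))            # daily log returns
    mean_ret = returns.mean()
    vol = returns.std(ddof=1)
    z = (returns - mean_ret) / vol
    excess_kurtosis = (z ** 4).mean() - 3.0      # > 0 indicates heavy tails
    return {
        "mean_return": float(mean_ret),
        "volatility": float(vol),
        "excess_kurtosis": float(excess_kurtosis),
        "avg_volume": float(np.mean(volumes)),
    }

# Illustrative inputs: one simulated year of daily closes and volumes
rng = np.random.default_rng(0)
prices = 100 * np.exp(np.cumsum(rng.normal(0.0005, 0.01, 252)))
volumes = rng.integers(1_000_000, 5_000_000, 252)
targets = calibration_targets(prices, volumes)
```

A real calibration would compute these targets per security and per regime from historical data, then tune the simulator until its output reproduces them.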

Synthetic data can increase the volume and variety of historical data by creating endless possibilities of market scenarios, including extreme but realistic events that have not yet occurred. Researchers can alter modeled market conditions and agent behaviors to systematically test, in a digital lab, how strategies perform across a broad range of market scenarios, a feat not achievable with historical data alone.

Additionally, synthetic markets offer an unbiased testing ground for strategies, eliminating pre-existing biases such as survivorship bias. With the diverse and extensive scenarios generated by synthetic data, investment strategies can be developed to better prepare for future market conditions.

Benefits of using synthetic data for equity market backtests

There are numerous benefits to this approach.

Enhanced data volume and variety: Synthetic data allows for the creation of an almost infinite amount of data, reflecting a wide range of market scenarios, including extreme but plausible events that have not occurred yet but could.

Controlled experimentation: With agent-based models, researchers can systematically vary market conditions or agent behaviors, or even the composition of the market in terms of numbers of different types of agents, exploring how strategies perform across an extensive array of situations, something not possible with historical data alone.

Innovative strategy development: The breadth and variety of scenarios produced by synthetic data can drive the creation of more resilient and flexible investment strategies, equipped to navigate diverse future market environments. Quants who build optimal portfolios often have a clear perspective on current market conditions, such as stressed market regimes. They can focus on specific time frames, like the next six months. In such cases, they can adjust their simulators to generate more crisis-like data, allowing them to rigorously test their portfolios’ resilience.

Equity market dynamics

In equity markets, several factors are crucial in determining the strength of price movements and market liquidity. For example, volume reflects the number of shares traded during a specific period. High volume in the tech sector might signal bullish market conditions, indicating strong investor interest and confidence in tech stocks. Volatility measures return variation over time and significantly impacts investments. During the 2008 financial crisis, forced selling of highly leveraged positions caused waves of high volatility across markets.

Monitoring both short and long-term price trends is essential for accurately assessing market conditions. The bull market of the 1990s resulted from long-term trends in technology and consumer spending. Rapid advancements in information technology, particularly the internet and the dot-com boom, fueled investor enthusiasm and drove substantial stock price gains.

Liquidity plays a critical role in determining market depth and price slippage. For instance, during the 2020 oil market crash, a significant drop in demand and an oversupply of oil led to a steep decline in oil prices. The lack of sufficient buyers to absorb the excess supply contributed to poor market depth and increased price slippage. Order imbalances, the difference between buy and sell orders, provide insights into directional pressures on market prices. Public news, rumors, and reports can impact market prices and volatility. Market sentiment, reflecting investor attitudes towards securities or the market, can significantly impact market movements.

Agent-based behaviors

Different types of agents make trading decisions using various methods.

Fundamental analysis agents consider economic indicators, industry conditions, and company financials. Technical analysis agents use chart patterns, past price movements, and technical indicators to predict market behavior. Noise traders act on irrelevant information, adding liquidity but potentially increasing market volatility. Market makers provide liquidity by buying and selling securities at different prices. Institutional investors represent large organizations and can move markets with substantial trades. Momentum traders follow market trends, buying and selling assets accordingly.

Lastly, arbitrageurs profit from price inefficiencies between markets or related securities without taking directional market risk.
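To make these behaviors concrete, here is a minimal sketch of how a few of the agent types above can be encoded as decision rules. The class names, order sizes, and thresholds are illustrative assumptions, not the rules used in our simulator:

```python
import random

class Agent:
    """Base class: map market state to a desired order in shares
    (positive = buy, negative = sell)."""
    def order(self, price, fundamental_value, price_history):
        raise NotImplementedError

class FundamentalAgent(Agent):
    """Buys when price is below its estimate of fundamental value,
    sells when it is above."""
    def order(self, price, fundamental_value, price_history):
        return 10 if price < fundamental_value else -10

class MomentumAgent(Agent):
    """Follows the most recent price trend."""
    def order(self, price, fundamental_value, price_history):
        if len(price_history) < 2:
            return 0
        return 10 if price_history[-1] > price_history[-2] else -10

class NoiseAgent(Agent):
    """Trades on irrelevant information: random buys and sells that
    add liquidity but also volatility."""
    def order(self, price, fundamental_value, price_history):
        return random.choice([-10, 0, 10])
```

Richer agents (market makers, institutions, arbitrageurs) follow the same interface, differing only in how they map state to orders.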

Implementing agent-based models for synthetic data creation

Implementing ABMs for synthetic data creation involves several key steps. Initially, the model requires a foundation of market rules and agent behaviors, drawing from empirical observations, stock exchange regulations, and financial theory. Simulation runs generate extensive datasets, simulating years of market activity in mere hours or days. Analysts then sift through this synthetic data, applying equity strategies to assess performance across myriad scenarios.

Designing an agent-based model requires careful definition of the rules governing market mechanics and agent behaviors, a realistic representation of market microstructure, and the inclusion of mechanisms for information dissemination and trading. Model calibration involves tuning parameters to replicate historical market patterns and behaviors observed in real-world data.
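A single simulation step in such a model can be sketched as: collect each agent's order, net the imbalance, and let a price-impact function set the next price. Everything below (the linear impact form, the toy trading rules, the parameter values) is an illustrative assumption rather than our production model:

```python
import numpy as np

def simulate_market(order_rules, steps=250, p0=100.0,
                    fundamental=100.0, impact=1e-4, seed=0):
    """One synthetic price path: each step, agents submit orders, the
    net imbalance moves the log-price through a linear impact term,
    plus a small idiosyncratic noise shock."""
    rng = np.random.default_rng(seed)
    prices = [p0]
    for _ in range(steps):
        p = prices[-1]
        net = sum(rule(p, fundamental, prices) for rule in order_rules)
        prices.append(p * np.exp(impact * net + rng.normal(0, 0.005)))
    return np.array(prices)

# Two toy rules: a value trader and a noise trader
noise_rng = np.random.default_rng(1)
rules = [
    lambda p, v, h: 10 if p < v else -10,              # value trader
    lambda p, v, h: int(noise_rng.integers(-10, 11)),  # noise trader
]
path = simulate_market(rules)
```

Running many such paths with different seeds and agent mixes is what turns hours of compute into years of synthetic market activity.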

During the calibration process, simulation parameters can be adjusted to generate return distributions, volatility clustering effects, and correlations across sectors and asset classes that closely match those seen in actual market data from periods like the 2008 financial crisis or the recent COVID-19 pandemic market turmoil. Calibration helps build confidence that the synthetic data generated is realistic and reliable for strategy testing across a range of potential future scenarios.

Despite the promise, the reliability of agent-based models depends on assumptions about market dynamics, agent behavior, and data quality used for calibration. Thorough calibration efforts against observed historical precedents ensure the credibility of the synthetic data.
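A calibration loop can be as simple as searching for the simulator parameters whose output statistics best match the historical targets. The toy example below tunes a single noise parameter against a target daily volatility; real calibrations fit many parameters against many moments at once (kurtosis, volatility clustering, cross-sector correlations), but the structure is the same. All names and values here are assumptions for illustration:

```python
import numpy as np

def simulate_returns(noise_sigma, impact=1e-4, steps=2000, seed=0):
    """Toy simulator: daily log returns driven by random net order
    flow through a linear impact coefficient, plus noise."""
    rng = np.random.default_rng(seed)
    flow = rng.normal(0, 50, steps)          # net order imbalance per day
    return impact * flow + rng.normal(0, noise_sigma, steps)

def calibrate(target_vol, grid=np.linspace(0.001, 0.03, 120)):
    """Grid search: pick the noise level whose simulated volatility
    is closest to the historically observed target."""
    return min(grid, key=lambda s: abs(simulate_returns(s).std() - target_vol))

sigma_hat = calibrate(target_vol=0.015)
```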

The equity market simulator use case

Our use case simulates the equity markets, focusing on the role of information asymmetry in market behavior, efficiency, and stability during crises. It’s concerned with the different approaches of active (informed) and passive (uninformed) asset managers in building portfolios, with active managers using additional predictive signals at a cost to optimize their investments.

Garleanu and Pedersen (GP 2022)1 developed a static model that analyzes the equilibrium state of such a market. A one-period model considers only a single time period, a snapshot or static state of market and investor behavior. As the paper itself highlights, this approach has significant limitations: it struggles to address practical concerns like transaction costs, including price impact and bid-ask spreads, and it falls short in accounting for the multi-period nature of portfolio optimization.

To overcome these limitations, we developed an ABM using HPC services on AWS to simulate the behaviors of retail investors, passive managers, and active managers in a more complex and realistic financial market scenario.

This approach aims to model dynamic market scenarios involving multiple agents and their decision-making processes, with the goal of providing deeper insights into market dynamics, particularly in how different types of asset managers and retail investors interact and make investment decisions.

The rest of this post outlines the model setup, dynamics, and the possible implications of these interactions on market efficiency and stability.

Three main objectives for our equity market simulator

Market simulators and synthetic data are becoming relevant tools for market practitioners aiming to back-test their strategies. These simulators generally emulate market price dynamics over short time horizons, frequently beginning with the simulation of a limit order book.

What sets our market simulator apart is its innovative choice of modeling primitives and its focus on longer-term simulations. Our methodology is crafted to align with the practices of fundamental portfolio managers and analysts.

This defines our main modeling primitives: the process starts by modeling the fundamental factors, such as value and growth, for individual companies. Following this, we employ a multi-factor model that integrates these fundamental factors and includes price impact as a price-setting mechanism. This construction follows the modeling framework typically employed by fundamental analysts and portfolio managers, who apply similar factor models to forecast future market prices. Here we use the same approach in reverse: we postulate market dynamics driven by a given set of fundamental factors and a price impact function, then fix the coefficients in our model so that the simulated dynamics exhibit statistics similar to those of the real market.
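In code, this reverse use of the factor framework amounts to generating returns from postulated factor paths plus a price-impact term, rather than estimating loadings from observed returns. A minimal sketch, where the loadings, impact coefficient, and noise scales are all assumed values for illustration:

```python
import numpy as np

rng = np.random.default_rng(42)
T = 252                                      # one simulated year
betas = np.array([0.6, 0.3])                 # loadings on two fundamental factors (assumed)
factors = rng.normal(0, 0.008, size=(T, 2))  # simulated factor returns (e.g. value, growth)
flow = rng.normal(0, 1.0, size=T)            # net order flow produced by the agents
lam = 0.002                                  # linear price-impact coefficient (assumed)
eps = rng.normal(0, 0.004, size=T)           # idiosyncratic noise

# Daily log return = factor exposure + price impact + noise;
# in calibration, the coefficients are fixed so these paths match
# the statistics of real market dynamics
log_ret = factors @ betas + lam * flow + eps
prices = 100.0 * np.exp(np.cumsum(log_ret))
```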

The result is a market simulator designed for long time horizons, where market dynamics are influenced by a blend of fundamental factors and price impact, aligning with the conventional modeling frameworks utilized by fundamental portfolio managers and analysts.

The framework of our market simulator is built to accomplish three primary objectives:

  1. Provide a realistic simulation of market equity price dynamics over extended periods, measured in months and years, for specific market structures that include active and passive funds and retail investors.
  2. Enable users to explore the performance of custom strategies by incorporating additional user-defined agents into the market simulator.
  3. Allow for the training of adaptive (reinforcement learning) agents using the market simulator as a dynamic environment.

Practical applications

Enhanced back-testing capabilities

The conventional method for validating investment strategies involves historical simulations, where historical market data is bootstrapped to evaluate a strategy’s performance. In this context, historical market data can be viewed as a path of some (possibly high-dimensional) vector of ‘features’.

We can backtest any strategy by combining the strategy with this historical market data and a specific impact function. This function essentially converts a single path of market prices and other relevant features into a single path of the portfolio's profit and loss (P&L) for the strategy. Trading strategies often contain several hyper-parameters, making a single path of simulated historical performance susceptible to overfitting. Our simulator, by contrast, generates an entire distribution of P&L paths consistent with historical data, which is much harder for a strategy with a limited number of hyper-parameters to overfit.

In this way, this framework provides a robust tool for evaluating various investment strategies across different market scenarios while mitigating the risk of backtest overfitting. By producing distributions of performance metrics like Sharpe ratios and drawdowns across multiple scenarios, researchers and practitioners can gain valuable insights into the efficacy of different strategies under diverse conditions.
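As a sketch of that workflow, the code below runs a toy trading rule over many synthetic return paths and reports the resulting distribution of annualized Sharpe ratios rather than a single backtest number. The Gaussian path generator is a stand-in for the ABM, and the momentum rule and its lookback are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(7)

def synthetic_paths(n_paths=500, steps=252):
    """Stand-in for the ABM: many independent daily-return paths."""
    return rng.normal(0.0003, 0.012, size=(n_paths, steps))

def strategy_pnl(returns, lookback=20):
    """Toy momentum rule: hold +1 when the trailing mean return is
    positive, -1 otherwise; returns daily strategy P&L per path."""
    pnl = np.zeros_like(returns)
    for t in range(lookback, returns.shape[1]):
        signal = np.sign(returns[:, t - lookback:t].mean(axis=1))
        pnl[:, t] = signal * returns[:, t]
    return pnl

pnl = strategy_pnl(synthetic_paths())
sharpe = pnl.mean(axis=1) / pnl.std(axis=1) * np.sqrt(252)  # per-path Sharpe
# Report the whole distribution, not one number
p5, p50, p95 = np.percentile(sharpe, [5, 50, 95])
```

Examining the spread between the 5th and 95th percentiles, rather than a single historical Sharpe ratio, is what makes overfitting easier to detect.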

Assessing strategies in varied market scenarios

Traditional backtesting methods that rely solely on historical data test a given strategy only against market regimes present in the historical dataset. If a particular market regime, such as a stressed or crisis market, is absent from the historical data, these methods fail to offer insights into the potential future behavior of your portfolio under such conditions.

In contrast, with synthetic data from an ABM, the performance of a portfolio can be evaluated under a range of market scenarios, making it an effective tool for scenario analysis. By endogenously generating various market conditions, including bull or bear markets, high-volatility periods, and unique idiosyncratic events like the GameStop short squeeze, investors can gauge the robustness of their strategies and identify potential vulnerabilities.

This comprehensive approach provides a deeper understanding of how a strategy may perform in different market environments, thereby enabling investors to make more informed decisions.

Evaluating multiple strategies in a fixed market scenario

Another dimension of scenario analysis involves fixing a market scenario and then exploring which strategy performs best across multiple variations of that scenario. The framework also facilitates the identification of the market regime or scenario in which a specific strategy outperforms a benchmark strategy.

By comparing the performance of various strategies under different market conditions, investors can identify the strengths and weaknesses of each approach. This information can be instrumental in optimizing portfolio allocations and enhancing overall performance relative to a benchmark.

In summary, our market simulator offers an advanced and nuanced approach to back-testing and strategy evaluation, providing significant advantages over traditional methods. Its ability to simulate long-term market dynamics, assess strategies under diverse conditions, and compare multiple strategies within fixed scenarios makes it a valuable tool for both researchers and practitioners aiming to refine their investment approaches.

Next step

Read about the practical implementation and results in Part Two of this post. We’ll dive into detailed simulation outcomes, showcase real-world applications, and provide a complete technical guide for implementing these models using AWS infrastructure.

References

1 N. Garleanu and L.H. Pedersen, “Active and Passive Investing: Understanding Samuelson’s Dictum”, The Review of Asset Pricing Studies, 12(2), 389-446, https://doi.org/10.1093/rapstu/rrab020 (2022).

2 R. Palmer, W.B. Arthur, J. Holland, and B. LeBaron, “An artificial stock market”, Artificial Life and Robotics, https://www.researchgate.net/publication/225471692_An_artificial_stock_market (1999).

Ilan Gleiser

Ilan Gleiser is a Principal GenAI Specialist at AWS WWSO Frameworks team focusing on developing scalable Artificial General Intelligence architectures and optimizing foundation model training and inference. With a rich background in AI and machine learning, Ilan has published over 20 blogs and delivered 100+ prototypes globally over the last 5 years. Ilan holds a Master’s degree in mathematical economics.

Apostolos Psaros

Apostolos Psaros joined Fidelity Investments in 2022 and holds the position of Director in Data Science at the Asset Management Technology group, specializing in machine learning research for asset management applications. With a decade of experience in engineering, mathematics, and machine learning, he has published more than 15 papers and co-authored the book “Path Integrals in Stochastic Engineering Dynamics” (Springer, 2024). His work has been featured in top-tier venues such as the Journal of Computational Physics and NeurIPS. Apostolos specializes in physics-informed machine learning, uncertainty quantification, and deep meta-learning.

Igor Halperin

Igor Halperin is a Group Data Science Leader at CKSI, Fidelity Investments. Prior to joining Fidelity, Igor worked as a Research Professor of Financial Machine Learning at NYU Tandon School of Engineering. Before that, Igor was an Executive Director of Quantitative Research at JPMorgan, and a quantitative researcher at Bloomberg LP. Igor has published numerous articles in finance and physics journals, and is a frequent speaker at financial conferences. He has co-authored the books “Machine Learning in Finance: From Theory to Practice” (Springer 2020) and “Credit Risk Frontiers” (Bloomberg LP, 2012). Igor has a Ph.D. in theoretical high energy physics from Tel Aviv University, and a M.Sc. in nuclear physics from St. Petersburg State Technical University. In February 2022, Igor was named the Buy-Side Quant of the Year by RISK magazine.

Jim Gerard

Jim Gerard retired from Fidelity Investments in 2023 after a 33-year career in numerous quantitative research roles, including mortgage-backed securities modeling, risk assessment and management for the money markets funds complex and the core-plus funds group, and risk-controlled asset allocation modeling for the Fidelity Asset Management Solutions division. Jim joined Fidelity in 1990 from Morgan Stanley and Co. in New York, where he worked on the first generations of option-adjusted spread models for mortgage-backed securities. Prior to that, he was an assistant professor of Economics at Rutgers University, researching markets with incomplete and asymmetric information. He holds a Bachelor of Arts degree in physics and applied math from Northwestern University, and a master’s and Ph.D. in applied economic theory from the California Institute of Technology.

Ross Pivovar

Ross has over 15 years of experience in a combination of numerical and statistical method development for both physics simulations and machine learning. Ross is a Senior Solutions Architect at AWS focusing on development of self-learning digital twins, multi-agent simulations, and physics ML surrogate modeling.