AWS HPC Blog

Enhancing Equity Strategy Backtesting with Synthetic Data: An Agent-Based Model Approach – part 2

This post was contributed by Ilan Gleiser (AWS), Apostolos Psaros (Fidelity Investments), Igor Halperin (Fidelity Investments), Jim Gerard (Fidelity Investments), and Ross Pivovar (AWS).

Financial professionals continually seek ways to develop and test profitable investment strategies. While backtesting serves as a crucial tool, the limited availability of historical data often constrains its effectiveness. This comprehensive two-part guide explores how synthetic data, generated through agent-based models (ABMs), can enhance backtesting capabilities.

In Part One, we established the theoretical foundations of our approach. We explored the core principles of synthetic data generation, examined the framework for agent-based modeling, and detailed our methodology for creating realistic market simulations. That part will particularly interest readers who want to understand the underlying concepts and mathematical foundations of our approach.

In this part (Part Two), we focus on practical implementation and results. We’ll dive into detailed simulation outcomes, showcase real-world applications, and provide a complete technical guide for implementing these models using AWS infrastructure. This part will especially benefit practitioners looking to apply these concepts in their own work.

Throughout both parts, we demonstrate how synthetic data can help overcome common backtesting challenges, from insufficient historical data to the need for diverse market scenarios. Let’s begin by setting up the simulation.

Setting up the simulation

An ABM for equity market backtesting and simulation can offer insights into market dynamics and the effectiveness of trading strategies under various conditions.

An agent is defined as an entity with the ability to make decisions about buying, selling, or holding equity securities. Agents range from individual retail traders and institutional investors to automated trading algorithms, and are characterized by a unique set of attributes such as risk tolerance, investment horizon, capital, information access, and decision-making strategies.

Agents can assume multiple roles in equity trading and investment scenarios based on their attributes. These roles include a market maker and various portfolio allocators. Each agent follows its strategy and responds to market changes based on its predefined behavior and rules, collectively determining market dynamics, prices, and trends.
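To make this concrete, here is a minimal sketch of how such an agent could be represented in Python. The attribute and method names are illustrative assumptions on our part, not the schema of the actual model.

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    """One market participant; attributes mirror those described above."""
    agent_id: int
    strategy: str            # e.g. "growth", "value", "momentum", "market_maker"
    risk_tolerance: float    # higher values accept more volatility
    horizon_steps: int       # investment horizon in simulation time steps
    capital: float           # investable capital
    info_access: float       # 0..1, share of fundamental signals the agent observes
    holdings: dict = field(default_factory=dict)   # ticker -> shares held

    def decide(self, market_state: dict) -> dict:
        """Return ticker -> signed share quantity to trade this step.
        Concrete logic depends on the agent's strategy profile."""
        raise NotImplementedError
```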

To ensure realistic market dynamics, the simulation operates under a set of specified rules. Agents have different trade frequencies based on their type, volume limits to avoid unrealistic market dominance, and market impact constraints to simulate real-world supply and demand dynamics.
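As an illustration of such rules, the sketch below clips an order to a per-agent volume cap and applies a square-root price-impact approximation. Both the cap and the impact functional form are our assumptions; the post does not specify the exact rules used.

```python
import math

def constrain_and_impact(order_shares: float, daily_volume: float, price: float,
                         max_volume_frac: float = 0.01,
                         impact_coeff: float = 0.1) -> tuple:
    """Clip an order to a fraction of daily volume and estimate its price impact."""
    limit = max_volume_frac * daily_volume
    capped = max(-limit, min(order_shares, limit))       # volume cap
    participation = abs(capped) / max(daily_volume, 1.0)
    # Square-root impact: larger participation moves price more, but sublinearly.
    impact = math.copysign(impact_coeff * price * math.sqrt(participation), capped)
    return capped, impact
```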

As users adjust parameters like volatility levels, the frequency of news events, or overall market trends, the simulation can be “conditioned” to generate various market regimes. Users can backtest trading strategies under different market conditions, simulate the impact of large trades on market prices and liquidity, and analyze how different types of agents affect market dynamics during high volatility periods or market crashes.

The goal of the simulation is to create authentic synthetic market data through an agent-based model, thereby mitigating overfitting problems associated with limited historical data. This approach offers numerous advantages, including the ability to produce extensive and varied market scenarios for strategy testing, controlled experimentation, and the development of resilient adaptive strategies. In practical terms, it enhances backtesting, scenario analysis, and strategy comparison and optimization under diverse market conditions.

The overarching process involves aligning the simulation with real data, executing agent-based simulations, and examining the outputs. Specific methods for calibration include:

  1. Using fundamental ratios – We can calibrate key financial ratios like price-to-earnings, price-to-book, and debt-to-equity for individual stocks in the simulation to match the distributions we observe in the actual market over the historical period we’re modeling.
  2. Feature selection – Choosing the most relevant stock-level features (market cap, sector, trading volume, analyst forecasts, etc.) and market-level features (interest rates, commodity prices, economic indicators, etc.) that drive equity returns. We can then match these features to those identified by machine-learning models trained on historical data.
  3. Identifying periods of market stress – We can adjust simulation parameters like volatility, correlation regimes, and trading volumes to generate synthetic data that exhibits the qualities of actual market behavior during stressed periods like the 2008 financial crisis or the 2020 COVID crash.

By calibrating in this way, the simulation can generate synthetic data whose statistical properties closely mirror real-world observations across individual stocks, sectors, market-level factors, and regimes – increasing confidence in the realism of the data. Key distributions, correlations, and tail behaviors are all matched through this calibration process before running the simulations used for strategy testing.
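As a simple sketch of one such calibration step, the code below tunes a single volatility parameter so that simulated monthly returns match the historical standard deviation, then checks distributional similarity with a two-sample Kolmogorov-Smirnov test. The run_simulation entry point and its annual_vol keyword are hypothetical.

```python
import numpy as np
from scipy import stats

def calibrate_volatility(hist_returns, run_simulation,
                         grid=np.linspace(0.05, 0.40, 36)):
    """Pick the volatility parameter whose simulated returns best match history."""
    target_std = hist_returns.std()
    errors = [abs(run_simulation(annual_vol=v).std() - target_std) for v in grid]
    best_vol = grid[int(np.argmin(errors))]
    # Sanity check: compare full distributions, not just the matched moment.
    ks = stats.ks_2samp(run_simulation(annual_vol=best_vol), hist_returns)
    print(f"best vol={best_vol:.3f}, KS p-value={ks.pvalue:.3f}")
    return best_vol
```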

To start the simulation, we established a comprehensive universe of 1,180 stocks spanning various sectors. We seeded this investment landscape with real-world data, incorporating a snapshot of stock prices and shares outstanding from late 2017. Rather than relying on arbitrary initializations, we ensured that the simulation began with a realistic distribution of market capitalizations across firms. We initialized stock prices at their late 2017 levels and held shares outstanding fixed, providing a stable foundation upon which the simulated market dynamics could unfold.
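A seeding step along these lines might look like the following; the file and column names are placeholders for whatever snapshot data is on hand.

```python
import pandas as pd

# Hypothetical snapshot with columns: ticker, price, shares_out, sector
snapshot = pd.read_csv("universe_snapshot_2017.csv")
snapshot["market_cap"] = snapshot["price"] * snapshot["shares_out"]
universe = snapshot.set_index("ticker")
print(f"{len(universe)} stocks, total cap ${universe['market_cap'].sum() / 1e12:.2f}T")
```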

In its initial incarnation, the simulation operated in an “agent-free” mode, devoid of any active trading entities. This baseline scenario allowed us to observe market behaviors driven solely by the underlying economics of the firms, as represented by their fundamental data. It served as a control environment, enabling comparisons with subsequent simulations that introduced the intricate behaviors and interactions of various agent types.

While the agent-free simulations provided valuable insights, the true power of the model lay in its ability to populate the market with a diverse ecosystem of agents. Subsequent configurations introduced up to 5,040 agents, each imbued with distinct attributes such as risk preferences, investment horizons, capital levels, and information access. We divided these agents into categories like growth, value, momentum, equal-weight, mean-variance, risk parity, and ESG – with each group employing strategies aligned with their respective philosophies.

For example, a “growth” agent could be configured to:

  • Highly weight factors like expected earnings growth, sales growth, and growth in operating cash flows when evaluating stocks
  • Focus on companies operating in high growth sectors like technology
  • Have a higher risk tolerance to take advantage of potential upside
  • Use fundamental signals like high P/E and P/B ratios to identify growth opportunities
  • Employ momentum indicators to add to winning positions showing strong price momentum
  • Rebalance regularly to maintain desired positioning towards growth attributes

In contrast, a “value” agent may prioritize metrics like free cash flow yield, earnings yield, or low price-to-book when evaluating investments, tailoring its risk preferences, sector focus, and rebalancing rules accordingly.
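One way to encode these two profiles is sketched below. The factor names, weights, and rebalancing cadences are our illustrative assumptions, chosen to mirror the descriptions above rather than the model’s actual parameters.

```python
STRATEGY_PROFILES = {
    "growth": {
        "factors": {"earnings_growth": 0.35, "sales_growth": 0.25,
                    "cashflow_growth": 0.15, "price_momentum": 0.25},
        "risk_tolerance": 0.8,
        "preferred_sectors": ["INFOTECH"],   # tilt toward high-growth sectors
        "rebalance_every": 21,               # time steps between rebalances
    },
    "value": {
        "factors": {"free_cashflow_yield": 0.35, "earnings_yield": 0.35,
                    "low_price_to_book": 0.30},
        "risk_tolerance": 0.5,
        "preferred_sectors": [],             # sector-agnostic
        "rebalance_every": 63,
    },
}

def score_stock(features: dict, profile: dict) -> float:
    """Weighted factor score; higher means more attractive to this agent type."""
    return sum(w * features.get(name, 0.0)
               for name, w in profile["factors"].items())
```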

We then randomly assigned the agents to focus on different sectors and stocks, creating a heterogeneous market environment that more closely resembled the real world.

Simulation results

This section presents the preliminary findings from the agent-based equity market simulator developed to model equity market dynamics. The simulation aims to generate realistic synthetic data to enhance backtesting capabilities for investment strategies. By incorporating agent behaviors and interactions, the model seeks to capture emergent properties of real financial markets that are often missed when relying solely on historical data.

1 – Initial Agent-Free Equity Market Simulations: The simulation began with agent-free equity price simulations, using a snapshot of total outstanding shares matched to firm tickers from late 2017. Market-value-weighted indices and sector indices were created with constant share counts. The simulated sector-level return correlations appear reasonable compared to historical observations as seen in Figure 1.
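For reference, a market-value-weighted index with fixed share counts can be computed as in the sketch below, assuming a prices table of time steps by tickers.

```python
import pandas as pd

def mv_weighted_index(prices: pd.DataFrame, shares_out: pd.Series) -> pd.Series:
    """prices: time x ticker; shares_out: constant share count per ticker."""
    caps = prices.mul(shares_out, axis=1)            # per-stock market cap over time
    index_level = caps.sum(axis=1)                   # total cap acts as the index
    return index_level / index_level.iloc[0] * 100   # normalize to 100 at start
```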

Fig 1. The simulation initiated with equity price projections devoid of any agents, employing a snapshot of the total outstanding shares aligned with firm tickers from late 2017. Market-value-weighted indices and sector indices were constructed with fixed share counts. The simulated correlations of returns at the sector level seem reasonable when compared to historical data.

2 – Simulated vs. Historical Index Return Distributions: The simulated index return distributions closely match the characteristics of monthly-frequency historical data, such as the Russell 1000 index. Key statistics like skewness, kurtosis, and the Jarque-Bera test for normality are broadly in line, indicating the synthetic data captures the essential properties of real index returns over monthly periods. Note that the historical Russell 1000 sample includes returns from highly stressed market regimes, which the simulation has not yet generated; the simulated market returns therefore exhibit lighter tails than the actual market. Fig 2.
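A comparison of this kind can be run with scipy, as in the hedged sketch below, where sim and hist would hold monthly index returns from the simulation and from the Russell 1000 respectively.

```python
import numpy as np
from scipy import stats

def compare_distributions(sim: np.ndarray, hist: np.ndarray) -> None:
    for name, r in [("simulated", sim), ("historical", hist)]:
        jb = stats.jarque_bera(r)    # small p-value rejects normality
        print(f"{name:>10}: skew={stats.skew(r):+.2f} "
              f"excess kurtosis={stats.kurtosis(r):+.2f} JB p={jb.pvalue:.3f}")
```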

Fig 2: Comparative Index Return Distributions, RIY vs Simulated (stdrtns [mo], N ~ 300)

3 – Time-Series Properties: The autocorrelation functions (ACFs) of index returns and squared returns (a volatility proxy) from the simulations exhibit properties broadly consistent with real-world observations. While the simulated squared returns show no significant autocorrelation, historical index returns display some degree of volatility persistence, as seen in the S&P 500 data. That persistence largely results from transitions between high- and low-volatility regimes in real-world markets, a feature our ABM simulation model should eventually be able to reproduce. Fig 3, 4.
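The ACF diagnostics described here can be reproduced with statsmodels, as in this sketch; significant autocorrelation in squared returns (flagged with *) would indicate volatility clustering.

```python
import numpy as np
from statsmodels.tsa.stattools import acf

def acf_diagnostics(returns: np.ndarray, nlags: int = 12) -> None:
    acf_r = acf(returns, nlags=nlags)         # autocorrelation of returns
    acf_r2 = acf(returns ** 2, nlags=nlags)   # of squared returns (volatility proxy)
    ci = 1.96 / np.sqrt(len(returns))         # rough 95% confidence band
    for lag in range(1, nlags + 1):
        flag = "*" if abs(acf_r2[lag]) > ci else " "
        print(f"lag {lag:2d}: r={acf_r[lag]:+.2f}  r^2={acf_r2[lag]:+.2f}{flag}")
```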

Fig 3: Timeseries Properties: Simulation Set

Fig 4: Timeseries Properties: S&P 500 [1999-2024, mo]

4 – Sector Residual Correlations: The residual correlations between sector returns, after adjusting for market movements, also display patterns similar to those observed in actual market data. These correlations likely arise from underlying risk factors like growth and value that drive co-movements across sectors. Fig 5.
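These residual correlations can be computed by regressing each sector’s returns on the market index and correlating what remains, as in this minimal single-factor sketch.

```python
import numpy as np
import pandas as pd

def sector_residual_corr(sector_rets: pd.DataFrame, mkt_rets: pd.Series) -> pd.DataFrame:
    """Correlate sector returns after removing the market component."""
    x = mkt_rets.values
    resid = {}
    for sect in sector_rets.columns:
        y = sector_rets[sect].values
        beta, alpha = np.polyfit(x, y, 1)      # single-factor market model
        resid[sect] = y - (alpha + beta * x)   # keep the non-market component
    return pd.DataFrame(resid).corr()
```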

Fig 5. Sector-Level Residual Correlation

5 – Impact of Agent Trading: When introducing agents into the simulations, their trading activities systematically impact sector-level returns and volatilities. Some sectors experience higher returns, while others see lower returns, with a noticeable shift away from low-Sharpe ratio sectors. This effect varies across simulation trials, reflecting the complex interactions between agents. Fig 6.

Fig 6. Agent actions systematically raise some sector returns while lowering others, varying by simulation; note the evident migration away from low-Sharpe sectors. Column legend:

  • SECT – sector name
  • R0 – annualized return (%) with zero agents trading
  • RAG – annualized return (%) with 5,040 agents trading
  • σ0 – annualized volatility (%) with zero agents
  • σAG – annualized volatility (%) with 5,040 agents

The table compares how introducing 5,040 trading agents impacts sector returns and volatilities relative to the agent-free baseline. For the INFOTECH sector, for example, R0 = 15.91% versus RAG = 22.42%, so the agents collectively bid up INFOTECH returns, while volatility is largely unchanged (σ0 = 21.29% vs. σAG = 21.47%).

6 – Agent Initialization and Trading Impact: The agents require an initialization period (approximately 24 time steps) to establish their expected return and risk estimates. After this phase, the actions of growth and momentum agents become evident, as they bid up returns in sectors like Information Technology, causing the market index to exhibit a growth tilt in certain simulations (as has been the case in the actual market over recent years). Fig 7, 7a.

Fig 7: Simulated Returns, Zero vs 5040 Agents

Fig 7a: Trading of growth, momentum agents bid up returns of INFOTECH sector, MKT index

7 – Aggregate Return Distributions with Agents: Despite the significant impact of agents on individual stock and sector returns, the aggregate return distributions at the market level remain relatively unchanged. The simulations do not exhibit excess kurtosis or volatility persistence, even with agents actively trading.

It is possible that trading among informed, long-only, zero-leverage, 100% equity portfolio managers is not conducive to heavy-tailed portfolio returns or autocorrelated volatility. These phenomena may instead be more likely when large-scale flows into and out of the equity market occur, as market participants reassess near-term equity returns compared to other markets, for example.

Fig 8. Agents trading clearly shifts the returns of stocks, but not their aggregate distribution (no excess kurtosis or vol persistence)

To better understand the observed return patterns, we suggest examining the trends in portfolio positioning of representative agents or average portfolios of agent types. Additionally, access to the “true” factor status (value vs. growth) for all firms at each time step could provide insights into the interactions between growth and momentum strategies that contribute to the growth tilt in some simulations.

In reality, we often see that the collective actions of market participants can have a pronounced impact on the overall distribution of market returns, especially in the tails of the distribution. A few examples:

  1. During market crises or periods of extreme volatility, the return distributions tend to exhibit higher kurtosis (fatter tails) than normal due to the increased probability of large positive or negative returns.
  2. Market returns, especially at higher frequencies like daily or weekly, often show negative skewness during downturns as large negative returns accumulate.
  3. Volatility clustering effects, where periods of high volatility are followed by high volatility and vice versa, introduce autocorrelation and change the shape of the return distribution.

So the fact that the simulations currently show aggregate return distributions remaining relatively normal/Gaussian, without excess kurtosis or volatility persistence, even with agents trading, suggests the model may not yet fully capture some of the more extreme collective behaviors seen in actual markets.

This could be an area for further investigation – introducing mechanisms that allow agents to significantly change positioning or leverage in response to market conditions, or exploring agent designs that generate stronger herding and feedback effects. Accurately modeling information flow and overreactions could also lead to fatter tails.

While matching Gaussian returns is a good baseline, enhancing the ability to replicate the deviations from normality, especially in the tails, could further improve the realism of the synthetic data generated by the agent-based simulations. Continued calibration against historical stress periods will be important.

AWS reference architecture

We can use AWS Batch to run our agent-based modeling (ABM) simulations in a scalable and secure manner. A user initiates tasks by submitting job requests to an AWS Batch queue. A client interface modifies JSON configurations in the job submission, allowing users to adjust parameters while maintaining security: users can only initiate and alter pre-defined Batch job definitions.

The initiated job loads a container from Amazon Elastic Container Registry (ECR) and then loads the financial ABM and any required customer data from an Amazon Simple Storage Service (S3) bucket. These jobs are created inside a private subnet to ensure they are separated from external networks.

Batch provisions a cluster of Amazon Elastic Compute Cloud (EC2) instances, and a Ray cluster is initiated on the EC2 group. All graph execution and agent processing are handled via Ray, including the allocation of resources according to each function’s requirements.
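As a stripped-down illustration of this pattern – not the actual execution graph, which is more involved – agent decision steps can be fanned out across the Ray cluster like this:

```python
import ray

ray.init(address="auto")   # connect to the Ray cluster Batch provisioned

@ray.remote(num_cpus=1)    # per-task resource allocation
def step_agent(agent_state: dict, market_state: dict) -> dict:
    """Run one decision step for a single agent and return its orders."""
    # ... strategy logic would go here ...
    return {"agent_id": agent_state["agent_id"], "orders": {}}

def run_step(agents: list, market_state: dict) -> list:
    futures = [step_agent.remote(a, market_state) for a in agents]
    return ray.get(futures)   # gather all agents' orders for this time step
```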

The simulation saves all data and progress to Amazon DynamoDB. DynamoDB provides a scalable NoSQL database, which naturally suits storing agent data with properties and state.
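A minimal persistence sketch with boto3 is shown below; the table name, key schema, and attributes are placeholders. Note that boto3 requires Decimal for DynamoDB numeric types.

```python
import boto3
from decimal import Decimal

table = boto3.resource("dynamodb").Table("abm-agent-state")   # placeholder table

def save_agent_state(sim_id: str, step: int, agent: dict) -> None:
    table.put_item(Item={
        "pk": f"{sim_id}#agent#{agent['agent_id']}",   # partition key
        "sk": step,                                    # sort key: time step
        "capital": Decimal(str(agent["capital"])),
        "holdings": {t: Decimal(str(q)) for t, q in agent["holdings"].items()},
    })
```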

An AWS Batch dependent job is activated once the ABM simulation has completed. This dependent job can be submitted at the same time as the ABM job but will not execute until all dependencies have completed. The dependent job processes the DynamoDB data into a tabular format that is readily accessible by dashboards or report generators and is saved to an S3 bucket.
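The two-job pattern can be expressed with boto3 roughly as follows; the queue and job definition names are placeholders for the pre-defined resources described above.

```python
import json
import boto3

batch = boto3.client("batch")

sim = batch.submit_job(
    jobName="abm-simulation",
    jobQueue="abm-queue",                 # placeholder queue name
    jobDefinition="abm-sim-jobdef",       # pre-defined; users only override parameters
    containerOverrides={"environment": [
        {"name": "ABM_CONFIG", "value": json.dumps({"n_agents": 5040, "n_steps": 300})},
    ]},
)

post = batch.submit_job(
    jobName="abm-postprocess",
    jobQueue="abm-queue",
    jobDefinition="abm-post-jobdef",
    dependsOn=[{"jobId": sim["jobId"]}],  # runs only after the simulation completes
)
```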

Amazon QuickSight provides a dashboard to review and analyze the results of the ABM, with an optional Amazon Q large language model (LLM) assistant to aid in dashboard creation and insight discovery. By leveraging AWS services such as Batch, ECR, S3, DynamoDB, and QuickSight, we can run complex ABM simulations at scale while ensuring data security, persistence, and accessibility for analysis and visualization. Fig 9.

Fig 9. AWS Batch enables scalable and secure execution of agent-based modeling (ABM) simulations by allowing users to submit job requests that alter JSON configurations within predefined job definitions. The simulations run in isolated private subnets, using containers from Amazon ECR, data from Amazon S3, and compute resources from an EC2-based Ray cluster. Upon completion, simulation data is stored in DynamoDB, and a dependent job processes this data into a tabular format for easy access via Amazon QuickSight dashboards, leveraging multiple AWS services to ensure data security, persistence, and robust analysis.

Conclusion

Overall, the preliminary results demonstrate the potential of agent-based simulations to generate realistic synthetic data for equity markets. The simulation captures many stylized facts and generates data distributions/dynamics that increasingly match real markets as more agent complexity is added, demonstrating its potential for enhanced strategy development and testing. Historically, the computational complexity and cost of this progressive tuning process has been high; but AWS’s high performance computing services and products make running iterations of complex, multi-agent simulations feasible.

As we navigate the complexities of modern financial markets, the challenge of backtesting with insufficient data cannot be overstated. Synthetic data, produced through agent-based models, offers a promising avenue for enriching our dataset, allowing for more comprehensive, resilient investment strategy development. By leveraging this innovative approach, investors and analysts can better equip themselves to design strategies that thrive not only in the historical market conditions they have observed but in the vast, uncharted territories of future market landscapes as well.


Ilan Gleiser

Ilan Gleiser is a Principal GenAI Specialist on the AWS WWSO Frameworks team, focusing on developing scalable Artificial General Intelligence architectures and optimizing foundation model training and inference. With a rich background in AI and machine learning, Ilan has published over 20 blogs and delivered 100+ prototypes globally over the last 5 years. Ilan holds a Master’s degree in mathematical economics.

Apostolos Psaros

Apostolos Psaros joined Fidelity Investments in 2022 and holds the position of Director in Data Science at the Asset Management Technology group, specializing in machine learning research for asset management applications. With a decade of experience in engineering, mathematics, and machine learning, he has published more than 15 papers and co-authored the book “Path Integrals in Stochastic Engineering Dynamics” (Springer, 2024). His work has been featured in top-tier venues such as the Journal of Computational Physics and NeurIPS. Apostolos specializes in physics-informed machine learning, uncertainty quantification, and deep meta-learning.

Igor Halperin

Igor Halperin is a Group Data Science Leader at CKSI, Fidelity Investments. Prior to joining Fidelity, Igor worked as a Research Professor of Financial Machine Learning at NYU Tandon School of Engineering. Before that, Igor was an Executive Director of Quantitative Research at JPMorgan, and a quantitative researcher at Bloomberg LP. Igor has published numerous articles in finance and physics journals, and is a frequent speaker at financial conferences. He has co-authored the books “Machine Learning in Finance: From Theory to Practice” (Springer 2020) and “Credit Risk Frontiers” (Bloomberg LP, 2012). Igor has a Ph.D. in theoretical high energy physics from Tel Aviv University, and an M.Sc. in nuclear physics from St. Petersburg State Technical University. In February 2022, Igor was named the Buy-Side Quant of the Year by Risk magazine.

Jim Gerard

Jim Gerard retired from Fidelity Investments in 2023 after a 33-year career in numerous quantitative research roles, including mortgage-backed securities modeling, risk assessment and management for the money markets funds complex and the core-plus funds group, and risk-controlled asset allocation modeling for the Fidelity Asset Management Solutions division. Jim joined Fidelity in 1990 from Morgan Stanley and Co. in New York, where he worked on the first generations of option-adjusted spread models for mortgage-backed securities. Prior to that, he was an assistant professor of Economics at Rutgers University, researching markets with incomplete and asymmetric information. He holds a Bachelor of Arts degree in physics and applied math from Northwestern University, and a master’s and Ph.D. in applied economic theory from the California Institute of Technology.

Ross Pivovar

Ross has over 15 years of experience in a combination of numerical and statistical method development for both physics simulations and machine learning. Ross is a Senior Solutions Architect at AWS focusing on development of self-learning digital twins, multi-agent simulations, and physics ML surrogate modeling.