AWS for Industries

Automated and personalized asset portfolio optimization combining ESG and financial data on Amazon FinSpace

As detailed in our previous post on the topic, Environmental, Social, and Governance (ESG) data has become an indispensable supplement for assessing a company’s risk and performance alongside traditional financial and alternative data. Institutional investors and asset managers are increasingly integrating ESG data into investment decisions, as responsible or sustainable investing moves from niche to mainstream. This is highlighted by the fact that 40% of the $110 trillion in professionally managed global financial assets have an ESG consideration.

Integrating ESG data into investment decisions isn’t trivial. Sourcing, cleaning, processing, integrating, and analyzing ESG data requires significant manual effort, and even more so if the company isn’t equipped with the right technology. This is why 59% of asset managers and institutional investors report data as a barrier to ESG integration.

One example of a specific challenge is the lack of standardization, which continues to drive the proliferation of ESG data, from the metrics reported to the way in which the data is structured. On top of this, evolving ESG-related regulations mean that an ESG factor defined as material today may not be the same in the future.

Additionally, for asset managers, bringing together multiple data sources is an important step in overcoming the lack of standardization. And we’ve seen customers use anywhere from a couple of sources to over 40 data sources (covering thousands of entities).

Greenwashing represents an industry-wide challenge: global financial regulators are taking steps to address the greenwashing issue by launching regulations that require more detailed ESG product reporting and granular ESG product labelling. We’ve seen this in Europe with Sustainable Finance Disclosure Regulation (SFDR) and Sustainable Finance Roadmap, and in the US as announced by the SEC.

AWS is helping financial services customers as they navigate this changing environment. Financial services customers are turning to the cloud, where it’s never been easier to bring together multiple ESG data sources, residing in different formats (both structured and unstructured), for analysis.

There are two particular AWS services that lend themselves well to the challenges presented by ESG data:

Amazon FinSpace is a fully-managed data management and data analytics service that makes it easy to store, catalog, prepare, and analyze financial industry data at scale. This reduces the time it takes for financial services industry (FSI) customers to find, access, and analyze all types of financial and ESG data.

AWS Data Exchange is a service that makes it easy for AWS customers to find, subscribe to, and use third-party data in the AWS Cloud. As a subscriber, you can find and subscribe to thousands of products from qualified data providers.

Amazon FinSpace and AWS Data Exchange are designed to simplify and accelerate financial evaluations and decisions based on data analysis. This is enabled by the ease of use and capabilities available in FinSpace and the variety of data products available on AWS Data Exchange.

In this post, we’ll show how to get started using AWS Data Exchange and FinSpace for a specific use case: automatically identifying optimized asset portfolios based on user preferences (like the min/max % of each stock in your portfolio), combining different ESG datasets from AWS Data Exchange with financial data from third-party data sources.

Dataset Description

In this post, we’ll use the following datasets:

ESG Safeguard – Transcripts Sentiment Dataset

This is the Amenity Analytics dataset description as shown on AWS Data Exchange:

“This trial dataset is industrial-scale NLP applied to earnings call transcripts to develop in-depth, real-time scoring at the company level on ESG issues. It provides company-specific scoring on 12,000 companies globally to track portfolio and company exposures. Scoring derives from net sentiment divided by total neg. and pos. extractions in the transcript, and per ESG topic. Also includes counts making up those scores. *Paid subscriber datasets update daily.*”

To learn more about AWS Partner Amenity Analytics visit www.amenityanalytics.com

RepRisk ESG Data Feed: Dow Jones Industrial Average company universe

This is the RepRisk dataset description as shown on AWS Data Exchange:

“RepRisk ESG Data Feeds are customizable data exports of RepRisk ESG metrics and analytics that are easily integrated into internal or third-party systems. The RepRisk ESG Data Feed serves to systematically screen and monitor ESG and business conduct issues related to clients, investments, and/or suppliers. In this sample, you have access to the standard RepRisk ESG Data Feed for the Dow Jones Industrial Average company universe.”

To learn more about RepRisk visit https://reprisk.com/

Additionally, for historical stock prices, we’ll use the Yahoo Finance dataset. This dataset is used for demonstration purposes and can be replaced with another market dataset of your choice.

To follow along with this post, you can also use the Amenity Analytics and RepRisk trial datasets, but in this case you’ll have significantly smaller datasets available, which will lead to less meaningful results.

ESG Safeguard – Transcripts Sentiment Dataset – Trial Product

The results shown at the end of this post are based on datasets that are slightly larger than the trial datasets.

Architecture and workflow

Figure 1: Portfolio optimization functional workflow

1 – We take the two ESG datasets, described in the paragraph above, from AWS Data Exchange (ADX), and we load them into the FinSpace data catalog.

2 – We access the FinSpace environment, load the two ESG datasets from the FinSpace data catalog, and then load the third dataset from Yahoo Finance.

The two ESG datasets contain ESG scores for the same subset of companies. These ESG scores are calculated by two different data providers with different criteria, so they can’t be directly compared or combined. We execute a normalization of these scores, so that we can later combine them.

  • Personalization option: To execute the normalization step, we must define arbitrary thresholds/coefficients. You can use the values provided, or you can change them to fit your needs or to explore different scenarios.
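
As an illustration, a minimal normalization sketch in pandas could look like the following. The column names (amenity_score, reprisk_score) and the min-max rescaling to [0, 1] are assumptions made for this sketch, not the providers’ actual schemas or the notebook’s exact thresholds.

```python
import pandas as pd

def min_max_normalize(series: pd.Series) -> pd.Series:
    """Rescale a score column to [0, 1] so scores from different providers become comparable."""
    return (series - series.min()) / (series.max() - series.min())

def normalize_esg_scores(df: pd.DataFrame) -> pd.DataFrame:
    # Hypothetical column names: amenity_score, reprisk_score
    out = df.copy()
    out["amenity_score_norm"] = min_max_normalize(out["amenity_score"])
    out["reprisk_score_norm"] = min_max_normalize(out["reprisk_score"])
    return out
```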

The Yahoo Finance dataset contains historical stock prices. We load this dataset directly into the FinSpace environment and use the historical adjusted stock prices to calculate % returns, thereby enabling the comparison of stock returns later in the analysis.
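
For illustration, here’s one way to turn adjusted close prices into percent returns with pandas; the yfinance library call and the ticker list are assumptions for this sketch, not necessarily how the notebook loads the Yahoo Finance data.

```python
import yfinance as yf

# Illustrative subset of tickers; the notebook uses its own ticker list
tickers = ["AAPL", "MSFT", "JNJ"]
prices = yf.download(tickers, start="2020-01-01", end="2021-12-31",
                     auto_adjust=False)["Adj Close"]

# Percent returns make stocks comparable regardless of price level
returns = prices.pct_change().dropna()
```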

Once we have % returns and normalized ESG scores, we combine all of these datasets into a single working dataset (i.e., dataframe).

3 – For each stock, we now have the % return and two ESG scores. We apply an arbitrary business logic to define the final ESG score to be considered for each stock. This final ESG score is the score that will be provided to the portfolio optimization that we’ll run later.

  • Personalization option: Each user may want to apply his/her ‘secret sauce’ here to combine these scores. For example, if a user trusts the Corporate Governance scores (i.e., the ‘G’ of ESG) coming from a given data provider, then he/she may select the ‘G’ from that data provider and take the ‘E’ and ‘S’ from other data providers. Another user may want to average different ESG scores, maybe with different weights.
    You can change the logic that we provided in this post to fit your needs, or to explore different scenarios.
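
A hedged sketch of one such combination rule follows: a simple weighted average of the two normalized provider scores. The weights, function name, and column names are illustrative assumptions, not the logic shipped in the notebook.

```python
def final_esg_score(df, w_amenity=0.6, w_reprisk=0.4):
    # Weighted average of the two normalized scores; weights are illustrative
    out = df.copy()
    out["final_esg_score"] = (w_amenity * out["amenity_score_norm"]
                              + w_reprisk * out["reprisk_score_norm"])
    return out
```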

4 – Here we introduce the concept of ESG variance and then ESG min/average variance: given the initial datasets (Amenity Analytics and RepRisk) containing ESG scores, we calculate the variance of these ESG scores over time. This is what we call ESG variance or ESG volatility. The higher the ESG volatility, the higher the ESG score rate of change.
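
In pandas terms, ESG volatility could be computed roughly as follows, assuming a long-format DataFrame (esg_scores_df) with one row per ticker and date; the names are illustrative.

```python
# esg_scores_df: one row per (ticker, date) with a final_esg_score column (assumed)
esg_volatility = (
    esg_scores_df
    .groupby("ticker")["final_esg_score"]
    .var()                       # variance of the ESG score over time
    .rename("esg_volatility")
)
```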

At this step, we set the requirements that will be used later to define our best portfolio. Specifically, we set the minimum and maximum percentage allowed for each stock, the minimum expected portfolio return we require, and whether to calculate and score the equal-weighted portfolio.

  • Personalization option: Each user can and should configure these preferences to fit their needs and explore different scenarios.
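
The preferences might look like the following; the variable names and values are purely illustrative examples, not the notebook’s defaults.

```python
# Illustrative preference values; adjust to explore different scenarios
min_weight = 0.00              # minimum % of the portfolio per stock
max_weight = 0.20              # maximum % of the portfolio per stock
min_expected_return = 0.08     # minimum acceptable portfolio return
score_equal_weighted = True    # also calculate and score the equal-weighted portfolio
```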

5 – This is the key step where prepared data and defined configurations are used to determine the minimum ESG variance portfolio, a Markowitz style ESG Efficiency Frontier, and the portfolio that maximizes the modified Sharpe ratio on the frontier. The modified Sharpe ratio is calculated using ESG volatility (described in Step 4) instead of return volatility, and we consider 0 to be the risk-free rate.

  • Personalization option: we decided to set 0 as the risk-free rate, and you can easily change it to the risk-free rate that you prefer.
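
Based on this description, the modified Sharpe ratio amounts to the following; the function is a sketch of our reading of the definition, with the risk-free rate defaulting to 0.

```python
def modified_sharpe_ratio(portfolio_return, portfolio_esg_volatility, risk_free_rate=0.0):
    # Return over ESG volatility instead of return volatility; risk-free rate defaults to 0
    return (portfolio_return - risk_free_rate) / portfolio_esg_volatility
```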

Once the analysis is complete, we plot the resulting portfolios as a Markowitz style ESG efficiency frontier and as a pie chart showing the composition of the optimal portfolio.

The following image shows the technical architecture:

Figure 2: Portfolio optimization technical workflow

Step 1 is executed on AWS Data Exchange and FinSpace web UI, and it has the objective of populating the FinSpace data catalog with the required ESG datasets.

Steps 2 to 5 are executed on FinSpace managed Jupyter notebooks. FinSpace executes these steps on a managed Amazon EMR Spark cluster, and this is done transparently to the user running the notebooks.

Set-up

To implement the architecture described in this post, you’ll need an AWS account and a FinSpace environment.

If you don’t have an AWS Account, you can create one following the Sign up for AWS guide. It’s best practice to avoid using the AWS account root user for these tasks. You can follow the Create an IAM User guide to create an AWS Identity and Access Management (IAM) User to execute the tasks described in this post.

If you don’t have a FinSpace environment, then you can create one by following the Create an Amazon FinSpace environment guide.

Getting data from AWS Data Exchange

To access the ‘ESG Safeguard – Transcripts Sentiment Dataset’ and ‘RepRisk ESG Data Feed: Dow Jones Industrial Average company universe’ datasets, you can follow the instructions described in How to get data from AWS Data Exchange.

You must execute the above instructions twice:

first for Amenity Analytics ESG Safeguard – Transcripts Sentiment Dataset

second for RepRisk ESG Data Feed: Dow Jones Industrial Average company universe

Optionally, if you want a more comprehensive guide describing all of the AWS Data Exchange subscription options, you can review ‘Subscribing to a product containing file-based data’ under Subscribing to a data product in the AWS Data Exchange user guide.

How to load data into FinSpace data catalog

There are different options for loading and cataloging data into FinSpace, as explained in How to manually load data into Amazon FinSpace and How to programmatically load data into Amazon FinSpace. You can also load data into FinSpace Using the Amazon FinSpace API and from FinSpace Jupyter notebooks, as shown in this post.

For the purposes of this post, we’ll load data into FinSpace using FinSpace’s web UI. Follow the documentation described in How to manually load data into Amazon FinSpace for both Amenity Analytics and RepRisk datasets that you subscribed to in the previous paragraph.

At the end of these steps, you’ll have a FinSpace data catalog populated with the two datasets.

As shown in the following image, you can browse through the categories in the left-hand navigation pane, or select ESG to display the matching datasets.

Figure 3: Searching for ESG dataset in FinSpace data catalog

You can also use the keyword search functionality to search through the catalog for the relevant datasets.

Figure 4: Using FinSpace data catalog to filter search results

You can easily configure and change the categories and field/values attached to each dataset, depending on your preferences.

Step-by-step analysis

Now that the two datasets are loaded into FinSpace, we execute the portfolio optimization analysis.

Once you’re logged into FinSpace’s Web UI, select ‘Go to Notebook’ as shown in the following image.

Figure 5: Select Go to Notebook to open the FinSpace managed Jupyter notebook

Download the “ESG Optimizer Public Demo.ipynb” notebook and upload it by selecting the icon highlighted in the image (you can also drag and drop it).

Figure 6: Amazon FinSpace notebook

  • Double-click to open the uploaded notebook.
  • Select FinSpace PySpark kernel, as shown in the following image.

Figure 7: Select FinSpace kernel

  • Replace the dataset ID and dataview ID of Amenity Analytics and RepRisk with the ones from your environment in the following cell.

Figure 7: Dataset ID and Dataview ID are used to select the specific view on the dataset to be used

As shown in the following image, the notebook contains a summary of the steps to be executed.

Figure 8: This is the list of sections composing the ESG portfolio analysis

  • Run all of the cells in the notebook to execute the following steps:

1. Create and connect to a FinSpace managed Spark cluster.

Running these cells, you connect (or reconnect) to your FinSpace managed Spark cluster (https://docs.aws.amazon.com/finspace/latest/userguide/working-with-Spark-clusters.html).

2. Import package dependencies.

Running these cells, we import the libraries used, like CVXPY for convex optimization and matplotlib for charts.

3. Load and combine FinSpace data views and price data into Spark DataFrames.

Running these cells, we import the two datasets from the FinSpace data catalog, convert the alpha scores from letters (AAA, AA, …) to a numeric scale from 0 to 1, and map dates to quarters.
Then we combine the two datasets and load the market price data.
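
For illustration, the letter-to-numeric conversion and the quarter mapping could look like this; the exact grade scale and mapping values are assumptions, not the notebook’s implementation.

```python
import pandas as pd

# Hypothetical letter-grade scale mapped to a 0-1 numeric score
ALPHA_TO_NUMERIC = {"AAA": 1.0, "AA": 0.8, "A": 0.6, "BBB": 0.4, "BB": 0.2, "B": 0.0}

def to_numeric_score(grade: str) -> float:
    return ALPHA_TO_NUMERIC.get(grade)

def to_quarter(date_str: str) -> str:
    # e.g. '2021-08-15' -> '2021Q3'
    return str(pd.Period(pd.to_datetime(date_str), freq="Q"))
```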

4. Lightly explore data.

Running these cells, we print schemas and preview some rows of data.

5. Prepare data for optimization input.

With these cells, we create two independent inputs: first a matrix containing the expected returns derived from the stocks’ historical values, then a covariance matrix of ESG scores (i.e., ESG score volatility) across all of the securities in the portfolio.
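
In numpy/pandas terms, these two inputs might be built as follows; `returns` and `esg_score_history` are assumed DataFrames (dates or quarters as rows, tickers as columns) used only for this sketch.

```python
import numpy as np

# Expected return per stock: mean of historical percent returns
mu = returns.mean().values                                 # shape (n_stocks,)

# Covariance of ESG scores over time across securities
# (esg_score_history: rows = dates/quarters, columns = tickers)
esg_cov = np.cov(esg_score_history.values, rowvar=False)   # shape (n_stocks, n_stocks)
```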

6. Define optimization constraints and utility goal.

Running these cells, we define the ESG_Volatility_Optimizer function, which will calculate the Minimum ESG Variance Portfolio and the Mean Variance Analysis; both of these are calculated with respect to ESG Score Volatility.

To get these outcomes, we can set our minimum and maximum percentage holdings for any security, as well as choose a target return for mean-variance.
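
A hedged sketch of what such an optimizer could look like with CVXPY follows; the function mirrors the role of the notebook’s ESG_Volatility_Optimizer, but the implementation details are illustrative.

```python
import cvxpy as cp

def esg_volatility_optimizer(mu, esg_cov, w_min, w_max, target_return):
    n = len(mu)
    w = cp.Variable(n)
    esg_variance = cp.quad_form(w, esg_cov)       # portfolio ESG variance
    constraints = [
        cp.sum(w) == 1,                           # fully invested
        w >= w_min,                               # per-stock lower bound
        w <= w_max,                               # per-stock upper bound
        mu @ w >= target_return,                  # minimum expected return
    ]
    cp.Problem(cp.Minimize(esg_variance), constraints).solve()
    return w.value, esg_variance.value
```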

7. Determine optimal ESG portfolios.

Running these cells, we’ll set the min/max percentage holding and the target returns, and we’ll call the ESG_Volatility_Optimizer function defined in Step 6. As a result, we’ll get the Minimum ESG Variance Portfolio, a Markowitz style ESG Efficiency Frontier, and the Portfolio that maximizes the modified Sharpe ratio on the frontier.

We also print some examples of portfolios produced by the optimizer function, as shown in the following image. The columns are: Portfolio ID, ESG volatility, Percentage returns, Portfolio Modified Sharpe Ratio, and then the list of stocks included in the portfolio. These are the tickers that we’re considering for the purposes of this post. You can add your own data (i.e., the full Amenity Analytics and RepRisk datasets or another dataset of your choice) to change the ticker list as required.
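
Conceptually, tracing the frontier amounts to sweeping the target return and keeping the portfolio with the best modified Sharpe ratio; the following sketch reuses the illustrative optimizer and configuration variables introduced above, not the notebook’s exact code.

```python
import numpy as np

target_returns = np.linspace(mu.min(), mu.max(), 25)
frontier = []
for target in target_returns:
    weights, esg_var = esg_volatility_optimizer(mu, esg_cov, min_weight, max_weight, target)
    if weights is None:
        continue                                   # infeasible target return
    port_return = float(mu @ weights)
    esg_vol = float(np.sqrt(esg_var))
    frontier.append((esg_vol, port_return, port_return / esg_vol, weights))

best = max(frontier, key=lambda p: p[2])           # highest modified Sharpe ratio
```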

Figure 9: This is a sample of the portfolios created and scored by the ESG portfolio optimization function

8. Plot Markowitz style ESG efficiency frontier and optimal portfolio.

Running these cells, we print the efficiency frontier. Each point of the frontier is a minimum ESG volatility portfolio for a given percent return. The black and red dots highlight the minimum ESG variance and highest modified Sharpe ratio portfolios, respectively.
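
A minimal matplotlib sketch of such a plot, using the `frontier` and `best` objects from the illustrative sweep above, could look like this.

```python
import matplotlib.pyplot as plt

vols = [p[0] for p in frontier]
rets = [p[1] for p in frontier]
min_var = min(frontier, key=lambda p: p[0])        # minimum ESG variance portfolio

plt.plot(vols, rets, label="ESG efficiency frontier")
plt.scatter([min_var[0]], [min_var[1]], color="black", label="Min ESG variance")
plt.scatter([best[0]], [best[1]], color="red", label="Max modified Sharpe")
plt.xlabel("ESG score volatility")
plt.ylabel("Percent return")
plt.legend()
plt.show()
```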

Figure 10: ESG portfolio efficiency frontier, based on ESG score volatility

What is this chart telling us? Considering the percentage returns, ESG score volatility, the constraints we provided, and our portfolio universe, this is the efficiency frontier we get. If we have a portfolio that sits to the right of the frontier, then we can move it (i.e., rebalance it) horizontally onto the frontier and expect the same return with lower ESG score volatility.

The optimal portfolio identified, based on our data and our chosen parameters, is then plotted as a pie chart.

Figure 11: Composition of optimal portfolio, based on the data and constraints we provided.

9. Plot delta between equal weighted portfolio and optimal portfolio.

Running the cells in this section, we calculate and plot the % stock differences between the optimal portfolio and the equal-weighted portfolio.
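
The underlying deltas could be computed as follows; `best` comes from the illustrative sweep above, and the equal-weighted portfolio simply assigns 1/n to each ticker.

```python
import numpy as np

optimal_weights = best[3]                                  # weights from the frontier sweep
equal_weights = np.full(len(optimal_weights), 1.0 / len(optimal_weights))
weight_delta = optimal_weights - equal_weights             # one bar per ticker in the waterfall
```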

Figure 12: Waterfall chart showing the differences between the optimal portfolio shown in Figure 11 and the equal-weighted portfolio.

10. Define 3D optimization constraints and utility goal.

Now we have our efficiency frontier and we’ve identified the optimal portfolio. Let’s consider the case in which we want to incorporate into our analysis not only the percentage return and the ESG volatility, but also the absolute ESG score.

To understand why this makes sense, consider this scenario: we could have a stock with low ESG volatility (which is good) and another stock with high ESG volatility (which is bad). Given only this information, we would choose the stock with lower volatility. If we also consider the absolute ESG score, then we could find that the stock with low volatility has a low ESG score (which is bad): its score is stable, but stable around a value that’s not acceptable for our portfolio.

On the other hand, we could find that the stock with high ESG score volatility has a very good ESG score average. Is this acceptable? It depends on the volatility, but at least we know that this stock has an average ESG score that’s acceptable for our portfolio, as compared to the first stock which wasn’t.

Given this consideration, we want to calculate and plot the efficiency frontier while also incorporating the absolute ESG scores. What we’ll get is a surface where the z-axis shows the ESG score. Running the cells in this section, we define the optimizer and constraints required to plot the efficiency frontier surface, and then we run the optimizer.
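
One way to sketch this with CVXPY is to add a single extra constraint on the portfolio’s weighted-average ESG score and sweep it together with the target return; `esg_scores` is an assumed per-stock vector, the sweep ranges are illustrative, and this is not the notebook’s exact formulation.

```python
import cvxpy as cp
import numpy as np

def esg_optimizer_3d(mu, esg_cov, esg_scores, w_min, w_max, target_return, target_esg_score):
    n = len(mu)
    w = cp.Variable(n)
    constraints = [
        cp.sum(w) == 1,
        w >= w_min, w <= w_max,
        mu @ w >= target_return,
        esg_scores @ w >= target_esg_score,        # extra constraint for the z-axis
    ]
    cp.Problem(cp.Minimize(cp.quad_form(w, esg_cov)), constraints).solve()
    return w.value

# Sweep both dimensions to build the surface (ranges are illustrative)
surface_points = [
    (score, ret, esg_optimizer_3d(mu, esg_cov, esg_scores,
                                  min_weight, max_weight, ret, score))
    for score in np.linspace(0.2, 0.9, 8)
    for ret in np.linspace(mu.min(), mu.max(), 15)
]
```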

11. Plot 3D efficiency frontier.

Running the first cell in this section, we plot the surface, as shown in the following image.

Figure 13: 3D efficiency frontier, created by adding the absolute ESG score to the efficiency frontier shown in Figure 10.

What is this chart telling us? Consider the efficiency frontier that we plotted in section 8 and imagine plotting this frontier for each value of the ESG score. We’ll get a different frontier for each ESG score (each of these frontiers has its own Max Sharpe and Min Variance portfolio). Then, imagine stacking these frontiers into a surface: this is the surface plotted above.

The dashed and dotted lines highlight the Max Sharpe and Minimum Variance portfolios, respectively, for each value of the ESG score.

To make this easier to understand, we created the following animation, which shows the surface above from different angles. The last cell in the notebook plots the same chart with a different angle (degree of rotation), and you can execute it multiple times to get a different view of the surface each time.
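
A minimal sketch of that last cell, assuming the surface data is held in meshgrid arrays X, Y, Z (illustrative names), is shown below; changing `angle` between runs rotates the view.

```python
import matplotlib.pyplot as plt

angle = 30                                        # change (or increment) this between runs
fig = plt.figure()
ax = fig.add_subplot(projection="3d")
ax.plot_surface(X, Y, Z, cmap="viridis")          # X, Y, Z: meshgrid arrays behind Figure 13
ax.set_xlabel("ESG score volatility")
ax.set_ylabel("Percent return")
ax.set_zlabel("ESG score")
ax.view_init(elev=25, azim=angle)                 # rotate the view
plt.show()
```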

Figure 14: You can rotate the surface in Figure 13 and view it from any angle to better understand its shape.

What’s next?

The datasets we used in this post are for demonstration purposes; FinSpace is designed to scale so that you can focus on financial data analysis without having to worry about the infrastructure needed for executing the analysis and managing data. Leveraging AWS Data Exchange and your own datasets, you can run this analysis using additional and wider datasets: a wider portfolio universe, deeper history, and more scores/metrics.

Some examples of extensions to this analysis are: using price volatility and the Sharpe ratio instead of ESG score volatility and the modified Sharpe ratio, and adding a reference index or benchmark. You can also use the personalization options mentioned above and further extend this analysis by adding your own portfolio optimization constraints.

As we’ve shown above, by moving from the line to the surface efficiency frontier, you can also incorporate your own dimensions (the absolute ESG score, in our analysis) to further tailor the frontier.

Clean up

If you don’t want to continue using FinSpace and want to remove all of the resources deployed with this post (data, analysis, etc.), follow this procedure to clean up:

1) Delete the FinSpace environment. Be aware that when you delete an environment, you delete all of the assets (Jupyter notebooks, data, etc.) that you have in that environment. To delete the environment, refer to the FinSpace documentation.

2) Delete the Amenity Analytics and RepRisk data that you put in the Amazon Simple Storage Service (Amazon S3) buckets.

3) Unsubscribe from Amenity Analytics and RepRisk by following the Unsubscribe from a product guide.

Conclusion

In this post, we’ve shown you how to get different ESG datasets from AWS Data Exchange (Amenity Analytics and RepRisk), use FinSpace to normalize and combine these datasets with market price data, and then calculate the portfolio efficiency frontier and optimal portfolios based on user-provided constraints (i.e., min/max percentage of each stock and others).

The analysis presented in this post is provided as a customizable template: you can use it as described in this post, or you can run it with your own data and tailor the requirements, highlighted above as personalization options.

You can also extend this analysis, for example by adding a benchmark/index, using the “traditional” volatility and Sharpe ratio, or adding additional constraints like the maximum volatility allowed in the portfolio. In the latter case, you can add the constraints as inputs to the optimization function.

You can find more data from Amenity Analytics, RepRisk, and other providers on AWS Data Exchange.

You can find additional FinSpace analyses in the Amazon FinSpace Examples GitHub repository.

We welcome your feedback on this post and on any further analysis that you’d like to see implemented.

This post is for educational purposes only and is not meant to provide investment advice.

Diego Colombatto

Diego Colombatto is a Principal Partner Solutions Architect at AWS. He brings more than 15 years of experience in designing and delivering Digital Transformation projects for enterprises. At AWS, Diego works with partners and customers, advising them on how to leverage AWS technologies to translate business needs into solutions. IT architectures, algorithmic trading, and cooking are some of his passions, and he's always open to starting a conversation on these topics.

James Davies

James Davies is a Senior Financial Analyst at Amazon. He has over 8 years of corporate finance experience, specifically working on technical solutions to minimize labor constraints, optimize resource allocations, and unearth inefficiencies. At Amazon, James works with Rural and Super Rural Delivery Operations on increasing accessibility to customers while lowering the cost of delivery. He holds a Master of Science in Financial Engineering from Stevens Institute of Technology.