
Overview
The ESG Book Estimated Emissions Data Module provides investors with estimated emissions for ~45,000 public corporate entities that do not disclose their emissions. The dataset includes estimations for Scope 1, Scope 2, and Scope 3 (total) emissions, as well as the 15 Scope 3 Categories, in tonnes of CO2 equivalent (tCO2e). A confidence rating is provided alongside each estimated emissions figure, indicating the degree of accuracy of the estimation based on the amount of available data used in the estimation process. PCAF (Partnership for Carbon Accounting Financials) data quality indicators are also included.
The dataset additionally includes the actual reported emissions data of public companies. We currently cover 4,000 public companies, approximately half of which disclose their emissions data. Our Estimated Emissions Data Module thus significantly expands the coverage of emissions data for uses such as portfolio analysis and index creation.
Methodology Overview
Emissions are estimated using an Extreme Gradient Boosting (XGBoost) model. XGBoost is a supervised machine learning model that learns complex relationships between large numbers of predictor variables and a known target, and then uses those relationships to generate estimations for unknown data. In this case, the model learns the relationship between 15 financial and non-financial predictor variables and reported emissions for each region, country, sector and industry, and uses it to estimate the emissions of companies that do not disclose emissions data.
We have chosen to use a machine learning estimation model rather than a traditional statistical regression model for several key reasons. Firstly, the XGBoost model is able to handle non-linear relationships. The predictor variables might be non-linearly related to emissions: for instance, because of economies of scale, a company with 500 employees might not generate 5 times the emissions of a company with only 100 employees. The ability of the XGBoost model to handle such relationships provides an extra layer of robustness in capturing how the predictor variables relate to emissions.
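The economies-of-scale point can be illustrated with a toy power-law relationship. This is only a sketch: the scale factor and exponent below are invented for illustration and are not parameters of the ESG Book model.

```python
# Toy illustration only: SCALE and EXPONENT are made-up values,
# not parameters of the ESG Book model.
SCALE = 50.0      # hypothetical scale factor
EXPONENT = 0.8    # below 1, so emissions grow more slowly than headcount


def toy_emissions(employees: int) -> float:
    """Return emissions under an assumed sub-linear power law."""
    return SCALE * employees ** EXPONENT


small = toy_emissions(100)
large = toy_emissions(500)
ratio = large / small  # equals 5 ** 0.8, roughly 3.6 rather than 5

print(f"100 employees: {small:.0f} tCO2e")
print(f"500 employees: {large:.0f} tCO2e")
print(f"ratio: {ratio:.2f}x (a linear model would assume 5.00x)")
```

A purely linear regression on headcount would assume the 5x ratio, which is the kind of mismatch a tree-based model can avoid.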
Secondly, the XGBoost model is able to handle missing data, unlike conventional regression models or other machine learning models such as Adaptive Boosting. Although 15 predictor variables are used in the model, not all 15 datapoints are available for every company. A minimum threshold of available datapoints is therefore set, and the model estimates emissions only for companies that meet it. Conventional regression models cannot accommodate missing data: the gaps have to be interpolated or simply replaced with zeros. This introduces additional error into the model, reducing the accuracy of the emissions estimations because the input data becomes ambiguous. This issue does not affect the XGBoost model, which handles missing values directly.
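The minimum data threshold can be sketched as a simple count of available predictors per company. The threshold value, tickers and predictor names below are hypothetical, since the actual threshold and the 15 predictor variables are not listed here. Missing values are kept as `None` rather than replaced with zeros, so that a model with native missing-value support can treat them as missing.

```python
from typing import Optional

# Hypothetical values: the actual threshold and the names of the
# 15 predictor variables are assumptions made for this sketch.
MIN_AVAILABLE_PREDICTORS = 2

Company = dict[str, Optional[float]]


def available_count(predictors: Company) -> int:
    """Count predictors that actually have a value (None means missing)."""
    return sum(value is not None for value in predictors.values())


def eligible_for_estimation(predictors: Company) -> bool:
    """A company is estimated only if enough predictors are available."""
    return available_count(predictors) >= MIN_AVAILABLE_PREDICTORS


companies = {
    "AAA": {"revenue": 1200.0, "employees": 500.0, "total_assets": None},
    "BBB": {"revenue": None, "employees": None, "total_assets": 90.0},
}

eligible = [name for name, p in companies.items() if eligible_for_estimation(p)]
print(eligible)  # prints ['AAA']
```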
Lastly, the XGBoost model builds an ensemble of decision trees to identify and analyse the complex relationships between the predictor variables and emissions, which is subsequently used in the estimation process. This allows for greater accuracy, as each new tree is fitted to correct the mistakes of the previous trees. The hyperparameters of the model are fine-tuned to increase the precision of estimations. This is done using Optuna, an open-source hyperparameter optimization framework that tests different configurations of hyperparameters on a holdout test set to determine the optimal values for a given regression.
Overall, for the reasons explained above, the XGBoost model shows better accuracy than traditional statistical models such as Ridge Regression or other machine learning models such as Adaptive Boosting.
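The idea that each tree corrects the mistakes of the previous trees can be sketched with a minimal gradient-boosting loop over one-split trees ("stumps"). This is plain gradient boosting with squared-error loss, written from scratch for illustration. It is not XGBoost itself, which adds regularisation, second-order gradient information and missing-value handling, and the data and settings are invented.

```python
# Minimal from-scratch gradient boosting on one feature.
# Illustrative only: not XGBoost, and the data below is invented.

def fit_stump(xs, residuals):
    """Find the single split on x that best fits the residuals (least squares)."""
    best = None
    for threshold in sorted(set(xs))[:-1]:  # keeps both sides non-empty
        left = [r for x, r in zip(xs, residuals) if x <= threshold]
        right = [r for x, r in zip(xs, residuals) if x > threshold]
        left_mean = sum(left) / len(left)
        right_mean = sum(right) / len(right)
        error = (sum((r - left_mean) ** 2 for r in left)
                 + sum((r - right_mean) ** 2 for r in right))
        if best is None or error < best[0]:
            best = (error, threshold, left_mean, right_mean)
    _, threshold, left_mean, right_mean = best
    return lambda x: left_mean if x <= threshold else right_mean


def boost(xs, ys, n_trees=20, learning_rate=0.5):
    """Fit trees sequentially, each one to the residuals left by the ensemble."""
    base = sum(ys) / len(ys)  # start from the mean prediction
    trees = []
    history = []  # mean squared error after each boosting round

    def predict(x):
        return base + learning_rate * sum(tree(x) for tree in trees)

    for _ in range(n_trees):
        residuals = [y - predict(x) for x, y in zip(xs, ys)]
        trees.append(fit_stump(xs, residuals))
        mse = sum((y - predict(x)) ** 2 for x, y in zip(xs, ys)) / len(ys)
        history.append(mse)
    return predict, history


# Invented data with a sub-linear shape (y grows more slowly than x).
xs = [100, 200, 300, 400, 500, 600, 700, 800]
ys = [x ** 0.8 for x in xs]

predict, history = boost(xs, ys)
print(f"MSE after 1 tree:   {history[0]:.1f}")
print(f"MSE after 20 trees: {history[-1]:.1f}")
```

The training error falls as trees are added because every new stump targets what the current ensemble still gets wrong. Settings such as `n_trees` and `learning_rate` are the kind of hyperparameters a tuning framework would search over.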
Use Cases
The Emissions data can be instrumental for Asset Managers and Corporates:
Portfolio Management
Emissions data can be used by Portfolio Managers during portfolio construction for:
- Exclusion - The screening out of companies that are not aligned with the Paris Agreement temperature goals.
- Carbon Intensity - The scaling of emissions data by financial metrics to compute carbon intensities, monitor the portfolio and benchmark against other portfolios
- TCFD & SFDR Reporting - The reporting of climate-related financial metrics to understand the climate-related risks and opportunities of the companies within a portfolio
- Portfolio Alignment to Climate Goals - Identify to what extent a portfolio is aligned with the Paris Agreement to minimise exposure to carbon-intensive companies
- Regulation Compliance - Generate voluntary TCFD disclosures on how climate-related risks and opportunities are factored into relevant investment strategies
- Alignment to Investor Demand - An increasing number of investors require asset managers to integrate climate risks and opportunities into their investment strategies
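As a sketch of the carbon intensity use case, the snippet below scales each holding's emissions by revenue and combines the results into a weighted average carbon intensity (WACI) for a portfolio. The holdings, field names and figures are hypothetical, and the choice of revenue as the financial metric is an assumption made for this example.

```python
from dataclasses import dataclass


@dataclass
class Holding:
    ticker: str               # hypothetical identifier
    weight: float             # share of portfolio value; weights sum to 1
    emissions_tco2e: float    # e.g. Scope 1 + Scope 2, in tonnes CO2e
    revenue_musd: float       # revenue in millions of USD


def carbon_intensity(h: Holding) -> float:
    """Tonnes CO2e per million USD of revenue for one holding."""
    return h.emissions_tco2e / h.revenue_musd


def waci(holdings: list[Holding]) -> float:
    """Weighted average carbon intensity of the portfolio."""
    total_weight = sum(h.weight for h in holdings)
    if abs(total_weight - 1.0) > 1e-6:
        raise ValueError("portfolio weights must sum to 1")
    return sum(h.weight * carbon_intensity(h) for h in holdings)


portfolio = [
    Holding("AAA", 0.5, 10_000.0, 200.0),   # 50 tCO2e per USD m
    Holding("BBB", 0.3, 90_000.0, 300.0),   # 300 tCO2e per USD m
    Holding("CCC", 0.2, 4_000.0, 400.0),    # 10 tCO2e per USD m
]

print(f"WACI: {waci(portfolio):.1f} tCO2e per USD m revenue")  # prints 117.0
```

The same per-holding intensity can be compared against a benchmark portfolio computed in the same way.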
Corporates
Emissions data includes climate metrics related to emissions, reporting, policy and frameworks and enables:
- Tailored benchmarking - The quality and granularity of the data allows corporations to analyse their climate performance against direct peers, industry, sector and region
- Climate Reporting - The identification of climate-related topics that need to be reported on for a company to stay ahead of its peers
- Tailor-made Comparison Metrics - Combining emissions data with financial metrics such as revenue or EBITDA or non-financial metrics such as production quantity enables the creation of innovative carbon intensity metrics relevant to each company
- Market Positioning & Differentiation - Understand which climate-related topics corporations need to report on to be a leader among their peers
Metadata
| Metadata | Information |
|---|---|
| Update Frequency | Weekly |
| Data Source(s) | Estimations produced using ESG Book raw emissions data and ESG raw data. Financial data from third-party provider |
| Geographic coverage | Global |
| Time period coverage | Present |
| Is historical data “point-in-time” | YES |
| Raw or scraped data | All input data is collected from public sources such as Annual Reports, CSR Reports, Investor Relation Presentations and Reports, ESG reports, Company Websites |
| Number of companies covered | ~37,000 |
| Standard entity identifiers | Ticker (please contact for information on other identifiers) |
Pricing Information
Pricing is determined on a use-case basis; please contact us for more information.
When making a request, please include the following information:
- Organization Name
- Position (non-mandatory)
- Business Email Address or Telephone Number
- Country
- Use-case
Regulatory and Compliance Information
This product is licensed for internal use only; users are not allowed to distribute the data externally.
If you are interested in a data redistribution use case, please contact us.
Need Help?
- If you have questions about our products, contact us at support@esgbook.com
Pricing
| Dimension | Description | Cost/12 months |
|---|---|---|
| Product Access | Dimension that grants access to the product for subscribers. | $15,000.00 |
Vendor refund policy
No refunds.
Delivery details
AWS Data Exchange (ADX)
AWS Data Exchange is a service that helps AWS customers easily share and manage data entitlements from other organizations at scale.
Additional details
You will receive access to the following data sets.



