AWS for Industries

Data architectures to track Scope 3 emissions for retailers


Sustainability is a topic of increasing interest and concern after the recent publication from the Intergovernmental Panel on Climate Change (IPCC). On April 4, 2022, the IPCC published the third part (WG III) of the Sixth Assessment Report (AR6), entitled Climate Change 2022: Mitigation of Climate Change. This report explains that current plans to address climate change are not ambitious enough to limit global warming to 1.5°C. For reference, limiting warming to approximately 1.5°C requires global greenhouse gas emissions to peak before 2025 at the latest and be reduced by 43 percent by 2030.

The retail industry has come under scrutiny to provide greater transparency of its greenhouse gas emissions. In the first article of this blog post series, we outlined the three main categories used in carbon emissions reporting (Scope 1, Scope 2, and Scope 3) and described the difficulty of accurately measuring and reporting Scope 3 emissions. The British Retail Consortium (BRC) estimates that for grocery retailers, the sector responsible for the majority of the overall emissions within retail, Scope 3 emissions account for 80–90 percent of total emissions. Similarly, the World Business Council for Sustainable Development (WBCSD) reported in 2018 that 93 percent of the retail industry's carbon footprint is attributed to Scope 3 emissions.

Since the UK’s packaging tax took effect on April 1, 2022, and Scotland’s ban on single-use plastics started to be enforced on June 1, 2022, sustainability targets and regulations have become a changing landscape. In the United States, for example, California’s SB 54 requiring a 25 percent reduction in plastic packaging and food ware by weight and item count by 2032 is under consideration. To meet these new business demands, retailers need an IT architecture flexible enough to adapt. The architecture must help users to combine datasets and rapidly integrate with third-party data providers and consumers with minimal operational overhead.

In the previous blog post, we presented a high-level architecture capable of gathering the necessary data for emissions calculations and reporting. We also discussed what sustainability means and its increasing importance in consumer trends affecting purchase decisions. For instance, Retail Economics reveals that one in three consumers would be willing to spend more when shopping at a grocery retailer with environmentally friendly products and credentials.

In this blog post, we share more details of a reference architecture that could be adopted by retailers not only to address reporting on emissions but also to help them to create new consumer services and experiences with a focus on sustainability.

What data should we be collecting?

To understand the data needed to make emissions calculations possible, we first need to understand the boundaries of Scope 1, 2, and 3 emissions.

Scope 3 data and data gap management

Understanding baseline activity is the first step to successfully tracking Scope 3 emissions. This baseline allows organizations to track their progress toward their strategic sustainability goals. It also enables businesses to make smarter decisions when budgeting initiatives and establishing partnerships. Our customers have highlighted the collection of Scope 3–relevant data, such as product and transport carbon footprint, as a critical challenge when measuring and baselining their carbon footprint.

Retailers have shared that collecting accurate, high-quality data from their suppliers is a time-consuming and costly task. Retail supply chains are complex and include producers, manufacturers, and transporters, and these intermediary entities might lack adequate processes to report on their own emissions and sustainability metrics.

To fill these data gaps, data providers and industry reporting organizations use “proxy” metrics. Proxy metrics and methodologies estimate the impact of incomplete data by using available data that is related to the metric being evaluated. For example, financial and sales data can be used to estimate a missing vehicle trip distance. Alternatively, similar metrics can be used to infer missing values. The carbon footprint of a similar product category, for instance, could be a suitable substitute for that of a specific SKU. Lastly, older historical data could be used in the absence of more recent information. An example of using proxy metrics to evaluate improvements in IT workloads is given in the Sustainability Pillar of the AWS Well-Architected Framework.
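As a minimal sketch of the "similar product category" proxy described above, the following Python snippet fills a missing SKU footprint with the mean of the reported footprints in the same category and records where each value came from. The `ProductFootprint` type, the field names, and the source labels are hypothetical and chosen only for illustration; a real methodology would follow the reporting framework the retailer has adopted.

```python
from dataclasses import dataclass
from statistics import mean
from typing import Optional


@dataclass(frozen=True)
class ProductFootprint:
    """Hypothetical record of a product's carbon footprint."""
    sku: str
    category: str
    kg_co2e_per_unit: Optional[float]  # None when the supplier has not reported


def fill_with_category_proxy(records):
    """Return (sku, value, source) tuples, using the category mean as a proxy."""
    # Collect the reported values per category.
    reported_by_category = {}
    for r in records:
        if r.kg_co2e_per_unit is not None:
            reported_by_category.setdefault(r.category, []).append(r.kg_co2e_per_unit)

    filled = []
    for r in records:
        if r.kg_co2e_per_unit is not None:
            filled.append((r.sku, r.kg_co2e_per_unit, "reported"))
        elif r.category in reported_by_category:
            # Proxy: mean footprint of the other SKUs in the same category.
            proxy = mean(reported_by_category[r.category])
            filled.append((r.sku, proxy, "category_proxy"))
        else:
            # No related data at all; leave the gap visible rather than guess.
            filled.append((r.sku, None, "missing"))
    return filled
```

Keeping the source label next to each value makes it possible to report what share of a total rests on proxies and to replace those values once suppliers provide measured data.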

Data platform options

The choice of a specific data collection mechanism will depend on the retailer’s data management and governance strategy, its data maturity, and its operating model. Retailers who have built or are in the process of building a data lake/lake house architecture can use the same approach for ingesting, storing, processing, and analyzing the data needed for their Scope 3 emissions calculations.

The diagram in figure 1 illustrates the modern data architecture approach for customer data and the data movement required between the data analytics services and data stores: inside-out, outside-in, and around the perimeter.

Fig. 1: Lake house pattern

More details on the lake house architecture in AWS can be found in this blog post. Taking this architecture as a starting point, we can next illustrate how it can be adapted for carbon emissions data. The AWS services in figure 2 are a suggestion to showcase different options on how data can be ingested, stored, processed, and presented. Each retail organization would need to have individual guidance on how to implement these architectures.

Fig. 2: Centralized data architecture for Scope 3 emissions

The architecture shown above would work for retailers that have a centralized data operating model with the corresponding infrastructure and data pipelines in place. The Scope 3 emissions data would be a new pipeline that can ingest the data in the enterprise lake house, making it available for consumption and use as input to the Scope 3 emissions calculation engine.

Alternatively, we suggest a hybrid approach for retailers seeking a more flexible operating model to accommodate new business cases such as those in sustainability. These novel use cases include providing end consumers with the carbon footprint information of their basket contents, recommending alternatives that lower this metric, or improving the visibility of Scope 3 emissions associated with specific suppliers in order to support better decision-making during the product buying process.

This hybrid approach combines an existing centralized data architecture component with a data mesh pattern. In this paradigm, a separate team, distinct from the one managing the enterprise lake house, takes on federated responsibility for managing sustainability and Scope 3 emissions data.

The reference architecture for this second approach would look similar to the one in figure 3.

Fig. 3: Hybrid data platform for Scope 3 emissions

Below we describe the components illustrated in the hybrid data platform diagram.

Data sources

  • Enterprise lake house: As in the section above, we assume that retailers have, or are in the process of building, an enterprise lake house that holds the organization’s data. This will include data from systems of record (ERP), financial data, product catalog and taxonomy data, and transactional and customer data. This component can be further broken down into business unit–specific data domains, each one exposing data from its data lake house as needed.
  • Third-party datasets/APIs: This component includes a set of public datasets (e.g., IPCC AR5 GWPs) and data subscriptions from AWS Data Exchange data providers.
  • Supplier portal: This is a supplier-facing app where suppliers can log and report on a set of questions relevant to the accounting methodology to be used later on.
  • Supplier lake house: This is the suppliers’ proprietary data, shared with the retailer for the Scope 3 emission calculations, such as vehicle and trips information or product and packaging information.

Data quality component

This component will be responsible for auditing the ingested datasets and alerting users on any changes in the expected schema, missing or inconsistent values, and outliers based on historical values.
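To make the three kinds of checks concrete, here is a small Python sketch that flags schema changes, missing values, and outliers relative to historical values. The column names, the `audit_batch` function, and the z-score threshold are assumptions made for this example and do not correspond to a specific AWS service.

```python
from statistics import mean, stdev

# Hypothetical expected schema for a supplier transport feed.
EXPECTED_COLUMNS = {"supplier_id", "period", "distance_km"}


def audit_batch(rows, history, z_threshold=3.0):
    """Return a list of findings for an ingested batch.

    rows:    list of dicts, one per ingested record
    history: at least two previously accepted distance_km values
    """
    findings = []
    mu, sigma = mean(history), stdev(history)
    for i, row in enumerate(rows):
        # 1. Schema check: columns that disappeared or appeared unexpectedly.
        missing_cols = EXPECTED_COLUMNS - set(row)
        extra_cols = set(row) - EXPECTED_COLUMNS
        if missing_cols or extra_cols:
            findings.append(
                f"row {i}: schema drift "
                f"(missing={sorted(missing_cols)}, unexpected={sorted(extra_cols)})"
            )
            continue
        value = row["distance_km"]
        # 2. Missing-value check.
        if value is None:
            findings.append(f"row {i}: missing distance_km")
        # 3. Outlier check: simple z-score against historical values.
        elif sigma > 0 and abs(value - mu) / sigma > z_threshold:
            findings.append(f"row {i}: distance_km={value} is an outlier vs. history")
    return findings
```

A z-score is only one possible outlier rule; seasonal retail data may call for a comparison against the same period in previous years instead.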

Carbon database

This component will hold the sustainability metrics, either ingested from external sources or calculated internally. It will provide input to the emissions calculations by providing emissions factors and proxy metric values when needed. The emissions factor is a metric that converts an activity to an amount of a greenhouse gas emitted and can be informed through a lifecycle database (e.g., GaBi).

Emissions calculation engine

This component will calculate the Scope 3 emissions. It will receive inputs from the emissions storage and the carbon database and apply the calculation methodologies applicable to the different categories of emission factors, which in turn depend on the completeness of the data collected and the desired business outcome. For each activity, whether measured, calculated, or estimated by a proxy, the engine will apply the relevant emissions factor from the carbon database together with a metric of each greenhouse gas's impact on global warming (its global warming potential) to infer the gas’s CO2 equivalent. The exact calculation methodology will depend upon the framework chosen by the customer.
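The core arithmetic can be sketched in a few lines of Python: multiply the activity amount by the emissions factor for each gas, weight each result by the gas's global warming potential (GWP), and sum. The GWP values below are the 100-year figures from IPCC AR5 (without climate-carbon feedbacks). The function name and the emissions factors are placeholders for illustration, not real factors from any lifecycle database.

```python
# 100-year global warming potentials from IPCC AR5
# (values without climate-carbon feedbacks).
GWP_AR5 = {"CO2": 1, "CH4": 28, "N2O": 265}

# Placeholder factors in kg of gas per tonne-km of road freight.
# These numbers are illustrative only, not taken from a real database.
PLACEHOLDER_FACTORS = {"CO2": 0.1, "CH4": 0.00001, "N2O": 0.000002}


def co2e_kg(activity_amount, emission_factors, gwp=GWP_AR5):
    """Convert an activity into kg of CO2 equivalent.

    activity_amount:  quantity of the activity (e.g., tonne-km)
    emission_factors: kg of each gas emitted per unit of activity
    gwp:              global warming potential per gas
    """
    return sum(
        activity_amount * factor * gwp[gas]
        for gas, factor in emission_factors.items()
    )


# Example: 1,000 tonne-km of freight with the placeholder factors.
example_total = co2e_kg(1000, PLACEHOLDER_FACTORS)
```

In practice the chosen reporting framework determines which GWP set to use and how activities map to factors, so both dictionaries would be loaded from the carbon database rather than hard-coded.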

Emissions ledger

The ledger will preserve the outcomes of the emission calculations and can be decomposed on a per domain basis, if needed. We suggest using a ledger database for this component, because it is important to maintain immutability of the records for reporting, auditing, and regulation compliance. The data can never be tampered with or deleted after having been committed to the ledger and can be verified at any time for accuracy and authenticity.
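To illustrate why a ledger can be verified at any time, the following Python sketch chains each record to the hash of the previous one, so any later modification is detectable. This is a toy, in-memory illustration of the idea only: the `HashChainedLedger` class is hypothetical and is not the API of any managed ledger database, which would also handle durability, access control, and proofs.

```python
import hashlib
import json


class HashChainedLedger:
    """Append-only list where each entry commits to the previous entry's hash."""

    def __init__(self):
        self._entries = []

    @staticmethod
    def _digest(record, prev_hash):
        # Serialize deterministically so the same record always hashes the same.
        payload = json.dumps({"record": record, "prev": prev_hash}, sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def append(self, record):
        """Add a record and return its hash."""
        prev_hash = self._entries[-1]["hash"] if self._entries else ""
        entry = {
            "record": record,
            "prev": prev_hash,
            "hash": self._digest(record, prev_hash),
        }
        self._entries.append(entry)
        return entry["hash"]

    def verify(self):
        """Recompute every hash; return False if any entry was altered."""
        prev_hash = ""
        for entry in self._entries:
            expected = self._digest(entry["record"], prev_hash)
            if entry["prev"] != prev_hash or entry["hash"] != expected:
                return False
            prev_hash = entry["hash"]
        return True
```

Because every hash depends on all earlier entries, changing one historical emissions figure invalidates the chain from that point onward, which is the property auditors rely on.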

Presentation layer

The data can be pushed back into the presentation layer of the enterprise lake house and be used by the systems below:

  • Reporting systems: These systems will produce the necessary reporting in a cadence needed by the business.
  • Intelligence systems: This component includes analytics systems and machine learning systems used for iterative improvement and modeling as well as the creation of insights to enhance business use cases.
  • API layer: This component will expose APIs to the rest of the organization and external parties.

Tools and technology

The retailer’s team of builders can implement the above high-level architecture by building the components in-house. AWS helps companies of all sizes and across all sectors to build and implement solutions that meet their sustainability goals.

Learn more about AWS Sustainability solutions >>

Additionally, retailers can build the Scope 3 emissions data platform and the calculation engine by working with AWS Partners and leveraging their SaaS offerings powered by AWS to bring these capabilities into their organizations.

For example, AWS Partner Altruistiq enables retailers and other large enterprises to automate sustainability measurement, management, and exchange, empowering businesses with the technology infrastructure to translate ambition into action with unparalleled accuracy and ease.

Learn more about Altruistiq on AWS >>

What’s next?

For retailers embarking on this venture, we suggest prioritizing data collection and data inventory assessments that will show the gaps in data, if any. This process will help the organization to establish its baseline and decide which emission calculation method is most appropriate and where it will need to rely on proxies in the reporting process.

Stay tuned for the next post, which will showcase how retailers can use the data gathered and calculated from the platforms above to innovate with a sustainability focus on behalf of their customers.

To learn more about how AWS can help you reach your sustainability goals, contact your account team today to get started or visit AWS Retail.