Guidance for SAP Sustainability Data Lake on AWS

This Guidance demonstrates how to combine and consolidate greenhouse gas emissions data from SAP and non-SAP sources using AWS services. Customers who use Enterprise Resource Planning (ERP) solutions to manage and optimize their business processes can build a data lake that facilitates the generation of carbon footprint insights.

Architecture Diagram

Guidance Architecture Diagram for SAP Sustainability Data Lake on AWS

Step 1
Customer emissions and activity data can be sourced from various systems, including SAP S/4HANA Sustainability, SAP ERP Central Component (SAP ECC)/SAP Business Warehouse (BW), SAP Manufacturing Execution System (MES), and SAP Transportation Management and Logistics System (TMS).

Data can also be sourced from Software-as-a-Service (SaaS) apps, file shares, AWS Data Exchange, AWS Customer Carbon Footprint Tool, and Internet of Things (IoT) devices.

Step 2
Data is ingested into the customer’s account through various ingestion mechanisms, depending on the source. Data can be ingested using AWS IoT Core, Amazon Kinesis Data Streams, Amazon AppFlow, AWS Database Migration Service (AWS DMS), Amazon API Gateway, AWS Transfer Family, or AWS DataSync.
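As one illustration of the ingestion step, the sketch below shapes a single IoT meter reading into the `Data` and `PartitionKey` arguments that the Kinesis Data Streams `PutRecord` API expects. The field names (`device_id`, `kwh`, `recorded_at`) and the stream name are hypothetical and are not taken from this Guidance's sample code.

```python
import json


def build_kinesis_record(reading: dict) -> dict:
    """Shape one reading into PutRecord arguments (Data, PartitionKey)."""
    # Assumption: every reading carries a 'device_id'. Using it as the
    # partition key keeps each device's readings ordered within one shard.
    return {
        "Data": json.dumps(reading, sort_keys=True).encode("utf-8"),
        "PartitionKey": str(reading["device_id"]),
    }


# Hypothetical reading from a metered production line.
record = build_kinesis_record(
    {"device_id": "meter-17", "kwh": 12.5, "recorded_at": "2024-03-01T08:00:00Z"}
)
# With boto3 this could be sent as:
#   boto3.client("kinesis").put_record(StreamName="emissions-stream", **record)
```

Other sources in this step (Amazon AppFlow, AWS DMS, AWS Transfer Family, AWS DataSync) are configured rather than coded, so they have no equivalent client-side payload.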

Step 3
Amazon Simple Storage Service (Amazon S3) provides a single landing area for all ingested emissions and business activity data. Data ingress to the landing zone bucket triggers the data pipeline.
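The trigger described here could be wired in several ways. A minimal sketch, assuming an AWS Lambda function subscribed to the landing bucket's S3 event notifications, is shown below; the function names are hypothetical, and the returned inputs would still need to be handed to the pipeline (for example through the Step Functions `StartExecution` API).

```python
from urllib.parse import unquote_plus


def landed_objects(event: dict) -> list:
    """Return (bucket, key) pairs from an S3 event notification payload."""
    pairs = []
    for rec in event.get("Records", []):
        # Only react to object-creation events in the landing bucket.
        if not rec.get("eventName", "").startswith("ObjectCreated"):
            continue
        bucket = rec["s3"]["bucket"]["name"]
        # Keys arrive URL-encoded in S3 notifications (spaces become '+').
        key = unquote_plus(rec["s3"]["object"]["key"])
        pairs.append((bucket, key))
    return pairs


def handler(event, context=None):
    """Hypothetical Lambda entry point: build one pipeline input per object."""
    # Each input could be passed to Step Functions StartExecution as JSON.
    return [{"bucket": b, "key": k} for b, k in landed_objects(event)]
```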

Step 4
AWS Step Functions orchestrates the data pipeline that includes data quality checks, data compaction, transformation, standardization, and enrichment using AWS Glue.
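To make the orchestration concrete, the following sketch generates an Amazon States Language definition that runs one AWS Glue job per pipeline stage, in sequence, using the Step Functions Glue integration (`glue:startJobRun.sync`). The job names, job arguments, and retry settings are assumptions chosen for illustration; the sample code for this Guidance may structure the pipeline differently.

```python
import json

GLUE_SYNC = "arn:aws:states:::glue:startJobRun.sync"  # run a Glue job and wait


def build_pipeline_definition(job_names: list) -> dict:
    """Chain Glue jobs into a sequential state machine definition."""
    states = {}
    for i, job in enumerate(job_names):
        state = {
            "Type": "Task",
            "Resource": GLUE_SYNC,
            "Parameters": {
                "JobName": job,
                # Forward the landed object's location to the Glue job.
                "Arguments": {"--bucket.$": "$.bucket", "--key.$": "$.key"},
            },
            "ResultPath": None,  # discard job output, keep the original input
            "Retry": [{"ErrorEquals": ["States.TaskFailed"],
                       "IntervalSeconds": 30, "MaxAttempts": 2, "BackoffRate": 2.0}],
            "Catch": [{"ErrorEquals": ["States.ALL"], "Next": "PipelineFailed"}],
        }
        if i + 1 < len(job_names):
            state["Next"] = job_names[i + 1]
        else:
            state["End"] = True
        states[job] = state
    states["PipelineFailed"] = {"Type": "Fail", "Error": "PipelineFailed"}
    return {"StartAt": job_names[0], "States": states}


# Hypothetical Glue job names, one per stage named in Step 4.
JOBS = ["quality-check", "compact", "transform", "standardize", "enrich"]
definition = build_pipeline_definition(JOBS)
definition_json = json.dumps(definition, indent=2)
```

The resulting JSON string is the kind of value that the Step Functions `CreateStateMachine` API accepts as its `definition` argument.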

Step 5
The enriched emission data is then stored in Amazon S3 in a format optimized for consumption and made available to various downstream consumers.
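The Guidance does not prescribe a layout for the enriched data. One common choice, assumed here, is Hive-style `key=value` partition prefixes, which let query engines such as Athena skip partitions a query does not need. The prefix `enriched/` and the partition columns below are hypothetical.

```python
from datetime import date


def enriched_key(source: str, day: date, filename: str) -> str:
    """Build a Hive-style partitioned S3 key for one enriched output file."""
    return (
        f"enriched/source={source}/"
        f"year={day.year:04d}/month={day.month:02d}/day={day.day:02d}/"
        f"{filename}"
    )


# Example: an output file for data that originated in SAP S/4HANA.
key = enriched_key("sap_s4hana", date(2024, 3, 1), "part-0000.parquet")
```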

Step 6
Manage the data lake using AWS Glue crawlers, which infer table schemas from new objects in Amazon S3 and store them in the AWS Glue Data Catalog. AWS Lake Formation provides permission-based access controls that govern access to the data lake objects.
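As a rough sketch of this step, the helpers below assemble the arguments for the AWS Glue `CreateCrawler` API and the Lake Formation `GrantPermissions` API. All names, ARNs, and paths are hypothetical placeholders, and the schema change policy shown is one reasonable choice rather than a setting confirmed by this Guidance.

```python
def crawler_request(name: str, role_arn: str, database: str, s3_path: str) -> dict:
    """Arguments for the Glue CreateCrawler API."""
    return {
        "Name": name,
        "Role": role_arn,
        "DatabaseName": database,
        "Targets": {"S3Targets": [{"Path": s3_path}]},
        # Update catalog tables when schemas change; only log deletions.
        "SchemaChangePolicy": {
            "UpdateBehavior": "UPDATE_IN_DATABASE",
            "DeleteBehavior": "LOG",
        },
    }


def select_grant(principal_arn: str, database: str, table: str) -> dict:
    """Arguments for the Lake Formation GrantPermissions API (read-only)."""
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {"Table": {"DatabaseName": database, "Name": table}},
        "Permissions": ["SELECT"],
    }


# Hypothetical identifiers; 111122223333 is a documentation account ID.
crawler = crawler_request(
    "enriched-emissions-crawler",
    "arn:aws:iam::111122223333:role/GlueCrawlerRole",
    "sustainability",
    "s3://example-enriched-bucket/enriched/",
)
grant = select_grant(
    "arn:aws:iam::111122223333:role/AnalystRole", "sustainability", "emissions"
)
# With boto3 these could be applied as:
#   boto3.client("glue").create_crawler(**crawler)
#   boto3.client("lakeformation").grant_permissions(**grant)
```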

Step 7
Analyze and visualize your data using Amazon Athena and Amazon QuickSight, or load the data into Amazon Redshift for powerful data warehousing uses.

Empower your artificial intelligence and machine learning (AI/ML) workloads using Amazon SageMaker and Amazon Forecast. Application interface stacks can use AWS Lambda for calculating emissions and AWS Amplify for preconfigured web application management.
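The Guidance does not specify how the Lambda function calculates emissions. A minimal sketch, assuming the common activity-based approach (activity quantity multiplied by an emission factor), might look like the following. The event shape, the activity type names, and especially the factor values are placeholders, not real emission factors.

```python
# Placeholder factors in kg CO2e per unit of activity. These numbers are
# NOT real emission factors; substitute values from your chosen source.
PLACEHOLDER_FACTORS = {"electricity_kwh": 0.5, "diesel_litre": 2.0}


def calculate_emissions(activities, factors=PLACEHOLDER_FACTORS):
    """Return total kg CO2e as sum(quantity * factor) over activity rows."""
    total = 0.0
    for row in activities:
        factor = factors.get(row["activity_type"])
        if factor is None:
            # Fail loudly rather than silently under-report emissions.
            raise ValueError(f"no emission factor for {row['activity_type']!r}")
        total += row["quantity"] * factor
    return total


def handler(event, context=None):
    """Hypothetical Lambda entry point; expects {'activities': [...]}."""
    return {"total_kg_co2e": calculate_emissions(event["activities"])}
```

Raising on an unknown activity type is a deliberate choice in this sketch: a missing factor would otherwise be counted as zero emissions.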

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Operational Excellence

To respond to incidents and events while operating this Guidance, customers can use Amazon CloudWatch to collect and visualize logs, metrics, and event data from the services involved. They can then create alarms that alert them to operational anomalies.
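For example, an alarm on failed pipeline executions could be described as follows. This sketch builds the arguments for the CloudWatch `PutMetricAlarm` API using the `ExecutionsFailed` metric that Step Functions publishes in the `AWS/States` namespace; the alarm name, threshold, period, and ARNs are assumptions to adapt.

```python
def pipeline_failure_alarm(state_machine_arn: str, sns_topic_arn: str) -> dict:
    """Arguments for CloudWatch PutMetricAlarm on failed executions."""
    return {
        "AlarmName": "emissions-pipeline-failed",  # hypothetical name
        "Namespace": "AWS/States",
        "MetricName": "ExecutionsFailed",
        "Dimensions": [{"Name": "StateMachineArn", "Value": state_machine_arn}],
        "Statistic": "Sum",
        "Period": 300,  # evaluate in 5-minute windows
        "EvaluationPeriods": 1,
        "Threshold": 1,  # alarm on the first failed execution
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # No data means no executions ran, which is not a failure.
        "TreatMissingData": "notBreaching",
        "AlarmActions": [sns_topic_arn],
    }


alarm = pipeline_failure_alarm(
    "arn:aws:states:us-east-1:111122223333:stateMachine:emissions-pipeline",
    "arn:aws:sns:us-east-1:111122223333:ops-alerts",
)
# With boto3: boto3.client("cloudwatch").put_metric_alarm(**alarm)
```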

Read the Operational Excellence whitepaper

Security

We recommend data be encrypted at rest using AWS Key Management Service (AWS KMS) with customer-managed AWS KMS keys. The keys should be rotated on a regular schedule. Services like Kinesis Data Streams, AWS Glue, and Amazon S3 all integrate with AWS KMS for easy encryption. For data in transit, customers should ensure any application connections require SSL/TLS.
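Both recommendations can be expressed as Amazon S3 bucket settings. The sketch below builds a default-encryption configuration that uses a customer-managed AWS KMS key, and a bucket policy that denies any request not sent over TLS. The bucket name and key ARN are hypothetical, and enabling S3 Bucket Keys is an optional choice made here to reduce KMS request volume.

```python
import json


def default_encryption(kms_key_arn: str) -> dict:
    """ServerSideEncryptionConfiguration for S3 PutBucketEncryption."""
    return {
        "Rules": [{
            "ApplyServerSideEncryptionByDefault": {
                "SSEAlgorithm": "aws:kms",
                "KMSMasterKeyID": kms_key_arn,
            },
            "BucketKeyEnabled": True,
        }]
    }


def tls_only_policy(bucket: str) -> dict:
    """Bucket policy that denies all S3 actions over non-TLS connections."""
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "DenyInsecureTransport",
            "Effect": "Deny",
            "Principal": "*",
            "Action": "s3:*",
            "Resource": [f"arn:aws:s3:::{bucket}", f"arn:aws:s3:::{bucket}/*"],
            "Condition": {"Bool": {"aws:SecureTransport": "false"}},
        }],
    }


encryption = default_encryption(
    "arn:aws:kms:us-east-1:111122223333:key/example-key-id"  # placeholder ARN
)
policy_json = json.dumps(tls_only_policy("example-landing-bucket"))
# With boto3:
#   s3.put_bucket_encryption(Bucket=..., ServerSideEncryptionConfiguration=encryption)
#   s3.put_bucket_policy(Bucket=..., Policy=policy_json)
```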

Read the Security whitepaper

Reliability

This Guidance is designed with services whose default service quotas accommodate the large majority of customer workloads. If necessary, customers can request quota increases. For example, a customer can increase the number of concurrent executions of AWS Glue jobs or concurrent active data manipulation language (DML) queries in Athena.

Read the Reliability whitepaper

Performance Efficiency

This Guidance uses serverless managed services that automatically scale up and down in response to changing demand, reducing resource overhead.

Storing data in Amazon S3 allows consumers to bring various tools or services to their data, dependent on their needs. For example, customers can query data directly in Amazon S3 using Athena, or they can use QuickSight for a business intelligence (BI) dashboard.

Read the Performance Efficiency whitepaper

Cost Optimization

This Guidance relies on serverless AWS services like AWS Glue, Step Functions, and Athena that are fully managed and automatically scale according to workload demand. As a result, customers only pay for what they use.

Read the Cost Optimization whitepaper

Sustainability

Data in Amazon S3 can be stored in more efficient file formats (such as Parquet) to prevent unnecessary processing and reduce the overall storage required.

Amazon S3 lifecycle policies can automatically move less volatile data to more energy-efficient storage classes (such as Amazon S3 Glacier) that use magnetic storage rather than solid state memory. Deletion timelines can also be enforced to minimize overall storage requirements.
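A lifecycle policy of this kind might be sketched as below. The function builds the `LifecycleConfiguration` argument for the S3 `PutBucketLifecycleConfiguration` API, with one rule that transitions objects to S3 Glacier and later expires them. The prefix and the day counts are placeholders; actual retention periods depend on your reporting and audit requirements.

```python
def lifecycle_configuration(prefix: str, archive_after_days: int,
                            delete_after_days: int) -> dict:
    """LifecycleConfiguration for S3 PutBucketLifecycleConfiguration."""
    if delete_after_days <= archive_after_days:
        raise ValueError("objects must be archived before they expire")
    return {
        "Rules": [{
            "ID": f"archive-{prefix.strip('/')}",
            "Filter": {"Prefix": prefix},
            "Status": "Enabled",
            # Move older objects to the S3 Glacier storage class.
            "Transitions": [{"Days": archive_after_days, "StorageClass": "GLACIER"}],
            # Enforce a deletion timeline to limit overall storage.
            "Expiration": {"Days": delete_after_days},
        }]
    }


# Placeholder values: archive after 90 days, delete after roughly 7 years.
lifecycle = lifecycle_configuration("landing/", 90, 2555)
# With boto3:
#   s3.put_bucket_lifecycle_configuration(
#       Bucket="example-landing-bucket", LifecycleConfiguration=lifecycle)
```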

Read the Sustainability whitepaper

Implementation Resources

A detailed guide is provided for you to experiment with and use within your AWS account. It walks through each stage of building the Guidance, including deployment, usage, and cleanup.

The sample code is a starting point. It is industry validated, prescriptive but not definitive, and a peek under the hood to help you begin.

Open implementation guide

Open sample code on GitHub
