Guidance for Building a SAP Cloud Data Warehouse on AWS

[SEO Subhead]

I'm ready to deploy

This Guidance shows how to extract data and business logic from SAP systems to build a data warehouse that integrates the business context and logic embedded within the SAP system. Users can select functional areas such as Order-to-Cash (including customers, sales orders, customer deliveries, and invoices) and Procure-to-Pay (including vendors, purchase orders, good receipts, and vendor invoices).

Included are AWS CloudFormation templates that deploy the required data models, translating the technical data architecture into business-friendly terms and relationships. Additionally, this Guidance provides near real-time, simple, and adaptable data pipelines, with incremental change data capture (CDC) processes, conversion rules, and automatic inclusion of custom fields. This comprehensive approach delivers high-quality, contextual data to enable the creation of reports and the performance of advanced analytics with SAP and non-SAP data at speed, supporting data-driven decision making.

Architecture Diagram

Download the architecture diagram PDF

Overview
SAP metadata replication
Amazon Redshift data marts

Overview
This architecture diagram shows how to build a cloud data warehouse on AWS by extracting data from SAP using the OData protocol. You can use the data warehouse to model and combine SAP data with that of other sources loaded into the data warehouse. The next two tabs show metadata replication and data marts, respectively.

Step 1
Configure operational data provisioning (ODP) for extraction in the SAP Gateway of your SAP system.

Step 2
Create the OData system connection from Amazon AppFlow to your SAP source system. This is through AWS PrivateLink for SAP on AWS. You can connect with AWS through a virtual private network (VPN), AWS Direct Connect, or over the internet.

Step 3
In Amazon AppFlow, create the flow using the SAP source created in step 2. Run the flow to extract data from SAP and save it to an Amazon Simple Storage Service (Amazon S3) bucket.

Step 4
Use an AWS Glue crawler to create a data catalog entry with metadata for the extracted SAP data in an Amazon S3 bucket.

Step 5
Load data into Amazon Redshift through simple ‘COPY’ commands. Model the data with other non-SAP sources in your data warehouse.

Step 6
Create the dataset in Amazon QuickSight with Amazon Redshift as the data source.

Step 7
Create a dashboard to visualize the business data according to user requirements. Use inbuilt machine learning (ML) and insight features to help enable speed to insight.

Step 8
Deploy AWS Step Functions for overall orchestration and alerting of your data pipelines and processes.

Step 9
The generative BI capabilities of Amazon Q within Amazon QuickSight offer insights from the data.

Click to enlarge

Step 1
Configure operational data provisioning (ODP) for extraction in the SAP Gateway of your SAP system.

Step 2
Create the OData system connection from Amazon AppFlow to your SAP source system. This is through AWS PrivateLink for SAP on AWS. You can connect with AWS through a virtual private network (VPN), AWS Direct Connect, or over the internet.

Step 3
In Amazon AppFlow, create the flow using the SAP source created in step 2. Run the flow to extract data from SAP and save it to an Amazon Simple Storage Service (Amazon S3) bucket.

Step 4
Use an AWS Glue crawler to create a data catalog entry with metadata for the extracted SAP data in an Amazon S3 bucket.

Step 5
Load data into Amazon Redshift through simple ‘COPY’ commands. Model the data with other non-SAP sources in your data warehouse.

Step 6
Create the dataset in Amazon QuickSight with Amazon Redshift as the data source.

Step 7
Create a dashboard to visualize the business data according to user requirements. Use inbuilt machine learning (ML) and insight features to help enable speed to insight.

Step 8
Deploy AWS Step Functions for overall orchestration and alerting of your data pipelines and processes.

Step 9
The generative BI capabilities of Amazon Q within Amazon QuickSight offer insights from the data.
SAP metadata replication
This architecture uses Amazon Lambda and SAP OData to replicate metadata and create Amazon Redshift Data Definition Language (DDL) tables. A Python script with PyOdata queries SAP OData sources and generates DDL to create tables in Amazon Redshift and the AWS Glue Data Catalog.

Step 1
Configure ODP for extraction in the SAP Gateway of your SAP system.

Step 2
Configure and run the SAP Metadata Replication AWS Lambda function.

Step 3
Lambda retrieves the SAP credentials from AWS Secrets Manager.

Step 4
Lambda reads the config file from Amazon S3.

Step 5
Lambda creates Data Definition Language (DDL) statements and runs them in Amazon Redshift.

Step 6
Lambda creates AWS Glue tables in the AWS Glue Data Catalog.

Click to enlarge

Step 1
Configure ODP for extraction in the SAP Gateway of your SAP system.

Step 2
Configure and run the SAP Metadata Replication AWS Lambda function.

Step 3
Lambda retrieves the SAP credentials from AWS Secrets Manager.

Step 4
Lambda reads the config file from Amazon S3.

Step 5
Lambda creates Data Definition Language (DDL) statements and runs them in Amazon Redshift.

Step 6
Lambda creates AWS Glue tables in the AWS Glue Data Catalog.
Amazon Redshift data marts
This architecture diagram illustrates how the Redshift data mart layers are used. With the Slowly Changing Dimension Type 2 (SCD2) data modeling technique, you have the full history of your data movements that you can query.

Step 1
The corporate memory layer contains all SAP extracted data in Amazon S3.

Step 2
The operational data store layer is used to temporarily hold the data until the data is loaded into the Data Mart (DM) and the architected data mart (ARCHDM) layers.

Step 3
The propagation layer includes the Slowly Changing Dimension Type 2 (SCD2) tables or tables that will contain the entire history of changes.

Step 4
The virtual data mart layer provides the presentation layer through the use of materialized views in Amazon Redshift.

Click to enlarge

Step 1
The corporate memory layer contains all SAP extracted data in Amazon S3.

Step 2
The operational data store layer is used to temporarily hold the data until the data is loaded into the Data Mart (DM) and the architected data mart (ARCHDM) layers.

Step 3
The propagation layer includes the Slowly Changing Dimension Type 2 (SCD2) tables or tables that will contain the entire history of changes.

Step 4
The virtual data mart layer provides the presentation layer through the use of materialized views in Amazon Redshift.

Get Started

Deploy this Guidance

Sample code

Use sample code to deploy this Guidance in your AWS account

Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.

The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

Operational Excellence

Observability is derived from the managed services used for data processing, with process-level metrics, logs, and dashboards available through Amazon CloudWatch. These services provide valuable insights into your operations, enabling the continuous improvement of your underlying processes and procedures.

Read the Operational Excellence whitepaper
Security

The managed services used in this Guidance are granted access only to the specified data, with access to the SAP workload facilitated through Amazon AppFlow, which supports PrivateLink to create private data flows between AWS services. Data is encrypted both in transit and at rest, and data stored in Amazon S3 is secured from unauthorized access through the use of encryption features and access management tools. Moreover, the Amazon Redshift data warehouse cluster is isolated within your virtual private cloud (VPC).

These services support robust security measures, as the serverless components within the architecture are protected through AWS Identity and Access Management (IAM)-based authentication for secure validation of user identities.

Read the Security whitepaper
Reliability

Amazon AppFlow is capable of handling large data volumes without the need to break them down into multiple batches, thereby enhancing the overall reliability of the data transfer process. Furthermore, Amazon Redshift offers several features, such as multi-Availability Zone (AZ) deployment, that serve to bolster the reliability of the data warehouse cluster. Amazon Redshift also continuously monitors the health of your system, automatically replicating data from failed drives and replacing nodes as necessary for fault tolerance. Lastly, all the serverless components in this Guidance are designed to be highly available, while the non-SAP components allow for automatic scaling.

Read the Reliability whitepaper
Performance Efficiency

By using Amazon S3 for the corporate data memory, the storage capabilities of this Guidance are optimized. The processing of the data is then performed within the Amazon Redshift environment. Additionally, to enhance performance and agility, multiple flows are configured in Amazon AppFlow for the different groups of business data.

Read the Performance Efficiency whitepaper
Cost Optimization

By using serverless technologies, you pay only for the resources you use. To further optimize cost, extract only the business data group you need and minimize the number of flows being run based on the granularity of your reporting needs. Notably, the Amazon S3 Lifecycle configuration policies allow you to manage the objects so that they're stored cost-effectively throughout their lifecycle.

Read the Cost Optimization whitepaper
Sustainability

With managed services and dynamic scaling, you minimize the environmental impact of the backend services. As new features or capabilities become available for Amazon AppFlow, consider adopting those updates so that the data warehouse can continuously improve its efficiency and performance and meet your evolving business needs over time. Lastly, reducing the quantity and frequency of extraction improves sustainability, helps reduce cost, and improves the overall performance of your workloads.

Read the Sustainability whitepaper

Related Content

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.

References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.

Was this page helpful?

Feedback