This Guidance demonstrates how to consolidate SAP and non-SAP data from disparate sources using data lakes and machine learning services on AWS. The included AWS CloudFormation template provisions SAP data flows using Amazon AppFlow, an Amazon Simple Storage Service (Amazon S3) data lake that uses the Apache Iceberg open table format, and AWS Glue data transformations. It is designed to extend on-premises SAP Enterprise Resource Planning (ERP) and RISE with SAP, and to complement SAP Business Technology Platform (SAP BTP) services such as SAP Datasphere.
Note: [Disclaimer]
Architecture Diagram
Step 1
Data Integration & Management for SAP solutions on AWS let you extract SAP data through OData-enabled Operational Data Provisioning (ODP), drawing on SAP Business Warehouse (BW) extractors, ABAP Core Data Services (CDS) views, and SAP Landscape Transformation Replication Server (SLT) table replication.
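As a rough illustration of this step, the sketch below builds the request for an Amazon AppFlow flow that reads an SAP OData object and writes it to Amazon S3 as Parquet. It is not taken from the Guidance's CloudFormation template. The function name, the on-demand trigger, the Parquet output, and every value passed in (flow name, connector profile, object path, bucket, and prefix) are assumptions chosen for the example.

```python
from typing import Any


def build_sap_odata_flow_request(
    flow_name: str,
    connector_profile: str,
    object_path: str,
    bucket: str,
    prefix: str,
) -> dict[str, Any]:
    """Build keyword arguments for the Amazon AppFlow CreateFlow API.

    Hypothetical helper: all values come from the caller, not from the
    Guidance's own template.
    """
    return {
        "flowName": flow_name,
        # Assumption: run on demand; a scheduled trigger is also possible.
        "triggerConfig": {"triggerType": "OnDemand"},
        "sourceFlowConfig": {
            "connectorType": "SAPOData",
            "connectorProfileName": connector_profile,
            "sourceConnectorProperties": {"SAPOData": {"objectPath": object_path}},
        },
        "destinationFlowConfigList": [
            {
                "connectorType": "S3",
                "destinationConnectorProperties": {
                    "S3": {
                        "bucketName": bucket,
                        "bucketPrefix": prefix,
                        # Assumption: Parquet output for the raw layer.
                        "s3OutputFormatConfig": {"fileType": "PARQUET"},
                    }
                },
            }
        ],
        # Map every source field to the destination unchanged.
        "tasks": [
            {
                "taskType": "Map_all",
                "sourceFields": [],
                "connectorOperator": {"SAPOData": "NO_OP"},
                "taskProperties": {"EXCLUDE_SOURCE_FIELDS_LIST": "[]"},
            }
        ],
    }
```

With AWS credentials and an existing connector profile, the result could be passed to boto3 as boto3.client("appflow").create_flow(**request). Keeping the request builder separate from the API call makes the flow definition easy to review and test offline.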
Step 2
You can also use AWS Glue to extract data from non-SAP systems and combine it with SAP data.
Step 3
The raw layer in Amazon Simple Storage Service (Amazon S3) holds the extracted SAP data. The enriched layer contains a true representation of the data available in the source system, using the Apache Iceberg open table format for advanced data functionality, including time travel and UPSERT operations.
The curated layer holds data that is ready for consumption (including calculated fields), such as for creating data warehouses or data marts.
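To make the time travel capability concrete, the sketch below builds an Amazon Athena query that reads an Iceberg table as it existed at a given point in time. The function name and the database and table names in the usage note are hypothetical. The helper only formats the SQL text and does not validate or escape identifiers.

```python
from datetime import datetime, timezone


def time_travel_query(database: str, table: str, as_of: datetime) -> str:
    """Return Athena SQL that reads an Iceberg table as of a UTC timestamp."""
    if as_of.tzinfo is None:
        raise ValueError("as_of must be timezone-aware")
    # Normalize to UTC so the literal is unambiguous.
    stamp = as_of.astimezone(timezone.utc).strftime("%Y-%m-%d %H:%M:%S")
    return (
        f'SELECT * FROM "{database}"."{table}" '
        f"FOR TIMESTAMP AS OF TIMESTAMP '{stamp} UTC'"
    )
```

For example, time_travel_query("sap_enriched", "sales_orders", datetime(2024, 1, 15, 8, 30, tzinfo=timezone.utc)) yields a query that returns the table contents as of 08:30 UTC on that day, which can help when auditing how a record looked before a later update.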
Step 4
AWS Glue is used for data processing. Data is propagated through the layers and inserted, updated, and merged based on applied business logic. AWS Glue jobs identify new, changed, and deleted records based on the indicators provided through the ODP framework.
Before and after images, as well as duplicated records generated by CDS views, are also handled within these jobs so that all data is reconciled and data processing is optimized for performance and cost.
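The reconciliation logic described in this step can be sketched in plain Python. This is a simplified illustration, not the Guidance's AWS Glue job code. It assumes each change record carries the ODP delta fields ODQ_CHANGEMODE ("C" for created, "U" for updated, "D" for deleted) and ODQ_ENTITYCNTR (negative for a before image); confirm these names and values against your own extractor output. The function name and key field are hypothetical.

```python
from typing import Any, Iterable

Record = dict[str, Any]


def apply_changes(
    current: dict[str, Record],
    changes: Iterable[Record],
    key_field: str,
) -> dict[str, Record]:
    """Return a new table state after applying change records in order."""
    state = dict(current)  # copy so the caller's state is not mutated
    for change in changes:
        key = change[key_field]
        mode = change.get("ODQ_CHANGEMODE", "")
        counter = change.get("ODQ_ENTITYCNTR", 0)
        if mode == "D":
            # Deleted in the source system: remove the row if present.
            state.pop(key, None)
        elif mode == "U" and counter < 0:
            # Before image of an update: carries the old values, so skip it.
            continue
        else:
            # Create, after image, or full-load row: upsert by key.
            # Duplicates simply overwrite the same key with the same values.
            state[key] = {
                name: value
                for name, value in change.items()
                if not name.startswith("ODQ_")
            }
    return state
```

In the Guidance this kind of logic runs at scale inside AWS Glue against Iceberg tables; the in-memory dictionary here only stands in for the target table to show the order of operations.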
Step 5
Curated data can be consumed by a variety of data and analytics services, such as Amazon Athena for interactive SQL queries, Amazon Redshift for data warehousing, Amazon QuickSight for data visualization and reporting (using Amazon Q), Amazon SageMaker for artificial intelligence and machine learning (AI/ML), and Amazon Bedrock for generative AI.
Step 6
Data is cataloged in the AWS Glue Data Catalog for technical usage and in Amazon DataZone as a business catalog organized by domain. Optionally, you can apply fine-grained access control through AWS Lake Formation.
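As one possible illustration of fine-grained access control, the sketch below builds a column-level SELECT grant for the AWS Lake Formation GrantPermissions API. The function name is hypothetical, and the principal, database, table, and column names are supplied by the caller; none of them come from the Guidance.

```python
from typing import Any, Sequence


def build_column_grant(
    principal_arn: str,
    database: str,
    table: str,
    columns: Sequence[str],
) -> dict[str, Any]:
    """Build keyword arguments for Lake Formation GrantPermissions."""
    if not columns:
        raise ValueError("at least one column is required")
    return {
        "Principal": {"DataLakePrincipalIdentifier": principal_arn},
        "Resource": {
            "TableWithColumns": {
                "DatabaseName": database,
                "Name": table,
                # Only these columns become readable by the principal.
                "ColumnNames": list(columns),
            }
        },
        "Permissions": ["SELECT"],
    }
```

With credentials in place, the result could be passed as boto3.client("lakeformation").grant_permissions(**grant). Restricting the grant to named columns is one way to keep sensitive SAP fields out of reach of general reporting roles.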
Step 7
Data pipelines are centrally orchestrated and monitored with AWS Step Functions.
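A minimal sketch of such an orchestration, written as an Amazon States Language definition built in Python, might look like the following. It is an assumption-laden example rather than the state machine shipped with the Guidance: the state names, retry settings, flow name, and job name are all hypothetical. Note that the AppFlow StartFlow call returns as soon as the run is started, so a real pipeline would likely need to wait for the flow run to finish before transforming the data.

```python
import json
from typing import Any


def build_pipeline_definition(flow_name: str, glue_job_name: str) -> dict[str, Any]:
    """Build a two-step state machine: start an AppFlow flow, then run a Glue job."""
    return {
        "Comment": "Sketch: extract with Amazon AppFlow, then transform with AWS Glue",
        "StartAt": "ExtractFromSap",
        "States": {
            "ExtractFromSap": {
                "Type": "Task",
                # AWS SDK integration; starts the flow without waiting for it.
                "Resource": "arn:aws:states:::aws-sdk:appflow:startFlow",
                "Parameters": {"FlowName": flow_name},
                "Next": "TransformWithGlue",
            },
            "TransformWithGlue": {
                "Type": "Task",
                # The .sync suffix makes Step Functions wait for the job run to end.
                "Resource": "arn:aws:states:::glue:startJobRun.sync",
                "Parameters": {"JobName": glue_job_name},
                # Assumption: retry a failed job run twice with backoff.
                "Retry": [
                    {
                        "ErrorEquals": ["States.TaskFailed"],
                        "IntervalSeconds": 60,
                        "MaxAttempts": 2,
                        "BackoffRate": 2.0,
                    }
                ],
                "End": True,
            },
        },
    }


def to_asl_json(definition: dict[str, Any]) -> str:
    """Serialize the definition for use as a state machine definition string."""
    return json.dumps(definition, indent=2)
```

The JSON string returned by to_asl_json can be supplied as the definition when creating a state machine, for example through CloudFormation or the Step Functions console.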
Get Started
Deploy this Guidance
Well-Architected Pillars
The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
You can fully deploy this Guidance using AWS CloudFormation, which defines infrastructure and application resources as code. This helps you avoid manual errors and maintain consistent deployment across different environments. You can also incorporate this automation into your own development pipeline to enable iteration and consistent deployments across your SAP landscape. Additionally, this Guidance uses Amazon CloudWatch to enhance and centralize observability. CloudWatch provides detailed logs and dashboards so that you can monitor the managed services used for data extraction and transformation.
Security
This Guidance uses AWS Identity and Access Management (IAM) for secure validation of user identity. AWS IAM Identity Center provides centralized identity management and granular access control, and managed services only have access to the data that is specified. Access to the SAP workload occurs using Amazon AppFlow and AWS Glue, which encrypt data in transit and at rest. You can also use AWS CloudTrail to log API calls and to monitor and log access to sensitive data and resources. This can help you comply with regulations like the Payment Card Industry Data Security Standard, the Health Insurance Portability and Accountability Act, and the General Data Protection Regulation.
Reliability
The serverless components of this Guidance are highly available and automatically scale. Further increasing reliability, Amazon AppFlow and AWS Glue can move large volumes of data without breaking it down into multiple batches. A fully managed integration service, Amazon AppFlow securely transfers data between source applications and AWS services. It provides a reliable and scalable way to move data, with features like automatic retries, error handling, and monitoring. AWS Glue provides a reliable and scalable data processing pipeline, with features like automatic scaling, fault tolerance, and checkpoint-based recovery. Additionally, Amazon S3 offers industry-leading scalability, data availability, security, and performance for your data lake.
Performance Efficiency
By using serverless technologies, you only provision the exact resources you use. Amazon S3 optimizes storage for your data lake architecture and automatically scales to meet demand without any need for manual intervention. It also offers low-latency data retrieval, enabling quick access to the data you need. AWS Glue automates tasks like data preparation, discovery, transformation, and loading. This improves the performance and efficiency of your data processing pipelines, reducing the time and resources required to prepare data for analysis. Finally, Amazon AppFlow automates the data integration process, reducing the time and effort required to move data between different systems.
Cost Optimization
This Guidance uses serverless technologies that scale based on demand so that you only pay for the resources you use. Amazon S3 offers cost-effective storage, and you can use data tiering to organize your data by access level to optimize storage costs. Additionally, AWS Glue provides pay-per-use compute resources for data processing, and Amazon AppFlow streamlines data integration workflows. To further optimize costs, you can choose to extract only the business data groups that you need, and you can minimize the number of flows implemented based on the granularity of your reporting needs.
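One way to apply data tiering is an Amazon S3 lifecycle configuration that moves older objects to lower-cost storage classes. The sketch below builds such a configuration for a single prefix. The function name, the raw/ prefix, the chosen storage classes, and the day thresholds are illustrative assumptions; the right values depend on how often each layer is read.

```python
from typing import Any


def build_raw_layer_lifecycle(
    prefix: str = "raw/",
    ia_after_days: int = 30,
    archive_after_days: int = 180,
) -> dict[str, Any]:
    """Build a lifecycle configuration that tiers objects under one prefix."""
    # S3 requires at least 30 days in storage before a transition to STANDARD_IA.
    if ia_after_days < 30:
        raise ValueError("ia_after_days must be at least 30")
    if archive_after_days <= ia_after_days:
        raise ValueError("archive_after_days must be greater than ia_after_days")
    return {
        "Rules": [
            {
                "ID": f"tier-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": ia_after_days, "StorageClass": "STANDARD_IA"},
                    {"Days": archive_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }
```

The result could be applied with boto3.client("s3").put_bucket_lifecycle_configuration(Bucket=bucket_name, LifecycleConfiguration=config), where bucket_name is your data lake bucket. The example targets only a raw prefix because archiving files that back actively queried tables could slow or break reads.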
Sustainability
By using managed services and dynamic scaling, this Guidance minimizes the environmental impact of the backend services. Additionally, Amazon AppFlow and AWS Glue are fully managed services that use an energy-efficient cloud infrastructure. This removes the need for you to provision and manage infrastructure, reducing the energy consumption and carbon footprint associated with running and maintaining physical servers.
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.