Guidance for Data Lakes with SAP and Non-SAP Data on AWS
Overview
This Guidance demonstrates how to combine and consolidate SAP and non-SAP data from disparate sources using data lakes and machine learning services on AWS. The included AWS CloudFormation template provisions SAP data flows using Amazon AppFlow, an Amazon Simple Storage Service (Amazon S3) data lake with an Apache Iceberg open table format, and AWS Glue data transformations. It is designed to extend on-premises SAP Enterprise Resource Planning (ERP) and RISE with SAP, as well as complement SAP Business Technology Platform (SAP BTP) services such as SAP Datasphere.
How it works
This architecture diagram provides a high-level overview of an enterprise data lake. Your organization can improve decision-making and operational processes with a holistic view of transformed and cataloged SAP and non-SAP data.
Get Started
Deploy this Guidance
Use sample code to deploy this Guidance in your AWS account
Well-Architected Pillars
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
You can fully deploy this Guidance using AWS CloudFormation, which defines infrastructure and application resources as code. This helps you avoid manual errors and maintain consistent deployment across different environments. You can also incorporate this automation into your own development pipeline to enable iteration and consistent deployments across your SAP landscape. Additionally, this Guidance uses Amazon CloudWatch to enhance and centralize observability. CloudWatch provides detailed logs and dashboards so that you can monitor the managed services used for data extraction and transformation.
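As an illustration of folding the deployment into a pipeline, the following minimal sketch builds the request for CloudFormation's CreateStack API as a plain dictionary. The stack name, template URL, and parameter names are hypothetical placeholders, not values taken from this Guidance's template; check the template itself for the parameters it actually expects.

```python
def build_create_stack_request(stack_name, template_url, parameters):
    """Return keyword arguments shaped for CloudFormation's CreateStack API."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        # CloudFormation expects parameters as a list of key/value records.
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": str(value)}
            for key, value in sorted(parameters.items())
        ],
        # Assumption: the template creates named IAM resources. Drop this
        # entry if it does not.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


# All values below are hypothetical placeholders.
request = build_create_stack_request(
    stack_name="sap-data-lake-dev",
    template_url="https://example-bucket.s3.amazonaws.com/sap-data-lake.yaml",
    parameters={
        "EnvironmentName": "dev",
        "DataLakeBucketName": "example-sap-data-lake",
    },
)

# With boto3 installed and credentials configured, the request could be
# submitted as: boto3.client("cloudformation").create_stack(**request)
```

Keeping the request as data makes it easy to reuse the same function for each environment in an SAP landscape and to review the values before anything is deployed.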
Security
This Guidance uses AWS Identity and Access Management (IAM) for secure validation of user identity. AWS IAM Identity Center provides centralized identity management and granular access control, and managed services only have access to the data that is specified. Access to the SAP workload occurs using Amazon AppFlow and AWS Glue, which encrypt data in transit and at rest. You can also use AWS CloudTrail to log API calls and to monitor and log access to sensitive data and resources. This can help you comply with regulations like the Payment Card Industry Data Security Standard (PCI DSS), the Health Insurance Portability and Accountability Act (HIPAA), and the General Data Protection Regulation (GDPR).
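To make "only have access to the data that is specified" concrete, here is a minimal sketch of a least-privilege IAM policy document that grants read-only access to a single prefix in the data lake bucket. The bucket name and prefix are hypothetical, and the policies that this Guidance's template actually creates may differ.

```python
import json


def build_read_only_prefix_policy(bucket, prefix):
    """Build an IAM policy document allowing reads under one S3 prefix."""
    prefix = prefix.strip("/")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing is a bucket-level action, so it is narrowed to the
                # prefix with a condition rather than with the resource ARN.
                "Sid": "ListOnlyThisPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*"]}},
            },
            {
                "Sid": "ReadObjectsUnderPrefix",
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }


# Hypothetical bucket and prefix, for illustration only.
policy = build_read_only_prefix_policy("example-sap-data-lake", "curated/sales/")
policy_json = json.dumps(policy, indent=2)
```

The resulting JSON could be attached to the role that a consuming service or analyst group assumes, so that each consumer sees only its own slice of the lake.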
Reliability
The serverless components of this Guidance are highly available and scale automatically. Amazon AppFlow and AWS Glue further increase reliability because they can move large volumes of data without breaking it down into multiple batches. Amazon AppFlow is a fully managed integration service that securely transfers data between source applications and AWS services. It provides a reliable and scalable way to move data, with features like automatic retries, error handling, and monitoring. AWS Glue provides a reliable and scalable data processing pipeline, with features like automatic scaling, fault tolerance, and checkpoint-based recovery. Additionally, Amazon S3 offers industry-leading scalability, data availability, security, and performance for your data lake.
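The managed services handle retries for you, so nothing like the following needs to be written to use this Guidance. Purely to illustrate the idea behind automatic retries, here is a generic sketch of retrying a transient failure with exponential backoff. It is not how Amazon AppFlow or AWS Glue implement retries internally, and all names in it are hypothetical.

```python
import time


def call_with_retries(operation, max_attempts=4, base_delay=0.5, sleep=time.sleep):
    """Call operation(); after a failure, wait base_delay * 2**n, then retry."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except Exception:
            # A real caller would catch only errors known to be transient.
            if attempt == max_attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * (2 ** attempt))


# Simulated transfer that fails twice before succeeding.
attempts = []
delays = []


def flaky_transfer():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("simulated transient failure")
    return "transferred"


# Recording the delays instead of sleeping keeps the example fast.
result = call_with_retries(flaky_transfer, sleep=delays.append)
```

In this run the call succeeds on the third attempt after two recorded waits, each double the previous one.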
Performance Efficiency
By using serverless technologies, you only provision the exact resources you use. Amazon S3 optimizes storage for your data lake architecture and automatically scales to meet demand without any need for manual intervention. It also offers low-latency data retrieval, enabling quick access to the data you need. AWS Glue automates tasks like data preparation, discovery, transformation, and loading. This improves the performance and efficiency of your data processing pipelines, reducing the time and resources required to prepare data for analysis. Finally, Amazon AppFlow automates the data integration process, reducing the time and effort required to move data between different systems.
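The CloudFormation template provisions the SAP data flows, so you do not need to create them by hand. For readers who want to see roughly what such a flow consists of, the sketch below assembles a request in the shape of the Amazon AppFlow CreateFlow API: an SAP OData source, an Amazon S3 destination, and a task that maps all fields. The field names follow the CreateFlow request structure as best understood here and should be verified against the API reference. The flow name, connector profile, object path, bucket, and prefix are hypothetical.

```python
def build_sap_flow_request(flow_name, profile_name, object_path, bucket, prefix):
    """Return a dict shaped like an Amazon AppFlow CreateFlow request."""
    return {
        "flowName": flow_name,
        # Run on demand; scheduled triggers are also possible.
        "triggerConfig": {"triggerType": "OnDemand"},
        "sourceFlowConfig": {
            "connectorType": "SAPOData",
            "connectorProfileName": profile_name,
            "sourceConnectorProperties": {
                "SAPOData": {"objectPath": object_path}
            },
        },
        "destinationFlowConfigList": [
            {
                "connectorType": "S3",
                "destinationConnectorProperties": {
                    "S3": {
                        "bucketName": bucket,
                        "bucketPrefix": prefix,
                        "s3OutputFormatConfig": {"fileType": "PARQUET"},
                    }
                },
            }
        ],
        # Copy every source field to the destination unchanged.
        "tasks": [
            {
                "taskType": "Map_all",
                "sourceFields": [],
                "connectorOperator": {"SAPOData": "NO_OP"},
                "taskProperties": {},
            }
        ],
    }


# Every value below is a hypothetical placeholder.
flow_request = build_sap_flow_request(
    flow_name="sap-sales-orders-to-s3",
    profile_name="example-sap-profile",
    object_path="/sap/opu/odata/sap/EXAMPLE_SRV/ExampleSet",
    bucket="example-sap-data-lake",
    prefix="raw/sap/sales_orders",
)

# With boto3 installed and credentials configured, this could be submitted as:
#   boto3.client("appflow").create_flow(**flow_request)
```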
Cost Optimization
This Guidance uses serverless technologies that scale based on demand so that you only pay for the resources you use. Amazon S3 offers cost-effective storage, and you can use data tiering to organize your data by access level to optimize storage costs. Additionally, AWS Glue provides pay-per-use compute resources for data processing, and Amazon AppFlow streamlines data integration workflows. To further optimize costs, you can choose to extract only the business data groups that you need, and you can minimize the number of flows implemented based on the granularity of your reporting needs.
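One way to apply data tiering is an Amazon S3 lifecycle configuration. The following minimal sketch builds a rule that moves objects under one prefix to cheaper storage classes as they age. The prefix and day thresholds are hypothetical and should be chosen from your own access patterns. Archiving is best limited to data that queries no longer read, such as a raw landing prefix, because archived objects must be restored before they can be read again.

```python
def build_tiering_rules(prefix, infrequent_after_days=30, archive_after_days=180):
    """Build an S3 lifecycle configuration that tiers one prefix by age."""
    if archive_after_days <= infrequent_after_days:
        raise ValueError("archive_after_days must exceed infrequent_after_days")
    rule_id = "tier-" + prefix.strip("/").replace("/", "-")
    return {
        "Rules": [
            {
                "ID": rule_id,
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    # Rarely read after the first month (assumption).
                    {"Days": infrequent_after_days, "StorageClass": "STANDARD_IA"},
                    # Kept only for retention after that (assumption).
                    {"Days": archive_after_days, "StorageClass": "GLACIER"},
                ],
            }
        ]
    }


# Hypothetical prefix for raw SAP extracts.
lifecycle = build_tiering_rules("raw/sap/")

# With boto3 installed and credentials configured, this could be applied as:
#   boto3.client("s3").put_bucket_lifecycle_configuration(
#       Bucket="example-sap-data-lake", LifecycleConfiguration=lifecycle)
```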
Sustainability
By using managed services and dynamic scaling, this Guidance minimizes the environmental impact of the backend services. Additionally, Amazon AppFlow and AWS Glue are fully managed services that use an energy-efficient cloud infrastructure. This removes the need for you to provision and manage infrastructure, reducing the energy consumption and carbon footprint associated with running and maintaining physical servers.