Archive, Manage, and Leverage SAP Documents on AWS with Syntax CxLink Documents
By Chandrasekhar Chittuluru, Partner Solutions Architect, SAP – AWS
By Marcel Törpe, Partner Solutions Architect, SAP – AWS
By Soumya Sekhar Das, Partner Solutions Architect, SAP – AWS
By Mario De Felipe Díaz, Global Director, SAP on AWS – Syntax
SAP customers have been hosting their mission-critical SAP workloads on Amazon Web Services (AWS) since 2011 and AWS has been certified to host SAP production systems since 2012.
Today, more than 5,000 active SAP customers are hosting their SAP environment on AWS and securing, modernizing, and innovating their SAP ERP applications with a broad and deep set of services and features that AWS provides.
As customers continue their digital transformation journey, they are looking for better ways to manage big data along with the high volume of documents SAP systems receive daily. That volume is only increasing with digitization and is predicted to grow exponentially in the near future.
Customers are demanding a single, reliable document extraction solution with secure storage that addresses long-term growth and storage goals. The ability to classify documents and extract selected text simplifies overall storage and archival operations. As SAP ERP is the economic backbone for thousands of AWS customers, the overall solution needs to be highly performant and scalable.
Syntax, an SAP Gold Partner and AWS Premier Tier Services Partner with Competencies in SAP and Migration, has focused on customer needs to develop a cloud-native, SAP-certified solution to meet data and document modernization requirements through its CxLink product portfolio.
This post will dive into the Syntax CxLink Documents solution, showing the process of how to handle a large number of documents from SAP applications and store them directly on Amazon Simple Storage Service (Amazon S3) buckets.
We’ll also provide an overview of how to automate, scale, and connect this solution with other AWS services such as Amazon Textract, which extracts text from tables and from printed or handwritten documents, adding extra value for customers.
SAP systems need a capable document management solution. However, the reality is that saving documents generated on SAP and linked from specific SAP applications to an internal server is cost-intensive and complex.
In addition, customers are facing multiple challenges with the bulk volume of documents in SAP. Extracting the documents out of the application, archiving, retrieving, or storing them into scalable and cost-effective storage brings another layer of complexity to operations.
Classifying the documents and extracting text from them is an added frustration. Manual document classification, optical character recognition (OCR), and small-scale machine learning (ML) solutions do not meet SAP’s document management requirements.
Introduction to Syntax CxLink Documents
In this context, Syntax CxLink Documents allows customers to manage documents easily on AWS and enables full integration with SAP systems, irrespective of where the documents are located. With CxLink, customers can continue to benefit from all of the cloud capabilities that AWS offers.
CxLink Documents leverages Syntax’s in-depth knowledge of both SAP and AWS technologies. It provides a simple interface to integrate with SAP and other AWS services, to deliver efficient and fast document storage, as well as access and retrieval of documents.
Furthermore, users can easily define retention policies for all SAP documents and link those policies to Amazon S3 storage classes, keeping long-term storage both secure and low-cost.
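As a rough illustration of what linking a retention policy to S3 storage classes can look like on the AWS side, the sketch below builds an S3 lifecycle configuration in Python. The rule name, the `invoices/` prefix, and the day thresholds are assumptions made for this example and are not taken from CxLink, which manages its own policies through its SAP interface. `apply_lifecycle` is a hypothetical helper around the boto3 `put_bucket_lifecycle_configuration` call.

```python
import json


def build_lifecycle_rule(rule_id, prefix, transitions, expire_after_days=None):
    """Return one S3 lifecycle rule as a plain dict.

    transitions is a list of (days, storage_class) tuples,
    for example [(30, "STANDARD_IA"), (365, "GLACIER")].
    """
    rule = {
        "ID": rule_id,
        "Filter": {"Prefix": prefix},
        "Status": "Enabled",
        # Sort by age so the rule reads from "warm" to "cold" storage.
        "Transitions": [
            {"Days": days, "StorageClass": storage_class}
            for days, storage_class in sorted(transitions)
        ],
    }
    if expire_after_days is not None:
        rule["Expiration"] = {"Days": expire_after_days}
    return rule


# Hypothetical policy: move invoices to infrequent access after 30 days,
# archive them after one year, and delete them after ten years.
lifecycle_configuration = {
    "Rules": [
        build_lifecycle_rule(
            "sap-invoices-retention",
            "invoices/",
            [(365, "GLACIER"), (30, "STANDARD_IA")],
            expire_after_days=3650,
        )
    ]
}


def apply_lifecycle(bucket_name, configuration):
    """Hypothetical helper: push the configuration to a bucket with boto3."""
    import boto3  # imported lazily so the sketch runs without the AWS SDK

    boto3.client("s3").put_bucket_lifecycle_configuration(
        Bucket=bucket_name, LifecycleConfiguration=configuration
    )


print(json.dumps(lifecycle_configuration, indent=2))
```

Keeping the rule as plain data makes it easy to review the retention terms before anything is applied to a bucket.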
Architecture Solution, Tools, and AWS Services
The architecture for Syntax CxLink Documents includes AWS-native storage services, machine learning, serverless technologies, and a NoSQL database.
The objective is to consolidate multiple AWS tools into a single product interface that provides efficient, fast document storage, access, and SAP data recovery. The way the architecture is built also improves cost optimization, scalability, and performance for the end user.
The main services are listed below:
- Amazon S3: Object storage service offering scalability, data availability, security, and performance.
- Amazon EventBridge: Serverless event bus that makes it easier to build event-driven applications at scale using events generated from customer applications, integrated software-as-a-service (SaaS) applications, and AWS services.
- AWS Lambda: Serverless, event-driven compute service allowing users to run code for virtually any type of application or backend service without provisioning or managing servers.
- Amazon Textract: Machine learning service that automatically extracts text, handwriting, and data from scanned documents.
- Amazon DynamoDB: Fully managed, serverless, key-value NoSQL database designed to run high-performance applications at any scale.
The architecture diagram below shows the overall workflow used to build the solution leveraging Syntax CxLink Documents and AWS services.
Figure 1 – Architecture pattern includes SAP, Syntax CxLink Documents, and AWS services.
A deeper explanation of how the architecture works is listed below:
- The SAP application receives documents from different sources.
- The Syntax CxLink Add-On, installed on top of the SAP application, stores the documents in an Amazon S3 bucket.
- Once the documents are stored as objects in the S3 bucket, an S3 event notification sends real-time events to Amazon EventBridge.
- An Amazon EventBridge rule invokes an AWS Lambda function.
- The Lambda function makes an API call to Amazon Textract to extract text from the SAP documents and stores it in a separate output S3 bucket.
- In parallel, Syntax CxLink Documents extracts the metadata from the SAP-generated documents and stores it in Amazon DynamoDB.
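The Lambda step in the workflow above can be sketched as follows. This is a minimal illustration under stated assumptions, not the function shipped with the Syntax template: the `OUTPUT_BUCKET` environment variable, the `textract-output/` prefix, and all helper names are hypothetical. It uses the synchronous Textract `detect_document_text` call, which suits single-page images and documents; multi-page PDFs typically require the asynchronous Textract API instead.

```python
import os


def extract_s3_location(event):
    """Read bucket and key from an S3 "Object Created" event sent via EventBridge."""
    detail = event["detail"]
    return detail["bucket"]["name"], detail["object"]["key"]


def lines_from_textract(response):
    """Keep only the LINE blocks of a DetectDocumentText response."""
    return [
        block["Text"]
        for block in response.get("Blocks", [])
        if block.get("BlockType") == "LINE"
    ]


def output_key_for(input_key, prefix="textract-output/"):
    """Derive the object key used in the output bucket (naming is an assumption)."""
    return f"{prefix}{input_key}.txt"


def lambda_handler(event, context):
    """Entry point: extract text from the new object and store it as a .txt file."""
    import boto3  # available in the Lambda runtime; imported lazily here

    bucket, key = extract_s3_location(event)
    response = boto3.client("textract").detect_document_text(
        Document={"S3Object": {"Bucket": bucket, "Name": key}}
    )
    text = "\n".join(lines_from_textract(response))
    boto3.client("s3").put_object(
        Bucket=os.environ["OUTPUT_BUCKET"],  # hypothetical variable name
        Key=output_key_for(key),
        Body=text.encode("utf-8"),
    )
    return {"outputKey": output_key_for(key), "lineCount": text.count("\n") + 1 if text else 0}
```

Splitting the event parsing and response handling into small pure functions keeps them testable without AWS credentials; only `lambda_handler` touches the AWS SDK.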
Customers can build anything they need on top of this architecture leveraging Syntax CxLink Documents and AWS services.
For example, companies can use Amazon Rekognition to classify documents and to identify and detect their contents. In most cases, documents contain multiple pages, and Amazon Rekognition can split those documents into individual pages and save them into an S3 bucket if configured.
Customers can also use natural language processing (NLP) capabilities in the above architecture using Amazon Comprehend to gain more insights into the extracted text.
Syntax CxLink Documents is designed to consolidate SAP document management efficiently with AWS. Documents are integrated into the customer SAP environment for easy access. With the provided interfaces, customers can easily define retention policies for all their SAP documents, linking them to S3 storage classes to optimize cost savings.
As an SAP native application, Syntax CxLink Documents is written in ABAP, leveraging ABAP SDK for AWS. It’s also an SAP-certified solution available on AWS Marketplace. It’s a Syntax managed add-on package that is compatible with current releases of SAP. Syntax ensures add-on updates are compatible with future SAP versions.
Below are key features of CxLink Documents:
- SAP-certified with SAP Business Suite and SAP S/4HANA Add-On.
- HTTPS communication with SSL certificates between SAP and AWS.
- Encryption supported on the client (SSF) and the server (AWS Key Management Service).
- AWS Identity and Access Management (IAM) policies for restricted access.
- Efficiency and cost optimization by leveraging appropriate S3 storage classes.
- Natively integrated with general object services (GOS), ArchiveLink, and document management systems (DMS).
Technical Considerations Prior to Installation
Before adopting Syntax CxLink Documents, users should consider if they meet the below prerequisites:
- Any SAP Business Suite with NW 7.31 or higher, including SAP-supported S/4HANA environments, can host the CxLink Add-On.
- The SAP environment requires dedicated connectivity to an AWS public or private endpoint. Because the connection can go through a proxy (type G RFC destination), the SAP system doesn’t need to be hosted on AWS.
- Customers must have an AWS account, at least one S3 bucket for the SAP repositories, and one IAM policy.
- Syntax CxLink supports two authentication mechanisms, IAM users or Amazon Elastic Compute Cloud (Amazon EC2) instance profiles. If SAP is already hosted on AWS, Syntax recommends using EC2 instance profiles for enhanced security.
- SAP must communicate with S3 over HTTPS, so HTTPS services in ICM must be enabled, and SAP Cryptolib is required.
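To make the IAM prerequisite more concrete, the sketch below generates a policy document scoped to a single repository bucket. The exact permissions CxLink needs are not listed in this post, so the action list here is an assumption based on typical store, read, and delete operations; the bucket name is a placeholder. The final statement denies any request that is not sent over HTTPS, in line with the HTTPS requirement above.

```python
import json


def build_repository_policy(bucket_name):
    """Return an IAM policy document limited to one S3 repository bucket."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    objects_arn = f"{bucket_arn}/*"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "ListRepositoryBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": bucket_arn,
            },
            {
                # Assumed set of object-level actions for storing and retrieving documents.
                "Sid": "ReadWriteRepositoryObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": objects_arn,
            },
            {
                # Reject plain-HTTP requests to the bucket and its objects.
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Action": "s3:*",
                "Resource": [bucket_arn, objects_arn],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            },
        ],
    }


# "example-sap-repository" is a placeholder bucket name.
print(json.dumps(build_repository_policy("example-sap-repository"), indent=2))
```

The resulting JSON can be attached to an IAM user or to the role behind an EC2 instance profile, depending on which of the two authentication mechanisms is used.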
Setup and Configuration
After the basic setup of the Syntax CxLink Add-On (licenses, encryption, and certificates), the SAP environment is prepared for three potential use cases: general object services (GOS), ArchiveLink, and document management systems (DMS).
SAP application documents such as invoices, purchase orders, sales orders, and similar use standard functions such as “create attachments,” “list attachments,” “show links to an internet address,” and “display associated workflows.” The GOS toolbox provides these functions.
Syntax CxLink Documents configuration creates a one-to-one relation between an S3 bucket and a content repository in the OAC0 transaction. This relation is attached to the SOFFPHIO document class of the attachments. Once the link is created, all new attachments are automatically stored in the proper S3 bucket.
Figure 2 – Management of S3 buckets using Syntax CxLink.
Below is the SOFFPHIO document class associated with the current content repository, which is linked to the S3 bucket where attachments are stored.
Figure 3 – Documents association with the S3 bucket.
SAP stores all newly-attached documents in the S3 bucket. Customers can link them using the standard SAP transactions for GOS like FB03, ME23N, and VA01.
In the case of ArchiveLink (Business Documents), Syntax CxLink Documents creates a one-to-one relation between an object type and a content repository in the OAC3 transaction. Once the link is active, all new attachments go to the proper S3 bucket.
Figure 4 – Object links for content repositories.
From that moment, all documents related to the configured object type are managed by Syntax CxLink Documents and stored in the selected S3 bucket. Customers can create as many repositories as they have buckets to manage, which makes it easier to route documents to the AWS services that take part in the process.
If a customer uses the SAP DMS, there are enhanced advantages such as electronic search tools or finding documents using known SAP transactions. Also, customers can use document distribution to assign documents that are managed in the DMS, either manually or automatically according to company-specific processes.
SAP DMS, in addition to governing documents, also coordinates document processing. This ensures all responsible users can view or process up-to-the-minute information.
Syntax CxLink Documents interacts between SAP DMS and S3 as the storage provider.
Figure 5 – Content repository management.
After the DMS-enabled content repository is selected in transaction OAC0, a link is established in transaction OACT with the /LNKAWS/AT category and the RA repository ID.
Figure 6 – Document category management.
From that moment, all documents related to the new category are stored in the appropriate cloud repository.
Figure 7 – Documents stored on Amazon S3.
Syntax CxLink Documents stores any documents generated in SAP directly into S3 leveraging ABAP SDK developed for AWS services.
Users can now benefit from infrastructure-as-code (IaC) technology such as AWS CloudFormation to provision multiple AWS resources as a stack. There is no need to build services individually and integrate them separately later.
Also, the CloudFormation stack maintains the IAM permissions based on the YAML template provided during stack creation, and it keeps the workflow consistent with the intended solution design.
Customers use their own AWS account and the AWS Management Console to create new resources and stacks with CloudFormation.
Figure 8 – Build solutions with AWS CloudFormation.
Users can develop their own YAML template file to provision AWS resources. This YAML needs to be uploaded into the customer’s own S3 bucket.
In this architecture, the Syntax YAML template creates two S3 buckets: one for input and one for output. Storing or uploading documents into the input bucket triggers an Amazon EventBridge rule, which in turn invokes an AWS Lambda function. This function calls Amazon Textract to extract the text from the documents and stores it in the output bucket.
The same CloudFormation template creates an Amazon DynamoDB table to store the document metadata in it.
Figure 9 – CloudFormation reads YAML template from S3 bucket.
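For readers who want a feel for the shape of such a template, the sketch below assembles a comparable one as a Python dictionary and prints it as JSON, which CloudFormation accepts alongside YAML. It is not the Syntax template: every parameter and logical resource name is hypothetical, the `DocumentId` key is an assumed table schema, and the Lambda function, its invoke permission, and its IAM role are left out for brevity (the function ARN is passed in as a parameter instead).

```python
import json


def build_template():
    """Return a minimal CloudFormation template as a dict (illustrative only)."""
    string_param = {"Type": "String"}
    return {
        "AWSTemplateFormatVersion": "2010-09-09",
        "Description": "Sketch: input/output buckets, metadata table, EventBridge rule",
        "Parameters": {
            "InputBucketName": string_param,
            "OutputBucketName": string_param,
            "MetadataTableName": string_param,
            "TextractFunctionArn": string_param,  # assumes the function already exists
        },
        "Resources": {
            "InputBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {
                    "BucketName": {"Ref": "InputBucketName"},
                    # Forward S3 events for this bucket to Amazon EventBridge.
                    "NotificationConfiguration": {
                        "EventBridgeConfiguration": {"EventBridgeEnabled": True}
                    },
                },
            },
            "OutputBucket": {
                "Type": "AWS::S3::Bucket",
                "Properties": {"BucketName": {"Ref": "OutputBucketName"}},
            },
            "MetadataTable": {
                "Type": "AWS::DynamoDB::Table",
                "Properties": {
                    "TableName": {"Ref": "MetadataTableName"},
                    "BillingMode": "PAY_PER_REQUEST",
                    "AttributeDefinitions": [
                        {"AttributeName": "DocumentId", "AttributeType": "S"}
                    ],
                    "KeySchema": [{"AttributeName": "DocumentId", "KeyType": "HASH"}],
                },
            },
            "ObjectCreatedRule": {
                "Type": "AWS::Events::Rule",
                "Properties": {
                    # Match only new objects in the input bucket.
                    "EventPattern": {
                        "source": ["aws.s3"],
                        "detail-type": ["Object Created"],
                        "detail": {"bucket": {"name": [{"Ref": "InputBucketName"}]}},
                    },
                    "Targets": [
                        {"Arn": {"Ref": "TextractFunctionArn"}, "Id": "TextractFunction"}
                    ],
                },
            },
        },
    }


print(json.dumps(build_template(), indent=2))
```

A production template would also need to grant EventBridge permission to invoke the function and define the function's execution role.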
Users need to provide their CloudFormation stack name, the DynamoDB table name, and the input and output S3 bucket names as parameters, and then proceed to create the CloudFormation stack.
Figure 10 – Input mandatory details and parameters in CloudFormation stack.
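The same stack can also be created programmatically rather than through the console. The sketch below prepares the arguments for the boto3 `create_stack` call; the stack name, template URL, and parameter keys are placeholders chosen for this example and must match whatever the actual template declares. `CAPABILITY_NAMED_IAM` is included on the assumption that the template creates IAM resources.

```python
def build_create_stack_request(stack_name, template_url, parameters):
    """Return keyword arguments for CloudFormation create_stack."""
    return {
        "StackName": stack_name,
        "TemplateURL": template_url,
        # CloudFormation expects a list of key/value pairs, not a dict.
        "Parameters": [
            {"ParameterKey": key, "ParameterValue": value}
            for key, value in sorted(parameters.items())
        ],
        # Needed when the template creates or changes IAM resources.
        "Capabilities": ["CAPABILITY_NAMED_IAM"],
    }


def create_stack(request):
    """Hypothetical helper: submit the request with boto3."""
    import boto3  # imported lazily so the sketch runs without the AWS SDK

    return boto3.client("cloudformation").create_stack(**request)


# All values below are placeholders.
request = build_create_stack_request(
    "cxlink-documents-demo",
    "https://example-template-bucket.s3.amazonaws.com/template.yaml",
    {
        "InputBucketName": "example-sap-input",
        "OutputBucketName": "example-sap-output",
        "MetadataTableName": "example-sap-metadata",
    },
)
print(request["Parameters"])
```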
Once the stack is created, users can validate the AWS resources it provisioned.
Figure 11 – Successful execution of CloudFormation stack.
A successful execution of the CloudFormation stack creates two new S3 buckets and a DynamoDB table in the same AWS Region where the stack is executed.
Figure 12 – New S3 buckets and DynamoDB table.
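This validation can be scripted as well. The following sketch inspects a response from the CloudFormation `describe_stack_resources` call and reports which expected resources have not yet reached `CREATE_COMPLETE`. The expected counts (two buckets, one table) come from the description above; the helper names are hypothetical.

```python
from collections import Counter

# Resources this post says the stack creates.
EXPECTED_RESOURCES = {"AWS::S3::Bucket": 2, "AWS::DynamoDB::Table": 1}


def missing_resources(response, expected=EXPECTED_RESOURCES):
    """Return {resource_type: number_still_missing} for a describe_stack_resources response."""
    created = Counter(
        resource["ResourceType"]
        for resource in response.get("StackResources", [])
        if resource.get("ResourceStatus") == "CREATE_COMPLETE"
    )
    return {
        resource_type: count - created[resource_type]
        for resource_type, count in expected.items()
        if created[resource_type] < count
    }


def check_stack(stack_name):
    """Hypothetical helper: fetch the stack's resources with boto3 and check them."""
    import boto3  # imported lazily so the sketch runs without the AWS SDK

    response = boto3.client("cloudformation").describe_stack_resources(
        StackName=stack_name
    )
    return missing_resources(response)
```

An empty dictionary from `check_stack` means both buckets and the table were created successfully.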
Each time SAP receives a document, Syntax CxLink Documents stores it directly into the input S3 bucket. This starts the workflow, and the output S3 bucket is populated with the text delivered by Amazon Textract under a specific output folder.
In parallel, the DynamoDB table is populated with the document metadata. Amazon CloudWatch logs can also be checked for any errors that occur during the workflow described above.
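As one way to check the logs, the sketch below searches the Lambda function's CloudWatch log group for entries containing the term `ERROR`, using the CloudWatch Logs `filter_log_events` call. The function name is a placeholder, the log group follows the default `/aws/lambda/<function-name>` naming, and only the first page of results is read.

```python
def build_error_query(function_name, pattern="ERROR"):
    """Return keyword arguments for CloudWatch Logs filter_log_events."""
    return {
        "logGroupName": f"/aws/lambda/{function_name}",  # default Lambda log group name
        "filterPattern": pattern,
    }


def messages_from(response):
    """Extract the log messages from a filter_log_events response."""
    return [event["message"].strip() for event in response.get("events", [])]


def recent_errors(function_name):
    """Hypothetical helper: fetch matching log messages with boto3 (first page only)."""
    import boto3  # imported lazily so the sketch runs without the AWS SDK

    logs = boto3.client("logs")
    return messages_from(logs.filter_log_events(**build_error_query(function_name)))
```

Calling `recent_errors("example-textract-function")` would return the matching messages as a list of strings, which is empty when no errors were logged.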
Ready-to-Purchase Solution on AWS Marketplace
Syntax CxLink Documents is offered as a subscription service through AWS Marketplace under the Syntax CxLink ABAP Suite portfolio. It includes premium support from Syntax’s engineering team to ensure proper setup, maintenance, and updates of customer installations.
By leveraging Syntax CxLink Documents and AWS-native services, customers can resolve document storage and archival challenges as well as improve document classification, text extraction, and image isolation using AWS machine learning services.
Syntax – AWS Partner Spotlight
Syntax is an AWS Premier Tier Services Partner that provides comprehensive technology solutions and trusted professional, advisory, and application management services to power businesses’ mission-critical applications in the cloud.