AWS Big Data Blog

Extract data from SAP ERP using AWS Glue and the SAP SDK

This is a guest post by Siva Manickam and Prahalathan M from Vyaire Medical Inc.

Vyaire Medical Inc. is a global company, headquartered in suburban Chicago, focused exclusively on supporting breathing through every stage of life. Established from legacy brands with a 65-year history of pioneering breathing technology, the company’s portfolio of integrated solutions is designed to enable, enhance, and extend lives.

At Vyaire, our team of 4,000 pledges to advance innovation and evolve what’s possible to ensure every breath is taken to its fullest. Vyaire’s products are available in more than 100 countries and are recognized, trusted, and preferred by specialists throughout the respiratory community worldwide. Vyaire has 65-year history of clinical experience and leadership with over 27,000 unique products and 370,000 customers worldwide.

Vyaire Medical’s applications landscape has multiple ERPs, such as SAP ECC, JD Edwards, Microsoft Dynamics AX, SAP Business One, Pointman, and Made2Manage. Vyaire uses Salesforce as our CRM platform and the ServiceMax CRM add-on for managing field service capabilities. Vyaire developed a custom data integration platform, iDataHub, powered by AWS services such as AWS Glue, AWS Lambda, and Amazon API Gateway.

In this post, we share how we extracted data from SAP ERP using AWS Glue and the SAP SDK.

Business and technical challenges

Vyaire is working on deploying the field service management solution ServiceMax (SMAX, a natively built on SFDC ecosystem), offering features and services that help Vyaire’s Field Services team improve asset uptime with optimized in-person and remote service, boost technician productivity with the latest mobile tools, and deliver metrics for confident decision-making.

A major challenge with ServiceMax implementation is building a data pipeline between ERP and the ServiceMax application, precisely integrating pricing, orders, and primary data (product, customer) from SAP ERP to ServiceMax using Vyaire’s custom-built integration platform iDataHub.

Solution overview

Vyaire’s iDataHub powered by AWS Glue has been effectively used for data movement between SAP ERP and ServiceMax.

AWS Glue a serverless data integration service that makes it easy to discover, prepare, and combine data for analytics, machine learning (ML), and application development. It’s used in Vyaire’s Enterprise iDatahub Platform for facilitating data movement across different systems, however the focus for this post is to discuss the integration between SAP ERP and Salesforce SMAX.

The following diagram illustrates the integration architecture between Vyaire’s Salesforce ServiceMax and SAP ERP system.

In the following sections, we walk through setting up a connection to SAP ERP using AWS Glue and the SAP SDK through remote function calls. The high-level steps are as follows:

  1. Clone the PyRFC module from GitHub.
  2. Set up the SAP SDK on an Amazon Elastic Compute Cloud (Amazon EC2) machine.
  3. Create the PyRFC wheel file.
  4. Merge SAP SDK files into the PyRFC wheel file.
  5. Test the connection with SAP using the wheel file.

Prerequisites

For this walkthrough, you should have the following:

Clone the PyRFC module from GitHub

For instructions for creating and connecting to an Amazon Linux 2 AMI EC2 instance, refer to Tutorial: Get started with Amazon EC2 Linux instances.

The reason we choose Amazon Linux EC2 is to compile the SDK and PyRFC in a Linux environment, which is compatible with AWS Glue.

At the time of writing this post, AWS Glue’s latest supported Python version is 3.7. Ensure that the Amazon EC2 Linux Python version and AWS Glue Python version are the same. In the following instructions, we install Python 3.7 in Amazon EC2; we can follow the same instructions to install future versions of Python.

  1. In the bash terminal of the EC2 instance, run the following command:
sudo apt install python3.7
  1. Log in to the Linux terminal, install git, and clone the PyRFC module using the following commands:
ssh -i "aws-glue-ec2.pem" ec2-user@10.XXX.XX.XXX 
mkdir aws_to_sap 
sudo yum install git 
git clone https://github.com/SAP/PyRFC.git

Set up the SAP SDK on an Amazon EC2 machine

To set up the SAP SDK, complete the following steps:

  1. Download the nwrfcsdk.zip file from a licensed SAP source to your local machine.
  2. In a new terminal, run the following command on the EC2 instance to copy the nwrfcsdk.zip file from your local machine to the aws_to_sap folder.
scp -i "aws-glue-ec2.pem" -r "c:\nwrfcsdk\nwrfcsdk.zip" ec2-user@10.XXX.XX.XXX:/home/ec2-user/aws_to_sap/
  1. Unzip the nwrfcsdk.zip file in the current EC2 working directory and verify the contents:

unzip nwrfcsdk.zip

  1. Configure the SAP SDK environment variable SAPNWRFC_HOME and verify the contents:
export SAPNWRFC_HOME=/home/ec2-user/aws_to_sap/nwrfcsdk
ls $SAPNWRFC_HOME

Create the PyRFC wheel file

Complete the following steps to create your wheel file:

  1. On the EC2 instance, install Python modules cython and wheel for generating wheel files using the following command:
pip3 install cython, wheel
  1. Navigate to the PyRFC directory you created and run the following command to generate the wheel file:
python3 setup.py bdist_wheel

Verify that the pyrfc-2.5.0-cp37-cp37m-linux_x86_64.whl wheel file is created as in the following screenshot in the PyRFC/dist folder. Note that you may see a different wheel file name based on the latest PyRFC version on GitHub.

Merge SAP SDK files into the PyRFC wheel file

To merge the SAP SDK files, complete the following steps:

  1. Unzip the wheel file you created:
cd dist
unzip pyrfc-2.5.0-cp37-cp37m-linux_x86_64.whl

  1. Copy the contents of lib (the SAP SDK files) to the pyrfc folder:
cd ..
cp ~/aws_to_sap/nwrfcsdk/lib/* pyrfc

Now you can update the rpath of the SAP SDK binaries using the PatchELF utility, a simple utility for modifying existing ELF executables and libraries.

  1. Install the supporting dependencies (gcc, gcc-c++, python3-devel) for the Linux utility function PatchELF:
sudo yum install -y gcc gcc-c++ python3-devel

Download and install PatchELF:

wget https://download-ib01.fedoraproject.org/pub/epel/7/x86_64/Packages/p/patchelf-0.12-1.el7.x86_64.rpm
sudo rpm -i patchelf-0.12-1.el7.x86_64.rpm
  1. Run patchelf:
find -name '*.so' -exec patchelf --set-rpath '$ORIGIN' {} \;
  1. Update the wheel file with the modified pyrfc and dist-info folders:
zip -r pyrfc-2.5.0-cp37-cp37m-linux_x86_64.whl pyrfc pyrfc-2.5.0.dist-info

  1. Copy the wheel file pyrfc-2.5.0-cp37-cp37m-linux_x86_64.whl from Amazon EC2 to Amazon Simple Storage Service (Amazon S3):
aws s3 cp /home/ec2-user/aws_to_sap/PyRFC/dist/ s3://<bucket_name> /ec2-dump --recursive


Test the connection with SAP using the wheel file

The following is a working sample code to test the connectivity between the SAP system and AWS Glue using the wheel file.

  1. On the AWS Glue Studio console, choose Jobs in the navigation pane.
  2. Select Spark script editor and choose Create.

  1. Overwrite the boilerplate code with the following code on the Script tab:
import os, sys, pyrfc
os.environ['LD_LIBRARY_PATH'] = os.path.dirname(pyrfc.__file__)
os.execv('/usr/bin/python3', ['/usr/bin/python3', '-c', """
    from pyrfc import Connection
    import pandas as pd
    ## Variable declarations
    sap_table = '' # SAP Table Name
    fields = '' # List of fields required to be pulled
    options = '' # the WHERE clause of the query is called "options"
    max_rows = '' # MaxRows
    from_row = '' # Row of data origination
    try:
        # Establish SAP RFC connection
        conn = Connection(ashost='', sysnr='', client='', user='', passwd='') 
        print(f“SAP Connection successful – connection object: {conn}”)
        if conn:
            # Read SAP Table information
            tables = conn.call("RFC_READ_TABLE", QUERY_TABLE=sap_table, DELIMITER='|', FIELDS=fields, OPTIONS=options, ROWCOUNT=max_rows, ROWSKIPS=from_row) 
            # Access specific row & column information from the SAP Data 
            data = tables["DATA"] # pull the data part of the result set
            columns = tables["FIELDS"] # pull the field name part of the result set
            df = pd.DataFrame(data, columns = columns)
            if df:
                print(f“Successfully extracted data from SAP using custom RFC - Printing the top 5 rows: {df.head(5)}”) 
            else:
                print(“No data returned from the request. Please check database/schema details”)
        else:
            print(“Unable to connect with SAP. Please check connection details”)
    except Exception as e:
        print(f“An exception occurred while connecting with SAP system: {e.args}”)
"""])
  1. On the Job details tab, fill in mandatory fields.
  2. In the Advanced properties section, provide the S3 URI of the wheel file in the Job parameters section as a key value pair:
    1. Key--additional-python-modules
    2. Values3://<bucket_name>/ec2-dump/pyrfc-2.5.0-cp37-cp37m-linux_x86_64.whl (provide your S3 bucket name)

  1. Save the job and choose Run.

Verify SAP connectivity

Complete the following steps to verify SAP connectivity:

  1. When the job run is complete, navigate to the Runs tab on the Jobs page and choose Output logs in the logs section.
  2. Choose the job_id and open the detailed logs.
  3. Observe the message SAP Connection successful – connection object: <connection object>, which confirms a successful connection with the SAP system.
  4. Observe the message Successfully extracted data from SAP using custom RFC – Printing the top 5 rows, which confirms successful access of data from the SAP system.

Conclusion

AWS Glue facilitated the data extraction, transformation, and loading process from different ERPs into Salesforce SMAX to improve Vyaire’s products and its related information visibility to service technicians and tech support users.

In this post, you learned how you can use AWS Glue to connect to SAP ERP utilizing SAP SDK remote functions. To learn more about AWS Glue, check out AWS Glue Documentation.


About the Authors

Siva Manickam is the Director of Enterprise Architecture, Integrations, Digital Research & Development at Vyaire Medical Inc. In this role, Mr. Manickam is responsible for the company’s corporate functions (Enterprise Architecture, Enterprise Integrations, Data Engineering) and produce function (Digital Innovation Research and Development).

Prahalathan M is the Data Integration Architect at Vyaire Medical Inc. In this role, he is responsible for end-to-end enterprise solutions design, architecture, and modernization of integrations and data platforms using AWS cloud-native services.

Deenbandhu Prasad is a Senior Analytics Specialist at AWS, specializing in big data services. He is passionate about helping customers build modern data architecture on the AWS Cloud. He has helped customers of all sizes implement data management, data warehouse, and data lake solutions.