Field Notes: How to Integrate Your Non-Cloud-Native COTS Software with AWS for Batch Processing

This post was co-written by Ashutosh Pateriya, AWS Partner Solutions Architect, GSI, and Verny Quartara, Technology Architect, Infosys Ltd.

Integrating legacy or non-cloud-native products and tools into cloud-native applications is a common requirement for enterprise customers migrating their applications to AWS. Many legacy applications, such as CRM, accounting, billing, or supply chain management software, are based on a batch processing model. These are often implemented using commercial off-the-shelf (COTS) software.

In the context of an enterprise application, a COTS product performs a specific task and is typically integrated with a number of upstream and downstream systems.

When you plan to move your enterprise application to AWS, you first have to decide on the approach: for example, whether to completely refactor your application and decommission the COTS software, or to re-platform or simply re-host it. You might choose a mixed approach, where you refactor the integration layer and only re-host or re-platform your COTS product, for reasons such as the following:

You are bound by a long-term license contract.
The COTS product you have is actually the best choice for your use case.
You are using a phased migration, because you want to quickly move to the cloud and re-architect the application at a later date.

Whatever the reason, you can realize the benefits of AWS cloud-native services while keeping the core COTS components. So, how do you successfully integrate your batch-focused, non-cloud-native COTS software with AWS Cloud services?

In this post, you learn how to architect a solution based on AWS Step Functions, AWS Lambda, Amazon EC2 and AWS Systems Manager Run Command services.

Solution overview

In this example, your COTS product has been migrated to AWS using a lift-and-shift approach and installed on an EC2 instance. The product offers batch processing capabilities, which may be part of a business process within an enterprise application. However, it does not offer any APIs that can be invoked remotely, so the only way to start a batch run is to launch a shell script on the server.

You also need to orchestrate and monitor the process, and to handle failure scenarios.

COTS Software architecture diagram

For example, let’s assume your batch process runs a web crawler to index a specific set of websites, searching for a specific set of keywords. That process can take a few hours to complete and must be initiated every night.

To meet the preceding requirements with AWS-native services, and with the given constraint, you can use:

AWS Systems Manager Run Command, to launch the COTS shell scripts.
AWS Lambda, to automate Run Command invocations.
AWS Step Functions, to orchestrate and monitor the overall process, and implement error handling.

Architecture

COTS Software with AWS for Batch Processing

As shown in the preceding figure, AWS Systems Manager (SSM) Run Command remotely launches shell scripts on the COTS EC2 instance through the SSM API operations SendCommand and ListCommandInvocations. Two AWS Lambda functions, SendCommand and GetStatus, call these API operations using the AWS SDK. An AWS Step Functions state machine orchestrates the overall process.
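The following plain-Python sketch makes the control flow concrete. It shows the loop the state machine implements: send the command, wait, poll the status, and repeat until the command reaches a terminal state. It is an illustration only, because in the actual solution Step Functions performs these steps. The `send` and `get_status` callables are hypothetical stand-ins for the two Lambda functions. The status names are the ones reported by the SSM ListCommandInvocations API.

```python
import time
from typing import Callable

# Invocation statuses after which SSM no longer changes the command's state
TERMINAL_STATUSES = {"Success", "Failed", "TimedOut", "Cancelled"}


def run_batch(send: Callable[[], str],
              get_status: Callable[[str], str],
              wait_seconds: float,
              sleep: Callable[[float], None] = time.sleep) -> str:
    """Start the batch and poll until it finishes; return the final status.

    `send` is a hypothetical stand-in for the SendCommand function (returns a
    command ID) and `get_status` for the GetStatus function (returns a status).
    """
    command_id = send()                  # RunCommand-Run
    while True:
        sleep(wait_seconds)              # RunCommand-Wait
        status = get_status(command_id)  # RunCommand-GetStatus
        if status in TERMINAL_STATUSES:  # RunCommand-CheckStatus
            return status


if __name__ == "__main__":
    # Fake status sequence: the batch is reported running twice, then succeeds
    statuses = iter(["Pending", "InProgress", "Success"])
    final = run_batch(send=lambda: "cmd-123",
                      get_status=lambda _id: next(statuses),
                      wait_seconds=0)
    print(final)  # Success
```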

Solution walkthrough

At the core of the solution is your COTS software, installed on an EC2 instance that is managed by AWS SSM. Using the SSM Agent and Run Command, you can remotely launch the shell script that starts the batch process.

Two AWS Lambda functions start the batch process and monitor its execution by calling the SSM API operations SendCommand and ListCommandInvocations, respectively.

To launch and orchestrate the Lambda functions, you have an AWS Step Functions workflow, represented in the following diagram:

Lambda functions for COTS batch processing

Step Functions workflow:

Once started, the state machine invokes the SendCommand function. That function calls the SendCommand SSM API, which launches the batch run on the COTS instance.
Batch execution is typically a long-running process, so the next step is to wait for a configurable amount of time, based on the expected duration of the run (for example, 30 minutes).
To balance cost and performance, choose the wait time inside the state machine carefully:
- If it is too short, you'll have more state transitions and function calls, and therefore higher costs.
- If it is too long, the overall execution takes longer, because the batch process might already be complete while the state machine is still waiting.
After waiting, the state machine invokes the GetStatus function, which calls the ListCommandInvocations SSM API to retrieve the execution status.
If the batch is still running, the state machine waits again and checks later. Otherwise, it takes the appropriate action based on the result. In case of failure, it can use Amazon Simple Notification Service (Amazon SNS) to send the support team an email notification that includes the logs for investigation.
To schedule the batch process to run on a regular basis, you can use Amazon EventBridge.
To improve monitoring, SSM Run Command can deliver the command output to Amazon CloudWatch Logs as well as to Amazon S3.
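The wait-time trade-off can be estimated with simple arithmetic. This sketch makes two simplifying assumptions, neither of which is a measured figure:

- The batch runs for a known, fixed duration.
- Each loop iteration enters three states: Wait, GetStatus, and CheckStatus.

```python
import math


def polling_tradeoff(batch_seconds: int, wait_seconds: int) -> dict:
    """Estimate polling effort for a batch of an assumed, fixed duration.

    polls:        GetStatus calls until the first check after the batch ends
    idle_seconds: time the state machine keeps waiting after the batch ended
    transitions:  state transitions spent in the Wait/GetStatus/CheckStatus loop
    """
    polls = math.ceil(batch_seconds / wait_seconds)
    idle_seconds = polls * wait_seconds - batch_seconds
    # Each iteration enters three states: Wait, GetStatus, CheckStatus
    transitions = 3 * polls
    return {"polls": polls, "idle_seconds": idle_seconds, "transitions": transitions}


if __name__ == "__main__":
    # Hypothetical batch that runs for 10,000 seconds (about 2.8 hours)
    for wait in (60, 1800, 7200):
        print(wait, polling_tradeoff(10_000, wait))
```

Consider a hypothetical 10,000-second batch:

- Polling every 60 seconds takes 167 checks, and the state machine finishes at most 20 seconds late.
- Polling every 1,800 seconds takes only 6 checks, but the state machine can finish up to 800 seconds late.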

Prerequisites

A typical production environment is likely to have more than one EC2 instance running the COTS product. For simplicity, this post considers a single EC2 instance. The concept can easily be extended to multiple instances by adapting the Lambda functions and SSM API calls.
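With several instances, ListCommandInvocations returns one invocation per instance, so GetStatus must reduce them to a single status for the Choice state. The following is a minimal sketch of one possible aggregation rule, which is our own assumption rather than an SSM feature:

- Keep waiting while any instance is still running.
- Report success only if every instance succeeded.
- Otherwise, report failure.

Note that with more than ten instances, you would also need to follow the `NextToken` returned by ListCommandInvocations.

```python
from typing import Iterable

# SSM invocation statuses that mean the command has not finished yet
RUNNING = {"Pending", "InProgress", "Delayed", "Cancelling"}


def overall_status(invocations: Iterable[dict]) -> str:
    """Reduce the per-instance entries of the 'CommandInvocations' list
    returned by ListCommandInvocations to one status for the Choice state."""
    statuses = [inv["Status"] for inv in invocations]
    if not statuses:
        return "Pending"       # invocations may not be listed yet
    if any(s in RUNNING for s in statuses):
        return "InProgress"    # wait until every instance has finished
    if all(s == "Success" for s in statuses):
        return "Success"
    return "Failed"            # at least one Failed, TimedOut, or Cancelled invocation
```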

The prerequisites for implementing the solution are:

An AWS account
An EC2 instance with your COTS software installed, configured and working
Your choice of a programming language, supported by AWS SDK and Lambda
Basic knowledge of AWS Step Functions state machines

For this example, I am using Python 3.8.

Implementation steps

The high-level steps to implement the solution are:

EC2 SSM setup: Ensure the EC2 instance hosting the COTS software is managed by AWS SSM
Lambda function creation: Implement the Lambda functions using the AWS SDK and the SSM API
Step Functions: Implement the AWS Step Functions workflow to orchestrate the Lambda functions

EC2 SSM setup:

Each COTS product is different, with its own use cases, technologies, and installation and configuration processes. The installation, configuration, and testing of your COTS product are out of scope for this post.

We assume that you have already installed, configured, and tested the batch processing capabilities of your COTS product, and that they work as expected. You also need to know how to start a batch run: which shell script to call, and with which parameters.

Once the COTS software has been installed and configured, configure your instance to be managed by SSM:

Create an IAM instance profile for Systems Manager (for example, one that includes the AmazonSSMManagedInstanceCore managed policy) and attach it to the EC2 instance
Install the SSM Agent. If you are using Amazon Linux, you can skip this step, because the agent is installed by default.
Go to the AWS Systems Manager console and open the Fleet Manager page. If everything is configured correctly, the EC2 instance shows as “Online”:

AWS Systems Manager screenshot

Note: it can take up to 20 minutes for the instance to show as online. If it takes longer, there might be a problem with the SSM Agent that you need to troubleshoot.
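Instead of refreshing the console, you can check the instance's SSM status from a script. The following minimal sketch uses the DescribeInstanceInformation API through boto3. The instance ID is a placeholder, and boto3 is imported lazily, so the parsing helper can be exercised without AWS credentials. The "NotManaged" return value is our own hypothetical label for an instance that SSM does not know about.

```python
def ping_status(instance_information: list, instance_id: str) -> str:
    """Return the SSM PingStatus ('Online', 'ConnectionLost', 'Inactive') of
    instance_id, or the hypothetical label 'NotManaged' if SSM doesn't list it."""
    for info in instance_information:
        if info.get("InstanceId") == instance_id:
            return info.get("PingStatus", "Unknown")
    return "NotManaged"


def check_instance(instance_id: str) -> str:
    """Query SSM for a single instance and return its ping status."""
    import boto3  # imported here so ping_status can be used without boto3

    ssm = boto3.client("ssm")
    response = ssm.describe_instance_information(
        Filters=[{"Key": "InstanceIds", "Values": [instance_id]}]
    )
    return ping_status(response["InstanceInformationList"], instance_id)
```

For example, once the agent has registered, `check_instance('<your instance ID here>')` should return "Online".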

Lambda function creation:

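While you are waiting for your instance to go online, you can start creating two Lambda functions: one for starting the batch process, and one for monitoring its execution. Each function's execution role needs permission to call the corresponding SSM API (ssm:SendCommand or ssm:ListCommandInvocations).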

Note: your EC2 instance should show as online before testing your functions.

4. Create a new “SendCommand” Lambda function, including the following code:

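import boto3

# Create the SSM client once, outside the handler, so it is reused across invocations
client = boto3.client('ssm')

def lambda_handler(event, context):

    response = client.send_command(
        # InstanceIds must be a list, even for a single instance
        InstanceIds=['<your instance ID here>'],
        DocumentName='AWS-RunShellScript',
        DocumentVersion='$DEFAULT',
        # How long SSM waits for the command to start on the instance
        TimeoutSeconds=600,  # example value: replace with your preferred timeout, in seconds

        # Document parameters are passed as lists of strings
        Parameters={
            'commands': ['<your shell script with parameters here>'],
            # Maximum run time of the script; AWS-RunShellScript defaults to 3600 seconds
            'executionTimeout': ['<your maximum batch duration here, in seconds>']
        },

        CloudWatchOutputConfig={
            'CloudWatchOutputEnabled': True
        }
    )
    print(response)

    # Pass the command ID on, so that the GetStatus function can poll its status
    responseCommand = response['Command']
    commandId = responseCommand['CommandId']

    return {
        'commandId': commandId
    }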

The preceding code calls the send_command SSM API, passing:

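the instance ID (as a list, even for a single instance)
the document name and version (AWS-RunShellScript, to run shell scripts on EC2)
the timeout, in seconds. Note that TimeoutSeconds only limits how long SSM waits for the command to start on the instance. The maximum run time of the script is set by the document's executionTimeout parameter, which defaults to 3,600 seconds for AWS-RunShellScript.
the shell script command line with the appropriate parameters
the flag to enable CloudWatch logging
It then returns the commandId as its output payload, so that the GetStatus function can use it to retrieve the command status.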
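Run Command can also deliver logs to Amazon S3, through the OutputS3BucketName and OutputS3KeyPrefix parameters of SendCommand. CloudWatchOutputConfig can additionally name a specific log group. The helper below is a hypothetical sketch that builds those keyword arguments. The bucket, prefix, and log group names are placeholders, and the instance profile must be allowed to write to them.

```python
def output_config(bucket: str, key_prefix: str, log_group: str) -> dict:
    """Build SendCommand keyword arguments that deliver command output to
    both Amazon S3 and CloudWatch Logs (all names are placeholders)."""
    return {
        "OutputS3BucketName": bucket,
        "OutputS3KeyPrefix": key_prefix,
        "CloudWatchOutputConfig": {
            "CloudWatchLogGroupName": log_group,
            "CloudWatchOutputEnabled": True,
        },
    }
```

To use it, replace the existing CloudWatchOutputConfig argument with `**output_config('<your bucket>', 'cots-batch/', '<your log group>')` in the `send_command` call. Passing CloudWatchOutputConfig twice would raise a `TypeError`.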

5. Create a new “GetStatus” Lambda function, including the following code:

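import boto3

# Create the SSM client once, outside the handler, so it is reused across invocations
client = boto3.client('ssm')

def lambda_handler(event, context):

    # Command ID returned by the SendCommand function
    commandId = event['commandId']

    response = client.list_command_invocations(
        CommandId=commandId,
        MaxResults=10,
        Details=True
    )

    invocations = response['CommandInvocations']
    if not invocations:
        # The invocation may not be listed yet right after SendCommand: report it as pending
        return {
            'RunCommandStatus': 'Pending',
            'RunCommandResponseCode': -1,
            'RunCommandOutput': ''
        }

    # Single-instance setup: use the first (and only) invocation and its first plugin
    invocation = invocations[0]
    status = invocation['Status']
    commandPlugins = invocation.get('CommandPlugins') or [{}]
    responseCode = commandPlugins[0].get('ResponseCode', -1)
    output = commandPlugins[0].get('Output', '')

    return {
        'RunCommandStatus': status,
        'RunCommandResponseCode': responseCode,
        'RunCommandOutput': output
    }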

The preceding code calls the list_command_invocations SSM API, passing:

the command ID
the maximum number of results, for pagination purposes
the flag to retrieve all the details of the invocation

It then returns the following as its output payload, to be evaluated in the subsequent step of the state machine:

the invocation status
the invocation response code
the text output of the shell script execution.
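ListCommandInvocations can report more statuses than "Pending", "InProgress", "Success", and "Failed". The following sketch expresses the status routing in Python, so you can reason about (and unit test) which state each SSM status should lead to. How to route Delayed, Cancelling, TimedOut, and Cancelled is a design decision. Here we assume the first two mean "keep waiting" and anything else unexpected means "notify". The state names are those of the example state machine.

```python
# SSM command invocation statuses that mean "not finished yet" (our assumption)
STILL_RUNNING = {"Pending", "InProgress", "Delayed", "Cancelling"}


def next_state(get_status_result: dict) -> str:
    """Return the state that should follow RunCommand-CheckStatus,
    given the payload produced by the GetStatus function."""
    status = get_status_result["RunCommandStatus"]
    if status == "Success":
        return "RunCommand-Success"
    if status in STILL_RUNNING:
        return "RunCommand-Wait"
    # Failed, TimedOut, Cancelled, or anything unexpected: alert the support team
    return "RunCommand-notification"
```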

Step Functions:

6. Now it’s time to implement the state machine with AWS Step Functions. The following example also includes a basic email notification using SNS:

{
  "Comment": "State machine implementation for batch processing with COTS software",
  "StartAt": "RunCommand-Run",
  "States": {
    "RunCommand-Run": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:xxxxxxxxx:012345678901:function:SendCommand",
      "Comment": "This step will call the SendCommand lambda function",
      "Next": "RunCommand-Wait",
      "ResultPath": "$.RunCommand-Run-Result"
    },
    "RunCommand-Wait": {
      "Type": "Wait",
      "Comment": "This step will wait for a configured amount of time, and then will move to the next step",
      "SecondsPath": "$.timeoutSeconds",
      "Next": "RunCommand-GetStatus"
    },
    "RunCommand-GetStatus": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:xxxxxxxxx:012345678901:function:GetStatus",
      "InputPath": "$.RunCommand-Run-Result",
      "Comment": "This step will trigger the GetStatus function",
      "Next": "RunCommand-CheckStatus",
      "ResultPath": "$.RunCommand-GetStatus-Result"
    },
    "RunCommand-CheckStatus": {
      "Type": "Choice",
      "Comment": "This step will check the execution status, using the payload added by the GetStatus function. The next step will depend on the execution status. If the process is still in progress, it will go back to the Wait status. Else, it will go either to the Success or Notification status, based on the execution result.",
      "Choices": [
        {
          "Variable": "$.RunCommand-GetStatus-Result.RunCommandStatus",
          "StringEquals": "Success",
          "Next": "RunCommand-Success"
        },
        {
          "Variable": "$.RunCommand-GetStatus-Result.RunCommandStatus",
          "StringEquals": "Pending",
          "Next": "RunCommand-Wait"
        },
        {
          "Variable": "$.RunCommand-GetStatus-Result.RunCommandStatus",
          "StringEquals": "InProgress",
          "Next": "RunCommand-Wait"
        },
        {
          "Variable": "$.RunCommand-GetStatus-Result.RunCommandStatus",
          "StringEquals": "Delayed",
          "Next": "RunCommand-Wait"
        },
        {
          "Variable": "$.RunCommand-GetStatus-Result.RunCommandStatus",
          "StringEquals": "Failed",
          "Next": "RunCommand-notification"
        }
      ],
      "Default": "RunCommand-notification"
    },
    "RunCommand-Success": {
      "Type": "Pass",
      "Comment": "The batch process has completed successfully",
      "End": true
    },
    "RunCommand-notification": {
      "Type": "Task",
      "Comment": "The batch process has completed with errors, this step will send an email notification via SNS",
      "Resource": "arn:aws:states:::sns:publish",
      "Parameters": {
        "TopicArn": "arn:aws:sns:xxxxxxxxx:012345678901:target-topic",
        "Message": {
          "Body": "There is a failure in a batch job. Please find below the details",
          "Input.$": "$"
        },
        "Subject": "Attention Required ! Batch Job Failure"
      },
      "Next": "RunCommand-Failure"
    },
    "RunCommand-Failure": {
      "Type": "Fail",
      "Comment": "The batch process has completed with errors or timeout."
    }
  }
}

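Remember to replace the ARNs with the ones from your setup. The state machine's execution role also needs permission to invoke both Lambda functions and to publish to the SNS topic.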

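Now, start a state machine execution. Because the Wait state reads its duration from $.timeoutSeconds, the execution input must include that field, for example {"timeoutSeconds": 1800}. You can check the status of the execution in the AWS Step Functions console. A successful execution looks like the following: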
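You can also start an execution from a script or a client application. The following minimal sketch uses the Step Functions StartExecution API through boto3. The state machine ARN is a placeholder. The timeoutSeconds field is required because the Wait state in the example reads its duration from $.timeoutSeconds.

```python
import json


def execution_input(wait_seconds: int) -> str:
    """Build the execution input; the Wait state reads $.timeoutSeconds."""
    if wait_seconds <= 0:
        raise ValueError("wait_seconds must be positive")
    return json.dumps({"timeoutSeconds": wait_seconds})


def start_batch(state_machine_arn: str, wait_seconds: int = 1800) -> str:
    """Start the state machine and return the execution ARN."""
    import boto3  # imported lazily so execution_input works without boto3

    sfn = boto3.client("stepfunctions")
    response = sfn.start_execution(
        stateMachineArn=state_machine_arn,
        input=execution_input(wait_seconds),
    )
    return response["executionArn"]
```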

Graph Inspector flowchart

Here, the state machine has started the batch process execution, has waited for it to complete (perhaps checking more than once if needed), and then has completed successfully.

In case of failure, your execution will look like this:

Graph Inspector image with failure

In the preceding diagram, the batch process has failed, so a notification was sent and the state machine terminated with a failure.

Congrats! Your non-cloud-native batch processing is now integrated with AWS services.

Now that your batch process is fully integrated with AWS, other possible enhancements are:

include it as part of a parent AWS Step Functions state machine
schedule it using Amazon EventBridge
send notifications with Amazon Simple Notification Service (Amazon SNS)
start it from a client application using the AWS API, SDK or Command Line Interface (AWS CLI).
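To run the batch every night, as in the web crawler example, an EventBridge rule can start the state machine on a cron schedule. This is a minimal sketch using the EventBridge PutRule and PutTargets APIs through boto3. The rule name, target ID, schedule hour, and ARNs are placeholders. The role must allow EventBridge to call states:StartExecution on the state machine.

```python
import json


def nightly_rule(rule_name: str, hour_utc: int = 1) -> dict:
    """PutRule arguments for a rule that fires every day at hour_utc:00 UTC."""
    if not 0 <= hour_utc <= 23:
        raise ValueError("hour_utc must be between 0 and 23")
    return {
        "Name": rule_name,
        # EventBridge cron fields: minutes hours day-of-month month day-of-week year
        "ScheduleExpression": f"cron(0 {hour_utc} * * ? *)",
        "State": "ENABLED",
    }


def schedule_batch(rule_name: str, state_machine_arn: str, role_arn: str,
                   hour_utc: int = 1, wait_seconds: int = 1800) -> None:
    """Create the nightly rule and point it at the state machine."""
    import boto3  # imported lazily so nightly_rule can be tested without AWS

    events = boto3.client("events")
    events.put_rule(**nightly_rule(rule_name, hour_utc))
    events.put_targets(
        Rule=rule_name,
        Targets=[{
            "Id": "cots-batch-state-machine",  # hypothetical target ID
            "Arn": state_machine_arn,
            "RoleArn": role_arn,
            # Same input as a manual start: the Wait state needs timeoutSeconds
            "Input": json.dumps({"timeoutSeconds": wait_seconds}),
        }],
    )
```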

Cleaning up

To avoid further charges, you can stop or terminate the EC2 instance hosting the COTS product. Remember to create a backup if you want to recreate it later. As for the other services, Lambda and Step Functions are serverless, so you’re only charged for what you use.

Conclusion

In this post, we showed how you can successfully integrate your batch-focused, non-cloud-native COTS software with AWS Cloud services. The key benefits offered by the solution are:

Minimal change complexity: you can just ‘lift and shift’ your COTS product to the cloud, and run it as if it were on-premises
Minimal costs: due to the serverless nature of AWS Lambda and Step Functions, you’ll only pay for what you use.
Integration with AWS native services, suitable for larger scale or more complex use cases.

We hope you found this post helpful, and look forward to your feedback in the comments!

Field Notes provides hands-on technical guidance from AWS Solutions Architects, consultants, and technical account managers, based on their experiences in the field solving real-world business problems for customers.

AWS Architecture Blog