AWS Compute Blog

Automating scalable business workflows using minimal code

Organizations frequently have complex workflows embedded in their processes. When a customer places an order, it triggers a workflow. When an employee requests vacation time, this starts another set of processes. Managing these workflows at scale can be challenging in traditional applications, which must often track thousands of separate tasks.

In this blog post, I show how to use a serverless application to build and manage enterprise workflows at scale. This minimal-code solution is highly scalable and flexible, and can be modified easily to meet your needs. This application uses Amazon S3, AWS Lambda, and AWS Step Functions:

Using S3-to-Lambda to trigger Step Functions workflows

AWS Step Functions allows you to represent workflows as a JSON state machine. This service can help remove custom code and convoluted logic from distributed systems, making them easier to maintain and modify. S3 is a highly scalable service that stores trillions of objects, and Lambda runs custom code in response to events. By combining these services, it’s simple to build resilient workflows with high throughput, triggered by putting objects in S3 buckets.

There are many business use-cases for this approach. For example, you could automatically pay invoices from approved vendors under a threshold amount by reading the invoices stored in S3 using Amazon Textract. Or your application could automatically book consultations for patients emailing their completed authorization forms. Almost any action that is triggered by a document or form is a potential candidate for an automated workflow solution.

To set up the example application, visit the GitHub repo and follow the instructions in the README.md file. The code uses the AWS Serverless Application Model (SAM), enabling you to deploy the application in your own AWS account. This walkthrough creates resources covered in the AWS Free Tier, but you may incur costs if you test with large amounts of data.

How the application works

The starting point for this serverless solution is S3. When new objects are stored, this triggers a Lambda function that starts an execution in the Step Functions workflow. Lambda scales to keep pace as more objects are written to the S3 bucket, and Step Functions creates a separate execution for each S3 object. It also manages the state of all the distinct workflows.

Simple Step Functions workflow.

  1. An upstream process stores data in the S3 bucket.
  2. This invokes the Start Execution Lambda function. The function creates a new execution in Step Functions using the S3 object as event data.
  3. The workflow invokes the Decider function. This uses Amazon Rekognition to detect the contents of objects stored in S3.
  4. This function uses environment variables to determine the matching attributes. If the S3 object matches the criteria, it triggers the Match function. Otherwise, the No Match function is invoked.
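
The routing in steps 3 and 4 maps onto a Task state followed by a Choice state in Amazon States Language. The following is a minimal sketch rather than the definition from the repo: the state names, the `$.result` path, and the placeholder ARNs are assumptions made for illustration.

```javascript
// Sketch of a state machine definition for the Decider -> Match / No Match flow.
// State names, the "$.result" path, and the ARNs are illustrative assumptions.
const buildStateMachineDefinition = ({ deciderArn, matchArn, noMatchArn }) => ({
  Comment: 'Routes each S3 object based on the Decider result',
  StartAt: 'Decider',
  States: {
    Decider: {
      Type: 'Task',
      Resource: deciderArn,
      // Keep the original S3 event and store the Decider's return value next to it
      ResultPath: '$.result',
      Next: 'CheckMatch'
    },
    CheckMatch: {
      Type: 'Choice',
      Choices: [
        { Variable: '$.result', StringEquals: 'Match', Next: 'MatchFound' }
      ],
      // Anything other than "Match" follows the no-match path
      Default: 'NoMatchFound'
    },
    MatchFound: { Type: 'Task', Resource: matchArn, End: true },
    NoMatchFound: { Type: 'Task', Resource: noMatchArn, End: true }
  }
})

// Placeholder ARNs; a SAM template would substitute the real values at deploy time
const definition = buildStateMachineDefinition({
  deciderArn: 'arn:aws:lambda:us-east-1:123456789012:function:decider',
  matchArn: 'arn:aws:lambda:us-east-1:123456789012:function:match',
  noMatchArn: 'arn:aws:lambda:us-east-1:123456789012:function:no-match'
})

console.log(JSON.stringify(definition, null, 2))
```

Using `ResultPath` here means the downstream functions still receive the S3 event details, which they would likely need in order to act on the object.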

The application’s SAM template configures the Step Functions state machine as JSON. It also defines an IAM role allowing Step Functions to invoke the Lambda functions. The initial function invoked by S3 is defined to accept the state machine ARN as an environment variable. The template also defines the permissions needed and the S3 trigger:

  StartExecutionFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: StartExecutionFunction/
      Handler: app.handler
      Runtime: nodejs12.x
      MemorySize: 128
      Environment:
        Variables:
          stateMachineArn: !Ref 'MatcherStateMachine'
      Policies:
        - S3CrudPolicy:
            BucketName: !Ref InputBucketName
        - Statement:
          - Effect: Allow
            Resource: !Ref 'MatcherStateMachine'
            Action:
              - states:*
      Events:
        FileUpload:
          Type: S3
          Properties:
            Bucket: !Ref InputBucket
            Events: s3:ObjectCreated:*

This uses a SAM policy template to grant the function access to the S3 bucket, plus an inline statement that allows it to call Step Functions actions on the state machine. It also defines the S3 event that invokes the function whenever a new object is created in the bucket.
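
To make the role of the Start Execution function concrete, here is a minimal sketch of how such a handler might look. It is not the code from the repo. The `createHandler` and `buildExecutionParams` names are hypothetical, and the Step Functions client is injected so that the logic can be tried locally with a fake client. The commented wiring assumes the AWS SDK for JavaScript (v2), which is available in the `nodejs12.x` runtime.

```javascript
// Sketch of a Start Execution function. Helper names are hypothetical.

// Build StartExecution parameters for one S3 event record
const buildExecutionParams = (record, stateMachineArn) => ({
  stateMachineArn,
  // The execution input must be a JSON string; the S3 record is passed through unchanged
  input: JSON.stringify(record)
})

// Returns a Lambda handler that starts one execution per S3 record
const createHandler = (stepFunctions, stateMachineArn) => async (event) => {
  const results = await Promise.all(
    event.Records.map((record) =>
      stepFunctions
        .startExecution(buildExecutionParams(record, stateMachineArn))
        .promise()
    )
  )
  return results.length
}

// In Lambda, this could be wired up as follows:
// const AWS = require('aws-sdk')
// exports.handler = createHandler(new AWS.StepFunctions(), process.env.stateMachineArn)

// Local usage with a fake client and a hand-written S3 event
const started = []
const fakeClient = {
  startExecution: (params) => {
    started.push(params)
    return { promise: async () => ({ executionArn: 'arn:example' }) }
  }
}
const sampleEvent = {
  Records: [{ s3: { bucket: { name: 'example-bucket' }, object: { key: 'cat3.jpg' } } }]
}
const exampleArn = 'arn:aws:states:us-east-1:123456789012:stateMachine:example'

createHandler(fakeClient, exampleArn)(sampleEvent)
  .then((count) => console.log(`Started ${count} execution(s)`))
```

Note that S3 URL-encodes object keys in event notifications, so a function that later reads the object may need to decode the key first.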

The Decider function is the first step of the Step Functions workflow. It uses Amazon Rekognition to detect labels and words from the images provided. The SAM template passes the required labels and words to the function, together with an optional confidence score:

  DeciderFunction:
    Type: AWS::Serverless::Function 
    Properties:
      CodeUri: deciderFunction/
      Handler: app.handler
      Environment:
        Variables:
          requiredWords: "NEW YORK"
          requiredLabels: "Driving License,Person"
          minConfidence: 70
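
Lambda exposes environment variables as strings, so values such as the label list and the confidence score need to be parsed before use. The sketch below shows one way to do that. The `parseCriteria` name and the comma-separated convention for both lists are assumptions, not details confirmed by the repo.

```javascript
// Parse the Decider's matching criteria from environment variables.
// The helper name and the comma-splitting convention are assumptions.
const parseCriteria = (env) => {
  // Split a comma-separated string into trimmed, non-empty items
  const toList = (value) =>
    (value || '')
      .split(',')
      .map((item) => item.trim())
      .filter((item) => item.length > 0)

  return {
    requiredLabels: toList(env.requiredLabels),
    requiredWords: toList(env.requiredWords),
    // Convert to a number; leave undefined when the optional score is not set
    minConfidence: env.minConfidence ? Number(env.minConfidence) : undefined
  }
}

// In Lambda this would be parseCriteria(process.env)
const criteria = parseCriteria({
  requiredWords: 'NEW YORK',
  requiredLabels: 'Driving License,Person',
  minConfidence: '70'
})

console.log(criteria)
```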

If the requiredLabels environment variable is present, the function’s code calls Amazon Rekognition’s detectLabels method. It then calls the detectText method if the requiredWords environment variable is set:

// The standard Lambda handler
exports.handler = async (event) => {
  return await processDocument(event)
}

// Detect words/labels on document or image
const processDocument = async (event) => {

  // If using a required labels test
  if (process.env.requiredLabels) {
    // If no match, return immediately
    if (!await checkRequiredLabels(event)) return 'NoMatch'
  }  

  // If using a required words test
  if (process.env.requiredWords) {
    // If no match, return immediately
    if (!await checkRequiredWords(event)) return 'NoMatch'
  }

  return 'Match'
}
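
The `checkRequiredLabels` and `checkRequiredWords` helpers are not shown above. As a rough sketch of the matching logic they might contain, the functions below compare Rekognition-style results against the required values. This assumes that every required label and word must be present and that matching is case-insensitive, which may differ from the repo. The helper names and sample data are hypothetical.

```javascript
// Sketch of the matching logic behind the label and word checks.
// Assumes all required values must be present; the repo may differ.

// labels: the Labels array from a Rekognition detectLabels response
const hasRequiredLabels = (labels, requiredLabels, minConfidence = 0) => {
  const detected = new Set(
    labels
      .filter((label) => label.Confidence >= minConfidence)
      .map((label) => label.Name.toLowerCase())
  )
  return requiredLabels.every((name) => detected.has(name.toLowerCase()))
}

// textDetections: the TextDetections array from a Rekognition detectText response
const hasRequiredWords = (textDetections, requiredWords, minConfidence = 0) => {
  // Use LINE detections so multi-word phrases such as "NEW YORK" can match
  const text = textDetections
    .filter((item) => item.Type === 'LINE' && item.Confidence >= minConfidence)
    .map((item) => item.DetectedText.toUpperCase())
    .join(' ')
  return requiredWords.every((phrase) => text.includes(phrase.toUpperCase()))
}

// Hand-written sample data, not real Rekognition output
const sampleLabels = [
  { Name: 'Driving License', Confidence: 98.1 },
  { Name: 'Person', Confidence: 91.4 },
  { Name: 'Text', Confidence: 88.0 }
]
const sampleText = [
  { DetectedText: 'NEW YORK STATE', Type: 'LINE', Confidence: 99.2 },
  { DetectedText: 'NEW', Type: 'WORD', Confidence: 99.3 }
]

console.log(hasRequiredLabels(sampleLabels, ['Driving License', 'Person'], 70)) // true
console.log(hasRequiredWords(sampleText, ['NEW YORK'], 70)) // true
console.log(hasRequiredWords(sampleText, ['TEXAS'], 70)) // false
```

In the deployed function, the arrays would come from calls such as `rekognition.detectLabels({ Image: { S3Object: { Bucket, Name } }, MinConfidence }).promise()` in the AWS SDK for JavaScript (v2).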

The Decider function returns “Match” or “NoMatch” to the Step Functions workflow, which invokes the corresponding downstream function. The Match and No Match functions are stubs where you can build the intended functionality in the workflow. This Step Functions workflow is designed generically so you can extend the functionality easily.

Testing the application

Deploy the application by following the README.md in the GitHub repo, and note the application’s S3 bucket name. There are three test cases:

  • Create a workflow for a matched subject in an image. From photos uploaded to S3, identify which images contain one or more subjects, and invoke the Match path of the workflow.
  • Create a workflow for invoices from a specific vendor. From multiple invoices uploaded, match those from a single vendor and trigger the Match path of the workflow.
  • Create a workflow for driver’s licenses issued by a single state. From a collection of driver’s licenses, trigger the Match path of the workflow only for licenses from that state.

1. Create a workflow for a matched subject in an image

In this example, the application identifies cats in images uploaded to the S3 bucket. The default configuration in the SAM template in the GitHub repo contains the environment variables set for this example:

Environment variables in SAM template.

First, I upload over 20 images of various animals to the S3 bucket:

Uploading files to the S3 bucket.

I navigate to the Step Functions console and select the application’s state machine. It shows 24 separate executions, one per image:

Step Functions execution detail.

I select one of these executions, for cat3.jpg. This has followed the MatchFound execution path of the workflow:

MatchFound execution path.

2. Create a workflow for invoices for a specified vendor

For this example, the application looks for a customer account number and vendor name in invoices uploaded to the S3 bucket. The Decider function uses environment variables to determine the matching keywords. These can be updated either by redeploying the SAM template or by editing the Lambda function’s environment variables directly.

I modify the SAM template to match the vendor name and account number as follows:

SAM template with vendor information.

Next I upload several different invoices from the local machine to the S3 bucket:

Uploading different files to the S3 bucket.

In the Step Functions console, I select the execution for utility-bill.png. This execution matches the criteria and follows the MatchFound path in the workflow.

MatchFound path in visual workflow.

3. Matching a driver’s license by state

In this example, the application routes based upon the state where a driver’s license is issued. For this test, I use a range of sample images of licenses from DMVs in multiple states.

I modify the SAM template so the Decider function uses both label and word detection. I set “Driving License” and “Person” as required labels. This ensures that Amazon Rekognition identifies that a person is in the photo in addition to the document type.

Environment variables in the SAM template.

Next, I upload the driver’s license images to the S3 bucket:

Uploading files to the S3 bucket.

In the Step Functions console, I open the execution for the driver-license-ny.png file, and it has followed the MatchFound path in the workflow:

Execution path for driver's license test.

When I select the execution for the Texas driver’s license, I see that it did not match and followed the NoMatchFound execution path:

Execution path for NoMatchFound.

Extending the functionality

Because the Step Functions workflows are triggered by S3 PutObject events, this application is highly scalable. As more objects are stored in the S3 bucket, it creates as many executions as needed in the state machine. The custom code only handles the specific logic requirements for a single object, and the Lambda service scales up to meet demand.

In these examples, the application uses Amazon Rekognition to analyze specific document types or image contents. You could extend this logic to include value ranges, multiple alternative workflow paths, or include steps to enable human intervention.
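
As one illustration of multiple alternative workflow paths, the Decider could return a route name instead of a simple match result, with the Choice state holding one rule per route. This is a hedged sketch only. The route names, keywords, `$.result` path, and state names below are all hypothetical and do not come from the repo.

```javascript
// Sketch of a Decider variant that chooses between several workflow paths.
// Route names, keywords, and state names are hypothetical examples.
const routes = [
  { words: ['NEW YORK'], result: 'NewYork' },
  { words: ['TEXAS'], result: 'Texas' }
]

// Return the result of the first route whose words all appear in the detected text
const chooseRoute = (detectedText, routeTable) => {
  const text = detectedText.toUpperCase()
  const match = routeTable.find((route) =>
    route.words.every((word) => text.includes(word.toUpperCase()))
  )
  return match ? match.result : 'NoMatch'
}

// One Choice rule per route; anything else falls through to the default path
const buildChoiceState = (routeTable) => ({
  Type: 'Choice',
  Choices: routeTable.map((route) => ({
    Variable: '$.result',
    StringEquals: route.result,
    Next: `${route.result}Path`
  })),
  Default: 'NoMatchFound'
})

console.log(chooseRoute('Texas Driver License', routes)) // Texas
console.log(JSON.stringify(buildChoiceState(routes), null, 2))
```

Keeping the routes in a single table means the Decider and the state machine definition can be generated from the same source, which reduces the chance of the two drifting apart.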

Using Step Functions also makes it easy to modify workflows as requirements change. Any incomplete workflows continue on the existing version of the state machine used when they started. As a result, you can add steps without impacting existing code, making it faster to adapt applications to users’ needs.

Conclusion

You can use Step Functions to model many common business workflows with JSON. Combining this powerful workflow management service with the scalability of S3 and Lambda, you can quickly build nuanced solutions that operate at scale.

In this post, I show how you can deploy a simple Step Functions workflow where executions are created by objects stored in an S3 bucket. Using minimal code, it can perform complex workflow routing tasks based on document types and contents. This provides a highly flexible and scalable way to manage common organizational workflow needs.