AWS Compute Blog

Implementing reactive progress tracking for AWS Step Functions

This blog post is written by Alexey Paramonov, Solutions Architect, ISV and Maximilian Schellhorn, Solutions Architect ISV

This blog post demonstrates a solution based on AWS Step Functions and Amazon API Gateway WebSockets to track execution progress of a long running workflow. The solution updates the frontend regularly and users are able to track the progress and receive detailed status messages.

Websites with long-running processes often don’t provide feedback to users, leading to a poor customer experience. You might have experienced this when booking tickets, searching for hotels, or buying goods online. These sites often call multiple backend and third-party endpoints and aggregate the results to complete your request, causing the delay. In these long running scenarios, a transparent progress tracking solution can create a better user experience.

Overview

The example provided uses:

  • AWS Serverless Application Model (AWS SAM) for deployment: an open-source framework for building serverless applications.
  • AWS Step Functions for orchestrating the workflow.
  • AWS Lambda for mocking long running processes.
  • API Gateway to provide a WebSocket API for bidirectional communications between clients and the backend.
  • Amazon DynamoDB for storing connection IDs from the clients.

The example provides different options to report the progress back to the WebSocket connection by using Step Functions SDK integration, Lambda integrations, or Amazon EventBridge.

The following diagram outlines the example:

  1. The user opens a connection to WebSocket API. The OnConnect and OnDisconnect Lambda functions in the “WebSocket Connection Management” section persist this connection in DynamoDB (see documentation). The connection is bidirectional, meaning that the user can send requests through the open connection and the backend can respond with a new progress status whenever it is available.
  2. The user sends a new order through the WebSocket API. API Gateway routes the request to the “OnOrder” AWS Lambda function, which starts the state machine execution.
  3. As the request propagates through the state machine, we send progress updates back to the user via the WebSocket API using AWS SDK service integrations.
  4. For more customized status responses, we can use a centralized AWS Lambda function “ReportProgress” that updates the WebSocket API.

How to respond to the client?

To send the status updates back to the client via the WebSocket API, three options are explored:

Option 1: AWS SDK integration with API Gateway invocation

As the diagram shows, the API Gateway workflow tasks starting with the prefix “Report:” send responses directly to the client via the WebSocket API. This is an example of the state machine definition for this step:

          'Report: Workflow started':
            Type: Task
            Resource: arn:aws:states:::apigateway:invoke
            ResultPath: $.Params
            Parameters:
              ApiEndpoint: !Join [ '.',[ !Ref ProgressTrackingWebsocket, execute-api, !Ref 'AWS::Region', amazonaws.com ] ]
              Method: POST
              Stage: !Ref ApiStageName
              Path.$: States.Format('/@connections/{}', $.ConnectionId)
              RequestBody:
                Message: 🥁 Workflow started
                Progress: 10
              AuthType: IAM_ROLE
            Next: 'Mock: Inventory check'

This option reports the progress directly without using any additional Lambda functions. This limits the system complexity, reduces latency between the progress update and the response delivered to the client, and potentially reduces costs by reducing Lambda execution duration. A potential drawback is the limited customization of the response and getting familiar with the definition language.

Option 2: Using a Lambda function for reporting the progress status

To further customize response logic, create a Lambda function for reporting. As shown in point 4 of the diagram, you can also invoke a “ReportProgress” function directly from the state machine. This Python code snippet reports the progress status back to the WebSocket API:

apigw_management_api_client = boto3.client('apigatewaymanagementapi', endpoint_url=api_url)
apigw_management_api_client.post_to_connection(
            ConnectionId=connection_id,
            Data=bytes(json.dumps(event), 'utf-8')
        )

This option allows for more customizations and integration into the business logic of other Lambda functions to track progress in more detail. For example, execution of loops and reporting back on every iteration. The tradeoff is that you must handle exceptions and retries in your code. It also increases overall system complexity and additional costs associated with Lambda execution.

Option 3: Using EventBridge

You can combine option 2 with EventBridge to provide a centralized solution for reporting the progress status. The solution also handles retries with back-off if the “ReportProgress” function can’t communicate with the WebSocket API.

You can also use AWS SDK integrations from the state machine to EventBridge instead of using API Gateway. This has the additional benefit of a loosely coupled and resilient system but you could experience increased latency due to the additional services used. The combination of EventBridge and the Lambda function adds a minimal latency, but it might not be acceptable for short-lived workflows. However, if the workflow takes tens of seconds to complete and involves numerous steps, option 3 may be more suitable.

This is the architecture:

  1. As before.
  2. As before.
  3. AWS SDK integration sends the status message to EventBridge.
  4. The message propagates to the “ReportProgress” Lambda function.
  5. The Lambda function sends the processed message through the WebSocket API back to the client.

Deploying the example

Prerequisites

Make sure you can manage AWS resources from your terminal.

  • AWS CLI and AWS SAM CLI installed.
  • You have an AWS account. If not, visit this page.
  • Your user has sufficient permissions to manage AWS resources.
  • Git is installed.
  • NPM is installed (only for local frontend deployment).

To view the source code and documentation, visit the GitHub repo. This contains both the frontend and backend code.

To deploy:

  1. Clone the repository:
    git clone "https://github.com/aws-samples/aws-step-functions-progress-tracking.git"
  2. Navigate to the root of the repository.
  3. Build and deploy the AWS SAM template:
    sam build && sam deploy --guided
  4. Copy the value of WebSocketURL in the output for later.
  5. The backend is now running. To test it, use a hosted frontend.

Alternatively, you can deploy the React-based frontend on your local machine:

  1. Navigate to “progress-tracker-frontend/”:
    cd progress-tracker-frontend
  2. Launch the react app:
    npm start
  3. The command opens the React app in your default browser. If it does not happen automatically, navigate to http://localhost:3000/ manually.

Now the application is ready to test.

Testing the example application

  1. The webpage requests a WebSocket URL – this is the value from the AWS SAM template deployment. Paste it into Enter WebSocket URL in the browser and choose Connect.
  2. On the next page, choose Send Order and watch how the progress changes.

    This sends the new order request to the state machine. As it progresses, you receive status messages back through the WebSocket API.
  3. Optionally, you can inspect the raw messages arriving to the client. Open the Developer tools in your browser and navigate to the Network tab. Filter for WS (stands for WebSocket) and refresh the page. Specify the WebSocket URL, choose Connect and then choose Send Order.

Cleaning up

The services used in this solution are eligible for AWS Free Tier. To clean up the resources, in the root directory of the repository run:

sam delete

This removes all resources provisioned by the template.yml file.

Conclusion

In this post, you learn how to augment your Step Functions workflows with low latency progress tracking via API Gateway WebSockets. Consider adding the progress tracking to your long running workflows to improve the customer experience and provide a reactive look and feel for your application.

Navigate to the GitHub repository and review the implementation to see how your solution could become more user friendly and responsive. Start with examining the template.yml and the state machine’s definition and see how the frontend handles WebSocket communication and message visualization.

For more serverless learning resources, visit  Serverless Land.