Handle Errors in Serverless Applications

with AWS Step Functions and AWS Lambda

In this tutorial, you will learn how to use AWS Step Functions to handle workflow runtime errors. AWS Step Functions is a serverless orchestration service that lets you easily coordinate multiple Lambda functions into flexible workflows that are easy to debug and easy to change. AWS Lambda is a compute service that lets you run code without provisioning or managing servers. 

Lambda functions can occasionally fail, such as when an unhandled exception is raised, when they run longer than the configured timeout, or when they run out of memory. Writing and maintaining error handling logic in every one of your Lambda functions to handle situations such as API throttling or socket timeouts can be time-intensive and complicated, especially for distributed applications. Embedding this code in each Lambda function creates dependencies between them, and it can be difficult to maintain all of those connections as things change.

To avoid this, and to reduce the amount of error handling code you write, you can use AWS Step Functions to create a serverless workflow that supports function error handling. Regardless of whether the error is a function exception created by the developer (e.g., file not found), or unpredicted (e.g., out of memory), you can configure Step Functions to respond with conditional logic based on the type of error that occurred. By separating your workflow logic from your business logic in this way, you can modify how your workflow responds to errors without changing the business logic of your Lambda functions.

In this tutorial, you will design and run a serverless workflow using AWS Step Functions that will gracefully handle these errors. You’ll create an AWS Lambda function which will mock calls to a RESTful API and return various response codes and exceptions. Then, you’ll use AWS Step Functions to create a state machine with Retry and Catch capabilities that responds with different logic depending on the exception raised.

This tutorial requires an AWS account

There is no additional charge for AWS Step Functions or AWS Lambda. The resources you create in this tutorial are Free Tier eligible.


Step 1. Create a Lambda Function to Mock an API

In this step, you will create a Lambda function that will mock a few basic API interactions. The Lambda function raises exceptions to simulate responses from a fictitious API, depending on the error code that you provide as input in the event parameter.

a. Open the AWS Management Console in a new browser window, so you can keep this step-by-step guide open. When the screen loads, enter your user name and password to get started. Next, type Lambda in the search bar and select Lambda to open the service console.



b. Choose Create a function.



c. Leave Author from scratch selected. Next, configure your Lambda function as follows:

For Name, type MockAPIFunction.
For Runtime, choose Python 3.6.
For Role, select Create custom role.

A new IAM window will open. Leave the Role name as lambda_basic_execution and click Allow. You will automatically be returned back to the Lambda console.

Click Create function.



d. On the MockAPIFunction screen, scroll down to the Function code section. In this tutorial, you'll create a function that uses the programming model for authoring Lambda functions in Python. In the code window, replace all of the code with the following, then choose Save.

# Custom exceptions; Step Functions matches on these class names in Retry and Catch.
class TooManyRequestsException(Exception): pass
class ServerUnavailableException(Exception): pass
class UnknownException(Exception): pass

def lambda_handler(event, context):
    # The status code to simulate arrives as a string in the event
    statuscode = event["statuscode"]
    if statuscode == "429":
        raise TooManyRequestsException('429 Too Many Requests')
    elif statuscode == "503":
        raise ServerUnavailableException('503 Server Unavailable')
    elif statuscode == "200":
        return '200 OK'
    else:
        # Any other code is treated as an unknown error
        raise UnknownException('Unknown error')
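
Before wiring the function into a workflow, it can help to exercise it locally. The following is a minimal sketch, not part of the console steps: it repeats the handler so the snippet runs on its own, with the final raise under an else branch so that unrecognized codes reach it. The try_status helper is a hypothetical name introduced here for illustration.

```python
class TooManyRequestsException(Exception): pass
class ServerUnavailableException(Exception): pass
class UnknownException(Exception): pass

def lambda_handler(event, context):
    statuscode = event["statuscode"]
    if statuscode == "429":
        raise TooManyRequestsException('429 Too Many Requests')
    elif statuscode == "503":
        raise ServerUnavailableException('503 Server Unavailable')
    elif statuscode == "200":
        return '200 OK'
    else:
        raise UnknownException('Unknown error')

def try_status(statuscode):
    """Invoke the handler locally; return its result or the raised exception's class name."""
    try:
        # The handler never touches the context argument, so None is enough here
        return lambda_handler({"statuscode": statuscode}, None)
    except Exception as exc:
        return type(exc).__name__

if __name__ == "__main__":
    for code in ("200", "429", "503", "999"):
        print(code, "->", try_status(code))
```

The exception class names reported here are the same names the state machine in Step 3 lists under ErrorEquals.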


e. Once your Lambda function is created, scroll to the top of the window and note its Amazon Resource Name (ARN) in the upper-right corner of the page. Amazon Resource Names (ARNs) uniquely identify AWS resources, and help you track and use AWS items and policies across AWS services and API calls. You need an ARN whenever you reference a specific resource from Step Functions.
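
To make the ARN format concrete, here is a small sketch that splits a Lambda function ARN into its colon-separated fields. The region and account ID in the example are made-up placeholders, and parse_arn is a helper name introduced here, not an AWS API.

```python
from typing import NamedTuple

class Arn(NamedTuple):
    partition: str
    service: str
    region: str
    account_id: str
    resource: str

def parse_arn(arn: str) -> Arn:
    """Split an ARN of the form arn:partition:service:region:account-id:resource."""
    # Limit the split to 5 so colons inside the resource part are preserved
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn":
        raise ValueError(f"not an ARN: {arn!r}")
    return Arn(*parts[1:])

# Placeholder region and account ID, for illustration only
example = parse_arn("arn:aws:lambda:us-east-1:123456789012:function:MockAPIFunction")
print(example.service, example.region, example.resource)
```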



Step 2. Create an AWS Identity and Access Management (IAM) Role

AWS Step Functions can execute code and access other AWS resources (for example, data stored in Amazon S3 buckets). To maintain security, you must grant Step Functions access to these resources using AWS Identity and Access Management (IAM).

a. In another browser window, navigate to the AWS Management Console and type IAM in the search bar. Click IAM to open the service console.



b. Click Roles, then choose Create Role.



c. On the Select type of trusted entity page, under AWS service, select Step Functions from the list, and then choose Next: Permissions.



d. On the Attach permissions policy page, choose Next: Review.




e. On the Review page, type step_functions_basic_execution for Role name and click Create role.



f. Your new IAM role is created and appears in the list beneath the IAM role for your Lambda function.
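
As a rough sketch of what a role like this contains, the trust policy names Step Functions as the service allowed to assume the role, and a permissions policy lets the role invoke a Lambda function. The snippet below builds both documents as Python dictionaries. The policies the console actually attaches may differ, and the function ARN is a placeholder, so treat this as an illustration rather than the role's exact contents.

```python
import json

# Trust policy: allows the Step Functions service to assume the role
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "states.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# Placeholder ARN; substitute the ARN you noted in Step 1
FUNCTION_ARN = "arn:aws:lambda:REGION:ACCOUNT_ID:function:MockAPIFunction"

# Permissions policy (assumed for illustration): allow invoking that one function
permissions_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "lambda:InvokeFunction",
            "Resource": FUNCTION_ARN,
        }
    ],
}

if __name__ == "__main__":
    print(json.dumps(trust_policy, indent=2))
    print(json.dumps(permissions_policy, indent=2))
```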



Step 3. Create a Step Functions State Machine

Now that you’ve created your simple Lambda function that mocks an API response, you can create a Step Functions state machine to call the API and handle exceptions.

In this step, you’ll use the Step Functions console to create a state machine that uses a Task state with Retry and Catch fields to handle the various API response codes. You’ll use a Task state to invoke your mock API Lambda function, which will return the API status code you provide as input into your state machine.

a. Open the AWS Step Functions console. On the Create a state machine page, select Author from scratch. In the Details section, name your state machine MyAPIStateMachine, and then select I will use an existing role.



b. Next, you will design a state machine that will take different actions depending on the response from your mock API. If the API can’t be reached, the workflow will try again. Retries are a helpful way to address transient errors. The workflow also catches different exceptions thrown by the mock API.

Replace the contents of the State machine definition section with the following code:

{
  "Comment": "An example of using retry and catch to handle API responses",
  "StartAt": "Call API",
  "States": {
    "Call API": {
      "Type": "Task",
      "Resource": "arn:aws:lambda:REGION:ACCOUNT_ID:function:FUNCTION_NAME",
      "Next" : "OK",
      "Comment": "Catch a 429 (Too many requests) API exception, and resubmit the failed request in a rate-limiting fashion.",
      "Retry" : [ {
        "ErrorEquals": [ "TooManyRequestsException" ],
        "IntervalSeconds": 1,
        "MaxAttempts": 2
      } ],
      "Catch": [
        {
          "ErrorEquals": ["TooManyRequestsException"],
          "Next": "Wait and Try Later"
        }, {
          "ErrorEquals": ["ServerUnavailableException"],
          "Next": "Server Unavailable"
        }, {
          "ErrorEquals": ["States.ALL"],
          "Next": "Catch All"
        }
      ]
    },
    "Wait and Try Later": {
      "Type": "Wait",
      "Seconds" : 1,
      "Next" : "Change to 200"
    },
    "Server Unavailable": {
      "Type": "Fail",
      "Cause": "The server is currently unable to handle the request."
    },
    "Catch All": {
      "Type": "Fail",
      "Cause": "Unknown error!",
      "Error": "An error of unknown type occurred"
    },
    "Change to 200": {
      "Type": "Pass",
      "Result": {"statuscode" :"200"} ,
      "Next": "Call API"
    },
    "OK": {
      "Type": "Pass",
      "Result": "The request has succeeded.",
      "End": true
    }
  }
}


c. Find the “Resource” line in the “Call API” Task state (line 7). To update this ARN to the ARN of the mock API Lambda function you just created, click on the ARN text and then select the ARN from the list.



d. Click the refresh button beside the visual workflow pane to have Step Functions create a state machine diagram that corresponds to the workflow you just designed. After reviewing the visual workflow, click Create state machine.
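
The transitions that the diagram draws can also be checked offline. The sketch below is a simplified sanity check, not the validation Step Functions itself performs: it keeps only the fields of the definition that affect transitions (StartAt, Next, Catch, End) and confirms that every target names a defined state and that every state is reachable. The helper names are introduced here for illustration, and state types that this tutorial does not use, such as Choice, are not handled.

```python
# Transition-relevant subset of the state machine definition from this step
definition = {
    "StartAt": "Call API",
    "States": {
        "Call API": {
            "Type": "Task",
            "Next": "OK",
            "Catch": [
                {"ErrorEquals": ["TooManyRequestsException"], "Next": "Wait and Try Later"},
                {"ErrorEquals": ["ServerUnavailableException"], "Next": "Server Unavailable"},
                {"ErrorEquals": ["States.ALL"], "Next": "Catch All"},
            ],
        },
        "Wait and Try Later": {"Type": "Wait", "Seconds": 1, "Next": "Change to 200"},
        "Server Unavailable": {"Type": "Fail"},
        "Catch All": {"Type": "Fail"},
        "Change to 200": {"Type": "Pass", "Next": "Call API"},
        "OK": {"Type": "Pass", "End": True},
    },
}

def transitions(state):
    """All states a given state can move to, via Next or any catcher."""
    targets = [state["Next"]] if "Next" in state else []
    targets.extend(catcher["Next"] for catcher in state.get("Catch", []))
    return targets

def undefined_targets(defn):
    """Return (source, target) pairs whose target is not a defined state."""
    states = defn["States"]
    pairs = [("StartAt", defn["StartAt"])]
    for name, state in states.items():
        pairs.extend((name, target) for target in transitions(state))
    return [(src, dst) for src, dst in pairs if dst not in states]

def reachable(defn):
    """Names of the states reachable from StartAt."""
    states = defn["States"]
    seen, stack = set(), [defn["StartAt"]]
    while stack:
        name = stack.pop()
        if name in seen or name not in states:
            continue
        seen.add(name)
        stack.extend(transitions(states[name]))
    return seen

if __name__ == "__main__":
    print("undefined targets:", undefined_targets(definition))
    print("unreachable states:", set(definition["States"]) - reachable(definition))
```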



Step 4. Test your Error Handling Workflow

To test your error handling workflow, you will invoke your state machine to call your mock API by providing the error code as input.

a. Click Start execution.



b. A new execution dialog box appears, where you can enter input for your state machine. You will play the part of the API and supply the status code that you want the mock API to return. Replace the existing text with the code below, then choose Start execution:

{
    "statuscode": "200"
}


c. On the Execution details screen, click Input to see the input you provided your state machine. Next, click Output to view the result of your state machine execution. You can see that the workflow interpreted statuscode 200 as a successful API call.



d. Under Visual workflow, you can see the execution path of each execution, shown in green in the workflow. Click on the "Call API" Task state and then expand the Input and Output fields in the Step details screen.

You can see that this Task state successfully invoked your mock API Lambda function with the input you provided, and captured the output of that Lambda function, “200 OK”.



e. Next, click on the "OK" state in the visual workflow. Under Step details you can see that the output of the previous step (the Call API Task state) has been passed as the input to this step. The OK state is a Pass state, which performs no work: it passes its input to its output or, when a Result field is set as it is here, emits that fixed result. Pass states are useful when constructing and debugging state machines.
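
The Pass behavior used in this tutorial can be modeled in a few lines. This is a simplified sketch that ignores the path and parameter fields a Pass state also supports; run_pass_state is a name introduced here for illustration.

```python
def run_pass_state(state: dict, state_input):
    """Model a Pass state: emit the fixed Result if one is set, else pass the input through."""
    if "Result" in state:
        return state["Result"]
    return state_input

# The OK state from the definition in Step 3
ok_state = {"Type": "Pass", "Result": "The request has succeeded.", "End": True}
# A hypothetical Pass state with no Result field, for comparison
plain_pass = {"Type": "Pass", "End": True}

if __name__ == "__main__":
    print(run_pass_state(ok_state, "200 OK"))
    print(run_pass_state(plain_pass, "200 OK"))
```

The same rule explains the Change to 200 state, whose Result replaces whatever input it receives with a statuscode of 200.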



Step 5. Inspect the Execution of your State Machine

a. Scroll to the top of the Execution details screen and click on MyAPIStateMachine.



b. Click Start execution again. This time, provide the following input, then click Start execution:

{
    "statuscode": "503"
}


c. In the Execution event history section, expand each execution step to confirm that your workflow behaved as expected. We expected this execution to fail, so don’t be alarmed! You’ll notice that:

  1. Step Functions captured your Input
  2. That input was passed to the Call API Task state
  3. The Call API Task state called your MockAPIFunction using that input
  4. The MockAPIFunction executed
  5. The MockAPIFunction failed with a ServerUnavailableException
  6. The Catch field in your Call API Task state caught that exception
  7. The matching catcher sent the workflow to the Server Unavailable state, a Fail state that failed the execution
  8. Your state machine completed its execution


d. Next, you’ll simulate a 429 exception. Scroll to the top of the Execution details screen and click on MyAPIStateMachine. Click on Start execution, provide the following input, and click Start execution:

{
    "statuscode": "429"
}


e. Now you’ll inspect the retry behavior of your workflow. In the Execution event history section, expand each execution step once more to confirm that Step Functions retried the MockAPIFunction call two more times, both of which failed. At that point, your workflow transitioned to the Wait and Try Later state, in the hopes that the API was just temporarily unresponsive.

Next, after the Wait state, the Change to 200 Pass state used brute force to change the status code to 200, and your workflow called the API again and completed execution successfully. That probably wouldn’t be how you handled a 429 exception in a real application, but we’re keeping things simple for the sake of the tutorial.
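
The retry timing can be worked out by hand. In the Amazon States Language, a retrier waits IntervalSeconds before the first retry and multiplies the wait by BackoffRate for each later retry. The Retry field in this tutorial does not set BackoffRate, so the default of 2.0 applies. The sketch below computes the resulting schedule; retry_delays is a helper name introduced here, and the numbers are a calculation rather than measured timings.

```python
def retry_delays(interval_seconds=1, max_attempts=3, backoff_rate=2.0):
    """Seconds waited before each retry; max_attempts counts retries, not the first call."""
    return [interval_seconds * backoff_rate ** n for n in range(max_attempts)]

# Values from the Retry field in Step 3: IntervalSeconds 1, MaxAttempts 2
delays = retry_delays(interval_seconds=1, max_attempts=2)
total_calls = 1 + len(delays)  # the original call plus each retry

if __name__ == "__main__":
    print("waits before retries (s):", delays)
    print("total calls to the function:", total_calls)
```

With these settings the function is called three times in total before the Catch field takes over, which matches the two additional attempts visible in the event history.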



f. Run one more instance of your workflow, and this time, provide a random API response that is not handled by your state machine:

{
    "statuscode": "999"
}

Inspect the execution again using the Execution event history. When complete, click on MyAPIStateMachine once more. In the Executions pane, you can see the history of all executions of your workflow, and step into them individually as you like.
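
To tie the four test runs together, the following sketch simulates the workflow locally. It is a hand-written model of this one state machine under simplifying assumptions, not how Step Functions is implemented: waits are skipped, and the RETRY and CATCH tables are transcribed by hand from the definition in Step 3. The names mock_api, call_api_state, and run are introduced here for illustration.

```python
class TooManyRequestsException(Exception): pass
class ServerUnavailableException(Exception): pass
class UnknownException(Exception): pass

def mock_api(event):
    """Same logic as the MockAPIFunction handler."""
    statuscode = event["statuscode"]
    if statuscode == "429":
        raise TooManyRequestsException('429 Too Many Requests')
    elif statuscode == "503":
        raise ServerUnavailableException('503 Server Unavailable')
    elif statuscode == "200":
        return '200 OK'
    else:
        raise UnknownException('Unknown error')

# Transcribed from the Retry and Catch fields of the Call API state
RETRY = {"ErrorEquals": ["TooManyRequestsException"], "MaxAttempts": 2}
CATCH = [
    (["TooManyRequestsException"], "Wait and Try Later"),
    (["ServerUnavailableException"], "Server Unavailable"),
    (["States.ALL"], "Catch All"),
]

def call_api_state(event):
    """Model the Call API Task: retry matching errors, then consult catchers in order."""
    calls = 0
    while True:
        calls += 1
        try:
            mock_api(event)
            return "OK", calls
        except Exception as exc:
            name = type(exc).__name__
            retries_used = calls - 1
            if name in RETRY["ErrorEquals"] and retries_used < RETRY["MaxAttempts"]:
                continue  # the real service would wait before retrying
            for errors, target in CATCH:
                if name in errors or "States.ALL" in errors:
                    return target, calls
            raise

def run(statuscode):
    """Return (states visited, number of function calls) for one execution."""
    event, path, total_calls = {"statuscode": statuscode}, [], 0
    state = "Call API"
    while True:
        path.append(state)
        if state == "Call API":
            state, calls = call_api_state(event)
            total_calls += calls
        elif state == "Wait and Try Later":
            state = "Change to 200"
        elif state == "Change to 200":
            event = {"statuscode": "200"}  # the Pass state's fixed Result
            state = "Call API"
        else:
            return path, total_calls  # OK, Server Unavailable, Catch All end the run

if __name__ == "__main__":
    for code in ("200", "503", "429", "999"):
        print(code, run(code))
```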



Step 6. Terminate your Resources

In this step you will terminate your AWS Step Functions and AWS Lambda related resources.

Important: Terminating resources that are not actively being used reduces costs and is a best practice. Not terminating your resources can result in a charge.

a. At the top of the AWS Step Functions console window, click on State machines.



b. In the State machines window, click on MyAPIStateMachine and select Delete. Confirm the action by selecting Delete state machine in the dialog box. Your state machine will be deleted in a minute or two, once Step Functions has confirmed that any in-process executions have completed.



c. Next, you’ll delete your Lambda function. Click Services in the AWS Management Console menu, then select Lambda.



d. In the Functions screen, click on your MockAPIFunction, select Actions, and then Delete. Confirm the deletion by clicking Delete again.



e. Lastly, you’ll delete your IAM roles. Click Services in the AWS Management Console menu, then select IAM.



f. Select both of the IAM roles that you created for this tutorial, then click Delete role. Confirm the delete by clicking Yes, Delete on the dialog box.

You can now sign out of the AWS Management Console.




You have used AWS Step Functions and AWS Lambda to create an error handling workflow for a network API. Using AWS Lambda, you can run code for virtually any type of application or backend service, all with zero administration. Just upload your code and Lambda takes care of everything required to run and scale your code with high availability.

Combining AWS Step Functions with AWS Lambda makes it simple to orchestrate AWS Lambda functions for serverless applications. Step Functions allows you to control complex workflows using Lambda functions without the underlying application managing and orchestrating the state. You can also use Step Functions for microservices orchestration using compute resources such as Amazon EC2 and Amazon ECS.