AWS Compute Blog

Simplifying developer experience with variables and JSONata in AWS Step Functions

This post is written by Uma Ramadoss, Principal Specialist SA, Serverless and Dhiraj Mahapatro, Principal Specialist SA, Amazon Bedrock

AWS Step Functions is introducing variables and JSONata data transformations. Variables allow developers to assign data in one state and reference it in any subsequent steps, simplifying state payload management without the need to pass data through multiple intermediate states. With JSONata, an open source query and transformation language, you now perform advanced data manipulation and transformation, such as date and time formatting and mathematical operations.

This blog post explores the powerful capabilities of these new features, delving deep into simplifying data sharing across states using variables and reducing data manipulation complexity through advanced JSONata expressions.

Overview

Customers choose Step Functions to build complex workflows that involve multiple services such as AWS Lambda, AWS Fargate, Amazon Bedrock, and HTTP API integrations. Within these workflows, you build states to interface with these various services, passing input data and receiving responses as output. While you can use Lambda functions for date, time, and number manipulations beyond Step Functions’ intrinsic capabilities, these methods struggle with increasing complexity, leading to payload restrictions, data conversion burdens, and more state changes. This affects the overall cost of the solution. You use variables and JSONata to address this.

To illustrate these new features, consider the same business use case from the JSONPath blog, a customer onboarding process in the insurance industry. A potential customer provides basic information, including names, addresses, and insurance interests, while signing up. This Know-Your-Customer (KYC) process starts a Step Functions workflow with a payload containing these details. The workflow decides the customer’s approval or denial, followed by sending a notification.

{
  "data": {
    "firstname": "Jane",
    "lastname": "Doe",
    "identity": {
      "email": "jdoe@example.com",
      "ssn": "123-45-6789"
    },
    "address": {
      "street": "123 Main St",
      "city": "Columbus",
      "state": "OH",
      "zip": "43219"
    },
    "interests": [
      {"category": "home", "type": "own", "yearBuilt": 2004, "estimatedValue": 800000},
      {"category": "auto", "type": "car", "yearBuilt": 2012, "estimatedValue": 8000},
      {"category": "boat", "type": "snowmobile", "yearBuilt": 2020, "estimatedValue": 15000},
      {"category": "auto", "type": "motorcycle", "yearBuilt": 2018, "estimatedValue": 25000},
      {"category": "auto", "type": "RV", "yearBuilt": 2015, "estimatedValue": 102000},
      {"category": "home", "type": "business", "yearBuilt": 2009, "estimatedValue": 500000}
    ]
  }
}

The original workflow diagram illustrates the workflow without new features, while the new workflow diagram shows the workflow built by applying variables and JSONata. Access the workflows in the GitHub repository from the main (original workflow) and jsonata-variables (new workflow) branches.

Image of Original Workflow.

Figure 1: Original Workflow

Image of New Workflow.

Figure 2: New Workflow

Setup

Follow the steps in the README to create this state machine and cleanup once testing is complete.

Simplifying data sharing with variables

Variables allow you to instantiate or assign state results to a variable that is referenced in future states. In a single state, you assign multiple variables with different values, including static data, results of a state, JSONPath or JSONata expressions, and intrinsic functions. The following diagram illustrates how variables are assigned and used inside a state machine:

Image of Variable assignment and scope.

Figure 3: Variable assignment and scope

Variable scope

In Step Functions, variables have a scope similar to programming languages. You define variables at different levels, with inner scope and outer scope. Inner scope variables are defined inside map, parallel, or nested workflows and these variables are only accessible within their specific scope. Alternatively, you set outer scope variables at the top level. Once assigned, these variables can be accessed from any downstream state irrespective of their order of execution in the future. However, as of the release of this blog, distributed map state cannot reference variables in outer scopes. The user guide on variable scope elaborates on these edge cases.

Variable assignment and usage
To set a variable’s value, use the special field Assign. The JSONata part of this blog post further down explains the purpose of {%%}.

"Assign": {
  "inputPayload": "{% $states.context.Execution.Input %}",
  "isCustomerValid": "{% $states.result.isIdentityValid and $states.result.isAddressValid %}"
} 

Use a variable by writing a dollar sign ($) before its name.

{
  "TableName": "AccountTable",
  "Item": {
    "email": {
      "S": "{% $inputPayload.data.email %}"
    },
    "firstname": {
      "S": "{% $inputPayload.data.firstname %}"
    },....
} 

Simplifying data manipulations with JSONata

JSONata is a lightweight query and transformation language for Json data. JSONata offers more capabilities compared to JSONPath within Step Functions.

Setting QueryLanguage to “JSONata” and using {%%} tags for JSONata expressions allows you to leverage JSONata within a state machine. Apply this configuration at the top level of the state machine or at each task level. JSONata at the task level gives you fine-grained control of choosing JSONata vs JSONPath. This approach is valuable for complex workflows where you want to simplify a subset of states with JSONata and continue to use JSONPath for the rest. JSONata provides you with more functions and operators than JSONPath and intrinsic functions in Step Functions. Activating the QueryLanguage attribute as JSONata at the state machine level disables JSONPath, therefore, restricting the use of InputPath, ParametersResultPath, ResultSelector, and OutputPath. Instead of these JSONPath parameters, JSONata uses Arguments and Output.

Optimizing simple states

One of the first things to notice in the new state machine is that the Verification process does not use Lambda functions anymore as seen in the following comparison:

Image of Lambda functions replaced with Pass states.

Figure 4: Lambda functions replaced with Pass states

In the previous approach, a Lambda function is used to validate email and SSN using regular expressions:

const ssnRegex = /^\d{3}-?\d{2}-?\d{4}$/;
const emailRegex = /^[a-zA-Z0-9._-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,4}$/;

exports.lambdaHandler = async event => {
  const { ssn, email } = event;
  const approved = ssnRegex.test(ssn) && emailRegex.test(email);

  return {
    statusCode: 200,
    body: JSON.stringify({ 
      approved,
      message: `identity validation ${approved ? 'passed' : 'failed'}`
    })
  }
};

With JSONata, you define regular expressions directly in the state machine’s Amazon States Language (ASL). You use a Pass state and $match() from JSONata to validate the email and the SSN.

{
  "StartAt": "Check Identity",
   "States": {
    "Check Identity": {
      "Type": "Pass",
      "QueryLanguage": "JSONata",
      "End": true,
      "Output": {
        "isIdentityValid": "{% $match($states.input.data.identity.email, /^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\\.[a-zA-Z]{2,}$/) and $match($states.input.data.identity.ssn, /^(\\d{3}-?\\d{2}-?\\d{4}|XXX-XX-XXXX)$/) %}"
      }
    }
   }
}

The same applies to validate the address inside a Pass state using sophisticated JSONata string functions like $length, $trim, $each, and $not from JSONata:

{
  "StartAt": "Check Address",
  "States": {
    "Check Address": {
      "Type": "Pass",
      "QueryLanguage": "JSONata",
      "End": true,
      "Output": {
        "isAddressValid": "{% $not(null in $each($states.input.data.address, function($v) { $length($trim($v)) > 0 ? $v : null })) %}"
      }
    }
  }
}

When using JSONata, $states becomes a reserved variable.

Result aggregation

Previously with JSONPath, using an expression outside of a Choice state was not available. That is not the case anymore with JSONata. The parallel state, in the example, gathers identity and address verification results from each sub-step. You merge the results into a boolean variable isCustomerValid.

"Verification": {
  "Type": "Parallel",
  "QueryLanguage": "JSONata",
  ...
  "Assign": {
    "inputPayload": "{% $states.context.Execution.Input %}",
    "isCustomerValid": "{% $states.result.isIdentityValid and $states.result.isAddressValid %}"
  },
  "Next": "Approve or Deny?"
}

The crucial part to note here is the access to results via $states.result and use of AND boolean-operator inside {%%}. This ultimately makes the downstream Choice state, which uses this variable, simpler. Operators in JSONata give you flexibility to write expressions like these wherever possible, which reduces the need of a compute layer to process simple data transformations.

Additionally, the Choice state becomes simpler to use with flexible JSONata operators and expressions, as long as the expressions within {%%} result in a true or false value.

"Approve or Deny?": {
  "Type": "Choice",
  "QueryLanguage": "JSONata",
  "Choices": [
    {
      "Next": "Add Account",
      "Condition": "{% $isCustomerValid %}"
    }
  ],
  "Default": "Deny Message"
}

Intrinsic functions as JSONata functions

Step Functions provides built-in JSONata functions to enable parity with Step Functions’ intrinsic functions. The DynamoDB putItem step shows how you use $uuid() that has the same functionality as States.UUID() intrinsic function. You also get JSONata specific functions on date and time. The following state shows the use of $now() to get the current timestamp as ISO-8601 as a string before inserting this item to the DynamoDB table.

"Add Account": {
  "Type": "Task",
  "QueryLanguage": "JSONata",
  "Resource": "arn:aws:states:::dynamodb:putItem",
  "Arguments": {
    "TableName": "AccountTable",
    "Item": {
      "PK": {
        "S": "{% $uuid() %}"
      },
      "email": {
        "S": "{% $inputPayload.data.identity.email %}"
      },
      "name": {
        "S": "{% $inputPayload.data.firstname & ' ' & $inputPayload.data.lastname  %}"
      },
      "address": {
        "S": "{% $join($each($inputPayload.data.address, function($v) { $v }), ', ') %}"
      },
      "timestamp": {
        "S": "{% $now() %}"
      }
    }
  },
  "Next": "Interests"
}

Notice that you don’t apply the .$ notation in S.$ anymore as JSONata expressions reduces developer pain while building state machine ASL. Explore the additional JSONata functions accessible within Step Functions.

Advanced JSONata

JSONata’s flexibility stems from its pre-built functions, higher-order functions support, and functional programming constructs. With JSONPath, you used the advanced expressions "InputPath": "$..interests[?(@.category==home)]" to filter Home insurance related interests from the interests array. JSONata does much more than filtering. For example, you look for home insurance interests, totalAssetValue of the category type as home, and refer to existing fields like name and email as JSONata variables:

(
    $e := data.identity.email;
    $n := data.firstname & ' ' & data.lastname;
    
    data.interests[category = 'home']{
      'customer': $n,
      'email': $e,
      'totalAssetValue': $sum(estimatedValue),
      category: {type: yearBuilt}
    }
)

The result JSON will be:

{
  "customer": "Jane Doe",
  "email": "jdoe@example.com",
  "totalAssetValue": 1400000,
  "home": {
    "own": 2004,
    "business": 2009
  }
}

By following these steps, you ascend one level by collecting all of the insurance interests and their aggregated results. Notice that the category filter is no longer present.

(
    $e := data.identity.email;
    $n := data.firstname & ' ' & data.lastname;
    
    data.interests{
      'customer': $n,
      'email': $e,
      'totalAssetValue': $sum(estimatedValue),
      category: {type: yearBuilt}
    }
)

which results in:

{
  "customer": "Jane Doe",
  "email": "jdoe@example.com",
  "totalAssetValue": 1549000,
  "home": {
    "own": 2004,
    "business": 2009
  },
  "auto": {
    "car": 2012,
    "motorcycle": 2018,
    "RV": 2015
  },
  "boat": {
    "snowmobile": 2020
  }
}

Discovering complex expressions

Use the JSONata playground with your sample data to discover detailed and complex expressions that fit your requirements. The following is an example of using the JSONata playground:

Image of JSONata playground.

Figure 5: JSONata playground

Considerations

Variable Size

The maximum size of a single variable is 256Kib. This limit helps you bypass the Step Functions payload size restriction by letting you store state outputs in separate variables. While each individual variable can be up to 256Kib in size, the total size of all variables within a single Assign field cannot exceed 256Kib. Use Pass states to workaround this limitation, however, the total size of all stored variables cannot exceed 10MiB per execution.

Variable visibility

Variables are a powerful mechanism to simplify the data sharing across states. Prefer them over ResultPath, OutputPath or JSONata’s Output fields because of their ease of use and flexibility. There are two situations where you might still use Output. First, you can’t access inner-scoped variables in the outer scope. In these cases, fields in Output can help share data between different workflow levels. Second, when sending a response from the final state of the workflow, you may need to use fields in Output fields. The following transition diagram from JSONPath to JSONata provides additional details:

Image of Transition from JSONPath to JSONata.

Figure 6: Transition from JSONPath to JSONata

Additionally, variables assigned to a specific state are not accessible in that same state:

"Assign Variables": {
  "Type": "Pass",
  "Next": "Reassign Variables",
  "Assign": {
    "x": 1,
    "y": 2
  }
},
"Reassign Variables": {
  "Type": "Pass",
  "Assign": {
    "x": 5,
    "y": 10,
      ## The assignment will fail unless you define x and y in a prior state.
      ## otherwise, the value of z will be 3 instead of 15.
    "z": "{% $x+$y %}"
  },
  "Next": "Pass"
}

Best practices

Step Functions’ validation API provides semantic checks for workflows, allowing for early problem identification. To ensure safe workflow updates, it’s best to combine the validation API with versioning and aliases for incremental deployment.

Multi-line expressions in JSONata are not valid JSON. Therefore, use a single line as string delimited by a semicolon “;” where the last line returns the expression.

Mutually exclusive

Use of QueryLanguage type is mutually exclusive. Do not mix JSONPath/intrinsic functions and JSONata during variable assignments. For example, the below task fails because the variable b uses JSONata, whereas c uses an intrinsic function.

"Store Inputs": {
  "Type": "Pass",
  "QueryLanguage": "JSONata"
  "Assign": {
    "inputs": {
      "a": 123,
      "b": "{% $states.input.randomInput %}",
      "c.$": "States.MathRandom($.start, $.end)"
    }
  },
  "Next": "Average"
}

To use variables with JSONPath, set the QueryLanguage to JSONPath or remove this attribute from the task definition.

Conclusion

With variables and JSONata, AWS Step Functions now elevates the developer’s experience to write elegant workflows with simpler code in Amazon States Language (ASL) that matches with the normal programming paradigm. Developers can now build faster and write cleaner code by cutting out extra data transformation steps. These capabilities can be used in both new and existing workflows, giving you the flexibility to upgrade from JSONPath to JSONata and variables.

Variables and JSONata are available at no additional cost to customers in all the AWS regions where AWS Step Functions is available. For more information, refer to the user guide for JSONata and variables, as well as the sample application in the jsonata-variables branch.

To expand your serverless knowledge, visit Serverless Land.