AWS HPC Blog

Using AWS Batch Console Support for Step Functions Workflows

Last year, we published the Genomics Secondary Analysis Using AWS Step Functions and AWS Batch solution as a companion solution to the Genomics Data Transfer, Analytics, and Machine Learning Using AWS Services whitepaper. Since then, many customers have used the secondary analysis solution to automate their bioinformatics pipelines in AWS. A common pain point expressed by customers is that it is difficult to associate Batch jobs with tasks in bioinformatics workflows orchestrated using Step Functions. Customers have to navigate between the Batch and Step Functions consoles and correlate jobs with tasks to track down the root cause of a job failure or to make changes to a task in a workflow. This can be particularly challenging if you have hundreds or thousands of jobs and workflows running.

With the launch of the Step Functions Batch Console Integration, you can now visualize where and how your Batch jobs are composed into workflows without leaving the Batch console. This allows you to navigate more easily between your Batch jobs, the workflows they are involved in, and their workflow executions, bringing together two core AWS services to streamline management of your business-critical workflows.

In the case of the secondary analysis solution, this integration removes some of the friction of switching between consoles when running workflows. In this blog post, we will show you how this integration works in practice.

Walkthrough of using the new Step Functions Batch Console integration

The secondary analysis solution creates a scalable environment in AWS to develop, build, deploy, and run genomics secondary analysis workflows, for example, processing raw whole genome sequences into variant calls. This solution includes an AWS Step Functions state machine that implements a simple genomics variant calling workflow using BWA-MEM, SAMtools, and BCFtools. The workflow aligns raw FASTQ files to a reference sequence and calls variants on a specified list of chromosomes using dynamic parallelism. The workflow is defined using Amazon States Language (ASL) and deployed as an AWS Step Functions state machine.
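To make the shape of such a workflow concrete, here is a minimal Python sketch that builds an ASL definition with one alignment task followed by a Map state that fans out over a list of chromosomes. It is not the solution's actual definition: the job queue, the job definition names, the state names other than BwaMem, and the `$.chromosomes` input path are all assumptions made for illustration, and the SAMtools step is left out for brevity.

```python
import json

# Hypothetical names; the deployed solution uses its own queue and job definitions.
JOB_QUEUE = "example-job-queue"
ALIGN_JOB_DEFINITION = "example-bwa-mem"
CALL_JOB_DEFINITION = "example-bcftools"


def batch_task(job_name, job_definition, next_state=None):
    """Return a Task state that submits an AWS Batch job and waits for it to finish."""
    state = {
        "Type": "Task",
        # Step Functions service integration for AWS Batch (run a job and wait).
        "Resource": "arn:aws:states:::batch:submitJob.sync",
        "Parameters": {
            "JobName": job_name,
            "JobQueue": JOB_QUEUE,
            "JobDefinition": job_definition,
        },
        # Discard the job result so the original input passes to the next state.
        "ResultPath": None,
    }
    if next_state:
        state["Next"] = next_state
    else:
        state["End"] = True
    return state


def build_definition():
    """Build a two-step definition: align once, then call variants per chromosome."""
    return {
        "StartAt": "BwaMem",
        "States": {
            "BwaMem": batch_task("bwa-mem", ALIGN_JOB_DEFINITION, "CallVariantsByChromosome"),
            # Map provides the dynamic parallelism: one iteration per list item.
            "CallVariantsByChromosome": {
                "Type": "Map",
                "ItemsPath": "$.chromosomes",  # assumed name of the input list
                "Iterator": {
                    "StartAt": "CallVariants",
                    "States": {
                        "CallVariants": batch_task("bcftools", CALL_JOB_DEFINITION)
                    },
                },
                "End": True,
            },
        },
    }


definition = build_definition()
print(json.dumps(definition, indent=2))
```

The Map state is what lets the number of variant calling jobs follow the length of the chromosome list supplied at execution time rather than being fixed in the definition.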

Let’s open the solution Step Functions state machine in the Batch console and modify the align task to include the sample ID and sample name in the read header in the aligned output file, the Binary Alignment Map (BAM).

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account
  • The secondary analysis solution set up in an AWS account. For this walkthrough, you will need to deploy the Genomics Secondary Analysis Using AWS Step Functions and AWS Batch solution within your account. Click on the Launch in the AWS Console button to launch the solution AWS CloudFormation template in your account. Once you do, you will be able to view the solution workflow in the new console integration and proceed with the instructions.

Open your AWS Step Functions state machine definition in the AWS Batch management console

To edit the solution workflow in the Batch management console, click on Workflow Orchestration under related services.

Then select your workflow. In my case, the solution workflow is called GenomicsWorkflowCode-WorkflowSimple-152R19BFQY256. This takes me to the workflow definition. From this screen, you have the option to view, edit, or execute the workflow.

View your AWS Step Functions state machine in the AWS Batch console

Clicking on View in Step Functions takes you to the state machine page in Step Functions for your workflow.

Edit your AWS Step Functions state machine in the AWS Batch console

You can also click on Edit to open the state machine definition in the Step Functions console for your workflow. Let’s modify the Command in the BwaMem task to include -R "@RG\\tID:${SAMPLE_ID}\\tSM:${SAMPLE_NAME}" in the state machine definition. Also add SAMPLE_NAME as an environment variable. Then save the changes.
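The same edit can be sketched in Python against a decoded copy of the task state. The task excerpt below is hypothetical: we assume the command is stored as a single shell string under ContainerOverrides and that SAMPLE_ID is already passed as an environment variable, which may not match the deployed definition exactly. Note that the decoded string holds a single backslash before each `t`; `json.dumps` writes it back out as the doubled `\\t` you type in the console.

```python
import copy
import json

# Decoded form of the read-group option; json.dumps writes each backslash as "\\".
READ_GROUP_OPTION = r'-R "@RG\tID:${SAMPLE_ID}\tSM:${SAMPLE_NAME}"'


def add_read_group(task_state):
    """Return a copy of a Batch task state with the read-group option and SAMPLE_NAME added."""
    updated = copy.deepcopy(task_state)
    overrides = updated["Parameters"]["ContainerOverrides"]

    # Assumption: the command is one shell string that starts with "bwa mem".
    command = overrides["Command"][0]
    if "-R " not in command:
        overrides["Command"][0] = command.replace(
            "bwa mem", "bwa mem " + READ_GROUP_OPTION, 1
        )

    # Pass SAMPLE_NAME from the execution input into the container environment.
    environment = overrides.setdefault("Environment", [])
    if not any(item["Name"] == "SAMPLE_NAME" for item in environment):
        environment.append({"Name": "SAMPLE_NAME", "Value.$": "$.SAMPLE_NAME"})
    return updated


# Hypothetical excerpt of the BwaMem task before the edit.
bwa_mem_task = {
    "Type": "Task",
    "Resource": "arn:aws:states:::batch:submitJob.sync",
    "Parameters": {
        "ContainerOverrides": {
            "Command": [
                "bwa mem -t 8 -p -o ${SAMPLE_ID}.sam ${REFERENCE}.fasta ${SAMPLE_ID}_*1*.fastq.gz"
            ],
            "Environment": [{"Name": "SAMPLE_ID", "Value.$": "$.SAMPLE_ID"}],
        }
    },
}

updated_task = add_read_group(bwa_mem_task)
print(json.dumps(updated_task["Parameters"]["ContainerOverrides"], indent=2))
```

The function works on a copy and skips changes that are already present, so running it twice leaves the task unchanged.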

Execute your AWS Step Functions state machine in the AWS Batch console

You can also execute the state machine by clicking on Execute for the state machine in the Batch console. This takes you to the execution page for your state machine in the Step Functions console. Notice that we are passing the SAMPLE_ID and SAMPLE_NAME as parameters. Click Start Execution to run the state machine.
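If you would rather start the execution from code, the following sketch builds the same kind of input and hands it to a Step Functions client. It assumes the state machine needs only SAMPLE_ID and SAMPLE_NAME (the solution workflow may expect additional input fields) and that `sfn_client` is a `boto3.client("stepfunctions")` object. The helper names are ours, and the sample values are taken from the log output shown later in this post.

```python
import json


def build_execution_input(sample_id, sample_name):
    """Serialize the execution input as the JSON string Step Functions expects."""
    return json.dumps({"SAMPLE_ID": sample_id, "SAMPLE_NAME": sample_name})


def start_workflow(sfn_client, state_machine_arn, sample_id, sample_name):
    """Start one execution and return its ARN.

    sfn_client is expected to behave like boto3.client("stepfunctions").
    """
    response = sfn_client.start_execution(
        stateMachineArn=state_machine_arn,
        input=build_execution_input(sample_id, sample_name),
    )
    return response["executionArn"]


# Preview the input without calling AWS.
print(build_execution_input("NIST7035", "Sample 1"))
```

Passing the client in as an argument keeps the helper easy to try with a stub before pointing it at a real account.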

You can see how BWA was called in the log stream for the bwa-mem job. It should report out the command that was run:

[command]: bwa mem -t 8 -p -o NIST7035.sam -R "@RG\tID:NIST7035\tSM:Sample 1" Homo_sapiens_assembly38.fasta NIST7035_*1*.fastq.gz
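The logged command shows a single backslash before each `t`, while the state machine definition uses two. The short sketch below walks through why, under two assumptions: that the option is stored as a JSON string in the definition, and that the container expands `${SAMPLE_ID}` and `${SAMPLE_NAME}` from its environment. Python's `string.Template` stands in for that expansion here and only approximates what a shell would do.

```python
import json
from string import Template

# The option as it would appear inside the state machine definition (JSON text).
json_text = r'"-R \"@RG\\tID:${SAMPLE_ID}\\tSM:${SAMPLE_NAME}\""'

# Step 1: JSON decoding turns each "\\" into one backslash and "\"" into a quote.
decoded = json.loads(json_text)

# Step 2: the variables are filled in from the environment (approximated here).
environment = {"SAMPLE_ID": "NIST7035", "SAMPLE_NAME": "Sample 1"}
expanded = Template(decoded).substitute(environment)

print(decoded)
print(expanded)
```

The doubled backslash is therefore only JSON escaping. What reaches bwa is the two-character sequence backslash and `t`.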

Cleaning up

To avoid incurring future charges, delete the solution by deleting the solution setup stack. You can learn more about deleting the solution in the Uninstalling the Solution section of the Genomics Secondary Analysis Using AWS Step Functions and AWS Batch implementation guide.

Conclusion

The new AWS Batch support for Step Functions workflows makes it easy to edit and execute Step Functions state machines from the Batch console. You can learn more about the AWS Batch management console support for Step Functions workflows in the Orchestrate AWS Batch jobs documentation.

Ryan Ulaszek

Ryan Ulaszek is a Solutions Architect specializing in Life Sciences. He works with startup customers around the world to architect solutions on AWS. His passion is working at the intersection of science and technology. In his spare time, he enjoys camping and spending time with his wife and two kids.