AWS for Industries

fMRI Data Preprocessing on AWS using AFNI and AWS HealthOmics

Functional magnetic resonance imaging (fMRI) is an indirect measure of neural activity that uses magnetic gradients to measure blood oxygenation in the brain. Many studies use fMRI data to non-invasively measure brain activity in fields such as psychology and cognitive neuroscience, and to better understand diseases that impact brain function. These studies can produce terabytes of data that require preprocessing before analysis. This preprocessing can be completed using a mix of storage and compute options, as demonstrated in our blog post on fMRI Data Preprocessing on AWS using fMRIPrep. Scientists who do not want to manage that infrastructure, or its scaling, can use AWS HealthOmics instead.

In this blog post, I demonstrate, step by step, how to run Analysis of Functional NeuroImages (AFNI) on AWS HealthOmics to preprocess fMRI data.

Overview of solution

In this walkthrough, users upload data in NIfTI format to Amazon S3 and create a Nextflow workflow to orchestrate the running of AFNI. AWS HealthOmics pulls the data from S3 and a Docker image from Amazon Elastic Container Registry (Amazon ECR), manages infrastructure provisioning, runs AFNI to preprocess the data, and writes the processed data to an S3 bucket.
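
For example, you can stage a NIfTI file in S3 with the AWS CLI (the bucket and file names below are placeholders):

#Upload fMRI data in NIfTI format to S3 (bucket and file names are examples)
aws s3 cp sub-01_task-rest_bold.nii.gz s3://your-input-bucket/nifti/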

Figure 1: S3 bucket architecture

Walkthrough

In this post, I will walk you through:

  • Creating a HealthOmics Nextflow workflow to run AFNI
  • Running the workflow with data from Amazon S3
  • Viewing the processed fMRI data

Prerequisites

For this walkthrough, you should have the following prerequisites:

  • An AWS account
  • The AWS CLI installed and configured
  • fMRI data in NIfTI format in an Amazon S3 bucket
  • A Docker image containing AFNI in a private Amazon ECR repository
  • An IAM role that AWS HealthOmics can assume to access S3 and Amazon ECR
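
HealthOmics pulls container images from private Amazon ECR repositories. If you don't already have an AFNI image in Amazon ECR, a minimal sketch looks like the following; the account ID, region, repository name, and the afni/afni_make_build image tag are assumptions, so check Docker Hub for AFNI's current images:

#Push an AFNI image to a private ECR repository (account ID, region, and image tag are examples)
aws ecr create-repository --repository-name afni --region us-east-1
aws ecr get-login-password --region us-east-1 | \
  docker login --username AWS --password-stdin 123456789012.dkr.ecr.us-east-1.amazonaws.com
docker pull afni/afni_make_build:latest
docker tag afni/afni_make_build:latest 123456789012.dkr.ecr.us-east-1.amazonaws.com/afni:latest
docker push 123456789012.dkr.ecr.us-east-1.amazonaws.com/afni:latest

Note that you also need to grant the HealthOmics service access to the repository, for example through an ECR repository policy.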

HealthOmics Workflows

AWS HealthOmics has two components that I will be using for this walkthrough:

  • HealthOmics workflows, used to process and analyze data
  • HealthOmics runs, a mechanism for running workflows individually or as part of a group

Creating a HealthOmics workflow and run

Let’s create a HealthOmics workflow to orchestrate the running of AFNI. These workflows are defined using workflow languages like Nextflow. I’ll use the AWS CLI to create and run our workflow.

Let’s define our workflow. Create a main.nf file with the following Nextflow code, which defines an AFNI process and a workflow that invokes it.

//run the AFNI image and call the generated proc script
process afni {
    container params.afni_image          // AFNI Docker image URI passed in at run time
    publishDir "/mnt/workflow/pubdir"    // HealthOmics copies this directory to the run's output S3 location
    input:
    path niftifile
    output:
    path '*'
    script:
    """
    # generate a preprocessing script for the input dataset (afni_proc.py's default subject ID is SUBJ)
    afni_proc.py -dsets ${niftifile}
    # execute the generated proc script and capture its output
    tcsh -xef proc.SUBJ 2>&1 | tee output.proc.SUBJ
    """
}
workflow {
    // read the input NIfTI file from the S3 URI passed at run time
    data = Channel.fromPath(params.input_file)
    afni(data)
}
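
Optionally, if you have Nextflow and Docker installed locally, you can sanity-check the workflow before packaging it. The file path and image tag below are placeholders, and you may want to override publishDir for a local run since it points at the HealthOmics-managed path:

#Optional local test (file path and image tag are examples)
nextflow run main.nf \
  --input_file ./sub-01_task-rest_bold.nii.gz \
  --afni_image afni/afni_make_build:latest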

Zip the main.nf file you created and use it as the workflow definition.

#Create workflow
zip workflow_def main.nf
aws omics create-workflow \
  --name afni \
  --region us-east-1 \
  --engine NEXTFLOW \
  --definition-zip fileb://workflow_def.zip \
  --parameter-template '{"input_file": {"description":"S3 URI of the input NIfTI file"}, "afni_image": {"description":"ECR URI of the AFNI Docker image"}}'

A successful response looks like:

{
    "arn": "arn:aws:omics:us-east-1:123456789012:workflow/1234567",
    "id": "1234567",
    "status": "CREATING",
    "tags": {}
}

You can also view the workflow you created in the HealthOmics console:

Figure 2: HealthOmics console
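
The start-run call below also needs an IAM service role that HealthOmics can assume to read inputs from S3, pull the image from Amazon ECR, and write logs. A minimal sketch of the trust relationship follows; the role name is a placeholder, and you still need to attach permissions policies for S3, ECR, and CloudWatch Logs:

#Create a service role HealthOmics can assume (permissions policies not shown)
aws iam create-role \
  --role-name OmicsWorkflowRole \
  --assume-role-policy-document '{
    "Version": "2012-10-17",
    "Statement": [{
      "Effect": "Allow",
      "Principal": {"Service": "omics.amazonaws.com"},
      "Action": "sts:AssumeRole"
    }]
  }'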

Let’s run the workflow we just created to process our fMRI data:

#Run workflow
aws omics start-run \
  --workflow-id created_workflow_id \
  --role-arn created_role_arn \
  --output-uri s3://bucketpath \
  --parameters '{"input_file": "s3://bucketpath", "afni_image": "ecr_uri"}'

A successful response looks like:

{
    "arn": "arn:aws:omics:us-east-1:123456789123:run/1234567",
    "id": "1234567",
    "status": "PENDING",
    "tags": {},
    "uuid": "ab12345c-678d-e91f-2gh3-4i52j67k89lm",
    "runOutputUri": "s3://bucketpath"
}
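
You can also poll the run status from the CLI:

#Check run status
aws omics get-run --id 1234567 --region us-east-1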

Navigating to the AWS HealthOmics console, we can see infrastructure provisioning and our AFNI task running and processing our data.

Figure 3: AFNI task running and processing our data

Click “View CloudWatch logs” to confirm your run is starting.

Figure 4: CloudWatch logs

When working with AFNI, you may want to debug your script execution inside the Docker container. You can view printed errors and messages using task logging. This is very useful for confirming that AFNI functions are being fed the correct input type, and for making sure you’re working with anatomical or time-series data as you intend.

From the Run detail page, select the “Tasks” tab. This lists the tasks that the workflow is running or has completed. Find the task named “afni” and click “View Logstream”.
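
If you prefer the command line, you can list the run’s tasks and tail the logs with the CLI. The log group name below assumes the default HealthOmics run log group:

#List tasks for the run, then tail the run's log group
aws omics list-run-tasks --id 1234567 --region us-east-1
aws logs tail /aws/omics/WorkflowLog --region us-east-1 --follow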

 

Figure 5: Log stream

From here, you can see all the output you would typically see when working with AFNI in your local terminal.

Figure 6: AFNI output

Congrats! Once the process is finished, you can view your processed results in your S3 bucket.
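
For example, you can list the outputs with:

#List processed outputs (bucket path is an example)
aws s3 ls s3://bucketpath/ --recursive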

Conclusion

There are many ways to run AFNI on AWS. AWS HealthOmics provides a seamless way to orchestrate AFNI data preprocessing pipelines without managing compute and storage infrastructure. As you grow more comfortable with AWS HealthOmics, you can begin to automate data ingestion and build on the data processing pipeline we created today.

To learn more, see the AWS HealthOmics page, AWS HealthOmics User Guide, and Healthcare & Life Sciences on AWS.

Dana Owens

Dana Owens is a Startups Solutions Architect at AWS and is passionate about helping developers build bioinformatics pipelines on AWS. She has multiple years of experience working with AWS customers and specializes in healthcare and life sciences and in graph databases.