Faster Genome Search via AWS X-Ray

There are many arguments for building a serverless architecture. Chief among them: users pay only for the computing power they use. A serverless workflow can still be tuned, and a good place to begin is by identifying and eliminating its bottlenecks. For example, CSIRO Bioinformatics in Sydney, Australia, used AWS X-Ray to systematically evaluate their serverless architecture and, by doing so, improved their runtime by 80%.

How CSIRO got started

The team at CSIRO Bioinformatics built an innovative solution to one of its most compelling genomic research problems using the AWS serverless architecture pattern. They built a web-based tool called GT-Scan2 (shown above), which helps researchers spot potential edit points in genomic samples for CRISPR-Cas9 research. Because search is a bursty workload, the CSIRO team built its solution on AWS for maximum speed and scale at the lowest cost. Their implementation was summarized in a blog post by Jeff Barr.

The AWS solution

The architecture diagram below shows how CSIRO used AWS Lambda, along with Amazon DynamoDB, Amazon API Gateway, Amazon Simple Storage Service (Amazon S3), and Amazon Simple Notification Service (Amazon SNS), to build an architecture for its complex research workflow.

GT-Scan2 architecture with the more performant tuscan Regressor Lambda function.

Fine-tuning performance

As GT-Scan2 became available to the bioinformatics research community, the team wanted to make sure its performance was as fast as possible. Reviewing the individual Amazon CloudWatch logs for each running service in the workflow did not yield any actionable information for optimizing speed. When the AWS X-Ray log aggregator and visualizer service became available, they decided to see whether it could help them better understand the overhead of each service in GT-Scan2.

Screenshot of the X-Ray dashboard listing the average runtimes.

Identifying the bottleneck

Using the X-Ray service map visualization view shown above, they were able to spot the Lambda function that was causing the bottleneck in their application. At an average of 42 seconds per execution, this Lambda function was running significantly slower than the rest of the workflow. Using the X-Ray drilldown functionality, they examined the CloudWatch logs associated with this particular Lambda function, and replaced it with one that did not rely on a slow external dependency. They also managed to extend the functionality to cover another slow-running Lambda function.

The team plotted the average execution time for each of the running Lambda instances as observed in the X-Ray dashboard, which informed the replacement of two Lambda functions (WuCRISPR and sgRNAScorer) with a more performant alternative (tuscan Regressor).
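The comparison described above, averaging execution time per Lambda function and picking out the slowest, can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names (`function_a`, `function_b`, `function_c`) and the sample durations are hypothetical, chosen so that one function averages 42 seconds as in the text. They are not CSIRO's measurements, and the code does not call the X-Ray API.

```python
from collections import defaultdict
from statistics import mean


def average_durations(records):
    """Group (function_name, seconds) records and return the mean per function."""
    grouped = defaultdict(list)
    for name, seconds in records:
        grouped[name].append(seconds)
    return {name: mean(values) for name, values in grouped.items()}


def find_bottleneck(records):
    """Return (name, average_seconds) for the function with the highest mean."""
    averages = average_durations(records)
    slowest = max(averages, key=averages.get)
    return slowest, averages[slowest]


# Hypothetical sample data, not real trace output.
sample_records = [
    ("function_a", 2.0), ("function_a", 4.0),
    ("function_b", 40.0), ("function_b", 44.0),
    ("function_c", 1.0),
]

bottleneck_name, bottleneck_avg = find_bottleneck(sample_records)
print(f"Slowest on average: {bottleneck_name} ({bottleneck_avg:.1f} s)")
```

In practice the service map surfaces these averages directly; the sketch only makes the underlying comparison explicit.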

The result

The team updated the Lambda application code and redeployed it. This improved the average execution time of each GT-Scan2 search by 80% over the previous version. After the Lambda rewrite, the team used the X-Ray service map to trace the application and verify that their changes had produced a consistent performance improvement. In addition, they added custom X-Ray annotations to understand application flow and component execution, such as file downloads, with even more granularity. See the output from one trace, including a custom annotation (write_job_to_db), below.
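To show what a custom annotation adds to a trace, here is a minimal standard-library sketch that times one named step and attaches key-value annotations to it. It is a stand-in for the idea, not the X-Ray SDK and not the team's code: the `timed_step` helper, the `job_id` and `status` keys, and their values are all hypothetical. Only the step name `write_job_to_db` comes from the text. In a real Lambda function the X-Ray SDK for Python would play this role, with a subsegment and its `put_annotation` method.

```python
import time
from contextlib import contextmanager


@contextmanager
def timed_step(trace, name, **annotations):
    """Record the duration and annotations of one named step in `trace`."""
    entry = {"name": name, "annotations": dict(annotations)}
    start = time.perf_counter()
    try:
        yield entry
    finally:
        # Always record the step, even if the body raised an exception.
        entry["duration_s"] = time.perf_counter() - start
        trace.append(entry)


trace = []
with timed_step(trace, "write_job_to_db", job_id="example-job") as step:
    # Placeholder for the actual database write (hypothetical).
    step["annotations"]["status"] = "ok"

for entry in trace:
    print(entry["name"], entry["annotations"], f"{entry['duration_s']:.6f} s")
```

Wrapping each component this way gives per-step timings that can be searched by annotation, which is the extra granularity the annotations were added for.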

As the AWS Cloud offers more options for cloud architectures, organizations should be encouraged to test tools like AWS X-Ray for application monitoring during deployments to better understand application performance.

Acknowledgment: Blog written with input from Denis Bauer and Aidan O’Brien (CSIRO) as well as Lynn Langit.

AWS Public Sector Blog