AWS Architecture Blog

Rostislav Markov

Author: Rostislav Markov

Rostislav is principal architect with AWS Professional Services. As technical leader in AWS Industries, he works with AWS customers and partners on their cloud transformation programs. Outside of work, he enjoys spending time with his family outdoors, playing tennis, and skiing.

Genomics workflows, Part 7: analyze public RNA sequencing data using AWS HealthOmics

Genomics workflows process petabyte-scale datasets on large pools of compute resources. In this blog post, we discuss how life science organizations can use Amazon Web Services (AWS) to run transcriptomic sequencing data analysis using public datasets. This allows users to quickly test research hypotheses against larger datasets in support of clinical diagnostics. We use AWS […]

This visual summarizes the cost prediction and model training processes. Users request cost predictions for future workflow runs on a web frontend hosted in AWS Amplify. The frontend passes the requests to an Amazon API Gateway endpoint with Lambda integration. The Lambda function retrieves the suitable model endpoint from the DynamoDB table and invokes the model via the Amazon SageMaker API. Model training runs on a schedule and is orchestrated by an AWS Step Functions state machine. The state machine queries training datasets from the DynamoDB table. If the new model performs better, it is registered in the SageMaker model registry. Otherwise, the state machine sends a notification to an Amazon Simple Notification Service topic stating that there are no updates.

Genomics workflows, Part 6: cost prediction

Genomics workflows run on large pools of compute resources and take petabyte-scale datasets as inputs. Workflow runs can cost as much as hundreds of thousands of US dollars. Given this large scale, scientists want to estimate the projected cost of their genomics workflow runs before deciding to launch them. In Part 6 of this series, […]

Intelligent search bot for enterprise document management systems

Simplify document search at scale with intelligent search bot on AWS

Enterprise document management systems (EDMS) manage the lifecycle and distribution of documents. They often rely on keyword-based search functionality. However, it increasingly becomes hard to discover documents as such repositories grow to tens of thousands of items. In this blog, we discuss how Amazon Web Services (AWS) built an intelligent search bot on top of […]

Serverless data archiving and retrieval

Reduce archive cost with serverless data archiving

For regulatory reasons, decommissioning core business systems in financial services and insurance (FSI) markets requires data to remain accessible years after the application is retired. Traditionally, FSI companies either outsourced data archiving to third-party service providers, which maintained application replicas, or purchased vendor software to query and visualize archival data. In this blog post, we […]

Automated benchmarking of genomics workflows

Genomics workflows, Part 5: automated benchmarking

Launching and running genomics workflows can take hours and involves large pools of compute instances that process data at a petabyte scale. Benchmarking helps you evaluate workflow performance and discover faster and cheaper ways of running them. In practice, performance evaluations happen irregularly because of the associated heavy lifting. In this blog post, we discuss […]

Solution architecture for S3 Glacier object restore

Genomics workflows, Part 4: processing archival data

Genomics workflows analyze data at petabyte scale. After processing is complete, data is often archived in cold storage classes. In some cases, like studies on the association of DNA variants against larger datasets, archived data is needed for further processing. This means manually initiating the restoration of each archived object and monitoring the progress. Scientists […]

Workflow manager for genomics workflows

Genomics workflows, Part 3: automated workflow manager

Genomics workflows are high-performance computing workloads. Life-science research teams make use of various genomics workflows. With each invocation, they specify custom sets of data and processing steps, and translate them into commands. Furthermore, team members stay to monitor progress and troubleshoot errors, which can be cumbersome, non-differentiated, administrative work. In Part 3 of this series, […]

Solution architecture for Snakemake with Tibanna on AWS

Genomics workflows, Part 2: simplify Snakemake launches

Genomics workflows are high-performance computing workloads. In Part 1 of this series, we demonstrated how life-science research teams can focus on scientific discovery without the associated heavy lifting. We used regenie for large genome-wide association studies. Our design pattern built on AWS Step Functions with AWS Batch and Amazon FSx for Lustre. In Part 2, […]

Solution overview for automating regenie workflows on AWS

Genomics workflows, Part 1: automated launches

Genomics workflows are high-performance computing workloads. Traditionally, they run on-premises with a collection of scripts. Scientists run and manage these workflows manually, which slows down the product development lifecycle. Scientists spend time to administer workflows and handle errors on a day-to-day basis. They also lack sufficient compute capacity on-premises. In Part 1 of this series, […]