AWS Architecture Blog
Tag: Genomics
Genomics workflows, Part 7: analyze public RNA sequencing data using AWS HealthOmics
Genomics workflows process petabyte-scale datasets on large pools of compute resources. In this blog post, we discuss how life science organizations can use Amazon Web Services (AWS) to run transcriptomic sequencing data analysis using public datasets. This allows users to quickly test research hypotheses against larger datasets in support of clinical diagnostics. We use AWS […]
Genomics workflows, Part 6: cost prediction
Genomics workflows run on large pools of compute resources and take petabyte-scale datasets as inputs. Workflow runs can cost as much as hundreds of thousands of US dollars. Given this large scale, scientists want to estimate the projected cost of their genomics workflow runs before deciding to launch them. In Part 6 of this series, […]
Genomics workflows, Part 5: automated benchmarking
Launching and running genomics workflows can take hours and involves large pools of compute instances that process data at a petabyte scale. Benchmarking helps you evaluate workflow performance and discover faster and cheaper ways of running them. In practice, performance evaluations happen irregularly because of the associated heavy lifting. In this blog post, we discuss […]
Genomics workflows, Part 4: processing archival data
Genomics workflows analyze data at petabyte scale. After processing is complete, data is often archived in cold storage classes. In some cases, like studies on the association of DNA variants against larger datasets, archived data is needed for further processing. This means manually initiating the restoration of each archived object and monitoring the progress. Scientists […]
Genomics workflows, Part 3: automated workflow manager
Genomics workflows are high-performance computing workloads. Life-science research teams make use of various genomics workflows. With each invocation, they specify custom sets of data and processing steps, and translate them into commands. Furthermore, team members stay to monitor progress and troubleshoot errors, which can be cumbersome, non-differentiated, administrative work. In Part 3 of this series, […]
Genomics workflows, Part 2: simplify Snakemake launches
Genomics workflows are high-performance computing workloads. In Part 1 of this series, we demonstrated how life-science research teams can focus on scientific discovery without the associated heavy lifting. We used regenie for large genome-wide association studies. Our design pattern built on AWS Step Functions with AWS Batch and Amazon FSx for Lustre. In Part 2, […]
Genomics workflows, Part 1: automated launches
Genomics workflows are high-performance computing workloads. Traditionally, they run on-premises with a collection of scripts. Scientists run and manage these workflows manually, which slows down the product development lifecycle. Scientists spend time to administer workflows and handle errors on a day-to-day basis. They also lack sufficient compute capacity on-premises. In Part 1 of this series, […]
Architecture Monthly Magazine: Genomics
The field of genomics has made huge strides in the last 20 years. Genomics organizations and researchers are rising to the many challenges we face today, and seeking improved methods for future needs. Amazon Web Services (AWS) provides an array of services that can help the genomics industry with securely handling and interpreting genomics data, […]