AWS Architecture Blog
Empower your teams with modern architecture governance
Modern cloud architectures demand a new governance approach. In this post, we show how collaborative architecture oversight can transform team performance through automation, self-service platforms, and distributed decision-making.
Genomics workflows, Part 7: analyze public RNA sequencing data using AWS HealthOmics
Genomics workflows process petabyte-scale datasets on large pools of compute resources. In this blog post, we discuss how life science organizations can use Amazon Web Services (AWS) to run transcriptomic sequencing data analysis using public datasets. This allows users to quickly test research hypotheses against larger datasets in support of clinical diagnostics. We use AWS […]
Genomics workflows, Part 6: cost prediction
Genomics workflows run on large pools of compute resources and take petabyte-scale datasets as inputs. Workflow runs can cost as much as hundreds of thousands of US dollars. Given this large scale, scientists want to estimate the projected cost of their genomics workflow runs before deciding to launch them. In Part 6 of this series, […]
Simplify document search at scale with intelligent search bot on AWS
Enterprise document management systems (EDMS) manage the lifecycle and distribution of documents. They often rely on keyword-based search functionality. However, documents become increasingly hard to discover as these repositories grow to tens of thousands of items. In this blog post, we discuss how Amazon Web Services (AWS) built an intelligent search bot on top of […]
Reduce archive cost with serverless data archiving
For regulatory reasons, decommissioning core business systems in financial services and insurance (FSI) markets requires data to remain accessible years after the application is retired. Traditionally, FSI companies either outsourced data archiving to third-party service providers, which maintained application replicas, or purchased vendor software to query and visualize archival data. In this blog post, we […]
Genomics workflows, Part 5: automated benchmarking
Launching and running genomics workflows can take hours and involves large pools of compute instances that process data at a petabyte scale. Benchmarking helps you evaluate workflow performance and discover faster and cheaper ways of running them. In practice, performance evaluations happen irregularly because of the associated heavy lifting. In this blog post, we discuss […]
Genomics workflows, Part 4: processing archival data
Genomics workflows analyze data at petabyte scale. After processing is complete, data is often archived in cold storage classes. In some cases, like studies on the association of DNA variants against larger datasets, archived data is needed for further processing. This means manually initiating the restoration of each archived object and monitoring the progress. Scientists […]
Genomics workflows, Part 3: automated workflow manager
Genomics workflows are high-performance computing workloads. Life-science research teams make use of various genomics workflows. With each invocation, they specify custom sets of data and processing steps, and translate them into commands. Furthermore, team members must stay on hand to monitor progress and troubleshoot errors, which can be cumbersome, undifferentiated administrative work. In Part 3 of this series, […]
Genomics workflows, Part 2: simplify Snakemake launches
Genomics workflows are high-performance computing workloads. In Part 1 of this series, we demonstrated how life-science research teams can focus on scientific discovery without the associated heavy lifting. We used regenie for large genome-wide association studies. Our design pattern built on AWS Step Functions with AWS Batch and Amazon FSx for Lustre. In Part 2, […]
Genomics workflows, Part 1: automated launches
Genomics workflows are high-performance computing workloads. Traditionally, they run on-premises with a collection of scripts. Scientists run and manage these workflows manually, which slows down the product development lifecycle. Scientists spend time administering workflows and handling errors on a day-to-day basis. They also lack sufficient compute capacity on-premises. In Part 1 of this series, […]









