
Overview

This Guidance shows how you can run Galaxy software on AWS, letting you benefit from the ease of use of the Galaxy platform while purpose-built AWS services handle the undifferentiated heavy lifting, without compromising your security or data integrity. Galaxy is an open-source web application where you can run data-intensive jobs for biomedical research through a graphical web interface. Using AWS native services for data storage and compute, this Guidance shows how to optimize the end-to-end Galaxy platform when uploading, managing, and analyzing large datasets.

How it works

These technical details include an architecture diagram that illustrates how to use this solution effectively. The diagram shows the key components and their interactions, providing a step-by-step overview of the architecture's structure and functionality.

Well-Architected Pillars

The architecture diagram above is an example of a solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.

This Guidance uses services that allow you full visibility into your workloads through monitoring and logging, while also providing you with reliable, stable, and dependable applications. For example, with CloudWatch, you gain observability with metrics, personalized dashboards, and logs, in addition to alerts that are defined from metrics throughout this Guidance, so you can monitor the health of your workloads and minimize the impact from incidents. Also, Amazon EKS clusters can identify unhealthy containers and replace them automatically with new containers, so that your workloads are available to respond to incidents and events.
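As a sketch of the kind of CloudWatch alert the paragraph above describes, the function below builds the parameter set for a `put_metric_alarm` call against the Container Insights CPU metric for an EKS cluster. The alarm name, threshold, and evaluation windows are illustrative assumptions, not values defined by this Guidance.

```python
# Hypothetical CloudWatch alarm definition for a Galaxy EKS cluster.
# All names and thresholds here are illustrative assumptions.

def galaxy_cpu_alarm(cluster_name: str, threshold_percent: float = 80.0) -> dict:
    """Build the parameters for a CloudWatch put_metric_alarm call that
    alerts when average node CPU in the EKS cluster stays high."""
    return {
        "AlarmName": f"{cluster_name}-high-cpu",
        "Namespace": "ContainerInsights",      # emitted by CloudWatch Container Insights
        "MetricName": "node_cpu_utilization",
        "Dimensions": [{"Name": "ClusterName", "Value": cluster_name}],
        "Statistic": "Average",
        "Period": 300,                         # 5-minute evaluation windows
        "EvaluationPeriods": 3,                # sustained for 15 minutes
        "Threshold": threshold_percent,
        "ComparisonOperator": "GreaterThanThreshold",
    }

alarm = galaxy_cpu_alarm("galaxy-eks")
print(alarm["AlarmName"])  # galaxy-eks-high-cpu
```

In practice, you would pass this dictionary to `boto3`'s CloudWatch client (`cloudwatch.put_metric_alarm(**alarm)`) together with an alarm action such as an SNS topic.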

Read the Operational Excellence whitepaper 

By default, all incoming connections to Galaxy originate from the public Internet and are directed to the Galaxy server through a publicly accessible Application Load Balancer. Alternatively, this Guidance can be configured to use an internal Application Load Balancer in a private subnet, where traffic is routed through a virtual private network (VPN) connection or through AWS Direct Connect. In both cases, compute resources are deployed within private subnets and are not directly accessible from the public Internet. Galaxy handles application-level authentication and authorization through its own user management or through Active Directory Federation Services (AD FS).

Read the Security whitepaper 

To implement a reliable application-level architecture, the individual components of this Guidance are deployed as loosely coupled Kubernetes pods. Also, the message broker is the fully managed service Amazon MQ, which, in our default configuration, includes a standby server. Finally, the shared filesystem is provided through Amazon EFS and is highly available, as is the database provided through Aurora Serverless.

Read the Reliability whitepaper 

Amazon EKS is an AWS native service, and this Guidance focuses on cost-efficient ways to deploy and configure it with selected resources so that you can achieve a reliable, highly available Kubernetes application with low operational costs. The Amazon EKS architecture spans multiple Availability Zones for high availability. While some traffic will flow between subnets in different Availability Zones, the added latency should not have a significant performance impact.

Amazon EFS is designed to provide serverless, fully elastic file storage that allows you to share file data without the need to provision or manage storage capacity and performance. It provides a Portable Operating System Interface (POSIX) file system with the necessary performance for bioinformatic workloads.

Read the Performance Efficiency whitepaper 

A significant factor in data transfer costs within Amazon EKS clusters is calls to Kubernetes services from external clients through Application Load Balancers. These calls incur data transfer charges when they result in communication between pods running in different Availability Zones.
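To make the cross-AZ cost mechanism above concrete, here is a minimal estimator. The default rate is an illustrative assumption, not a quoted price; check current AWS pricing for your Region before relying on the numbers.

```python
def cross_az_transfer_cost(gb_transferred: float, rate_per_gb: float = 0.01) -> float:
    """Estimate intra-region, cross-AZ data transfer cost in USD.

    The default rate is illustrative only; consult current AWS pricing.
    Cross-AZ traffic is typically billed in each direction, so the
    transferred volume is counted twice.
    """
    return round(2 * gb_transferred * rate_per_gb, 2)

print(cross_az_transfer_cost(100))  # 2.0
```

This is why keeping chatty pod-to-pod communication within a single Availability Zone (for example, via topology-aware routing) can reduce costs.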

Because the autoscaling minimum, maximum, and desired number of compute nodes are highly configurable, along with the corresponding Amazon Elastic Compute Cloud (Amazon EC2) instance parameters, compute resources are managed efficiently.
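The interplay of minimum, maximum, and desired capacity can be sketched as a simple clamp. The sizing heuristic below (jobs per node) is an assumption for illustration, not logic taken from this Guidance.

```python
def desired_capacity(pending_jobs: int, jobs_per_node: int,
                     min_nodes: int, max_nodes: int) -> int:
    """Compute a desired node count from queued Galaxy jobs, clamped to
    the autoscaling group's configured minimum and maximum."""
    # Round up: a partially full node still has to exist.
    needed = -(-pending_jobs // jobs_per_node)
    return max(min_nodes, min(needed, max_nodes))

print(desired_capacity(pending_jobs=25, jobs_per_node=4,
                       min_nodes=2, max_nodes=10))  # 7
```

With no pending jobs the group idles at the minimum, and a burst of work can never push it past the maximum, which bounds both cost and capacity.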

Finally, serverless architectures have a pay-per-value pricing model and scale based on demand. This includes the Aurora Serverless database and Amazon EFS. We recommend you tag AWS resources that belong to a project programmatically, and then create custom reports in AWS Cost Explorer using the tags to visualize and monitor costs.
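As a sketch of the tagging recommendation above, the function below builds the `Filter` argument for a Cost Explorer `get_cost_and_usage` call that selects only resources carrying a given project tag. The tag key `project` is an assumption; it works only if you have activated it as a cost allocation tag.

```python
def project_cost_filter(project: str) -> dict:
    """Build the Filter argument for a Cost Explorer get_cost_and_usage
    call, selecting only resources tagged with the given project name.

    Assumes a "project" cost-allocation tag has been activated; the tag
    key is illustrative, not mandated by this Guidance.
    """
    return {
        "Tags": {
            "Key": "project",
            "Values": [project],
            "MatchOptions": ["EQUALS"],
        }
    }

print(project_cost_filter("galaxy")["Tags"]["Values"])  # ['galaxy']
```

The same filter shape can back a custom Cost Explorer report, so per-project spend on the Aurora Serverless database, Amazon EFS, and the EKS node groups is visible at a glance.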

Read the Cost Optimization whitepaper 

By choosing right-sized instances, you use only the resources you need, reducing unnecessary emissions. Also, by using services with dynamic scaling, you minimize the environmental impact of the backend services and scale compute resources based on your workload's needs. Additionally, the use of fully managed services, such as Amazon EFS, minimizes the required resources.

Read the Sustainability whitepaper 

Deploy with confidence

Ready to deploy? Review the sample code on GitHub for detailed deployment instructions, and deploy it as-is or customize it to fit your needs.

Go to sample code

Disclaimer

The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.