Guidance for Scale-Out Computing on AWS
Overview
How it works
This architecture diagram shows how to accelerate the product development process.
Deploy with confidence
Everything you need to launch this Guidance in your account is right here
We'll walk you through it
See the implementation guide for additional customization options and service configurations that you can tailor to your specific needs.
Let's make it happen
Ready to deploy? The sample code on GitHub includes detailed instructions for deploying this Guidance as is or customizing it to fit your needs.
Well-Architected Pillars
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
Amazon OpenSearch Service automatically ingests and retains critical cluster and job metadata, enabling long-term data analysis and business recommendations. Amazon CloudWatch monitors HPC and visualization node metrics in near real time, so you can detect anomalies and tune system performance. Visualizing job information, including runtime, license utilization, pricing, and resource allocation, helps you right-size the compute infrastructure.
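As a rough illustration of the job metadata described above, the following sketch builds one JSON document per finished job. The field names, the `job_document` helper, and the flat per-node hourly price are assumptions made for this example; they are not the schema this Guidance actually writes to Amazon OpenSearch Service.

```python
import json
from datetime import datetime, timezone


def job_document(job_id, queue, nodes, start, end, price_per_node_hour, licenses_used):
    """Build a hypothetical job-metadata record suitable for indexing."""
    runtime_hours = (end - start).total_seconds() / 3600
    return {
        "job_id": job_id,
        "queue": queue,
        "nodes": nodes,  # resource allocation
        "runtime_hours": round(runtime_hours, 2),
        # Simplified pricing model: nodes x hours x flat hourly price (assumption).
        "estimated_cost_usd": round(runtime_hours * nodes * price_per_node_hour, 2),
        "licenses_used": licenses_used,  # license utilization
    }


doc = job_document(
    job_id="1234",
    queue="cae-simulation",  # hypothetical queue name
    nodes=4,
    start=datetime(2024, 1, 1, 0, 0, tzinfo=timezone.utc),
    end=datetime(2024, 1, 1, 2, 30, tzinfo=timezone.utc),
    price_per_node_hour=1.5,
    licenses_used={"solver": 8},
)
print(json.dumps(doc, indent=2))
```

Records shaped like this can be aggregated by queue, user, or license feature to produce the kind of dashboards the paragraph mentions.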
Security
Scoped AWS Identity and Access Management (IAM) policies grant only the minimum required permissions. Multiple Amazon EC2 security groups limit network traffic and enhance protection. Sensitive information, such as HTTPS certificates and directory service credentials, is securely stored in AWS Certificate Manager (ACM) and AWS Secrets Manager, respectively. If single sign-on (SSO) is enabled, SAML authentication is offloaded to Amazon Cognito, providing a secure and scalable authentication solution.
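To make "scoped" concrete, here is a minimal sketch of a least-privilege IAM policy document that allows reading a single secret and nothing else. The `scoped_secret_read_policy` helper and the example ARN are hypothetical; the policies that this Guidance deploys may be structured differently.

```python
import json


def scoped_secret_read_policy(secret_arn: str) -> dict:
    """Return an IAM policy document that allows reading exactly one secret."""
    if "*" in secret_arn:
        # A wildcard resource would defeat the purpose of scoping the policy.
        raise ValueError("use a specific secret ARN, not a wildcard")
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": ["secretsmanager:GetSecretValue"],
                "Resource": [secret_arn],
            }
        ],
    }


# Hypothetical ARN, used only for illustration.
EXAMPLE_ARN = (
    "arn:aws:secretsmanager:us-east-1:111122223333:"
    "secret:example-directory-credentials"
)
policy = scoped_secret_read_policy(EXAMPLE_ARN)
print(json.dumps(policy, indent=2))
```

Naming one action and one resource keeps the effect of a leaked credential limited to that secret.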
Reliability
Elastic Load Balancing (ELB) distributes traffic across multiple Availability Zones, enhancing the reliability of HPC and virtual desktop infrastructure (VDI) workloads. Deploying the virtual private cloud (VPC) with multiple subnets provides high availability and broader access to Amazon EC2 capacity, mitigating the risk of capacity constraints that could impact tightly coupled jobs.
Performance Efficiency
AWS compute, storage, and networking options can be matched to the unique performance requirements of computer-aided engineering (CAE) simulations. Elastic Fabric Adapter (EFA) reduces inter-node communication latency for large-scale HPC workloads. High-performance or parallel file systems, such as Amazon FSx for Lustre, handle I/O-intensive workloads. The high-performance remote display protocol of Amazon DCV helps you improve the user experience of graphically intensive workloads, such as CAD.
Cost Optimization
AWS Budgets provides guardrails to prevent over-provisioning of compute and storage resources beyond the allocated budget threshold. AWS Budgets is integrated with the HPC job submission queues so that spend against a queue or project budget cannot exceed customer-defined thresholds. AWS cost allocation tags give administrators visibility into current spend at the project, team, user, or service level to help ensure accurate accounting across AWS resources.
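The budget guardrail described above can be pictured as a check that runs before a job is accepted into a queue. The sketch below is a simplified model under stated assumptions: the `QueueBudget` type, the `can_submit` function, and the flat per-node hourly price are all hypothetical, and the real integration with AWS Budgets is not shown.

```python
from dataclasses import dataclass


@dataclass
class QueueBudget:
    limit_usd: float  # customer-defined threshold for the queue or project
    spent_usd: float  # spend recorded so far in the budget period


def estimate_job_cost(nodes: int, hours: float, price_per_node_hour: float) -> float:
    """Estimate job cost with a flat per-node hourly price (assumption)."""
    return nodes * hours * price_per_node_hour


def can_submit(budget: QueueBudget, job_cost_usd: float) -> bool:
    """Accept the job only if it keeps the queue within its budget."""
    return budget.spent_usd + job_cost_usd <= budget.limit_usd


budget = QueueBudget(limit_usd=1000.0, spent_usd=900.0)
large_job = estimate_job_cost(nodes=4, hours=10.0, price_per_node_hour=3.0)
small_job = estimate_job_cost(nodes=2, hours=10.0, price_per_node_hour=3.0)

print(can_submit(budget, large_job))  # 900 + 120 exceeds the 1000 limit
print(can_submit(budget, small_job))  # 900 + 60 stays within the limit
```

Checking at submission time rejects a job before any instance is launched, rather than stopping it after the money has been spent.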
Sustainability
Amazon EFS automatically transitions infrequently accessed data to a lower-cost storage tier, reducing your system footprint and associated costs. Amazon EC2 Auto Scaling groups replace persistent EC2 instances, minimizing idle compute. Additionally, the breadth of Amazon EC2 compute options allows you to optimize per application, further reducing your carbon footprint.
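The idea of replacing persistent instances with capacity that follows demand can be sketched as a sizing rule that returns zero nodes when the queue is empty. The `desired_nodes` function and its parameters are assumptions for illustration; this is not the scaling logic used by the Guidance, which may, for example, provision dedicated nodes per job.

```python
import math
from typing import Sequence


def desired_nodes(queued_cores: Sequence[int], cores_per_node: int, max_nodes: int) -> int:
    """Size a compute fleet from the cores requested by queued jobs."""
    if cores_per_node <= 0:
        raise ValueError("cores_per_node must be positive")
    total_cores = sum(queued_cores)
    if total_cores == 0:
        return 0  # no queued work, so no instances are kept running
    # Round up so every requested core fits, then cap at the fleet limit.
    return min(max_nodes, math.ceil(total_cores / cores_per_node))


print(desired_nodes([], cores_per_node=36, max_nodes=10))
print(desired_nodes([72, 18], cores_per_node=36, max_nodes=10))
print(desired_nodes([720], cores_per_node=36, max_nodes=10))
```

The three calls cover an empty queue, a partially filled last node, and a request that exceeds the fleet cap.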