AWS HPC Blog

Building a secure and compliant HPC environment on AWS following NIST SP 800-223

High performance computing (HPC) systems are essential for scientific research, engineering simulations, and data-intensive workloads that require immense computational power. However, securing these powerful systems poses unique challenges due to their complex architectures, strict performance requirements, and sometimes the pressure to balance security with computational efficiency.

The National Institute of Standards and Technology (NIST) Special Publication 800-223 (NIST SP 800-223), titled High-Performance Computing Security: Architecture, Threat Analysis, and Security Posture, provides a comprehensive guide for addressing the security challenges of HPC systems. This publication outlines a reference architecture, threat analysis, security postures, and recommendations for securing HPC environments.

In this post, we explore how to use AWS to build a secure and compliant HPC environment following the guidelines set out in NIST SP 800-223. We’ll walk through the key components, security considerations, and steps involved in deploying a zone-based HPC architecture on AWS.

The NIST SP 800-223 reference architecture

NIST SP 800-223 introduces a zone-based reference architecture for HPC systems, dividing it into four functional zones:

  1. Access zone – The entry point for users and external connections.
  2. Management zone – Responsible for system administration, monitoring, and control.
  3. High performance computing zone – Where computational resources (compute nodes) reside.
  4. Data storage zone – Handles data storage, movement, and archival for HPC workloads.

This architecture, shown in Figure 1, serves as a foundation for threat analysis, security posture discussions, and the implementation of appropriate security controls. The NIST SP 800-223 solution included in the deployment section scales to highly available, multi-Availability Zone architectures by horizontally scaling VPC subnets.

Figure 1 - The 4-zone segmentation architecture for HPC following NIST SP 800-223.

Key security characteristics and requirements

AWS shared responsibility model

When deploying an HPC environment on AWS, it’s crucial to understand the AWS shared responsibility model, which outlines the security responsibilities shared between AWS and the customer. This model helps ensure that security measures are adequately implemented and maintained across your cloud environments.

In the context of the secure HPC environment discussed in this blog, the shared responsibility model applies as follows:

  • AWS is responsible for the security of the underlying infrastructure, including the physical data centers, hardware, virtualization layers, and the managed services used in the solution, such as Amazon VPC, AWS KMS, and AWS CloudTrail.
  • The customer is responsible for securing the components deployed within their AWS environment, including Amazon EC2 instances, Amazon EFS and Amazon FSx for Lustre file systems, Amazon S3 buckets, and the AWS ParallelCluster configuration. This involves validating or implementing appropriate security controls, such as IAM policies, security groups, encryption, and monitoring.
  • Additionally, the customer is responsible for ensuring that the applications, data, and workloads running on the HPC cluster are secure and compliant with any relevant industry standards or regulations.

By adhering to the model, you can leverage the security capabilities provided by AWS while maintaining the necessary controls and practices to secure your HPC environment effectively. This collaborative approach ensures that the security of your HPC workloads is comprehensively addressed, reducing the risk of potential threats and vulnerabilities.

Security of HPC systems

Before delving into the implementation details, it’s essential to understand the unique security characteristics and requirements of HPC systems, as outlined in NIST SP 800-223.

HPC users prioritize performance, so security mechanisms must have a tolerable performance impact. Different HPC applications and data may also have significantly different security sensitivities and policies. Often, we need quite granular access control to segment data access between different research groups.

In addition, open-source and self-developed software is widespread in the HPC community. These types of software can introduce supply chain risks and potential quality issues, but more often open-source tools are the bedrock of massive efforts in the community to get the most performance out of available hardware.

AWS offers robust security solutions that address these challenges without compromising performance. For strict segmentation between clusters, the “Securing HPC on AWS – isolated clusters” blog provides valuable insights and examples of how customers can achieve the strictest segmentation and isolation between clusters. Customers can also reference “The plumbing: best-practice infrastructure to facilitate HPC on AWS” blog which discusses the Landing Zone Accelerator on AWS, an open-source solution that helps satisfy and document compliance requirements for programs like HIPAA, FISMA, FedRAMP, NIST 800-53, NIST 800-171, and CMMC.

By running your HPC on AWS and following NIST SP 800-223 guidelines, organizations can build HPC environments that are both secure and high-performing, protecting computational resources, data, and intellectual property without sacrificing the speed and efficiency demanded by HPC workloads.

Building a secure HPC environment on AWS

Access zone

The access zone serves as the entry point for users and external connections to the HPC environment. To secure this zone on AWS, consider the following steps:

  1. Use Amazon Virtual Private Cloud (VPC) to isolate the access zone from the internet and other AWS resources.
  2. Implement AWS WAF and AWS Shield, if deploying web-based front-ends, to protect against web application vulnerabilities and distributed denial-of-service (DDoS) attacks.
  3. Use AWS Secrets Manager for secure storage and rotation of authentication credentials.
  4. Implement AWS Identity and Access Management (IAM) or Active Directory for granular user access control and multi-factor authentication (MFA).
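To make the MFA step concrete, here is a minimal sketch of an IAM policy document that denies most actions when a request was not authenticated with MFA. It is an illustration under assumptions rather than part of the published solution: the function name and the list of exempted actions are our own choices, and you should adapt them before attaching a policy like this to real users.

```python
import json


def deny_without_mfa_policy(exempt_actions):
    """Return an IAM policy document that denies every action except
    `exempt_actions` when the request was not authenticated with MFA."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyAllExceptListedIfNoMFA",
                "Effect": "Deny",
                "NotAction": sorted(exempt_actions),
                "Resource": "*",
                # BoolIfExists also matches requests where the key is absent,
                # such as calls made with long-term access keys.
                "Condition": {
                    "BoolIfExists": {"aws:MultiFactorAuthPresent": "false"}
                },
            }
        ],
    }


# Assumed exemption list: just enough for a user to register an MFA device.
EXEMPT_ACTIONS = [
    "iam:CreateVirtualMFADevice",
    "iam:EnableMFADevice",
    "iam:ListMFADevices",
    "sts:GetSessionToken",
]

mfa_policy = deny_without_mfa_policy(EXEMPT_ACTIONS)
print(json.dumps(mfa_policy, indent=2))
```

The sketch only builds and prints the JSON document. Attaching it to a group or role is a separate step that depends on how you manage identities.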

Management zone

The management zone is responsible for system administration, monitoring, and control of the HPC environment. To secure this zone on AWS, follow these steps:

  1. Use IAM and AWS Security Token Service (AWS STS) for secure access and privilege management.
  2. Implement AWS Config for continuous monitoring and compliance checking of AWS resource configurations.
  3. Use AWS CloudTrail for auditing and monitoring of management activities.
  4. Use AWS Systems Manager for secure patch management and configuration automation.

High performance computing zone

The high performance computing zone houses the computational resources (compute nodes) responsible for running HPC workloads. To secure this zone on AWS, consider the following steps:

  1. Use AWS Nitro Enclaves or AWS Graviton instances for hardware-based isolation and protection against side-channel attacks.
  2. Implement Amazon CloudWatch for monitoring and alerting on resource consumption and performance metrics.
  3. Use AWS Batch or AWS ParallelCluster for secure and scalable job scheduling and management.

Data storage zone

The data storage zone handles data storage, movement, and archiving for HPC workloads. To secure this zone on AWS, follow these steps:

  1. Use Amazon Elastic File System (Amazon EFS) for secure, scalable shared file storage, or Amazon FSx for Lustre for a secure, high-performance parallel file system.
  2. Use Amazon Simple Storage Service (Amazon S3) for pre-loading of raw campaign data or for long-term archival of data.
  3. Implement AWS Key Management Service (AWS KMS) for encryption of data at rest and in transit.
  4. Use AWS Backup for secure and centralized data backup and recovery solutions.
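As one example of enforcing encryption in transit for the S3 part of this zone, the sketch below builds a bucket policy that rejects any request not sent over TLS, using the `aws:SecureTransport` condition key. The bucket name is a placeholder and the helper function is hypothetical. Encryption at rest with AWS KMS is configured separately on the bucket and is not shown here.

```python
import json


def tls_only_bucket_policy(bucket_name):
    """Return an S3 bucket policy that denies all requests made without TLS."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "DenyInsecureTransport",
                "Effect": "Deny",
                "Principal": "*",
                "Action": "s3:*",
                # Cover both bucket-level and object-level operations.
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
                "Condition": {"Bool": {"aws:SecureTransport": "false"}},
            }
        ],
    }


# "example-hpc-campaign-data" is a placeholder bucket name.
bucket_policy = tls_only_bucket_policy("example-hpc-campaign-data")
print(json.dumps(bucket_policy, indent=2))
```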

General recommendations

In addition to the zone-specific security measures, NIST SP 800-223 provides general recommendations for securing HPC environments.

We recommend implementing AWS Config rules and AWS Security Hub for continuous compliance monitoring and security best practice checks. Learn more about AWS Config Rules and AWS Security Hub. We also recommend using IAM to enforce least-privilege access and role-based access control.

You can use AWS CloudTrail for auditing and logging of API calls and activities across all AWS services, and implement VPC Flow Logs and AWS Network Firewall for network traffic monitoring and control.

Finally, use AWS Artifact to access AWS security and compliance reports and online agreements.
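A simple way to start reviewing least-privilege access is to scan policy documents for overly broad grants. The following is a minimal sketch of such a check, not a replacement for AWS Config rules or Security Hub controls. The function name, the sample policy, and the definition of "too broad" (a `*` action, a `service:*` action, or a `*` resource in an Allow statement) are all our own assumptions.

```python
def find_wildcard_statements(policy):
    """Return identifiers of Allow statements with wildcard actions or resources."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be a bare object
        statements = [statements]

    findings = []
    for index, stmt in enumerate(statements):
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        resources = stmt.get("Resource", [])
        # Both fields may be a single string or a list of strings.
        actions = [actions] if isinstance(actions, str) else actions
        resources = [resources] if isinstance(resources, str) else resources

        broad_action = any(a == "*" or a.endswith(":*") for a in actions)
        broad_resource = "*" in resources
        if broad_action or broad_resource:
            findings.append(stmt.get("Sid", f"statement-{index}"))
    return findings


# Hypothetical policy used only to exercise the check.
sample_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "ReadCampaignData",
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::example-hpc-campaign-data/*",
        },
        {
            "Sid": "TooBroad",
            "Effect": "Allow",
            "Action": ["ec2:*"],
            "Resource": "*",
        },
    ],
}

print(find_wildcard_statements(sample_policy))
```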

Solution overview

Our solution consists of six main components, which we’ve provided as AWS CloudFormation templates and published in the HPC Recipes Library on GitHub. If you’re not familiar with the Recipes Library, you can read our post announcing it in 2023.

Network template: This template creates a secure, multi-zone HPC Virtual Private Cloud (VPC) environment following NIST guidelines. The VPC is divided into four security zones (Access, Management, Storage, and Compute) across multiple Availability Zones, with carefully configured networking components including Internet and NAT Gateways, VPC Endpoints, and security groups.
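To illustrate how a VPC address range can be divided into one subnet per zone and Availability Zone, here is a small planning sketch using Python's standard `ipaddress` module. The CIDR range, the /20 subnet size, and the Availability Zone names are assumptions chosen for the example. The actual template defines its own ranges.

```python
import ipaddress

# Zone order follows the description of the network template.
ZONES = ["access", "management", "storage", "compute"]


def plan_zone_subnets(vpc_cidr, availability_zones, new_prefix=20):
    """Assign one non-overlapping subnet to every (zone, AZ) pair."""
    vpc = ipaddress.ip_network(vpc_cidr)
    candidates = vpc.subnets(new_prefix=new_prefix)
    plan = {}
    for zone in ZONES:
        for az in availability_zones:
            try:
                plan[(zone, az)] = str(next(candidates))
            except StopIteration:
                raise ValueError("VPC CIDR is too small for this plan") from None
    return plan


# Assumed inputs: a /16 VPC spread over two Availability Zones.
subnet_plan = plan_zone_subnets("10.0.0.0/16", ["us-east-1a", "us-east-1b"])
for (zone, az), cidr in subnet_plan.items():
    print(f"{zone:<11} {az:<11} {cidr}")
```

Adding a third Availability Zone to the list extends the plan without changing the existing assignments for a given zone order, which mirrors the idea of scaling horizontally by adding subnets.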

Security template: This template creates security groups and their associated ingress rules for an HPC environment, specifically for Login, Head, and Compute nodes within a previously created VPC. It establishes controlled SSH access to the Login and Head nodes, allows communication between nodes for Slurm (a workload manager), and enables management access from a designated Management Zone, all while leveraging security group IDs and CIDR ranges imported from a separate networking stack.
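The kind of rule set described here can be modeled in a few lines to reason about which flows are permitted. The sketch below is illustrative only: the rule list is our own simplification, not the template's actual rules, and the Slurm ports are the scheduler's default values (6817 for slurmctld and 6818 for slurmd), which a given cluster may override.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IngressRule:
    target: str     # security group that receives the traffic
    source: str     # peer security group or a label for a CIDR range
    port_from: int
    port_to: int


# Hypothetical rules modeled on the description above.
RULES = [
    IngressRule("login", "trusted-cidr", 22, 22),   # SSH from an approved range
    IngressRule("head", "login", 22, 22),           # SSH from login nodes
    IngressRule("head", "management", 22, 22),      # management zone access
    IngressRule("head", "compute", 6817, 6817),     # slurmctld default port
    IngressRule("compute", "head", 6818, 6818),     # slurmd default port
]


def is_allowed(rules, source, target, port):
    """Return True if any ingress rule admits this source, target, and port."""
    return any(
        rule.source == source
        and rule.target == target
        and rule.port_from <= port <= rule.port_to
        for rule in rules
    )


print(is_allowed(RULES, "login", "head", 22))            # login -> head SSH
print(is_allowed(RULES, "trusted-cidr", "compute", 22))  # direct SSH to compute
```

Because security groups deny by default, any flow that does not match a rule is rejected, which is why the second check fails.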

Storage template (optional): This template creates a comprehensive storage infrastructure for an HPC environment, including S3 buckets for campaign and archival storage, an FSx for Lustre file system, and Amazon EFS. It sets up necessary security groups and ingress rules for these storage services, integrating with an existing network stack and allowing conditional deployment of each storage component based on user preferences. The template also includes options for data lifecycle management and encryption, ensuring a secure and flexible storage solution for HPC workloads.

Slurm Database template (optional): This template sets up a MySQL database in Amazon RDS for Slurm accounting in an HPC environment. It creates necessary security groups, configures a subnet group, generates a secure password using AWS Secrets Manager, and deploys an RDS instance with specified configurations and security measures. The template also provides options for data retention and exports important database information as stack outputs for use in other parts of the HPC infrastructure.

Active Directory template (optional): This template sets up an AWS Managed Microsoft Active Directory (MAD) for user management in an HPC environment. It generates a secure admin password using AWS Secrets Manager, deploys the Active Directory in specified subnets of an existing VPC, and provides options for resource retention. The template exports key information about the Active Directory, including the domain name, admin password secret ARN, and DNS IP addresses, for use in other parts of the HPC infrastructure.

ParallelCluster template (optional): This template deploys an AWS ParallelCluster environment for HPC. It configures a cluster with a head node and two compute queues (CPU and GPU), integrates with previously set up networking, security, storage, and Active Directory resources, and sets up shared storage using FSx for Lustre, EFS, and EBS. The template also includes monitoring and logging configurations, and allows for customization of various cluster parameters, providing a comprehensive and flexible HPC infrastructure on AWS.
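For readers who have not seen a ParallelCluster configuration, the sketch below assembles a cut-down one with a head node, CPU and GPU queues, and two shared file systems. It is not the configuration the template generates. Every ID, instance type, and count is a placeholder, and the key names reflect our understanding of the ParallelCluster version 3 configuration format, so check them against the documentation for your version.

```python
import json


def build_cluster_config(head_subnet, compute_subnet, fsx_id, efs_id):
    """Assemble a minimal ParallelCluster-style configuration as a dict."""

    def queue(name, instance_type, max_count):
        return {
            "Name": name,
            "ComputeResources": [
                {
                    "Name": f"{name}-nodes",
                    "InstanceType": instance_type,
                    "MinCount": 0,  # scale to zero when idle
                    "MaxCount": max_count,
                }
            ],
            "Networking": {"SubnetIds": [compute_subnet]},
        }

    return {
        "Region": "us-east-1",
        "Image": {"Os": "alinux2"},
        "HeadNode": {
            "InstanceType": "c6i.xlarge",
            "Networking": {"SubnetId": head_subnet},
        },
        "Scheduling": {
            "Scheduler": "slurm",
            "SlurmQueues": [
                queue("cpu", "c6i.8xlarge", 10),
                queue("gpu", "g5.xlarge", 4),
            ],
        },
        "SharedStorage": [
            {
                "Name": "scratch",
                "MountDir": "/fsx",
                "StorageType": "FsxLustre",
                "FsxLustreSettings": {"FileSystemId": fsx_id},
            },
            {
                "Name": "home",
                "MountDir": "/efs",
                "StorageType": "Efs",
                "EfsSettings": {"FileSystemId": efs_id},
            },
        ],
    }


# Placeholder resource IDs; real values would come from the other stacks.
cluster_config = build_cluster_config(
    "subnet-00000000000000001",
    "subnet-00000000000000002",
    "fs-00000000000000001",
    "fs-00000002",
)
# ParallelCluster reads YAML; JSON output is accepted by most YAML parsers.
print(json.dumps(cluster_config, indent=2))
```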

By deploying these resources together, you can create a secure and scalable HPC environment that aligns with the NIST SP 800-223 recommendations.

Prerequisites

Before you begin, ensure that you have met some important prerequisites.

You’ll, of course, need an AWS account with appropriate permissions to create and manage the required resources.

It’s useful to have the AWS Command Line Interface (AWS CLI) installed and configured on your local machine. For guidance, see Install or update to the latest version of the AWS CLI.

Since you’re planning to deploy a cluster, it’s helpful to have the AWS ParallelCluster (pcluster) CLI installed and configured on your local machine. For guidance, see Install the AWS ParallelCluster CLI.

Deployment

We’ve published the AWS CloudFormation templates, along with a detailed implementation guide, in the HPC Recipes Library. Follow the instructions linked below closely to rapidly deploy your environment.

https://github.com/aws-samples/aws-hpc-recipes/blob/main/recipes/pcluster/nist_800_223

Security considerations

The deployed HPC cluster adheres to the NIST SP 800-223 guidelines by implementing several key security measures.

Network segmentation and access control are achieved by using the VPC, subnets, and security groups to provide network isolation and granular access control between the different functional zones (access zone, management zone, compute zone, and data storage zone).

Sensitive data is protected by encrypting it at rest and in transit using the AWS Key Management Service (AWS KMS). Storage resources like Amazon FSx for Lustre and Amazon Simple Storage Service (Amazon S3) buckets are configured with appropriate encryption and access controls.

We use IAM to manage user identities, roles, and permissions, enabling least-privilege access and role-based access control.

Monitoring and auditing are provided by AWS CloudTrail, AWS Config, and AWS Security Hub, which we’ve configured for continuous compliance checking of the HPC environment.

Slurm accounting is supported by a dedicated Amazon Relational Database Service (Amazon RDS) for MySQL database, which we’ve provisioned to ensure secure and centralized job tracking and resource management.

Conclusion

By following the NIST SP 800-223 recommendations and using the security services and features provided by AWS, organizations can build a secure and compliant HPC environment in the cloud while balancing performance requirements and security needs.

AWS offers a wide range of services and tools that you can tailor to the unique security challenges of your HPC systems, so you can protect your computational resources, data, and intellectual property while maintaining the high performance your work demands.

To get started, check out the Recipes Library repo on GitHub. And reach out to us at ask-hpc@amazon.com if you want to engage an AWS solution architect from your account team to help you build a scalable, secure, and compliant environment for your needs.

Chris Riddle

Chris is a senior solutions architect at Amazon Web Services (AWS) and supports R1 university customers. With 20-plus years of experience in technology and a decade in higher education, Chris helps researchers leverage AWS for their artificial intelligence/machine learning (AI/ML) and high-performance computing (HPC) workloads. His customers design and implement scalable, secure, and cost-effective solutions that accelerate their research and innovation.