The automated lab creation got you up and running quickly, but behind the scenes there is a lot of flexibility to optimize the environment for different use cases. In this section, you’ll learn how to take advantage of this flexibility to update your cluster dynamically.

Topics Covered
  • Software and application installation
  • How to choose the right EC2 instance types for your HPC workload
  • Understanding and controlling Auto Scaling

Software applications can be installed by logging in to the head node via SSH. CfnCluster offers a variety of software packages that support HPC applications, including Open MPI, languages, and compilers. There are multiple ways to install your applications, depending on the application characteristics and your preference as an administrator. A common method is to install them under the /efs/apps or /shared folders, which are mounted as NFS shares on all nodes.
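As a minimal sketch of the shared-folder approach described above, the following Python helper picks an install prefix under the first shared folder that exists and is writable. The function name, the `<folder>/<app>/<version>` layout, and the application name in the usage note are illustrative assumptions, not part of CfnCluster.

```python
import os

# Shared folders named in the text above, checked in this order.
SHARED_CANDIDATES = ("/efs/apps", "/shared")


def pick_install_prefix(app_name, version, candidates=SHARED_CANDIDATES):
    """Return <shared folder>/<app_name>/<version> for the first usable folder.

    The per-application, per-version layout is an assumption made for this
    sketch; use whatever layout suits your site.
    """
    for base in candidates:
        # A usable folder must exist and be writable by the current user.
        if os.path.isdir(base) and os.access(base, os.W_OK):
            return os.path.join(base, app_name, version)
    raise RuntimeError(
        "no writable shared folder found among: " + ", ".join(candidates)
    )
```

On the head node, a call such as `pick_install_prefix("myapp", "1.0")` (a hypothetical application) would return a path that every compute node sees through the NFS mount, so the application only needs to be installed once.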

Once your applications are installed (and licensed), you can use the EnginFrame Service Editor to publish them to the user community, as you did in the first chapter of this learning path.


AWS offers a wide variety of EC2 instance families, generations, and sizes to serve very different workload types, with On-Demand pricing that ranges from a few cents to several US dollars per hour.

To choose an instance type, start with the specific needs of the application. Applications vary in their requirements for: number of compute cores, processor speed, memory requirements, storage needs, networking specifications, and cost.

Families are classified according to the processor type, amount of memory, storage, and network connectivity available. The c family, or “compute” family, is most often recommended for HPC workloads. Instance types within a family usually have approximately the same memory-to-vCPU ratio. A vCPU is a hyperthread of a physical processor core. Typically, two vCPUs perform like one physical core. Within each family there can be multiple generations. For example, the c family of instance types includes the c3 and c4 instances. The number indicates the generation of the instance type.

The c4.8xlarge (Haswell) instance is very popular for parallel HPC applications. It has about 60 GiB of memory and 18 physical cores (36 vCPUs). Each family comes in multiple sizes. For example, the c4.4xlarge is a compute instance half the size of the c4.8xlarge, and its hourly price is also approximately half.
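The relationship between vCPUs and physical cores described above comes down to simple arithmetic. The sketch below assumes two hyperthreads per physical core, as the text states, and uses the c4.8xlarge figures from this section (36 vCPUs, about 60 GiB of memory). The function names are illustrative.

```python
def physical_cores(vcpus, threads_per_core=2):
    """Convert a vCPU count to physical cores (assumes 2 hyperthreads per core)."""
    return vcpus // threads_per_core


def memory_per_core_gib(memory_gib, vcpus, threads_per_core=2):
    """Memory available to each physical core, in GiB."""
    return memory_gib / physical_cores(vcpus, threads_per_core)


# c4.8xlarge: 36 vCPUs and about 60 GiB of memory.
print(physical_cores(36))                      # 18
print(round(memory_per_core_gib(60, 36), 2))   # 3.33
```

Memory per physical core is often a more useful figure than memory per vCPU when you plan to run one MPI rank per core.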

An instance can be stopped at any time and restarted with a different instance type. This ability makes it easy to choose the optimum instance type for HPC workloads.

Other popular instance types for HPC workloads are detailed in the chart below:

| Instance Type | vCPU | Memory (GiB) | Storage (GB) | Networking Performance | Physical Processor | Clock Speed (GHz) | EBS-Optimized |
|---|---|---|---|---|---|---|---|
| c4.8xlarge | 36 | 60 | EBS Only | 10 Gigabit | Intel Xeon E5-2666 v3 | 2.9 | Yes |
| c3.8xlarge | 32 | 60 | 2 x 320 SSD | 10 Gigabit | Intel Xeon E5-2680 v2 | 2.8 | No |
| m4.10xlarge | 40 | 160 | EBS Only | 10 Gigabit | Intel Xeon E5-2676 v3 | 2.4 | Yes |
| m4.16xlarge | 64 | 256 | EBS Only | 20 Gigabit | Intel Xeon E5-2686 v4 | 2.3 | Yes |
| p2.16xlarge | 64 | 732 | EBS Only | 20 Gigabit | Intel Xeon E5-2686 v4 | 2.3 | Yes |
| x1.32xlarge | 128 | 1,952 | 2 x 1,920 SSD | 20 Gigabit | Intel Xeon E7-8880 v3 | 2.3 | Yes |
| r3.8xlarge | 32 | 244 | 2 x 320 SSD | 10 Gigabit | Intel Xeon E5-2670 v2 | 2.5 | No |
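One way to narrow down the chart above is to start from the application's needs, as suggested earlier, and filter on them. The following sketch copies the vCPU, memory, and clock speed figures from the chart and ranks the types that meet a minimum vCPU count and memory-to-vCPU ratio. The filtering criteria and the ranking by clock speed are assumptions made for illustration, and cost is deliberately left out.

```python
from collections import namedtuple

Instance = namedtuple("Instance", "name vcpus memory_gib clock_ghz")

# Figures copied from the chart above.
INSTANCES = [
    Instance("c4.8xlarge", 36, 60, 2.9),
    Instance("c3.8xlarge", 32, 60, 2.8),
    Instance("m4.10xlarge", 40, 160, 2.4),
    Instance("m4.16xlarge", 64, 256, 2.3),
    Instance("p2.16xlarge", 64, 732, 2.3),
    Instance("x1.32xlarge", 128, 1952, 2.3),
    Instance("r3.8xlarge", 32, 244, 2.5),
]


def candidates(min_vcpus, min_gib_per_vcpu, instances=INSTANCES):
    """Names of instance types meeting both minimums, fastest clock first."""
    matching = [
        i for i in instances
        if i.vcpus >= min_vcpus and i.memory_gib / i.vcpus >= min_gib_per_vcpu
    ]
    # sorted() is stable, so ties on clock speed keep the chart order.
    ranked = sorted(matching, key=lambda i: i.clock_ghz, reverse=True)
    return [i.name for i in ranked]


# A memory-hungry job needing at least 7 GiB per vCPU:
print(candidates(min_vcpus=32, min_gib_per_vcpu=7))
# ['r3.8xlarge', 'p2.16xlarge', 'x1.32xlarge']
```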

For a list of all available instance types, see Amazon EC2 instance types. Once you identify the right instance type for your workload, compare its On-Demand price with the current Spot price. If your workload can easily cope with instances being reclaimed, using Spot is usually a very effective way to reduce the HPC budget.
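The On-Demand versus Spot comparison can be sketched in a few lines of Python. The savings calculation is plain arithmetic. The optional lookup assumes that the boto3 library is installed and that AWS credentials are configured; it has not been run against your account, and it ignores pagination. The prices in the usage note are hypothetical, not real quotes.

```python
from datetime import datetime, timezone


def spot_savings_percent(on_demand_hourly, spot_hourly):
    """Percentage saved by paying the Spot price instead of On-Demand."""
    if on_demand_hourly <= 0:
        raise ValueError("on_demand_hourly must be positive")
    return (on_demand_hourly - spot_hourly) / on_demand_hourly * 100.0


def current_spot_prices(instance_type, region_name):
    """Return {availability zone: Spot price in USD/hour} for Linux instances.

    Assumption: boto3 is installed and credentials are configured.
    """
    import boto3  # imported here so the arithmetic above works without boto3

    ec2 = boto3.client("ec2", region_name=region_name)
    response = ec2.describe_spot_price_history(
        InstanceTypes=[instance_type],
        ProductDescriptions=["Linux/UNIX"],
        StartTime=datetime.now(timezone.utc),  # only the price in effect now
    )
    return {
        entry["AvailabilityZone"]: float(entry["SpotPrice"])
        for entry in response["SpotPriceHistory"]
    }
```

For example, with a hypothetical On-Demand price of 1.00 USD per hour and a hypothetical Spot price of 0.25 USD per hour, `spot_savings_percent(1.00, 0.25)` returns 75.0.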

If you want to modify the characteristics of your cluster, an easy way to do that dynamically is to use the “Update Stack” feature offered in the CloudFormation console. Be aware that some changes might temporarily disrupt the functionality of the cluster. Unless you are comfortable with the consequences, reconfigure your stack parameters only when there is no workload on the cluster.

Try this feature on the DefaultCluster Stack by following the instructions below:

  • Instructions: Changing the instance type on your current cluster

    1. Open the CloudFormation console.
    2. Right-click EnginFrame-DefaultCluster-##yourValues## and select Update Stack.
    3. Select Use current template and select Next.
      • Note: By selecting Use current template, the CloudFormation wizard will offer you the ability to modify a larger number of parameters compared to the main stack, giving you access to the broader flexibility of CfnCluster.
    4. Make the following edits to the configuration:
      • compute_instance_type: Select the instance type of your choosing to update your compute nodes.
      • cluster_type: Change the selection to spot. Then set spot_price to a value you are comfortable bidding, based on the current market price.
    5. Select Next.
    6. Proceed through the next steps of the wizard until you are presented with a preview of the changes, and confirm by selecting Update to update your stack. Depending on the breadth of changes made, the update can take several minutes to propagate.
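The same update can be scripted. The sketch below assumes the boto3 library and configured AWS credentials. It reuses the current template and changes only the parameters you override, keeping the previous value for all the others. The parameter keys follow the names used in the steps above; the keys in your stack may be spelled differently, and the `CAPABILITY_IAM` setting is an assumption about what the template requires.

```python
def build_update_parameters(current_keys, overrides):
    """Build the Parameters list for an update that reuses the current template.

    Overridden keys get a new value; all other keys keep their previous value.
    """
    unknown = sorted(set(overrides) - set(current_keys))
    if unknown:
        raise KeyError("not a parameter of this stack: " + ", ".join(unknown))
    parameters = []
    for key in current_keys:
        if key in overrides:
            parameters.append(
                {"ParameterKey": key, "ParameterValue": str(overrides[key])}
            )
        else:
            parameters.append({"ParameterKey": key, "UsePreviousValue": True})
    return parameters


def update_cluster_stack(stack_name, overrides, region_name):
    """Apply the overrides to an existing stack (requires boto3 and credentials)."""
    import boto3  # imported here so the helper above works without boto3

    cfn = boto3.client("cloudformation", region_name=region_name)
    stack = cfn.describe_stacks(StackName=stack_name)["Stacks"][0]
    keys = [p["ParameterKey"] for p in stack.get("Parameters", [])]
    return cfn.update_stack(
        StackName=stack_name,
        UsePreviousTemplate=True,  # same as "Use current template" in the console
        Parameters=build_update_parameters(keys, overrides),
        Capabilities=["CAPABILITY_IAM"],  # assumption: the template creates IAM resources
    )
```

A call might pass overrides such as `{"compute_instance_type": "c4.8xlarge", "cluster_type": "spot", "spot_price": "0.50"}`, where the Spot price is a hypothetical bid. As with the console, run the update only when there is no workload on the cluster.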

Auto Scaling is a feature of Amazon EC2 that helps you maintain application availability and allows you to scale your EC2 capacity up or down automatically according to conditions you define. You can use Auto Scaling to help ensure that you are running your desired number of Amazon EC2 instances. 

The cluster size in this lab is continually monitored and changed by the EC2 Auto Scaling feature. If you know the optimal cluster size for your workload (for example, with a multi-node MPI job), it may be faster and more efficient to pre-provision the right number of nodes instead of letting the cluster grow incrementally.
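If you know how many MPI ranks a job will use, the number of nodes to pre-provision follows from the core count per node. This sketch assumes one MPI rank per physical core and two hyperthreads per core; the 100-rank job is a hypothetical example.

```python
import math


def nodes_needed(mpi_ranks, vcpus_per_node, threads_per_core=2):
    """Nodes required when each MPI rank is given one physical core."""
    cores_per_node = vcpus_per_node // threads_per_core
    return math.ceil(mpi_ranks / cores_per_node)


# A hypothetical 100-rank job on c4.8xlarge nodes (36 vCPUs, 18 physical cores):
print(nodes_needed(100, 36))  # 6
```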

You can set up your Auto Scaling cluster size by following the instructions below:

  • Instructions: Changing the Auto Scaling parameters on your cluster

    1. Open the Amazon EC2 Auto Scaling Group console.
    2. You will see that DefaultCluster created one Auto Scaling group for the compute fleet. The console shows the current number of instances in the group, the minimum and maximum limits, and the desired number of instances.
    3. The desired number of instances is automatically influenced by the scaling policies active on the cluster, but you can also change its value manually. To change the desired size or other Auto Scaling parameters, right-click the Auto Scaling group and select Edit.
    4. The bottom panel updates with Auto Scaling details. You can edit the number of Load Balancers, Desired instances, the minimum and maximum number of instances (nodes), and more. When complete, select Save.
      • Note: If the desired number of instances is different from the current number of instances, Auto Scaling will launch or terminate instances to reach the desired number.
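The same change can be made programmatically. The sketch below assumes the boto3 library and configured AWS credentials. It validates the sizes before calling the Auto Scaling API, and the group name in the usage note is a placeholder for the name shown in your console.

```python
def build_asg_update(group_name, minimum, desired, maximum):
    """Validate the sizes and build the arguments for the Auto Scaling update."""
    if not 0 <= minimum <= desired <= maximum:
        raise ValueError("sizes must satisfy 0 <= minimum <= desired <= maximum")
    return {
        "AutoScalingGroupName": group_name,
        "MinSize": minimum,
        "DesiredCapacity": desired,
        "MaxSize": maximum,
    }


def apply_asg_update(update, region_name):
    """Send the update to Auto Scaling (requires boto3 and credentials)."""
    import boto3  # imported here so the validation above works without boto3

    client = boto3.client("autoscaling", region_name=region_name)
    client.update_auto_scaling_group(**update)
```

For example, `build_asg_update("my-compute-fleet-group", 0, 6, 10)` prepares a request for six nodes, with `my-compute-fleet-group` standing in for your group's name. Keep in mind that the scaling policies active on the cluster can still adjust the desired number afterwards.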