In this Learning Path, you will go through the steps to launch a fully functional HPC environment, read about important concepts, review steps to optimize your environment, and terminate the environment.

This learning path consists of the following topics:

  • Launch a sample HPC environment
  • Security and authentication
  • Storage and networking
  • Cost implications
  • HPC workload optimization
  • Resource cleanup

To complete this learning path, you will need:

  • AWS Account*

*Accounts that have been created within the last 24 hours might not yet have access to servers required for this learning path.

*Your use of AWS services is subject to your applicable agreement(s) with AWS, including the AWS Service Terms.

With the step-by-step directions below, you will set up a fully working HPC environment accessible via web browser and use it to create sample job submission services.

What you'll accomplish:

*Your use of EnginFrame is subject to the EnginFrame end user license agreement, which can be found here for US residents and here for non-US residents. Please note that the cluster you launch uses EnginFrame's 90-day evaluation license.

 

Time to complete: 90 Minutes

Cost to complete: If you follow the recommended configurations, the AWS resources created in the sample HPC environment you launch will cost less than $0.75/hour. If you use larger (or more) EC2 instances in your cluster or generate heavy network traffic during testing, your costs will increase.
  • Step 1. Create an Amazon EC2 Key Pair

    You will need an Amazon EC2 key pair in the region where you will install the HPC system. While the HPC environment will be accessible over HTTPS using a self-generated certificate, you must have an Amazon Elastic Compute Cloud (Amazon EC2) key pair to connect to the nodes in your cluster over a secure channel using the Secure Shell (SSH) protocol and to gain privileged sudo access on the nodes.
     
    If you already have an AWS account and an EC2 key pair, skip to Step 2.

    1. Open the Amazon EC2 console at https://console.aws.amazon.com/ec2/.
    2. In the navigation pane on the left, under NETWORK & SECURITY, select Key Pairs.
    3. Select Create Key Pair.
    4. In the Key pair name field of the Create Key Pair dialog box, enter a name for your key pair and select Create.
    5. The private key file is automatically downloaded by your browser. Save the private key file in a safe place.
      • Important: this is the only chance for you to save the private key file. You will need to provide the name of your key pair when you launch an instance and the corresponding private key each time you connect to the instance. For more details on creating a key pair, including from the command line, see: Creating a Key Pair.
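The console steps above can also be carried out from the command line. The following is a minimal sketch, assuming the AWS CLI is installed and configured with credentials for your account. The function name `create_key_pair`, the key name `hpc-tutorial-key`, and the region `us-east-1` are illustrative placeholders and do not come from this tutorial.

```shell
#!/bin/sh
# Sketch: create an EC2 key pair and save the private key locally.
# Assumes the AWS CLI is installed and configured; all names are placeholders.

create_key_pair() {
    key_name="$1"      # name to register in EC2 (hypothetical example value)
    region="$2"        # region where the HPC environment will be launched
    key_file="${key_name}.pem"

    # Never overwrite an existing private key file.
    if [ -e "$key_file" ]; then
        echo "Refusing to overwrite existing file: $key_file" >&2
        return 1
    fi

    # Create the file with owner-only permissions, then write the key material.
    if ! ( umask 077 && aws ec2 create-key-pair \
            --region "$region" \
            --key-name "$key_name" \
            --query 'KeyMaterial' \
            --output text > "$key_file" ); then
        rm -f "$key_file"
        echo "Key pair creation failed" >&2
        return 1
    fi

    # SSH clients reject private keys that other users can read.
    chmod 400 "$key_file"
    echo "$key_file"
}

# Example call (commented out so that loading this file has no side effects):
# create_key_pair hpc-tutorial-key us-east-1
```

As in the console flow, the private key is returned only once, at creation time, so the sketch refuses to overwrite an existing file and removes a partial file if the call fails.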
  • Step 2. Create the HPC Environment

    You will create your HPC environment using an AWS CloudFormation template, which is an automated deployment template that provisions AWS services or applications (called “stacks”). You can deploy the CloudFormation template by following the instructions below.

    1. Choose the region you want to launch your HPC environment in and launch your environment by selecting a button below:

    1. In the first page of the CloudFormation template wizard, accept the defaults and select Next.
    2. You will specify the details of your HPC environment. Enter the following values below, then select Next.
    Field, recommended value, and notes:
      • Stack Name: Enter a name for your stack. Throughout this tutorial, we will use the value EnginFrame.
      • Access From: 0.0.0.0/0 (Public Internet). The Public Internet is recommended for this tutorial. For added security, you can select the Private Network or Specific IP.
      • Domain Admin Password: Enter a password for domain access.
      • EnginFrame Admin Password: Enter a password for the EnginFrame admin user.
      • Key Pair Name: Select the key pair you created in Step 1.
      • Master Instance Type: m4.large. This instance type is used for the cluster's head node. For more information, see How to choose the right EC2 instance types for your HPC workload.
      • Compute Instance Type: c4.2xlarge. This instance type is used for the cluster's compute nodes. For more information, see How to choose the right EC2 instance types for your HPC workload.
      • Initial Cluster Size: 2. This is the initial number of EC2 instances to launch as compute nodes within the cluster.
      • Maximum Cluster Size: 4. This is the maximum number of EC2 compute nodes (excluding the head node) that can be launched in the cluster.
      • Maintain initial size: false. When this value is set to "true", the system maintains the initial cluster size even if no job is running; otherwise, the cluster can shrink to only the head node when no jobs are running or queued.
      • Job Scheduler: slurm. This is the resource manager and queuing system to use in the cluster.
    1. In the Options page, accept the defaults and select Next.
    2. In the Review page, review your configurations and check the checkbox under Capabilities. This authorizes AWS CloudFormation to deploy all required resources for your HPC environment. Select Create.
    1. You are now on a CloudFormation page that lists all your stacks and their status. The CloudFormation template you launched created two stacks: EnginFrame and EnginFrame-DefaultCluster-##yourValues##.
      • EnginFrame is the main stack. It has the name you provided and includes all the infrastructure components used by one or more HPC clusters (including the Virtual Private Cloud, Elastic Load Balancer, Elastic File System, and Directory Service).
      • EnginFrame-DefaultCluster-##yourValues## is connected to the main stack and uses the AWS ParallelCluster template to create an elastic cluster with EnginFrame installed on the head node. You can have multiple clusters connected to the same main stack, sharing all the infrastructure above.

     

    Wait until your stacks change status from CREATE_IN_PROGRESS to CREATE_COMPLETE. This will take between 30 and 40 minutes. You may need to click the refresh button to see the progress of your deployment.
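Instead of refreshing the console, the stack status can also be followed from a terminal. This is a minimal sketch, assuming the AWS CLI is installed and configured for the region you launched in. The function names `stack_status` and `wait_for_stack` are illustrative helpers, not part of the tutorial.

```shell
#!/bin/sh
# Sketch: follow stack creation from the command line.
# Assumes the AWS CLI is installed and configured for the launch region.

# Print the current status of a stack, e.g. CREATE_IN_PROGRESS or CREATE_COMPLETE.
stack_status() {
    aws cloudformation describe-stacks \
        --stack-name "$1" \
        --query 'Stacks[0].StackStatus' \
        --output text
}

# Block until the stack reaches CREATE_COMPLETE. A non-zero exit status means
# that creation failed or that the CLI stopped waiting.
wait_for_stack() {
    aws cloudformation wait stack-create-complete --stack-name "$1"
}

# Example (EnginFrame is the stack name used throughout this tutorial):
# wait_for_stack EnginFrame && stack_status EnginFrame
```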

  • Step 3. Connect to the HPC Environment

    In this step, you will connect to your newly created HPC environment with EnginFrame, a web-based HPC portal. You will also create job submission services with the EnginFrame service editor, which administrators typically use to deliver easy-to-use HPC services to their users.


    1. Select the checkbox next to EnginFrame, the main stack you created (not EnginFrame-DefaultCluster). Select the Outputs tab. Select the EnginFrameURL value.
      • Note: Because the stack creates a self-signed certificate for the Load Balancer, your browser will ask you to accept a security warning.
    1. You will be prompted to authenticate. For your username, enter efadmin. For the password, enter the EnginFrame Admin Password you provided as a parameter to the CloudFormation template in Step 2. Upon successful login, you are connected to your HPC cluster via EnginFrame.
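The EnginFrameURL value shown on the Outputs tab can also be read from the command line. The sketch below assumes the AWS CLI and curl are installed. The helper names `stack_output` and `http_status` are hypothetical. The `--insecure` flag is used only because the load balancer certificate in this tutorial is self-signed.

```shell
#!/bin/sh
# Sketch: look up the portal URL from the stack outputs and probe it.
# Assumes the AWS CLI is configured for the region you launched in.

# Print the value of one named output of a CloudFormation stack.
stack_output() {
    stack_name="$1"
    output_key="$2"
    aws cloudformation describe-stacks \
        --stack-name "$stack_name" \
        --query "Stacks[0].Outputs[?OutputKey=='${output_key}'].OutputValue" \
        --output text
}

# Print the HTTP status code returned by a URL. --insecure skips certificate
# verification, which is acceptable here only because the certificate is
# known to be self-signed.
http_status() {
    curl --insecure --silent --output /dev/null --write-out '%{http_code}' "$1"
}

# Example:
# url=$(stack_output EnginFrame EnginFrameURL)
# http_status "$url"
```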
  • Step 4. Create a simple service for job submission

    In this step, you will use the EnginFrame service editor to create a simple service that allows users to upload a file from their desktop and submit a job to compress it using Gzip.


    1. In the EnginFrame console navigation bar, select Admin’s Portal.
    1. In the left navigation menu, select Manage > Services.
    2. Select New. Select Create empty Batch Service and confirm with the Create button.
    3. Under Properties, rename the title of the service to Gzip service.
    4. Under Options > File, select Single File Upload.
    5. Under Properties, rename the ID to INPUTFILE. Select Submit.
    1. Select the Job Script tab. Add the following line at the end of the sample script:

    gzip "${INPUTFILE}"
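To see what this line does, the same command can be tried locally. This is a minimal sketch, assuming INPUTFILE holds the path to the uploaded file, as the job script line suggests. The function name `compress_input` and the existence check are illustrative additions, not part of the EnginFrame sample script.

```shell
#!/bin/sh
# Sketch: what the gzip line in the job script does to the uploaded file.

compress_input() {
    INPUTFILE="$1"

    # Fail early with a readable message if the upload is missing.
    if [ ! -f "${INPUTFILE}" ]; then
        echo "Input file not found: ${INPUTFILE}" >&2
        return 1
    fi

    # gzip replaces the file with a compressed copy named <file>.gz.
    # The double quotes keep file names containing spaces as a single argument.
    gzip "${INPUTFILE}" && echo "${INPUTFILE}.gz"
}

# Example:
# compress_input "my data.txt"
```

The quotes must be plain ASCII double quotes. Typographic quotes pasted from a web page are treated by the shell as part of the file name.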

    1. Select Close. In the Services panel, select Save.
    2. Test the service by selecting Test Run.
      • A new browser tab will open and you can check if the service behaves as expected.
      • For example, it should allow you to upload a file you select on your local machine. Upon selecting submit, the service will create a job that compresses your file when there is at least one compute node available.
    3. When you are done testing, close the newly created browser tab to get back to the editor. Select Close.
    4. You will now publish the service. On the Gzip service line, click on the space for Availability. Select Publish.
    5. Select the radio button next to All Users and select Publish. You will see Availability change to all-users in the Services panel.
    6. Select User’s Portal on the top bar. The new service is now available under Services > Gzip service and allows any user to upload and compress as many files as they need.
  • Step 5. Create your first MPI Service

    For your first MPI service, you will use EnginFrame to create and submit a job that runs a parallel version of Hello World.


    1. In the EnginFrame console navigation bar, select Admin’s Portal.
    2. In the left navigation menu, select Manage > Services.
    3. Select New. Select Create empty Batch Service and confirm with the Create button.
    4. Under Properties, rename the title of the service to MPI Hello.
    5. In Options > Basic, select Text and enter the following values:
      • ID: NUMCORES
      • Label: Number of cores
      • Default Value: 8
    6. Select Submit.

     

    1. Select the Action Script tab. Change the last line by adding the following parameters:
    applications.submit --stdout output.txt --submitopts " ${NUMCORES}"
    1. Select the Job Script tab and add the following lines at the end of the sample script:

    #$ -pe mpi ${NUMCORES}

    # Write the C source of the parallel hello world program.
    cat << EOF > mpi_hello_world.c
    /* A Parallel Hello World Program */
    #include <stdio.h>
    #include <stdlib.h>
    #include </usr/include/openmpi-x86_64/mpi.h>

    int main(int argc, char **argv)
    {
        int step, node;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &node);
        for (step = 1; step < 5; step++) {
            printf("Hello World from Step %d on Node %d (%s)\n", step, node, getenv("HOSTNAME"));
        }
        MPI_Finalize();
        return 0;
    }
    EOF

    # Compile the program, then run it with the requested number of processes.
    /usr/lib64/openmpi/bin/mpicxx mpi_hello_world.c -o hw.x
    /usr/lib64/openmpi/bin/mpirun -np ${NUMCORES} ./hw.x

    Select Close.
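Because NUMCORES comes from a free-text field, a job script could check the value before compiling and launching anything. The following is a minimal sketch of such a check. The function name `validate_numcores` and the optional upper bound are hypothetical additions and are not part of the EnginFrame sample script.

```shell
#!/bin/sh
# Sketch: validate the NUMCORES value before it is passed to mpirun.

validate_numcores() {
    value="$1"
    max="${2:-}"   # optional upper bound, assumed to be an integer (hypothetical)

    # Accept only a non-empty string of digits that does not start with 0.
    case "$value" in
        ''|*[!0-9]*|0*)
            echo "NUMCORES must be a positive integer, got: '$value'" >&2
            return 1
            ;;
    esac

    # Enforce the upper bound only when one was supplied.
    if [ -n "$max" ] && [ "$value" -gt "$max" ]; then
        echo "NUMCORES=$value exceeds the limit of $max" >&2
        return 1
    fi
    return 0
}

# Possible use near the top of the job script:
# validate_numcores "${NUMCORES}" || exit 1
```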

    1. Select Save, then select Test Run. This will open a new browser tab where you can select the number of parallel processes and select Submit.
    2. Select the Refresh button to see the status of your job update.
    3. Select the output.txt file to display the consolidated outputs from all parallel processes.
    4. Return to your previous browser tab. Select Close.
    5. Highlight MPI Hello and select the row. Select Publish.
    6. Select the radio button next to All Users and select Publish. You will see Availability change to all-users in the Services panel.
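Once output.txt has been downloaded, its contents can be checked against what the sample program should print: four lines (steps 1 to 4) from each parallel process. This is a rough sketch that assumes every process writes whole lines in the format used by the printf call in the job script. The function names `lines_per_rank` and `check_hello_output` are illustrative.

```shell
#!/bin/sh
# Sketch: summarize output.txt from the MPI Hello job.

# Print "<rank> <line count>" for every rank found in the file, sorted by rank.
# Lines that do not match the hello world format are ignored.
lines_per_rank() {
    sed -n 's/^Hello World from Step [0-9][0-9]* on Node \([0-9][0-9]*\) .*$/\1/p' "$1" |
        sort -n | uniq -c | awk '{ print $2, $1 }'
}

# Succeed only if exactly $2 ranks reported and each one printed 4 lines
# (the loop in the sample program runs for steps 1 to 4).
check_hello_output() {
    file="$1"
    expected_ranks="$2"
    lines_per_rank "$file" | awk -v n="$expected_ranks" '
        $2 != 4 { bad = 1 }
        { ranks++ }
        END { if (bad || ranks + 0 != n + 0) exit 1 }'
}

# Example, for a run submitted with the default of 8 processes:
# check_hello_output output.txt 8 && echo "all ranks reported"
```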

You have learned how to quickly create a simple HPC environment in AWS and two services that allow users to submit jobs on an elastic cluster. Next, you will review security and authentication concepts. You can also skip ahead to learn how to terminate your resources.