AWS Security Blog

How to Remove Single Points of Failure by Using a High-Availability Partition Group in Your AWS CloudHSM Environment

A hardware security module (HSM) is a hardware device designed with the security of your data and cryptographic key material in mind. It is tamper-resistant hardware that prevents unauthorized users from prying open the device, plugging in extra devices to access data or keys (such as subtokens), or damaging the outside housing. If any such interference occurs, the device wipes all stored information so that unauthorized parties cannot gain access to your data or cryptographic key material. A high-availability (HA) setup is beneficial because, with multiple HSMs kept in different data centers and all data synced between them, the loss of one HSM does not mean the loss of your data.

In this post, I will walk you through steps to remove single points of failure in your AWS CloudHSM environment by setting up an HA partition group. Single points of failure occur when a single CloudHSM device fails in a non-HA configuration, which can result in the permanent loss of keys and data. The HA partition group, however, allows for one or more CloudHSM devices to fail, while still keeping your environment operational.

Prerequisites

You will need a few things to build your HA partition group with CloudHSM:

  • 2 CloudHSM devices. AWS offers a free two-week trial. AWS will provision the trial for you and send you the CloudHSM information such as the Elastic Network Interface (ENI) and the private IP address assigned to the CloudHSM device so that you may begin testing. If you have used CloudHSM before, another trial cannot be provisioned, but you can set up production CloudHSM devices on your own. See Provisioning Your HSMs.
  • A client instance from which to access your CloudHSM devices. You can create this manually, or via an AWS CloudFormation template. You can connect to this instance in your public subnet, and then it can communicate with the CloudHSM devices in your private subnets.
  • An HA partition group, which ensures the syncing and load balancing of all CloudHSM devices you have created.

The CloudHSM setup process takes about 30 minutes from beginning to end. By the end of this how-to post, you should be able to set up multiple CloudHSM devices and an HA partition group in AWS with ease. Keep in mind that each production CloudHSM device you provision comes with an up-front fee of $5,000. You are not charged for any CloudHSM devices provisioned for a trial, unless you decide to move them to production when the trial ends.

If you decide to move your provisioned devices to your production environment, you will be billed $5,000 per device. If you decide to stop the trial so as not to be charged, you have up to 24 hours after the trial ends to let AWS Support know of your decision.

Solution overview

How HA works

HA is a feature of the Luna SA 7000 HSM hardware device AWS uses for its CloudHSM service. (Luna SA 7000 HSM is also known as the “SafeNet Network HSM” in more recent SafeNet documentation. Because AWS documentation refers to this hardware as “Luna SA 7000 HSM,” I will use this same product name in this post.) This feature allows more than one CloudHSM device to be placed as members in a load-balanced group setup. By having more than one device on which all cryptographic material is stored, you remove any single points of failure in your environment.

You access your CloudHSM devices in this HA partition group through one logical endpoint, which distributes traffic to the CloudHSM devices that are members of this group in a load-balanced fashion. Even though traffic is balanced between the HA partition group members, any new data or changes in data that occur on any CloudHSM device will be mirrored for continuity to the other members of the HA partition group. A single HA partition group is logically represented by a slot, which is physically composed of multiple partitions distributed across all HA nodes. Traffic is sent through the HA partition group, and then distributed to the partitions that are linked. All partitions are then synced so that data is persistent on each one identically.

The following diagram illustrates the HA partition group functionality.

Diagram of the HA partition group functionality

  1. Application servers send traffic to your HA partition group endpoint.
  2. The HA partition group takes all requests and distributes them evenly between the CloudHSM devices that are members of the HA partition group.
  3. Each CloudHSM device mirrors itself to each other member of the HA partition group to ensure data integrity and availability.

Automatic recovery

If an HA partition group member ever fails, you want a hands-off, quick recovery. Before autoRecovery was introduced, you could take advantage of the redundancy and performance that HA partition groups offer, but you still had to intervene manually when a group member was lost.

HA partition group members may fail for a number of reasons, including:

  • Loss of power to a CloudHSM device.
  • Loss of network connectivity to a CloudHSM device. If network connectivity is lost, the device is treated as failed and recovery attempts are made.

Recovery of partition group members will only work if the following are true:

  • HA autoRecovery is enabled.
  • There are at least two nodes (CloudHSM devices) in the HA partition group.
  • Connectivity is established at startup.
  • The recover retry limit is not reached (if reached or exceeded, the only option is manual recovery).

HA autoRecovery is not enabled by default and must be explicitly enabled by running the following command, which is found in Enabling Automatic Recovery.

$ vtl haAdmin -autoRecovery -retry <count>

When enabling autoRecovery, set the -retry and -interval parameters. The -retry parameter can be a value between 0 and 500 (or -1 for infinite retries) and is the number of times the CloudHSM device will attempt automatic recovery. The -interval parameter, in seconds, can be any value between 60 and 1200 and is the amount of time between automatic recovery attempts.
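For example, to allow three automatic recovery attempts spaced two minutes apart, you would run a command similar to the following (the values shown are only illustrative; choose a retry count and interval that fit your recovery requirements).

$ vtl haAdmin -autoRecovery -retry 3 -interval 120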

Setting up two production CloudHSM devices in AWS

Now that I have discussed how HA partition groups work and why they are useful, I will show how to set up your CloudHSM environment and the HA partition group itself. To create an HA partition group environment, you need a minimum of two CloudHSM devices. You can have as many as 16 CloudHSM devices associated with an HA partition group at any given time. These must be associated with the same account and region, but can be spread across multiple Availability Zones, which is the ideal setup for an HA partition group. Automatic recovery is great for larger HA partition groups because it allows the devices to quickly attempt recovery and resync data in the event of a failure, without requiring manual intervention.

Set up the CloudHSM environment

To set up the CloudHSM environment, you must have a few things already in place:

  • An Amazon VPC.
  • At least one public subnet and two private subnets.
  • An Amazon EC2 client instance (m1.small running Amazon Linux x86 64-bit) in the public subnet, with the SafeNet client software already installed. This instance uses the key pair that you specified during creation of the CloudFormation stack. You can find a ready-to-use Amazon Machine Image (AMI) in our Community AMIs. Simply log in to the EC2 console, choose Launch Instance, click Community AMIs, and search for CloudHSM. Because we regularly release new AMIs when the software is updated, searching for CloudHSM will show all available AMIs for a region. Select the AMI with the most recent client version.
  • Two security groups, one for the client instance and one for the CloudHSM devices. The security group for the client instance, which resides in the public subnet, will allow SSH on port 22 from your local network. The security group for the CloudHSM devices, which resides in the private subnet, will allow SSH on port 22 and NTLS on port 1792 from your public subnet. These are both ingress rules (egress rules allow all traffic).
  • An Elastic IP address for the client instance.
  • An IAM role that delegates AWS resource access to CloudHSM. You can create this role in the IAM console (a CLI sketch follows this list):
    1. Click Roles and then click Create New Role.
    2. Type a name for the role and then click Next Step.
    3. Under AWS Service Roles, click Select next to AWS CloudHSM.
    4. In the Attach Policy step, select AWSCloudHSMRole as the policy. Click Next Step.
    5. Click Create Role.
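If you prefer to script this step, the following AWS CLI sketch creates an equivalent role. The role name CloudHSMServiceRole is just an example, and the service principal and managed policy ARN shown are assumptions based on the console steps above, so verify them in the IAM console before relying on this.

$ cat > cloudhsm-trust.json <<'EOF'
{
  "Version": "2012-10-17",
  "Statement": [{
    "Effect": "Allow",
    "Principal": { "Service": "cloudhsm.amazonaws.com" },
    "Action": "sts:AssumeRole"
  }]
}
EOF
$ aws iam create-role --role-name CloudHSMServiceRole --assume-role-policy-document file://cloudhsm-trust.json
$ aws iam attach-role-policy --role-name CloudHSMServiceRole --policy-arn arn:aws:iam::aws:policy/service-role/AWSCloudHSMRole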

We have a CloudFormation template available that will set up the CloudHSM environment for you:

  1. Go to the CloudFormation console.
  2. Choose Create Stack. Specify https://cloudhsm.s3.amazonaws.com/cloudhsm-quickstart.json as the Amazon S3 template URL.
  3. On the next two pages, specify parameters such as the Stack name, SSH Key Pair, Tags, and SNS Topic for alerts. You will find SNS Topic under the Advanced arrow. Then, click Create.

When the new stack is in the CREATE_COMPLETE state, you will have the IAM role to be used for provisioning your CloudHSM devices, the private and public subnets, your client instance with Elastic IP (EIP), and the security groups for both the CloudHSM devices and the client instance. The CloudHSM security group will already have its necessary rules in place to permit SSH and NTLS access from your public subnet; however, you still must add the rules to the client instance’s security group to permit SSH access from your allowed IP addresses. To do this in the console (an AWS CLI alternative follows these steps):

  1. In the VPC console, make sure you select the same region as the region in which your HSM VPC resides.
  2. Select the security group in your HSM VPC that will be used for the client instance.
  3. On the Inbound tab, from the Create a new rule list, select SSH, and enter the IP address range of the local network from which you will connect to your client instance.
  4. Click Add Rule, and then click Apply Rule Changes.
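If you prefer the AWS CLI, the following command adds the same SSH rule. The security group ID and CIDR range are placeholders for your own values.

$ aws ec2 authorize-security-group-ingress --group-id <client_sg_id> --protocol tcp --port 22 --cidr <your_network_cidr>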

After adding the IP rules for SSH (port 22) to your client instance’s security group, test the connection by making an SSH connection from your local network to the client instance’s EIP. Make sure to write down all the subnet and role information, because you will need it later.

Create an SSH key pair

The SSH key pair that you will now create will be used by the CloudHSM devices to authenticate the manager account when it connects from your client instance. The manager account is simply the user that is permitted to SSH to your CloudHSM devices. Before provisioning the CloudHSM devices, you create the SSH key pair so that you can provide the public key to the CloudHSM during setup. The private key remains on your client instance to complete the authentication process. You can generate the key pair on any computer, as long as you copy the private key to the client instance. You can create the key pair on either Linux or Windows; I cover both processes in this section.

In Linux, you will use the ssh-keygen command. By typing just this command into the terminal window, you will receive output similar to the following.

$ ssh-keygen
Generating public/private rsa key pair.
Enter file in which to save the key (/home/user/.ssh/id_rsa):
Enter passphrase (empty for no passphrase):
Enter same passphrase again:
Your identification has been saved in /home/user/.ssh/id_rsa.
Your public key has been saved in /home/user/.ssh/id_rsa.pub.
The key fingerprint is: df:c4:49:e9:fe:8e:7b:eb:28:d5:1f:72:82:fb:f2:69
The key's randomart image is:
+--[ RSA 2048]----+
|                 |
|             .   |
|            o    |
|           + .   |
|        S   *.   |
|         . =.o.o |
|          ..+ +..|
|          .o Eo .|
|           .OO=. |
+-----------------+
$

In Windows, use PuTTYgen to create your key pair:

  1. Start PuTTYgen. For Type of key to generate, select SSH-2 RSA.
  2. In the Number of bits in a generated key field, specify 2048.
  3. Click Generate.
  4. Move your mouse pointer around in the blank area of the Key section below the progress bar (to generate some randomness) until the progress bar is full.
  5. A private/public key pair has now been generated.
  6. In the Key comment field, type a name for the key pair that you will remember.
  7. Click Save public key and name your file.
  8. Click Save private key and name your file. It is imperative that you do not lose this key, so make sure to store it somewhere safe.
  9. Right-click the text field labeled Public key for pasting into OpenSSH authorized_keys file and choose Select All.
  10. Right-click again in the same text field and choose Copy.

The following screenshot shows what the PuTTYgen output will look like after you have created the key pair.

Screenshot of how the PuTTYgen output will appear

You must convert the keys created by PuTTYgen to OpenSSH format for use with other clients by using the following command.

ssh-keygen -i -f puttygen_key > openssh_key

The public key will be used to provision the CloudHSM device and the private key will be stored on the client instance to authenticate the SSH sessions. The public SSH key will look something like the following. If it does not, it is not in the correct format and must be converted using the preceding procedure.

ssh-rsa AAAAB3NzaC1yc2EAAAABIwAAAQEA6bUsFjDSFcPC/BZbIAv8cAR5syJMBGiEqzFOIEHbm0fPkkQ0U6KppzuXvVlc2u7w0mgPMhnkEfV6j0YBITu0Rs8rNHZFJsCYXpdoPxMMgmCf/FaOiKrb7+1xk21q2VwZyj13GPUsCxQhRW7dNidaaYTf14sbd9AqMUH4UOUjs27MhO37q8/WjV3wVWpFqexm3f4HPyMLAAEeExT7UziHyoMLJBHDKMN71Ok2kV24wwn+t9P/Va/6OR6LyCmyCrFyiNbbCDtQ9JvCj5RVBla5q4uEkFRl0t6m9XZg+qT67sDDoystq3XEfNUmDYDL4kq1xPM66KFk3OS5qeIN2kcSnQ==

Whether you are saving the private key on your local computer or moving it to the client instance, you must ensure that the file permissions are correct. You can do this by running the following commands (throughout this post, be sure to replace placeholder content with your own values). The first command sets the necessary permissions; the second command adds the private key to the authentication agent.

$ chmod 600 ~/.ssh/<private_key_file>
$ ssh-add ~/.ssh/<private_key_file>

Set up the AWS CLI tools

Now that you have your SSH key pair ready, you can set up the AWS CLI tools so that you can provision and manage your CloudHSM devices. If you used the CloudFormation template or the CloudHSM AMI to set up your client instance, you already have the CLI installed. You can verify this by running cloudhsm version at the command prompt; the output should show the installed version (3.0.5 at the time of this writing). If you chose to use your own AMI and install the Luna SA software, you can install the CloudHSM CLI tools with the following steps.

$ wget https://s3.amazonaws.com/cloudhsm-software/CloudHsmCLI.egg
$ sudo easy_install-2.7 -s /usr/local/bin CloudHsmCLI.egg
$ cloudhsm version
{
      "Version": "<version>"
}

You must also set up ownership of the Chrystoki.conf file and the Luna client directory for your user on the client instance. The Chrystoki.conf file is the configuration file for the CloudHSM device. By default, CloudHSM devices come ready from the factory to house cryptographic keys and perform cryptographic operations on data, but they must be configured to connect to your client instances:

  1. On the client instance, set the owner and write permission on the Chrystoki.conf file.
$ sudo chown <owner> /etc/Chrystoki.conf
$ sudo chmod +w /etc/Chrystoki.conf

The <owner> can be either the user or a group the user belongs to (for example, ec2-user).

  2. On the client instance, set the owner of the Luna client directory:
$ sudo chown <owner> -R <luna_client_dir>

The <owner> should be the same as the <owner> of the Chrystoki.conf file. The <luna_client_dir> differs based on the version of the LunaSA client software installed. If these are new setups, use version 5.3 or newer; however, if you have older clients with version 5.1 installed, use version 5.1:

  • Client software version 5.3: /usr/safenet/lunaclient/
  • Client software version 5.1: /usr/lunasa/

You also must configure the AWS CLI tools with AWS credentials to use for the API calls. These can be set by config files, by passing the credentials in the commands, or by instance profile. The most secure option, which eliminates the need to hard-code credentials in a config file, is to use an instance profile on your client instance. All CLI commands in this post are performed on a client instance launched with an IAM role that has CloudHSM permissions. If you want to set your credentials in a config file instead, remember that each CLI command must include --profile <profilename>, with <profilename> being the name you assigned in the config file for these credentials. See Configuring the AWS CloudHSM CLI Tools for help with setting up the AWS CLI tools.

You will then set up a persistent SSH tunnel for all CloudHSM devices to communicate with the client instance. This is done by editing the ~/.ssh/config file. Replace <CloudHSM_ip_address> with the private IP of your CloudHSM device, and replace <private_key_file> with the file location of your SSH private key created earlier (for example, /home/user/.ssh/id_rsa).

Host <CloudHSM_ip_address>
User manager
IdentityFile <private_key_file>

The client instance also needs client certificates to authenticate with the CloudHSM partitions or partition group. Depending on the LunaSA client software you are using, the location of these files can differ. Again, if these are new setups, use version 5.3 or newer; however, if you have older clients with version 5.1 installed, use version 5.1:

  • Linux clients
    • Client software version 5.3: /usr/safenet/lunaclient/cert
    • Client software version 5.1: /usr/lunasa/cert
  • Windows clients
    • Client software version 5.3: %ProgramFiles%\SafeNet\LunaClient\cert
    • Client software version 5.1: %ProgramFiles%\LunaSA\cert

To create the client certificates, you can use the OpenSSL Toolkit or the LunaSA client-side vtl commands. The OpenSSL Toolkit is a program that allows you to manage TLS (Transport Layer Security) and SSL (Secure Sockets Layer) protocols. It is commonly used to create SSL certificates for secure communication between internal network devices. The LunaSA client-side vtl commands are installed on your client instance along with the Luna software. If you used either CloudFormation or the CloudHSM AMI, the vtl commands are already installed for you. If you chose to launch a different AMI, you can download the Luna software. After you download the software, run the command linux/64/install.sh as root on a Linux instance and install the Luna SA option. If you install the software on a Windows instance, run the command windows\64\LunaClient.msi to install the Luna SA option. I show certificate creation with both the OpenSSL Toolkit and LunaSA in the following section.

OpenSSL Toolkit

      $ openssl genrsa -out <luna_client_cert_dir>/<client_name>Key.pem 2048
      $ openssl req -new -x509 -days 3650 -key <luna_client_cert_dir>/<client_name>Key.pem -out <luna_client_cert_dir>/<client_name>.pem

The <luna_client_cert_dir> is the LunaSA Client certificate directory on the client and the <client_name> can be whatever you choose.

LunaSA

      $ sudo vtl createCert -n <client_name>

The output of the preceding LunaSA command will be similar to the following.

Private Key created and written to:

<luna_client_cert_dir>/<client_name>Key.pem

Certificate created and written to:

<luna_client_cert_dir>/<client_name>.pem

You will need these key file locations later on, so make sure you write them down or save them to a file. One last thing to do at this point is create the client Amazon Resource Name (ARN), which you do by running the following command. Make note of the client ARN, because you will need this when registering your client instances with the HA partition group.

$ cloudhsm create-client --certificate-file <luna_client_cert_dir>/<client_name>.pem

{
      "ClientARN": "<client_arn>",
      "RequestId": "<request_id>"
}

Provision your CloudHSM devices

Now for the fun and expensive part. Always remember that for each CloudHSM device you provision to your production environment, there is an upfront fee of $5,000. Because you need more than one CloudHSM device to set up an HA partition group, provisioning two CloudHSM devices to production will cost an upfront fee of $10,000.

If this is your first time trying out CloudHSM, you can have a two-week trial provisioned for you at no cost. The only cost will occur if you decide to keep your CloudHSM devices and move them into production. If you are unsure of the usage in your company, I highly suggest doing a trial first. You can open a support case requesting a trial at any time. You must have a paid support plan to request a CloudHSM trial.

To provision the two CloudHSM devices, SSH into your client instance and run the following CLI command.

$ cloudhsm create-hsm \
--subnet-id <subnet_id> \
--ssh-public-key-file <public_key_file> \
--iam-role-arn <iam_role_arn> \
--syslog-ip <syslog_ip_address>

The response should resemble the following.

{
      "CloudHSMArn": "<CloudHSM_arn>",
      "RequestId": "<request_id>"
}

Make note of each CloudHSM ARN because you will need them to initialize the CloudHSM devices and later add them to the HA partition group.

Initialize the CloudHSM devices

Configuring your CloudHSM devices, or initializing them as the process is formally called, is what allows you to set up the configuration files, certificate files, and passwords on the CloudHSM itself. Because you already have your CloudHSM ARNs from the previous section, you can run the describe-hsm command to get the EniId and the EniIp of the CloudHSM devices. Your results should be similar to the following.

$ cloudhsm describe-hsm --hsm-arn <CloudHSM_arn>
{    
     "EniId": "<eni_id>",   
     "EniIp": "<eni_ip>",    
     "CloudHSMArn": "<CloudHSM_arn>",    
     "IamRoleArn": "<iam_role_arn>",    
     "Partitions": [],    
     "RequestId": "<request_id>",    
     "SerialNumber": "<serial_number>",    
     "SoftwareVersion": "5.1.3-1",    
     "SshPublicKey": "<public_key_text>",    
     "Status": "<status>",    
     "SubnetId": "<subnet_id>",    
     "SubscriptionStartDate": "2014-02-05T22:59:38.294Z",    
     "SubscriptionType": "PRODUCTION",    
     "VendorName": "SafeNet Inc."
}

Now that you know the EniId of each CloudHSM device, you need to apply the CloudHSM security group to it. This ensures that connections can be made from any instance that has the client security group assigned. When a trial is provisioned for you, or when you provision CloudHSM devices yourself, the default security group of the VPC is automatically assigned to the ENI. You must change this to the security group that permits ingress on ports 22 and 1792 from your client instance.

To apply a CloudHSM security group to an EniId:

  1. Go to the EC2 console, and choose Network Interfaces in the left pane.
  2. Select the EniId of the CloudHSM.
  3. From the Actions drop-down list, choose Change Security Groups. Choose the security group for your CloudHSM device, and then click Save.
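Alternatively, you can make the same change with the AWS CLI. The following is a minimal sketch; substitute the EniId value from the describe-hsm output and the ID of the security group you created for your CloudHSM devices.

$ aws ec2 modify-network-interface-attribute --network-interface-id <eni_id> --groups <cloudhsm_sg_id>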

To complete the initialization process, you must ensure a persistent SSH connection is in place from your client to the CloudHSM. Remember the ~/.ssh/config file you edited earlier? Now that you have the IP address of the CloudHSM devices and the location of the private SSH key file, go back and fill in that config file’s parameters by using your favorite text editor.
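For example, with two CloudHSM devices the completed ~/.ssh/config file contains one entry per device, similar to the following (the IP address placeholders and key path are examples for your own values).

Host <CloudHSM1_ip_address>
User manager
IdentityFile /home/user/.ssh/id_rsa

Host <CloudHSM2_ip_address>
User manager
IdentityFile /home/user/.ssh/id_rsa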

Now, initialize the devices by using the initialize-hsm command with the information you gathered from the provisioning steps. The placeholder values in the following example should be replaced with your own naming and password conventions when you initialize the CloudHSM devices.

$ cloudhsm initialize-hsm \
--hsm-arn <CloudHSM_arn> \
--label <label> \
--cloning-domain <cloning_domain> \
--so-password <so_password>

The <label> is a unique name for the CloudHSM device that should be easy to remember. You can also use this name as a descriptive label that tells what the CloudHSM device is for. The <cloning_domain> is a secret used to control cloning of key material from one CloudHSM to another. This can be any unique name that fits your company’s naming conventions. Examples could be exampleproduction or exampledevelopment. If you are going to set up an HA partition group environment, the <cloning_domain> must be the same across all CloudHSMs. The <so_password> is the security officer password for the CloudHSM device, and for ease of remembrance, it should be the same across all devices as well. It is important to use passwords and cloning domain names that you will remember, because they are unrecoverable and the loss of them means loss of all data on a CloudHSM device. For your use, we do supply a Password Worksheet if you want to write down your passwords and store the printed page in a secure place.
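As an illustration, initializing two CloudHSM devices for the same HA partition group might look like the following. The labels and cloning domain shown are example values only; what matters is that the label is unique per device while the cloning domain (and, for convenience, the SO password) is identical on both.

$ cloudhsm initialize-hsm --hsm-arn <CloudHSM1_arn> --label hapghsm1 --cloning-domain exampleproduction --so-password <so_password>
$ cloudhsm initialize-hsm --hsm-arn <CloudHSM2_arn> --label hapghsm2 --cloning-domain exampleproduction --so-password <so_password>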

Configure the client instance

Configuring the client instance is important because it is the secure link between you, your applications, and the CloudHSM devices. The client instance opens a secure channel to the CloudHSM devices and sends all requests over this channel so that the CloudHSM device can perform the cryptographic operations and key storage. Because you already have launched the client instance and mostly configured it, the only step left is to create the Network Trust Link (NTL) between the client instance and the CloudHSM. For this, we will use the LunaSA vtl commands again.

  1. Copy the server certificate from the CloudHSM to the client.
$ scp -i ~/.ssh/<private_key_file> manager@<CloudHSM_ip_address>:server.pem .
  2. Register the CloudHSM certificate with the client.
$ sudo vtl addServer -n <CloudHSM_ip_address> -c server.pem
New server <CloudHSM_ip_address> successfully added to server list.
  3. Copy the client certificate to the CloudHSM.
$ scp -i ~/.ssh/<private_key_file> <client_cert_directory>/<client_name>.pem manager@<CloudHSM_ip_address>:
  4. Connect to the CloudHSM.
$ ssh -i ~/.ssh/<private_key_file> manager@<CloudHSM_ip_address>
lunash:>
  5. Register the client.
lunash:> client register -client <client_id> -hostname <client_name>

The <client_id> and <client_name> should be the same for ease of use, and this should be the same as the name you used when you created your client certificate.

  6. On the CloudHSM, log in with the SO password.
lunash:> hsm login
  7. Create a partition on each CloudHSM (use the same name for ease of remembrance).
lunash:> partition create -partition <name> -password <partition_password> -domain <cloning_domain>

The <partition_password> does not have to be the same as the SO password, and for security purposes, it should be different.

  8. Assign the client to the partition.
lunash:> client assignPartition -client <client_id> -partition <partition_name>
  9. Verify that the partition was assigned correctly.
lunash:> client show -client <client_id>
  10. Log in to the client instance and verify that it has been properly configured.
$ vtl verify
The following Luna SA Slots/Partitions were found:
Slot    Serial #         Label
====    =========        ============
1      <serial_num1>     <partition_name>
2      <serial_num2>     <partition_name>

You should see an entry for each partition created on each CloudHSM device. This step lets you know that the CloudHSM devices and client instance were properly configured.

The partitions created and assigned in the previous steps are for testing purposes only and will not be used in the HA partition group setup. The HA partition group workflow automatically creates a partition on each CloudHSM device for its purposes. At this point, you have created the client and at least two CloudHSM devices, and you have set up and verified the connection between the client instance and the CloudHSM devices. The next step is to ensure fault tolerance by setting up the HA partition group.

Set up the HA partition group for fault tolerance

Now that you have provisioned multiple CloudHSM devices in your account, you will add them to an HA partition group. As I explained earlier in this post, an HA partition group is a virtual partition that represents a group of partitions distributed over many physical CloudHSM devices for HA. Automatic recovery is also a key factor in ensuring HA and data integrity across your HA partition group members. If you followed the previous procedures in this post, setting up the HA partition group should be relatively straightforward.

Create the HA partition group

First, you will create the actual HA partition group itself. Using the CloudHSM CLI on your client instance, run the following command to create the HA partition group and name it per your company’s naming conventions. In the following command, replace <label> with the name you chose.

$ cloudhsm create-hapg --group-label <label>
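The command returns the ARN of the new HA partition group, which the remaining steps reference as <hapg_arn>, so make note of it. The exact response format can vary slightly by CLI version, but it should resemble the following.

{
      "HapgArn": "<hapg_arn>",
      "RequestId": "<request_id>"
}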

Register the CloudHSM devices with the HA partition group

Now, add the already initialized CloudHSM devices to the HA partition group. You will need to run the following command for each CloudHSM device you want to add to the HA partition group.

$ cloudhsm add-hsm-to-hapg \
--hsm-arn <CloudHSM_arn> \
--hapg-arn <hapg_arn> \
--cloning-domain <cloning_domain> \
--partition-password <partition_password> \
--so-password <so_password>

You should see output similar to the following after each successful addition to the HA partition group.

{
      "Status": "Addition of CloudHSM <CloudHSM_arn> to HA partition group <hapg_arn> successful"
}
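If you are adding several devices, you can wrap the same command in a small shell loop instead of running it once per ARN. This is only a convenience sketch that reuses the flags shown above; the ARNs and passwords are placeholders for your own values.

$ for hsm_arn in <CloudHSM1_arn> <CloudHSM2_arn>; do
    cloudhsm add-hsm-to-hapg \
      --hsm-arn "$hsm_arn" \
      --hapg-arn <hapg_arn> \
      --cloning-domain <cloning_domain> \
      --partition-password <partition_password> \
      --so-password <so_password>
  done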

Register the client with the HA partition group

The last step is to register the client with the HA partition group. You will need the client ARN from earlier in the post, and you will use the CloudHSM CLI command register-client-to-hapg to complete this process.

$ cloudhsm register-client-to-hapg \
--client-arn <client_arn> \
--hapg-arn <hapg_arn>
{
      "Status": "Registration of the client <client_arn> to the HA partition group <hapg_arn> successful"
}

After you register the client with the HA partition group, you must retrieve the client configuration file and the server certificates so that the client can connect to the group. You do this by using the get-client-configuration CLI command.

$ cloudhsm get-client-configuration \
--client-arn <client_arn> \
--hapg-arn <hapg_arn> \
--cert-directory <server_cert_location> \
--config-directory /etc/

The configuration file has been copied to /etc/
The server certificate has been copied to <server_cert_location>

The <server_cert_location> will differ depending on the LunaSA client software you are using (I am including both the old and new versions):

  • Client software version 5.3: /usr/safenet/lunaclient/cert/server
  • Client software version 5.1: /usr/lunasa/cert/server

Lastly, to verify the client configuration, run the following LunaSA vtl command.

$ vtl haAdmin show

In the output, you will see a heading, HA Group and Member Information. Ensure that the number of group members equals the number of CloudHSM devices you added to the HA partition group. If the number does not match what you have provisioned, you might have missed a step in the provisioning process. Going back through the provisioning process usually repairs this. However, if you still encounter issues, opening a support case is the quickest way to get assistance.

Another way to verify the HA partition group setup is to check the /etc/Chrystoki.conf file for output similar to the following.

VirtualToken = {
   VirtualToken00Label = hapg1;
   VirtualToken00SN = 1529127380;
   VirtualToken00Members = 475256026,511541022;
}
HASynchronize = {
   hapg1 = 1;
}
HAConfiguration = {
   reconnAtt = -1;
   AutoReconnectInterval = 60;
   HAOnly = 1;
}

Summary

You have now completed the process of provisioning CloudHSM devices, the client instance for connection, and your HA partition group for fault tolerance. You can begin using an application of your choice to access the CloudHSM devices for key management and encryption. By accessing CloudHSM devices via the HA partition group, you ensure that all traffic is load balanced between all backing CloudHSM devices. The HA partition group will ensure that each CloudHSM has identical information so that it can respond to any request issued.

Now that you have an HA partition group set up with automatic recovery, if a CloudHSM device fails, the device will attempt to recover itself, and all traffic will be rerouted to the remaining CloudHSM devices in the HA partition group so as not to interrupt traffic. After recovery (manual or automatic), all data will be replicated across the CloudHSM devices in the HA partition group to ensure consistency.

If you have questions about any part of this blog post, please post them on the IAM forum.

– Tracy