Understanding AWS CloudHSM Cluster Synchronization

AWS CloudHSM provides fully managed, single-tenant hardware security modules (HSMs) in the AWS cloud. A CloudHSM cluster contains one or more HSMs. Multiple HSMs support higher throughput for cryptographic operations and provide redundancy. For clusters with multiple HSMs, the CloudHSM service automatically synchronizes keys and policies on the server side. Users, however, are synchronized from the client side, and that synchronization is driven by configuration files that must be refreshed when the cluster size changes. If you do not refresh the configuration files, the users on your HSMs could become unsynchronized, which affects the ability of your CloudHSM cluster to provide consistent service.

In this blog post, I’ll provide a general overview of the CloudHSM architecture, discuss the cluster synchronization process, build a CloudHSM environment, show how the cluster users can become unsynchronized, and then restore user synchronization to bring your cluster back to a consistent state.

CloudHSM Architectural Overview

When you provision an HSM instance in CloudHSM, the HSM instance provides an elastic network interface (ENI) in your Amazon VPC while the HSM itself resides in a separate VPC managed by AWS CloudHSM. Your applications use the CloudHSM cluster ID to add or remove HSMs from the cluster and the ENI(s) of the HSM instance(s) to access the HSM instances.

You configure your cluster and its HSM instances using CloudHSM client software you deploy on Amazon EC2 instances in your VPC. You only need one such EC2 instance to manage a CloudHSM cluster, but it’s common to deploy additional EC2 instances in other availability zones to provide for client redundancy. Your applications communicate with the HSM instances using the client daemon. You manage and configure the cluster with command line tools including cloudhsm_mgmt_util, key_mgmt_util, and configure. An example of a CloudHSM architecture appears below.

Figure 1: A 3-Node CloudHSM architecture

The diagram shows a three-node CloudHSM cluster deployed in the us-west-2 (Oregon) region with three Amazon EC2 instances with the CloudHSM software. The client in Availability Zone 2 is communicating with the cluster through the elastic network interfaces in each availability zone.

CloudHSM Synchronization Process

Having discussed the architecture of AWS CloudHSM, let’s turn our attention to the matter of cluster synchronization. There are three events that require synchronization: cluster expansion, key management operations, and user management operations. Let’s look at each of these in more detail.

Cluster Expansion

When you add an HSM to an existing cluster, AWS CloudHSM clones all users, keys, and policies from another HSM in the cluster. No additional steps are required on your part.

Key Management Operations

Key management with the key_mgmt_util tool uses the CloudHSM client to communicate with the HSM cluster. Additionally, a fallback, HSM-based synchronization protocol keeps keys in sync.

User Management

You perform user management tasks, such as adding users or changing passwords, using the cloudhsm_mgmt_util tool. This tool communicates directly with the HSMs, bypassing the client daemon. cloudhsm_mgmt_util uses its own configuration files to determine the HSMs that it should connect to within the cluster. These configuration files aren’t updated dynamically when HSM instances are added. To prevent user synchronization errors, you must update the configuration files before running cloudhsm_mgmt_util. You must also not add new HSM instances to the cluster while you’re using the tool. This helps ensure that no HSM instances are accidentally left out of user updates that would in turn result in user synchronization problems.

Again, these safeguards are only necessary when using cloudhsm_mgmt_util. For all other applications and utilities using CloudHSM, the client daemon automatically reconfigures itself as you add and remove HSM instances from your cluster. In the remainder of this post, I will build a CloudHSM infrastructure as shown in the above diagram. I’ll then show you how users on your CloudHSM instances can become unsynchronized, and how to restore proper synchronization.
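The refresh-before-use rule for cloudhsm_mgmt_util described above can be wrapped in a small helper so that it is hard to forget. The following is a minimal sketch, assuming a client instance where the daemon is managed with stop and start as in the steps later in this post. The function names run and refresh_cloudhsm_config and the DRY_RUN variable are hypothetical and not part of the CloudHSM tools; the IP address you pass is the ENI IP of any HSM in your cluster.

```shell
# Sketch: refresh both CloudHSM configuration files before user management.
# The names run, refresh_cloudhsm_config, and DRY_RUN are illustrative only.

CLOUDHSM_BIN=/opt/cloudhsm/bin

# Execute a command, or just print it when DRY_RUN=1.
run() {
  if [ "${DRY_RUN:-0}" = "1" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Stop the client, point it at an HSM, restart it, then update the
# cloudhsm_mgmt_util configuration file. Stops at the first failure.
refresh_cloudhsm_config() {
  hsm_ip="$1"
  if [ -z "$hsm_ip" ]; then
    echo "usage: refresh_cloudhsm_config <hsm-eni-ip>" >&2
    return 1
  fi
  run sudo stop cloudhsm-client &&
  run sudo "$CLOUDHSM_BIN/configure" -a "$hsm_ip" &&
  run sudo start cloudhsm-client &&
  run sudo "$CLOUDHSM_BIN/configure" -m
}
```

With DRY_RUN=1 set, calling refresh_cloudhsm_config 10.0.129.209 only prints the four commands, which is a convenient way to review them before running them for real.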

Prerequisites and Assumptions

You’ll need to have an AWS account that allows you to provision Amazon VPCs, Amazon EC2 instances, and CloudHSMs.
I’ll use the us-west-2 (Oregon) region, but you can use any region that offers CloudHSM.
You’ll need an Amazon EC2 key pair in the region.
You should have a working knowledge of the services I’ve mentioned.

Important: You’ll incur charges for the resources used in this example. You can find the cost of each service on that service’s pricing page.

Building a CloudHSM Infrastructure

Create an Amazon VPC with subnets in the us-west-2a, us-west-2b, and us-west-2c availability zones. I’ll use the Amazon VPC Architecture Quick Start, which is an AWS CloudFormation template that will do this on your behalf. Make sure you select the correct region after you load the Quick Start. Select the following parameters:

Parameter	Value
Availability Zones	us-west-2a, us-west-2b, us-west-2c
Number of Availability Zones	3
Create private subnets	False
Create additional private subnets with dedicated network ACLs	False
Key pair name	The name of your Amazon EC2 key pair

Accept the default values for all other parameters.

Follow these instructions to create a CloudHSM cluster in your new VPC in the us-west-2a, us-west-2b and us-west-2c availability zones. Note that the cluster will not have any HSMs after it’s created.
Follow these instructions to initialize the cluster with an HSM in the us-west-2a availability zone. After the cluster is initialized, note the ENI IP address from the cluster details section in the console as shown here:

Figure 2: Details of CloudHSM Cluster
Launch an Amazon Linux or Ubuntu EC2 instance. From the instance dashboard, note the public IP of the instance as shown below.

Figure 3: Client Instance Details
Install the client software on the EC2 instance you launched in step 4.
Add the IP of the EC2 instance that you identified in step 4 to the security group you identified in step 3.
Activate the cluster. The activation instructions will guide you through connecting to the EC2 instance you launched in step 4. Remain logged into the EC2 instance following the activation of the cluster for the steps below.
While you are still logged into the EC2 instance you just launched, follow the steps below to add a crypto user named example_user to the cluster:
1. Ensure the CloudHSM daemon is stopped:
  
  $ sudo stop cloudhsm-client
2. Configure the IP address of the initial HSM using the ENI IP address from step 3:
  
  $ sudo /opt/cloudhsm/bin/configure -a 10.0.129.209
  
  Note: the configure tool updates two configuration files: one for the CloudHSM client, and the other for the cloudhsm_mgmt_util program that is used to administer users.
3. Start the CloudHSM client:
  
  $ sudo start cloudhsm-client
4. Ensure the cloudhsm_mgmt_util configuration file is up to date. We need to do this to ensure cloudhsm_mgmt_util is aware of all the HSM instances in the cluster:
  
  $ sudo /opt/cloudhsm/bin/configure -m
5. Connect to the HSM instances, enable end-to-end encryption, and log in to the HSM instances. Enabling end-to-end encryption encrypts the communication between cloudhsm_mgmt_util and the HSM to prevent interception of sensitive information such as passwords:
  
  $ /opt/cloudhsm/bin/cloudhsm_mgmt_util /opt/cloudhsm/etc/cloudhsm_mgmt_util.cfg
  
  aws-cloudhsm> enable_e2e
  
  aws-cloudhsm> loginHSM CO admin
  
  Figure 4: Connecting to a Single CloudHSM
  
  Note: The connection or log in is automatically executed on every HSM instance that cloudhsm_mgmt_util is aware of. Note also that for each of the commands that you enter, the cloudhsm_mgmt_util program identifies the IP address of the HSM to which it is communicating.
6. Add the user example_user and then confirm the addition by listing the users in the HSM:
  
  aws-cloudhsm> createUser CU example_user yourpassword
  
  aws-cloudhsm> listUsers
7. Use the quit command to log out and exit the program:
  
  aws-cloudhsm> quit
Now that we’ve added a user to the CloudHSM, let’s add a key so we can see how users and keys are synchronized as the cluster changes.
1. Start the key_mgmt_util program:
  
  $ /opt/cloudhsm/bin/key_mgmt_util
2. Log in to the HSM:
  
  Command: loginHSM -u CU -s example_user -p yourpassword
3. Now, generate the key:
  
  Command: genSymKey -t 31 -s 32 -l aes256
4. Display the keys in the cluster:
  
  Command: findKey
  
  Figure 5: Looking Up a Key in a Single CloudHSM
  
  Notice that key_mgmt_util displays the node id to which it is communicating.
5. Use the exit command to leave the program:
  
  Command: exit
Add another HSM to the cluster in the us-west-2b availability zone and note the ENI IP address from the cluster details section in the console, as shown here:

Figure 6: The ENI IP address
Update the cluster configuration files and use cloudhsm_mgmt_util to examine the user configuration:

$ sudo stop cloudhsm-client

$ sudo /opt/cloudhsm/bin/configure -a 10.0.129.209

$ sudo start cloudhsm-client

$ sudo /opt/cloudhsm/bin/configure -m

$ /opt/cloudhsm/bin/cloudhsm_mgmt_util /opt/cloudhsm/etc/cloudhsm_mgmt_util.cfg

aws-cloudhsm> enable_e2e

aws-cloudhsm> loginHSM CO admin

Figure 7: Connecting to the 2-node CloudHSM cluster

Note that cloudhsm_mgmt_util now sends commands to both of the HSMs in the cluster. You can see the same thing when we list the users in the cluster.

Figure 8: Showing proper user synchronization across two CloudHSMs
Now, use key_mgmt_util to examine the keys:

Command: findKey

Figure 9: Showing that keys are properly synchronized across a 2-node CloudHSM cluster

This command confirms that when we added the second HSM, CloudHSM used cluster-initiated synchronization to load the users and keys into the new HSM.
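The ENI IP addresses that the steps above read from the console can also be listed from the command line. This is a minimal sketch, assuming the AWS CLI is installed and configured for the cluster's region; the function name hsm_eni_ips is hypothetical, and the cluster ID you pass is your own.

```shell
# Sketch: print one ENI IP address per line for every HSM in a cluster.
# hsm_eni_ips is an illustrative name; the cluster ID is supplied by you.
hsm_eni_ips() {
  cluster_id="$1"
  aws cloudhsmv2 describe-clusters \
    --filters clusterIds="$cluster_id" \
    --query 'Clusters[0].Hsms[*].EniIp' \
    --output text | tr '\t' '\n'
}
```

Calling hsm_eni_ips with your cluster ID after each HSM is added gives you the current set of addresses without switching to the console.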

The CloudHSM Cluster Users Become Unsynchronized

Start cloudhsm_mgmt_util and enable end-to-end encryption:

$ /opt/cloudhsm/bin/cloudhsm_mgmt_util /opt/cloudhsm/etc/cloudhsm_mgmt_util.cfg

aws-cloudhsm> enable_e2e

Figure 10: Connecting to the 2-node CloudHSM cluster
While cloudhsm_mgmt_util is left running, add a third HSM in us-west-2c through the console and note the ENI IP address, as shown here:

Figure 11: The ENI IP address of the third HSM
Going back to cloudhsm_mgmt_util, let’s add a user named newest_user to our cluster. Note that we have not exited cloudhsm_mgmt_util and refreshed its configuration file. So it’s still connected only to the first two HSM instances.

aws-cloudhsm> enable_e2e

aws-cloudhsm> loginHSM CO admin yourpassword

aws-cloudhsm> createUser CU newest_user yourpassword

Figure 12: Adding a User to only two nodes of a 3-node CloudHSM Cluster and breaking synchronization

The cloudhsm_mgmt_util command adds the user to the two HSMs it already knows about and had connected to. It doesn’t communicate with the newly added HSM.
Let’s exit cloudhsm_mgmt_util, refresh the configuration, and then run the management utility again.

$ sudo stop cloudhsm-client

$ sudo /opt/cloudhsm/bin/configure -a 10.0.129.209

$ sudo start cloudhsm-client

$ sudo /opt/cloudhsm/bin/configure -m

$ /opt/cloudhsm/bin/cloudhsm_mgmt_util /opt/cloudhsm/etc/cloudhsm_mgmt_util.cfg

aws-cloudhsm> enable_e2e

aws-cloudhsm> loginHSM CO admin

You can now see cloudhsm_mgmt_util is communicating with all of the cluster nodes.

Figure 13: Connecting to a 3-node CloudHSM cluster
Let’s see what happens when we list the users:

aws-cloudhsm> listUsers

Figure 14: Showing that users are now unsynchronized

You can see from the results that one of the HSMs (server 1) is missing the user named newest_user. This happened because cloudhsm_mgmt_util was unaware of the HSM instance that was added while it was running (recall that cloudhsm_mgmt_util doesn’t use the cloudhsm-client daemon and, therefore, doesn’t get automatic cluster configuration updates).
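A quick check before any user change can catch this situation: compare the HSM addresses the cluster currently has with the addresses the tool connected to. The sketch below assumes you supply both lists yourself, for example the ENI IPs shown in the console and the addresses cloudhsm_mgmt_util prints when it connects. The function name missing_from_config and every address other than 10.0.129.209 are hypothetical.

```shell
# Sketch: print each cluster IP that is absent from the configured list.
# Both arguments are whitespace-separated lists of IP addresses.
missing_from_config() {
  cluster_ips="$1"
  configured_ips="$2"
  for ip in $cluster_ips; do
    found=0
    for known in $configured_ips; do
      if [ "$ip" = "$known" ]; then
        found=1
      fi
    done
    if [ "$found" -eq 0 ]; then
      echo "$ip"
    fi
  done
}
```

Any address this prints belongs to an HSM that would be left out of a createUser or password change, so refresh the configuration files before continuing.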

Restoring User Synchronization to the CloudHSM Cluster

We now want to add the user newest_user to the single HSM (server 1) that is out of sync. Normally, cloudhsm_mgmt_util works in cluster mode and applies your commands to all HSMs in the cluster. Since we want to work on a single HSM, we’re going to enter the server command to tell cloudhsm_mgmt_util to work in server mode and apply our commands just to that one HSM.

In the server command below, we specify the number of the HSM that we want to change based on the figure above. In the createUser command, you must use the same password that you used in step 3 (in the section titled “The CloudHSM Cluster Users Become Unsynchronized”) on the other HSMs in the cluster so that all HSMs in the cluster have identical user names and passwords. After we make this change, we use the exit command to transition from server mode back to cluster mode.

aws-cloudhsm> server 1

server1> createUser CU newest_user yourpassword

server1> exit

Figure 15: Adding a user to a single-node of a 3-node CloudHSM cluster
Now that we have transitioned back to cluster mode, let’s confirm that the HSM user tables are now synchronized by listing the users:

aws-cloudhsm> listUsers

Figure 16: Showing that users are now synchronized across the 3-node CloudHSM cluster
Let’s take a look at the keys using key_mgmt_util:

Command: loginHSM -u CU -s example_user -p yourpassword

Command: findKey

Figure 17: Showing that keys continued to be synchronized across a 3-node CloudHSM Cluster

You can see that CloudHSM kept the keys in sync because key synchronization is cluster-initiated. No additional actions are required on our part.
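The user comparison done by eye in the listUsers figures above can be expressed as a small check. This is a minimal sketch that assumes you have already written the listing down as plain "server_index user_name" pairs, one per line; it does not parse the real listUsers output, and the function name missing_users is hypothetical.

```shell
# Sketch: read "server_index user_name" pairs on stdin and report every
# user that exists on at least one server but is absent from another.
missing_users() {
  awk '
    { servers[$1] = 1; users[$2] = 1; seen[$1 "," $2] = 1 }
    END {
      for (s in servers) {
        for (u in users) {
          key = s "," u
          if (!(key in seen)) {
            print "server " s " is missing " u
          }
        }
      }
    }
  '
}
```

Fed the state from the unsynchronized cluster in this post (newest_user present on servers 0 and 2 only), the function reports that server 1 is missing newest_user; once the user is added in server mode, it prints nothing.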

Conclusion

AWS CloudHSM provides the ability to create scalable clusters of HSM instances to support high volumes of cryptographic operations and provide resiliency by supporting multiple availability zones. As mentioned, it’s important to be aware of the various modes of synchronization used in CloudHSM so that each HSM can provide consistent service. In particular, users are synchronized only by the client. Since cloudhsm_mgmt_util doesn’t rely on the client daemon to talk to HSM instances in your cluster, it doesn’t automatically update its configuration. If you follow the steps above and refresh the configuration information before changing users or passwords, users and passwords stay synchronized within the cluster, and the cluster continues to provide consistent responses to cryptographic operations when the level of redundancy within the HSM cluster changes.
