Disabling Intel Hyper-Threading Technology on Amazon Linux
This post is courtesy of Brian Beach, AWS Solutions Architect
Update – July 31, 2020: It has been brought to our attention that certain AWS EC2 instance types will have different delimiters in the thread_siblings_list depending on CPU architecture and Operating System (either a comma or a dash). To determine the delimiter for your instance run the following command:
$ cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list 0,4 1,5 2,6 or 0-1 0-1
Customers running high performance computing (HPC) workloads on Amazon Linux occasionally ask to disable the Intel Hyper-Threading Technology (HT Technology) that is enabled by default. In the pre-cloud world, this was usually performed by modifying the BIOS. That turned off HT Technology for all users, regardless of any possible benefits obtainable, for example, on I/O intensive workloads. With the cloud, HT Technology can be turned on or off, as is best for a particular application.
This post discusses methods for disabling HT Technology.
What is HT Technology?
According to Intel:
Hyper-Threading Technology makes a single physical processor appear as multiple logical processors. To do this, there is one copy of the architecture state for each logical processor, and the logical processors share a single set of physical execution resources.
For more information, see Hyper-Threading Technology Architecture and Microarchitecture.
This is best explained with an analogy. Imagine that you own a small business assembling crafts for sale on Amazon Handmade. You have a building where the crafts are assembled. Six identical workbenches are inside. Each workbench includes a hammer, a wrench, a screwdriver, a paintbrush, etc. The craftspeople work independently, to produce approximately six times the output as a single craftsperson. On rare occasions, they contend for resources (for example, waiting in a short line to pull some raw materials off the shelf).
As it turns out, business is booming and it’s time to expand production, but there is no more space to add workbenches. The decision is made to have two people work at each workbench. Overall, this is effective, because often there is still little contention between the workers—when one is using the hammer, the other doesn’t need it and is using the wrench. Occasionally one of the workers has to wait because the other worker is using the hammer. Contention for resources is somewhat higher in the single workbench case versus workbench-to-workbench.
HT Technology works in a way analogous to that small business. The building is like the processor, the workbenches like cores, and the workers like threads. Within the Intel Xeon processor, there are two threads of execution in each core. The core is designed to allow progress to be made on one thread (execution state) while the other thread is waiting for a relatively slow operation to occur. In this context, even cached memory is slow, much less main memory (many dozens or hundreds of clock cycles away). Operations involving things like disk I/O are orders of magnitude worse. However, in cases where both threads are operating primarily on very close (e.g., registers) or relatively close (first-level cache) instructions or data, the overall throughput occasionally decreases compared to non-interleaved, serial execution of the two lines of execution.
A good example of contention that makes HT Technology slower is an HPC job that relies heavily on floating point calculations. In this case, the two threads in each core share a single floating point unit (FPU) and are often blocked by one another.
In the case of LINPACK, the benchmark used to measure supercomputers on the TOP500 list, many studies have shown that you get better performance by disabling HT Technology. But FPUs are not the only example. Other, very specific workloads can be shown by experimentation to be slower in some cases when HT Technology is used.
Speaking of experimentation… before I discuss how to disable HT Technology, I’d like to discuss whether to disable it. Disabling HT Technology may help some workloads, but it is much more likely that it hinders others. In a traditional HPC environment, you have to decide to enable or disable HT Technology because you only have one cluster.
In the cloud, you are not limited to one cluster. You can run multiple clusters and disable HT Technology in some cases while leaving it enabled in others. In addition, you can easily test both configurations and decide which is best based on empirical evidence. With that said, for the overwhelming majority of workloads, you should leave HT Technology enabled.
Exploring HT Technology on Amazon Linux
Look at the configuration on an Amazon Linux instance. I ran the examples below on an m4.2xlarge, which has eight vCPUs. Note that each vCPU is a thread of an Intel Xeon core. Therefore, the m4.2xlarge has four cores, each of which run two threads, resulting in eight vCPUs. You can see this by running lscpu (note that I have removed some of the output for brevity).
[root@ip-172-31-1-32 ~]# lscpu CPU(s): 8 On-line CPU(s) list: 0-7 Thread(s) per core: 2 Core(s) per socket: 4 Socket(s): 1 NUMA node(s): 1 NUMA node0 CPU(s): 0-7
You can see from the output that there are four cores, and each core has two threads, resulting in eight CPUs. Exactly as expected. Furthermore, you can run lscpu –extended to see the details of the individual CPUs.
[root@ip-172-31-1-32 ~]# lscpu --extended CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE 0 0 0 0 0:0:0:0 yes 1 0 0 1 1:1:1:0 yes 2 0 0 2 2:2:2:0 yes 3 0 0 3 3:3:3:0 yes 4 0 0 0 0:0:0:0 yes 5 0 0 1 1:1:1:0 yes 6 0 0 2 2:2:2:0 yes 7 0 0 3 3:3:3:0 yes
Notice that the eight CPUs are numbered 0–7. You can see that the first set of four CPUs are each associated with a different core, and the cores are repeated for the second set of four CPUs. Therefore, CPU 0 and CPU 4 are the two threads in core 0, CPU 1 and CPU 5 are the two threads in core 1 and so on.
The Linux kernel, while generally agnostic on the topic and happy to schedule any thread on any logical CPU, is smart enough to know the difference between cores and threads. It gives preference to scheduling the first operating system threads per core. Only when all cores are busy does the kernel add OS threads to cores using the second hardware thread.
Disabling HT Technology at runtime
Now that you understand the relationship between CPUs and cores, you can disable the second set of CPUs using a process called “hotplugging” (in this case, you might say “hot-unplugging”). For more information, see CPU hotplug in the Kernel.
Each logical CPU is represented in the Linux “filesystem” at an entirely virtual location under /sys/devices/system/cpu/. This isn’t really part of the literal filesystem, of course. Linux designers unified the actual filesystem, as well as devices and kernel objects, into a single namespace with many common operations, including the handy ability to control the state of kernel objects by changing the contents of the “files” that represent them. Here you see directory entries for each CPU, and under each directory you find an “online” file used to enable/disable the CPU. Now you can simply write a 0 to that file to disable the CPU.
The kernel is smart enough to allow any scheduled operations to continue to completion on that CPU. The kernel then saves its state at the next scheduling event, and resumes those operations on another CPU when appropriate. Note the nice safety feature that Linux provides: you cannot disable CPU 0 and get an access-denied error if you try. Imagine what would happen if you accidentally took all the CPUs offline!
Disable CPU4 (the second thread in the first core in the m4.2xlarge). You must be the root user to do this.
[root@ip-172-31-1-32 ~]# echo 0 > /sys/devices/system/cpu/cpu4/online
Now you can run lscpu –extended again and see that CPU 4 is offline.
[root@ip-172-31-1-32 ~]# lscpu --extended CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE 0 0 0 0 0:0:0:0 yes 1 0 0 1 1:1:1:0 yes 2 0 0 2 2:2:2:0 yes 3 0 0 3 3:3:3:0 yes 4 - - - ::: no 5 0 0 1 1:1:1:0 yes 6 0 0 2 2:2:2:0 yes 7 0 0 3 3:3:3:0 yes
You can repeat this process for CPUs 5–7 to disable them all. However, this would be tedious on an x1.32xlarge with 128 vCPUs. You can use a script to disable them all.
#!/usr/bin/env bash for cpunum in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | cut -s -d, -f2- | tr ',' '\n' | sort -un) do echo 0 > /sys/devices/system/cpu/cpu$cpunum/online done
Let me explain what that script does. First, it finds the total number of CPUs. Then, it loops over the second half of the CPUs. Finally, it disables each CPU in turn.
After running the script, you can run lscpu –extended and see that second half of your CPUs are disabled, leaving only one thread per core. Perfect!
[root@ip-172-31-1-32 ~]# lscpu --extended CPU NODE SOCKET CORE L1d:L1i:L2:L3 ONLINE 0 0 0 0 0:0:0:0 yes 1 0 0 1 1:1:1:0 yes 2 0 0 2 2:2:2:0 yes 3 0 0 3 3:3:3:0 yes 4 - - - ::: no 5 - - - ::: no 6 - - - ::: no 7 - - - ::: no
Disabling HT Technology at boot
At this point, you have a script to disable HT Technology, but the change is only persisted until the next reboot. To apply the script on every boot, you can add the script to rc.local or use cloud-init bootcmd.
Cloud-init is available on Amazon Linux (as well as most other Linux distributions available in EC2). It executes commands it finds in EC2 system metadata within the user-data part of the metadata namespace. Cloud-init includes a bootcmd module that executes a script each time the instance is booted. The following modified script runs on every boot:
#cloud-config bootcmd: - for cpunum in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | cut -s -d, -f2- | tr ',' '\n' | sort -un); do echo 0 > /sys/devices/system/cpu/cpu$cpunum/online; done
When you launch an instance, add this for User data in step 3 of the wizard. The instance starts with HT Technology disabled and disables it on subsequent reboots.
An alternative approach is to edit grub.conf. Open /boot/grub/grub.conf in your favorite editor. Find the kernel directive and a maxcpus=X parameter where X is half the total number CPUs. Remember that Linux enumerates the first thread in each core followed by the second thread in each core. Therefore, limiting the number of CPUs to half of the total disables the second thread in each core just like you did with hotplugging earlier. You must reboot for this change to take effect.
Customers running HPC workloads on Amazon Linux cannot access the BIOS to disable HT Technology. This post described a process to disable the second thread in each core at runtime and during boot. In a future post, we examine similar solutions for configuring CPUs in Windows.
Note: this technical approach has nothing to do with control over software licensing, or licensing rights, which are sometimes linked to the number of “CPUs” or “cores.” For licensing purposes, those are legal terms, not technical terms. This post did not cover anything about software licensing or licensing rights.