AWS Compute Blog

Deploying a 4x4K, GPU-backed Linux desktop instance on AWS

Contributed by Amr Ragab, HPC Application Consultant, AWS Professional Services

AWS currently supports many managed des­ktop delivery mechanisms. Amazon WorkSpaces and Amazon AppStream 2.0 both deliver managed Windows-based machine images with GPU-backed instances. However, many desktop services and applications are better served through a Linux backed instance. Given the variety of Linux distributions as well as desktop managers, it can be valuable to have a generic solution for provisioning a Linux desktop on Amazon EC2.

A GPU-backed instance reduces the computational requirements from the client (local) machine, eliminating the need for a local discrete GPU to run graphical workloads. The framebuffer objects generated by the GPU are compressed when sent over the network, and decompressed by the local CPU resources. This allows clients to take advantage of the server GPU and display the high-resolution content on local thin clients, mobile devices, and low-powered desktops and laptops. Such GPU-backed Linux instances have been used for VFX rendering, computational drug discovery, and computational fluid dynamics (CFD) simulation use cases. An upcoming followup post details enabling this technology on the Windows platform.

Configuration

In this configuration, a client machine connects to the provisioned desktop (server) in the cloud. The server captures the framebuffer, which is sent in real time to the client machine over the network. Thus latency is an important metric to consider when provisioning this solution. I recommend choosing the nearest AWS Region (under 100 ms). Some customers may even prefer to install AWS Direct Connect.

Region Latency
US-East (Virginia) 18 ms
US East (Ohio) 31 ms
US-West (California) 77 ms
US-West (Oregon) 97 ms
Canada (Central) 29 ms
Europe (Ireland) 89 ms
Europe (London) 90 ms
Europe (Frankfurt) 108 ms
Asia Pacific (Mumbai) 197 ms
Asia Pacific (Seoul) 198 ms
Asia Pacific (Singapore) 288 ms
Asia Pacific (Sydney) 218 ms
Asia Pacific (Tokyo) 188 ms
South America (São Paulo) 138 ms
China (Beijing) 267 ms
AWS GovCloud (US) 97 ms

Source: http://www.cloudping.info/ from the Amazon offices located in Herndon, VA

Bandwidth requirements depend on the quality of the desktop experience as well as the desired resolution. Provision the backend Linux desktop instance with a 4096×2160 (4K) resolution. Depending on the specific G3 instance type selected, multi-GPU managed desktops give additional performance benefits. Each instance can also host multiple users, either in collaborative sessions, or with up to four independent 4K monitors. The GPU framebuffer memory used per session generally limits the number of sessions per managed desktop.

A smooth reliable experience depends on a low latency and high-bandwidth connection to the EC2 instance hosting the desktop. One of the benefits of using a multithreaded framebuffer reader is that only the defined block of the rendered desktop that is changing needs to be sent over the network. Full-screen redraws may be necessary only in rare cases. The minimum requirements for this 4K (3840×2160) configuration are as follows:

  • Bandwidth: 50 Mbps
  • Latency: < 30 ms
  • Jitter: < 5 ms

Deployment

Use RHEL/CentOS for the deployment. Except for DCV, this stack is compatible with Debian/Ubuntu distributions. Use the CentOS 7.5 Server AMI and install the NVIDIA/Xorg/KDE stack to create a fully functioning desktop environment with a max resolution of 16384 x 8640 (that is, 4x4K) at 60 Hz.

This stack contains the following software:

  • CentOS 7.5 Base
  • Xorg 1.19
  • NVIDIA Grid Driver 6.1 (for the G3 instance family)
  • KDE Desktop environment
  • VirtualGL
  • TurboVNC
  • NICE DCV

To make the most efficient use of the NVIDIA Tesla M60 framebuffer memory, disable the compositing features of the desktop manager. Other non-compositing desktop managers (such as XFCE, MATE, etc.) are supported as well. This ensures that the GPU is reserved for specific OpenGL API tasks for the application, and that the performance is not impacted by the desktop environment decorations.

Start up a CentOS 7.5 server desktop based on the latest AMI available in the closest Region:

Distributor ID:    CentOS
Description:       CentOS Linux release 7.5.1804 (Core)
Release:           7.5.1804
Codename:          Core

Now install the Xorg stack with the KDE desktop manager:

sudo yum install epel-release
sudo yum update
sudo yum groupinstall "Development Tools"
sudo yum install xorg-* kernel-devel dkms python-pip lsb
sudo pip install awscli
sudo yum groupinstall "KDE Plasma Workspaces"
sudo systemctl disable firewalld #AWS security groups will provide our firewall rules
# if there is a kernel update
sudo reboot

Download the NVIDIA Grid driver (6.1). For more information, see Installing the NVIDIA Driver on Linux Instances.

aws s3 cp --recursive s3://ec2-linux-nvidia-drivers/ .
chmod +x latest/NVIDIA-Linux-x86_64-390.57-grid.run
sudo .latest/NVIDIA-Linux-x86_64-390.57-grid.run
# register the driver with dkms, ignore errors associated with 32bit compatible libraries
sudo systemctl set-default graphical.target

Deposit the xorg.conf file in /etc/X11/xorg.conf:

Section "ServerLayout"
        Identifier     "X.org Configured"
        Screen      0  "Screen0" 0 0
        InputDevice    "Mouse0" "CorePointer"
        InputDevice    "Keyboard0" "CoreKeyboard"
EndSection
 
Section "Files"
        ModulePath   "/usr/lib64/xorg/modules"
        FontPath     "catalogue:/etc/X11/fontpath.d"
        FontPath     "built-ins"
EndSection
 
Section "Module"
        Load  "glx"
EndSection
 
Section "InputDevice"
        Identifier  "Keyboard0"
        Driver      "kbd"
EndSection
 
Section "InputDevice"
        Identifier  "Mouse0"
        Driver      "mouse"
        Option      "Protocol" "auto"
        Option      "Device" "/dev/input/mice"
        Option      "ZAxisMapping" "4 5 6 7"
EndSection
 
Section "Monitor"
        Identifier   "Monitor0"
        VendorName   "Monitor Vendor"
        ModelName    "Monitor Model"
        Modeline "3840x2160_60.00"  712.34  3840 4152 4576 5312  2160 2161 2164 2235  -HSync +Vsync
EndSection

 
Section "Device"
        Identifier  "Card0"
        Driver      "nvidia"
        Option "ConnectToAcpid" "0"
        BusID       "PCI:0:30:0"
EndSection
 
Section "Screen"
        Identifier "Screen0"
        Device     "Card0"
        Monitor    "Monitor0"
        SubSection "Display"
                Viewport   0 0
                Depth     24
        Modes    "4096x2160" "3840x2160"
        EndSubSection
EndSection

Reboot again and check that the nvidia-gridd service is running. You may notice errors. They can be safely ignored after the nvidia-gridd service successfully acquires a license.

[root@ip-10-0-125-164 ~]# systemctl status nvidia-gridd.service
● nvidia-gridd.service - NVIDIA Grid Daemon
   Loaded: loaded (/usr/lib/systemd/system/nvidia-gridd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2018-05-29 18:37:35 UTC; 39s ago
  Process: 863 ExecStart=/usr/bin/nvidia-gridd (code=exited, status=0/SUCCESS)
 Main PID: 881 (nvidia-gridd)
   CGroup: /system.slice/nvidia-gridd.service
           └─881 /usr/bin/nvidia-gridd
May 29 18:37:35 ip-10-0-125-164.ec2.internal systemd[1]: Starting NVIDIA Grid Daemon...
May 29 18:37:35 ip-10-0-125-164.ec2.internal nvidia-gridd[881]: Started (881)
May 29 18:37:35 ip-10-0-125-164.ec2.internal systemd[1]: Started NVIDIA Grid Daemon.
May 29 18:37:36 ip-10-0-125-164.ec2.internal nvidia-gridd[881]: Configuration parameter ( ServerAddress  FeatureType) not set
May 29 18:37:40 ip-10-0-125-164.ec2.internal nvidia-gridd[881]: Calling load_byte_array(tra)
May 29 18:37:41 ip-10-0-125-164.ec2.internal nvidia-gridd[881]: License acquired successfully (2)

You can confirm that 4K resolution is enabled by running the following command:

DISPLAY=:0 xrandr -q
Screen 0: minimum 8 x 8, current 4096 x 2160, maximum 16384 x 8640
DVI-D-0 connected primary 4096x2160+0+0 (normal left inverted right x axis y axis) 641mm x 400mm
2560x1600 59.86+
4096x2160 60.03*
3840x2160 60.00 

Finally, check that your underlying GL renderer is using the NVIDIA driver by querying glxinfo

DISPLAY=:0 glxinfo

OpenGL vendor string: NVIDIA Corporation
OpenGL renderer string: Quadro FX Tesla M60/PCIe/SSE2
OpenGL core profile version string: 4.5.0 NVIDIA 390.57
OpenGL core profile shading language version string: 4.50 NVIDIA
OpenGL core profile context flags: (none)
OpenGL core profile profile mask: core profile
OpenGL core profile extensions:
OpenGL version string: 4.6.0 NVIDIA 390.57
OpenGL shading language version string: 4.60 NVIDIA

At the time of publication, OpenGL 4.5 is enabled. Your applications can take advantage of that API for rendering.

To interact with the instance, install server-side desktop remote display software that can specifically take advantage of the 3D hardware acceleration. For example, AWS provides the NICE DCV platform.

DCV is an accelerated remote desktop framework that provides in-web browser desktop connections. DCV is supported in both Windows and Linux (RHEL/CentOS). In the Windows platform, OpenGL and DirectX are fully supported. DCV entitlement is free when provisioning on AWS. NICE DCV is also provided as a component to the AWS EnginFrame and myHPC solutions.

To install DCV, download the NICE DCV 2017 EL7 archive and Administrative Guide. After you extract the archive in the instance, you see a list of nice-* RPMS. You don’t have to worry about licensing, as the installer captures that the instance is running in AWS.

sudo yum localinstall nice-*
sudo systemctl enable dcvserver
sudo systemctl start dcvserver

When the DCV server starts, you have the option to create a single console session or multiple virtual sessions. You must assign a password for the CentOS user issued, by running the following command:

sudo passwd centos

Start the console session:

sudo dcv create-session --type=console --owner centos session1
sudo dcv list-sessions

The AWS security groups are enabled to allow TCP 8443 traffic to the instance. You see the DCV login portal and can interact with the instance. Other popular frameworks include the following:

You can also find plug and play images for managed desktops in the AWS Marketplace.

Optimization

Implement the changes outlined in the Optimizing GPU Settings (P2, P3, and G3 Instances) topic. You can turn off the autoboost feature and set the maximum graphics and memory clocks manually.

sudo nvidia-smi --auto-boost-default=0
sudo nvidia-smi -ac 2505,1177

Application testing

For testing, look at PyMOL (PyMOL Molecular Graphics System, Version 2.0 Schrödinger, LLC.). PyMOL is a standard commercial drug discovery application that is used for processing, and visualizing biochemical structures.  I used the opensource fork.

With the NVIDIA GRID licensing enabled earlier, PyMOL can take advantage of the Quadro features supplied by the Tesla M60. After it’s installed and loaded, you can confirm the functionality of the entire G3 instance software stack installed earlier:

PyMOL(TM) Molecular Graphics System, Version 2.1.0.
 Copyright (c) Schrodinger, LLC.
 All Rights Reserved.
 
    Created by Warren L. DeLano, Ph.D. 
 
    PyMOL is user-supported open-source software.  Although some versions
    are freely available, PyMOL is not in the public domain.
 
    If PyMOL is helpful in your work or study, then please volunteer 
    support for our ongoing efforts to create open and affordable scientific
    software by purchasing a PyMOL Maintenance and/or Support subscription.

    More information can be found at "http://www.pymol.org".
 
    Enter "help" for a list of commands.
    Enter "help <command-name>" for information on a specific command.

 Hit ESC anytime to toggle between text and graphics.

 Detected OpenGL version 2.0 or greater. Shaders available.
 Detected GLSL version 4.60.
 OpenGL graphics engine:
  GL_VENDOR:   NVIDIA Corporation
  GL_RENDERER: Quadro FX Tesla M60/PCIe/SSE2
  GL_VERSION:  4.6.0 NVIDIA 390.57
 Adapting to Quadro hardware.
 Detected 16 CPU cores.  Enabled multithreaded rendering.

In the PyMOL window, run “fetch 5ta3”, which is a 39k amino acid protein, under the 4K desktop environment. Rotating and translating the protein should be smooth and respond quickly to pointer events.

The PyMOL Gallery contains other representative examples that take advantage of various visualization and processing workflows. Also, you can find many demos (choose Wizard, Demo).

Under the Sculpting demo, you can show the pointer latency between the client and server.

Finally, look at ray tracing. From the PyMOL wiki, take a chemical structure and render each frame with ray tracing to produce a video. On the Tesla M60 with Quadro features enabled, the total render time was approximately 1 minute.

Scalability

As I mentioned previously, the framebuffer redirection protocols have a feature set to create multiple virtual sessions per node. A virtual session is not necessarily tied to a single user either. In other words, the number of independent virtual sessions is limited by the total amount of GPU frame buffer memory used in all sessions per GPU. Thus, it’s possible to scale horizontally by increasing the number of G3 instances, or vertically by using larger instance types in the G3 family.

Summary

The G3 instance type is purpose-built to provide a managed high-end professional graphics infrastructure for visual computing needs. With NICE DCV, we can take advantage of NVIDIA Quadro software features for a range of applications including drug discovery and VFX rendering. It should be noted that this deployment will also work with the new 8K and 12K monitors that are now commercially available. Connected with the AWS high performance network backbone, the instance can become an integral part of your graphics workload pipeline. Now, you can power-up and deliver your applications to teams working anywhere in the world.