AWS Official Blog

Installing NICE DCV on an Ubuntu 22 EC2 Instance

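In this post, we will learn how to install and configure NICE DCV on an Ubuntu 22 EC2 instance. NICE DCV is a high-performance remote display protocol used to stream a remote desktop from a remote machine.

The first three steps in this post are based on a blog post by Liu Xinyou, who worked out how to get the NVIDIA driver, gdm, and dcvserver running together. In step 4, I add a complete dcv.conf file for single-user access with support for the QUIC protocol enabled.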

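Step 1: Set up an EC2 instance

Although NICE DCV works fine on EC2 instances without a GPU, it is designed primarily to accelerate 3D applications, so we will install and test it on a g4dn.2xlarge instance (which has a Tesla T4 GPU).

We will use the standard Ubuntu 22 AMI: Ubuntu Server 22.04 LTS (HVM), SSD Volume Type.

Make sure the instance meets the following requirements:

  1. The root volume must be at least 50 GB. If you plan to install other software (such as Blender or Stable Diffusion), 100 GB or more is recommended.
  2. The security group must allow inbound TCP and UDP traffic on port 8443.
  3. An IAM policy must be attached to the instance role so that the EC2 instance can access the S3 bucket that holds the NICE DCV license file. A policy template is shown below: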
    {
        "Version": "2012-10-17",
        "Statement": [
           {
               "Effect": "Allow",
               "Action": "s3:GetObject",
               "Resource": "arn:aws:s3:::dcv-license.region/*"
           }
        ]
    }
    

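Replace region with the Region your instance runs in. To learn more about IAM roles and policies, see "IAM roles for Amazon EC2", "Creating IAM policies", and "Adding and removing IAM identity permissions".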
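As a small illustration of the substitution, the sketch below prints the policy for a given Region. The function name render_dcv_license_policy, the output file name, and the example Region us-east-1 are placeholders chosen here; they are not part of the original post.

```shell
# Print the DCV license-bucket policy for one Region.
# render_dcv_license_policy is a hypothetical helper name; the JSON body is the
# template shown above with "region" replaced by the first argument.
render_dcv_license_policy() {
    region="$1"
    cat <<EOF
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::dcv-license.${region}/*"
        }
    ]
}
EOF
}

# Example: write the policy for us-east-1 to a local file (file name is arbitrary).
# render_dcv_license_policy us-east-1 > dcv-license-policy.json
```

The resulting file can then be attached to the instance role using whichever IAM workflow you normally use.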

Step 2: Install and configure prerequisites

Install a desktop environment and display manager

Once your instance has launched successfully, log in to it using whichever connection method you prefer.

First, update the Ubuntu system:

sudo apt-get update
sudo apt-get -y upgrade

After the update, install the Ubuntu desktop and the gdm3 display manager:

sudo apt-get -y install ubuntu-desktop
sudo apt-get -y install gdm3

Then verify that gdm3 is the default display manager:

cat /etc/X11/default-display-manager

If the check succeeds, the command returns:

/usr/sbin/gdm3

If the command returns anything other than "/usr/sbin/gdm3", the check has failed, and you need to uninstall and reinstall gdm3.
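If you are scripting the setup, the same check can be expressed as a small function. This is only a sketch: the function name gdm3_is_default is made up for illustration, and the optional path argument exists purely so the check can be tried against a test file.

```shell
# Return success if the given file names gdm3 as the display manager.
# The path defaults to the file checked above; gdm3_is_default is a
# placeholder name chosen for this sketch.
gdm3_is_default() {
    dm_file="${1:-/etc/X11/default-display-manager}"
    [ -r "$dm_file" ] && [ "$(cat "$dm_file")" = "/usr/sbin/gdm3" ]
}

# Example:
# gdm3_is_default || echo "gdm3 is not the default display manager" >&2
```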

After verifying, upgrade the system once more to make sure nothing was missed:

sudo apt-get upgrade

NICE DCV does not currently support Wayland, which gdm3 enables by default, so disable Wayland as follows.

(1) Open the configuration file in an editor:

sudo vim /etc/gdm3/custom.conf

(2) Find the [daemon] section and uncomment the line "WaylandEnable=false" (delete the leading "#"):

[daemon]
# Uncomment the line below to force the login screen to use Xorg
WaylandEnable=false

(3) Restart gdm3:

sudo systemctl restart gdm3
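Steps (1) and (2) can also be done without an interactive editor. The sketch below assumes the file contains the stock commented-out line "#WaylandEnable=false"; the function name disable_wayland is hypothetical, and the function has to run with permission to write the file (for example from a root shell).

```shell
# Uncomment "WaylandEnable=false" in a gdm3 custom.conf-style file.
# disable_wayland is a placeholder name; the file path is passed explicitly.
disable_wayland() {
    conf="$1"
    wl_tmp="$(mktemp)" || return 1
    # Strip a leading "#" (and optional spaces) only from the WaylandEnable line,
    # then write the result back in place so ownership and mode are preserved.
    sed 's/^#[[:space:]]*WaylandEnable=false/WaylandEnable=false/' "$conf" > "$wl_tmp" &&
        cat "$wl_tmp" > "$conf"
    status=$?
    rm -f "$wl_tmp"
    return "$status"
}

# Example (from a root shell), followed by the restart shown above:
# disable_wayland /etc/gdm3/custom.conf
```

Running it a second time leaves the file unchanged, because the pattern only matches the commented form.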

Configure X Server

X Server must start automatically on boot and reboot. You can check this with the following command:

sudo systemctl get-default

If the result is "graphical.target", everything is ready. If not, fix it with the following command:

sudo systemctl set-default graphical.target

Once that is confirmed, start X Server:

sudo systemctl isolate graphical.target

Next, check that the X server is running:

ps aux | grep X | grep -v grep

If it is running, you will see output similar to the following:

root        2048  0.1  0.2 294016 79776 tty1     Sl+  05:50   0:00 /usr/lib/xorg/Xorg vt1 -displayfd 3 -auth /run/user/133/gdm/Xauthority -nolisten tcp -background none -noreset -keeptty -novtswitch -verbose 3

Install OpenGL components

Install glxinfo and the OpenGL utilities with the following command:

sudo apt-get -y install mesa-utils

Confirm that OpenGL rendering is supported:

sudo DISPLAY=:0 XAUTHORITY=$(ps aux | grep "X.*\-auth" | grep -v grep | sed -n 's/.*-auth \([^ ]\+\).*/\1/p') glxinfo | grep -i "opengl.*version"

If it is, you will see output similar to the following:

OpenGL core profile version string: 4.5 (Core Profile) Mesa 22.2.5
OpenGL core profile shading language version string: 4.50
OpenGL version string: 4.5 (Compatibility Profile) Mesa 22.2.5
OpenGL shading language version string: 4.50
OpenGL ES profile version string: OpenGL ES 3.2 Mesa 22.2.5
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
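The command substitution in the glxinfo command above is dense, so here is the same extraction pulled out into a helper. It reads `ps aux` output and prints the path that follows -auth on the X server's command line. The function name extract_xauthority is a placeholder, and the sed expression is a slightly more portable spelling of the one used above.

```shell
# Read `ps aux` output on stdin and print the Xauthority path that follows
# "-auth" on the X server command line. extract_xauthority is a made-up name.
extract_xauthority() {
    sed -n 's/.*-auth \([^ ][^ ]*\).*/\1/p'
}

# Example (equivalent to the inline substitution used above):
# XAUTH_FILE="$(ps aux | grep "X.*\-auth" | grep -v grep | extract_xauthority)"
# sudo DISPLAY=:0 XAUTHORITY="$XAUTH_FILE" glxinfo | grep -i "opengl.*version"
```

For the sample Xorg process line shown earlier, the helper prints /run/user/133/gdm/Xauthority.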

Install the NVIDIA driver

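There are generally three types of driver to choose from: Tesla drivers, GRID drivers, and gaming drivers (see the driver installation documentation for details).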

Before installing the GRID driver, use the following command to upgrade the linux-aws package, which brings in the kernel header files:

sudo apt-get upgrade -y linux-aws

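If you see a warning during installation similar to "the current kernel version is not the expected version", select "OK"; the instance will then move to the new kernel version after the next reboot.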

Then install gcc, make, and the headers for the running kernel, which are needed to build the kernel module:

sudo apt-get install -y gcc make linux-headers-$(uname -r)

Ubuntu normally uses the open-source nouveau driver, so make sure it is blacklisted:

cat << EOF | sudo tee --append /etc/modprobe.d/blacklist.conf
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
EOF

After adding the entries, run cat /etc/modprobe.d/blacklist.conf. The output should look similar to the following:

# ugly and loud noise, getting on everyone's nerves; this should be done by a
# nice pulseaudio bing (Ubuntu: #77010)
blacklist pcspkr

# EDAC driver for amd76x clashes with the agp driver preventing the aperture
# from being initialised (Ubuntu: #297750). Blacklist so that the driver
# continues to build and is installable for the few cases where its
# really needed.
blacklist amd76x_edac
blacklist vga16fb
blacklist nouveau
blacklist rivafb
blacklist nvidiafb
blacklist rivatv
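One thing to be aware of: the tee --append command above adds the five lines every time it runs, so repeating it leaves duplicates in the file. If you expect to re-run the setup, an idempotent variant along these lines may be preferable. The function name add_blacklist_entries is hypothetical, and it needs write access to the file.

```shell
# Append each blacklist entry only if that exact line is not already present.
# add_blacklist_entries is a placeholder name; the file path is passed in.
add_blacklist_entries() {
    conf="$1"
    for module in vga16fb nouveau rivafb nvidiafb rivatv; do
        entry="blacklist $module"
        # -x matches the whole line, -F treats the entry as a fixed string.
        grep -qxF "$entry" "$conf" 2>/dev/null || printf '%s\n' "$entry" >> "$conf"
    done
}

# Example (from a root shell):
# add_blacklist_entries /etc/modprobe.d/blacklist.conf
```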

The output includes Ubuntu's default blacklist entries in addition to the ones just added. nouveau also needs to be disabled in GRUB.

First, open /etc/default/grub and add the following line at the end:

GRUB_CMDLINE_LINUX="rdblacklist=nouveau"

Then update the GRUB configuration:

sudo update-grub

Before installing the latest GRID driver, install the AWS CLI so that you can copy files from S3:

sudo apt-get -y install awscli

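Note that the download command below works only if the instance can access Amazon S3 (an IAM instance profile is the recommended way to grant this).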

Download the latest GRID driver:

aws s3 cp --recursive s3://ec2-linux-nvidia-drivers/latest/ .

Once the download completes, reboot the EC2 instance (the GRUB change above only takes effect after a reboot):

sudo reboot
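After the instance comes back up, one way to confirm that the GRUB setting reached the running kernel is to look for the parameter on the kernel command line. This is a sketch under the assumption that nothing else in your GRUB configuration overrides GRUB_CMDLINE_LINUX; the function name is made up, and the optional file argument exists only so the check can be tried against sample text.

```shell
# Succeed if the kernel command line contains rdblacklist=nouveau as a whole word.
# cmdline_blacklists_nouveau is a placeholder name; it reads /proc/cmdline by default.
cmdline_blacklists_nouveau() {
    cmdline_file="${1:-/proc/cmdline}"
    # Pad with spaces so the parameter is matched as a complete token.
    case " $(cat "$cmdline_file") " in
        *" rdblacklist=nouveau "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example:
# cmdline_blacklists_nouveau || echo "rdblacklist=nouveau is not active" >&2
```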

Then install the driver:

chmod +x NVIDIA-Linux-x86_64*.run
sudo /bin/sh ./NVIDIA-Linux-x86_64*.run

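If the installer shows a warning that can be skipped, ignore it and select "Continue installation".

If asked whether to install the 32-bit compatibility libraries, selecting "YES" is recommended.

If you receive a warning about missing EGL vendor library support, select "YES".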

After the driver is installed, reboot again:

sudo reboot

Then verify that the NVIDIA driver is working:

nvidia-smi -q | head

On a g4dn instance, output like the following indicates that the driver is working:

==============NVSMI LOG==============

Timestamp                                 : Tue Feb 14 06:55:57 2023
Driver Version                            : 525.85.05
CUDA Version                              : 12.0

Attached GPUs                             : 1
GPU 00000000:00:1E.0
    Product Name                          : Tesla T4
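If you want to record or compare the installed driver version in a script, the version can be picked out of that output. The helper below is a sketch that assumes the "Driver Version : ..." line format shown above; parse_driver_version is a hypothetical name.

```shell
# Read `nvidia-smi -q` output on stdin and print the value of the top-level
# "Driver Version" line. parse_driver_version is a placeholder name.
parse_driver_version() {
    awk -F': ' '/^Driver Version/ { print $2 }'
}

# Example:
# nvidia-smi -q | parse_driver_version
```

For the sample output above this prints 525.85.05.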

Next, disable GSP so that it does not interfere with the newly installed NVIDIA driver. Create the file nvidia.conf and add the line "options nvidia NVreg_EnableGpuFirmware=0":

sudo touch /etc/modprobe.d/nvidia.conf
echo "options nvidia NVreg_EnableGpuFirmware=0" | sudo tee --append /etc/modprobe.d/nvidia.conf

With GSP disabled, rewrite xorg.conf so that it uses the NVIDIA driver.

(1) Remove any leftover configuration files:

sudo rm -rf /etc/X11/XF86Config*

(2) Update xorg.conf:

sudo nvidia-xconfig --preserve-busid --enable-all-gpus

(3) Reboot again (to force X to restart):

sudo reboot

(4) After the reboot, verify once more that OpenGL works:

sudo DISPLAY=:0 XAUTHORITY=$(ps aux | grep "X.*\-auth" | grep -v grep | sed -n 's/.*-auth \([^ ]\+\).*/\1/p') glxinfo | grep -i "opengl.*version"

Output similar to the following confirms that OpenGL is now served by the NVIDIA driver:

OpenGL core profile version string: 4.6.0 NVIDIA 525.85.05
OpenGL core profile shading language version string: 4.60 NVIDIA
OpenGL version string: 4.6.0 NVIDIA 525.85.05
OpenGL shading language version string: 4.60 NVIDIA
OpenGL ES profile version string: OpenGL ES 3.2 NVIDIA 525.85.05
OpenGL ES profile shading language version string: OpenGL ES GLSL ES 3.20
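The difference from the earlier Mesa output is the vendor name in the version strings. A scripted version of that comparison could look like the sketch below, which assumes the "OpenGL version string:" line format shown above; opengl_uses_nvidia is a made-up name.

```shell
# Read glxinfo output on stdin and succeed if the OpenGL version string
# mentions NVIDIA (rather than, for example, Mesa).
opengl_uses_nvidia() {
    grep -q '^OpenGL version string:.*NVIDIA'
}

# Example, reusing the glxinfo command shown above:
# sudo DISPLAY=:0 XAUTHORITY=... glxinfo | opengl_uses_nvidia && echo "NVIDIA OpenGL active"
```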

Step 3: Install NICE DCV

First, download and import the GPG key for the installation packages:

wget https://d1uj6qtbmh3dt5.cloudfront.net/NICE-GPG-KEY
gpg --import NICE-GPG-KEY

Fetch the latest installation package:

wget https://d1uj6qtbmh3dt5.cloudfront.net/nice-dcv-ubuntu2204-x86_64.tgz

Extract it:

tar -xvzf nice-dcv-ubuntu2204-x86_64.tgz

Change into the newly extracted directory:

cd nice-dcv-*-ubuntu2204-x86_64

Install the DCV server package and xdcv:

sudo apt install ./nice-dcv-server*.deb
sudo apt install ./nice-xdcv*.deb

To access DCV through a web browser, also install the following package:

sudo apt install ./nice-dcv-web-viewer*.deb

Add the newly created dcv user to the video group:

sudo usermod -aG video dcv

Enable dcvserver so that it starts automatically at boot, and start it now:

sudo systemctl enable dcvserver
sudo systemctl start dcvserver

When dcvserver is running, sudo systemctl status dcvserver should return something like this:

● dcvserver.service - NICE DCV server daemon
     Loaded: loaded (/lib/systemd/system/dcvserver.service; enabled; vendor preset: enabled)
     Active: active (running) since Tue 2023-02-14 09:43:52 UTC; 2min 8s ago
   Main PID: 1955 (dcvserver)
      Tasks: 5 (limit: 37952)
     Memory: 14.8M
        CPU: 108ms
     CGroup: /system.slice/dcvserver.service
             ├─1955 /bin/bash /usr/bin/dcvserver -d --service
             └─1956 /usr/lib/x86_64-linux-gnu/dcv/dcvserver --service

Feb 14 09:43:52 ip-172-31-18-18 systemd[1]: Starting NICE DCV server daemon...
Feb 14 09:43:52 ip-172-31-18-18 modprobe[1954]: modprobe: WARNING: Module eveusb not found in directory /lib/modules/5.15.0-1030-aws
Feb 14 09:43:52 ip-172-31-18-18 systemd[1]: Started NICE DCV server daemon.
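In a script, it can be more convenient to wait for the service than to read the status output. The generic retry helper below is a sketch; wait_until is a hypothetical name, and the attempt count and delay in the example are arbitrary. It relies on systemctl is-active --quiet, which reports the unit state through its exit status.

```shell
# Retry a command until it succeeds or the attempts run out.
# Usage: wait_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
wait_until() {
    attempts="$1"
    delay="$2"
    shift 2
    while [ "$attempts" -gt 0 ]; do
        "$@" && return 0
        attempts=$((attempts - 1))
        # Sleep only if another attempt is coming.
        [ "$attempts" -gt 0 ] && sleep "$delay"
    done
    return 1
}

# Example: wait up to roughly 30 seconds for dcvserver to become active.
# wait_until 10 3 systemctl is-active --quiet dcvserver
```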

Step 4: Prepare to log in

We will log in as the ubuntu user (the default user on Ubuntu EC2 AMIs), so a password must be set for that user:

sudo passwd ubuntu

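Edit the dcv.conf file

Finally, modify /etc/dcv/dcv.conf so that a DCV session is created by default at startup (and so that the server listens on all interfaces, the session is assigned to the ubuntu user, and so on).

The dcv.conf file is divided into sections, each introduced by a header of the form [section]. The sections and properties that need particular attention are: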

[session-management]
create-session = true

[session-management/automatic-console-session]
owner = "ubuntu"

[connectivity]
quic-listen-endpoints=['0.0.0.0:8443', '[::]:8443']
web-listen-endpoints=['0.0.0.0:8443', '[::]:8443']

enable-quic-frontend=true

These settings create a default session named console and allow the ubuntu user to log in through both the web client at https://your-ip-address:8443 and the DCV viewer application.
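For automated setups, the handful of properties above can be generated rather than edited by hand. The sketch below prints a minimal configuration containing only those properties; render_minimal_dcv_conf is a hypothetical name, and the owner argument defaults to ubuntu as in this post. It is not a substitute for the fully commented file that follows.

```shell
# Print a minimal dcv.conf with only the properties highlighted above.
# render_minimal_dcv_conf is a placeholder name; $1 is the session owner.
render_minimal_dcv_conf() {
    owner="${1:-ubuntu}"
    cat <<EOF
[session-management]
create-session = true

[session-management/automatic-console-session]
owner = "${owner}"

[connectivity]
enable-quic-frontend=true
quic-listen-endpoints=['0.0.0.0:8443', '[::]:8443']
web-listen-endpoints=['0.0.0.0:8443', '[::]:8443']
EOF
}

# Example: back up the shipped file, then write the minimal one.
# sudo cp /etc/dcv/dcv.conf /etc/dcv/dcv.conf.bak
# render_minimal_dcv_conf ubuntu | sudo tee /etc/dcv/dcv.conf > /dev/null
```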

You can copy and paste the following complete file directly:

###############################################################################
## Section "license" contains properties to configure the license management
###############################################################################

[license]

# Property "license-file" specifies the path to a demo license file or the name 
# of the license server used by the rlm daemon, in the format port@host 
# (for example 5053@licserver).
# The port number must be the same as that specified in the HOST line of the
# license file.
# If empty or not specified, a default path to a demo license file will be
# used (e.g: /usr/share/dcv/license/license.lic). If the default file does not
# exist, a demo license will be used.
#license-file = ""

###############################################################################
## Section "log" contains properties to configure the DCV logging system
###############################################################################

[log]

# Property "level" contains the logging level used by DCV.
# Can be set to ERROR, WARNING, INFO or DEBUG (in ascending level of verbosity).
# If not specified, the default level is INFO
#level = "INFO"

###############################################################################
## Section "session-management" contains the properties of DCV session creation
###############################################################################

[session-management]

# Property "create-session" requests to automatically create a console session 
# (with ID "console") at DCV startup.
# Can be set to true or false.
# If not specified, no console session will be automatically created.
create-session = true

# Property "enable-gl-in-virtual-sessions" specifies whether to employ the 
# 'dcv-gl' feature (a specific license will be required).
# Allowed values: 'always-on', 'always-off', 'default-on', 'default-off'.
# If not specified, the default value is 'default-on'.
#enable-gl-in-virtual-sessions = "default-on"

###############################################################################
## Section "session-management/defaults" contains the default properties of DCV sessions
###############################################################################

[session-management/defaults]

# Property "permissions-file" specifies the path to the permissions file
# automatically merged with the permissions selected by the user for each session.
# If empty or absent, use the default file in /etc/dcv/default.perm.
#permissions-file = ""

###############################################################################
## Section "session-management.automatic-console-session" contains the properties 
## to be applied ONLY to the "console" session automatically created at server startup 
## when the create-session setting of section 'session-management' is set to true.
###############################################################################

[session-management/automatic-console-session]

# Property "owner" specifies the username of the owner of the automatically
# created "console" session.
owner = "ubuntu"

# Property "permissions-file" specifies the file that contains the permissions 
# to be used to check user access to DCV features.
# If empty, only the owner will have full access to the session.
#permissions-file = ""

# Property "max-concurrent-clients" specifies the maximum number of concurrent
# clients per session.
# If set to -1, no limit is enforced. Default value -1;
#max-concurrent-clients = -1

# Property "storage-root" specifies the path to the folder that will be used 
# as root-folder for file storage operations.
# The file storage will be disabled if the storage-root is empty or the folder 
# does not exist.
#storage-root = ""

###############################################################################
## Section "display" contains the properties of the dcv remote display
###############################################################################

[display]

# Property "target-fps" specifies the maximum allowed frames per second.
# A value of 0 means no limit. If not specified, or if set to a negative value,
# the target-fps value will be determined according to the server characteristics
# and the session type
#target-fps = 30

###############################################################################
## Section "connectivity" contains the properties of the dcv connection
###############################################################################

[connectivity]

quic-listen-endpoints=['0.0.0.0:8443', '[::]:8443']
web-listen-endpoints=['0.0.0.0:8443', '[::]:8443']

# Property "web-port" specifies on which TCP port the DCV server listens on.
# It must be a number between 1024 and 65535 representing an
# available TCP port on which the web server embedded in the DCV Server will
# listen for connection requests to serve HTTP(S) pages and WebSocket
# connections.
# If not specified, DCV will use port 8443.
#web-port=8444

# Property "web-url-path" specifies a URL path for the embedded web server.
# The path must start with /. For instance setting it to "/test/foo" means the
# web server will be reachable at https://host:port/test/foo.
# This property is especially useful when setting up a gateway that then
# routes each connection to a different DCV server.
# If not specified DCV uses "/", which means it will be reachable at
# https://host:port
#web-url-path="/dcv"

# Property "enable-quic-frontend" specifies whether the DCV server
# also enables the use of the QUIC transport for clients which support it.
# If not specified, DCV will not enable QUIC.
enable-quic-frontend=true

# Property "quic-port" specifies on which UDP port the DCV server listens.
# It must be a number between 1024 and 65535 representing an
# available UDP port on which the QUIC server embedded in the DCV Server will
# listen for connection requests from clients using the QUIC transport.
# If not specified, DCV will use port 8443.
#quic-port=8444

# Property "idle-timeout" specifies a timeout in minutes after which
# a client that does not send keyboard or mouse events is considered idle
# and hence disconnected.
# By default it is set to 60 (1 hour). Set to 0 to never disconnect
# idle clients.
#idle-timeout=120

###############################################################################
## Section "security" contains the properties related to authentication and security
###############################################################################

[security]

# Property "authentication" specifies the client authentication method used by
# the DCV server. Use 'system' to delegate client authentication to the
# underlying operating system. Use 'none' to disable client authentication and
# grant access to all clients.
#authentication="none"

# Property "pam-service-name" specifies the name of the PAM configuration file
# used by DCV. The default PAM service name is 'dcv' and corresponds with
# the /etc/pam.d/dcv configuration file. This parameter is only used if
# the 'system' authentication method is used.
#pam-service-name="dcv-custom"

# Property "auth-token-verifier" specifies an endpoint (URL) for an external
# authentication token verifier. If empty or not specified, the internal
# authentication token verifier is used
#auth-token-verifier="https://127.0.0.1:8444"

After updating dcv.conf, reboot once more:

sudo reboot

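After the reboot, NICE DCV is ready to use. It is a good idea to reboot whenever you update the file, to make sure everything still works. It is also worth taking a snapshot of the EC2 instance's system disk so that you can create preconfigured NICE DCV instances whenever you need them.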
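Before connecting, you may want to confirm that the server is actually listening on port 8443 for both TCP and UDP (QUIC). The helper below is a sketch that parses the output of ss; it assumes the single-protocol output layout of ss -ltn or ss -lun, where the local address is the fourth column. port_is_listening is a made-up name.

```shell
# Read `ss -ltn` or `ss -lun` output on stdin and succeed if some socket's
# local address (column 4) ends with ":PORT". port_is_listening is a placeholder name.
port_is_listening() {
    awk -v suffix=":$1" '
        { n = length($4) - length(suffix) + 1 }
        n > 0 && substr($4, n) == suffix { found = 1 }
        END { exit !found }
    '
}

# Example:
# ss -ltn | port_is_listening 8443 && echo "TCP 8443 is listening"
# ss -lun | port_is_listening 8443 && echo "UDP 8443 is listening"
```

If the TCP check passes but the UDP one does not, the QUIC settings in dcv.conf and the UDP rule in the security group are the first things to revisit.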

About the author

Jeremy Pedersen

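Solutions Architect at Amazon Web Services. He focuses on machine learning and educational devices, especially Amazon DeepRacer. Before joining Amazon Web Services, he worked at Alibaba Cloud for five years, first as a solutions architect and later as a technical trainer.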