Kubernetes 节点弹性伸缩开源组件 Karpenter 实践：使用 Spot 实例进行成本优化

在之前的博客里我们介绍了Karpenter的基本概念和部署的详细步骤，在这个博客里我们会展开讨论如何利用Spot实例来进行进一步进行成本优化。

1.关于Spot实例

Spot实例是一种使用备用 EC2 容量的实例，以低于按需价格提供。由于Spot实例允许用户以极低的折扣请求未使用的 EC2 实例，这可能会显著降低用户的 Amazon EC2 成本。Spot实例的每小时价格称为 Spot 价格。每个可用区中的每种实例类型的 Spot 价格是由 Amazon EC2 设置的，并根据Spot实例的长期供求趋势逐步调整。只要容量可用，并且请求的每小时最高价超过 Spot 价格，Spot实例就会运行。如果 Amazon EC2 需要收回容量，或者 Spot 价格超过用户的请求的最高价，Amazon EC2 中断用户的Spot实例。

正常处理Spot实例中断的最佳方法是，设计应用程序以提供容错能力。为此，用户可以利用 EC2 实例rebalance recommendation和Spot实例interruption notification。其中，EC2 实例 rebalance recommendation 是一个新的信号，可在Spot实例处于较高的中断风险时通知用户。该信号可能比该两分钟的Spot实例interruption notification更早到达，从而让用户有机会主动管理Spot实例。

2.使用Karpenter管理Spot实例

接下来我们承接上一篇博客，继续以 Karpenter 0.6.3为例，介绍如何在Karpenter中进行Spot实例的管理。部署流程如下：

在上一篇博客中，我们设置了一个默认的On-demand Provisioner，并扩展了一个GPU推理应用程序。接下来我们修改启动实例类型为Spot和On-demand。

2.1 配置 Provisioner CRD

修改Provisioner属性

cat << EOF > provisioner.yaml
apiVersion: karpenter.sh/v1alpha5
kind: Provisioner
metadata:
 name: gpu
spec:
 requirements:
   - key: karpenter.sh/capacity-type
     operator: In
     values: ["spot","On-demand"] 
   - key: node.kubernetes.io/instance-type
     operator: In
     values: ["g4dn.xlarge", "g4dn.2xlarge"]
 taints:
   - key: nvidia.com/gpu
     effect: "NoSchedule"
 limits:
   resources:
     gpu: 100
 provider:
   subnetSelector:
     karpenter.sh/discovery: karpenter-cluster  
   securityGroupSelector:
     kubernetes.io/cluster/karpenter-cluster: owned  
 ttlSecondsAfterEmpty: 30
EOF

在上述 provisioner 的参数配置中，我们在 capacity-type 中配置了 Spot 和 On-demand 两种类型。在karpenter 选择资源时，会优先选择 Spot 实例类型，当 Spot 实例容量不足的时候，会尝试启动 On-demand 实例类型。

同时建议广泛的添加可选的实例类型以提高获得 Spot 实例的几率，因为对于 Spot 实例来说，karpenter 的分配策略是 capacity optimized prioritized，调用EC2 Fleet API优先选择底层可用容量最多的实例类型。

在前面的例子中，g4dn.4xlarge的Spot价格还低于g4dn.xlarge，因此可以加进来，以最大限度拿到便宜的Spot机器。关于Provisioner的详细配置可以参考官网上的文档。下面我们更新Provisioner:

kubectl apply -f provisioner.yaml

执行kubectl get provisioners查看provisioners部署情况：

执行kubectl describe provisioners查看配置

2.2 配置 AWS Node Termination Handler

在使用了Spot实例后，会面临实例回收的情况（或者现货价格超过用户请求的最高价格）。在这种情况下，可以利用Spot实例重平衡的特性来处理中断。在EC2发出重平衡信号后，会通知用户Spot实例存在较高的中断风险。此信号使用户有机会在现有或新的Spot实例之间，主动地重新平衡工作负载，而不必等待两分钟的Spot实例中断通知。

为了捕获这些信号并优雅的处理中断，我们将部署一个名为AWS Node Termination Handler。Node Termination Handler以两种不同的模式运行：队列模式和实例元数据模式。我们将使用实例元数据模式。在这种模式下，Node Termination Handler实例元数据服务监视器将运行一个非常小的Pod (DaemonSet) ，在每台主机上监控IMDS路径（如/spot或/events），并对相应节点的drain或cordon作出反应。

由于Karpenter本身不对Spot的rebalance recommendation和interruption notification信号进行额外处理，因此需要安装 Node Termination Handler 配合使用。通过下面命令可以利用 helm 来安装 AWS Node Termination Handler：

helm repo add eks https://aws.github.io/eks-charts
helm upgrade --install aws-node-termination-handler \
--namespace kube-system \
--set enableSpotInterruptionDraining="true" \
--set enableRebalanceMonitoring="true" \
--set enableRebalanceDraining="true" \
--set enableScheduledEventDraining="true" \
--set nodeSelector."karpenter\\.sh/capacity-type"=spot \
eks/aws-node-termination-handler

安装参数的意义如下：

enableRebalanceMonitoring

可以让 node interruption handler 同时监控 rebalance recommendation和 interruption notification信号

enableRebalanceDraining：

可以让 node interruption handler 收到 rebalance recommendation信号的时候 drain 实例而不仅仅 cordon 实例（其中 cordon 是默认的操作）

enableSpotInterruptionDraining：

可以对收到 interruption notification信号的时候 drain实例

enableScheduledEventDraining：

可以让 node interruption handler 在 EC2 instance scheduled event 的维护窗口开始之前 drain 实例

nodeSelector."karpenter\\.sh/capacity-type"=spot：

使用 node selector 让node interruption handler只部署在 spot 实例上，不需要部署在按需实例上，避免资源的浪费。

2.3 查看 AWS Node Termination Handler 的日志

我们可以配置 Fluentbit 将 AWS Node Termination Handler的日志发送到Cloudwatch Logs 中，以便后续测试过程中可以看到该Handler的具体处理细节，配置 Fluentbit 的详细步骤如下：

创建 amazon-cloudwatch 的命名空间

kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/cloudwatch-namespace.yaml

运行以下命令以创建一个名为 cluster-info 的 ConfigMap，该 ConfigMap 以集群名称和要向其发送日志的区域命名。将 cluster-name 和 cluster-region 分别替换为用户的集群的名称和区域。

ClusterName=cluster-name
RegionName=cluster-region
FluentBitHttpPort='2020'
FluentBitReadFromHead='Off'
[[ ${FluentBitReadFromHead} = 'On' ]] && FluentBitReadFromTail='Off'|| FluentBitReadFromTail='On'
[[ -z ${FluentBitHttpPort} ]] && FluentBitHttpServer='Off' || FluentBitHttpServer='On'
kubectl create configmap fluent-bit-cluster-info \
--from-literal=cluster.name=${ClusterName} \
--from-literal=http.server=${FluentBitHttpServer} \
--from-literal=http.port=${FluentBitHttpPort} \
--from-literal=read.head=${FluentBitReadFromHead} \
--from-literal=read.tail=${FluentBitReadFromTail} \
--from-literal=logs.region=${RegionName} -n amazon-cloudwatch

运行以下命令，将 Fluent Bit daemonset 下载并部署到集群中。

kubectl apply -f https://raw.githubusercontent.com/aws-samples/amazon-cloudwatch-container-insights/latest/k8s-deployment-manifest-templates/deployment-mode/daemonset/container-insights-monitoring/fluent-bit/fluent-bit.yaml

等待相关资源创建完毕后，默认情况下用户可以从名字为

/aws/containerinsights/Cluster_Name/dataplane 的 CloudWatch log group 中看到 AWS Node Termination Handler 的日志

2.4 监控 Spot 实例中断情况

为了监控 spot 实例的的中断情况，我们部署一个第三方的插件 Spot termination exporter 以便在 Prometheus 和 Grafana dashboard 中查看到集群中 spot 实例的中断情况。

首先我们在 EKS 集群中部署 Prometheus 和 Grafana dashboard，详细步骤参考如下：

添加 Prometheus 和 Grafana 的 helm repo

# add prometheus Helm repo
helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

# add grafana Helm repo
helm repo add grafana https://grafana.github.io/helm-charts

通过 helm 在 EKS 集群中创建 Prometheus

kubectl create namespace prometheus

helm repo add prometheus-community https://prometheus-community.github.io/helm-charts

helm install prometheus prometheus-community/prometheus \
    --namespace prometheus \
    --set alertmanager.persistentVolume.storageClass="gp2" \
    --set server.persistentVolume.storageClass="gp2"

配置 Grafana 数据源为 Prometheus

mkdir ${HOME}/environment/grafana

cat << EoF > ${HOME}/environment/grafana/grafana.yaml
datasources:
  datasources.yaml:
    apiVersion: 1
    datasources:
    - name: Prometheus
      type: prometheus
      url: http://prometheus-server.prometheus.svc.cluster.local
      access: proxy
      isDefault: true
EoF

通过 helm 创建 Grafana，这里我们配置通过 LoadBalancer 接入 Grafana dashboard

kubectl create namespace grafana

helm install grafana grafana/grafana \
    --namespace grafana \
    --set persistence.storageClassName="gp2" \
    --set persistence.enabled=true \
    --set adminPassword='EKS!sAWSome' \
    --values ${HOME}/environment/grafana/grafana.yaml \
    --set service.type=LoadBalancer

等待几分钟，让 ELB 启动，DNS 传播，ELB target 完成注册并通过健康检查，接下来用户可以通过如下命令看到 Grafana dashboard 的 URL

export ELB=$(kubectl get svc -n grafana grafana -o jsonpath='{.status.loadBalancer.ingress[0].hostname}')

echo "http://$ELB"

将上述 URL 复制粘贴到浏览器中即可出现登陆页面，登录时，使用用户名 admin 并通过运行以下命令获取密码：

kubectl get secret --namespace grafana grafana -o jsonpath="{.data.admin-password}" | base64 --decode ; echo

创建一个 dashboard，选择 Prometheus 作为数据源

接下来我们使用 helm 安装 Prometheus Spot termination exporter创建一个 dashboard，选择 Prometheus 作为数据源

helm install banzaicloud-stable/spot-termination-exporter --generate-name

等待 Spot termination exporter pods 创建完成，刷新 Grafana Prometheus dashboard, 查找 aws_instance_termination_imminent 指标即可监控 spot 实例的中断情况，更多关于 Spot termination exporter 的 Prometheus 指标，可以参考文档

2.5 部署示例应用与测试

参考之前的博客以部署示例的推理应用。增加 Pod 的数量为6个（kubectl scale deployment resnet –replicas 6），以便触发 Karpenter 进行扩容。通过如下命令查看 Karpenter Controller 日志：

kubectl logs -f -n karpenter -l app.kubernetes.io/name=karpenter -c controller

在 karpenter 的日志中发现在执行了Pod扩容后，karpenter会优先使用Spot，并且选择价格合适的实例类型。

另外，再来看看当Spot 实例容量不足时的情况。如下日志所示在尝试遍历所有可用区的 g4dn.8xlarge Spot 实例(为了验证效果，修改请求类型为g4dn.8xlarge)依然没有容量之后，karpenter会切换为启动 On-demand 实例。这也侧面印证了前面所说的，建议广泛的添加可选的实例类型以提高获得 Spot 实例的几率。

当有 Spot 实例需要被回收的时候，AWS node terminate handler 会接收到 rbr 信号并且开始 cordon&drain 相应的 Spot 实例。可以通过 fluent Bit 将 AWS node terminate handler 的日志推送到 cloudwatch log 上观察相应的日志

{ "log": "2022/04/12 07:03:13 INF Adding new event to the event store event={\"AutoScalingGroupName\":\"\",\"Description\":\"Rebalance recommendation received. Instance will be cordoned at 2022-04-12T07:03:12Z \\n\",\"EndTime\":\"0001-01-01T00:00:00Z\",\"EventID\":\"rebalance-recommendation-d487975a7f68b05f264cf06941208a5a76b1c2717ed99ee0f912122ee44406b2\",\"InProgress\":false,\"InstanceID\":\"\",\"IsManaged\":false,\"Kind\":\"REBALANCE_RECOMMENDATION\",\"NodeLabels\":null,\"NodeName\":\"ip-192-168-154-203.ap-southeast-1.compute.internal\",\"NodeProcessed\":false,\"Pods\":null,\"StartTime\":\"2022-04-12T07:03:12Z\",\"State\":\"\"}\n", "stream": "stderr", "az": "ap-southeast-1a", "ec2_instance_id":"i-00727e5dd268f477c" }
{ "log": "2022/04/12 07:03:14 INF Requesting instance drain event-id=rebalance-recommendation-d487975a7f68b05f264cf06941208a5a76b1c2717ed99ee0f912122ee44406b2 instance-id= kind=REBALANCE_RECOMMENDATION node-name=ip-192-168-154-203.ap-southeast-1.compute.internal\n", "stream": "stderr", "az": "ap-southeast-1a", "ec2_instance_id": "i-00727e5dd268f477c" }
{ "log": "2022/04/12 07:03:14 INF Pods on node node_name=ip-192-168-154-203.ap-southeast-1.compute.internal pod_names=[\"fluent-bit-nzwrh\",\"resnet-deployment-cdf59b549-h5qsk\",\"spot-termination-exporter-1642989993-spot-termination-expo9brtf\",\"aws-node-99txh\",\"aws-node-termination-handler-tg8kz\",\"kube-proxy-6965k\",\"prometheus-node-exporter-8dfqk\"]\n", "stream":"stderr", "az": "ap-southeast-1a", "ec2_instance_id": "i-00727e5dd268f477c" }
{ "log": "2022/04/12 07:03:14 INF Draining the node\n", "stream": "stderr", "az": "ap-southeast-1a", "ec2_instance_id": "i-00727e5dd268f477c" }
{ "log": "2022/04/12 07:03:14 ??? WARNING: ignoring DaemonSet-managed Pods: amazon-cloudwatch/fluent-bit-nzwrh, default/spot-termination-exporter-1642989993-spot-termination-expo9brtf, kube-system/aws-node-99txh, kube-system/aws-node-termination-handler-tg8kz, kube-system/kube-proxy-6965k, prometheus/prometheus-node-exporter-8dfqk\n", "stream": "stderr", "az":"ap-southeast-1a", "ec2_instance_id": "i-00727e5dd268f477c" }
{ "log": "2022/04/12 07:03:14 ??? evicting pod default/resnet-deployment-cdf59b549-h5qsk\n", "stream": "stderr", "az": "ap-southeast-1a", "ec2_instance_id": "i-00727e5dd268f477c" }
{ "log": "2022/04/12 07:03:40 INF event store statistics drainable-events=0 size=1\n", "stream":"stderr", "az": "ap-southeast-1a", "ec2_instance_id": "i-00727e5dd268f477c" }
{ "log": "2022/04/12 07:03:52 INF Node successfully cordoned and drained node_name=ip-192-168-154-203.ap-southeast-1.compute.internal reason=\"Rebalance recommendation received. Instance will be cordoned at 2022-04-12T07:03:12Z \\n\"\n", "stream": "stderr", "az": "ap-southeast-1a", "ec2_instance_id": "i-00727e5dd268f477c" }

更详细的信息可以参考：https://github.com/aws/aws-node-termination-handler

除此之外，也可以通过 Grafana dashboard 查看Prometheus Spot termination exporter 发布的 Prometheus 的指标来监控 Spot 实例被回收的情况

2.6 异常处理

异常情况1：On-demand实例不足

这时候karpenter 会一直尝试请求 On-demand实例，从karpenter的日志中可以看到如下信息：

2022-04-14T10:57:52.339Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:57:58.340Z INFO controller.provisioning Batched 1 pods in 1.000378197s {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:57:58.365Z INFO controller.provisioning Computed packing of 1 node(s) for 1 pod(s) with instance type option(s) [g4dn.xlarge] {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:57:58.366Z DEBUG controller.provisioning Discovered caBundle, length 1066 {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:57:59.643Z DEBUG controller.provisioning InsufficientInstanceCapacity for offering { instanceType: g4dn.xlarge, zone: ap-southeast-1b, capacityType: spot }, avoiding for 45s {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:57:59.643Z ERROR controller.provisioning Could not launch node, launching instances, with fleet error(s), InsufficientInstanceCapacity: There is no Spot capacity available that matches your request.; UnfulfillableCapacity: Unable to fulfill capacity due to your request configuration. Please adjust your request and try again. {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:57:59.643Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:58:05.644Z INFO controller.provisioning Batched 1 pods in 1.000626178s {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:58:05.683Z DEBUG controller.provisioning Discovered subnets: [subnet-09be8e66a41d0f87a (ap-southeast-1a) subnet-037edf176cf52dcbf (ap-southeast-1c) subnet-0b31edb266e73b5e6 (ap-southeast-1b)] {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:58:05.688Z INFO controller.provisioning Computed packing of 1 node(s) for 1 pod(s) with instance type option(s) [g4dn.xlarge] {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:58:05.722Z DEBUG controller.provisioning Discovered security groups: [sg-0a13d8c5b0b02f968 sg-0b7a098f22845cab3] {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:58:05.724Z DEBUG controller.provisioning Discovered kubernetes version 1.21 {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:58:05.776Z DEBUG controller.provisioning Discovered ami ami-026f68ad2a2513c76 for query /aws/service/eks/optimized-ami/1.21/amazon-linux-2-gpu/recommended/image_id {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:58:05.776Z DEBUG controller.provisioning Discovered caBundle, length 1066 {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:58:05.816Z DEBUG controller.provisioning Discovered launch template Karpenter-karpenter-gpu-2332944817898789368 {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:58:06.093Z ERROR controller.provisioning Could not launch node, launching instances, with fleet error(s), UnfulfillableCapacity: Unable to fulfill capacity due to your request configuration. Please adjust your request and try again. {"commit": "7e79a67", "provisioner": "default"}

我们发现当 On-demand 实例没有容量的时候，会报如下错误

2022-04-14T10:58:06.093Z ERROR controller.provisioning Could not launch node, launching instances, with fleet error(s), UnfulfillableCapacity: Unable to fulfill capacity due to your request configuration. Please adjust your request and try again.

异常情况2：Spot 实例达到了账户的限制

从 karpenter的日志中可以看到诸如MaxSpotInstanceCountExceeded的报错信息，这时候需要请求提高Spot实例限制，详细参考文档[1]

2022-04-14T10:18:48.555Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:18:54.557Z INFO controller.provisioning Batched 2 pods in 1.000516471s {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:18:54.562Z INFO controller.provisioning Computed packing of 2 node(s) for 2 pod(s) with instance type option(s) [g4dn.xlarge g4dn.2xlarge g4dn.4xlarge] {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:18:58.487Z ERROR controller.provisioning Could not launch node, launching instances, with fleet error(s), MaxSpotInstanceCountExceeded: Max spot instance count exceeded; UnfulfillableCapacity: Unable to fulfill capacity due to your request configuration. Please adjust your request and try again. {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:18:58.487Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:19:04.489Z INFO controller.provisioning Batched 2 pods in 1.000242566s {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:19:04.553Z INFO controller.provisioning Computed packing of 2 node(s) for 2 pod(s) with instance type option(s) [g4dn.xlarge g4dn.2xlarge g4dn.4xlarge] {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:19:10.700Z ERROR controller.provisioning Could not launch node, launching instances, with fleet error(s), MaxSpotInstanceCountExceeded: Max spot instance count exceeded; UnfulfillableCapacity: Unable to fulfill capacity due to your request configuration. Please adjust your request and try again. {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:19:10.700Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:19:16.701Z INFO controller.provisioning Batched 2 pods in 1.000765374s {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:19:16.709Z INFO controller.provisioning Computed packing of 2 node(s) for 2 pod(s) with instance type option(s) [g4dn.xlarge g4dn.2xlarge g4dn.4xlarge] {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:19:22.535Z ERROR controller.provisioning Could not launch node, launching instances, with fleet error(s), MaxSpotInstanceCountExceeded: Max spot instance count exceeded; InsufficientInstanceCapacity: We currently do not have sufficient g4dn.xlarge capacity in the Availability Zone you requested (ap-southeast-1a). Our system will be working on provisioning additional capacity. You can currently get g4dn.xlarge capacity by not specifying an Availability Zone in your request or choosing ap-southeast-1b, ap-southeast-1c.; UnfulfillableCapacity: Unable to fulfill capacity due to your request configuration. Please adjust your request and try again. {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:19:22.535Z INFO controller.provisioning Waiting for unschedulable pods {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:19:28.536Z INFO controller.provisioning Batched 2 pods in 1.000634995s {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:19:28.542Z INFO controller.provisioning Computed packing of 2 node(s) for 2 pod(s) with instance type option(s) [g4dn.xlarge g4dn.2xlarge g4dn.4xlarge] {"commit": "7e79a67", "provisioner": "default"}
2022-04-14T10:19:33.182Z ERROR controller.provisioning Could not launch node, launching instances, with fleet error(s), MaxSpotInstanceCountExceeded: Max spot instance count exceeded; InsufficientInstanceCapacity: We currently do not have sufficient g4dn.xlarge capacity in the Availability Zone you requested (ap-southeast-1b). Our system will be working on provisioning additional capacity. You can currently get g4dn.xlarge capacity by not specifying an Availability Zone in your request or choosing ap-southeast-1a, ap-southeast-1c.; UnfulfillableCapacity: Unable to fulfill capacity due to your request configuration. Please adjust your request and try again. {"commit": "7e79a67", "provisioner": "default"}

3.小结

在这个博客里我们承接上一篇博客，继续以 Karpenter 0.6.3为例，介绍如何在Karpenter中进行Spot实例的管理。Spot实例允许用户以极低的折扣请求未使用的 EC2 实例，显著降低用户的 Amazon EC2 成本。Karpenter 默认使用 capacity optimized prioritized 策略，调用EC2 Fleet API优先选择底层可用容量最多的实例类型。配合 AWS Node Termination Handler 处理 Spot 实例的中断，再部署 Prometheus 以及 Fluentbit 处理 Spot 实例中断的相关指标和日志。

亚马逊AWS官方博客

Kubernetes 节点弹性伸缩开源组件 Karpenter 实践：使用 Spot 实例进行成本优化

1.关于Spot实例

2.使用Karpenter管理Spot实例

2.1 配置 Provisioner CRD

2.2 配置 AWS Node Termination Handler

2.3 查看 AWS Node Termination Handler 的日志

2.5 部署示例应用与测试

2.6 异常处理

异常情况1：On-demand实例不足

异常情况2：Spot 实例达到了账户的限制

3.小结

本篇作者

Learn

Resources

Developers

Help