为什么统一的 CloudWatch 代理程序没有将我的指标或日志事件推送到 CloudWatch?

4 分钟阅读
0

我已经在 Amazon Elastic Compute Cloud (Amazon EC2) 实例上配置了统一的 CloudWatch 代理程序,以便将指标和日志发布到 Amazon CloudWatch。但我无法在 CloudWatch 控制台中看到我的指标或日志。为什么统一的 CloudWatch 代理程序没有将我的指标和日志推送到 CloudWatch?

简短描述

统一的 CloudWatch 代理程序没有将您的指标或日志推送到 CloudWatch 的可能原因有几种。例如,您可能遇到权限或连接错误,导致代理程序无法发布您的指标。在查看统一 CloudWatch 代理程序日志时,您可能会看到以下错误:

  • 代理程序日志错误:没有连接到端点
  • 代理程序日志错误:权限不足

解决方法

**注意:**如果您在运行 AWS Command Line Interface (AWS CLI) 命令时收到错误消息,请确保您使用的是最新版本的 AWS CLI

查看统一 CloudWatch 代理程序的日志

使用代理程序日志文件帮助排查在使用统一 CloudWatch 代理程序包时遇到的问题。您可能会遇到以下常见问题之一:

您可能会在日志中看到以下错误之一:

代理程序日志错误:没有连接到端点

2021-08-30T04:07:46Z E! cloudwatch: code: RequestError, message: send request failed, original error: Post "https://monitoring.us-east-1.amazonaws.com/": dial tcp 172.31.11.121:443: i/o timeout
2021-08-30T04:07:46Z W! 210 retries, going to sleep 1m0s before retrying.
2021-08-30T04:07:46Z E! cloudwatch: code: RequestError, message: send request failed, original error: Post "https://monitoring.us-east-1.amazonaws.com/": dial tcp 172.31.11.121:443: i/o timeout
2021-08-30T04:07:46Z W! 211 retries, going to sleep 1m0s before retrying.

代理程序日志错误:权限不足

2021-08-30T02:15:45Z E! cloudwatch: code: AccessDenied, message: User: arn:aws:sts::123456789012:assumed-role/cwagent/i-0744de7c842d2c2ba is not authorized to perform: cloudwatch:PutMetricData, original error: 
2021-08-30T02:15:45Z W! 1 retries, going to sleep 400ms before retrying.
2021-08-30T02:15:46Z E! WriteToCloudWatch failure, err:  AccessDenied: User: arn:aws:sts::123456789012:assumed-role/cwagent/i-0744de7c842d2c2ba is not authorized to perform: cloudwatch:PutMetricData
    status code: 403, request id: f1171fd0-05b6-4f7d-bac2-629c8594c46e

确认与 CloudWatch 端点的连接

当流向 CloudWatch 的流量不应通过公共互联网传输时,您可以改用 VPC 终端节点。如果您使用的是 VPC 终端节点,请检查以下内容:

  • 如果您使用的是私有域名服务器,请确认 DNS 解析提供了准确的响应。
  • 确认 CloudWatch 端点已解析为私有 IP 地址。
  • 确认与 VPC 终端节点关联的安全组允许来自主机的入站流量。

1.    检查与指标端点的连接:

$ telnet monitoring.us-east-1.amazonaws.com 443
Trying 52.46.138.115...
Connected to monitoring.amazonaws.com.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

2.    检查与日志端点的连接:

$ telnet logs.us-east-1.amazonaws.com 443
Trying 3.236.94.218...
Connected to logs.us-east-1.amazonaws.com.
Escape character is '^]'.
^]
telnet> quit
Connection closed

3.    检查 VPC 终端节点是否解析为私有 IP 地址:

$ dig monitoring.us-east-1.amazonaws.com +short
172.31.11.121
172.31.0.13

查看统一 CloudWatch 代理程序的配置

代理程序配置文件详细列出了发布到 CloudWatch 的指标和日志。查看代理程序配置文件以确认是否包含要发布的日志和指标。

确认主机有发布指标和日志的权限

AWS 托管策略 CloudWatchAgentServerPolicyCloudWatchAgentAdminPolicy 可以帮助您部署统一的 CloudWatch 代理程序并检查您是否有正确的权限。使用这些策略作为参考,确保您的主机拥有正确的权限。

这些示例中的 AWS CLI 输出显示权限不足。

此代理程序启动命令输出显示没有 IAM 角色附加到 EC2 实例:

$ /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:CWT-Web-Server -s
****** processing amazon-cloudwatch-agent ******
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source ssm:CWT-Web-Server --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
Region: us-east-1
credsConfig: map[]
Error in retrieving parameter store content: NoCredentialProviders: no valid providers in chain. Deprecated.
    For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Fail to fetch/remove json config: NoCredentialProviders: no valid providers in chain. Deprecated.
    For verbose messaging see aws.Config.CredentialsChainVerboseErrors

Fail to fetch the config!

此代理启动命令输出显示错误的 IAM 角色附加到 EC2 实例:

$ /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:CWT-Web-Server -s
****** processing amazon-cloudwatch-agent ******
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source ssm:CWT-Web-Server --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
Region: us-east-1
credsConfig: map[]
Error in retrieving parameter store content: AccessDeniedException: User: arn:aws:sts::123456789012:assumed-role/cwagent/i-0744de7c842d2c2ba is not authorized to perform: ssm:GetParameter on resource: arn:aws:ssm:us-east-1:123456789012:parameter/CWT-Web-Server
    status code: 400, request id: b85b0a7a-0fb1-47b4-924f-be8cf43a3b4d
Fail to fetch/remove json config: AccessDeniedException: User: arn:aws:sts::123456789012:assumed-role/cwagent/i-0744de7c842d2c2ba is not authorized to perform: ssm:GetParameter on resource: arn:aws:ssm:us-east-1:123456789012:parameter/CWT-Web-Server
    status code: 400, request id: b85b0a7a-0fb1-47b4-924f-be8cf43a3b4d

Fail to fetch the config!

在某些情况下,IAM 用户可能在命令行中。获取用户/角色命令返回与实例关联的 IAM 用户或角色:

$ aws sts get-caller-identity
{
    "UserId": "AROA123456789012ABCDE:i-0744de7c842d2c2ba",
    "Account": "123456789012",
    "Arn": "arn:aws:sts::123456789012:assumed-role/CloudWatchAgentServerRole/i-0744de7c842d2c2ba"
}

确认代理程序已正确启动

代理程序设计为通过 AWS CLI 启动,并以参数形式传递配置文件。使用这些有效的启动命令

Linux 命令:

- `$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:configuration-file-path`
- `$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c ssm:configuration-parameter-store-name`

Windows 命令:

- `& "C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1" -a fetch-config -m ec2 -s -c file:"C:\Program Files\Amazon\AmazonCloudWatchAgent\config.json"`
- `& "C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1" -a fetch-config -m ec2 -s -c ssm:configuration-parameter-store-name`

重要提示:不要从 Windows 控制面板启动代理程序。

确认代理程序正在运行

要发布指标和日志,代理程序必须处于运行状态。运行此命令以确认代理程序处于活动状态

$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status
{
    "status": "running",
    "starttime": "2021-08-30T02:13:44+00:00",
    "configstatus": "configured",
    "cwoc_status": "stopped",
    "cwoc_starttime": "",
    "cwoc_configstatus": "not configured",
    "version": "1.247349.0b251399"
}

更新代理程序配置后重新启动代理程序

代理程序不会自动将更改注册到配置文件中。如果代理程序配置已更新为包括新的或不同的指标和日志,请使用以下命令重启代理程序:

$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a stop
****** processing cwagent-otel-collector ******
cwagent-otel-collector has already been stopped

****** processing amazon-cloudwatch-agent ******
Redirecting to /bin/systemctl stop amazon-cloudwatch-agent.service


$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:config.json
****** processing amazon-cloudwatch-agent ******
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source file:config.json --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
Successfully fetched the config and saved in /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json.tmp
Start configuration validation...
/opt/aws/amazon-cloudwatch-agent/bin/config-translator --input /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json --input-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --output /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
2021/08/31 02:45:37 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json.tmp ...
Valid Json input schema.
I! Detecting run_as_user...
Configuration validation first phase succeeded
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase succeeded
Configuration validation succeeded
amazon-cloudwatch-agent has already been stopped
Redirecting to /bin/systemctl restart amazon-cloudwatch-agent.service

$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status
{
  "status": "running",
  "starttime": "2021-08-31T02:45:37+0000",
  "configstatus": "configured",
  "cwoc_status": "stopped",
  "cwoc_starttime": "",
  "cwoc_configstatus": "not configured",
  "version": "1.247349.0b251399"
}

相关信息

如何安装和配置统一的 CloudWatch 代理程序,以便从我的 EC2 实例将指标和日志推送到 CloudWatch?

排查 CloudWatch 代理程序的问题

AWS 官方
AWS 官方已更新 2 年前