為什麼統一的 CloudWatch 代理程式不會將我的指標或日誌事件推送到 CloudWatch?

上次更新日期:2022 年 4 月 26 日

我已在我的 Amazon Elastic Compute Cloud (Amazon EC2) 執行個體上設定統一的 CloudWatch 代理程式,以將指標和日誌發佈到 Amazon CloudWatch。但我在 CloudWatch 主控台中看不到我的指標或日誌。為什麼統一的 CloudWatch 代理程式不會將我的指標和日誌推送到 CloudWatch?

簡短描述

統一的 CloudWatch 代理程式可能無法將您的指標或日誌推送到 CloudWatch,其原因有很多。例如,可能存在許可或連線錯誤,這會阻止代理程式發佈您的指標。當您檢閱統一的 CloudWatch 代理程式日誌時,您可能會看到如下錯誤:

  • 代理程式日誌錯誤:未連線到端點
  • 代理程式日誌錯誤:許可不足

解決方案

注意:如果您在執行 AWS Command Line Interface (AWS CLI) 命令時收到錯誤,請確保您使用的是最新的 AWS CLI 版本

檢閱統一的 CloudWatch 代理程式日誌

透過代理程式日誌檔案,幫助您解決使用統一的 CloudWatch 代理程式套件時遇到的問題。您可能遇到以下常見問題之一:

您可能會在日誌中看到以下錯誤之一:

代理程式日誌錯誤:未連線到端點

2021-08-30T04:07:46Z E! cloudwatch: code: RequestError, message: send request failed, original error: Post "https://monitoring.us-east-1.amazonaws.com/": dial tcp 172.31.11.121:443: i/o timeout
2021-08-30T04:07:46Z W! 210 retries, going to sleep 1m0s before retrying.
2021-08-30T04:07:46Z E! cloudwatch: code: RequestError, message: send request failed, original error: Post "https://monitoring.us-east-1.amazonaws.com/": dial tcp 172.31.11.121:443: i/o timeout
2021-08-30T04:07:46Z W! 211 retries, going to sleep 1m0s before retrying.

代理程式日誌錯誤:許可不足

2021-08-30T02:15:45Z E! cloudwatch: code: AccessDenied, message: User: arn:aws:sts::123456789012:assumed-role/cwagent/i-0744de7c842d2c2ba is not authorized to perform: cloudwatch:PutMetricData, original error: 
2021-08-30T02:15:45Z W! 1 retries, going to sleep 400ms before retrying.
2021-08-30T02:15:46Z E! WriteToCloudWatch failure, err:  AccessDenied: User: arn:aws:sts::123456789012:assumed-role/cwagent/i-0744de7c842d2c2ba is not authorized to perform: cloudwatch:PutMetricData
    status code: 403, request id: f1171fd0-05b6-4f7d-bac2-629c8594c46e

確認與 CloudWatch 端點的連線

當到 CloudWatch 的流量不應經過公有網際網路時,您可以改用 VPC 端點。如果您使用的是 VPC 端點,請檢查以下內容:

  • 如果您使用的是私有名稱伺服器,請確認 DNS 解析提供了準確的回應。
  • 確認 CloudWatch 端點解析為私有 IP 地址。
  • 確認與 VPC 端點關聯的安全群組允許來自主機的入站流量。

1.    檢查與指標端點的連線:

$ telnet monitoring.us-east-1.amazonaws.com 443
Trying 52.46.138.115...
Connected to monitoring.amazonaws.com.
Escape character is '^]'.
^]
telnet> quit
Connection closed.

2.    檢查與日誌端點的連線:

$ telnet logs.us-east-1.amazonaws.com 443
Trying 3.236.94.218...
Connected to logs.us-east-1.amazonaws.com.
Escape character is '^]'.
^]
telnet> quit
Connection closed

3.    檢查 VPC 端點是否解析為私有 IP 地址:

$ dig monitoring.us-east-1.amazonaws.com +short
172.31.11.121
172.31.0.13

檢閱統一的 CloudWatch 代理程式組態

代理程式組態檔案詳細説明發佈到 CloudWatch 的指標和日誌。檢閱代理程式組態檔案,確認包含要發佈的日誌和指標。

確認主機具有發佈指標和日誌的許可

AWS 受管政策 CloudWatchAgentServerPolicyCloudWatchAgentAdminPolicy 可幫助您部署統一的 CloudWatch 代理程式並檢查您是否擁有正確的許可。使用這些政策作為參考,確保您的主機擁有正確的許可。

這些範例中的 AWS CLI 輸出顯示許可不足。

此代理程式啟動命令輸出顯示沒有任何 IAM 角色附加到 EC2 執行個體:

$ /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:CWT-Web-Server -s
****** processing amazon-cloudwatch-agent ******
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source ssm:CWT-Web-Server --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
Region: us-east-1
credsConfig: map[]
Error in retrieving parameter store content: NoCredentialProviders: no valid providers in chain. Deprecated.
    For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Fail to fetch/remove json config: NoCredentialProviders: no valid providers in chain. Deprecated.
    For verbose messaging see aws.Config.CredentialsChainVerboseErrors

Fail to fetch the config!

此代理程式啟動命令輸出顯示不正確的 IAM 角色附加到 EC2 執行個體:

$ /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:CWT-Web-Server -s
****** processing amazon-cloudwatch-agent ******
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source ssm:CWT-Web-Server --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
Region: us-east-1
credsConfig: map[]
Error in retrieving parameter store content: AccessDeniedException: User: arn:aws:sts::123456789012:assumed-role/cwagent/i-0744de7c842d2c2ba is not authorized to perform: ssm:GetParameter on resource: arn:aws:ssm:us-east-1:123456789012:parameter/CWT-Web-Server
    status code: 400, request id: b85b0a7a-0fb1-47b4-924f-be8cf43a3b4d
Fail to fetch/remove json config: AccessDeniedException: User: arn:aws:sts::123456789012:assumed-role/cwagent/i-0744de7c842d2c2ba is not authorized to perform: ssm:GetParameter on resource: arn:aws:ssm:us-east-1:123456789012:parameter/CWT-Web-Server
    status code: 400, request id: b85b0a7a-0fb1-47b4-924f-be8cf43a3b4d

Fail to fetch the config!

在某些情況下,IAM 使用者可能位於命令列。獲取使用者/角色命令會返回與執行個體關聯的 IAM 使用者或角色:

$ aws sts get-caller-identity
{
    "UserId": "AROA123456789012ABCDE:i-0744de7c842d2c2ba",
    "Account": "123456789012",
    "Arn": "arn:aws:sts::123456789012:assumed-role/CloudWatchAgentServerRole/i-0744de7c842d2c2ba"
}

確認代理程式是否正確啟動

代理程式設計為使用 AWS CLI 啟動,並將組態檔案作為引數傳遞。使用這些有效的啟動命令

Linux 命令:

- `$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:configuration-file-path`
- `$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c ssm:configuration-parameter-store-name`

Windows 命令:

- `& "C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1" -a fetch-config -m ec2 -s -c file:"C:\Program Files\Amazon\AmazonCloudWatchAgent\config.json"`
- `& "C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1" -a fetch-config -m ec2 -s -c ssm:configuration-parameter-store-name`

重要提示:不要從 Windows 控制面板啟動代理程式。

確認代理程式正在執行中

若要發佈指標和日誌,代理程式必須正在執行中。執行此命令,確認代理程式處於活動狀態

$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status
{
    "status": "running",
    "starttime": "2021-08-30T02:13:44+00:00",
    "configstatus": "configured",
    "cwoc_status": "stopped",
    "cwoc_starttime": "",
    "cwoc_configstatus": "not configured",
    "version": "1.247349.0b251399"
}

更新代理程式組態後重新啟動代理程式

代理程式不會自動註冊對組態檔案的變更。如果更新代理程式組態以包含新的或不同的指標和日誌,請使用此命令重新啟動代理程式:

$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a stop
****** processing cwagent-otel-collector ******
cwagent-otel-collector has already been stopped

****** processing amazon-cloudwatch-agent ******
Redirecting to /bin/systemctl stop amazon-cloudwatch-agent.service


$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:config.json
****** processing amazon-cloudwatch-agent ******
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source file:config.json --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
Successfully fetched the config and saved in /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json.tmp
Start configuration validation...
/opt/aws/amazon-cloudwatch-agent/bin/config-translator --input /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json --input-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --output /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
2021/08/31 02:45:37 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json.tmp ...
Valid Json input schema.
I! Detecting run_as_user...
Configuration validation first phase succeeded
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase succeeded
Configuration validation succeeded
amazon-cloudwatch-agent has already been stopped
Redirecting to /bin/systemctl restart amazon-cloudwatch-agent.service

$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status
{
  "status": "running",
  "starttime": "2021-08-31T02:45:37+0000",
  "configstatus": "configured",
  "cwoc_status": "stopped",
  "cwoc_starttime": "",
  "cwoc_configstatus": "not configured",
  "version": "1.247349.0b251399"
}