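Why isn't the unified CloudWatch agent pushing my metrics or log events to CloudWatch?
Last updated: April 26, 2022
I configured the unified CloudWatch agent on my Amazon Elastic Compute Cloud (Amazon EC2) instance to publish metrics and logs to Amazon CloudWatch, but I don't see my metrics or logs in the CloudWatch console. Why isn't the unified CloudWatch agent pushing my metrics and logs to CloudWatch?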
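Short description
The unified CloudWatch agent can fail to push your metrics or logs to CloudWatch for several reasons. For example, a permissions or connectivity error can prevent the agent from publishing your metrics. When you review the unified CloudWatch agent log, you might see errors such as the following:
- Agent log error: Not connecting to the endpoint
- Agent log error: Insufficient permissions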
Resolution
Note: If you receive errors when you run AWS Command Line Interface (AWS CLI) commands, make sure that you're using the most recent AWS CLI version.
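Review the unified CloudWatch agent log
Use the agent log file to troubleshoot problems with the unified CloudWatch agent package. You might be running into one of these common issues:
- The host can't connect to the required AWS service endpoint or VPC endpoint.
- The host doesn't have the correct permissions to make the supporting API calls to CloudWatch.
- The log file doesn't exist in the local file system.
You might see one of the following errors in the log:
Agent log error: Not connecting to the endpoint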
2021-08-30T04:07:46Z E! cloudwatch: code: RequestError, message: send request failed, original error: Post "https://monitoring.us-east-1.amazonaws.com/": dial tcp 172.31.11.121:443: i/o timeout
2021-08-30T04:07:46Z W! 210 retries, going to sleep 1m0s before retrying.
2021-08-30T04:07:46Z E! cloudwatch: code: RequestError, message: send request failed, original error: Post "https://monitoring.us-east-1.amazonaws.com/": dial tcp 172.31.11.121:443: i/o timeout
2021-08-30T04:07:46Z W! 211 retries, going to sleep 1m0s before retrying.
Agent log error: Insufficient permissions
2021-08-30T02:15:45Z E! cloudwatch: code: AccessDenied, message: User: arn:aws:sts::123456789012:assumed-role/cwagent/i-0744de7c842d2c2ba is not authorized to perform: cloudwatch:PutMetricData, original error:
2021-08-30T02:15:45Z W! 1 retries, going to sleep 400ms before retrying.
2021-08-30T02:15:46Z E! WriteToCloudWatch failure, err: AccessDenied: User: arn:aws:sts::123456789012:assumed-role/cwagent/i-0744de7c842d2c2ba is not authorized to perform: cloudwatch:PutMetricData
status code: 403, request id: f1171fd0-05b6-4f7d-bac2-629c8594c46e
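The two error families above can be told apart with simple text matching. The following is a minimal sketch, not part of the agent: the function names `classify_agent_log_line` and `summarize_agent_log` are hypothetical, and the patterns are taken only from the sample log lines shown above, so other error messages fall into the "other" category.

```shell
# Hypothetical helpers; the patterns come from the sample log lines above.

# Classify one line of the unified CloudWatch agent log.
# Prints "connectivity", "permissions", or "other".
classify_agent_log_line() {
  case "$1" in
    *RequestError*|*"send request failed"*|*"i/o timeout"*)
      echo "connectivity" ;;
    *AccessDenied*|*"is not authorized to perform"*)
      echo "permissions" ;;
    *)
      echo "other" ;;
  esac
}

# Count connectivity and permissions errors in a log read from standard input.
summarize_agent_log() {
  conn=0
  perm=0
  while IFS= read -r line; do
    case "$(classify_agent_log_line "$line")" in
      connectivity) conn=$((conn + 1)) ;;
      permissions)  perm=$((perm + 1)) ;;
    esac
  done
  echo "connectivity=$conn permissions=$perm"
}
```

For example, on a Linux host where the agent writes to its usual default location, you could run `summarize_agent_log < /opt/aws/amazon-cloudwatch-agent/logs/amazon-cloudwatch-agent.log`. Adjust the path if your installation logs elsewhere.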
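Confirm connectivity to the CloudWatch endpoints
When traffic to CloudWatch must not traverse the public internet, you can use VPC endpoints instead. If you use VPC endpoints, check the following:
- If you use private name servers, confirm that DNS resolution returns accurate responses.
- Confirm that the CloudWatch endpoints resolve to private IP addresses.
- Confirm that the security group associated with the VPC endpoint allows inbound traffic from the host.
1. Check connectivity to the metrics endpoint: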
$ telnet monitoring.us-east-1.amazonaws.com 443
Trying 52.46.138.115...
Connected to monitoring.amazonaws.com.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
2. Check connectivity to the logs endpoint:
$ telnet logs.us-east-1.amazonaws.com 443
Trying 3.236.94.218...
Connected to logs.us-east-1.amazonaws.com.
Escape character is '^]'.
^]
telnet> quit
Connection closed.
3. Check that the VPC endpoint resolves to private IP addresses:
$ dig monitoring.us-east-1.amazonaws.com +short
172.31.11.121
172.31.0.13
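To decide whether the addresses that `dig` returns are private, you can compare them against the RFC 1918 ranges. The sketch below is illustrative only: `is_private_ipv4` and `label_addresses` are hypothetical names, the check is a plain prefix match with no address validation, and any line that isn't a private IPv4 address (including a CNAME) is labeled `not-private`.

```shell
# Hypothetical helpers; prefix matching only, with no input validation.

# Return 0 if the IPv4 address is in an RFC 1918 private range
# (10.0.0.0/8, 172.16.0.0/12, or 192.168.0.0/16); otherwise return 1.
is_private_ipv4() {
  case "$1" in
    10.*)                                   return 0 ;;
    192.168.*)                              return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[0-1].*) return 0 ;;
    *)                                      return 1 ;;
  esac
}

# Read one address per line from standard input and label each one.
label_addresses() {
  while IFS= read -r addr; do
    if is_private_ipv4 "$addr"; then
      echo "$addr private"
    else
      echo "$addr not-private"
    fi
  done
}
```

A possible use is `dig monitoring.us-east-1.amazonaws.com +short | label_addresses`. With a working VPC endpoint and private DNS, every address would be expected to come back labeled `private`.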
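Review the unified CloudWatch agent configuration
The agent configuration file lists the metrics and logs that the agent publishes to CloudWatch. Review the agent configuration file to confirm that it includes the logs and metrics that you want to publish.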
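One part of this review can be scripted: checking that the log files named in the configuration exist on the host. The sketch below assumes that log files are declared with `"file_path"` keys, as in the agent's `collect_list` entries, and that each key and its value sit on one line. The function names are hypothetical, and the text match is not a real JSON parser.

```shell
# Hypothetical helpers; a text match on "file_path", not a JSON parser.

# Print every "file_path" value found in an agent configuration file.
list_configured_log_paths() {
  grep -o '"file_path"[[:space:]]*:[[:space:]]*"[^"]*"' "$1" |
    sed 's/^"file_path"[[:space:]]*:[[:space:]]*"//; s/"$//'
}

# Report whether each configured path exists on this host.
# Paths that contain a wildcard are reported as "pattern" and not checked.
check_configured_log_paths() {
  list_configured_log_paths "$1" | while IFS= read -r path; do
    case "$path" in
      *'*'*)
        echo "pattern $path" ;;
      *)
        if [ -e "$path" ]; then
          echo "found $path"
        else
          echo "missing $path"
        fi ;;
    esac
  done
}
```

Run it as `check_configured_log_paths configuration-file-path`. A `missing` line points to a log file that the agent has nothing to read from.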
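Confirm that the host has permissions to publish metrics and logs
The AWS managed policies CloudWatchAgentServerPolicy and CloudWatchAgentAdminPolicy help you deploy the unified CloudWatch agent and check that you have the correct permissions. Use these policies as a reference to make sure that your host has the correct permissions.
The command output in these examples shows insufficient permissions.
This agent start command output shows that no IAM role is attached to the EC2 instance: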
$ /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:CWT-Web-Server -s
****** processing amazon-cloudwatch-agent ******
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source ssm:CWT-Web-Server --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
Region: us-east-1
credsConfig: map[]
Error in retrieving parameter store content: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Fail to fetch/remove json config: NoCredentialProviders: no valid providers in chain. Deprecated.
For verbose messaging see aws.Config.CredentialsChainVerboseErrors
Fail to fetch the config!
This agent start command output shows that an incorrect IAM role is attached to the EC2 instance:
$ /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -c ssm:CWT-Web-Server -s
****** processing amazon-cloudwatch-agent ******
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source ssm:CWT-Web-Server --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
Region: us-east-1
credsConfig: map[]
Error in retrieving parameter store content: AccessDeniedException: User: arn:aws:sts::123456789012:assumed-role/cwagent/i-0744de7c842d2c2ba is not authorized to perform: ssm:GetParameter on resource: arn:aws:ssm:us-east-1:123456789012:parameter/CWT-Web-Server
status code: 400, request id: b85b0a7a-0fb1-47b4-924f-be8cf43a3b4d
Fail to fetch/remove json config: AccessDeniedException: User: arn:aws:sts::123456789012:assumed-role/cwagent/i-0744de7c842d2c2ba is not authorized to perform: ssm:GetParameter on resource: arn:aws:ssm:us-east-1:123456789012:parameter/CWT-Web-Server
status code: 400, request id: b85b0a7a-0fb1-47b4-924f-be8cf43a3b4d
Fail to fetch the config!
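In some cases, IAM user credentials might be configured for the command line on the instance. The following command returns the IAM user or role that the credentials in use on the instance resolve to: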
$ aws sts get-caller-identity
{
"UserId": "AROA123456789012ABCDE:i-0744de7c842d2c2ba",
"Account": "123456789012",
"Arn": "arn:aws:sts::123456789012:assumed-role/CloudWatchAgentServerRole/i-0744de7c842d2c2ba"
}
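To compare the role in this output with the role that you expect, it can help to pull the role name out of the `Arn` value. This is a small sketch under the assumption that the ARN has the `assumed-role/ROLE_NAME/SESSION_NAME` shape shown above. The function name `role_name_from_arn` is hypothetical.

```shell
# Hypothetical helper; expects an ARN shaped like
# arn:aws:sts::123456789012:assumed-role/ROLE_NAME/SESSION_NAME.

# Print the role name from an assumed-role ARN.
# Prints nothing and returns 1 for any other kind of ARN (for example, an IAM user).
role_name_from_arn() {
  case "$1" in
    arn:*:sts::*:assumed-role/*/*)
      rest=${1#*:assumed-role/}   # ROLE_NAME/SESSION_NAME
      echo "${rest%%/*}"          # ROLE_NAME
      ;;
    *)
      return 1 ;;
  esac
}
```

For example, `role_name_from_arn "$(aws sts get-caller-identity --query Arn --output text)"` prints the role name when the instance uses an instance role. A non-zero return suggests that the credentials belong to something else, such as an IAM user.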
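Confirm that the agent started correctly
The agent is designed to be started from the command line, with the configuration file passed as an argument. Use one of these valid start commands.
Linux commands: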
- `$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:configuration-file-path`
- `$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c ssm:configuration-parameter-store-name`
Windows commands:
- `& "C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1" -a fetch-config -m ec2 -s -c file:"C:\Program Files\Amazon\AmazonCloudWatchAgent\config.json"`
- `& "C:\Program Files\Amazon\AmazonCloudWatchAgent\amazon-cloudwatch-agent-ctl.ps1" -a fetch-config -m ec2 -s -c ssm:configuration-parameter-store-name`
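Important: Don't start the agent from the Windows Control Panel.
Confirm that the agent is running
To publish metrics and logs, the agent must be running. Run this command to confirm that the agent is active: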
$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status
{
"status": "running",
"starttime": "2021-08-30T02:13:44+00:00",
"configstatus": "configured",
"cwoc_status": "stopped",
"cwoc_starttime": "",
"cwoc_configstatus": "not configured",
"version": "1.247349.0b251399"
}
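If you want to check the status from a script rather than by eye, you can read the `"status"` field from this output. The sketch below assumes that each field appears on its own line, as in the output above. The function names `agent_status` and `agent_is_running` are hypothetical, and the match is text-based rather than a real JSON parse.

```shell
# Hypothetical helpers; assume one JSON field per line, as in the output above.

# Print the value of the "status" field from the status JSON on standard input.
# The match is anchored on the key at the start of the line, so
# "configstatus" and "cwoc_status" are ignored.
agent_status() {
  sed -n 's/^[[:space:]]*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*$/\1/p' |
    head -n 1
}

# Return 0 only when the agent reports that it is running.
agent_is_running() {
  [ "$(agent_status)" = "running" ]
}
```

One way to use it is `sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status | agent_is_running && echo "agent is running"`.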
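Restart the agent after you update the agent configuration
The agent doesn't automatically pick up changes to the configuration file. If you update the agent configuration to include new or different metrics and logs, restart the agent with these commands: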
$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a stop
****** processing cwagent-otel-collector ******
cwagent-otel-collector has already been stopped
****** processing amazon-cloudwatch-agent ******
Redirecting to /bin/systemctl stop amazon-cloudwatch-agent.service
$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -a fetch-config -m ec2 -s -c file:config.json
****** processing amazon-cloudwatch-agent ******
/opt/aws/amazon-cloudwatch-agent/bin/config-downloader --output-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --download-source file:config.json --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
Successfully fetched the config and saved in /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json.tmp
Start configuration validation...
/opt/aws/amazon-cloudwatch-agent/bin/config-translator --input /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.json --input-dir /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d --output /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml --mode ec2 --config /opt/aws/amazon-cloudwatch-agent/etc/common-config.toml --multi-config default
2021/08/31 02:45:37 Reading json config file path: /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.d/file_config.json.tmp ...
Valid Json input schema.
I! Detecting run_as_user...
Configuration validation first phase succeeded
/opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent -schematest -config /opt/aws/amazon-cloudwatch-agent/etc/amazon-cloudwatch-agent.toml
Configuration validation second phase succeeded
Configuration validation succeeded
amazon-cloudwatch-agent has already been stopped
Redirecting to /bin/systemctl restart amazon-cloudwatch-agent.service
$ sudo /opt/aws/amazon-cloudwatch-agent/bin/amazon-cloudwatch-agent-ctl -m ec2 -a status
{
"status": "running",
"starttime": "2021-08-31T02:45:37+0000",
"configstatus": "configured",
"cwoc_status": "stopped",
"cwoc_starttime": "",
"cwoc_configstatus": "not configured",
"version": "1.247349.0b251399"
}