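Amazon AWS Official Blog

Automating DX Maintenance Notifications Globally: A Cross-Account, Cross-Region Practice Built on Serverless

Preface

"Our network management system can only receive and process emails that carry an Excel attachment. AWS PHD (Personal Health Dashboard) only sends plain-text/HTML emails, which does not meet our requirements."

When ByteDance's operations team raised this requirement, I realized right away that this was about more than "sending an email": it involved cross-account and cross-Region event aggregation, structured data generation, deduplication logic, and audit archiving. This article shares how we used AWS Serverless services to build a fully automated Direct Connect maintenance notification system for the customer, one that collects, centrally processes, and delivers notifications for DX maintenance events across 3 accounts and 9+ Regions. The solution significantly improved operational efficiency and keeps maintenance event information accurate and timely.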

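Contents

  1. Business background and challenges
  2. Overall architecture design
  3. EventBridge cross-account, cross-Region configuration
  4. Lambda handler implementation
  5. DynamoDB deduplication and storage design
  6. SES email delivery and Excel attachment generation
  7. Testing and validation
  8. Operations notes and cost overview
  9. Summary and future extensions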

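Business background and challenges

Customer profile

ByteDance is a leading internet technology company founded in 2012. Its products include Douyin, TikTok, Toutiao, Xigua Video, Feishu, and Lark, and its business spans many countries and regions worldwide.

As ByteDance expanded globally and migrated workloads to the cloud, keeping network services continuous and auditable became a key requirement. To build a low-latency, highly reliable hybrid-cloud network, ByteDance deployed a large number of Direct Connect (DX) dedicated connections on AWS to support critical workloads such as content delivery, real-time communication, data analytics, and enterprise collaboration.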

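Resource distribution

Account | Role | Regions with DX resources
Account A (123456789001) | Primary account | 9 Regions (us-east-1, eu-west-1, ap-southeast-1, ap-northeast-1, …)
Account B (123456789002) | Sub-account | ap-southeast-1 only (Singapore)
Account C (123456789003) | Sub-account | ap-southeast-1 only (Singapore)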

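Network management system (NOC) requirements

  1. Each email must carry an Excel attachment in a specific format (ticket number, link identifier, maintenance window, affected region, and so on); the system parses it automatically and generates a ticket.
  2. The email subject, body format, and attachment fields and format must strictly follow the internal convention; otherwise parsing fails and manual intervention is required.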

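Current pain points

Pain point | Description
AWS PHD only sends plain text/HTML | It cannot generate Excel files directly and does not support attachments
Events are fragmented across accounts and Regions | Each account and each Region sends its maintenance notifications separately, so the network management system has to handle them one by one
Deduplication is hard | One maintenance plan for a connection can generate several types of notification that share the same eventArn; deduplicating on eventArn alone would wrongly filter out a later cancellation (CANCELLED) notification
Audit requirement | All notification records must be kept permanently for later audit and tracing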

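Goal

Build an automated DX maintenance notification system that works across accounts and Regions, collects events in one place, produces structured output, delivers reliably, and keeps a permanent audit trail.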

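Overall architecture design

AWS services used

Service | Deployed in | Purpose
EventBridge | Account A (us-east-1) | Custom bus that receives all events centrally
Lambda | Account A (us-east-1) | Event processing, Excel generation, email sending
DynamoDB | Account A (us-east-1) | Event deduplication and storage of processing records
SES | Account A (us-east-1) | Sending emails with attachments
CloudWatch | Account A (us-east-1) | Metrics and logs
EventBridge | Every account, every Region | Default bus that forwards events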

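The four-step flow: automated handling of DX maintenance events

The system uses EventBridge, Lambda, DynamoDB, and SES to automate the handling of DX maintenance events across accounts and Regions. The flow has four steps:

Step 1: Event routing (aggregating events across accounts and Regions)

All DX maintenance events (AWS Health events whose service is DIRECTCONNECT) are forwarded from the default bus of every account and Region to the custom EventBridge bus (BD-DX-Notification) in us-east-1 of the primary account.

  • Same account, same Region: in account A, the us-east-1 Region delivers events to the central bus directly through a rule on the BD-DX-Notification bus
  • Same account, cross-Region: the other 8 Regions of account A forward to the central bus through a rule on their default bus
  • Cross-account: accounts B and C are granted write access by the resource-based policy on the central bus in the primary account, and likewise send their events to it
  • Event filtering: only events with source: aws.health and detail.service: DIRECTCONNECT are captured, so that only the target events are processed

This step gives all events a single entry point and provides the data foundation for centralized processing. See the section "EventBridge cross-account, cross-Region configuration".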

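Step 2: Event processing (the Lambda function runs the business logic centrally)

Once an event reaches the central EventBridge bus, a Lambda function deployed in us-east-1 of account A processes it and performs these core operations:

  • Event validation: confirm the event source (aws.health) and the event type (only SCHEDULED/CANCELLED/EMERGENCY are processed)
  • Deduplication: use eventArn#eventTypeCode as a composite key and check DynamoDB for a previous run
  • Structuring: extract and convert the time fields (UTC → UTC+8), generate the ticket number, and assemble the Excel data structure
  • Attachment generation: use the openpyxl layer to produce an Excel attachment in the format the network management system expects
  • Email preparation: assemble the email body and the attachment path, ready for sending

This is the core processing layer; all business logic runs here. See the section "Lambda handler implementation".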
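
The time conversion and ticket numbering in the structuring step can be sketched as follows. This is a minimal illustration rather than the production code: it assumes Health timestamps arrive in the form "Sun, 10 Aug 2025 16:00:00 GMT", assumes the MMDD part of the ticket comes from the local (UTC+8) start time, and uses the hypothetical helper names to_local and make_ticket_number.

```python
import random
from datetime import datetime, timedelta

# Assumed input format for AWS Health timestamps delivered through EventBridge,
# for example "Sun, 10 Aug 2025 16:00:00 GMT".
HEALTH_TIME_FORMAT = "%a, %d %b %Y %H:%M:%S %Z"


def to_local(time_str: str, offset_hours: int = 8) -> datetime:
    """Parse a UTC timestamp string and shift it by a fixed hour offset."""
    return datetime.strptime(time_str, HEALTH_TIME_FORMAT) + timedelta(hours=offset_hours)


def make_ticket_number(start_local: datetime, rng: random.Random) -> str:
    """Build AWS-MMDD-<10 random digits> from the maintenance start time."""
    suffix = rng.randint(1_000_000_000, 9_999_999_999)
    return f"AWS-{start_local:%m%d}-{suffix}"


start = to_local("Sun, 10 Aug 2025 16:00:00 GMT")
print(start.strftime("%Y/%m/%d %H:%M:%S"))  # 16:00 UTC becomes 00:00 the next day in UTC+8
print(make_ticket_number(start, random.Random()))
```

Passing the random generator in explicitly is only a convenience for testing; the function in the appendix uses the module-level random functions.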

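Step 3: Persistence and monitoring (DynamoDB storage plus CloudWatch reporting)

Whether processing succeeds or not, the result is recorded in the DynamoDB table dx-maintenance-events, and key metrics and logs are reported to CloudWatch.

  • DynamoDB record fields: eventDedupKey (primary key), the original event ARN, ticket number, resource list, Region, account, email delivery status, and so on
  • DynamoDB TTL disabled: records are kept permanently to meet the audit requirement
  • CloudWatch metrics and logs: the Lambda function reports key metrics and logs to support monitoring and alarms

This step keeps operations traceable and status observable. See "DynamoDB deduplication and storage design" and "Operations notes and cost overview".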

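Step 4: Notification delivery (SES sends the email with attachment to the NOC)

Lambda calls SES to send the structured email (with the Excel attachment) to the recipient mailboxes designated by the ByteDance NOC.

  • The subject contains the ticket number; the body contains key information such as maintenance window, Region, resources, and event type
  • The Excel header keeps the GMT+8 notation (for historical compatibility), and the content fields match the parsing rules of the network management system
  • Sending: exponential backoff with at most 3 attempts, to improve the delivery rate
  • Automatic handling: once the email arrives, the NOC system parses and processes it automatically, closing the operations loop

This step is the final delivery of the automated flow. See the section "SES email delivery and Excel attachment generation".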

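Key architecture points

  1. Event aggregation layer: a centralized EventBridge bus
    • All Direct Connect (DX) maintenance events are delivered to the custom bus BD-DX-Notification in us-east-1 of the primary account, which aggregates events across accounts and Regions
  2. Business processing layer: a stateless Lambda engine
    • The Lambda function performs: event validation → deduplication → ticket number generation → Excel generation → SES email (with attachment) → DynamoDB record
  3. State persistence layer: audit records plus monitoring
    • DynamoDB: writes a complete record keyed on eventDedupKey (original ARN, ticket number, resource list, Region, account, email delivery status, and so on) with TTL disabled, so records are kept permanently
    • CloudWatch: receives metrics and structured logs that support real-time monitoring and alarms
  4. Notification delivery layer: reliable SES delivery plus a closed automatic-parsing loop
    • Reliable delivery (SES): Lambda calls SES to send the email with the Excel attachment, using exponential backoff and at most 3 attempts to improve the delivery success rate
    • Closed-loop automation: once the email is delivered, the ByteDance NOC system parses the Excel file, generates the ticket, and closes the operations loop with no manual steps along the chain

The following sections walk through the configuration and code of each component layer by layer so that readers can reproduce the full architecture. Deployable configuration and code examples are provided for the EventBridge policy, the Lambda logic, the DynamoDB table, and the SES sending mechanism.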

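EventBridge cross-account, cross-Region configuration

Event flow summary

Source | Path | Notes
us-east-1 (IAD) in account A | Custom bus in account A IAD | Same account, same Region
The other 8 Regions in account A | Default bus in each Region → IAD custom bus | Same account, cross-Region
SIN in account B | SIN default bus → custom bus in account A IAD | Cross-account, cross-Region
SIN in account C | SIN default bus → custom bus in account A IAD | Cross-account, cross-Region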

Create the custom bus in the primary account (A)

Parameter | Value
Region | us-east-1
Bus name | BD-DX-Notification

Create it from AWS Console → EventBridge → Buses → Create event bus.

Grant cross-account write access on the custom bus

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Sid": "AllowCrossAccountPutEvents123456789002",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789002:root" },
      "Action": "events:PutEvents",
      "Resource": "arn:aws:events:us-east-1:123456789001:event-bus/BD-DX-Notification"
    },
    {
      "Sid": "AllowCrossAccountPutEvents123456789003",
      "Effect": "Allow",
      "Principal": { "AWS": "arn:aws:iam::123456789003:root" },
      "Action": "events:PutEvents",
      "Resource": "arn:aws:events:us-east-1:123456789001:event-bus/BD-DX-Notification"
    }
  ]
}

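To extend the system later, add one more statement like the following 7 lines of JSON to the Statement array to bring a new account into the same setup.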

{
  "Sid": "AllowCrossAccountPutEvents12345678900x",
  "Effect": "Allow",
  "Principal": { "AWS": "arn:aws:iam::12345678900x:root" },
  "Action": "events:PutEvents",
  "Resource": "arn:aws:events:us-east-1:123456789001:event-bus/BD-DX-Notification"
}
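
When the list of accounts grows, the policy document can be generated instead of edited by hand. The sketch below is one possible way to do that; the function name build_bus_policy is hypothetical, and the account IDs are the placeholder IDs used in this article.

```python
import json


def build_bus_policy(bus_arn: str, account_ids: list[str]) -> dict:
    """Return a resource-based policy that lets each account call PutEvents on the bus."""
    statements = [
        {
            "Sid": f"AllowCrossAccountPutEvents{account_id}",
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
            "Action": "events:PutEvents",
            "Resource": bus_arn,
        }
        for account_id in account_ids
    ]
    return {"Version": "2012-10-17", "Statement": statements}


BUS_ARN = "arn:aws:events:us-east-1:123456789001:event-bus/BD-DX-Notification"
policy = build_bus_policy(BUS_ARN, ["123456789002", "123456789003"])
print(json.dumps(policy, indent=2))
```

The resulting JSON can be pasted into the console, or applied with the EventBridge put_permission API by passing json.dumps(policy) as its Policy parameter.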

Create the processing rule on the custom bus (targeting Lambda)

Event pattern

{
  "detail-type": ["AWS Health Event"],
  "source": ["aws.health"],
  "detail": {
    "service": ["DIRECTCONNECT"]
  }
}

The target is the ARN of the Lambda function (dx-maintenance-handler).
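
To reason about which events this pattern lets through, the same three conditions can be written as a plain Python predicate. This is only a local illustration of the pattern's meaning for single-value matches, not how EventBridge evaluates patterns internally; the function name and the sample events are made up for the example.

```python
def matches_dx_health_pattern(event: dict) -> bool:
    """Mirror the event pattern above: source, detail-type, and detail.service must all match."""
    return (
        event.get("source") == "aws.health"
        and event.get("detail-type") == "AWS Health Event"
        and event.get("detail", {}).get("service") == "DIRECTCONNECT"
    )


dx_event = {
    "source": "aws.health",
    "detail-type": "AWS Health Event",
    "detail": {"service": "DIRECTCONNECT"},
}
ec2_event = {
    "source": "aws.health",
    "detail-type": "AWS Health Event",
    "detail": {"service": "EC2"},
}

print(matches_dx_health_pattern(dx_event))
print(matches_dx_health_pattern(ec2_event))
```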

Create the forwarding rule on the default bus in the sub-accounts (B, C)

Event pattern (same as above)

{
  "detail-type": ["AWS Health Event"],
  "source": ["aws.health"],
  "detail": {
    "service": ["DIRECTCONNECT"]
  }
}

Target: the ARN of the bus in the primary account: arn:aws:events:us-east-1:123456789001:event-bus/BD-DX-Notification
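
The forwarding rule can also be created with boto3 instead of the console. The sketch below builds the request parameters for put_rule and put_targets separately from the API calls so they can be inspected first. The rule name and the IAM role ARN are placeholders, not values from the original setup; a target that is an event bus in another account or Region typically needs a role that allows events:PutEvents on that bus.

```python
import json

CENTRAL_BUS_ARN = "arn:aws:events:us-east-1:123456789001:event-bus/BD-DX-Notification"

DX_HEALTH_PATTERN = {
    "detail-type": ["AWS Health Event"],
    "source": ["aws.health"],
    "detail": {"service": ["DIRECTCONNECT"]},
}


def forwarding_rule_requests(rule_name: str, role_arn: str) -> tuple[dict, dict]:
    """Build the put_rule and put_targets parameters for the default bus."""
    put_rule = {
        "Name": rule_name,
        "EventBusName": "default",
        "EventPattern": json.dumps(DX_HEALTH_PATTERN),
        "State": "ENABLED",
    }
    put_targets = {
        "Rule": rule_name,
        "EventBusName": "default",
        "Targets": [{"Id": "central-bus", "Arn": CENTRAL_BUS_ARN, "RoleArn": role_arn}],
    }
    return put_rule, put_targets


def apply_forwarding_rule(region: str, rule_name: str, role_arn: str) -> None:
    """Create the rule and target in one Region (requires boto3 and AWS credentials)."""
    import boto3  # imported here so the builders above work without boto3 installed

    events = boto3.client("events", region_name=region)
    put_rule, put_targets = forwarding_rule_requests(rule_name, role_arn)
    events.put_rule(**put_rule)
    events.put_targets(**put_targets)


# Placeholder names, shown for illustration only.
rule_req, target_req = forwarding_rule_requests(
    "forward-dx-health", "arn:aws:iam::123456789002:role/example-forward-role"
)
print(json.dumps(target_req, indent=2))
```

Running apply_forwarding_rule once per Region (for example "ap-southeast-1" in accounts B and C) would reproduce the console steps described above.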

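Lambda handler implementation

Lambda layer: openpyxl

The AWS Lambda runtime does not ship with openpyxl, so we package it as a layer, upload it to Lambda, and attach it in the function configuration with Add layer → openpyxl.

Environment variables

Variable | Example | Description
SENDER_EMAIL | dx-notify@example.com | SES-verified sender; required
RECIPIENT_EMAILS | noc@example.com,ops@example.com | Comma-separated recipient list; required
DYNAMODB_TABLE_NAME | dx-maintenance-events | Name of the deduplication table; required
DEDUP_WINDOW_DAYS | 15 | Deduplication window in days; optional
TIMEZONE_OFFSET | 8 | Local time zone offset (UTC+8); optional
MAX_RETRY_ATTEMPTS | 3 | Number of SES send attempts; optional
MAINTENANCE_REASON | Scheduled maintenance | Value of the "Reason" field in the attachment; optional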

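Core code

Processing flow (12 stages)

  1. Validate the event source

Check that source is aws.health (the appendix code also accepts bd.aws.health); events from any other source are skipped.

  2. Validate the event type

Only the SCHEDULED, CANCELLED, and EMERGENCY types are processed; all other types are skipped.

  3. Deduplication check

Use eventArn#eventTypeCode as the composite key and check DynamoDB for a record that was processed within the DEDUP_WINDOW_DAYS window.

  • Key design point: events with the same eventArn but a different eventTypeCode (for example SCHEDULED → CANCELLED) are treated as new events, so cancellation notices are not missed.

  4. Extract event information

Extract from the event: startTime, endTime, resources (the affected DX connection IDs), region, and account (the Region and account that raised the event).

  5. Time conversion
  • Maintenance times are converted from UTC to the local time zone (UTC+8)
  • The email send time is also recorded in the local time zone
  • The Excel header keeps the GMT+8 notation (a historical compatibility requirement)
  6. Ticket number generation

Generate a ticket number in the format AWS-MMDD-<10 random digits> (for example AWS-0815-1234567890), where MMDD is the month and day of the maintenance start time. The random suffix makes collisions very unlikely, although it does not strictly guarantee uniqueness.

  7. Maintenance type
  • If the event type is CANCELLED, mark it "Cancel"
  • For all other event types, mark it "New"
  8. Prepare the Excel data structure

Assemble the fields for the Excel file, including:

  • Sequence number, email send time, maintenance start and end, ticket number, resource list, type, Region, account, event ARN, and so on
  • Field order and names must match the parsing rules of the network management system exactly
  9. Generate the Excel file

Use openpyxl to generate an Excel file that conforms to the parsing rules of the network management system:

  • The header uses the GMT+8 notation
  • Column widths are adjusted automatically so that content is not truncated
  • The file is stored as a temporary file and cleaned up after processing
  10. Send the email
  • Call SES send_raw_email
  • The subject contains the ticket number; the body contains the key fields (maintenance window, Region, resources, event type)
  • The attachment is the generated Excel file
  • Retry with exponential backoff: at most 3 attempts, waiting 2 s and then 4 s between attempts
  11. Record the result

Whether or not the email was sent, write to the DynamoDB table:

  • Primary key: eventDedupKey
  • Fields include: event ARN, dedup key, ticket number, resources, Region, account, email status, processing time, attempt count, version number, and so on
  • Key fields: emailSent, processingAttempts, recordVersion=3, retentionPolicy=permanent
  • TTL is disabled to meet the long-term audit requirement
  12. Return the result
  • Email sent successfully → return 200
  • Send failed but record written → return 206 (Partial Content)
  • The result contains: ticket number, dedup key, event type, email status, timestamp, and version number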
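
The return value described in stage 12 can be pictured with a small helper. This is a hedged sketch: the function name build_response and the exact key names in the body are assumptions chosen to match the fields listed above, not a copy of the appendix code.

```python
import json
from datetime import datetime, timezone


def build_response(email_sent: bool, ticket_number: str, dedup_key: str,
                   event_type_code: str, record_version: int = 3) -> dict:
    """Return 200 when the email was sent, 206 when only the record was written."""
    body = {
        "ticketNumber": ticket_number,
        "dedupKey": dedup_key,
        "eventTypeCode": event_type_code,
        "emailStatus": "sent" if email_sent else "failed",
        "timestamp": datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        "recordVersion": record_version,
    }
    return {"statusCode": 200 if email_sent else 206, "body": json.dumps(body)}


# Hypothetical ARN suffix, used for illustration only.
arn = "arn:aws:health:ap-southeast-1::event/DIRECTCONNECT/EXAMPLE"
code = "AWS_DIRECTCONNECT_MAINTENANCE_SCHEDULED"
ok = build_response(True, "AWS-0811-1234567890", f"{arn}#{code}", code)
partial = build_response(False, "AWS-0811-1234567890", f"{arn}#{code}", code)
print(ok["statusCode"], partial["statusCode"])
```

Because EventBridge invokes the function asynchronously, the status code is mainly useful in logs and in manual test invocations.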

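Exception handling and monitoring

  • Function-wide exception handling: prevents the Lambda from terminating on an unhandled exception and losing the event
  • Metrics: metrics about the outcome of each Lambda run are reported to CloudWatch
  • Logging: key steps are logged to CloudWatch to help with troubleshooting and audit tracing

Deployment recommendations

  • Package openpyxl as a Lambda layer and keep the function package under 50 MB.
  • Make sure the Lambda execution role has these permissions: dynamodb:PutItem, dynamodb:GetItem, ses:SendRawEmail, cloudwatch:PutMetricData
  • Set the Lambda timeout to at least 10 seconds and memory to at least 128 MB (raise memory to 512 MB for complex resource lists). The retry backoff alone can sleep for up to 6 seconds (2 s + 4 s), so allow extra headroom on the timeout if SES retries are expected.

The function has been tested with several kinds of events and supports production deployment. For a quick deployment, copy the complete code from Appendix 1 and set the corresponding environment variables.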

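DynamoDB deduplication and storage design

Key fields

Field | Type | Description
eventDedupKey | String (PK) | Composite key eventArn#eventTypeCode
eventArn | String | Original PHD event ARN
eventTypeCode | String | Event type (SCHEDULED / CANCELLED / EMERGENCY)
emailStatus | String | sent/failed
resources | List | Affected DX resources
region, account | String | Region and account of the event

TTL is not enabled: the audit requirement calls for all records to be kept permanently, so leave TTL off when creating the table.

Console steps

  1. Open DynamoDB → Create table
  2. Table name: dx-maintenance-events
  3. Partition key: eventDedupKey (String)
  4. Settings → Customize settings → On-demand (pay per request)
  5. TTL: leave off (the default)
  6. Create → wait for the status to become Active

If you already have an old table (with eventArn as the PK), delete it and create a new one, because DynamoDB does not support changing the partition key. Export any records you still need for audit before deleting.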
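
The console steps can also be scripted. The following sketch shows an equivalent table definition for the boto3 create_table call; the helper names are hypothetical, and TTL stays off simply because the code never enables it.

```python
def table_definition(table_name: str = "dx-maintenance-events") -> dict:
    """Parameters for create_table: one String partition key, on-demand billing."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "eventDedupKey", "AttributeType": "S"}
        ],
        "KeySchema": [
            {"AttributeName": "eventDedupKey", "KeyType": "HASH"}
        ],
        "BillingMode": "PAY_PER_REQUEST",
    }


def create_dedup_table(region: str = "us-east-1") -> None:
    """Create the table and wait until it is active (requires boto3 and AWS credentials)."""
    import boto3  # imported here so table_definition() works without boto3 installed

    client = boto3.client("dynamodb", region_name=region)
    definition = table_definition()
    client.create_table(**definition)
    client.get_waiter("table_exists").wait(TableName=definition["TableName"])


print(table_definition())
```

Only the key attribute is declared up front; the other fields in the table above are written per item and need no schema.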

SES email delivery and Excel attachment generation

SES prerequisites

Step | Action
Verify the sender | In the SES console → Email Addresses → Verify a New Email Address (or verify a domain)
Leave the sandbox | Submit a request for production access and higher quotas, which lifts the restrictions on recipients
IAM permission | The Lambda execution role needs the ses:SendRawEmail permission

Excel attachment fields (GMT+8 header retained)

Column | Example
Sequence Number | 1
Email SentTime (GMT+8) | 2025/08/01 22:30:00
Maintenance StartTime (GMT+8) | 2025/08/11 00:00:00
Maintenance EndTime (GMT+8) | 2025/08/11 04:00:00
Maintenance Ticket Number | AWS-0811-1234567890
Impacted Links | dxcon-ffxusnwy, dxlag-fgyvf31s
Type | New/Cancel
Vendor | AWS
Region | ap-southeast-1
Account | 123456789002
Event ARN | arn:aws:health:ap-southeast-1::event/DIRECTCONNECT/...

Retry mechanism

To improve the delivery rate, the system calls SES with an exponential backoff retry strategy, making at most MAX_RETRY_ATTEMPTS attempts (default 3).
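
The retry strategy can be isolated into a small generic helper, which also makes the wait times easy to verify. This is a simplified sketch under stated assumptions: the name send_with_backoff is hypothetical, the sleep function is injectable so the example runs instantly, and unlike the appendix code it retries on every exception rather than skipping non-retryable SES errors.

```python
import time
from typing import Callable


def send_with_backoff(send: Callable[[], None],
                      max_attempts: int = 3,
                      sleep: Callable[[float], None] = time.sleep) -> bool:
    """Call send() up to max_attempts times, waiting 2**attempt seconds between failures."""
    for attempt in range(1, max_attempts + 1):
        try:
            send()
            return True
        except Exception as exc:
            print(f"attempt {attempt}/{max_attempts} failed: {exc!r}")
            if attempt < max_attempts:
                sleep(2 ** attempt)  # 2 s after the first failure, 4 s after the second
    return False


# Demo: a sender that fails twice, then succeeds. Waits are recorded instead of slept.
state = {"calls": 0}
recorded_waits: list[float] = []


def flaky_send() -> None:
    state["calls"] += 1
    if state["calls"] < 3:
        raise RuntimeError("temporary failure")


result = send_with_backoff(flaky_send, sleep=recorded_waits.append)
print(result, recorded_waits)
```

With three attempts there are only two waits, so the worst case adds 6 seconds of sleep to the Lambda run time.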

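Testing and validation

We tested with the following four kinds of PHD events (scheduled, emergency, cancelled, completed). The table lists the expected behavior for each case:

# | Event type | Event code | Expected Lambda action | Notes
1 | Scheduled maintenance | AWS_DIRECTCONNECT_MAINTENANCE_SCHEDULED | processed (email: New) | Normal scheduled maintenance
2 | Duplicate scheduled maintenance | Same as above, same ARN | skipped (reason: duplicate) | Deduplication check
3 | Emergency maintenance | AWS_DIRECTCONNECT_EMERGENCY_MAINTENANCE_SCHEDULED | processed (email: New) | Emergency maintenance
4 | Cancelled maintenance | AWS_DIRECTCONNECT_MAINTENANCE_CANCELLED | processed (email: Cancel) | Cancellation notice (even with the same ARN)
5 | Maintenance complete | AWS_DIRECTCONNECT_MAINTENANCE_COMPLETE | skipped (reason: unmonitored) | Not in the monitoring list; skipped by default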
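
The five cases in the table can be rehearsed locally before sending real test events. The sketch below is a simplified model of the expected decisions, not the Lambda itself: sample_event and expected_action are hypothetical helpers, the in-memory set stands in for the DynamoDB lookup, and the event ARN suffix is a made-up placeholder.

```python
MONITORED_TYPES = {
    "AWS_DIRECTCONNECT_EMERGENCY_MAINTENANCE_SCHEDULED",
    "AWS_DIRECTCONNECT_MAINTENANCE_SCHEDULED",
    "AWS_DIRECTCONNECT_MAINTENANCE_CANCELLED",
}

# Placeholder ARN for illustration only.
EXAMPLE_ARN = "arn:aws:health:ap-southeast-1::event/DIRECTCONNECT/EXAMPLE"


def sample_event(event_type_code: str, event_arn: str = EXAMPLE_ARN) -> dict:
    """Build a minimal AWS Health style event with the fields the handler reads."""
    return {
        "source": "aws.health",
        "detail-type": "AWS Health Event",
        "account": "123456789002",
        "region": "ap-southeast-1",
        "resources": ["dxcon-ffxusnwy"],
        "detail": {
            "service": "DIRECTCONNECT",
            "eventTypeCode": event_type_code,
            "eventArn": event_arn,
            "startTime": "Sun, 10 Aug 2025 16:00:00 GMT",
            "endTime": "Sun, 10 Aug 2025 20:00:00 GMT",
        },
    }


def expected_action(event: dict, seen_keys: set) -> str:
    """Model the type filter, the composite-key dedup, and the New/Cancel decision."""
    code = event["detail"]["eventTypeCode"]
    if code not in MONITORED_TYPES:
        return "skipped (reason: unmonitored)"
    key = f"{event['detail']['eventArn']}#{code}"
    if key in seen_keys:
        return "skipped (reason: duplicate)"
    seen_keys.add(key)
    return "processed (email: Cancel)" if code.endswith("_CANCELLED") else "processed (email: New)"


cases = [
    "AWS_DIRECTCONNECT_MAINTENANCE_SCHEDULED",
    "AWS_DIRECTCONNECT_MAINTENANCE_SCHEDULED",
    "AWS_DIRECTCONNECT_EMERGENCY_MAINTENANCE_SCHEDULED",
    "AWS_DIRECTCONNECT_MAINTENANCE_CANCELLED",
    "AWS_DIRECTCONNECT_MAINTENANCE_COMPLETE",
]
seen: set = set()
results = [expected_action(sample_event(code), seen) for code in cases]
for number, outcome in enumerate(results, start=1):
    print(number, outcome)
```

All five events share one ARN here, which is exactly the situation where deduplicating on eventArn alone would have dropped the cancellation in case 4.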

An automated email notification triggered by a DX maintenance event looks like this:

[Image: sample notification email]

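Operations notes and cost overview

Component | Billing basis | Typical monthly cost (low volume)
EventBridge | Number of events (~100 per month) | <$1
Lambda | Invocations and execution time (~100 per month) | <$1
DynamoDB | On-demand writes/reads + storage (about 10 KB per record) | <$1
SES | Emails sent (~10 KB each) | <$1
CloudWatch | Logs and metrics (low volume) | ~$2
Total | | ≈ $5 / month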

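Common issues

Symptom | Likely cause | Fix
Email not delivered | SES is still in the sandbox, or the sender is not verified | Complete domain/email verification and request production access
Duplicate notifications are still sent | The DynamoDB table still uses the old eventArn as PK | Confirm that you have migrated to the new table keyed on eventDedupKey, or recreate the table
Cross-account events do not arrive | An account ID is missing from the bus policy | Check that Resource and Principal match
Lambda timeout | A large number of resources slows down Excel generation | Increase Lambda memory (256 → 512 MB) or split the resource list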

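Summary and future extensions

Value of the solution

Dimension | Benefit
Automation | No manual steps across the whole DX maintenance flow for 3 accounts and 9+ Regions
Unified management | All events are centralized in us-east-1 of the primary account, which simplifies auditing and monitoring
Reliability | Composite-key deduplication + SES retries + CloudWatch monitoring
Cost | Serverless pay-as-you-go; about 5 USD per month
Scalability | A new account joins the global setup with just 7 lines of policy plus one rule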

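Future extensions

  1. Support more AWS services (RDS, VPN, Transit Gateway, and so on): add the corresponding event types to EVENT_TYPE_LIST and the corresponding service to the event pattern
  2. Multi-channel notification: add Slack, Feishu/Lark, or an SNS topic in the Lambda
  3. Dashboards: use a CloudWatch Dashboard or QuickSight to show key metrics such as monthly maintenance count, success rate, and cancellation rate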

Appendix

  1. Lambda code
import boto3
import json
import hashlib
import time
import os
import tempfile
import random
from datetime import timedelta, datetime, timezone
from typing import TypedDict, NotRequired, Optional
from botocore.exceptions import ClientError
from openpyxl import Workbook
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText
from email.mime.application import MIMEApplication

# ========================================
# Type definitions
# ========================================

class TimeInfo(TypedDict):
    iso8601: str
    unix_timestamp: int
    human_readable: str
    date_only: str
    year_month: str

class CheckResult(TypedDict):
    processed: bool
    details: NotRequired[dict]
    createdAt: NotRequired[str]
    ageInDays: NotRequired[float]

class EventDetails(TypedDict):
    eventTypeCode: str
    startTime: str
    endTime: str
    region: str
    account: str
    resources: list[str]
    ticketNumber: str
    emailStatus: str
    processingAttempts: int

# ========================================
# Configuration: read from environment variables
# ========================================

EVENT_TYPE_LIST = [
    'AWS_DIRECTCONNECT_EMERGENCY_MAINTENANCE_SCHEDULED',
    'AWS_DIRECTCONNECT_MAINTENANCE_SCHEDULED',
    'AWS_DIRECTCONNECT_MAINTENANCE_CANCELLED'
]

SUPPORTED_EVENT_SOURCES = ['aws.health', 'bd.aws.health']

# Required environment variables
SENDER_EMAIL = os.environ.get('SENDER_EMAIL')
RECIPIENT_EMAILS_STR = os.environ.get('RECIPIENT_EMAILS')
DYNAMODB_TABLE_NAME = os.environ.get('DYNAMODB_TABLE_NAME')

# Optional environment variables (with defaults)
DEDUP_WINDOW_DAYS = int(os.environ.get('DEDUP_WINDOW_DAYS', '15'))
DATE_FORMAT = os.environ.get('DATE_FORMAT', '%Y/%m/%d %H:%M:%S')
TIMEZONE_OFFSET = int(os.environ.get('TIMEZONE_OFFSET', '8'))  # UTC+8
MAINTENANCE_REASON = os.environ.get('MAINTENANCE_REASON', 'OS upgrade')
MAX_RETRY_ATTEMPTS = int(os.environ.get('MAX_RETRY_ATTEMPTS', '3'))

# Validate the required environment variables
if not SENDER_EMAIL:
    raise ValueError("SENDER_EMAIL environment variable is required")
if not RECIPIENT_EMAILS_STR:
    raise ValueError("RECIPIENT_EMAILS environment variable is required")
if not DYNAMODB_TABLE_NAME:
    raise ValueError("DYNAMODB_TABLE_NAME environment variable is required")

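# Skip empty entries so a trailing comma does not produce a blank recipient
RECIPIENT_EMAILS = [email.strip() for email in RECIPIENT_EMAILS_STR.split(',') if email.strip()]

# Initialize AWS clients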
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table(DYNAMODB_TABLE_NAME)
ses_client = boto3.client('ses')
cloudwatch = boto3.client('cloudwatch')

# ========================================
# Utility functions
# ========================================

def get_current_time_info() -> TimeInfo:
    now_utc = datetime.now(timezone.utc)
    return TimeInfo(
        iso8601=now_utc.strftime('%Y-%m-%dT%H:%M:%SZ'),
        unix_timestamp=int(now_utc.timestamp()),
        human_readable=now_utc.strftime('%Y-%m-%d %H:%M:%S UTC'),
        date_only=now_utc.strftime('%Y-%m-%d'),
        year_month=now_utc.strftime('%Y-%m')
    )

def get_current_time_local() -> str:
    now_utc = datetime.now(timezone.utc)
    now_local = now_utc + timedelta(hours=TIMEZONE_OFFSET)
    return now_local.strftime(DATE_FORMAT)

def get_dedup_window_timestamp() -> int:
    window_start = datetime.now(timezone.utc) - timedelta(days=DEDUP_WINDOW_DAYS)
    return int(window_start.timestamp())

def generate_dedup_key(event_arn: str, event_type_code: str) -> str:
    return f"{event_arn}#{event_type_code}"

def generate_ticket_number(start_time):
    month = "{:02d}".format(start_time.month)
    day = "{:02d}".format(start_time.day)
    month_day = "{}{}".format(month, day)
    random_number = random.randint(1000000000, 9999999999)
    # Assemble the ticket number
    ticket = "AWS-{}-{}".format(month_day, random_number)
    
    print("[TICKET] Generated ticket number: {}".format(ticket))
    print("[TICKET] Maintenance start time: {}".format(start_time.strftime('%Y-%m-%d %H:%M:%S')))
    print("[TICKET] Breakdown: MonthDay={}, RandomNumber={}".format(month_day, random_number))
    
    return ticket

def convert_time_with_timezone(time_str: str, 
                               input_format: str = "%a, %d %b %Y %H:%M:%S %Z",
                               output_format: str = None,
                               add_hours: int = 0) -> datetime:
    try:
        if output_format is None:
            output_format = DATE_FORMAT
            
        # Parse the timestamp
        time_obj = datetime.strptime(time_str, input_format)
        
        # Apply the timezone offset
        if add_hours != 0:
            time_obj = time_obj + timedelta(hours=add_hours)
        
        return time_obj
        
    except Exception as e:
        print(f"[ERROR] Time conversion failed: {e!r}")
        print(f"[ERROR] Input: {time_str}, Format: {input_format}")
        raise

def format_datetime(dt: datetime, format_str: str = None) -> str:
    """Format a datetime object."""
    if format_str is None:
        format_str = DATE_FORMAT
    return dt.strftime(format_str)

# ========================================
# DynamoDB operations
# ========================================

def check_event_processed_in_window(event_arn: str, event_type_code: str) -> CheckResult:
    try:
        dedup_key = generate_dedup_key(event_arn, event_type_code)
        
        print(f"[CHECK] Checking event dedup key: {dedup_key}")
        print(f"[CHECK] Event ARN: {event_arn}")
        print(f"[CHECK] Event Type: {event_type_code}")
        print(f"[CHECK] Dedup window: {DEDUP_WINDOW_DAYS} days")
        
        window_start_timestamp = get_dedup_window_timestamp()
        current_timestamp = int(time.time())
        
        window_start_date = datetime.fromtimestamp(window_start_timestamp, tz=timezone.utc)
        print(f"[CHECK] Window start: {window_start_date.strftime('%Y-%m-%d %H:%M:%S UTC')}")
        
        # Look up the record by dedup_key
        response = table.get_item(
            Key={'eventDedupKey': dedup_key},
            ConsistentRead=True
        )
        
        item = response.get('Item')
        
        if not item:
            print(f"[NEW] Event not found in DynamoDB (dedup_key: {dedup_key})")
            return CheckResult(processed=False)
        
        created_timestamp = int(item.get('createdAtTimestamp', 0))
        
        if created_timestamp >= window_start_timestamp:
            created_at = item.get('createdAt', 'Unknown')
            age_seconds = current_timestamp - created_timestamp
            age_days = age_seconds / (24 * 60 * 60)
            
            print(f"[DUPLICATE] Event found within {DEDUP_WINDOW_DAYS}-day window")
            print(f"[DUPLICATE] Original created: {created_at}")
            print(f"[DUPLICATE] Age: {age_days:.2f} days")
            print(f"[DUPLICATE] Ticket: {item.get('ticketNumber', 'N/A')}")
            print(f"[DUPLICATE] Email status: {item.get('emailStatus', 'unknown')}")
            
            return CheckResult(
                processed=True,
                details=item,
                createdAt=created_at,
                ageInDays=age_days
            )
        else:
            age_days = (current_timestamp - created_timestamp) / (24 * 60 * 60)
            print(f"[INFO] Event found but outside window (age: {age_days:.2f} days)")
            return CheckResult(processed=False)
            
    except ClientError as e:
        error_code = e.response['Error']['Code']
        if error_code == 'ResourceNotFoundException':
            print(f"[NEW] Event not found (table might not exist)")
            return CheckResult(processed=False)
        print(f"[ERROR] DynamoDB error: {error_code}")
        raise
        
    except Exception as e:
        print(f"[ERROR] Unexpected error in check: {e!r}")
        raise

def record_event_processed(event_arn: str,
                          event_type_code: str,
                          event_details: EventDetails, 
                          event_source: str,
                          email_sent: bool) -> bool:
    try:
        time_info = get_current_time_info()
        dedup_key = generate_dedup_key(event_arn, event_type_code)
        
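        # Get the current attempt count (if a record already exists)
        current_attempts = 1
        try:
            existing_item = table.get_item(Key={'eventDedupKey': dedup_key})
            if 'Item' in existing_item:
                current_attempts = int(existing_item['Item'].get('processingAttempts', 0)) + 1
        except Exception as e:
            # Best effort: keep the default of 1 if the lookup fails
            print(f"[WARNING] Could not read previous attempts: {e!r}")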
        
        item = {
            # Primary key: the composite dedup key
            'eventDedupKey': dedup_key,
            
            # Original identifiers (for queries and auditing)
            'eventArn': event_arn,
            'eventTypeCode': event_details.get('eventTypeCode', ''),
            
            # Timestamps
            'createdAt': time_info['iso8601'],
            'createdAtTimestamp': time_info['unix_timestamp'],
            'createdDate': time_info['date_only'],
            'createdYearMonth': time_info['year_month'],
            'recordCreatedTime': time_info['human_readable'],
            
            # Event details
            'maintenanceStartTime': event_details.get('startTime', ''),
            'maintenanceEndTime': event_details.get('endTime', ''),
            'region': event_details.get('region', ''),
            'account': event_details.get('account', ''),
            'resources': event_details.get('resources', []),
            'ticketNumber': event_details.get('ticketNumber', ''),
            'eventSource': event_source,
            
            # Processing status
            'emailStatus': 'sent' if email_sent else 'failed',
            'emailSent': email_sent,
            'processingAttempts': current_attempts,
            'lastProcessedAt': time_info['iso8601'],
            
            # Metadata
            'recordVersion': 3,  # Version bumped to reflect the dedup logic change
            'processedBy': 'lambda-dx-maintenance-handler-v3',
            'dedupWindowDays': DEDUP_WINDOW_DAYS,
            'retentionPolicy': 'permanent',
            'auditEnabled': True
        }
        
        print(f"[RECORD] Recording to DynamoDB...")
        print(f"[RECORD] Dedup Key: {dedup_key}")
        print(f"[RECORD] Event ARN: {event_arn}")
        print(f"[RECORD] Event Type: {event_type_code}")
        print(f"[RECORD] Event Source: {event_source}")
        print(f"[RECORD] Ticket: {event_details.get('ticketNumber', 'N/A')}")
        print(f"[RECORD] Email Status: {'sent' if email_sent else 'failed'}")
        print(f"[RECORD] Attempt: {current_attempts}")
        print(f"[RECORD] Timestamp: {time_info['iso8601']}")
        
        table.put_item(Item=item)
        
        print(f"[SUCCESS] Event recorded successfully")
        
        # Emit CloudWatch metrics
        send_cloudwatch_metric('EventRecorded', 1)
        if not email_sent:
            send_cloudwatch_metric('EmailFailed', 1)
        
        return True
        
    except Exception as e:
        print(f"[ERROR] Failed to record event: {e!r}")
        send_cloudwatch_metric('RecordFailed', 1)
        raise

# ========================================
# Excel generation
# ========================================

def create_excel_file(event_data: dict) -> str:
    excel_path = None
    
    try:
        print("[EXCEL] Creating workbook...")
        
        wb = Workbook()
        ws = wb.active
        ws.title = "Maintenance Notice"
        
        # Header row (note: uses the GMT+8 notation, a legacy requirement)
        headers = [
            "Sequence Number",
            f"Email SentTime (GMT+{TIMEZONE_OFFSET})",
            f"Maintenance StartTime (GMT+{TIMEZONE_OFFSET})",
            f"Maintenance EndTime (GMT+{TIMEZONE_OFFSET})",
            "Maintenance Ticket Number",
            "Impacted Links",
            "Maintenance Impact",
            "Urgency",
            "Type",
            "Vendor",
            "Fault Report",
            "Reason",
            "Region",
            "Account",
            "Event ARN"
        ]
        ws.append(headers)
        
        # Data row
        data_row = [
            event_data.get('sequence_number', 1),
            event_data.get('email_timestamp_local'),
            event_data.get('start_time_local'),
            event_data.get('end_time_local'),
            event_data.get('ticket_number'),
            event_data.get('affected_resources'),
            event_data.get('maintenance_impact', 'Yes'),
            event_data.get('urgency', 'Yes'),
            event_data.get('maint_type'),
            event_data.get('vendor', 'AWS'),
            event_data.get('any_wording', 'No'),
            event_data.get('maintenance_reason', MAINTENANCE_REASON),
            event_data.get('region'),
            event_data.get('account'),
            event_data.get('event_arn', '')
        ]
        ws.append(data_row)
        
        # Adjust each column width to its longest cell, capped at 50 characters
        for column in ws.columns:
            column_letter = column[0].column_letter
            max_length = max(
                (len(str(cell.value)) for cell in column if cell.value is not None),
                default=0
            )
            ws.column_dimensions[column_letter].width = min(max_length + 2, 50)
        
        # Use a temporary file (delete=False so the path outlives the with block)
        with tempfile.NamedTemporaryFile(
            mode='w+b',
            suffix='.xlsx',
            prefix='AWS_DX_Maintenance_',
            delete=False
        ) as tmp_file:
            excel_path = tmp_file.name
        
        wb.save(excel_path)
        
        file_size = os.path.getsize(excel_path)
        print(f"[SUCCESS] Excel created: {excel_path}")
        print(f"[SUCCESS] File size: {file_size} bytes")
        
        return excel_path
        
    except Exception as e:
        print(f"[ERROR] Failed to create Excel: {e!r}")
        # Clean up the partially written file
        if excel_path and os.path.exists(excel_path):
            try:
                os.remove(excel_path)
            except OSError:
                pass
        raise

# ========================================
# Email sending
# ========================================

def create_email_with_attachment(subject: str, 
                                 body: str, 
                                 attachment_path: str,
                                 attachment_filename: str = None) -> str:
    try:
        msg = MIMEMultipart()
        msg['Subject'] = subject
        msg['From'] = SENDER_EMAIL
        msg['To'] = ', '.join(RECIPIENT_EMAILS)
        
        # Attach the body
        msg.attach(MIMEText(body, 'plain', 'utf-8'))
        
        # Attach the file
        if attachment_filename is None:
            filename = os.path.basename(attachment_path)
        else:
            filename = attachment_filename
        with open(attachment_path, 'rb') as f:
            attachment = MIMEApplication(f.read())
            attachment.add_header(
                'Content-Disposition',
                'attachment',
                filename=filename
            )
            msg.attach(attachment)
        
        print(f"[EMAIL] Message created successfully")
        print(f"[EMAIL] Attachment: {filename}")
        
        return msg.as_string()
        
    except Exception as e:
        print(f"[ERROR] Failed to create email: {e!r}")
        raise

def send_email_via_ses(subject: str, 
                       body: str, 
                       attachment_path: str,
                       attachment_filename: str = None,
                       max_retries: int = 3) -> bool:

    for attempt in range(1, max_retries + 1):
        try:
            print(f"[EMAIL] Sending attempt {attempt}/{max_retries}")
            print(f"[EMAIL] From: {SENDER_EMAIL}")
            print(f"[EMAIL] To: {RECIPIENT_EMAILS}")
            
            raw_message = create_email_with_attachment(subject, body, attachment_path, attachment_filename)
            
            response = ses_client.send_raw_email(
                Source=SENDER_EMAIL,
                Destinations=RECIPIENT_EMAILS,
                RawMessage={'Data': raw_message}
            )
            
            message_id = response['MessageId']
            print(f"[SUCCESS] Email sent! MessageId: {message_id}")
            
            # Emit the success metric
            send_cloudwatch_metric('EmailSent', 1)
            
            return True
            
        except ClientError as e:
            error_code = e.response['Error']['Code']
            error_msg = e.response['Error']['Message']
            print(f"[ERROR] SES ClientError (attempt {attempt}): {error_code} - {error_msg}")
            
            # Some errors are not worth retrying
            non_retryable_errors = [
                'MessageRejected',
                'MailFromDomainNotVerified',
                'ConfigurationSetDoesNotExist'
            ]
            
            if error_code in non_retryable_errors:
                print(f"[ERROR] Non-retryable error, stopping attempts")
                send_cloudwatch_metric('EmailFailedNonRetryable', 1)
                return False
            
            if attempt < max_retries:
                wait_time = 2 ** attempt  # Exponential backoff: 2 s, then 4 s
                print(f"[RETRY] Waiting {wait_time}s before retry...")
                time.sleep(wait_time)
            else:
                send_cloudwatch_metric('EmailFailedMaxRetries', 1)
                
        except Exception as e:
            print(f"[ERROR] Unexpected error (attempt {attempt}): {e!r}")
            if attempt < max_retries:
                time.sleep(2 ** attempt)
            else:
                send_cloudwatch_metric('EmailFailedUnexpected', 1)
    
    print(f"[FAILED] All {max_retries} email attempts failed")
    return False

# ========================================
# CloudWatch metrics
# ========================================

def send_cloudwatch_metric(metric_name: str, 
                           value: float = 1.0,
                           unit: str = 'Count',
                           dimensions: Optional[dict] = None):
    try:
        metric_data = {
            'MetricName': metric_name,
            'Value': value,
            'Unit': unit,
            'Timestamp': datetime.now(timezone.utc)
        }
        
        if dimensions:
            metric_data['Dimensions'] = [
                {'Name': k, 'Value': v} for k, v in dimensions.items()
            ]
        
        cloudwatch.put_metric_data(
            Namespace='DXMaintenance',
            MetricData=[metric_data]
        )
        
        print(f"[METRIC] Sent: {metric_name} = {value}")
        
    except Exception as e:
        # A failed metric must not break the main flow
        print(f"[WARNING] Failed to send metric {metric_name}: {e!r}")

# ========================================
# Main handler
# ========================================

def lambda_handler(event, context):
    print("=" * 80)
    print("[START] Lambda function invoked")
    print(f"[CONFIG] Dedup window: {DEDUP_WINDOW_DAYS} days")
    print(f"[CONFIG] Timezone offset: UTC+{TIMEZONE_OFFSET}")
    print(f"[CONFIG] Max retries: {MAX_RETRY_ATTEMPTS}")
    print(f"[EVENT] Raw event: {json.dumps(event, default=str, indent=2)}")
    print("=" * 80)
    
    excel_path = None
    
    try:
        # ========================================
        # 1. Validate the event source
        # ========================================
        event_source = event.get('source', '')
        print(f"[VALIDATE] Event source: {event_source}")
        
        if event_source not in SUPPORTED_EVENT_SOURCES:
            print(f"[SKIP] Unsupported event source")
            send_cloudwatch_metric('EventSkipped', 1, dimensions={'Reason': 'UnsupportedSource'})
            
            return {
                'statusCode': 200,
                'body': json.dumps({
                    'message': 'Event source not supported',
                    'eventSource': event_source,
                    'supportedSources': SUPPORTED_EVENT_SOURCES
                })
            }
        
        # ========================================
        # 2. Validate the event type
        # ========================================
        event_code = event['detail']['eventTypeCode']
        print(f"[VALIDATE] Event type: {event_code}")
        
        if event_code not in EVENT_TYPE_LIST:
            print(f"[SKIP] Event type not in monitoring list")
            send_cloudwatch_metric('EventSkipped', 1, dimensions={'Reason': 'UnmonitoredType'})
            
            return {
                'statusCode': 200,
                'body': json.dumps({
                    'message': 'Event type not monitored',
                    'eventTypeCode': event_code,
                    'monitoredTypes': EVENT_TYPE_LIST
                })
            }
        
        # ========================================
        # 3. Deduplication check (based on event_arn + event_type_code)
        # ========================================
        event_arn = event['detail']['eventArn']
        print(f"[VALIDATE] Event ARN: {event_arn}")
        print(f"[VALIDATE] Event Type Code: {event_code}")
        print(f"[VALIDATE] Dedup Key: {generate_dedup_key(event_arn, event_code)}")
        
        check_result = check_event_processed_in_window(event_arn, event_code)
        
        if check_result['processed']:
            print(f"[SKIP] Duplicate event within {DEDUP_WINDOW_DAYS}-day window")
            print(f"[SKIP] Same event_arn AND event_type_code already processed")
            send_cloudwatch_metric('EventSkipped', 1, dimensions={'Reason': 'Duplicate'})
            
            return {
                'statusCode': 200,
                'body': json.dumps({
                    'message': f'Duplicate event (within {DEDUP_WINDOW_DAYS}-day window)',
                    'eventArn': event_arn,
                    'eventTypeCode': event_code,
                    'dedupKey': generate_dedup_key(event_arn, event_code),
                    'eventSource': event_source,
                    'action': 'skipped',
                    'dedupWindowDays': DEDUP_WINDOW_DAYS,
                    'originalProcessingTime': check_result.get('createdAt'),
                    'recordAgeInDays': round(check_result.get('ageInDays', 0), 2),
                    'previousTicketNumber': check_result['details'].get('ticketNumber') if check_result.get('details') else None,
                    'previousEmailStatus': check_result['details'].get('emailStatus') if check_result.get('details') else None
                }, default=str)
            }
        
        print("[PROCESS] New event (or new event type for same ARN), proceeding with processing...")
        send_cloudwatch_metric('EventProcessing', 1)
        
        # ========================================
        # 4. Extract event information
        # ========================================
        start_time_raw = event['detail']['startTime']
        end_time_raw = event['detail']['endTime']
        ls_resources = event.get('resources', [])
        affected_resources = ', '.join(ls_resources) if ls_resources else 'N/A'
        region = event.get('region', 'unknown')
        account = event.get('account', 'unknown')
        
        print(f"[INFO] Region: {region}")
        print(f"[INFO] Account: {account}")
        print(f"[INFO] Resources: {affected_resources}")
        
        # ========================================
        # 5. Time handling
        # ========================================
        print("[PROCESS] Converting timestamps...")
        
        # Convert the maintenance window (UTC -> UTC+TIMEZONE_OFFSET)
        start_time_utc = convert_time_with_timezone(start_time_raw)
        end_time_utc = convert_time_with_timezone(end_time_raw)
        
        start_time_local = start_time_utc + timedelta(hours=TIMEZONE_OFFSET)
        end_time_local = end_time_utc + timedelta(hours=TIMEZONE_OFFSET)
        
        # Current time (used as the email send time), expressed in UTC+TIMEZONE_OFFSET
        email_timestamp_utc = datetime.now(timezone.utc)
        email_timestamp_local = email_timestamp_utc + timedelta(hours=TIMEZONE_OFFSET)
        
        print(f"[TIME] Maintenance start (UTC): {format_datetime(start_time_utc)}")
        print(f"[TIME] Maintenance start (UTC+{TIMEZONE_OFFSET}): {format_datetime(start_time_local)}")
        print(f"[TIME] Email timestamp (UTC+{TIMEZONE_OFFSET}): {format_datetime(email_timestamp_local)}")
        
        # ========================================
        # 6. Generate the ticket number (based on the maintenance start time)
        # ========================================
        ticket_number = generate_ticket_number(start_time_local)
        
        # ========================================
        # 7. Determine the maintenance type
        # ========================================
        maint_type = "Cancel" if event_code == 'AWS_DIRECTCONNECT_MAINTENANCE_CANCELLED' else "New"
        print(f"[INFO] Maintenance type: {maint_type}")
        
        # ========================================
        # 8. Prepare the event data
        # ========================================
        event_data = {
            'sequence_number': 1,
            'email_timestamp_local': format_datetime(email_timestamp_local),
            'start_time_local': format_datetime(start_time_local),
            'end_time_local': format_datetime(end_time_local),
            'ticket_number': ticket_number,
            'affected_resources': affected_resources,
            'maintenance_impact': 'Yes',
            'urgency': 'Yes',
            'maint_type': maint_type,
            'vendor': 'AWS',
            'any_wording': 'No',
            'maintenance_reason': MAINTENANCE_REASON,
            'region': region,
            'account': account,
            'event_arn': event_arn
        }
        
        # ========================================
        # 9. Generate the Excel file
        # ========================================
        print("[PROCESS] Creating Excel file...")
        excel_path = create_excel_file(event_data)
        
        # ========================================
        # 10. Send the email
        # ========================================
        print("[PROCESS] Sending email notification...")
        
        subject = f'AWS Direct Connect Maintenance Notice - {ticket_number}'
        
        body = f"""Please check the attached summary of the AWS Direct Connect maintenance notice.

Maintenance Details:
{'=' * 60}
Ticket Number:      {ticket_number}
Event Type:         {maint_type}
Region:             {region}
Account:            {account}
Start Time:         {format_datetime(start_time_local)} (UTC+{TIMEZONE_OFFSET})
End Time:           {format_datetime(end_time_local)} (UTC+{TIMEZONE_OFFSET})
Affected Resources: {affected_resources}
Maintenance Type:   {event_code}
Event Source:       {event_source}
Event ARN:          {event_arn}
{'=' * 60}

Important Notes:
- This is an automated notification from the DX Maintenance Handler system
- Duplicate notifications within {DEDUP_WINDOW_DAYS} days are automatically filtered
- Note: Same event ARN with different event types (e.g., SCHEDULED vs CANCELLED) will be sent separately
- All events are permanently logged for audit purposes
- Processing attempts are tracked for reliability

For questions or issues, please contact AWS Support.

---
This message was generated at {format_datetime(email_timestamp_local)} (UTC+{TIMEZONE_OFFSET})
"""
        
        email_sent = send_email_via_ses(
            subject=subject,
            body=body,
            attachment_path=excel_path,
            attachment_filename=f'DX_Maintenance_{ticket_number}.xlsx',
            max_retries=MAX_RETRY_ATTEMPTS
        )
        
        # ========================================
        # 11. Record to DynamoDB (whether or not the email succeeded)
        # ========================================
        print("[PROCESS] Recording event to DynamoDB...")
        
        event_details = EventDetails(
            eventTypeCode=event_code,
            startTime=start_time_raw,
            endTime=end_time_raw,
            region=region,
            account=account,
            resources=ls_resources,
            ticketNumber=ticket_number,
            emailStatus='sent' if email_sent else 'failed',
            processingAttempts=1
        )
        
        record_success = record_event_processed(
            event_arn=event_arn,
            event_type_code=event_code,
            event_details=event_details,
            event_source=event_source,
            email_sent=email_sent
        )
        
        # ========================================
        # 12. Prepare the return value
        # ========================================
        time_info = get_current_time_info()
        
        result = {
            'statusCode': 200 if email_sent else 206,  # 206 = Partial Content (email failed, but the event was recorded)
            'body': json.dumps({
                'message': 'Processing complete' if email_sent else 'Processing complete with email failure',
                'eventArn': event_arn,
                'eventTypeCode': event_code,
                'dedupKey': generate_dedup_key(event_arn, event_code),
                'eventSource': event_source,
                'ticketNumber': ticket_number,
                'maintenanceType': maint_type,
                'emailSent': email_sent,
                'recordedInDynamoDB': record_success,
                'recordTimestamp': time_info['iso8601'],
                'dedupWindowDays': DEDUP_WINDOW_DAYS,
                'retentionPolicy': 'permanent',
                'action': 'processed',
                'handlerVersion': 'v3.0',
                'processingTimestamp': time_info['iso8601'],
                'timezoneOffset': f'UTC+{TIMEZONE_OFFSET}'
            })
        }
        
        print("=" * 80)
        print("[SUCCESS] Lambda execution completed")
        print(f"[RESULT] Email: {'✓ Sent' if email_sent else '✗ Failed'}")
        print(f"[RESULT] DynamoDB: {'✓ Recorded' if record_success else '✗ Failed'}")
        print(f"[RESULT] {json.dumps(result, indent=2)}")
        print("=" * 80)
        
        return result
        
    except KeyError as e:
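        # e.args[0] is the bare key name; str(e) would wrap it in extra quotes
        missing_field = str(e.args[0]) if e.args else 'unknown'
        error_msg = f"Missing required field: {missing_field}"
        print(f"[ERROR] {error_msg}")
        send_cloudwatch_metric('ProcessingError', 1, dimensions={'ErrorType': 'KeyError'})

        return {
            'statusCode': 400,
            'body': json.dumps({
                'error': 'Invalid event format',
                'details': error_msg,
                'missingField': missing_field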
            })
        }
        
    except Exception as e:
        error_msg = str(e)
        print(f"[ERROR] Unexpected error: {e!r}")
        send_cloudwatch_metric('ProcessingError', 1, dimensions={'ErrorType': 'Unexpected'})
        
        import traceback
        traceback.print_exc()
        
        return {
            'statusCode': 500,
            'body': json.dumps({
                'error': 'Internal processing error',
                'details': error_msg,
                'timestamp': datetime.now(timezone.utc).isoformat()
            })
        }
        
    finally:
        # ========================================
        # Clean up the temporary file
        # ========================================
        if excel_path and os.path.exists(excel_path):
            try:
                os.remove(excel_path)
                print(f"[CLEANUP] Removed temporary file: {excel_path}")
            except Exception as e:
                print(f"[WARNING] Failed to remove temporary file: {e!r}")
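
For a quick local check of the handler listing above, a small test event can help. The sketch below builds a hypothetical AWS Health event containing only the fields the handler reads, and reports which required `detail` fields are absent (the case that ends in the handler's `KeyError` branch and a 400 response). The names `build_sample_event` and `missing_required_fields`, the event ARN, the resource ID, and the timestamp strings are illustrative assumptions, not part of the original solution; the timestamp format accepted by `convert_time_with_timezone` should be confirmed against a real event.

```python
from typing import Any, Dict, List

# Fields the handler reads with [] under "detail"; a missing one raises
# KeyError in the handler. Top-level "resources", "region" and "account"
# are read with .get() and fall back to defaults, so they are not listed.
REQUIRED_DETAIL_FIELDS = ("eventTypeCode", "eventArn", "startTime", "endTime")


def build_sample_event(event_type_code: str) -> Dict[str, Any]:
    """Return a hypothetical AWS Health event; every value is a placeholder."""
    return {
        "source": "aws.health",
        "account": "123456789001",          # placeholder account ID
        "region": "ap-southeast-1",
        "resources": ["dxcon-example1"],    # hypothetical DX connection ID
        "detail": {
            # Hypothetical ARN, for illustration only
            "eventArn": "arn:aws:health:ap-southeast-1::event/DIRECTCONNECT/EXAMPLE",
            "service": "DIRECTCONNECT",
            "eventTypeCode": event_type_code,
            # Placeholder timestamps; the real format is an assumption here
            "startTime": "Sat, 04 Jun 2016 05:01:10 GMT",
            "endTime": "Sat, 04 Jun 2016 05:30:57 GMT",
        },
    }


def missing_required_fields(event: Dict[str, Any]) -> List[str]:
    """List the required detail.* paths that are absent from the event."""
    detail = event.get("detail", {})
    return [f"detail.{name}" for name in REQUIRED_DETAIL_FIELDS if name not in detail]


if __name__ == "__main__":
    sample = build_sample_event("AWS_DIRECTCONNECT_MAINTENANCE_SCHEDULED")
    print(missing_required_fields(sample))
    del sample["detail"]["eventArn"]
    print(missing_required_fields(sample))
```

Passing the same sample to the handler twice, once with a SCHEDULED type code and once with `AWS_DIRECTCONNECT_MAINTENANCE_CANCELLED`, is one way to confirm that both notifications are delivered even though they share an event ARN.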


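About the authors

Zhang Mengmeng (张蒙蒙)

Senior networking specialist at Amazon Web Services (AWS), responsible for network architecture and solution design on AWS, with extensive hands-on experience in enterprise networks, carrier metro and core networks, SDN, SD-WAN, and cloud networking. Zhang Mengmeng also has a strong interest in, and has done some research on, container and Kubernetes technologies and solutions. Before joining AWS, Zhang Mengmeng held positions including senior technical support engineer, senior solutions architect, and SD-WAN product director at companies such as Juniper, Versa, and 360 Enterprise Security.

Wu You (吴优)

Systems operations engineer in ByteDance's cloud computing services department, responsible for architecture design and operations development for cloud services. Wu You has extensive hands-on experience in distributed systems operations, component development, and cloud service architecture, with particular depth in operating and optimizing distributed coordination services such as ZooKeeper. At ByteDance, Wu You is responsible for full-stack management of cloud services, covering architecture design, system operations, and performance optimization. Before joining ByteDance, Wu You worked on the infrastructure team at DiDi as a component operations development engineer, mainly responsible for operations development of core distributed components such as ZooKeeper.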
