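Preface
In fast-fashion cross-border e-commerce, the customer service experience directly affects conversion, repeat purchases, and brand reputation. As businesses go global, SKU counts explode, and promotions grow more frequent (Black Friday, Christmas, mid-season sales), traditional human agents and HTTP-based one-way voice or text bots can no longer deliver real-time service that is low-latency, interruptible, and highly interactive.
This article uses a real-time voice customer service assistant for fast-fashion e-commerce as its running scenario and walks through a cloud-native voice agent architecture built on real-time bidirectional WebSocket communication. Amazon Bedrock Nova 2 Sonic provides the underlying bidirectional streaming speech capability, Strands Agents (BidiAgent) orchestrates the conversation and interruption logic, and everything runs in the production-grade managed, securely isolated environment provided by AgentCore Runtime.
With WebSocket as the core transport for the voice data plane, the solution delivers:
- True end-to-end real-time voice streaming (rather than request/response calls)
- Natural user barge-in with model-level interruption control
- Real-time coordination with e-commerce order, logistics, and returns systems
- The security, compliance, and elastic scaling that cross-border e-commerce requires
The article covers industry challenges, architecture design, key components, and business value, and shows how to build a voice customer service system for fast-fashion e-commerce that uses WebSocket as its real-time interaction core and can be rolled out at scale.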
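Industry Background and Challenges
Typical Characteristics of Fast-Fashion E-commerce Customer Service
Customer service in fast-fashion cross-border e-commerce has the following distinctive characteristics:
- High concurrency: inquiry volume surges during major promotions
- Time-sensitive questions: logistics, sizing, stock, and returns all need an immediate answer
- Highly interactive: users frequently interrupt, correct themselves, and add information
- Broad language coverage:
  - The languages of all major e-commerce markets must be covered
  - Mixed-language expressions can appear within a single session
  - Accents and speaking rates vary widely, yet the service experience must stay consistent
In practice, fast-fashion brands often operate several regional sites at once, but their customer service teams and system capabilities rarely keep pace with market expansion. The results are:
- Sharply rising staffing costs for multilingual support
- Noticeably slower responses in non-core languages than in mainstream markets
- Service quality that is hard to keep uniform across language channels
Traditional IVR systems and one-way voice bots typically suffer from:
- Limited language coverage: only a few mainstream languages are supported, and adding more is expensive
- Poor robustness to accents and mixed-language speech
- High response latency
- No way for the user to interrupt naturally
- No integration with order and CRM systems
- Difficulty scaling securely and reliably in the cloud
The industry therefore needs a real-time voice agent architecture that can do more than "talk": it must keep the experience consistent across languages, accents, and markets.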
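Architecture Design Goals
The solution was designed around four core goals:
- Real-time: low end-to-end latency from voice input to voice output
- Interruptible: users can barge in and correct themselves naturally
- Actionable: the agent can not only talk but also get things done
- Production-ready: cloud-native, secure, and scalable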
+----------------------------Client----------------------------+
| |
| Microphone ----> Audio Chunks (16kHz PCM) |
| | |
| v |
| WebSocket (SigV4 / Full-Duplex) |
| | |
| v |
| Speaker <---- Audio Stream / Interruption / Transcript |
+------------------------------+-------------------------------+
|
v
+-------------------- AgentCore Runtime (Managed) -------------+
| |
| +--------------- Isolated Session (microVM) -------------+ |
| | | |
| | +------------ Strands Agents -------------+ | |
| | | | | |
| | | BidiAgent (Concurrency / Interrupt) | | |
| | | | | | |
| | | v | | |
| | | Tool Calls / State / SOP | | |
| | | | | | |
| | | v | | |
| | | BidiNovaSonicModel | | |
| | | +- STT (Streaming) | | |
| | | +- Reasoning / LLM | | |
| | | +- TTS (Streaming) | | |
| | +-----------------------------------------+ | |
| | | |
| | Short-term Memory (Session) | |
| | (Optional Long-term Memory) | |
| +--------------------------------------------------------+ |
| |
| Observability / Guardrails / IAM / Auto-Scaling |
+--------------------------------------------------------------+
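Key Components
Amazon Bedrock Nova 2 Sonic: the real-time speech model
Nova 2 Sonic is the bidirectional streaming speech model offered through Bedrock. It provides:
- Real-time speech understanding (ASR + NLU)
- Real-time speech generation (TTS)
- Native support for interruption / barge-in events
In a customer service setting, Nova 2 Sonic is no longer just a speech-to-text component; it is the conversation engine itself.
Strands Agents (BidiAgent): the conversation orchestration hub
In this solution Strands Agents acts as the "exoskeleton" of the conversation:
- Manages the bidirectional audio streams (input / output)
- Listens for interruption events emitted by the model
- Executes business rules and tool calls
- Controls conversation pacing and state
With BidiAgent, three layers are cleanly decoupled:
- Transport layer (WebSocket)
- Model layer (Nova 2 Sonic)
- Business layer (e-commerce tools)
This avoids ending up with a tightly coupled voice black box.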
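AgentCore Runtime: the production hosting and governance layer
AgentCore Runtime gives the agent an enterprise-grade runtime environment:
- Serverless hosting: no instances to manage yourself
- Session lifecycle management: conversation state persists across the session
- IAM + SigV4 authentication: meets cross-border and compliance requirements
- Multi-protocol support: HTTP / WebSocket / MCP / A2A
In this case study:
- HTTP /invocations is used for control and fallback
- WebSocket carries the real-time audio data plane
This is a typical control plane / data plane separation.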
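Why three layers: purpose and necessity
1. Why not call Nova 2 Sonic directly from the client?
Nova 2 Sonic does expose a bidirectional voice streaming API, but calling it directly from a client (a phone or a web page) raises three problems:
- Security risk (credential leakage): a direct call means the client has to hold your AWS IAM credentials. If the client is compromised, your account quota can be abused.
- Missing business logic: a bare model call can only chat. If you want the assistant to "look up my order" or "reschedule my meeting", you have to write the tool-use logic, state machine, and audio flow control in the client.
- Network complexity: keeping a bidirectional WebSocket connection stable places heavy demands on the client's network environment.
2. What Strands Agents does: the outer layer of the agent's "brain"
Strands Agents is a lightweight SDK that wraps the tedious details of calling the model:
- Simpler concurrency: bidirectional voice means handling "listening" and "speaking" at the same time. Strands provides the BidiAgent class, which handles the async streams, concurrent transfer of audio chunks, and real-time interruption logic for you.
- Tool calling (function calling): this is the bridge between the model and the real world. With Strands you can attach tools (a database query, sending an email) in very little code, and it runs the whole loop from the model deciding to call a tool to the tool being executed.
- Model decoupling: if you need to swap Nova 2 Sonic for a comparable model, the Strands wrapper means you may only need to change one line of code instead of rewriting the audio stream handling.
3. What AgentCore Runtime does: the enterprise "production armor"
Even with the logic written in Strands, you still need somewhere to run it. AgentCore Runtime shortens the path from code to a reliable service:
- Serverless auto scaling: there are no servers to manage. AgentCore scales out with call volume.
- Session persistence and memory management: short-term and long-term memory are built in, so even after a disconnect and reconnect the AI can still "remember" what was just discussed.
- Security isolation: each user's conversation runs in its own microVM or container, so data does not leak across users.
- Monitoring and governance: guardrails (Policy) can intercept erroneous AI actions in real time, and full conversation traces (Tracing) are recorded for auditing and tuning.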
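Architecture comparison summary
| Layer | Core purpose | Without it… |
| --- | --- | --- |
| Nova 2 Sonic | Provides the underlying speech-to-text, reasoning, and text-to-speech capabilities. | Natural barge-in, multilingual understanding, and immediate responses become difficult, and the user's sense of presence and immersion during the interaction drops noticeably. |
| Strands Agents | Handles the complex bidirectional stream interaction and business logic, and improves development efficiency. | You have to handle the WebSocket connection and audio buffering yourself. |
| AgentCore Runtime | Provides authentication, auto scaling, long-term memory, and production monitoring. | Absorbing sudden bursts of thousands of concurrent sessions elastically becomes difficult and expensive. |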
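Key Capabilities in Detail
1. Real-time interruption (barge-in)
In real customer service conversations, users frequently interrupt, correct themselves, or add information while the agent is still speaking.
The solution implements a complete interruption loop:
User barges in
↓
Nova Sonic detects the interruption
↓
BidiInterruptionEvent
↓
Agent clears the audio queue
↓
Responds immediately to the new input
This makes the conversation feel noticeably closer to talking with a human agent.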
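The "clear the audio queue" step in the loop above can be sketched in a few lines. This is a minimal illustration, not the SDK's implementation: `PlaybackBuffer` and `handle_event` are hypothetical names, the `audio_stream` and `interruption` message types are the ones used by the sample client and server in this article, the `reason` value is invented for the demo, and audio payloads are raw bytes here instead of base64 to keep the example short.

```python
import queue


class PlaybackBuffer:
    """Holds audio chunks that are waiting to be played (hypothetical helper)."""

    def __init__(self) -> None:
        self._chunks: "queue.Queue[bytes]" = queue.Queue()

    def enqueue(self, chunk: bytes) -> None:
        self._chunks.put(chunk)

    def pending(self) -> int:
        return self._chunks.qsize()

    def flush(self) -> int:
        """Drop everything that has not been played yet and return the count."""
        dropped = 0
        while True:
            try:
                self._chunks.get_nowait()
            except queue.Empty:
                return dropped
            dropped += 1


def handle_event(event: dict, buffer: PlaybackBuffer) -> None:
    """Route one server event: queue audio, or discard it on interruption."""
    kind = event.get("type")
    if kind == "audio_stream":
        buffer.enqueue(event["audio"])
    elif kind == "interruption":
        buffer.flush()


buffer = PlaybackBuffer()
handle_event({"type": "audio_stream", "audio": b"\x00\x01"}, buffer)
handle_event({"type": "audio_stream", "audio": b"\x02\x03"}, buffer)
# The user starts talking: stale agent audio must not keep playing.
handle_event({"type": "interruption", "reason": "demo"}, buffer)
```

The point of the design is that interruption is handled on the playback side as well as in the model: even if the model stops generating instantly, audio already buffered on the client would otherwise keep playing over the user.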
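2. Business extensibility for e-commerce
On top of basic voice conversation, the architecture can be extended to connect with business systems such as orders, logistics, and returns.
These capabilities are implemented through Tool Calling in Strands Agents, which upgrades the voice agent from "answering questions" to "resolving issues directly".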
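As a rough illustration of what such a tool might look like, here is a plain Python function for order lookup. Everything in it is an assumption made for the example: the `Order` type, the `_ORDERS` dictionary standing in for a real order management system, and the shape of the returned dictionary. Registering the function with the agent is deliberately left out.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Order:
    order_id: str
    status: str
    carrier: str


# Hypothetical in-memory stand-in for an order management system.
_ORDERS = {
    "A1001": Order("A1001", "shipped", "DHL"),
    "A1002": Order("A1002", "processing", ""),
}


def lookup_order(order_id: str) -> dict:
    """Return a small, speech-friendly summary of one order."""
    # Spoken order numbers arrive with inconsistent casing and spacing.
    order: Optional[Order] = _ORDERS.get(order_id.strip().upper())
    if order is None:
        return {"found": False, "message": f"No order matches {order_id!r}."}
    return {
        "found": True,
        "order_id": order.order_id,
        "status": order.status,
        "carrier": order.carrier,
    }
```

In a Strands-based agent a function like this would normally be exposed through the SDK's tool mechanism. Whether the BidiAgent in your SDK version accepts tools the same way as the standard agent is worth confirming in its documentation. Returning a small structured result rather than free text leaves the phrasing of the spoken answer to the model.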
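Security and Compliance Design
To meet the compliance needs of cross-border fast-fashion e-commerce:
- All requests are signed with AWS SigV4
- The agent runs in a managed, isolated environment
- Sessions and data are strictly bound to each other
- No model credentials need to be exposed on the client
This allows the solution to be deployed safely in markets with strict compliance requirements, such as North America and Europe.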
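Business Value Summary
In fast-fashion e-commerce, this real-time voice customer service solution:
- Significantly reduces the load on human agents
- Speeds up first response and issue resolution
- Keeps the experience consistent during major promotions
- Lays the groundwork for multilingual, multi-market expansion
More importantly, it offers a clear path toward building an enterprise-grade agent platform.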
Application Example
Deployment Preparation
requirements.txt
bedrock-agentcore
strands-agents>=1.20.0
Dockerfile
# AgentCore Runtime WebSocket Server Dockerfile
# Deploys the fast-fashion e-commerce voice customer service system to AgentCore Runtime
FROM ghcr.io/astral-sh/uv:python3.13-bookworm-slim
WORKDIR /app
# Environment variables
ENV UV_SYSTEM_PYTHON=1 \
    UV_COMPILE_BYTECODE=1 \
    UV_NO_PROGRESS=1 \
    PYTHONUNBUFFERED=1 \
    DOCKER_CONTAINER=1 \
    AWS_REGION=us-east-1 \
    AWS_DEFAULT_REGION=us-east-1
# Install system dependencies (needed to compile pyaudio)
RUN apt-get update && apt-get install -y --no-install-recommends \
    gcc \
    portaudio19-dev \
    python3-dev \
    && rm -rf /var/lib/apt/lists/*
# Install Python dependencies
COPY requirements.txt requirements.txt
RUN uv pip install -r requirements.txt
RUN uv pip install aws-opentelemetry-distro==0.12.2
# Create a non-root user
RUN useradd -m -u 1000 bedrock_agentcore
USER bedrock_agentcore
# Expose ports
EXPOSE 9000
EXPOSE 8000
EXPOSE 8080
# Copy application code
COPY . .
# Start command
CMD ["opentelemetry-instrument", "python", "-m", "ws_server_on_agentcore_runtime"]
ws_client.py
"""
Two-way voice customer service for fast-fashion e-commerce: WebSocket client.
Connects to either a local server or the AgentCore Runtime service in the cloud.
"""
import asyncio
import base64
import json
import queue
import sys
import threading
import hashlib
import hmac
import os
from datetime import datetime, timezone
from typing import Optional
from urllib.parse import urlparse, urlencode

import pyaudio
import websockets

# Audio configuration
# The Nova 2 Sonic output sample rate is 16000 Hz
AUDIO_CONFIG = {
    "INPUT_SAMPLE_RATE": 16000,
    "OUTPUT_SAMPLE_RATE": 16000,
    "CHANNELS": 1,
    "FORMAT": pyaudio.paInt16,
    "CHUNK_SIZE": 512,
}


def sign(key: bytes, msg: str) -> bytes:
    """Compute an HMAC-SHA256 signature."""
    return hmac.new(key, msg.encode("utf-8"), hashlib.sha256).digest()


def get_signature_key(secret_key: str, date_stamp: str, region: str, service: str) -> bytes:
    """Derive the AWS SigV4 signing key."""
    k_date = sign(("AWS4" + secret_key).encode("utf-8"), date_stamp)
    k_region = sign(k_date, region)
    k_service = sign(k_region, service)
    k_signing = sign(k_service, "aws4_request")
    return k_signing


def create_presigned_websocket_url(
    agent_runtime_arn: str,
    region: str = "us-east-1",
    session_id: Optional[str] = None,
) -> str:
    """
    Create a SigV4-signed WebSocket URL (using botocore).
    URL format: wss://bedrock-agentcore.<region>.amazonaws.com/runtimes/<encoded_arn>/ws
    """
    import boto3
    from botocore.auth import SigV4QueryAuth
    from botocore.awsrequest import AWSRequest
    from urllib.parse import quote

    # Resolve credentials through boto3
    session = boto3.Session(region_name=region)
    credentials = session.get_credentials()
    if not credentials:
        raise ValueError("AWS credentials not found. Run 'aws configure' or set environment variables.")
    frozen_credentials = credentials.get_frozen_credentials()

    service = "bedrock-agentcore"
    host = f"bedrock-agentcore.{region}.amazonaws.com"

    # The ARN must be URL-encoded (both ':' and '/' are encoded)
    encoded_arn = quote(agent_runtime_arn, safe='')

    # Build the base URL
    base_url = f"wss://{host}/runtimes/{encoded_arn}/ws"

    # Append the session_id if one was given
    if session_id:
        base_url += f"?X-Amzn-Bedrock-AgentCore-Runtime-Session-Id={session_id}"

    # Build the AWS request. SigV4 signing needs the HTTPS scheme,
    # but the final URL uses WSS.
    https_url = base_url.replace("wss://", "https://")
    request = AWSRequest(method="GET", url=https_url, headers={"host": host})

    # Sign with SigV4QueryAuth (the signature goes into the query string)
    signer = SigV4QueryAuth(frozen_credentials, service, region, expires=300)
    signer.add_auth(request)

    # Convert the signed URL back to WSS
    signed_url = request.url.replace("https://", "wss://")
    return signed_url
class AudioManager:
    """Manages audio input and output."""

    def __init__(self):
        self.pyaudio_instance: Optional[pyaudio.PyAudio] = None
        self.input_stream: Optional[pyaudio.Stream] = None
        self.output_stream: Optional[pyaudio.Stream] = None
        self.output_queue: queue.Queue = queue.Queue()
        self._running = False

    def start(self) -> bool:
        """Open the audio devices."""
        try:
            self.pyaudio_instance = pyaudio.PyAudio()
            # Open the microphone input
            self.input_stream = self.pyaudio_instance.open(
                format=AUDIO_CONFIG["FORMAT"],
                channels=AUDIO_CONFIG["CHANNELS"],
                rate=AUDIO_CONFIG["INPUT_SAMPLE_RATE"],
                input=True,
                frames_per_buffer=AUDIO_CONFIG["CHUNK_SIZE"],
            )
            # Open the speaker output
            self.output_stream = self.pyaudio_instance.open(
                format=AUDIO_CONFIG["FORMAT"],
                channels=AUDIO_CONFIG["CHANNELS"],
                rate=AUDIO_CONFIG["OUTPUT_SAMPLE_RATE"],
                output=True,
                frames_per_buffer=AUDIO_CONFIG["CHUNK_SIZE"],
            )
            self._running = True
            return True
        except Exception as e:
            print(f"音频设备初始化失败: {e}")
            return False

    def stop(self):
        """Close the audio devices."""
        self._running = False
        if self.input_stream:
            self.input_stream.stop_stream()
            self.input_stream.close()
        if self.output_stream:
            self.output_stream.stop_stream()
            self.output_stream.close()
        if self.pyaudio_instance:
            self.pyaudio_instance.terminate()

    def read_audio(self) -> Optional[bytes]:
        """Read one chunk of audio from the microphone."""
        if not self.input_stream or not self._running:
            return None
        try:
            return self.input_stream.read(
                AUDIO_CONFIG["CHUNK_SIZE"],
                exception_on_overflow=False
            )
        except Exception:
            return None

    def play_audio(self, audio_data: bytes):
        """Play one chunk of audio."""
        if self.output_stream and self._running:
            try:
                self.output_stream.write(audio_data)
            except Exception:
                pass

    def queue_audio(self, audio_base64: str):
        """Decode a chunk and add it to the playback queue."""
        try:
            audio_data = base64.b64decode(audio_base64)
            self.output_queue.put(audio_data)
        except Exception:
            pass

    def clear_queue(self):
        """Empty the playback queue."""
        while not self.output_queue.empty():
            try:
                self.output_queue.get_nowait()
            except queue.Empty:
                break

    def output_loop(self):
        """Audio output loop (runs in its own thread)."""
        while self._running:
            try:
                audio_data = self.output_queue.get(timeout=0.1)
                self.play_audio(audio_data)
            except queue.Empty:
                continue
async def main(
    endpoint: Optional[str] = None,
    agent_arn: Optional[str] = None,
    region: str = "us-east-1",
    local: bool = False,
):
    """Main entry point."""
    print("=" * 50)
    print("潮流速递 AI 语音客服 - WebSocket 客户端")
    print("=" * 50)

    # Decide which WebSocket URL to use
    if local or endpoint:
        ws_url = endpoint or "ws://localhost:8080/ws"
        print(f"连接到本地服务: {ws_url}")
    elif agent_arn:
        print(f"连接到 AgentCore Runtime: {agent_arn}")
        ws_url = create_presigned_websocket_url(agent_arn, region)
    else:
        print("错误: 请指定 --local 或 --agent-arn")
        return

    print("直接对着麦克风说话,按 Ctrl+C 结束对话\n")

    audio = AudioManager()
    if not audio.start():
        print("音频设备初始化失败")
        return

    try:
        # Use a longer timeout because AgentCore Runtime may need a cold start
        async with websockets.connect(
            ws_url,
            open_timeout=60,  # 60-second connection timeout
            close_timeout=10,
        ) as ws:
            print("已连接到服务端\n")

            # Start the audio output thread
            output_thread = threading.Thread(target=audio.output_loop, daemon=True)
            output_thread.start()

            # Audio input task: read from the microphone and send to the server
            async def send_audio():
                while audio._running:
                    audio_data = await asyncio.to_thread(audio.read_audio)
                    if audio_data:
                        audio_base64 = base64.b64encode(audio_data).decode("utf-8")
                        await ws.send(json.dumps({
                            "type": "audio_input",
                            "audio": audio_base64,
                        }))
                    await asyncio.sleep(0.01)

            # Receive task: handle events coming back from the server
            async def receive_events():
                async for message in ws:
                    event = json.loads(message)
                    event_type = event.get("type", "")
                    if event_type == "audio_stream":
                        audio.queue_audio(event.get("audio", ""))
                    elif event_type == "transcript":
                        role = event.get("role", "")
                        text = event.get("text", "")
                        is_final = event.get("is_final", False)
                        if is_final and text:
                            label = "[用户]" if role == "user" else "[客服]"
                            print(f"{label}: {text}")
                    elif event_type == "interruption":
                        audio.clear_queue()

            # Run both tasks concurrently
            await asyncio.gather(
                send_audio(),
                receive_events(),
            )
    except websockets.exceptions.ConnectionClosed:
        print("\n连接已关闭")
    except KeyboardInterrupt:
        print("\n\n正在结束对话...")
    except Exception as e:
        print(f"\n错误: {e}")
    finally:
        audio.stop()
        print("对话已结束,感谢使用!")


if __name__ == "__main__":
    import argparse

    parser = argparse.ArgumentParser(description="WebSocket 语音客户端")
    parser.add_argument(
        "--endpoint",
        help="WebSocket 端点 URL (用于自定义端点)"
    )
    parser.add_argument(
        "--agent-arn",
        help="AgentCore Runtime ARN (用于连接云端服务)"
    )
    parser.add_argument(
        "--region",
        default="us-east-1",
        help="AWS Region (默认: us-east-1)"
    )
    parser.add_argument(
        "--local",
        action="store_true",
        help="连接本地服务 (ws://localhost:8080/ws)"
    )
    args = parser.parse_args()

    try:
        asyncio.run(main(
            endpoint=args.endpoint,
            agent_arn=args.agent_arn,
            region=args.region,
            local=args.local,
        ))
    except KeyboardInterrupt:
        print("\n程序已退出")
        sys.exit(0)
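To get a feel for what the client's audio settings mean on the wire, the short calculation below works out how long one chunk lasts and how large each message is. It is a back-of-the-envelope sketch that only assumes the values in `AUDIO_CONFIG` above (16 kHz, mono, 16-bit PCM, 512 frames per chunk) and the `audio_input` JSON message shape used by the client.

```python
import base64
import json

SAMPLE_RATE_HZ = 16000   # INPUT_SAMPLE_RATE in AUDIO_CONFIG
CHUNK_FRAMES = 512       # CHUNK_SIZE in AUDIO_CONFIG
BYTES_PER_SAMPLE = 2     # paInt16 is 16-bit PCM
CHANNELS = 1

# Duration of audio carried by one chunk, in milliseconds.
chunk_ms = CHUNK_FRAMES * 1000 / SAMPLE_RATE_HZ            # 32.0
# Raw PCM payload per chunk, in bytes.
chunk_bytes = CHUNK_FRAMES * BYTES_PER_SAMPLE * CHANNELS   # 1024
# Number of WebSocket messages needed per second of speech.
messages_per_second = SAMPLE_RATE_HZ / CHUNK_FRAMES        # 31.25

# Base64 expands every 3 bytes to 4 characters (padded to a multiple of 4).
silence = bytes(chunk_bytes)
encoded = base64.b64encode(silence).decode("ascii")        # 1368 characters
message = json.dumps({"type": "audio_input", "audio": encoded})

print(f"{chunk_ms} ms per chunk, {chunk_bytes} B raw, {len(encoded)} B base64")
```

Each message therefore carries 32 ms of speech, which sets the granularity at which the server can react. A larger chunk size would mean fewer messages but would add buffering delay before each one can be sent.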
ws_server_on_agentcore_runtime.py
"""
Two-way voice customer service for fast-fashion e-commerce: AgentCore Runtime WebSocket server.
A bidirectional voice service built on AgentCore Runtime and Nova 2 Sonic.
The voice conversation interface is exposed over WebSocket.

How to run:
    Local test: python ws_server_on_agentcore_runtime.py
    Deploy to AgentCore: agentcore configure -e ws_server_on_agentcore_runtime.py && agentcore launch
"""
import asyncio
import base64
import json
import os
import logging

from bedrock_agentcore import BedrockAgentCoreApp

# Import the required modules directly to avoid triggering a pyaudio import
from strands.experimental.bidi.agent.agent import BidiAgent
from strands.experimental.bidi.models.nova_sonic import BidiNovaSonicModel
from strands.experimental.bidi.types.events import (
    BidiAudioInputEvent,
    BidiAudioStreamEvent,
    BidiTranscriptStreamEvent,
    BidiOutputEvent,
    BidiInterruptionEvent,
)

logging.basicConfig(level=logging.INFO)
logger = logging.getLogger(__name__)

# SOP system prompt (kept in Chinese: it instructs the agent to answer in Chinese)
FASHION_SOP_PROMPT = """
你是"潮流速递"快时尚品牌的AI语音客服小潮。
服务原则:
- 语气亲切友好,使用"您"称呼客户
- 回答简洁明了,每次回复控制在2-3句话
- 只在对话开始时问候一次,之后直接回答问题
- 用中文回答
你可以帮助客户:
- 尺码咨询:S码155-165cm/45-55kg,M码160-170cm/50-60kg,L码165-175cm/55-70kg,XL码170-180cm/65-80kg
- 订单查询:询问订单号或手机号
- 退换货:7天无理由退换,退款3-5个工作日
- 投诉处理:表达歉意,记录问题
"""

# Create the AgentCore application
app = BedrockAgentCoreApp()
class WebSocketAudioInput:
    """Receives audio input from the WebSocket."""

    def __init__(self, websocket):
        self.websocket = websocket
        self._queue: asyncio.Queue = asyncio.Queue()
        self._running = True
        self._config = None

    async def start(self, agent: BidiAgent) -> None:
        """Start the input stream."""
        self._config = agent.model.config.get("audio", {})
        logger.info("WebSocket 音频输入已启动")

    async def stop(self) -> None:
        """Stop the input stream."""
        self._running = False
        logger.info("WebSocket 音频输入已停止")

    async def receive_from_websocket(self) -> None:
        """Read messages from the WebSocket and put them on the queue."""
        try:
            while self._running:
                data = await self.websocket.receive_json()
                await self._queue.put(data)
        except Exception as e:
            logger.error(f"WebSocket 接收错误: {e}")
            self._running = False

    async def __call__(self) -> BidiAudioInputEvent:
        """Return the next audio input event."""
        while self._running:
            try:
                data = await asyncio.wait_for(self._queue.get(), timeout=0.1)
                if data.get("type") == "audio_input":
                    return BidiAudioInputEvent(
                        audio=data.get("audio", ""),
                        format=self._config.get("format", "pcm"),
                        sample_rate=self._config.get("input_rate", 16000),
                        channels=self._config.get("channels", 1),
                    )
            except asyncio.TimeoutError:
                continue
        raise StopAsyncIteration


class WebSocketAudioOutput:
    """Sends audio output to the WebSocket."""

    def __init__(self, websocket):
        self.websocket = websocket

    async def start(self, agent: BidiAgent) -> None:
        """Start the output stream."""
        logger.info("WebSocket 音频输出已启动")

    async def stop(self) -> None:
        """Stop the output stream."""
        logger.info("WebSocket 音频输出已停止")

    async def __call__(self, event: BidiOutputEvent) -> None:
        """Forward one agent event to the WebSocket."""
        try:
            if isinstance(event, BidiAudioStreamEvent):
                await self.websocket.send_json({
                    "type": "audio_stream",
                    "audio": event.get("audio", ""),
                })
            elif isinstance(event, BidiTranscriptStreamEvent):
                await self.websocket.send_json({
                    "type": "transcript",
                    "role": event.get("role", ""),
                    "text": event.get("text", ""),
                    "is_final": event.get("is_final", False),
                })
            elif isinstance(event, BidiInterruptionEvent):
                await self.websocket.send_json({
                    "type": "interruption",
                    "reason": event.get("reason", ""),
                })
        except Exception as e:
            logger.error(f"WebSocket 发送错误: {e}")
@app.entrypoint
async def agent_invocation(payload, context):
    """AgentCore Runtime HTTP entry point."""
    prompt = payload.get("prompt", "")
    return {"message": f"请使用 WebSocket 端点 /ws 进行语音对话。收到: {prompt}"}


@app.websocket
async def websocket_handler(websocket, context):
    """
    WebSocket voice conversation endpoint (path: /ws)

    Client sends: {"type": "audio_input", "audio": "<base64>"}
    Server returns:
    - {"type": "audio_stream", "audio": "<base64>"}
    - {"type": "transcript", "role": "user|assistant", "text": "...", "is_final": true}
    - {"type": "interruption", "reason": "..."}
    """
    await websocket.accept()
    logger.info("WebSocket 客户端已连接")

    # Configure the Nova 2 Sonic model
    model = BidiNovaSonicModel(
        model_id="amazon.nova-2-sonic-v1:0",
        client_config={
            "region": os.environ.get("AWS_REGION", "us-east-1"),
        },
        provider_config={
            "audio": {
                "voice": "tiffany",
            }
        }
    )

    # Create the BidiAgent
    agent = BidiAgent(
        model=model,
        system_prompt=FASHION_SOP_PROMPT,
    )

    # Create the WebSocket I/O adapters
    ws_input = WebSocketAudioInput(websocket)
    ws_output = WebSocketAudioOutput(websocket)

    receive_task = None
    try:
        # Start the background task that reads from the WebSocket
        receive_task = asyncio.create_task(ws_input.receive_from_websocket())
        # Run the BidiAgent
        await agent.run(
            inputs=[ws_input],
            outputs=[ws_output]
        )
    except Exception as e:
        logger.error(f"语音对话错误: {e}")
        import traceback
        traceback.print_exc()
    finally:
        # Cancel the background reader so it does not outlive the session
        if receive_task is not None:
            receive_task.cancel()
        await agent.stop()
        await websocket.close()
        logger.info("语音对话会话已结束")


# Run the application
if __name__ == "__main__":
    app.run()
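The JSON message format documented in the handler above is simple enough to validate explicitly before audio reaches the agent. The helpers below are a minimal sketch of that idea and are not part of the sample server: `encode_audio_input` and `decode_audio_input` are hypothetical names, and the only thing taken from the article is the `{"type": "audio_input", "audio": "<base64>"}` shape.

```python
import base64
import binascii
import json
from typing import Optional


def encode_audio_input(pcm: bytes) -> str:
    """Wrap raw PCM bytes in the client-to-server message format."""
    return json.dumps({
        "type": "audio_input",
        "audio": base64.b64encode(pcm).decode("ascii"),
    })


def decode_audio_input(raw: str) -> Optional[bytes]:
    """Return the PCM bytes, or None if this is not a valid audio_input message."""
    try:
        message = json.loads(raw)
    except json.JSONDecodeError:
        return None
    if not isinstance(message, dict) or message.get("type") != "audio_input":
        return None
    audio = message.get("audio")
    if not isinstance(audio, str):
        return None
    try:
        # validate=True rejects characters outside the base64 alphabet.
        return base64.b64decode(audio, validate=True)
    except binascii.Error:
        return None


frame = encode_audio_input(b"\x01\x02\x03\x04")
pcm = decode_audio_input(frame)
```

Rejecting malformed messages at the edge keeps a misbehaving client from pushing garbage into the model stream. A production service would probably also cap the message size.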
Deployment Commands
Install the deployment tool
pip install bedrock-agentcore-starter-toolkit
Configure AgentCore
agentcore configure -e ws_server_on_agentcore_runtime.py
Configuring Bedrock AgentCore...
✓ Using file: ws_server_on_agentcore_runtime.py
🏷️ Inferred agent name: ws_server_on_agentcore_runtime
Press Enter to use this name, or type a different one (alphanumeric without '-')
Agent name [ws_server_on_agentcore_runtime]:
✓ Using agent name: ws_server_on_agentcore_runtime
🔍 Detected dependency file: requirements.txt
Press Enter to use this file, or type a different path (use Tab for autocomplete):
Path or Press Enter to use detected dependency file: requirements.txt
✓ Using requirements file: requirements.txt
🚀 Deployment Configuration
Warning: Direct Code Deploy deployment unavailable (zip utility not found). Falling back to
Container deployment.
Select deployment type:
1. Container - Docker-based deployment
✓ Deployment type: Container
🔐 Execution Role
Press Enter to auto-create execution role, or provide execution role ARN/name to use existing
Execution role ARN/name (or press Enter to auto-create):
✓ Will auto-create execution role
🏗️ ECR Repository
Press Enter to auto-create ECR repository, or provide ECR Repository URI to use existing
ECR Repository URI (or press Enter to auto-create):
✓ Will auto-create ECR repository
🔐 Authorization Configuration
By default, Bedrock AgentCore uses IAM authorization.
Configure OAuth authorizer instead? (yes/no) [no]:
✓ Using default IAM authorization
🔒 Request Header Allowlist
Configure which request headers are allowed to pass through to your agent.
Common headers: Authorization, X-Amzn-Bedrock-AgentCore-Runtime-Custom-*
Configure request header allowlist? (yes/no) [no]:
✓ Using default request header configuration
Configuring BedrockAgentCore agent: ws_server_on_agentcore_runtime
💡 No container engine found (Docker/Finch/Podman not installed)
✓ Default deployment uses CodeBuild (no container engine needed), For local builds, install Docker,Finch, or Podman
Memory Configuration
Tip: Use --disable-memory flag to skip memory entirely
✅ MemoryManager initialized for region: us-east-1
Existing memory resources found:
1. claude_agent_quick_start_agentcore_mem-D
ID: claude_agent_quick_start_agentcore_mem-DhXXEsGCAP
2. customer_service_agent_mem-g5464XAoei
ID: customer_service_agent_mem-g5464XAoei
Options:
• Enter a number to use existing memory
• Press Enter to create new memory
• Type 's' to skip memory setup
Your choice:
✓ Short-term memory will be enabled (default)
• Stores conversations within sessions
• Provides immediate context recall
Optional: Long-term memory
• Extracts user preferences across sessions
• Remembers facts and patterns
• Creates session summaries
• Note: Takes 120-180 seconds to process
Enable long-term memory? (yes/no) [no]:
✓ Using short-term memory only
Will create new memory with mode: STM_ONLY
Memory configuration: Short-term memory only
Network mode: PUBLIC
⚠️ Platform mismatch: Current system is 'linux/amd64' but Bedrock AgentCore requires 'linux/arm64',
so local builds won't work.
Please use default launch command which will do a remote cross-platform build using code build.For
deployment other options and workarounds, see:
https://docs.aws.amazon.com/bedrock-agentcore/latest/devguide/getting-started-custom.html
Generated Dockerfile: .bedrock_agentcore\ws_server_on_agentcore_runtime\Dockerfile
Changing default agent from 'fashion-voice-agent' to 'ws_server_on_agentcore_runtime'
╭───────────────────────────────────── Configuration Success ─────────────────────────────────────╮
│ Agent Details │
│ Agent Name: ws_server_on_agentcore_runtime │
│ Deployment: container │
│ Region: us-east-1 │
│ Account: <your_aws_account_id> │
│ │
│ Configuration │
│ Execution Role: Auto-create │
│ ECR Repository: Auto-create │
│ Network Mode: Public │
│ ECR Repository: Auto-create │
│ Authorization: IAM (default) │
│ │
│ │
│ Memory: Short-term memory (30-day retention) │
│ │
│ │
│ Config saved to: ...\.bedrock_agentcore.yaml │
│ │
│ Next Steps: │
│ agentcore launch │
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
Deploy to AWS
agentcore launch
🚀 Launching Bedrock AgentCore (codebuild mode - RECOMMENDED)...
• Build ARM64 containers in the cloud with CodeBuild
• No local Docker required (DEFAULT behavior)
• Production-ready deployment
💡 Deployment options:
• agentcore deploy → CodeBuild (current)
• agentcore deploy --local → Local development
• agentcore deploy --local-build → Local build + cloud deploy
Using existing memory: ws_server_on_agentcore_runtime_mem-ZV5AEuG4kc
Starting CodeBuild ARM64 deployment for agent 'ws_server_on_agentcore_runtime' to account <your_aws_account_id>(us-east-1)
Setting up AWS resources (ECR repository, execution roles)...
Using ECR repository from config: <your_aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/bedrock-agentcore-ws_server_on_agentcore_runtime
Using execution role from config: arn:aws:iam::<your_aws_account_id>:role/AmazonBedrockAgentCoreSDKRuntime-us-east-1-5c42884fc3
Preparing CodeBuild project and uploading source...
⠹ Launching Bedrock AgentCore...Getting or creating CodeBuild execution role for agent: ws_server_on_agentcore_runtime
Role name: AmazonBedrockAgentCoreSDKCodeBuild-us-east-1-5c42884fc3
⠼ Launching Bedrock AgentCore...Reusing existing CodeBuild execution role: arn:aws:iam::<your_aws_account_id>:role/AmazonBedrockAgentCoreSDKCodeBuild-us-east-1-5c42884fc3
⠸ Launching Bedrock AgentCore...Using dockerignore.template with 46 patterns for zip filtering
⠙ Launching Bedrock AgentCore...Uploaded source to S3: ws_server_on_agentcore_runtime/source.zip
⠋ Launching Bedrock AgentCore...Updated CodeBuild project: bedrock-agentcore-ws_server_on_agentcore_runtime-builder
Starting CodeBuild build (this may take several minutes)...
⠴ Launching Bedrock AgentCore...Starting CodeBuild monitoring...
⠇ Launching Bedrock AgentCore...🔄 QUEUED started (total: 0s)
⠸ Launching Bedrock AgentCore...✅ QUEUED completed in 1.2s
🔄 PROVISIONING started (total: 1s)
⠼ Launching Bedrock AgentCore...✅ PROVISIONING completed in 7.3s
🔄 DOWNLOAD_SOURCE started (total: 9s)
⠏ Launching Bedrock AgentCore...✅ DOWNLOAD_SOURCE completed in 1.2s
🔄 PRE_BUILD started (total: 10s)
⠴ Launching Bedrock AgentCore...✅ PRE_BUILD completed in 1.2s
🔄 BUILD started (total: 11s)
⠙ Launching Bedrock AgentCore...✅ BUILD completed in 26.8s
🔄 POST_BUILD started (total: 38s)
⠹ Launching Bedrock AgentCore...✅ POST_BUILD completed in 12.2s
🔄 COMPLETED started (total: 50s)
⠇ Launching Bedrock AgentCore...✅ COMPLETED completed in 1.2s
🎉 CodeBuild completed successfully in 0m 51s
CodeBuild completed successfully
CodeBuild project configuration saved
Deploying to Bedrock AgentCore...
Passing memory configuration to agent: ws_server_on_agentcore_runtime_mem-ZV5AEuG4kc
⠦ Launching Bedrock AgentCore...Agent created/updated: arn:aws:bedrock-agentcore:us-east-1:<your_aws_account_id>:runtime/ws_server_on_agentcore_runtime-JDfa3wF3Jb
Observability is enabled, configuring observability components...
⠏ Launching Bedrock AgentCore...CloudWatch Logs resource policy already configured
⠇ Launching Bedrock AgentCore...X-Ray trace destination already configured
⠙ Launching Bedrock AgentCore...X-Ray indexing rule already configured
Transaction Search already fully configured
⠼ Launching Bedrock AgentCore...ObservabilityDeliveryManager initialized for region: us-east-1, account: <your_aws_account_id>
✅ Logs auto-created by AWS for runtime/ws_server_on_agentcore_runtime-JDfa3wF3Jb
⠇ Launching Bedrock AgentCore...✅ Traces delivery enabled for runtime/ws_server_on_agentcore_runtime-JDfa3wF3Jb
Observability enabled for runtime/ws_server_on_agentcore_runtime-JDfa3wF3Jb - logs: True, traces: True
✅ X-Ray traces delivery enabled for agent ws_server_on_agentcore_runtime-JDfa3wF3Jb
🔍 GenAI Observability Dashboard:
https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#gen-ai-observability/agent-core
Polling for endpoint to be ready...
⠙ Launching Bedrock AgentCore...Agent endpoint: arn:aws:bedrock-agentcore:us-east-1:<your_aws_account_id>:runtime/ws_server_on_agentcore_runtime-JDfa3wF3Jb/runtime-endpoint/DEFAULT
Deployment completed successfully - Agent: arn:aws:bedrock-agentcore:us-east-1:<your_aws_account_id>:runtime/ws_server_on_agentcore_runtime-JDfa3wF3Jb
╭────────────────────────────────────── Deployment Success ───────────────────────────────────────╮
│ Agent Details: │
│ Agent Name: ws_server_on_agentcore_runtime │
│ Agent ARN: │
│ arn:aws:bedrock-agentcore:us-east-1:<your_aws_account_id>:runtime/ws_server_on_agentcore_runtime-JDfa3wF │
│ 3Jb │
│ ECR URI: │
│ <your_aws_account_id>.dkr.ecr.us-east-1.amazonaws.com/bedrock-agentcore-ws_server_on_agentcore_runtime:l │
│ atest │
│ CodeBuild ID: │
│ bedrock-agentcore-ws_server_on_agentcore_runtime-builder:a870da44-c506-4fb0-afbf-12ceb61606b9 │
│ │
│ ARM64 container deployed to Bedrock AgentCore │
│ │
│ Next Steps: │
│ agentcore status │
│ agentcore invoke '{"prompt": "Hello"}' │
│ │
│ CloudWatch Logs: │
│ /aws/bedrock-agentcore/runtimes/ws_server_on_agentcore_runtime-JDfa3wF3Jb-DEFAULT │
│ --log-stream-name-prefix "2025/12/25/[runtime-logs]" │
│ /aws/bedrock-agentcore/runtimes/ws_server_on_agentcore_runtime-JDfa3wF3Jb-DEFAULT │
│ --log-stream-names "otel-rt-logs" │
│ │
│ GenAI Observability Dashboard: │
│ https://console.aws.amazon.com/cloudwatch/home?region=us-east-1#gen-ai-observability/agent-c │
│ ore │
│ │
│ Note: Observability data may take up to 10 minutes to appear after first launch │
│ │
│ Tail logs with: │
│ aws logs tail │
│ /aws/bedrock-agentcore/runtimes/ws_server_on_agentcore_runtime-JDfa3wF3Jb-DEFAULT │
│ --log-stream-name-prefix "2025/12/25/[runtime-logs]" --follow │
│ aws logs tail │
│ /aws/bedrock-agentcore/runtimes/ws_server_on_agentcore_runtime-JDfa3wF3Jb-DEFAULT │
│ --log-stream-name-prefix "2025/12/25/[runtime-logs]" --since 1h │
╰─────────────────────────────────────────────────────────────────────────────────────────────────╯
A successful deployment returns an Agent Runtime ARN similar to:
arn:aws:bedrock-agentcore:us-east-1:<your_aws_account_id>:runtime/ws_server_on_agentcore_runtime-xyz123
Run the following command, substituting your own ARN:
python ws_client.py --agent-arn "arn:aws:bedrock-agentcore:us-east-1:<your_aws_account_id>:runtime/ws_server_on_agentcore_runtime-SR7YV72aok"
Sample voice conversation:
python ws_client.py --agent-arn "arn:aws:bedrock-agentcore:us-east-1:<your_aws_account_id>:runtime/ws_server_on_agentcore_runtime-SR7YV72aok"
==================================================
潮流速递 AI 语音客服 - WebSocket 客户端
==================================================
连接到 AgentCore Runtime: arn:aws:bedrock-agentcore:us-east-1:<your_aws_account_id>:runtime/ws_server_on_agentcore_runtime-SR7YV72aok
直接对着麦克风说话,按 Ctrl+C 结束对话
已连接到服务端
[用户]: 你好
[客服]: 您好!有什么可以帮助您的吗?
[用户]: 我想查看一下订单
[客服]: 请您提供订单号或手机号,我可以帮您查询订单信息。
[用户]: 订单号是123
[客服]: 好的,我帮您查询订单123的信息。
[客服]: 请稍等片刻。
...
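In this example, the Nova 2 Sonic based customer service system supports real-time barge-in: the user can interrupt at any point while the system is speaking to add or correct information. The system picks up the new speech immediately over the bidirectional WebSocket stream and adjusts its subsequent replies, giving a low-latency, highly interactive voice experience.
Behavior Governance and Capability Extension Architecture
To round out and extend the architecture, the system can add a governance layer that constrains and orchestrates agent behavior in one place. In this design Strands Agents is the central "behavior governance hub", cleanly separating model capability from real business capability. Nova 2 Sonic is responsible for speech understanding, semantic reasoning, and intent generation, and does not execute any business action itself. A toolUse emitted by the model is only a structured intent signal; whether it runs, how it runs, and whether it is allowed are all decided by the Strands Agent. This layer is the place to centralize permission checks, policy constraints, retry and fallback, human-in-the-loop intervention, agent-to-agent (A2A) collaboration, and MCP tool governance, which prevents the model from acting beyond its authority or making uncontrolled calls. With this design the voice agent can keep adding and combining new business capabilities while staying inside an auditable, controllable, and evolvable enterprise behavior boundary, which is what it takes to move from "a usable voice model" to "a business agent that can be deployed at scale".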
+--------------------------------------------------------------------+
| Application / UX Layer |
| (Web / Mobile / IVR / Device / UI Event / Audio Stream) |
+--------------------------------------------------------------------+
|
| Audio / Text / Control Events
v
+--------------------------------------------------------------------+
| Bedrock AgentCore Runtime |
| |
| - Session lifecycle |
| - WebSocket / HTTP |
| - promptStart / response / toolUse events |
| - Multi-protocol hosting |
| |
+--------------------------------------------------------------------+
|
| Realtime stream / Orchestration Loop
v
+--------------------------------------------------------------------+
| Strands Agent (Orchestration) |
| |
| Responsibilities: |
| - Receive toolUse from model |
| - Tool selection & routing |
| - Schema validation |
| - Retry / fallback / policy |
| - Human-in-the-loop hooks |
| - A2A (Agent-to-Agent) coordination |
| |
| This is where "agent behavior" lives |
| |
| +--------------------------------------------------------------+ |
| | Nova 2 Sonic (Model) | |
| | | |
| | Responsibilities: | |
| | - Speech to Text understanding | |
| | - Intent reasoning | |
| | - Decide WHEN a tool is needed | |
| | - Emit structured toolUse events | |
| | | |
| | Does NOT execute tools | |
| | Does NOT manage tool lifecycle | |
| +--------------------------------------------------------------+ |
| |
+--------------------------------------------------------------------+
|
+------------------+------------------+ toolUse event
v v v
+--------------------+ +--------------------+ +------------------+
| MCP Tool Server | | A2A Remote Agent | | Local Functions |
| | | | | |
| - Standard MCP API | | - Client / Host | | - APIs / DB / |
| - Tool registry | | - Delegated tasks | | Business logic |
| - Cross-agent use | | - Long workflows | | |
+--------------------+ +--------------------+ +------------------+
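The governance idea in the diagram, where the model only emits a toolUse intent and the orchestration layer decides whether and how it runs, can be sketched as a small gate around tool execution. This is an illustrative assumption, not how Strands Agents implements it: `ToolGate`, its fields, the status strings, and the two demo tools are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, Set


@dataclass
class ToolGate:
    """Decides whether a model-requested tool call may run (hypothetical)."""

    tools: Dict[str, Callable[..., Any]]
    allowed: Set[str]
    needs_human: Set[str] = field(default_factory=set)
    max_attempts: int = 2

    def execute(self, name: str, arguments: Dict[str, Any]) -> Dict[str, Any]:
        # Permission check: unknown or non-allowlisted tools never run.
        if name not in self.tools or name not in self.allowed:
            return {"status": "denied", "reason": f"tool {name!r} is not permitted"}
        # Human-in-the-loop: sensitive tools are parked for approval.
        if name in self.needs_human:
            return {"status": "pending_review", "reason": "human approval required"}
        # Retry with a bounded number of attempts, then fall back to a failure result.
        last_error = ""
        for _ in range(self.max_attempts):
            try:
                return {"status": "ok", "result": self.tools[name](**arguments)}
            except Exception as exc:  # sketch only: retries on any failure
                last_error = str(exc)
        return {"status": "failed", "reason": last_error}


calls = {"count": 0}


def track_order(order_id: str) -> str:
    """Demo tool that fails once, then succeeds."""
    calls["count"] += 1
    if calls["count"] == 1:
        raise TimeoutError("carrier API timed out")
    return f"{order_id}: in transit"


def refund(order_id: str) -> str:
    return f"{order_id}: refunded"


gate = ToolGate(
    tools={"track_order": track_order, "refund": refund},
    allowed={"track_order", "refund"},
    needs_human={"refund"},
)
tracking = gate.execute("track_order", {"order_id": "A1001"})
refund_request = gate.execute("refund", {"order_id": "A1001"})
unknown = gate.execute("delete_account", {})
```

Here the tracking call succeeds on its second attempt, the refund is held for human approval, and the unregistered tool is denied outright. Because every outcome is a structured result rather than an exception, the agent can turn a denial or a pending review into a spoken explanation for the customer.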
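Conclusion
As large models move from the "text era" into the "real-time multimodal era", voice is no longer just a change in interaction form; it is a key entry point for a step change in agent capability.
By combining Amazon Bedrock AgentCore, Nova 2 Sonic, and Strands Agents, fast-fashion e-commerce companies can build a real-time voice customer service system with real commercial value while keeping it secure, stable, and scalable.
This is not only a technology upgrade but also a rethinking of customer experience and operating model.
*The Amazon Web Services generative AI services mentioned above are currently available in Amazon Web Services regions outside China. Cloud services in the Amazon Web Services China regions are operated by NWCD and Sinnet; refer to the China regions official website for details.
About the Author
AWS Architecture Center: Leading Cloud Innovation
Explore the AWS Architecture Center for field-tested best practices and architecture guidance that help you build secure, reliable cloud applications efficiently.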

|
 |