Amazon Polly Update – Time-Driven Prosody and Asynchronous Synthesis

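I hope you have been enjoying the Polly-powered audio included with recent posts on this blog, such as the DeepLens Challenge and the Storage Gateway overview. As part of my writing process, I now listen to synthesized speech of my draft posts to get a better sense of how the content flows.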

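Today we are launching two new features for Amazon Polly:

Time-Driven Prosody – You can now specify the desired duration of the synthesized speech that corresponds to some or all of the input text.

Asynchronous Synthesis – You can now process large amounts of text and store the synthesized speech in Amazon S3 with a single call.

Both features are available now, and you can start using them today. Let's dig in!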

Time-Driven Prosody

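Suppose you are creating a multilingual version of a video or a self-running presentation. After writing the script in one language and recording the video, you use Amazon Translate and Amazon Polly to create audio tracks in other languages. To keep each language in sync with the visual content, you need fine-grained control over the duration of each segment. That's where this new feature comes in: you can now specify the maximum desired duration for any segment, and Polly adjusts the speech rate to keep each segment within that length.
If I use Amazon Polly's Joanna voice with no other options, the English text of the last part of that paragraph produces 19 seconds of audio: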

<speak>
  In order to keep each language in sync with the visual content, 
  you need to exercise fine-grained control over the duration of
  each segment. That's where this new feature comes in. You can 
  now specify the maximum desired duration for any desired segments, 
  counting on Polly to adjust the speech rate in order to limit 
  the length of each segment.
</speak>

I can use the <prosody> tag to limit the duration to 15 seconds:

<speak>
  <prosody amazon:max-duration="15s">
    In order to keep each language in sync with the visual content, 
    you need to exercise fine-grained control over the duration of
    each segment. That's where this new feature comes in. You can 
    now specify the maximum desired duration for any desired segments, 
    counting on Polly to adjust the speech rate in order to limit 
    the length of each segment.
  </prosody>
</speak>

I can use multiple <prosody> tags for finer control over the duration:

<speak>
  <prosody amazon:max-duration="10s">
    In order to keep each language in sync with the visual content, 
    you need to exercise fine-grained control over the duration of
    each segment. 
  </prosody>
  <prosody amazon:max-duration="7s">
    That's where this new feature comes in. You can now specify 
    the maximum desired duration for any desired segments, 
    counting on Polly to adjust the speech rate in order to limit 
    the length of each segment.
  </prosody>
</speak>

The Spanish version of my text (courtesy of Amazon Translate) is slightly longer, so under the same 15-second limit it plays noticeably faster:

<speak>
  <prosody amazon:max-duration="15s">
    Para mantener cada idioma sincronizado con el contenido
    visual, es necesario ejercer un control detallado sobre
    la duración de cada segmento. Ahí es donde entra esta 
    nueva característica. Ahora puede especificar la 
    duración máxima deseada para los segmentos deseados, 
    contando con que Polly ajuste la velocidad de voz para 
    limitar la longitud de cada segmento.
  </prosody>
</speak>

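The text inside each time-limited <prosody> tag must be no more than 1500 characters, and these tags cannot be nested (inner tags are ignored). To keep the audio comprehensible, Polly speeds it up by at most 5x.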
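The rules above can be checked before any text is sent to Polly. Below is a minimal Python sketch. It assumes the 1500-character limit applies to the plain text inside each tag before XML escaping. The hypothetical helper timed_ssml wraps (text, seconds) pairs in sibling, non-nested <prosody amazon:max-duration> tags. The hypothetical required_speedup checks whether a measured natural duration can fit a limit within the 5x cap. Neither function is part of any AWS SDK.

```python
from xml.sax.saxutils import escape

# Limits stated in the text. Counting characters before XML escaping
# is an assumption of this sketch, not documented Polly behavior.
MAX_PROSODY_CHARS = 1500
MAX_SPEEDUP = 5.0


def timed_ssml(segments):
    """Build SSML from (text, max_seconds) pairs (hypothetical helper).

    Each pair gets its own <prosody amazon:max-duration> tag. The tags are
    siblings, never nested, because Polly ignores nested time-limited tags.
    """
    parts = []
    for text, seconds in segments:
        text = " ".join(text.split())  # collapse line breaks and extra spaces
        if len(text) > MAX_PROSODY_CHARS:
            raise ValueError(
                f"segment has {len(text)} characters; limit is {MAX_PROSODY_CHARS}"
            )
        # Whole seconds only: a simplification chosen for this sketch.
        if isinstance(seconds, bool) or not isinstance(seconds, int) or seconds <= 0:
            raise ValueError("max duration must be a positive whole number of seconds")
        parts.append(
            f'<prosody amazon:max-duration="{seconds}s">{escape(text)}</prosody>'
        )
    return "<speak>" + "".join(parts) + "</speak>"


def required_speedup(natural_seconds, max_seconds):
    """Return (factor, fits_cap) for squeezing natural_seconds into max_seconds.

    natural_seconds is the measured length of the unconstrained audio.
    A factor of 1.0 means no speed-up is needed.
    """
    factor = max(natural_seconds / max_seconds, 1.0)
    return factor, factor <= MAX_SPEEDUP
```

For the Joanna example, required_speedup(19, 15) gives a factor of about 1.27, well within the 5x cap.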

Asynchronous Synthesis

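This feature lets you process up to 100,000 characters of text in a single asynchronous request, making it easier to use Polly to generate speech for long-form content such as articles or book chapters. The synthesized speech is delivered to the S3 bucket of your choice, and failure notifications can optionally be routed to the Amazon Simple Notification Service (SNS) topic of your choice. The generated audio can be up to 6 hours long and is usually ready within a few minutes. In addition to the 100,000 characters of text, each request can include up to 100,000 more characters of Speech Synthesis Markup Language (SSML) markup.
Each asynchronous request creates a new speech synthesis task. You can start and manage tasks from the Polly Console, the CLI (start-speech-synthesis-task), or the API (StartSpeechSynthesisTask).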
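Before starting a task, it can help to check a document against the request limits above. Below is a minimal Python sketch with these assumptions:

It approximates the text/markup split by treating everything inside <...> as markup.
It defaults to the Joanna voice and MP3 output.
It only builds the keyword arguments for boto3's Polly client method start_speech_synthesis_task (the SDK counterpart of StartSpeechSynthesisTask) and does not call AWS.

The helper names are hypothetical. The bucket, prefix, and SNS topic are placeholders you supply.

```python
import re

# Per-request limits described above.
MAX_TEXT_CHARS = 100_000
MAX_MARKUP_CHARS = 100_000

_TAG = re.compile(r"<[^>]*>")


def count_ssml_chars(ssml):
    """Split an SSML string into (text_chars, markup_chars).

    Rough approximation (an assumption of this sketch): everything inside
    <...> counts as markup, everything else as text. Entities such as
    &amp; are counted character by character as text.
    """
    markup = sum(len(tag) for tag in _TAG.findall(ssml))
    return len(ssml) - markup, markup


def synthesis_task_params(ssml, bucket, prefix, voice="Joanna", topic_arn=None):
    """Build kwargs for boto3 polly.start_speech_synthesis_task (hypothetical helper)."""
    text_chars, markup_chars = count_ssml_chars(ssml)
    if text_chars > MAX_TEXT_CHARS:
        raise ValueError(f"{text_chars} text characters exceeds {MAX_TEXT_CHARS}")
    if markup_chars > MAX_MARKUP_CHARS:
        raise ValueError(f"{markup_chars} markup characters exceeds {MAX_MARKUP_CHARS}")
    params = {
        "Text": ssml,
        "TextType": "ssml",
        "VoiceId": voice,
        "OutputFormat": "mp3",
        "OutputS3BucketName": bucket,
        "OutputS3KeyPrefix": prefix,
    }
    if topic_arn:
        # Optional SNS topic for task notifications.
        params["SnsTopicArn"] = topic_arn
    return params
```

With boto3 installed and AWS credentials configured, the result can be passed as boto3.client("polly").start_speech_synthesis_task(**params). The response's SynthesisTask entry includes a TaskId. You can poll get_speech_synthesis_task(TaskId=...) with that TaskId until TaskStatus is completed or failed.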

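To test this feature, I created a plain-text version of an AWS book I wrote (now thoroughly out of date) and inserted some SSML tags, turning it into valid XML along the way. Then I opened the Polly Console, clicked Text-to-Speech, pasted in the XML, and clicked Synthesize to S3.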
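Plain text has to become well-formed XML before Polly will accept it as SSML. Below is a minimal sketch of that preparation step. It assumes paragraphs in the source are separated by blank lines. The hypothetical helper text_to_ssml escapes XML special characters and wraps each paragraph in an SSML <p> tag inside <speak>. It then uses Python's xml.etree.ElementTree to confirm the result parses before it is pasted into the console or submitted as a task.

```python
import re
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape


def text_to_ssml(plain_text):
    """Convert blank-line-separated plain text into SSML (hypothetical helper)."""
    # Split on blank lines, tolerating spaces or \r on the empty line.
    paragraphs = re.split(r"\n\s*\n", plain_text)
    # Collapse internal whitespace and escape &, < and > for XML.
    body = "".join(
        f"<p>{escape(' '.join(p.split()))}</p>" for p in paragraphs if p.strip()
    )
    ssml = f"<speak>{body}</speak>"
    ET.fromstring(ssml)  # raises xml.etree.ElementTree.ParseError if malformed
    return ssml
```

The well-formedness check suits this plain output. Amazon-specific markup such as amazon:max-duration uses a prefix that is never declared, so a namespace-aware parser like ElementTree rejects it as an unbound prefix. The check would need adjusting if such tags are added afterward.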