Amazon AWS Official Blog

Unleash the Power of Claude2 and ComfyUI: A GenAI Visual Pipeline Built on Amazon Bedrock and SageMaker

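Background

Claude2

In late September 2023, Amazon Web Services announced that Anthropic's Claude2 model had joined Amazon Bedrock and was generally available. Claude2 is one of the strongest competitors to GPT-4, and its training corpus extends to early 2023. It can process up to 100,000 tokens in a single conversation, which makes it particularly strong at summarization and other tasks that involve very long text. In addition, many practitioners in film and games find that Claude2 writes stories and plays characters in a more human-like and accurate way.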

ComfyUI

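ComfyUI is a node-based workflow WebUI built around Stable Diffusion generative models. By breaking the Stable Diffusion process into individual nodes, it allows finer-grained pipeline customization and makes results easier to reuse. Compared with SD WebUI, ComfyUI's node workflow has a steeper learning curve, so it is less widely adopted. Even so, in the author's view, ComfyUI shows the following advantages in domain-specific projects such as games, and it is steadily gaining popularity: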

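  • Node workflows can be shared as JSON files or as images, which improves a project team's efficiency in both process and quality
  • Thanks to optimizations inside the nodes, overall image generation is roughly 10%-20% faster than in SD WebUI
  • When upscaling or generating large images, it is less likely to exhaust GPU memory and return a black image
  • Mainstream modules such as ControlNet and LoRA are already supported, and missing modules can be installed as custom_nodes
  • The node-based way of working feels natural to game studios with a UE Blueprints background
  • Because workflows are stored as structured data, they can be manipulated from any programming language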
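
To make the last point concrete, here is a minimal sketch of editing a workflow as plain data. It assumes the API-format layout in which each node id maps to a `class_type` and an `inputs` dictionary; the node ids `"3"` and `"6"` and the helper name `patch_workflow` are illustrative only and are not taken from the project.

```python
import copy
import json

# A tiny stand-in for an API-format workflow; real exports contain many more nodes.
WORKFLOW_JSON = """
{
  "3": {"class_type": "KSampler", "inputs": {"seed": 1, "steps": 20}},
  "6": {"class_type": "CLIPTextEncode", "inputs": {"text": "placeholder"}}
}
"""


def patch_workflow(workflow, prompt_node, prompt, sampler_node, seed):
    """Return a copy of the workflow with a new prompt text and seed."""
    patched = copy.deepcopy(workflow)  # leave the loaded template untouched
    patched[prompt_node]["inputs"]["text"] = prompt
    patched[sampler_node]["inputs"]["seed"] = seed
    return patched


workflow = json.loads(WORKFLOW_JSON)
patched = patch_workflow(workflow, "6", "oil painting, shoreline", "3", 4410130)
print(json.dumps(patched, indent=2))
```

Working on a deep copy keeps the original template reusable, so the same file can be patched once per scene.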

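Architecture

This article builds on Amazon Web Services. The Claude2 model on Bedrock drives the pipeline by producing the key prompts, which are fed into ComfyUI running on SageMaker, and the resulting video assets are stored in S3. Together these form a pipeline that generates video automatically.

The project is open source at https://github.com/xiwan/SageMaker-ComfyUI/. Feel free to download it and try it out.

The solution is deployed with one click through CloudFormation. The main architecture is shown in the diagram below, and it contains the following components: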

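  • SageMaker Notebook: a notebook instance based on g5.2xlarge that holds the project's runtime environment and core code
  • Claude2: the LLM on Amazon Bedrock
  • S3: stores images and videos
  • ComfyUI: a website with a node-based interface for running GenAI inference
  • Ngrok: third-party reverse proxy software that makes ComfyUI reachable from outside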
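
As a rough illustration of the S3 step, the sketch below uploads generated clips from an output directory. The helper names, the `comfyui-outputs` key prefix, and the file suffixes are assumptions made for this example; the only real API it relies on is the boto3 S3 client's `upload_file(Filename, Bucket, Key)` method, and the client is passed in so the logic can be exercised without AWS credentials.

```python
from pathlib import Path


def s3_key_for(clip_path, prefix="comfyui-outputs"):
    """Build an S3 object key such as 'comfyui-outputs/clip.gif'."""
    return f"{prefix.strip('/')}/{Path(clip_path).name}"


def upload_outputs(s3_client, bucket, output_dir, suffixes=(".gif", ".mp4", ".png")):
    """Upload every matching file in output_dir and return the keys written."""
    keys = []
    for path in sorted(Path(output_dir).iterdir()):
        if path.is_file() and path.suffix.lower() in suffixes:
            key = s3_key_for(path)
            # upload_file is the boto3 S3 client method (Filename, Bucket, Key).
            s3_client.upload_file(str(path), bucket, key)
            keys.append(key)
    return keys
```

In the notebook this could be called as `upload_outputs(boto3.client("s3"), "my-bucket", "/home/ec2-user/SageMaker/outputs")`, where `my-bucket` is a placeholder for a bucket you own.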

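Implementation Steps

Apply for an Ngrok Authtoken

Ngrok is a third-party reverse proxy application that provides a convenient network endpoint through secure and reliable tunneling into a private network. Before ComfyUI can be made publicly accessible, we need to request a free Authtoken from the official Ngrok website (https://ngrok.com/).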

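Install with CloudFormation

After obtaining the installation template, fill in the following parameters:

  • StackName: the CloudFormation stack name, for easier management
  • NotebookInstanceName: the name of the SageMaker notebook
  • NotebookInstanceType: the SageMaker notebook instance type; the g5 family is recommended
  • VolumeSizeInGB: the SageMaker notebook disk size; 300 GB or more is recommended
  • SageMakerIAMRole: if left blank, a new notebook execution role is created
  • DefaultCodeRepository: the project's GitHub URL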
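
If you prefer to script the deployment instead of using the console, the parameters above map onto the list-of-dicts shape that CloudFormation's API expects. The sketch below only builds that list. The notebook name is hypothetical, and the `ml.` prefix on the instance type is an assumption based on how SageMaker names instance types; check the template for the values it actually accepts.

```python
def build_stack_parameters(values):
    """Convert a plain dict into CloudFormation's ParameterKey/ParameterValue list."""
    return [
        {"ParameterKey": key, "ParameterValue": str(value)}
        for key, value in values.items()
    ]


params = build_stack_parameters({
    "NotebookInstanceName": "comfyui-notebook",  # hypothetical name
    "NotebookInstanceType": "ml.g5.2xlarge",     # assumed spelling, see note above
    "VolumeSizeInGB": 300,
    "SageMakerIAMRole": "",                      # blank: let the stack create a role
    "DefaultCodeRepository": "https://github.com/xiwan/SageMaker-ComfyUI/",
})
print(params)
```

The resulting list can be passed as `Parameters=params` to `boto3.client("cloudformation").create_stack(...)`, along with `StackName` and the template, which are arguments of the call itself rather than template parameters.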

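After 5-10 minutes, the link to the notebook appears in the CloudFormation Outputs or in SageMaker:

Once inside, we can see that the project source code has already been downloaded. Its contents are:

  • comfyui-Sagemaker-notebook.ipynb: installs and runs ComfyUI
  • comfyui-Bedrock-Claude2-notebook.ipynb: the pipeline in which Claude2 calls ComfyUI
  • workflows: ComfyUI video generation pipelines
  • langchain_tasks: langchain template files
  • utils: Python scripts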

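Install and Run ComfyUI

  • Go to /home/ec2-user/SageMaker/SageMaker-ComfyUI, open comfyui-Sagemaker-notebook.ipynb, and run the cells in order
  • Put the Ngrok authtoken obtained earlier in the corresponding place
  • Fetch the ComfyUI project and install its dependencies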
    %cd $WORKING_DIR/ComfyUI
    
    !pip install -r requirements.txt
    !pip install torch torchvision
    
    !pip install pyngrok
    
  • Download VAEs, Checkpoints, ControlNets, LoRAs, and so on as needed
  • Recommended ComfyUI plugins: comfyUI-manager, animatediff, upscaler, and so on
  • Run the Ngrok agent and ComfyUI
    # Replace the placeholder with your own Ngrok authtoken.
    ngrok_token = "NGROK_AUTHTOKEN"
    
    from threading import Timer
    from queue import Queue
    from pyngrok import ngrok
    
    def ngrok_tunnel(port, queue, auth_token):
        # Authenticate, open a tunnel to the local port, and report the tunnel.
        ngrok.set_auth_token(auth_token)
        url = ngrok.connect(port)
        queue.put(url)
    
    # Open the tunnel to port 8188 after a 2-second delay, then print it.
    ngrok_output_queue = Queue()
    ngrok_thread = Timer(2, ngrok_tunnel, args=(8188, ngrok_output_queue, ngrok_token))
    ngrok_thread.start()
    ngrok_thread.join()
    print(ngrok_output_queue.get())
    
    %cd $WORKING_DIR/ComfyUI
    !python main.py --preview-method auto --enable-cors-header --use-pytorch-cross-attention --disable-xformers
    
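  • Once it starts successfully, we get a public URL (the blue text in the output; on the first visit you may see a warning page, which you can dismiss and continue)
  • To stop serving externally, stop the SageMaker notebook
  • To reclaim the resources, find the corresponding CloudFormation stack and delete it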
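
Because the cell that launches `main.py` keeps running, it can be handy to check from a second notebook that ComfyUI is accepting connections before sending it any work. The following is a minimal sketch using only the standard library; the function name and the default timeout values are assumptions for illustration, and port 8188 is the one used in the tunnel setup above.

```python
import socket
import time


def wait_for_port(host="127.0.0.1", port=8188, timeout=120.0, interval=1.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            time.sleep(interval)
    return False
```

For example, `wait_for_port()` could be called at the top of the Claude2 notebook, which would then stop early with a clear message if it returns `False`.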

Test the Pipeline

Once the ComfyUI site is up, you can run the following tests (drag an image onto ComfyUI to use its workflow)

txt2img

txt2gif

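For more detailed ComfyUI usage, see: https://github.com/comfyanonymous/ComfyUI_examples

Programmable Pipeline

We will produce the video with the pipeline below. You can download this image (https://github.com/xiwan/SageMaker-ComfyUI/blob/main/workflows/workflow-animate.png) and import it into ComfyUI.

Before using the programmable pipeline, turn on ComfyUI's Enable Dev mode Options setting, then save the workflow in API format.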

https://github.com/xiwan/SageMaker-ComfyUI/blob/main/workflows/workflow_api_txt2gif.json
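
Once a workflow is saved in API format, it can be queued over HTTP by posting it to ComfyUI's `/prompt` endpoint, which is the approach ComfyUI's own API example script takes. The sketch below is one way to do that with the standard library. The function names and the default server address are assumptions for illustration, and the project's own `utils` module may do this differently.

```python
import json
import uuid
from urllib import request


def build_prompt_request(workflow, server="http://127.0.0.1:8188", client_id=None):
    """Build (but do not send) the POST request that queues a workflow."""
    payload = {"prompt": workflow, "client_id": client_id or uuid.uuid4().hex}
    data = json.dumps(payload).encode("utf-8")
    # Supplying data makes urllib issue a POST request.
    return request.Request(
        f"{server}/prompt",
        data=data,
        headers={"Content-Type": "application/json"},
    )


def queue_workflow(workflow, server="http://127.0.0.1:8188"):
    """Send the workflow to a running ComfyUI server and return its JSON reply."""
    with request.urlopen(build_prompt_request(workflow, server)) as resp:
        return json.loads(resp.read())
```

Keeping request construction separate from sending makes it easy to inspect the payload before a GPU job is queued.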

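Install and Configure Claude2

One important reason for choosing Claude2: in a Claude2 vs. GPT4 comparison on the question of a guide to the Tiananmen flag-raising ceremony, Claude2's answer covered the most important times and places, while GPT4 produced only some simple text without any concrete figures.

Go to /home/ec2-user/SageMaker/SageMaker-ComfyUI, open the notebook comfyui-Bedrock-Claude2-notebook.ipynb, and run the cells in order.

Test code for generating a short clip; the output directory is /home/ec2-user/SageMaker/outputs/: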

from utils import comfyui_api
        
test_prompt = """
    (A_Rostov_Style:0.7), rough brush strokes:1.3, oil painting, soft lighting
    portrait, John Wick, shoreline, ocean, skyline, windy, daylight,
    soothing tones, calm colors, 
    hdr, (intricate details, hyperdetailed:1.15)
        """
seed = 4410130 # random.randint(1000000, 9999999)

comfyui_api.generate_clip(test_prompt, seed)

Install the dependencies required by Bedrock and Claude2

# dependencies
!pip install --no-build-isolation --force-reinstall --quiet \
    "boto3>=1.28.57" \
    "awscli>=1.29.57" \
    "botocore>=1.31.57"

# text
!pip install --quiet \
    langchain==0.0.309 \
    "transformers>=4.24,<5" \
    sqlalchemy -U \
    "faiss-cpu>=1.7,<2" \
    "pypdf>=3.8,<4" \
    pinecone-client \
    apache-beam \
    datasets \
    tiktoken \
    "ipywidgets>=7,<8" \
    matplotlib 

# agents
!pip install --quiet \
    duckduckgo-search  \
    yfinance  \
    pandas_datareader  \
    langchain_experimental \
    pysqlite3 \
    google-search-results

# entity extraction
!pip install --quiet beautifulsoup4

# image
!pip install --quiet "pillow>=9.5,<10"

Test whether Amazon Bedrock is set up correctly

import json
import os
import sys

import boto3
import botocore

module_path = ".."
sys.path.append(os.path.abspath(module_path))
from utils import bedrock, print_ww

boto3_bedrock = bedrock.get_bedrock_client()

#boto3_bedrock.list_foundation_models()
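
To go one step beyond listing models, a direct call to Claude2 might look like the sketch below. It assumes a `bedrock-runtime` client (for example `boto3.client("bedrock-runtime")`), the `anthropic.claude-v2` model id, and the Human/Assistant text-completion request format; the helper names and default sampling values are hypothetical, and the project's `bedrock` helper may wrap this differently.

```python
import json


def build_claude_body(user_text, max_tokens=1024, temperature=0.5):
    """Serialize a request body in the Human/Assistant text-completion format."""
    return json.dumps({
        "prompt": f"\n\nHuman: {user_text}\n\nAssistant:",
        "max_tokens_to_sample": max_tokens,
        "temperature": temperature,
    })


def ask_claude(runtime_client, user_text, model_id="anthropic.claude-v2"):
    """Call invoke_model on a bedrock-runtime client and return the completion."""
    response = runtime_client.invoke_model(
        modelId=model_id,
        body=build_claude_body(user_text),
        accept="application/json",
        contentType="application/json",
    )
    # The response body is a stream that must be read and decoded as JSON.
    return json.loads(response["body"].read())["completion"]
```

Passing the client in as an argument keeps the function easy to test with a stub before any billable call is made.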

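Prompt Engineering

Once all of the above succeeds, we can have Claude2 generate prompts for us. For example, here I want to generate a series of related storyboard themes on the subject "John Wick cannot fall asleep". This requires a prompt template; see the story_prompt_generator.txt template for reference.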

As an experienced film script master, you are required to write a continuous storyboard for '{idea}' The following requirements should be followed:

# Your Objective:

- Use a bullet-point outline format.
- Divide the paragraphs into a maximum of {number} scenes, ensuring continuity in both visuals and story between each scene.
- Express each sentence using only keywords, not full sentences.
- Clearly describe the character's full name and action in each sentence

# Samples:
- JOHN WICK drives to cemetery 
- JOHN WICK walks down path 
- JOHN WICK stops at grave
- Close up JOHN WICK crying
- Flashback happy JOHN WICK and WIFE
- JOHN WICK watches lowering casket
- People tossing roses  
- JOHN WICK sitting alone grieving 
- JOHN WICK placing rose on casket
- JOHN WICK walking away alone

# Ideation Catalysts:

Pull from the above examples and infuse your creativity. Think of how you might visualize literature's most iconic scenes, reimagine historic events, or even translate music into visual art. The possibilities are endless. Dive deep, and let's create together!

Please generate the output should be listed in bulletin.

Continuing to run the code then yields a series of actions that fit this theme:

['- JOHN WICK lying in bed staring at ceiling', '- Clock reads 3AM ', '- JOHN WICK gets up frustrated', '- JOHN WICK makes coffee', '- JOHN WICK drinks coffee looking out window ', '- City streets empty and quiet', '- JOHN WICK does pushups trying to tire himself', '- JOHN WICK takes shower ', '- JOHN WICK gets back in bed', '- JOHN WICK tosses and turns ', '- JOHN WICK takes sleeping pills', "- Pills don't work", '- JOHN WICK makes warm milk ', '- JOHN WICK drinks milk while watching TV', '- JOHN WICK dozes off on couch', '- JOHN WICK startles awake ', '- JOHN WICK punches pillow in frustration', '- Sun starts to rise ', '- JOHN WICK gives up and makes breakfast', '- JOHN WICK drinks more coffee looking tired', '- JOHN WICK starts his day exhausted']
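
Claude2 returns free text, so the bullets have to be pulled out of the reply before they can be passed on. The sketch below shows one simple way to do that; the function name is hypothetical, and unlike the list shown above it strips the leading "- " from each item.

```python
def parse_bullets(text):
    """Extract the text of each '- ' bullet line, skipping blanks and other lines."""
    scenes = []
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("- "):
            scenes.append(stripped[2:].strip())
    return scenes


sample = """Here is the storyboard:

- JOHN WICK lying in bed staring at ceiling
- Clock reads 3AM
- JOHN WICK gets up frustrated
"""
print(parse_bullets(sample))
```

Ignoring lines that are not bullets makes the parser tolerant of any introductory sentence the model adds before the list.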

Next, we pass this list to the next prompt template file, midjourney_prompt_generator.txt, which generates the prompts.

Assume the role of a seasoned photographer in a future where AI drives art. Collaborate with me to craft intricate prompts tailored for Midjourney, an AI-art generator converting words into mesmerizing visuals.

# Your Objective:

Transform basic ideas into detailed, evocative prompts, maximizing Midjourney's potential:
- Emphasize nouns and adjectives, specifying image content and style.
- Infuse references from pop culture, iconic artists, and specific artistic mediums.
- For every concept, devise two unique prompt variations.

# Sample Prompts:

PROMPT EXAMPLE:
Conjoined twins, side attachment, grungy, high contrast, cinematic ambiance, ultra-realism, deep hues, --ar 16:9 --q 2
PROMPT EXAMPLE:
Twins, divergent expressions, chiaroscuro lighting, moody, in the style of Annie Leibovitz, --ar 16:9 --q 2
PROMPT EXAMPLE:
Full-body blonde, brown jacket, DSLR shot, Canon EOS 5D, EF 50mm lens, ISO: 32,000, Shutter: 8000 second
PROMPT EXAMPLE:
Profile view, blonde woman, casual denim, city backdrop, Nikon D850, prime 85mm lens, --ar 3:4 --q 2
PROMPT EXAMPLE:
Crimson sunset over sea at dusk, vivid, lifelike, wide-angle, depth, dynamic illumination --ar 7:4
PROMPT EXAMPLE:
Twilight horizon, sea meeting sky, moody blue palette, reminiscent of Hiroshi Sugimoto seascapes --ar 7:4
PROMPT EXAMPLE:
White-haired girl, car filled with flowers, inspired by Rinko Kawauchi, naturalistic poses, vibrant floral overflow, Fujifilm XT4 perspective --q 2 --v 5 --ar 3:2
PROMPT EXAMPLE:
Male figure, vintage convertible, cascade of autumn leaves, evoking Chris Burkard's aesthetics, retro vibrancy, Canon EOS R6 capture --q 2 --v 5 --ar 16:9
PROMPT EXAMPLE:
Detailed shot, astronaut beside a serene lake, neon geometry backdrop, reflections, night ambiance, Fujifilm XT3 capture
PROMPT EXAMPLE:
Astronaut, hovering drone lights, misty lake morning, ethereal, shot on Sony Alpha 7R IV
PROMPT EXAMPLE:
Super Mario sprinting, Mushroom Kingdom panorama, ultra-high res, 3D rendition, trending game visuals --v 5.2 --ar 2:3 --s 250 --q 2
PROMPT EXAMPLE:
Sonic dashing, Green Hill Zone, dynamic motion blur, Sega Genesis retro feel, vibrant and iconic --ar 2:3
PROMPT EXAMPLE:
Hyper-detailed photo, mason jar containing a nebula, cosmic fusion with mundane, Sony a9 II, wide-angle, sci-fi inspiration --ar 16:9
PROMPT EXAMPLE:
Crystal ball, galaxy swirling within, juxtaposed against a bedroom setting, Canon EOS R5, sharp foreground, dreamy background --ar 16:9 --s 500
PROMPT EXAMPLE:
Pixar-inspired render, cat's seaside adventure, vibrant tones echoing "Finding Nemo", playful antics, sunny ambiance --v 5.2 --ar 9:16
PROMPT EXAMPLE:
DreamWorks-style art, dog's beach day out, hues reminiscent of "Madagascar", lively, waves crashing playfully --v 5.2 --stylize 1000 --ar 21:9
PROMPT EXAMPLE:
Vivid skyscraper, bustling city, classic cartoon blend with photo-realistic landscape, rich textures, bygone and modern melding, bustling streets --ar 101:128 --s 750 --niji 5
PROMPT EXAMPLE:
Gothic cathedral, steampunk city backdrop, Monet-inspired skies, urban vibrancy meets historic reverence, bustling marketplaces --ar 101:128 --niji 5
PROMPT EXAMPLE:
Cinematic frame, man in military attire, post-apocalyptic LA, overgrown streets, IMAX 65mm perspective, sunlit --ar 21:9 --style raw
PROMPT EXAMPLE:
Cinematic portrayal, female survivor, desert city remnants, sun setting, IMAX 65mm vision, golden tones --ar 21:9 --style raw
PROMPT EXAMPLE:
Futuristic sunglasses, cyberpunk essence, 3D data particles surrounding, 8K brilliance, neon interplay --style raw --ar 16:9

# Ideation Catalysts:

Pull from the above examples and infuse your creativity. Think of how you might visualize literature's most iconic scenes, reimagine historic events, or even translate music into visual art. The possibilities are endless. Dive deep, and let's create together!

Please generate {number} short Midjourney prompts about: {idea}, the output should be listed in bulletin.

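The two examples above show that a prompt needs to describe the business requirement fairly specifically. A prompt engineering format that I have found effective is:

  • Role description + task description
  • Task constraints
  • Reference output examples
  • Embedded replaceable parameters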
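
The four-part format can be sketched as a small helper that assembles a template and then fills in its parameters. The function name and the shortened wording are illustrative only; the real templates live in the langchain_tasks folder. Note that `str.format` treats every brace as a placeholder, so this approach assumes the examples themselves contain no literal braces.

```python
def build_template(role, constraints, examples, request_line):
    """Join role, constraints, examples, and the final request into one template."""
    constraint_block = "\n".join(f"- {c}" for c in constraints)
    example_block = "\n".join(f"- {e}" for e in examples)
    return (
        f"{role}\n\n"
        f"# Your Objective:\n\n{constraint_block}\n\n"
        f"# Samples:\n{example_block}\n\n"
        f"{request_line}"
    )


template = build_template(
    role="As an experienced film script master, write a storyboard for '{idea}'.",
    constraints=["Use a bullet-point outline format.", "Use at most {number} scenes."],
    examples=["JOHN WICK drives to cemetery", "JOHN WICK walks down path"],
    request_line="List the output as bullets.",
)
# The placeholders survive assembly and are filled in only at this point.
prompt = template.format(idea="John Wick cannot fall asleep", number=10)
print(prompt)
```

Keeping the placeholders in the template until the last step lets one template serve many ideas and scene counts.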

Conversion results:

- JOHN WICK lying in bed staring at ceiling

- John Wick resting in a dim bedroom, staring pensively at the ceiling, dramatic lighting, high contrast, black and white tones, cinematic perspective --ar 16:9 --q 2

https://github.com/xiwan/SageMaker-ComfyUI/blob/main/Images/1_706098217.gif

- Clock reads 3AM

- Extreme closeup of an analog clock face, hands pointing to 3 o'clock, dim lighting, cinematic mood, highly detailed, depth of field, --ar 1:1

https://github.com/xiwan/SageMaker-ComfyUI/blob/main/Images/2_149437121.gif
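
The generated prompts carry Midjourney-style flags such as `--ar 16:9`, which a Stable Diffusion workflow does not interpret as text. The source does not say how the project handles them, so the following is only one possible sketch: it separates the flags from the descriptive text and turns an aspect ratio into a width and height. The function names, the 768-pixel long side, and the rounding to multiples of 8 are all assumptions.

```python
import re

# Matches '--name value' pairs such as '--ar 16:9' or '--q 2'.
FLAG_PATTERN = re.compile(r"--(\w+)\s+(\S+)")


def split_flags(prompt):
    """Return (text_without_flags, {flag_name: value})."""
    flags = dict(FLAG_PATTERN.findall(prompt))
    text = FLAG_PATTERN.sub("", prompt).strip().rstrip(",").strip()
    return text, flags


def size_from_ratio(ratio, long_side=768, multiple=8):
    """Turn 'W:H' into (width, height) with the longer side equal to long_side."""
    w, h = (float(part) for part in ratio.split(":"))
    if w >= h:
        width, height = long_side, long_side * h / w
    else:
        width, height = long_side * w / h, long_side

    def snap(value):
        # Round to the nearest multiple so the size suits latent-space models.
        return int(round(value / multiple) * multiple)

    return snap(width), snap(height)


text, flags = split_flags(
    "Extreme closeup of an analog clock face, cinematic mood, --ar 16:9 --q 2"
)
print(text, flags, size_from_ratio(flags.get("ar", "1:1")))
```

The cleaned text could then go into the workflow's prompt node, and the computed size into whichever node sets the image dimensions.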

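Summary

This article presents a solution for quickly setting up Claude2 together with ComfyUI. It programmatically combines an LLM with a video generation model so that they deliver more value in real business scenarios.

We are pleased to see that Claude2, which can process 100,000 tokens at a time, helps us quickly parse and understand text prompts of any length and generate the high-quality output we need. Claude2's output can then drive ComfyUI's video generation process, and replacing only a few key parameters is enough to produce customized videos. With this solution, we combined an LLM with a video generation model effectively, which opens up more possibilities for real business use.

Based on the author's experience, a few points are worth emphasizing:

  1. Prompt engineering has a large effect on the final output; the quality of the prompts directly affects the entire result.
  2. A well-configured ComfyUI pipeline can greatly speed up production and lower the cost of trial and error.
  3. All of the settings above only reduce the unpredictability of the LLM or SD. Real production still requires repeated re-rolls ("drawing cards"), and the human factor remains important.

Finally, the author also used this pipeline to make a short steampunk-themed video: https://www.bilibili.com/video/BV1SB4y1Z7JY/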


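*The specific Amazon Web Services generative AI services mentioned above are available only in Amazon Web Services regions outside China. Amazon Web Services China introduces these services solely to help you understand cutting-edge industry technology and develop your overseas business.

References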

Amazon Bedrock

https://docs.aws.amazon.com/zh_cn/bedrock/latest/userguide/api-setup.html

How to run ComfyUI on Amazon SageMaker Notebook

https://medium.com/@dminhk/3-easy-steps-to-run-comfyui-on-amazon-sagemaker-notebook-c9bdb226c15e

ComfyUI Examples

https://github.com/comfyanonymous/ComfyUI_examples

Animatediff

https://animatediff.github.io/

Prompt Template Examples

https://blog.langchain.dev/the-prompt-landscape/

About the Author

万曦

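Solutions Architect at Amazon Web Services, responsible for consulting on and designing the architecture of cloud computing solutions based on Amazon Web Services. A firm believer in the AWS Builder culture, with more than 12 years of game development experience, involvement in the management and development of several game projects, and deep understanding of and insight into the games industry.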