Stable Diffusion Quick Kit Hands-On: SDXL Dreambooth Training and Inference in a SageMaker Training Job

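This post is part of the Stable Diffusion Quick Kit blog series. Stable Diffusion Quick Kit (https://github.com/aws-samples/sagemaker-stablediffusion-quick-kit) is a toolkit for quickly deploying Stable Diffusion models on Amazon SageMaker. It includes sample code, service deployment scripts, and a front-end UI that help you stand up a prototype Stable Diffusion service quickly. For Quick Kit basics and for using and fine-tuning models such as Stable Diffusion ControlNet and LoRA, see the articles linked in the appendix.

In this post we show how to run Dreambooth fine-tuning for Stable Diffusion XL (SDXL below) in a SageMaker Training Job and, once training completes, how to deploy the model with the open-source Stable Diffusion WebUI framework for immediate inference. The result is a single pipeline and business workflow that covers both training and inference.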

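1. Background

1.1 Dreambooth fine-tuning

Dreambooth is a way of fine-tuning Stable Diffusion models. You provide an instance_prompt that names the subject (e.g. a person or a physical object) together with instance images for fine-tuning. Training takes the UNet and VAE networks of the original SD model and binds the instance prompt to the instance images. Whenever a later prompt contains the keyword from the instance prompt, the generated image preserves the subject from the instance images, giving high-fidelity results for people and products.

Quick Kit already implements Dreambooth training for Stable Diffusion 1.x, so we do not repeat the details of how Dreambooth works here. Interested readers can refer to the earlier Quick Kit post "Stable Diffusion Quick Kit Hands-On: Optimization Practices for Dreambooth Fine-Tuning on SageMaker".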

1.2 Stable Diffusion WebUI

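Stable Diffusion WebUI is an open-source visual application built on Stable Diffusion. On top of txt2img and img2img generation, WebUI adds many extensions that enhance Stable Diffusion's image generation, such as Ultimate Upscale and Inpaint, so developers can load and invoke Stable Diffusion models conveniently through the UI or through API calls.

Compared with inference through the Diffusers SDK, WebUI exposes richer call parameters and supports more extensions, so for the same model its output can be better than Diffusers in some scenarios. This is why many customers currently use the WebUI API for inference.

For details on Stable Diffusion WebUI, see its official GitHub documentation.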

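1.3 Training + inference business scenario

In the Stable Diffusion fine-tuning and inference scenarios we have encountered, B2B customers typically upload the images to train on, use Dreambooth to train on people (such as models or digital humans) or products (such as bags or clothing), and then use the trained model to batch-generate images for posters, advertisements, logos, and similar creative assets. This process does not need real-time interactive generation as an app would; it is an offline, asynchronous process.

In this case we can install both the fine-tuning and the inference frameworks on the training instance and use a SageMaker Training Job to put fine-tuning and inference in one job: as soon as fine-tuning finishes, the model is loaded for inference. The whole pipeline of fine-tuning (Dreambooth) plus inference (WebUI API) is completed in one pass, with inference folded into the training job, so no separate model serving endpoint has to be deployed.

In addition, SageMaker Training Job supports Spot instances. When the training job finishes, image generation has finished as well and the machine resources are released, which further reduces the overall cost.

The following sections describe in detail how to run SDXL Dreambooth training and inference in a SageMaker Training Job.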

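2. SDXL Dreambooth Fine-tune in a SageMaker Training Job

2.1 Dreambooth training framework

For Stable Diffusion 1.x there were several open-source Dreambooth fine-tuning frameworks. With SDXL, the Diffusers team at HuggingFace released a LoRA-based Dreambooth fine-tuning script. Its code is simpler than the 1.x versions, it uses a newer xformers acceleration library with support for Flash Attention v2, and its PyTorch version has been upgraded to 2.0 or later.

This post therefore uses the official Diffusers Dreambooth script, which is more general and extensible, for fine-tuning. The official code lives in the diffusers repo, and the examples/dreambooth/ directory contains the Dreambooth training code.

Within that directory, train_dreambooth_lora_sdxl.py is the script for Dreambooth fine-tuning.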

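2.2 SageMaker Training Job scripts

In the SageMaker Training Job, we clone the official diffusers repo from the previous section as the source directory for the training scripts, package dependencies such as xformers and deepspeed into the Docker training image, and launch the training script on the training instance through a shell entry point.

The steps are as follows:

Prepare the source directory and clone the official code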

!rm -rf sd_xl_dreambooth
!mkdir -p sd_xl_dreambooth
!cd sd_xl_dreambooth && git clone https://github.com/huggingface/diffusers

Build the Docker image for the training job (we use the Amazon-provided HuggingFace DLC container with PyTorch 2.0.0 + CUDA 11.8 as the base image, matching the PyTorch/CUDA versions used by the official diffusers code)

Write the Dockerfile

FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04

RUN pip install wandb
RUN pip install xformers==0.0.18
RUN pip install bitsandbytes

ENV LANG=C.UTF-8
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE

Build the image and push it to the Amazon ECR repository

#!/usr/bin/env bash
repo_name="sd_xl_dreambooth_finetuning"
algorithm_name=${repo_name}

account=$(aws sts get-caller-identity --query Account --output text)

# Get the region defined in the current configuration (default to us-west-2 if none defined)
region=$(aws configure get region)
region=${region:-us-west-2}

fullname="${account}.dkr.ecr.${region}.amazonaws.com/${algorithm_name}:latest"

# If the repository doesn't exist in ECR, create it.
aws ecr describe-repositories --repository-names "${algorithm_name}" > /dev/null 2>&1

if [ $? -ne 0 ]
then
    aws ecr create-repository --repository-name "${algorithm_name}" > /dev/null
fi

# Get the login command from ECR and execute it directly
aws ecr get-login-password --region ${region}|docker login --username AWS --password-stdin ${fullname}

# Build the docker image locally with the image name and then push it to ECR
# with the full name.

docker build -t ${algorithm_name} .
docker tag ${algorithm_name} ${fullname}

docker push ${fullname}

Prepare the training images. Here we use the images from the official example dataset

from huggingface_hub import snapshot_download

local_dir = "./dog"
snapshot_download(
    "diffusers/dog-example",
    local_dir=local_dir, repo_type="dataset",
    ignore_patterns=".gitattributes",
)
!chmod -R 777 ./sd_xl_dreambooth
!./sd_xl_dreambooth/s5cmd sync ./dog/ $images_s3uri

The image data is uploaded to the S3 path $images_s3uri so that the SageMaker Training Job can pull it.
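If s5cmd is not at hand, the same upload could be sketched with boto3. The helper names `plan_uploads` and `upload_images` are hypothetical and not part of Quick Kit, and the bucket and prefix in the usage note are placeholders. boto3 is imported lazily so that the key-planning step can be checked without AWS credentials.

```python
from pathlib import Path
from urllib.parse import urlparse


def plan_uploads(local_dir, s3_uri):
    """Map every file under local_dir to a (path, bucket, key) triple below s3_uri."""
    parsed = urlparse(s3_uri)
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"not an S3 URI: {s3_uri}")
    prefix = parsed.path.strip("/")
    root = Path(local_dir)
    plan = []
    for path in sorted(root.rglob("*")):
        if path.is_file():
            rel = path.relative_to(root).as_posix()
            key = f"{prefix}/{rel}" if prefix else rel
            plan.append((path, parsed.netloc, key))
    return plan


def upload_images(local_dir, s3_uri):
    """Upload the planned files; needs boto3 and valid AWS credentials."""
    import boto3  # lazy import: plan_uploads stays usable without boto3

    s3 = boto3.client("s3")
    for path, bucket, key in plan_uploads(local_dir, s3_uri):
        s3.upload_file(str(path), bucket, key)
```

A call such as `upload_images("./dog", "s3://<your-bucket>/dreambooth-xl/images/")` would then place the files under the prefix that the `images` channel of the Training Job reads from.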

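Launch the Training Job with a SageMaker Estimator
Write the training script. We use a shell entry point here, which makes it easy to call the official diffusers script and to pass environment variables.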

%%writefile ./sd_xl_dreambooth/train.sh


mkdir -p /tmp/dog
ls -lt ./
chmod 777 ./s5cmd


cd diffusers && pip install -e .
cd examples/dreambooth/ && pip install -r requirements_sdxl.txt

cp -r /opt/ml/input/data/images/* /tmp/dog/

export MODEL_NAME="stabilityai/stable-diffusion-xl-base-1.0"
export INSTANCE_DIR="/tmp/dog/"
export OUTPUT_DIR="/tmp/ouput"
#export OUTPUT_DIR="/opt/ml/model/"
export VAE_PATH="madebyollin/sdxl-vae-fp16-fix"
export dreambooth_s3uri="s3://sagemaker-us-west-2-687912291502/stable-diffusion/dreambooth/"

accelerate launch /opt/ml/code/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py \
  --gradient_checkpointing \
  --use_8bit_adam \
  --pretrained_model_name_or_path=$MODEL_NAME  \
  --instance_data_dir=$INSTANCE_DIR \
  --pretrained_vae_model_name_or_path=$VAE_PATH \
  --output_dir=$OUTPUT_DIR \
  --mixed_precision="fp16" \
  --instance_prompt="a photo of sks dog" \
  --resolution=1024 \
  --train_batch_size=1 \
  --gradient_accumulation_steps=4 \
  --learning_rate=1e-5 \
  --report_to="tensorboard" \
  --lr_scheduler="constant" \
  --lr_warmup_steps=0 \
  --max_train_steps=500 \
  --validation_prompt="A photo of sks dog in a bucket" \
  --validation_epochs=25 \
  --seed="0" \
  --enable_xformers_memory_efficient_attention

/opt/ml/code/s5cmd sync /tmp/ouput/ $dreambooth_s3uri/output/$(date +%Y-%m-%d-%H-%M-%S)/
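As a quick sanity check on the flags above, the effective batch size and the number of epochs implied by --max_train_steps can be worked out by hand. The sketch below does that arithmetic for a single GPU. The function name is hypothetical, the figure of 5 instance images is an assumption for illustration, and the formulas follow our reading of how the training script derives epochs from steps, so treat the result as an estimate.

```python
import math


def training_schedule(num_images, train_batch_size,
                      gradient_accumulation_steps, max_train_steps):
    """Estimate epochs implied by a step budget on a single GPU."""
    # Batches the dataloader yields in one pass over the instance images
    batches_per_epoch = math.ceil(num_images / train_batch_size)
    # One optimizer update is made every gradient_accumulation_steps batches
    updates_per_epoch = math.ceil(batches_per_epoch / gradient_accumulation_steps)
    epochs = math.ceil(max_train_steps / updates_per_epoch)
    return {
        "effective_batch_size": train_batch_size * gradient_accumulation_steps,
        "updates_per_epoch": updates_per_epoch,
        "epochs": epochs,
    }


# Flags from train.sh above, with an assumed 5 instance images
schedule = training_schedule(num_images=5, train_batch_size=1,
                             gradient_accumulation_steps=4, max_train_steps=500)
print(schedule)
```

With these inputs the effective batch size is 4 and the 500-step budget corresponds to roughly 250 passes over the images, which helps when judging whether --validation_epochs=25 runs validation often enough.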

We launch the Training Job through the PyTorch Estimator in the SageMaker SDK.

from sagemaker.pytorch.estimator import PyTorch

# role, bucket, account_id, region_name and repo_name are assumed to be
# defined earlier in the notebook.

environment = {
    'PYTORCH_CUDA_ALLOC_CONF': 'max_split_size_mb:32'
}

## The image URI that was built and pushed above
image_uri = "{}.dkr.ecr.{}.amazonaws.com/{}:latest".format(account_id, region_name, repo_name)
base_job_name = 'sd-xl-dreambooth-finetuning-high'
instance_type = 'ml.g5.2xlarge'
inputs = {
    'images': f"s3://{bucket}/dreambooth-xl/images/"
}
max_run = 24 * 60 * 60 * 2  # maximum training time in seconds (2 days)

estimator = PyTorch(role=role,
                    entry_point='train.sh',
                    source_dir='./sd_xl_dreambooth/',
                    base_job_name=base_job_name,
                    instance_count=1,
                    instance_type=instance_type,
                    image_uri=image_uri,
                    environment=environment,
                    # Warm pool. Per the SageMaker documentation it cannot be combined
                    # with managed Spot training, so enable only one of the two.
                    # keep_alive_period_in_seconds=3600,
                    disable_profiler=True,
                    debugger_hook_config=False,
                    use_spot_instances=True,  # named train_use_spot_instance in SDK v1
                    max_run=max_run,
                    max_wait=max_run,  # required with Spot; must be >= max_run
                    )

estimator.fit(inputs)

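We do not go into further detail here on using the SageMaker Estimator SDK or configuring the shell entry point. Readers who want to learn more can consult the official Amazon SageMaker SDK documentation.

2.3 Dreambooth training parameters

The training parameters for SDXL Dreambooth fine-tuning are similar to those of the earlier 1.x version; see the configuration in the Quick Kit Dreambooth post for details. The main parameters newly introduced by the Diffusers script and SageMaker are described below:

'images': f"s3://{bucket}/dreambooth-xl/images/": the Dreambooth fine-tuning images prepared in the previous step. The S3 path is passed through the inputs parameter, and SageMaker automatically downloads the training images under that path to the /opt/ml/input/data/images directory on the training instance
keep_alive_period_in_seconds: configures the SageMaker Training Job warm pool. When set, the training instance is kept in a resource pool for the user after the job ends, so later SDXL Dreambooth training jobs avoid the time spent provisioning instances and pulling the image. Per the SageMaker documentation, warm pools cannot be combined with managed Spot training, so use one or the other
enable_xformers_memory_efficient_attention: enables the memory-efficient (flash attention style) attention computation in xformers, which speeds up training
use_spot_instances (named train_use_spot_instance in SageMaker Python SDK v1): whether to train on Spot instances to further reduce cost
max_run: the maximum running time of the training job, in seconds
max_wait: the maximum total time to wait for the job, including the time spent waiting for Spot capacity. It is required when Spot instances are used and must be at least max_run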
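The constraints among these parameters can be checked before a job is submitted. The sketch below encodes them as we understand the SageMaker documentation: max_wait is required with Spot and must be at least max_run, warm pools are not available together with Spot, and the keep-alive period is capped at 3600 seconds. The function name is hypothetical and the rules are assumptions to verify against the current documentation.

```python
def check_training_settings(use_spot, max_run, max_wait=None, keep_alive=None):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    if use_spot:
        if max_wait is None:
            problems.append("max_wait is required when Spot instances are used")
        elif max_wait < max_run:
            problems.append("max_wait must be greater than or equal to max_run")
        if keep_alive:
            # Assumption: warm pools cannot be used with managed Spot training
            problems.append("warm pools (keep_alive_period_in_seconds) "
                            "cannot be combined with Spot")
    # Assumption: the keep-alive period is limited to one hour
    if keep_alive is not None and not 0 <= keep_alive <= 3600:
        problems.append("keep_alive_period_in_seconds must be between 0 and 3600")
    return problems


two_days = 24 * 60 * 60 * 2
print(check_training_settings(use_spot=True, max_run=two_days, max_wait=two_days))
```

Running the check in the notebook before `estimator.fit` gives a clearer message than a validation error returned by the service.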

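3. Installing and deploying Stable Diffusion WebUI in a SageMaker Training Job

As described above, once training completes we can run inference directly with the fine-tuned model. We use Stable Diffusion WebUI for inference here, which requires installing the open-source WebUI on the training instance, syncing the model directory to WebUI's model location, and then calling the WebUI txt2img/img2img API to generate images. The details are as follows:

3.1 Docker image script

Because inference runs inside the training job, we extend the training job's Dockerfile and package the Stable Diffusion WebUI component and its dependencies together with the training Dockerfile shown above, in the same way: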

FROM 763104351884.dkr.ecr.us-west-2.amazonaws.com/huggingface-pytorch-training:2.0.0-transformers4.28.1-gpu-py310-cu118-ubuntu20.04

################ stable diffusion webui ##########################
ENV DEBIAN_FRONTEND=noninteractive
ENV PATH="/opt/ml/code:${PATH}"
ENV PYTHONPATH="/opt/ml/code"
ENV COMMANDLINE_ARGS="--skip-torch-cuda-test"

# webui dependency packages
RUN apt-get update && \
    apt-get install --assume-yes apt-utils vim wget git libgl1-mesa-glx && \
    rm -rf /var/lib/apt/lists/* && \
    pip install \
        opencv-python-headless \
        sagemaker-training \
        boto3==1.26.64 \
        uvicorn \
        sagemaker \
        diffusers==0.14.0 \
        accelerate==0.17.0 \
        controlnet_aux \
        wheel bitsandbytes \
        GPUtil \
        nvidia-ml-py \
        pynvml \
        clip-interrogator==0.6.0 \
        spacy \
        retrying \
        piexif \
        supervision==0.6.0 \
        roboflow \
        sagemaker-ssh-helper \
        chardet

WORKDIR /opt/ml/code

# clone the WebUI code into the image under /opt/ml/code/third-package
RUN mkdir -p /opt/ml/code/third-package
RUN chmod 755 /opt/ml/code/third-package

# NOTE: the commands below expect env.py and the two extensions
# (sd-webui-segment-anything, sd-webui-controlnet) to exist in the checkout.
RUN git clone https://github.com/AUTOMATIC1111/stable-diffusion-webui /opt/ml/code/third-package/stable-diffusion-webui && \
    python /opt/ml/code/third-package/stable-diffusion-webui/env.py && \
    pip install -r /opt/ml/code/third-package/stable-diffusion-webui/extensions/sd-webui-segment-anything/requirements.txt && \
    pip install -r /opt/ml/code/third-package/stable-diffusion-webui/extensions/sd-webui-controlnet/requirements.txt

############ dreambooth fine tune #################################
RUN pip install wandb
RUN pip install xformers==0.0.18
RUN pip install bitsandbytes

ENV LANG=C.UTF-8
ENV PYTHONUNBUFFERED=TRUE
ENV PYTHONDONTWRITEBYTECODE=TRUE

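In the Dockerfile above, WebUI takes more steps to install than Dreambooth (Dreambooth fine-tuning only needs its requirements installed), since WebUI also needs the env script, the model directory, and so on. We install Stable Diffusion WebUI in a separate /opt/ml/code/third-package directory to keep it apart from the Dreambooth training source directory. Also note that the versions of shared dependencies such as diffusers and accelerate need to be aligned between the two.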
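One way to spot such version mismatches early is to compare the pinned packages of the two environments. The sketch below is a minimal illustration: the function names are hypothetical, only `name==version` pins are considered, and the training-side versions in the usage example are made up for demonstration.

```python
def parse_pins(lines):
    """Collect name -> version for lines of the form 'name==version'."""
    pins = {}
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "==" in line:
            name, version = line.split("==", 1)
            # Normalize so that 'controlnet_aux' and 'controlnet-aux' match
            pins[name.strip().lower().replace("_", "-")] = version.strip()
    return pins


def find_conflicts(webui_reqs, training_reqs):
    """Return {package: (webui_version, training_version)} for differing pins."""
    a, b = parse_pins(webui_reqs), parse_pins(training_reqs)
    return {name: (a[name], b[name])
            for name in sorted(a.keys() & b.keys())
            if a[name] != b[name]}


webui_reqs = ["diffusers==0.14.0", "accelerate==0.17.0", "boto3==1.26.64"]
training_reqs = ["diffusers==0.21.0", "accelerate==0.17.0"]  # hypothetical pins
print(find_conflicts(webui_reqs, training_reqs))
```

Packages that appear without a pin on either side are ignored here, so an empty result does not by itself guarantee that the installed versions agree.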

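3.2 WebUI startup script

Using the same build & push script as in the previous section, build and push the Docker image. Then, in the unified training and inference entry point script, start the training task and launch WebUI once it completes.

A reference version of the WebUI startup script start_sd_webui.py follows: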

import subprocess
import os
import time
import requests
import json
import logging
from tenacity import retry, stop_after_attempt, wait_exponential

logging.basicConfig(level=logging.INFO, filename='./webui.log', filemode='a')
logger = logging.getLogger(__name__)

# Retry the probe with exponential backoff (10 to 100 s between attempts, at most 5 attempts)
@retry(wait=wait_exponential(multiplier=1, min=10, max=100), stop=stop_after_attempt(5))
def check_server(server_url):
    """Send a small txt2img request; raises if the WebUI API is not ready."""
    txt2img_url = server_url + "/sdapi/v1/txt2img"
    data = {
        'prompt': 'A photo of sks dog in a bucket',
        'sampler_index': 'DPM++ SDE',
        'seed': 1234,
        'steps': 20,
        'width': 512,
        'height': 512,
        'cfg_scale': 8
    }
    response = requests.post(txt2img_url, data=json.dumps(data),
                             headers={"Content-Type": "application/json"})
    response.raise_for_status()

def start_stable_diffusion(server_url="http://0.0.0.0:7860", log_file="./webui.log"):
    try:
        with open(log_file, "a") as f:
            # --port must match the port used in server_url
            process = subprocess.Popen(
                ["python", "/opt/ml/code/third-package/stable-diffusion-webui/launch.py", "--port", "7860",
                 "--xformers", "--ckpt-dir", "/tmp/ouput/", "--api", "--listen"],
                stdout=f, stderr=subprocess.STDOUT, preexec_fn=os.setpgrp)
        time.sleep(100)  # give WebUI time to start and load the model
        # poll() returns an exit code only if the process has already terminated
        if process.poll() is not None:
            raise RuntimeError("Failed to start stable diffusion process.")
        check_server(server_url)
        logger.info("stable diffusion server started.")
        return server_url
    except Exception as error:
        logger.error(f"stable diffusion server failed, {error}")
        raise RuntimeError("Failed to start stable diffusion server or server not responding.")

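Here /tmp/ouput/, the directory passed to --ckpt-dir (spelled as in the scripts), is the model output directory of the Dreambooth training above. check_server verifies that the server is listening by sending a test txt2img request; if the request succeeds, the WebUI component is up inside the training job and API inference calls can begin.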
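Before sending a full txt2img request, a cheaper first check is whether anything is listening on the WebUI port at all. The sketch below uses only the standard library; the function name and the default timings are hypothetical and are not part of start_sd_webui.py.

```python
import socket
import time


def wait_for_port(host, port, timeout=300.0, interval=2.0):
    """Return True once a TCP connection to host:port succeeds, False on timeout."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means some process is listening on the port
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, `wait_for_port("127.0.0.1", 7860)` could replace a fixed sleep before the API probe, assuming WebUI was launched on port 7860. An open port only shows that the process is accepting connections, so the txt2img test request is still needed to confirm that the model has loaded.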

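4. Inference on the fine-tuned Dreambooth model in a SageMaker Training Job

After start_sd_webui.py has started the WebUI server, you can call the WebUI API for txt2img/img2img inference. The inference API takes the same parameters as the official WebUI API.

Because the call is made on the same training instance, the URL is the local host (0.0.0.0 in the sample code) plus the port and the API path prefix.

A reference example of the inference call in start_sd_webui.py follows: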

# Part of start_sd_webui.py: relies on the imports and logger defined there.
def txt2image(server_url="http://0.0.0.0:7860", max_retries=5):
    txt2img_url = server_url + "/sdapi/v1/txt2img"
    data = {
        'prompt': '1 girl',
        'sampler_index': 'DPM++ SDE',
        'seed': 1234,
        'steps': 20,
        'width': 512,
        'height': 512,
        'cfg_scale': 8
    }
    for attempt in range(1, max_retries + 1):
        try:
            response = requests.post(txt2img_url, data=json.dumps(data),
                                     headers={"Content-Type": "application/json"})
            response.raise_for_status()
            logger.info("txt2img request succeeded.")
            # The JSON body carries the generated images as base64 strings under "images"
            return response.json()
        except Exception as error:
            logger.error(f"txt2img attempt {attempt}/{max_retries} failed, {error}")
            time.sleep(10)  # brief pause before the next attempt
    raise RuntimeError("txt2img request failed after all retries.")
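The txt2img response carries the generated images as base64-encoded strings in its "images" field, so they still have to be decoded and written to disk before they can be synced to S3. The sketch below shows one way to do that. The function name, the output directory and the file naming scheme are hypothetical.

```python
import base64
from pathlib import Path


def save_images(result, out_dir, prefix="txt2img"):
    """Decode the base64 images in a txt2img response and write them as PNG files."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    paths = []
    for index, encoded in enumerate(result.get("images", [])):
        # Tolerate an optional "data:image/png;base64," prefix
        raw = base64.b64decode(encoded.split(",", 1)[-1])
        path = out / f"{prefix}_{index:03d}.png"
        path.write_bytes(raw)
        paths.append(path)
    return paths
```

Given the parsed JSON body of a txt2img response in `result`, `save_images(result, "/tmp/generated")` would write one PNG per returned image, and that directory could then be synced to S3 in the same way as the model output.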

The combined SageMaker Training Job entry script for training plus inference is as follows:

%%writefile ./sd_xl_dreambooth/train.sh

... (omitted)
##############sdxl dreambooth finetune#################### 
accelerate launch /opt/ml/code/diffusers/examples/dreambooth/train_dreambooth_lora_sdxl.py \
... (omitted)

/opt/ml/code/s5cmd sync /tmp/ouput/ $dreambooth_s3uri/output/$(date +%Y-%m-%d-%H-%M-%S)/

##############webui startup & inference##################
cd /opt/ml/code/third-package && python start_sd_webui.py
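The shell script above runs its stages one after another and, as written, does not stop if training fails, so WebUI could start against an empty model directory. The same sequencing could be sketched in Python with an explicit stop on the first failure. The function name is hypothetical and the stage commands in the usage note are placeholders for the real training, sync and inference commands.

```python
import subprocess


def run_stages(stages):
    """Run (name, argv) stages in order and stop at the first non-zero exit code."""
    completed = []
    for name, argv in stages:
        result = subprocess.run(argv)
        if result.returncode != 0:
            raise RuntimeError(
                f"stage {name!r} failed with exit code {result.returncode}")
        completed.append(name)
    return completed
```

A call might look like `run_stages([("train", ["bash", "train_only.sh"]), ("infer", ["python", "start_sd_webui.py"])])`, where train_only.sh is a hypothetical script containing only the training and sync steps. Adding `set -e` at the top of the shell script is a simpler alternative with a similar effect.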

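5. Summary

This post showed how to use a SageMaker Training Job in Quick Kit to fine-tune SDXL with Dreambooth and then run inference on the fine-tuned model with Stable Diffusion WebUI once training completes. This integrates training and inference into one operation and meets customers' need to quickly train on people or products and batch-generate images.

The scripts and notebook training example in this post can serve as an engineering reference for upgrading an AIGC ML platform's Dreambooth fine-tuning and inference from SD 1.5/2.0 to SDXL.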

Appendix

Stable Diffusion XL https://arxiv.org/pdf/2307.01952.pdf
diffusers release note
ControlNet Canny https://huggingface.co/diffusers/controlnet-canny-sdxl-1.0-small
ControlNet Depth https://huggingface.co/diffusers/controlnet-depth-sdxl-1.0-small
Stable Diffusion Quick Kit Hands-On: Basics
Stable Diffusion Quick Kit Hands-On: LoRA Fine-Tuning and Inference in SageMaker

AWS Official Blog