AWS for M&E Blog

Generative AI assists creative workflows with text-guided inpainting and outpainting using Amazon Bedrock – Part 2

Generative AI offers exciting new possibilities for creatives. In this blog series, we explore different generative AI models for image editing. In Part 1, we developed an AI-powered eraser using the Segment Anything Model (SAM) and the Big LaMa model hosted on Amazon SageMaker. In just a few clicks, we can remove unwanted objects from an image. Now let’s take this a step further and extend the solution to include text-guided editing.

Imagine you have a beach scene, but there is an unwanted object such as a vehicle in the foreground. Rather than just erasing it, you can instruct the AI to “replace the vehicle with a sandcastle”. This is called inpainting: the generative AI model generates a realistic sandcastle that blends seamlessly with the rest of the photo. We can also expand the same photo beyond its edges, a technique called outpainting. This can help you change the aspect ratio of the photo and reveal more of the background scene based on your text description, generating convincing new details that blend naturally with the original image.

This blog post extends the solution developed in Part 1 by incorporating text-guided inpainting and outpainting functionality using Amazon Bedrock. This fully managed AWS service provides simple API access to state-of-the-art generative AI models without the need to manage the model or the underlying infrastructure. In particular, we use Stable Diffusion XL (SDXL) and Titan Image Generator (TIG) models available through Amazon Bedrock and demonstrate how flexible and easy Amazon Bedrock’s APIs are to use, enabling you to experiment and choose the best model for your use case.

Solution overview

We demonstrate the solution using the famous painting by Johannes Vermeer, Girl with a Pearl Earring. The image is from Hugging Face samples.

For text-guided inpainting, we replace the girl with a pearl earring with the likeness of the Mona Lisa. To do that, we first need to select the girl in the image, as depicted in Step 1 of the following illustration. Next, in Step 2, the pixel coordinates of the selection and the full image are passed to the SAM model to generate a mask. This mask creation process is identical to the one in Part 1, so we do not repeat it here. The key new step is Step 3. Here, the original image, the mask, and a text prompt are fed into SDXL in Amazon Bedrock. The model uses this information to inpaint a new object into the selected area in a way that blends seamlessly with the rest of the image.

For text-guided outpainting, the steps are slightly different. First, we extend the canvas of the image to the desired size, as depicted in Step 1 of the following image. Next, we generate a mask that covers the extended area in Step 2. Finally, in Step 3, we use the TIG model to outpaint the extended area according to the text instruction.

Example code for this solution is provided in this GitHub repo. The inpaint_outpaint.ipynb notebook was tested in SageMaker Studio with a Python 3 kernel on an ml.t3.medium instance.

Prerequisites

Complete the following prerequisites before experimenting with the code.

  • Enable SDXL 1.0 and Titan Image Generator model access in Amazon Bedrock. Reference the Amazon Bedrock documentation on how to manage model access. A quick availability check is sketched after this list.
  • Create a SageMaker Studio domain. This solution uses Amazon SageMaker Studio, and more specifically Studio notebooks.
  • Enable execution role access to Amazon Bedrock. Use this IAM documentation as a guide.
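
To confirm the first prerequisite, you can list the foundation models available in your Region and check that both model IDs appear. The following is a quick sketch using the boto3 Bedrock control-plane client; note that even when a model is listed, invoke_model still fails with an access error until model access has been granted in the console.

import boto3

bedrock = boto3.client("bedrock")  # control-plane client, separate from bedrock-runtime
available = {m["modelId"] for m in bedrock.list_foundation_models()["modelSummaries"]}

for model_id in ("stability.stable-diffusion-xl", "amazon.titan-image-generator-v1"):
    found = any(listed.startswith(model_id) for listed in available)
    print(model_id, "listed" if found else "not found in this Region")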

Language-guided inpainting

Our first exercise is to replace the girl in the painting with the Mona Lisa. We start our example by downloading an image of Girl with a Pearl Earring from the Hugging Face sample datasets.

from diffusers.utils import load_image

image = load_image(
    "https://hf.co/datasets/huggingface/documentation-images/resolve/main/diffusers/input_image_vermeer.png"
)
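
The remaining snippets in this post also assume a few shared imports, an Amazon Bedrock runtime client, and a small helper that encodes a PIL image as base64. The notebook defines these up front; the following is a minimal sketch of that setup (the image_to_base64 helper, its PNG re-encoding, and the client variable name are assumptions based on how they are used below):

import base64
import io
import json

import boto3
from PIL import Image
from diffusers.utils import make_image_grid

# Amazon Bedrock runtime client used to invoke the models
bedrock_runtime = boto3.client("bedrock-runtime")

def image_to_base64(img):
    """Encode a PIL image as a base64 string for a Bedrock request body."""
    buffer = io.BytesIO()
    img.save(buffer, format="PNG")
    return base64.b64encode(buffer.getvalue()).decode("utf-8")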

The first inpainting step is to select the girl in the image and use the SAM model from Part 1 to generate a mask image. This mask directs the model as to which area should be replaced. To make this step easier, the mask image from Part 1 is also provided in the data folder, so you can use it directly.

mask = Image.open('data/mask.jpg') 

make_image_grid([image, mask], rows=1, cols=2)

Next, we define the textual prompt describing the desired inpainting output, such as “The Mona Lisa wearing a wig”. This prompt allows us to guide the model’s image generation capabilities. We also specify the artistic style preset to further steer the output. For the full list of style presets, please reference the SDXL Documentation.

inpaint_prompt = "The Mona Lisa wearing a wig" 
style_preset = "digital-art" # (e.g. photographic, digital-art, cinematic, ...)

The prompt, input image, mask, and preset are formatted as a JSON request and sent to the Amazon Bedrock runtime to invoke the Stable Diffusion XL model. The model inpaints the masked region based on the given text and style information. The output image is returned in base64 encoding and decoded back into an image format. As depicted, this enables text-controllable inpainting to synthesize novel image content that blends seamlessly with the existing non-masked areas. The same technique can be extended to object removal, image editing, and other applications.


request = json.dumps({
    "text_prompts": [{"text": inpaint_prompt}], # prompt text
    "init_image": image_to_base64(image), # original image encoded in base64
    "mask_source": "MASK_IMAGE_WHITE", # white pixels in the mask mark the area to inpaint
    "mask_image": image_to_base64(mask), # mask image encoded in base64
    "cfg_scale": 10, # how closely the output follows the text (higher = more literal)
    "seed": 10, # seed to make the generation reproducible
    "style_preset": style_preset, # predefined style for SDXL
})

modelId = "stability.stable-diffusion-xl" 
response = bedrock_runtime.invoke_model(body=request, modelId=modelId) 
response_body = json.loads(response.get("body").read()) 
image_2_b64_str = response_body["artifacts"][0].get("base64") 
inpaint = Image.open(io.BytesIO(base64.decodebytes(bytes(image_2_b64_str, "utf-8")))) 

make_image_grid([image, inpaint], rows=1, cols=2)

Review the image before and after.

The previous two images depict the before and after effect from language-guided inpainting. The left image is the original, and the right image is generated from the left using the text prompt while maintaining the same layout and style.

Language-guided outpainting

For outpainting, we extend the Girl with a Pearl Earring image canvas and fill the edges based on the text prompt. This time, we will try a different model from Amazon Bedrock, Titan Image Generator.

First, we create an extended blank canvas at the new target dimensions and paste the original image centered within it. Next, we generate a mask image where the original image area is blacked out and the extended region is white. This indicates the area for the model to fill.

original_width, original_height = image.size 
target_width = 1024 #extended canvas size 
target_height = 1024 
position = ( #position the existing image in the center of the larger canvas 
    int((target_width - original_width) * 0.5), 
    int((target_height - original_height) * 0.5), 
) 

extended_image = Image.new("RGB", (target_width, target_height), (235, 235, 235)) 
extended_image.paste(image, position) 

# create a mask of the extended area
inside_color_value = (0, 0, 0) # black marks the preserved (original image) area
outside_color_value = (255, 255, 255) # white marks the area for the model to fill
mask_image = Image.new("RGB", (target_width, target_height), outside_color_value)
original_image_shape = Image.new(
    "RGB",
    # shrink by 20 pixels per side so a thin border of the original is regenerated and blends with the new content
    (original_width - 40, original_height - 40),
    inside_color_value
)

mask_image.paste(original_image_shape, tuple(x + 20 for x in position))

make_image_grid([extended_image, mask_image], rows=1, cols=2)

The TIG model uses a slightly different set of inputs compared to SDXL. We first need to define the taskType as OUTPAINTING. This is unique to TIG because it also supports other task types. Review the full list of task types here.

The rest of the inputs are similar, meaning you send the encoded image, image mask, text prompt, and other parameters to generate the new image.

# Configure the inference parameters.
request = json.dumps({
    "taskType": "OUTPAINTING",
    "outPaintingParams": {
        "image": image_to_base64(extended_image),
        "maskImage": image_to_base64(mask_image),
        "text": "A girl standing on a grass field in a dark night with stars and a full moon.", # Description of the background to generate
        "outPaintingMode": "DEFAULT", # "DEFAULT" softens the mask. "PRECISE" keeps it sharp.
    },
    "imageGenerationConfig": {
        "numberOfImages": 1, # Number of variations to generate
        "quality": "premium", # Allowed values are "standard" or "premium"
        "width": target_width,
        "height": target_height,
        "cfgScale": 8, # How closely the output follows the text
        "seed": 5763, # Fixed seed for reproducible results
    },
})

modelId = "amazon.titan-image-generator-v1" 
response = bedrock_runtime.invoke_model(body=request, modelId=modelId) 
response_body = json.loads(response.get("body").read()) 
image_bytes = base64.b64decode(response_body["images"][0]) 
outpaint = Image.open(io.BytesIO(image_bytes)) 

make_image_grid([extended_image, outpaint], rows=1, cols=2)

Review the image before and after.

The previous two images show the before and after effect from language-guided outpainting. The left image is the original with an extended canvas area, and the right image is generated from the left using the text prompt.

Note: The quality of the image generated depends on the prompt and other configuration parameters like cfg_scale and seed. You may need to experiment with different prompts or parameters to generate optimal results using these techniques.
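
For example, one simple way to explore these parameters is to generate a small grid of candidates by varying the seed and cfg_scale, then compare the results visually. The following sketch reuses the SDXL inpainting request built earlier (inpaint_prompt, image, mask, and style_preset are the variables defined above):

# Generate a 3x3 grid of inpainting variations across seeds and cfg_scale values
candidates = []
for seed in (1, 10, 42):
    for cfg_scale in (7, 10, 14):
        request = json.dumps({
            "text_prompts": [{"text": inpaint_prompt}],
            "init_image": image_to_base64(image),
            "mask_source": "MASK_IMAGE_WHITE",
            "mask_image": image_to_base64(mask),
            "cfg_scale": cfg_scale,
            "seed": seed,
            "style_preset": style_preset,
        })
        response = bedrock_runtime.invoke_model(
            body=request, modelId="stability.stable-diffusion-xl"
        )
        b64 = json.loads(response["body"].read())["artifacts"][0]["base64"]
        candidates.append(Image.open(io.BytesIO(base64.b64decode(b64))))

# Display the variations side by side for visual comparison
make_image_grid(candidates, rows=3, cols=3)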

Conclusion

This blog post walks through the steps to build a text-guided inpainting and outpainting tool that can enhance creative efficiency for your organization. We took the solution from Part 1 in this blog series and extended it with new text-guided functionality using generative AI models from Amazon Bedrock.

Specifically for inpainting, we used the Stable Diffusion XL model to replace any object within an image according to the text prompt. For outpainting, we experimented with the new Titan Image Generator model to extend images beyond their original borders without losing detail or resolution.

We also demonstrated the flexibility of Amazon Bedrock, where you can swiftly experiment with different models without the burden of managing your own models and infrastructure. Test the sample code today, and be sure to share the creative marvels you produce using Amazon Bedrock with your AWS account team and colleagues.

James Wu

James Wu is a Senior AI/ML Specialist Solutions Architect at AWS, helping customers design and build AI/ML solutions. James’s work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in the marketing and advertising industries.

Harish Rajagopalan

Harish Rajagopalan is a Senior Solutions Architect at Amazon Web Services. Harish works with enterprise customers and helps them with their cloud journey. Harish is a part of the SME community for AI/ML at AWS.

Deepti Tirumala

Deepti Tirumala is a Senior Solutions Architect at Amazon Web Services, specializing in Machine Learning and Generative AI technologies. With a passion for helping customers advance their AWS journey, she works closely with organizations to architect scalable, secure, and cost-effective solutions that leverage the latest innovations in these areas.