AWS for M&E Blog

Super slow motion video creation using generative AI on AWS

Slow motion video is an integral part of media productions, particularly sports broadcasting, where it provides viewers with an enhanced experience. Capturing action in slow motion allows the audience to see details that are not observable at full speed. However, recording real-time video at high frame rates requires expensive specialized cameras. Even then, extreme slow motion (more than 240 frames per second) can be jittery and of low quality.

Generative AI provides an innovative solution to create super slow-motion video by interpolating synthetic frames between real frames. Models like Frame Interpolation for Large Motion (FILM) analyze motions between input frames to generate new transitional frames that create seamless ultra-high frame rate slow motion. This approach also scales using cloud services like Amazon SageMaker to deliver low-latency performance for near real-time use cases.

In this blog post, we demonstrate how to leverage FILM and Amazon SageMaker to generate super slow-motion video from standard footage. We start by showing how to package the model, then deploy it at scale using an Amazon SageMaker Asynchronous Inference endpoint, and then invoke the endpoint to process the video frames. Finally, we assemble the original and synthesized frames into a high frame rate slow motion video. Figure 1 and Figure 2 show side-by-side comparison videos of the original media and 3x slow motion (3 times more frames per second) media.

Figure 1 – Original media

Figure 2 – 3x slow motion media

 

Architecture overview

Figure 3 depicts a reference architecture for building an end-to-end super slow-motion workflow on Amazon Web Services (AWS). Use it to understand how these AWS services fit into the bigger picture, and as a starting point to build on as needed.

Figure 3 – Reference architecture of Super Slow Motion using generative AI on AWS

Architecture steps

  1. Invoke an Amazon API Gateway RESTful API endpoint, authenticated with AWS Identity and Access Management (IAM).
  2. Amazon API Gateway invokes an AWS Lambda function to process the request.
  3. AWS Lambda function uploads model artifacts (FILM model) and endpoint configuration to an Amazon Simple Storage Service (Amazon S3) bucket and creates an Amazon SageMaker Asynchronous Inference endpoint.
  4. The Amazon SageMaker Asynchronous Inference endpoint is used to run the FILM model.
  5. Upload a short video to an Amazon S3 bucket for processing.
  6. An Amazon S3 event triggers an AWS Step Functions state machine execution to process the request.
  7. An AWS Lambda function extracts frames from the video and stores them in the S3 bucket.
  8. An AWS Lambda function creates an inference job by invoking the SageMaker Asynchronous inference endpoint where the FILM model interpolates new frames. The state machine execution is paused and waits for a job completion status.
  9. SageMaker Inference endpoint sends job status to Amazon Simple Notification Service (Amazon SNS).
  10. The state machine execution resumes where an AWS Lambda function encodes all new frames to create a slow motion video for storage in the S3 bucket.

This blog post focuses on the model hosting and management portion (steps 4 and 5) of the reference architecture described in Figure 3. We also share a Jupyter notebook and sample code, which can be run on Amazon SageMaker Studio. The Jupyter notebook code is published on GitHub.

Deployment prerequisites

  • You need an AWS account. Make sure your AWS identity has the requisite permissions, including the ability to create SageMaker resources (Domain, Model, and Endpoints) as well as Amazon S3 access to upload model artifacts. Alternatively, you can attach the AmazonSageMakerFullAccess managed policy to your IAM user or role.
  • This notebook is tested using the default Python 3 kernel on SageMaker Studio. A GPU instance such as ml.g4dn.xlarge is recommended. Please reference the documentation on setting up a domain for SageMaker Studio.
  • You need at least one ml.g5.4xlarge instance for inference; more are required if you want to process multiple video chunks in parallel. Please make sure your AWS account has sufficient quota for SageMaker inference (a quick way to check is shown below).
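One way to confirm your quota is through the Service Quotas API. The following is a minimal sketch using boto3; the filter on the quota name is an assumption about how SageMaker labels the ml.g5.4xlarge endpoint usage quota, so adjust it to match what you see in your account.

import boto3

# list SageMaker quotas and print any that mention ml.g5.4xlarge endpoint usage
quotas = boto3.client("service-quotas")
paginator = quotas.get_paginator("list_service_quotas")

for page in paginator.paginate(ServiceCode="sagemaker"):
    for quota in page["Quotas"]:
        if "ml.g5.4xlarge" in quota["QuotaName"] and "endpoint" in quota["QuotaName"].lower():
            print(f'{quota["QuotaName"]}: {quota["Value"]}')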

Deployment steps

  • To deploy the solution manually, download the AWS CloudFormation template to your local hard drive.
  • Sign in to the AWS CloudFormation console.
  • Select Create Stack.
  • On the Create stack page, in the Specify template section, select Upload a template file.
  • Under Upload a template file, select Choose file and select the downloaded template from your local drive.
  • Choose Next and follow the steps in Launch the stack.
  • This will take a few minutes and set up a SageMaker Studio Domain. Follow the instructions here to launch the Studio environment.
  • In SageMaker Studio, clone this Git repository using the following command. More details about how to clone a Git repository in SageMaker Studio are available here.
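For example, you can run the following in a Studio terminal or notebook cell. The repository URL shown here is an assumption based on the guidance folder name used later in this post; use the link in this blog if it differs.

git clone https://github.com/aws-solutions-library-samples/guidance-for-super-slow-motion-video-creation-using-generative-ai-on-aws.git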

Deployment validation

After successfully cloning the repo, you should see the following files in the following directory structure:

|– assets/                       Assets folder
|– deployment/                   CloudFormation template to deploy SageMaker environment
|– source/                       Code directory to host FILM model and generate slow-mo video
|   |– slow-mo.ipynb
|   |– helper.py
|   └── slow_mo_generator/       Model and inference code for SageMaker Asynchronous Inference
|       |– interpolator.py
|       |– model.py
|       |– requirements.txt
|       |– serving.properties
|       └── utils.py

Running the guidance

  • From within the SageMaker Studio console, cd to the repo folder guidance-for-super-slow-motion-video-creation-using-generative-ai-on-aws.
  • Open the slow-mo.ipynb notebook and follow the instructions to run through each cell.

Figure 4 – Open slow-mo.ipynb on SageMaker Studio

The notebook automatically provides a sample video to test. Please feel free to replace the sample video with your own. The architecture diagram in Figure 3 provides an overview of the full end-to-end solution. In this blog post, we focus on the following core solution components.

  1. Packaging and deploying the FILM model to a SageMaker Asynchronous Inference endpoint
  2. Preparing the input video
  3. Invoking the Amazon SageMaker Asynchronous Inference endpoint
  4. Generating the slow motion video

Let’s look at each solution component in more detail.

Packaging and deploying the FILM model to a SageMaker Asynchronous Inference endpoint

We use Amazon SageMaker asynchronous inference (SAI) to host the FILM model. Amazon SageMaker is a fully managed machine learning platform that enables building, training, deploying, and managing machine learning models at scale. Asynchronous inference is a feature in SageMaker that allows deploying a live endpoint to process large volumes of data asynchronously.

We chose SAI because it provides a managed way to scale large payload inference like video frame processing. With auto-scaling, it can scale to thousands of concurrent inference processes to handle large volumes of payload in parallel. The service has a built-in queue to manage unpredictable traffic and add robustness. SAI can automatically scale down to 0 when not in use, helping customers save on costs. All these features make SAI ideal for our slow motion video creation use case, and allow us to build a solution that can scale to process large volumes of video frames in a robust and cost-effective manner.

To create the SAI endpoint, we first need to download the pre-trained FILM model.

CDN_URL = "https://d2yqlwoly7fl0b.cloudfront.net/super-slomo" 
model_path = "slow_mo_generator/model"

# download pretrained model and unzip it to source/pretrained_model folder

PRETRAINED_MODEL = "Style-20230929T132001Z-001.zip" 
!wget -L {CDN_URL}/pretrained_models/{PRETRAINED_MODEL} -O {PRETRAINED_MODEL} 
!unzip Style-20230929T132001Z-001.zip -d {model_path}

Next, we want to package the model for deployment. How to package depends on the model server we use. In this case, we use Deep Java Library (DJL) serving, which is an open-source, framework-agnostic model server developed by Amazon. Following are the files we need to package for this solution.

slow_mo_generator
|– model/…                   # Downloaded pretrained model
|– serving.properties        # DJL Serving configuration file
|– model.py                  # Inference handler code
|– interpolator.py           # Custom module
|– utils.py                  # Custom module
|– requirements.txt          # Additional Python packages to extend the container

Following is the DJL container we used for this project:

inference_image_uri = ( 
      f"763104351884.dkr.ecr.{region}.amazonaws.com/djl-inference:0.21.0-deepspeed0.8.3-cu117" 
) 

print(f"Image going to be used is ---- > {inference_image_uri}")

DJL Serving requires a serving.properties file and a model.py to serve the model. The first is a configuration file that defines how your model is served and which inference optimization libraries you need. The latter contains custom handler functions in Python that define how the model is loaded and how inference is served. If you need to extend the container, you can include a requirements.txt file.

Example serving.properties file:

engine=Python
option.model_loading_timeout=3600
option.predict_timeout=3600
minWorkers=1
maxWorkers=2
option.s3_bucket=<Bucket-Name>
option.s3_prefix=slow-mo/inference

The handler function is the core component of the model.py file. It first loads the original video frames and the model, then iteratively runs inference to generate in-between frames, and finally uploads the results to Amazon S3.

def handle(inputs: Input):
    
    global is_initialized
    global s3_bucket
    global s3_prefix
    properties = inputs.get_properties()
    if not is_initialized:
        initialize_service(properties)
    if inputs.is_empty():
        return None
    tar_buffer = BytesIO(inputs.get_as_bytes())
    # extract input frames
    frame_dir, process_config = utils.extract_frames(tar_buffer)
    print(f"extracted frames to here: {frame_dir}")
    # kick off the interpolation process to generate new frames
    slow_frame_dir = asyncio.run(utils.interpolate_frames(input_frame_dir=frame_dir,
                                                           model_path=model_path,
                                                           **process_config))

    print(f"slow-mo frames to here: {slow_frame_dir}")
    print(os.listdir(slow_frame_dir))

    # timeout (in seconds) for the s5cmd upload below
    timeout = 30
    # Upload the frames to S3
    output_s3_path = f"s3://{s3_bucket}/{s3_prefix}/{os.path.basename(slow_frame_dir)}/"
    try:
        # Build s5cmd command
        cmd = ["/opt/djl/bin/s5cmd", "cp", f"{slow_frame_dir}/", output_s3_path]

        # Run command
        subprocess.run(cmd, timeout=timeout, check=True)
        
        print(f"Frames uploaded to {output_s3_path}")
        status = "SUCCESS"

    except subprocess.CalledProcessError as e:
        print("Error executing s5cmd:") 
        print(e.output)
        print(e.stderr)
        status = "Failed"

    except Exception as e:
        print("Unexpected error:")
        print(e)
        status = "Failed"
    
    return Output().add_as_json({"status":status, "output_location": output_s3_path})

All of this inference code, along with the pre-trained model, is packaged in a tar.gz file and uploaded to Amazon S3.

!tar czvf model.tar.gz slow_mo_generator/

inference_artifact = sagemaker_session.upload_data("model.tar.gz", default_bucket, f"{prefix}/inference") 
print(f"S3 Code or Model tar ball uploaded to --- > {inference_artifact}")

To host this on an SAI endpoint, we first create a SageMaker model using the container image and the S3 location of our model package.

model = Model(
    image_uri=image_uri,
    model_data=model_data,
    role=role,
    env=env
)

Then, create an endpoint configuration that defines how asynchronous inference will be served.

# create async endpoint configuration
async_config = AsyncInferenceConfig(
    output_path=f"s3://{default_bucket}/{prefix}/async_inference/output",  # Where our results will be stored
    max_concurrent_invocations_per_instance=1,
    # notification_config={
    #     "SuccessTopic": "arn:aws:sns:us-east-2:123456789012:MyTopic",
    #     "ErrorTopic": "arn:aws:sns:us-east-2:123456789012:MyTopic",
    # },  # Notification configuration
)

Finally, deploy and create a predictor.

model.deploy(
    initial_instance_count=1,
    instance_type=instance_type,
    endpoint_name=endpoint_name,
    async_inference_config=async_config
)

predictor = sagemaker.Predictor(
    endpoint_name=endpoint_name,
    sagemaker_session=sagemaker_session
)

Prepare input video

After the model is hosted on an SAI endpoint, we need to prepare our video by extracting the frames using FFmpeg, an open-source video processing tool. In our sample code, we created a helper function that takes in a video file and generates the frames in a temporary location.

#
# use ffmpeg to extract frames from video
#
def extract_frames(video_path):
    
    output_dir = Path(f"/tmp/{random.randint(0, 1000000)}")
    while output_dir.exists():
        output_dir = Path(f"/tmp/{random.randint(0, 1000000)}")
        
    output_dir.mkdir(parents=True, exist_ok=False)
    
    output_pattern = output_dir / "frame-%07d.jpg"
    print(output_pattern)
    
    ffmpeg_cmd = ["ffmpeg", "-i", video_path, 
                  "-qmin", "1", "-q:v", "1", str(output_pattern)]
    
    try:
        subprocess.run(ffmpeg_cmd, check=True)
    except subprocess.CalledProcessError as err:
        print(f"Error running ffmpeg: {err}")
        
    return output_dir

frame_dir = helper.extract_frames(SAMPLE_VIDEO)

Along with the frames, we also store a config.json file in the same folder. This file specifies process configuration parameters to tell the model how the slow motion frames should be interpolated. Following are some of the example parameters:

  • ALIGN: frame dimensions are padded to align with GPU memory.
  • BLOCK_HEIGHT and BLOCK_WIDTH: split each frame into blocks if it is too large to process at once. This is useful for high-resolution slow motion.
  • TIME_TO_INTERPOLATE: number of frames to generate between each input frame.

config = {
  "align": 64,
  "block_height": 1,
  "block_width": 1, 
  "time_to_interpolate": 2   
}

The frames and the config.json file are compressed into a single tar.gz file and uploaded to Amazon S3 for inference.

frames_tarfile = helper.make_tar(frame_dir)
input_s3_loc = sagemaker_session.upload_data(frames_tarfile, bucket, input_prefix)
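The make_tar helper ships with the repository's helper.py. As a rough sketch of what it does, the following uses only the Python standard library to bundle the frame directory (which already contains config.json) into a single archive; the actual implementation in the repo may differ in details.

import tarfile
from pathlib import Path

def make_tar(frame_dir):
    """Bundle the extracted frames and config.json into one tar.gz archive."""
    frame_dir = Path(frame_dir)
    tar_path = f"{frame_dir}.tar.gz"

    with tarfile.open(tar_path, "w:gz") as tar:
        # add every frame and the config.json under a single top-level folder
        for file in sorted(frame_dir.iterdir()):
            tar.add(file, arcname=f"{frame_dir.name}/{file.name}")

    return tar_path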

For production, you would run this code in a dynamically scalable batch job that splits a large video into small chunks and sends each chunk to the SAI endpoint to maximize parallel processing. AWS Batch is a great managed solution for that; a sketch of the chunking step follows.
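To illustrate just the chunking step (AWS Batch would handle the orchestration), here is a minimal sketch that splits a source video into fixed-length segments with FFmpeg. The segment length and output pattern are arbitrary values for illustration.

import subprocess

def split_video(video_path, chunk_seconds=10, output_pattern="chunk-%03d.mp4"):
    """Split a video into fixed-length chunks without re-encoding (cuts land on keyframes)."""
    ffmpeg_cmd = [
        "ffmpeg", "-i", video_path,
        "-c", "copy",                      # stream copy, no re-encoding
        "-f", "segment",
        "-segment_time", str(chunk_seconds),
        "-reset_timestamps", "1",
        output_pattern,
    ]
    subprocess.run(ffmpeg_cmd, check=True)

Each chunk can then go through the same extract_frames, upload, and invoke steps shown in this post.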

Invoke Amazon SageMaker Asynchronous Inference endpoint

With the endpoint deployed and input frames staged in Amazon S3, we can now invoke the endpoint to generate our slow motion frames.

response = sm_runtime.invoke_endpoint_async(
    EndpointName=endpoint_name,
    InputLocation=input_s3_loc,
    ContentType='application/x-image')

This triggers the asynchronous job to process the frames using our pre-trained FILM model. The newly generated frames are stored in the S3 output location when the process is complete.
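The invoke_endpoint_async response also contains an OutputLocation, the S3 URI where the handler's JSON result (job status and the S3 location of the generated frames) is written. A simple way to wait for that result from a notebook, as an alternative to the SNS notification flow in the reference architecture, is to poll Amazon S3:

import json
import time
from urllib.parse import urlparse

import boto3

s3 = boto3.client("s3")
parsed = urlparse(response["OutputLocation"])
bucket, key = parsed.netloc, parsed.path.lstrip("/")

# poll until the asynchronous inference result object appears in S3
while True:
    try:
        body = s3.get_object(Bucket=bucket, Key=key)["Body"].read()
        print(json.loads(body))  # e.g. {"status": "SUCCESS", "output_location": "s3://..."}
        break
    except s3.exceptions.NoSuchKey:
        time.sleep(15)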

To maximize parallelization and process large video files more quickly, we also provide example code to auto-scale the SAI endpoint when it receives a large number of invocations. In the sample code, we auto-scale the endpoint between 0 and 5 instances. The endpoint automatically scales out when it has more than 5 jobs in the queue per instance, and it scales in to 0 when it is not in use, so you do not incur compute costs while idle. You can also define how quickly to scale out and in using the ScaleOutCooldown and ScaleInCooldown attributes.

client = boto3.client(
    "application-autoscaling"
)  # Common class representing Application Auto Scaling for SageMaker amongst other services

resource_id = (
    "endpoint/" + endpoint_name + "/variant/" + "AllTraffic"
)  # This is the format in which application autoscaling references the endpoint

# Configure Autoscaling on asynchronous endpoint down to zero instances
response = client.register_scalable_target(
    ServiceNamespace="sagemaker",
    ResourceId=resource_id,
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",
    MinCapacity=0,
    MaxCapacity=5,
)

response = client.put_scaling_policy(
    PolicyName="Invocations-ScalingPolicy",
    ServiceNamespace="sagemaker",  # The namespace of the AWS service that provides the resource.
    ResourceId=resource_id,  # Endpoint name
    ScalableDimension="sagemaker:variant:DesiredInstanceCount",  # SageMaker supports only Instance Count
    PolicyType="TargetTrackingScaling",  # 'StepScaling'|'TargetTrackingScaling'
    TargetTrackingScalingPolicyConfiguration={
        "TargetValue": 5.0,  # The target value for the metric. - here the metric is - SageMakerVariantInvocationsPerInstance
        "CustomizedMetricSpecification": {
            "MetricName": "ApproximateBacklogSizePerInstance",
            "Namespace": "AWS/SageMaker",
            "Dimensions": [{"Name": "EndpointName", "Value": endpoint_name}],
            "Statistic": "Average",
        },
        "ScaleInCooldown": 600,  # The cooldown period helps you prevent your Auto Scaling group from launching or terminating
        # additional instances before the effects of previous activities are visible.
        # You can configure the length of time based on your instance startup time or other application needs.
        # ScaleInCooldown - The amount of time, in seconds, after a scale in activity completes before another scale in activity can start.
        "ScaleOutCooldown": 300  # ScaleOutCooldown - The amount of time, in seconds, after a scale out activity completes before another scale out activity can start.
        # 'DisableScaleIn': True|False - ndicates whether scale in by the target tracking policy is disabled.
        # If the value is true , scale in is disabled and the target tracking policy won't remove capacity from the scalable resource.
    },
)

Generate slow motion video

Once the process is complete, the last step is to assemble the newly generated frames in the correct sequence, and generate the final slow motion video. We created another helper function to do this automatically.

#
# use ffmpeg to create video from frames
#
def create_video(input_frame_dir, output_file, fr=60):

    ffmpeg_cmd = [
        "ffmpeg",
        "-y",
        "-framerate", str(fr),
        "-pattern_type", "glob",
        "-i", f"{input_frame_dir}/frame*.jpg",
        "-c:v", "libx264",
        "-pix_fmt", "yuvj420p",
        output_file
    ]

    try:
        subprocess.run(ffmpeg_cmd, check=True)
    except subprocess.CalledProcessError as err:
        print(f"Error running ffmpeg: {err}")

When calling the create_video function, you provide the location of the frames, the output file name, and the frame rate of the final video. FFmpeg will take care of the rest and combine the frames to the final slow motion video in .mp4 format.
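For example, assuming the interpolated frames were downloaded from the endpoint's output location into a local slow_frames directory (a hypothetical path for illustration):

# encode the interpolated frames at 60 fps; a lower frame rate makes the result appear slower
helper.create_video("slow_frames", "slow_motion_output.mp4", fr=60)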

Figure 5 – Playing generated slow-motion video

Clean up

To avoid incurring AWS charges after you are done testing the guidance, make sure you delete the following resources:

  • Amazon SageMaker Studio Domain
  • Amazon SageMaker Asynchronous Inference endpoint

Conclusion

In this post, we demonstrated an end-to-end workflow to generate super slow-motion video using a generative AI model called FILM. The model interpolates synthetic frames between the real frames to increase the frame rate. Running FILM on a SageMaker asynchronous endpoint provides scalable, low-latency performance for near real-time use cases. The solution can process standard video to produce high quality, ultra-smooth slow motion effects.

Generative AI models like FILM open up new possibilities for creating engaging video content. Slow motion is just one application – similar techniques can be used to increase video resolution, fill in missing frames, remove objects, and more. As generative AI advances, we can expect more innovations in synthesizing and enhancing video.

Lastly, we encourage you to check out the following AWS blog posts in which we explore a few other generative AI use cases for the media and entertainment industry.

James Wu

James Wu is a Senior AI/ML Specialist Solutions Architect at AWS, helping customers design and build AI/ML solutions. James's work covers a wide range of ML use cases, with a primary interest in computer vision, deep learning, and scaling ML across the enterprise. Prior to joining AWS, James was an architect, developer, and technology leader for over 10 years, including 6 years in engineering and 4 years in the marketing and advertising industries.

Ken Shek

Ken Shek is an AWS Principal Solutions Architect specializing in Data Science and Analytics for the Global Media, Entertainment, Games, and Sports industries. He assists media customers in designing, developing, and deploying workloads on the AWS Cloud using best practices. Passionate about artificial intelligence and machine learning use cases, he has built the Media2Cloud on AWS guidance to help hundreds of customers ingest and analyze content, enriching its value.

Amit Kalawat

Amit Kalawat is a Senior Solutions Architect at Amazon Web Services based out of New York. He works with enterprise customers as they transform their business and journey to the cloud.