AWS for M&E Blog

Getting started with Open Job Description from “Hello World” to FFmpeg

Go from “Hello World” to a chained and configurable set of FFmpeg instructions using the Open Job Description specification and newly released tools and libraries to create and configure compute jobs.

Introduction

In January and February of 2024, the Amazon Web Services (AWS) Thinkbox Deadline team released the specification, tooling, and sample templates for the open-source project Open Job Description (OpenJD). OpenJD is a human-readable and expressive way of describing compute jobs. It draws on AWS Thinkbox’s experience in visual effects (VFX), animation, games, and more to solve common problems in those spaces. OpenJD aims to simplify pipelines and their maintenance by creating a standard that can be used to run a variety of compute jobs. While OpenJD was born from the problems of rendering VFX, care has been taken to avoid limiting the solution, and OpenJD allows for increased interoperability and modularity across a wide range of compute jobs.

A workflow diagram, showing work submitters and integrations on the left side, passing work through Open Job Description in the middle, and reaching a variety of compute solutions on the right side.

Figure 1: By integrating Open Job Description into the pipeline, compute jobs such as renders can be submitted to a variety of Open Job Description compatible compute solutions, such as render farms, batch compute tools, or even the CLI tool, openjd, on the local machine.

With these releases, users can start using OpenJD in their own pipelines to more easily describe units of work. This reduces the requirements for maintenance of pipeline code, lowers the complexity of the pipeline, and increases the ability for pipelines to interact with other tooling. OpenJD does not need to be adopted across entire pipelines at once, and instead can be used only where appropriate.

In this walkthrough, we start by writing a “Hello World” OpenJD template, and gradually increase the complexity until it’s able to encode videos with FFmpeg.

Prerequisites

If you’re interested in following along, it’s easy to install the OpenJD tooling into your Python 3.9+ environment:

$> pip install openjd-cli

Installing openjd-cli also installs the required libraries, openjd-sessions and openjd-model. OpenJD runs the same across Linux, OSX, and Windows, provided your commands run the same across those platforms. This walkthrough will give commands formatted for Linux and OSX. Users on Windows will have to adjust the commands for the Windows environment. For more information on path remapping support for that workflow, please see this part of the OpenJD specification wiki on how Jobs are run.

Following the optional FFmpeg sections requires the installation of FFmpeg.

Walkthrough

Hello World

An OpenJD template is a JSON or YAML document, and the tools and libraries released work with both formats. For the purposes of this demonstration, we use YAML as it allows for comments while JSON does not.

A Job template needs only three things at minimum: a specification version, so the tools know which version to work with; a name to identify Jobs; and a step, so there’s something to run. Thus, our “Hello World” can be as simple as:

specificationVersion: "jobtemplate-2023-09"
name: Hello World
steps:
- name: Hello World Step
  script:
    actions:
      onRun:
        command: echo 
        args:
        - "Hello World"

We have our specification version, jobtemplate-2023-09; our name, “Hello World”; and a step, named “Hello World Step”. Within that Step, we have an onRun action that runs a single command with a list of arguments, although there is only one argument so far.

Before running the template, we can validate it with openjd check:

$> openjd check ./hello-world.yaml
Template at 'hello-world.yaml' passes validation checks

To run it, we use openjd run and specify the step to run:

$> openjd run ./hello-world.yaml --step "Hello World Step"

        ==============================================
        --------- Running Task
        ==============================================
        ----------------------------------------------
        Phase: Setup
        ----------------------------------------------
        ----------------------------------------------
        Phase: Running action
        ----------------------------------------------
        Running command /private/var/folders/kb/tmpeiiw16h4.sh
        Command started as pid: 17213
        Output:
        Hello World
        Open Job Description CLI: All actions completed successfully!
        Open Job Description CLI: Local Session ended! Now cleaning up Session resources.

--- Results of local session ---

Session ended successfully

Job: Hello World
Step: Hello World Step
Duration: 0.24399404200000002 seconds
Tasks run: 1

You can see “Hello World” printed under “Output:”, along with some statistics about how the session ran.
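Because a template can be written as JSON as well as YAML, the same “Hello World” template can also be produced from a plain Python dictionary. The following is a minimal sketch using only the standard library; it assumes the JSON form uses exactly the same keys as the YAML form shown earlier, and it only builds the document text rather than calling any OpenJD library.

```python
import json

# The "Hello World" template from this section, expressed as a Python dict.
hello_world = {
    "specificationVersion": "jobtemplate-2023-09",
    "name": "Hello World",
    "steps": [
        {
            "name": "Hello World Step",
            "script": {
                "actions": {
                    "onRun": {"command": "echo", "args": ["Hello World"]}
                }
            },
        }
    ],
}

# Serialize to JSON text. Saving it (for example as hello-world.json, a
# hypothetical file name) is left to the reader.
template_json = json.dumps(hello_world, indent=2)
print(template_json)
```

Generating templates this way can be handy when a pipeline tool already holds the job description as structured data.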

Job Parameters

To make the template more flexible, we add our first parameter, the name to greet. This requires adding a top-level item named parameterDefinitions, and then a parameter with a STRING type:

specificationVersion: "jobtemplate-2023-09"
name: Hello World
parameterDefinitions:
  - name: Name
    type: STRING
    description: Who to say hello to.  # Not technically required
steps:
- name: Hello World Step
  script:
    actions:
      onRun:
        command: echo 
        args:
        - "Hello {{Param.Name}}"

To make use of our new parameter, we refer to it in our “Hello World Step” by using two curly brackets on either side of the parameter name prepended with Param.. Here the “Name” parameter would be referenced by Param.Name.

Now that we have this field, we must provide a value for it when running the job. We can do this with either --job-param or -p, then providing the value in the form of key=value:

$> openjd run ./hello-world.yaml -p Name=Sean --step "Hello World Step" 
 
... snip ...

        Output:
        Hello Sean
        
... snip ...

We can get some basic information about our expanded “Hello World” template with the summary command:

$> openjd summary ./hello-world.yaml -p Name=Sean

--- Summary for 'Hello World' ---

Parameters:
        - Name (STRING): Sean

Total steps: 1
Total tasks: 1
Total environments: 0

--- Steps in 'Hello World' ---

1. 'Hello World Step'
        1 total Tasks
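To build intuition for what happens to a reference such as {{Param.Name}}, the substitution can be imitated in a few lines of Python. This is only an illustrative sketch, not how the OpenJD libraries are implemented: the substitute function and its regular expression are assumptions made for this example.

```python
import re

# Matches references such as {{Param.Name}} or {{Task.Param.WorldNumbers}}.
REFERENCE = re.compile(r"\{\{\s*([A-Za-z_][\w.]*)\s*\}\}")


def substitute(text, values):
    """Replace every {{Key}} reference in text with its entry in values."""

    def lookup(match):
        key = match.group(1)
        if key not in values:
            raise KeyError(f"No value provided for {key}")
        return str(values[key])

    return REFERENCE.sub(lookup, text)


# Equivalent in spirit to passing -p Name=Sean on the command line.
rendered = substitute("Hello {{Param.Name}}", {"Param.Name": "Sean"})
print(rendered)
```

A reference without a matching value raises an error in this sketch, mirroring the idea that a value must be provided for the parameter before the job can run.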

Tasks and Task Parameters

Tasks are created when a Step breaks its work into multiple parts as a result of its Task parameters. To add Tasks to our “Hello World” example, we add an entry to the Step’s taskParameterDefinitions. This is similar to parameterDefinitions, except that you don’t directly provide these parameter values when you run the template, and it is declared inside the Step it parameterizes (under parameterSpace) rather than at the top level of the template.

A flow diagram showing first a taskParameterDefinition named “WorldNumbers” defined at the top, then in the middle there is Step specifying an echo command and the “WorldNumbers” parameter used to iterate over the number values given in the taskParameterDefinition, and finally the breakout of all the individual echo Tasks in the bottom row, 3 total Tasks for 3 values in the Task parameter’s range.

Figure 2: When a taskParameterDefinition is defined in a Step, the Step understands that there are multiple units of work that must be iterated on in order to complete all work. These units of work are called Tasks, and you define what they iterate over in the taskParameterDefinition.

When one or more Task parameters are present, it indicates to OpenJD that the Step should be broken into multiple Tasks.

steps:
- name: Hello World Step
  parameterSpace:
    taskParameterDefinitions:
    - name: WorldNumbers
      type: INT
      range: "1-5"
  script:
    actions:
      onRun:
        command: echo 
        args:
        - "Hello {{Param.Name}}"

Running summary for this template now reports that it will kick off 5 total Tasks. Running the Job results in 5 Tasks, each printing the same “Hello Sean” line. To differentiate these outputs, we need to reference our new Task parameter in our command, this time prefixing Param with Task. because the parameter belongs to the Task:

  script:
    actions:
      onRun:
        command: echo 
        args:
        - "Hello {{Param.Name}} {{Task.Param.WorldNumbers}}"

Running it results in:

$> openjd run ./hello-world.yaml -p Name=Sean --step "Hello World Step"

        ==============================================
        --------- Running Task
        ==============================================
        Parameter values:
        WorldNumbers(INT) = 1
        ----------------------------------------------
        Phase: Setup
        ----------------------------------------------
        ----------------------------------------------
        Phase: Running action
        ----------------------------------------------
        Running command /private/var/folders/kb/tmpsfdkiud.sh
        Command started as pid: 78361
        Output:
        Hello Sean 1

        ... snip ...

        ==============================================
        --------- Running Task
        ==============================================
        Parameter values:
        WorldNumbers(INT) = 5
        ----------------------------------------------
        Phase: Setup
        ----------------------------------------------
        ----------------------------------------------
        Phase: Running action
        ----------------------------------------------
        Running command /private/var/folders/kb/tmpdsfmqbp.sh
        Command started as pid: 78369
        Output:
        Hello Sean 5
        Open Job Description CLI: All actions completed successfully!
        Open Job Description CLI: Local Session ended! Now cleaning up Session resources.

--- Results of local session ---

Session ended successfully

Job: Hello World
Step: Hello World Step
Duration: 0.423452333 seconds
Tasks run: 5
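The way one Step fans out into five Tasks can be sketched in Python. This is a simplified illustration under stated assumptions: expand_range is a hypothetical helper that only understands comma-separated non-negative integers and "start-end" spans, which is enough for the "1-5" range used here but is not a full implementation of the range syntax in the specification.

```python
def expand_range(expression):
    """Expand a simple range expression such as "1-5" or "1,3,7-9"."""
    values = []
    for part in expression.split(","):
        part = part.strip()
        if "-" in part:
            start, end = part.split("-", 1)
            # Ranges are inclusive on both ends.
            values.extend(range(int(start), int(end) + 1))
        else:
            values.append(int(part))
    return values


name = "Sean"  # stands in for the Name job parameter
world_numbers = expand_range("1-5")  # stands in for WorldNumbers

# One Task per value of the Task parameter.
task_outputs = [f"Hello {name} {number}" for number in world_numbers]
for line in task_outputs:
    print(line)
```

Each element of task_outputs corresponds to the “Output:” line of one Task in the session log above.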

FFmpeg

We can now apply these basics to create a template for a common workflow in VFX using the popular open-source software FFmpeg to encode multiple video files from image sequence render output. We reference the Academy Software Foundation’s (ASWF) published Encoding Guidelines to save time. ASWF’s mission is to increase the quality and quantity of open-source contributions for and from the Film and Television industries, and AWS is proud to be a premier member.

A quick translation of our “Hello World” Job template to FFmpeg, using the quickstart from the ASWF Encoding Guidelines, results in:

specificationVersion: 'jobtemplate-2023-09'
name: FFmpeg Review Media
parameterDefinitions:
  - name: InputFile
    type: PATH
    objectType: FILE
    dataFlow: IN
    description: The input image sequence, with %04d style padding.
  - name: OutputDir
    type: PATH
    objectType: DIRECTORY
    dataFlow: OUT
    description: The output directory to put all generated movies in
steps:
  - name: h264
    script:
      actions:
        onRun:
          command: "ffmpeg"
          args: 
            - "-r"
            - "24"
            - "-start_number"
            - "1"
            - "-i"
            - "{{Param.InputFile}}"
            - "-pix_fmt"
            - "yuv420p"
            - "-vf"
            - "scale=in_color_matrix=bt709:out_color_matrix=bt709"
            - "-c:v"
            - "libx264"
            - "-preset"
            - "slower"
            - "-color_range"
            - "tv"
            - "-colorspace"
            - "bt709" 
            - "-color_primaries"
            - "bt709"
            - "-color_trc"
            - "iec61966-2-1"
            - "-movflags"
            - "faststart"
            - "{{Param.OutputDir}}/output.mp4"

The biggest changes occurred in parameterDefinitions, where we replaced “Name” with two PATH parameters: “InputFile” for the input image sequence and “OutputDir” for the output directory. Both use the additional keys objectType and dataFlow. The specifics of these keys and their possible values are fully documented in the specification, but they all have to do with working with file and directory paths.

We needed “OutputDir” because when the openjd tool runs a template against parameters provided by the user, it creates a new script in temp space. If we had just specified “./output.mp4” as our movie output in the OpenJD template, that movie would have ended up in temp space, as that’s the local directory the script is running from, and then our output would be immediately deleted when OpenJD cleans up the environment.
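The effect described above can be reproduced without OpenJD or FFmpeg. In this minimal sketch, two standard-library temporary directories stand in for the session’s temp space and for the user’s “OutputDir”; the file names are placeholders and no real encoding takes place.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as output_dir:
    with tempfile.TemporaryDirectory() as session_dir:
        # "./output.mp4" resolves inside the temporary working directory.
        relative_target = Path(session_dir) / "output.mp4"
        # "{{Param.OutputDir}}/output.mp4" resolves to a user-chosen directory.
        absolute_target = Path(output_dir) / "output.mp4"

        relative_target.write_bytes(b"")  # placeholder for the encoded movie
        absolute_target.write_bytes(b"")

    # The session directory has been cleaned up at this point.
    relative_survived = relative_target.exists()
    absolute_survived = absolute_target.exists()
    print("relative output survived:", relative_survived)
    print("OutputDir output survived:", absolute_survived)
```

Only the file written under the explicit output directory is still present once the temporary working directory is removed.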

Debugging Tip

To keep temporary files around for debugging, you can pass --preserve to openjd run, and to get more verbose output, you can use --verbose. These options only work for the openjd tool, and will not necessarily translate to other OpenJD runtime environments.

Embedded Files

Instead of using a command with an args list, we can instead use a bash script, and embed it within our step:

steps:
  - name: h264
    script:
      actions:
        onRun:
          command: bash
          args:
            - "{{Task.File.Encode}}"
      embeddedFiles:
      - name: Encode
        type: TEXT
        runnable: True
        data: |
          #!/usr/bin/env bash

          # Set bash's strict mode to cause instant errors if any part of the
          # script fails, making debugging much easier.
          set -euxo pipefail

          # Quote the substituted paths so that paths containing spaces still work.
          ffmpeg -r 24 -start_number 1 -i "{{Param.InputFile}}" -pix_fmt yuv420p \
          -vf "scale=in_color_matrix=bt709:out_color_matrix=bt709" \
          -c:v libx264 -preset slower -color_range tv -colorspace bt709 \
          -color_primaries bt709 -color_trc iec61966-2-1 \
          -movflags faststart "{{Param.OutputDir}}/output.mp4"

An embedded or companion script is the only way of running comparisons or changing input values in the Job template. If you have those requirements, you should either handle them before running the Job template, or you’ll need to use an embedded file script as described here.

The tenets of OpenJD emphasize portability, readability, and parse-ability, so while using an embedded file for scripting will technically allow you to do anything you want, we think the system works better if the results for a specific set of inputs are consistent and predictable. We won’t continue to demonstrate embedded files in this post, but please use that style if desired.
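Conceptually, an embedded file has to end up on disk before the command can reference it. The sketch below imitates that idea with the standard library only. It is not the OpenJD implementation: the file name Encode.sh, the placeholder script body, and the assumption that runnable maps to setting the executable bit are all illustrative.

```python
import stat
import tempfile
from pathlib import Path

# Placeholder script body standing in for the embedded "Encode" file.
script_text = "#!/usr/bin/env bash\nset -euo pipefail\necho 'encode placeholder'\n"

with tempfile.TemporaryDirectory() as session_dir:
    # Write the embedded text to a file in the temporary working directory.
    script_path = Path(session_dir) / "Encode.sh"  # hypothetical file name
    script_path.write_text(script_text)

    # Assumption: a runnable file is marked executable for its owner.
    script_path.chmod(script_path.stat().st_mode | stat.S_IXUSR)
    owner_can_execute = bool(script_path.stat().st_mode & stat.S_IXUSR)

    # Resolve the file reference in the args list to the real path.
    args = ["{{Task.File.Encode}}"]
    resolved_args = [a.replace("{{Task.File.Encode}}", str(script_path)) for a in args]
    command_line = ["bash", *resolved_args]
    print(command_line)
```

The resulting command line is bash followed by the path of the written script, which is the shape of command the step above asks for.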

Chaining Steps

We can increase the complexity by adding a dependency between steps: a first step creates a high-quality h264 movie, and a following step uses that movie as its input to create a WebM movie. This is common in VFX pipelines, where the first step might bake in color transformations or media information, and later steps generate lower resolution and different quality proxies for web review, editorial, or production tracking.

To create a new step named “webm” and have it depend on our existing “h264” step, we specify a dependencies list in the “webm” step:

steps:
  - name: h264
    script:
      actions:
        onRun:
          command: "ffmpeg"
          args: 
            - "-r"
            ... snip ...  # Pretend I increased the h264 quality here
            - "{{Param.OutputDir}}/h264_hq_output.mp4"
  - name: webm
    dependencies:
      - dependsOn: "h264"
    script:
      actions:
        onRun:
          command: "ffmpeg"
          args: ...

Each list item in dependencies is a single key/value pair, where the key is dependsOn and the value is the name of the step this step should wait on. Listing multiple dependsOn entries makes this step wait for all of those steps to complete before executing. Steps that do not depend on one another can run concurrently if the execution engine allows for concurrent operations, such as on different workers in a batch compute farm. In the template below, once the “h264” step finishes, the ProRes encode can run on one machine while the WebM encode runs on a different machine:

steps:
  - name: h264
    script:
      actions:
        onRun:
          command: "ffmpeg"
          args: 
            - "-r"
            ... snip ...
            - "{{Param.OutputDir}}/h264_hq_output.mp4"
  - name: webm
    dependencies:
      - dependsOn: "h264"
    script:
      actions:
        onRun:
          command: "ffmpeg"
          args: ...
  - name: prores
    dependencies:
      - dependsOn: "h264"
    script:
      actions:
        onRun:
          command: "ffmpeg"
          args: ...
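The ordering implied by these dependencies can be worked out with Python’s standard graphlib module (Python 3.9+). This is a sketch of the scheduling idea only; how a particular execution engine orders and distributes steps is up to that engine.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps named in its dependsOn entries.
dependencies = {
    "h264": set(),
    "webm": {"h264"},
    "prores": {"h264"},
}

sorter = TopologicalSorter(dependencies)
sorter.prepare()

# Group steps into "waves": every step in a wave has all of its
# dependencies satisfied, so the steps in one wave could run concurrently.
waves = []
while sorter.is_active():
    ready = sorted(sorter.get_ready())
    waves.append(ready)
    sorter.done(*ready)

print(waves)  # [['h264'], ['prores', 'webm']]
```

The second wave contains both “prores” and “webm”, which is why those two encodes can happen on different machines at the same time.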

Using a Task parameter, we can even have the “prores” step create two versions of the encode, a proxy version (profile 0) and a high-quality version (profile 3), both 10-bit:

  - name: prores
    dependencies:
      - dependsOn: "h264"
    parameterSpace:
      taskParameterDefinitions:
      - name: Quality
        type: STRING
        range: ["0", "3"]
    script:
      actions:
        onRun:
          command: "ffmpeg"
          args: 
            - "-r"
            - "24"
            - "-i"
            - "{{Param.OutputDir}}/h264_hq_output.mp4"
            - "-pix_fmt"
            - "yuv422p10le"
            - "-vf"
            - "scale=in_color_matrix=bt709:out_color_matrix=bt709"
            - "-c:v"
            - "prores_ks"
            - "-profile:v"
            - "{{Task.Param.Quality}}"
            - "-vendor"
            - "apl0"
            - "-qscale:v"
            - "10"
            - "-color_range"
            - "tv"
            - "-colorspace"
            - "bt709" 
            - "-color_primaries"
            - "bt709"
            - "-color_trc"
            - "iec61966-2-1"
            - "-y"
            - "{{Param.OutputDir}}/prores_{{Task.Param.Quality}}_output.mov"
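To see what the “Quality” Task parameter produces, the two Tasks and their output files can be listed with a short Python sketch. The output directory /tmp/review is a hypothetical stand-in for “OutputDir”, and the profile names reflect the prores_ks encoder’s numbering as an aid to the reader rather than anything stated in the template.

```python
# Values from the Quality task parameter's range.
quality_values = ["0", "3"]

# prores_ks profile numbers used here and their names.
profile_names = {"0": "proxy", "3": "hq"}

output_dir = "/tmp/review"  # hypothetical stand-in for {{Param.OutputDir}}

# One Task, and therefore one output movie, per Quality value.
outputs = [f"{output_dir}/prores_{quality}_output.mov" for quality in quality_values]

for quality, path in zip(quality_values, outputs):
    print(f"profile {quality} ({profile_names[quality]}): {path}")
```

Because the Task parameter is part of the file name, the two Tasks write to different files and do not overwrite each other.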

Cleaning Up

If you’ve installed the Python libraries to follow along, you can uninstall them with:

$> pip uninstall openjd-cli openjd-sessions openjd-model

Conclusion

This post shows how to describe, validate, and run complex compute jobs with OpenJD. The OpenJD specification repository includes a full sample file of the template we created in this blog post. Please take the time to read further about how Jobs are constructed, and how Jobs are run to get a better understanding of what’s going on under the hood with these templates. Further information on media encoding with FFmpeg for use within the film and television industries can be found at the ASWF Encoding Guidelines repository. If you have comments on the specification itself, we invite you to participate in the RFC process or the discussion forums.

Sean Wallitsch

Sean is a Senior Solutions Architect, Visual Computing at AWS.