Optimizing computing performance with AWS Thinkbox Deadline – Part 2

In Part 1 of this blog series, we discussed how to collect benchmark data and how to decide what hardware and software to run to get the highest performance at the lowest cost. Next, we will look at the AWS Thinkbox Deadline features related to performance optimization.

Note: This discussion assumes basic familiarity with AWS Thinkbox Deadline, 3D rendering workflows, and Amazon EC2 offerings.

Maximizing CPU Performance in Deadline

Using Deadline Concurrent Tasks when CPU utilization is low

The Concurrent Tasks feature allows a single Deadline Worker application to de-queue up to 16 Tasks of the same Job; this is ideal for single-threaded computing. Only one Deadline license is required per Worker when running Concurrent Tasks outside of AWS (no licensing has been required on AWS since Deadline v10.1). The actual number of Concurrent Tasks can be optionally capped to a Worker-specific Task Limit property which defaults to the number of available cores, or . In other words, if a Job is set to run 16 Concurrent Tasks per Worker, but the Worker’s Concurrent Tasks limit is set to 8 because it only has 8 cores, it will de-queue just 8 Tasks to avoid over-subscribing threads to cores.
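The capping rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not Deadline code: the function name, its parameters, and the `limit_to_worker` flag are hypothetical and are not part of the Deadline API.

```python
# Hypothetical helper, NOT part of the Deadline API. It only mirrors the
# capping rule described in the text.
MAX_CONCURRENT_TASKS = 16  # upper bound stated in the text


def effective_concurrent_tasks(job_concurrent_tasks, worker_task_limit,
                               limit_to_worker=True):
    """Return how many Tasks of one Job a single Worker would de-queue at once."""
    if job_concurrent_tasks < 1 or worker_task_limit < 1:
        raise ValueError("both values must be at least 1")
    tasks = min(job_concurrent_tasks, MAX_CONCURRENT_TASKS)
    if limit_to_worker:
        # Optional cap: never run more Tasks than the Worker's Task Limit.
        tasks = min(tasks, worker_task_limit)
    return tasks


# The example from the text: Job asks for 16, Worker is limited to 8.
print(effective_concurrent_tasks(16, 8))  # 8
```

With the cap switched off (`limit_to_worker=False`), the same call would return 16 and over-subscribe the 8 cores.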

The main requirements of this feature:

The rendering application must support running multiple instances in parallel.
All application instances must fit in the available RAM.

The latter point is straightforward to handle, but the former can be tricky. For example, Adobe After Effects can run multiple render instances as Concurrent Tasks on the local render farm as long as the Deadline Worker is installed as an application and not as a Service.

However, attempting to render the same job with Concurrent Tasks in After Effects on an Amazon EC2 instance will not work, because the Deadline Worker must run as a Service in the cloud for various technical reasons. Many other applications can still be run with Concurrent Tasks.

For example, Autodesk 3ds Max’s native particle system, Particle Flow, is inherently single-threaded. This makes it a very good candidate for running multiple simulations in parallel on the same render node, such as when saving particles as “partitions” with the AWS Thinkbox Krakatoa renderer.

Again, this option is useful when the task is mostly single-threaded. In other words, it makes little sense to run a highly multi-threaded rendering application like V-Ray or Arnold with Concurrent Tasks set to a value greater than 1.
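Before raising the Concurrent Tasks value, the memory requirement listed earlier can be checked with simple arithmetic. The sketch below assumes you already know the peak RAM use of one application instance (for example, from the benchmarks discussed in Part 1). The function name, the reserved-RAM default, and the sample numbers are illustrative assumptions.

```python
def max_instances_that_fit(total_ram_gb, ram_per_instance_gb, reserved_ram_gb=4.0):
    """Estimate how many application instances fit in RAM at the same time.

    reserved_ram_gb is an assumed allowance for the OS and the Deadline
    Worker itself; adjust it to your own measurements.
    """
    if ram_per_instance_gb <= 0:
        raise ValueError("ram_per_instance_gb must be positive")
    usable_gb = total_ram_gb - reserved_ram_gb
    if usable_gb <= 0:
        return 0
    return int(usable_gb // ram_per_instance_gb)


# Illustrative numbers: a 64 GB node and a simulation peaking at 6 GB.
print(max_instances_that_fit(64, 6))  # 10
```

The smaller of this value and the core-based limit is a reasonable starting point for Concurrent Tasks.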

Using multiple Deadline Worker instances to run multiple jobs with different hardware requirements

In some cases, if the performance is constrained by another factor like disk I/O (input/output, aka read/write) or network bandwidth, running Concurrent Tasks would not be beneficial because the tasks would still fight for the limiting resource. In such cases, running another Job with a completely different requirements profile on the same hardware could utilize the otherwise unused resources.

For example, running a 2D compositing job that is mainly I/O-bound and uses very few cores, together with a 3D rendering job that has lower I/O requirements but uses a lot of cores, would utilize the available resources better than running each of these jobs on a separate machine. Deadline lets you define and launch any number of Worker instances using a single Deadline license per operating system (OS) instance.

This option applies only to on-premises render nodes. Render nodes on Amazon EC2 Spot Instances run a single instance of the Deadline Worker client application in Service mode and can de-queue only one Job at a time.

Using multiple virtual machines

The Multiple Worker Instances approach is easy to use and does not require extra Deadline licenses, but it has a drawback: All Workers will run on the same OS with the same settings.

You could go a step further by virtualizing on-premises hardware to define multiple virtual machines (VMs) with unique OS and software profiles.

Each OS instance on each VM could then run one or more Deadline Workers, possibly over-subscribing CPU and RAM resources to make sure every clock cycle and every byte is used. However, one Deadline license would be consumed by every VM.

This approach is not applicable to AWS instances.

Using multiple frames per task

When rendering animations, Deadline assigns one frame per Task by default. In most cases, rendering multiple frames in a single Task can improve overall performance, for the following reasons.

Between frames of the same Task, there is no Deadline Job/Task-related overhead from the Deadline Worker checking the Repository database to decide whether to move on to the next Task of the current Job or to switch to a different Job.

Some render applications will cache data between frames and thus perform faster; for example, Adobe After Effects tends to render 10 frames in a Task faster than 10 frames in 10 Tasks.

Some 3D applications do not allow Deadline to keep them open and loaded in memory between frames (a very useful feature supported by applications like Autodesk 3ds Max and Autodesk Maya), so the whole application must be restarted for each Task. Running multiple Frames in a single Task can significantly reduce the overhead of restarting such a rendering application.

The drawback of multiple Frames per Task is that if the rendering gets interrupted for any reason (an error, an EC2 Spot instance interruption, etc.), the whole Task will be re-queued, including the Task’s frames that have already finished rendering. If you are using EC2 Spot instances that are highly available and rarely reclaimed by AWS, this might be a risk worth taking.
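The trade-off can be made concrete with a rough model. The sketch below groups frames into Tasks, estimates the total application startup overhead, and computes the worst-case amount of finished work lost to one interruption. All function names and sample timings are hypothetical, and the model assumes one application restart per Task and a constant render time per frame.

```python
import math


def chunk_frames(frames, frames_per_task):
    """Group a frame sequence into consecutive chunks, one chunk per Task."""
    if frames_per_task < 1:
        raise ValueError("frames_per_task must be at least 1")
    frames = list(frames)
    return [frames[i:i + frames_per_task]
            for i in range(0, len(frames), frames_per_task)]


def startup_overhead_minutes(num_frames, frames_per_task, startup_minutes):
    """Total startup cost, assuming the application restarts once per Task."""
    num_tasks = math.ceil(num_frames / frames_per_task)
    return num_tasks * startup_minutes


def worst_case_lost_minutes(frames_per_task, minutes_per_frame):
    """Finished render time discarded if a Task is interrupted on its last frame."""
    return (frames_per_task - 1) * minutes_per_frame


tasks = chunk_frames(range(1, 101), 10)
print(len(tasks))                               # 10
print(startup_overhead_minutes(100, 1, 2.0))    # 200.0
print(startup_overhead_minutes(100, 10, 2.0))   # 20.0
print(worst_case_lost_minutes(10, 5.0))         # 45.0
```

In this made-up case, 10 frames per Task saves 180 minutes of startup time across the Job, at the risk of losing up to 45 minutes of finished frames per interruption.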

Splitting a frame into tiles

Some jobs involve rendering very high-resolution output, such as single images for large format printing, animation frames for high-resolution projections, etc. Since the smallest unit of work in a Deadline Job is the Task, and normally a Task is assigned a Frame, the render time of the Task is the time the user will have to wait before getting an image to look at.

For example, if a Task renders a frame in 10 hours on a 32-core machine, the artist would get the result no earlier than 10 hours after Job submission.

One approach to speeding up such Jobs is throwing more render power at the problem.

The brute-force approach would be to select a c5.24xlarge instance with 96 vCPUs. But this approach has its obvious limits: The 96 cores will likely render about 3x faster than the 32 cores, but that is still over 3 hours of waiting time for the artist!
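The "about 3x" figure follows from assuming that render time scales linearly with core count, which is an optimistic assumption because real renderers rarely scale perfectly. Following the text, the sketch below treats the 96 vCPUs as 96 cores. The function name is hypothetical.

```python
def ideal_render_hours(base_hours, base_cores, new_cores):
    """Best-case render time on new_cores, assuming perfectly linear scaling."""
    if base_cores < 1 or new_cores < 1:
        raise ValueError("core counts must be at least 1")
    return base_hours * base_cores / new_cores


# The example from the text: 10 hours on 32 cores, moved to 96 cores.
print(round(ideal_render_hours(10.0, 32, 96), 2))  # 3.33
```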

Deadline offers the ability to split a single frame into multiple sub-regions and compute each of them on a separate render node.

This feature is available as an option in the integrated Deadline submitters for all major 3D digital content creation (DCC) applications like Autodesk 3ds Max, Autodesk Maya, SideFX Houdini, MAXON Cinema4D, and Foundry Modo.

The resulting image segments, also known as Tiles, are then assembled by a dependent Draft job (Draft is a simple compositing application that ships with Deadline). This approach not only lets you get the render results much faster, it also works around possible memory limitations of some of the above-mentioned DCC applications.

In our example, if we split the render into 10×10 tiles and launch 100 render nodes to process the image, the result would come back in less than 10 minutes. The theoretical minimum would be 6 minutes, but we have to consider some overhead and the time required by Draft to assemble the final image.
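As a rough illustration of the arithmetic, the sketch below splits an image into a grid of pixel regions and computes the ideal per-tile render time. It is not how the Deadline submitters or Draft implement tiling. The function names and the 16000×12000 resolution are assumptions, and the estimate presumes every tile costs the same to render, which real scenes seldom do.

```python
def tile_regions(width, height, tiles_x, tiles_y):
    """Split an image into tiles_x * tiles_y regions.

    Each region is (left, top, right, bottom) in pixels, with right and
    bottom exclusive. Integer division keeps the regions gap-free even
    when the resolution is not evenly divisible by the tile count.
    """
    if tiles_x < 1 or tiles_y < 1:
        raise ValueError("tile counts must be at least 1")
    regions = []
    for ty in range(tiles_y):
        top = ty * height // tiles_y
        bottom = (ty + 1) * height // tiles_y
        for tx in range(tiles_x):
            left = tx * width // tiles_x
            right = (tx + 1) * width // tiles_x
            regions.append((left, top, right, bottom))
    return regions


def ideal_minutes_per_tile(frame_hours, num_tiles):
    """Best-case render time per tile, assuming uniform cost and no overhead."""
    return frame_hours * 60 / num_tiles


regions = tile_regions(16000, 12000, 10, 10)
print(len(regions))                         # 100
print(regions[0])                           # (0, 0, 1600, 1200)
print(ideal_minutes_per_tile(10, 100))      # 6.0
```

In practice the slowest tile determines when the assembly job can start, so the most complex region of the image sets the real lower bound.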

What’s next?

In Part 3 of this blog series, we will look at the AWS Portal-specific considerations related to performance optimization.

AWS for M&E Blog