Cloud rendering using Pixar’s Tractor on AWS
Rendering is the process of converting 3D models into 2D images that can be displayed on screen, and it is a critical part of making almost any movie these days. Rendering is resource-intensive and requires substantial compute capacity. With 8K resolution coming online, the demand for compute resources for rendering is at an all-time high. When the resolution of an image doubles in each dimension, the number of pixels to render quadruples. This is particularly relevant for video rendering, which is already a time-consuming process. Long render times can be mitigated by adding capacity to a render farm.
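To make the scaling concrete, here is a small worked example of how pixel counts grow across common delivery resolutions. The resolution table is an illustrative assumption and is not taken from any specific studio pipeline.

```python
# Common delivery resolutions as (width, height) in pixels.
RESOLUTIONS = {
    "HD": (1920, 1080),
    "4K UHD": (3840, 2160),
    "8K UHD": (7680, 4320),
}


def pixel_count(width: int, height: int) -> int:
    """Number of pixels in a single frame."""
    return width * height


base = pixel_count(*RESOLUTIONS["HD"])
for name, (width, height) in RESOLUTIONS.items():
    pixels = pixel_count(width, height)
    # Doubling both width and height multiplies the pixel count by four.
    print(f"{name}: {pixels:,} pixels per frame ({pixels // base}x HD)")
```

Each step up doubles both dimensions, so an 8K frame carries 16 times as many pixels as an HD frame.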
Cloud rendering makes it possible to burst into the cloud when additional capacity is needed quickly. A cloud render farm is more agile, more scalable, and more elastic than on-premises equipment. Customers can procure cloud resources in a matter of minutes, while it may take weeks to physically build a render farm.
Pixar’s Tractor is a popular solution for network rendering, capable of scaling up to the largest render farms. This blog post demonstrates how to integrate Pixar’s Tractor with AWS, using services such as AWS Direct Connect, Amazon EC2 Spot Fleet, Amazon Elastic File System (Amazon EFS), Amazon FSx File Gateway, Amazon FSx for Windows File Server, Amazon API Gateway, AWS Lambda, AWS Directory Service for Microsoft Active Directory, and others. Review the Studio in the Cloud implementation guide for additional information.
As a first step, network connectivity between the on-premises environment and AWS needs to be established. AWS Direct Connect (1) is the preferred option, because a Dedicated Connection can provide up to 100 Gbps.
AWS Site-to-Site VPN can be used in the proof of concept (POC) phase. It is important to keep in mind that the maximum bandwidth for a Site-to-Site VPN tunnel is 1.25 Gbps. This limitation may affect the ability to burst to the cloud, as the network will start saturating relatively quickly. While customers can scale their VPN throughput by using AWS Transit Gateway and multiple VPN tunnels, the general recommendation is to use AWS Direct Connect in production.
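A rough back-of-the-envelope calculation shows why the tunnel limit matters when bursting. The sketch below estimates how long it takes to move a set of render assets over links of different speeds. The 10 TB dataset size and the 80% usable-throughput factor are assumptions chosen for illustration, not measured values.

```python
def transfer_hours(dataset_tb: float, link_gbps: float, efficiency: float = 0.8) -> float:
    """Estimate hours needed to move dataset_tb terabytes over a link.

    efficiency is an assumed fraction of the nominal bandwidth that is
    actually usable after protocol overhead and competing traffic.
    """
    bits = dataset_tb * 1e12 * 8               # terabytes -> bits
    usable_bps = link_gbps * 1e9 * efficiency  # usable bits per second
    return bits / usable_bps / 3600


links = [
    ("Site-to-Site VPN tunnel", 1.25),
    ("Direct Connect, 10 Gbps", 10),
    ("Direct Connect, 100 Gbps", 100),
]
for label, gbps in links:
    print(f"{label}: {transfer_hours(10, gbps):.1f} hours for 10 TB")
```

Under these assumptions, a single VPN tunnel needs the better part of a day to move the dataset, while a 10 Gbps Dedicated Connection needs a few hours.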
Bursting into the cloud
A Spot Fleet (2) of Amazon Elastic Compute Cloud (Amazon EC2) burst rendering instances is used to meet render demand.
The Spot Fleet is deployed into an Amazon Virtual Private Cloud (Amazon VPC) that is completely isolated from the internet to accommodate content-aware rendering and encrypted asset storage.
Each Amazon EC2 Spot Instance is provisioned from a prebaked Amazon Machine Image (AMI) (3) that contains the software needed for the instance to be a part of the render farm fleet. Each Amazon EC2 Spot Instance runs a Tractor Blade process. A Spot Instance must register with the Tractor Engine (4), which runs on-premises, before it can participate in rendering work distribution.
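As a minimal sketch of what a Spot Fleet request for Tractor Blades might look like, the function below builds the configuration dictionary that the EC2 `RequestSpotFleet` API expects. The function name, the AMI ID, role ARN, subnet ID, instance types, and the choice of the `maintain` request type and `capacityOptimized` allocation strategy are all assumptions for illustration. The code only builds the dictionary and does not call AWS.

```python
def build_spot_fleet_config(ami_id, fleet_role_arn, subnet_id, instance_types, target_capacity):
    """Build a SpotFleetRequestConfig dictionary for a fleet of render blades."""
    return {
        "IamFleetRole": fleet_role_arn,
        "TargetCapacity": target_capacity,
        # "maintain" asks the fleet to replace interrupted instances (assumed choice).
        "Type": "maintain",
        "AllocationStrategy": "capacityOptimized",
        "TerminateInstancesWithExpiration": True,
        # One launch specification per instance type, all from the prebaked AMI
        # and all placed in the isolated render subnet.
        "LaunchSpecifications": [
            {"ImageId": ami_id, "InstanceType": instance_type, "SubnetId": subnet_id}
            for instance_type in instance_types
        ],
    }


# Placeholder identifiers; replace with values from your own account.
config = build_spot_fleet_config(
    ami_id="ami-0123456789abcdef0",
    fleet_role_arn="arn:aws:iam::123456789012:role/render-spot-fleet-role",
    subnet_id="subnet-0123456789abcdef0",
    instance_types=["c5.9xlarge", "c5.12xlarge"],
    target_capacity=20,
)
print(config["TargetCapacity"], len(config["LaunchSpecifications"]))
```

With boto3, a dictionary like this would typically be passed as `boto3.client("ec2").request_spot_fleet(SpotFleetRequestConfig=config)`. Listing several instance types gives the fleet more Spot capacity pools to draw from.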
Extending the on-premises file system to the cloud is the next piece of the puzzle. The models that artists are working on (as well as any other studio data or render assets) need to be available to the burst instances for the render to finish successfully.
The previous diagram depicts a possible solution for Server Message Block (SMB) file shares. Amazon FSx File Gateway (6) allows on-premises access to Windows file shares stored on FSx for Windows File Server (7). An FSx File Gateway appliance (5) is deployed on-premises and acts as a local cache of frequently used files, which improves performance and reduces data transfer traffic. Since FSx for Windows File Server is a native SMB protocol implementation, features like Distributed File System (DFS), server-side copy, and client-side caching are natively compatible with existing applications. Artists see the appliance as a network file share, which allows them to preserve existing workflows. Artists save their work to the network file share, and the data becomes available in the FSx for Windows File Server file system. Amazon EC2 Spot Instances mount the FSx for Windows File Server shares during startup.
Amazon EC2 Spot Fleet (Tractor Blades), FSx for Windows file shares, and EFS file shares are deployed into a VPC that is isolated from the public internet. Yet, the software that is running on the Tractor Blades might need to go out to the public internet. This requirement is addressed by establishing a VPC peering connection (13) between the Render VPC and the Management VPC.
The Management VPC hosts a License Server (14) for any possible licensing requirements. Internet access is provided via redundant NAT Gateways (15). The Internet Gateway (16) is a highly available component that allows communication between the Management VPC and the internet.
A single Availability Zone (AZ) is used throughout the deployment. The main reason behind this decision is cost. A rendering process is internal to a company, which makes it very different from customer-facing workloads where downtime cannot be tolerated. Most rendering jobs support checkpoints, as well as pause-and-resume functionality, so an interrupted job can pick up where it left off. This means costs can be cut significantly by using a single AZ. A single AZ also allows customers to avoid cross-AZ data transfer charges.
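To illustrate why checkpointing makes interruptions tolerable, here is a generic sketch of a frame-level resume loop. It is not Tractor's own checkpoint mechanism. The checkpoint file format, the function names, and the `render_frame` callback are hypothetical.

```python
import json
import tempfile
from pathlib import Path


def load_completed(checkpoint: Path) -> set:
    """Return the set of frame numbers already recorded as rendered."""
    if checkpoint.exists():
        return set(json.loads(checkpoint.read_text()))
    return set()


def render_job(frames, checkpoint: Path, render_frame) -> list:
    """Render only the frames that are missing from the checkpoint file."""
    done = load_completed(checkpoint)
    rendered = []
    for frame in frames:
        if frame in done:
            continue  # finished before the interruption; skip it
        render_frame(frame)
        done.add(frame)
        # Persist progress after every frame so a restart loses at most one frame.
        checkpoint.write_text(json.dumps(sorted(done)))
        rendered.append(frame)
    return rendered


with tempfile.TemporaryDirectory() as tmp:
    ckpt = Path(tmp) / "shot_010.json"
    # Pretend frames 1 and 2 finished before the instance was interrupted.
    ckpt.write_text(json.dumps([1, 2]))
    resumed = render_job(range(1, 6), ckpt, render_frame=lambda frame: None)
print(resumed)
```

In this sketch, a restarted job repeats at most the frame that was in flight, which is why a brief outage in a single AZ costs little render time.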
One thing to keep in mind is storage. If the primary data store is on-premises and the data is replicated to AWS, a single-AZ deployment works well. On the other hand, opting for multi-AZ Amazon FSx for Windows File Server is a reasonable choice if your assets are stored natively in AWS. Services like Amazon S3 and Amazon EFS are multi-AZ by design (with Amazon EFS also offering a single-AZ option).
Another potential reason for a multi-AZ deployment is to tap into extra compute capacity.
As of April 1, 2022, the inter-AZ data transfer within the same AWS Region for AWS PrivateLink, AWS Transit Gateway, and AWS Client VPN is free of charge. This makes multi-AZ deployment with an AWS Transit Gateway as a Regional virtual router a viable architectural option.
Putting it all together
The following steps are involved in a typical rendering workflow:
- Administrators provision additional Tractor Blades by issuing a request to Amazon API Gateway.
- API Gateway is backed by AWS Lambda. Lambda functions are responsible for spinning up/tearing down a Spot Fleet.
- Lambda functions have business logic that checks the limits imposed by the company. Then the function does all the preparation work necessary to spin up additional Spot Instances.
- Each Spot Instance is provisioned from a prebaked Amazon Machine Image (AMI), containing software needed for the instance to be a part of the render farm fleet.
- Each Spot Instance runs a Tractor Blade process. A Spot Instance must register with the Tractor Engine to be able to participate in render work distribution.
- Amazon Route 53 is used for hybrid DNS resolution, so that Tractor Blades can discover the Tractor Engine seamlessly.
- NFS: Amazon EFS file shares are mounted on the Spot Instances.
- SMB: Amazon FSx for Windows File Server file shares are mounted on the Spot Instances.
- SMB: FSx for Windows File Server is configured using AWS Managed Microsoft AD. A trust relationship is set up between the on-premises Active Directory and AWS Managed Microsoft AD.
- Tractor Engine (running on-premises) sees new Tractor Blades as ready. Tractor Engine can now distribute the work between a fleet of on-premises instances and Spot Fleet in the cloud.
- Artists can initiate render jobs directly from their workstations.
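The provisioning steps at the start of this workflow can be sketched as a Lambda handler behind an API Gateway proxy integration. This is a minimal sketch under stated assumptions: the `MAX_BLADES` limit, the `blades` request field, and the injected `count_running` and `launch_fleet` callables are hypothetical stand-ins for the company-specific business logic and the Spot Fleet call.

```python
import json

MAX_BLADES = 50  # hypothetical company-imposed cap on cloud Tractor Blades


def decide_scale_up(requested: int, running: int, limit: int = MAX_BLADES) -> int:
    """Return how many blades may be launched without exceeding the limit."""
    if requested <= 0:
        raise ValueError("requested blade count must be positive")
    return max(0, min(requested, limit - running))


def make_handler(count_running, launch_fleet):
    """Create a Lambda handler with its AWS-facing dependencies injected."""

    def handler(event, context):
        try:
            # With proxy integration, API Gateway delivers the body as a string.
            body = json.loads(event.get("body") or "{}")
            requested = int(body.get("blades", 0))
            granted = decide_scale_up(requested, count_running())
        except (AttributeError, TypeError, ValueError) as exc:
            return {"statusCode": 400, "body": json.dumps({"error": str(exc)})}
        if granted == 0:
            return {"statusCode": 409, "body": json.dumps({"error": "blade limit reached"})}
        fleet_id = launch_fleet(granted)
        return {
            "statusCode": 202,
            "body": json.dumps({"granted": granted, "fleetId": fleet_id}),
        }

    return handler


# Local dry run with stubbed dependencies: 45 blades already running, 10 requested.
handler = make_handler(count_running=lambda: 45, launch_fleet=lambda n: "sfr-example")
response = handler({"body": json.dumps({"blades": 10})}, None)
print(response["statusCode"], response["body"])
```

Injecting the two callables keeps the limit check testable without AWS credentials. In a deployed function they would wrap the actual Spot Fleet calls.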
The previous architecture represents only a high-level conceptual solution. AWS customers often rely on the AWS Partner Network (APN) to find a partner that has experience and competencies in implementing a solution.
DeadDrop Labs is a good candidate to assist you in your cloud journey. DeadDrop Labs is a member of the AWS APN with a proven track record of implementing Media Data Pipelines, Burst & Overflow Rendering solutions (including custom integrations with products like Pixar’s Tractor), and virtual workstation environments. In addition, DeadDrop Labs has extensive experience in media production, image processing, workflows, and overall systems design.
This post describes how to integrate Pixar’s Tractor with AWS, using services such as AWS Direct Connect, Amazon EC2 Spot Fleet, Amazon FSx File Gateway, Amazon FSx for Windows File Server, Amazon API Gateway, AWS Lambda, AWS Directory Service for Microsoft Active Directory, and others.
It also provides information about the AWS Partner Network and possible AWS Partners that are able to help with the Tractor Pipeline implementation for Canadian Studios.
Explore https://aws.amazon.com/media/content-production/ to learn more.