Choosing the right Amazon EC2 Instance types for rendering with Thinkbox Deadline (Part 2)
Previously, I published a post detailing the differences between the Amazon EC2 instance types available on AWS. This time, I’m not going to focus on the differences between the instance types. Instead, I’ll focus on how to pick the right instance type for you.
One size fits all?
I’d love to be able to end this post right now by saying that you should use ‘c4.2xlarge’ (a compute optimized instance with 8 CPUs and 15 GB RAM) for all of your instances, and that you’ll never run into any problems. But I can’t.
It turns out that picking an instance type requires a bit more finesse, fine-tuning, and experimentation to achieve the best results. There isn’t a one-size-fits-all answer. The right instance type isn’t going to be the same for each studio or even between jobs. If you take the time to figure out which instance types are the right ones for the jobs that you are rendering, you’ll see the best results.
Too big, too small, just right
To help choose an instance type that’s right for you, I highly recommend doing some experimentation. Take a scene that is similar to the kinds of scenes you send to the cloud. I’d grab a scene that has an average amount of geometry, assets, simulations, etc. I’d also recommend getting a test scene that you’re familiar with. That way, you should be able to tell if the rendering is taking longer than expected.
If you have an on-premises farm, take a few test frames from your scene and render them on a few of your local render nodes. In Deadline Monitor, if you look at the task panel for your test job, you can see the peak and average RAM and CPU usage for your frames. Based on these numbers, you should be able to determine an instance type to fit your job.
In the above example, my Autodesk Maya render didn’t use much RAM. It topped out at 20% but usually was sitting around 10%. However, the CPU was used more heavily, hitting up to 98% usage and averaging 38%. For these tests, I was using an 8-core machine with 15 GB of RAM. Based on these numbers, I should look at scaling back on my RAM.
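As a rough sketch of that reasoning (not something Deadline provides), the percentages can be converted into absolute numbers that are easier to compare against instance specifications. The function name `peak_requirements` is hypothetical; the inputs are the figures from the test above (8 cores, 15 GB of RAM, 98% peak CPU, 20% peak RAM).

```python
def peak_requirements(total_cores, total_ram_gb, peak_cpu_pct, peak_ram_pct):
    """Convert peak utilization percentages into absolute core and RAM figures.

    Hypothetical helper for illustration; the percentages are read manually
    from the task panel in Deadline Monitor.
    """
    cores_used = total_cores * peak_cpu_pct / 100.0
    ram_used_gb = total_ram_gb * peak_ram_pct / 100.0
    return cores_used, ram_used_gb


# Figures from the test described above: 8 cores, 15 GB RAM,
# 98% peak CPU usage, 20% peak RAM usage.
cores, ram_gb = peak_requirements(8, 15, 98, 20)
print(f"Peak usage: about {cores:.2f} cores and {ram_gb:.1f} GB of RAM")
```

Seen this way, the render needs nearly all of the cores at its peak but only a small part of the memory, which is why scaling back RAM is the first thing to try.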
With all this in mind, I ran the same test scene using a few different instance types on AWS. In my sample case, I tried a c4.2xlarge, m4.xlarge, and c4.xlarge. These instance types range from 4 CPUs to 8 CPUs and have 7.5 GB to 16 GB of RAM. The m4.xlarge type is a General purpose instance type, while the other two are Compute optimized. I chose these types to compare the results I got from dialing the CPUs and RAM up and down.
After the renders finished, I again looked at the RAM and CPU utilization on those machines. I also looked at total task time, startup time, and render time, to see where I was spending the most time. If most of the time is spent on startup, it probably means that the rendering application is taking a long time to launch. Startup time should go down after the first frame renders on an instance. I recommend running a handful of frames on each instance so that the first frame’s startup cost doesn’t skew your results.
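One way to check where the time goes is to compute the share of each task that was spent on startup. The following is a minimal sketch; the function name `startup_share` and the timings are made up for illustration and are not measurements from my tests.

```python
def startup_share(startup_seconds, render_seconds):
    """Return the fraction of a task's total time that was spent on startup."""
    total = startup_seconds + render_seconds
    if total == 0:
        return 0.0
    return startup_seconds / total


# Hypothetical (startup, render) times in seconds for three consecutive
# frames on one instance. Replace these with the values from your own job.
frames = [(60, 120), (5, 118), (4, 121)]

shares = [startup_share(startup, render) for startup, render in frames]
for index, share in enumerate(shares, start=1):
    print(f"Frame {index}: {share:.0%} of task time spent on startup")
```

If only the first frame shows a large startup share, that is the expected one-time launch cost. If every frame does, the application is probably relaunching for each task, which is worth investigating before changing instance types.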
You can see that, in my second run-through, all the instances took a bit longer and used a bit more CPU/RAM than the first benchmark. Each frame was about 30 seconds slower than the original render, but these instance types are also inexpensive. From here, I need to decide if I want to run more tests with smaller instance types or stick with one (or more) of these instance types. Alternatively, I could do a test with larger instance types (16 CPUs) to see how quickly they finish my render. The trick is finding the right balance between performance and cost.
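To put a number on that balance, it can help to estimate the cost of a single frame on each candidate. This is a simplified sketch: the function `cost_per_frame`, the instance names, the prices, and the render times are all placeholders rather than real AWS prices or results from my tests. It also assumes one frame renders at a time and ignores startup time and billing minimums.

```python
def cost_per_frame(hourly_price_usd, seconds_per_frame):
    """Estimate the On-Demand cost of one frame from an hourly price."""
    return hourly_price_usd * seconds_per_frame / 3600.0


# Placeholder numbers for illustration only. Substitute the current
# On-Demand prices and the render times you measured yourself.
candidates = {
    "instance-a": {"hourly_price_usd": 0.40, "seconds_per_frame": 90},
    "instance-b": {"hourly_price_usd": 0.20, "seconds_per_frame": 150},
}

costs = {name: cost_per_frame(**spec) for name, spec in candidates.items()}
cheapest = min(costs, key=costs.get)

for name, cost in sorted(costs.items(), key=lambda item: item[1]):
    print(f"{name}: ${cost:.4f} per frame")
print(f"Cheapest per frame: {cheapest}")
```

With these placeholder values, the slower instance is still cheaper per frame because its hourly price is half that of the faster one. Whether that trade is worth it depends on how soon you need the frames back.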
It’s also a good idea to mix in some different categories of instance types. Start with General Purpose instances and then try some Compute or Memory optimized ones depending on your needs. You can find details on all of the instance types on the Amazon EC2 Instance Types page. I like to use the On-Demand Instance pricing page because it lists every instance type on a single page and shows the On-Demand price too.
Try to err on the side of caution when it comes to those average usage percentages. If you start hitting 80% or 90% consistently, it’s probably not a good idea to go with a less powerful instance type. If you do, it can result in reduced performance, longer render times, or even failed renders.
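That rule of thumb can be written down as a simple check. This is only a sketch: the function name `safe_to_downsize` is hypothetical, and the 80% default comes from the guideline above rather than from any Deadline or AWS setting.

```python
def safe_to_downsize(avg_cpu_pct, avg_ram_pct, threshold_pct=80.0):
    """Return True only if both average CPU and RAM usage stay under the threshold.

    Hypothetical helper; 80% is a rule-of-thumb default, not an official limit.
    """
    return avg_cpu_pct < threshold_pct and avg_ram_pct < threshold_pct


# Averages from the earlier Maya test: 38% CPU and 10% RAM.
print(safe_to_downsize(38, 10))

# An illustrative job that is consistently CPU-bound.
print(safe_to_downsize(85, 40))
```

A passing check only means that a smaller instance type is worth testing. It doesn’t guarantee that the render will behave the same way on it.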
If you don’t have an existing farm or on-premises machines that can effectively render your scenes, I suggest following the same approach as above to get your initial benchmarks using EC2 instances. Start with a General Purpose instance type that you think would work for your needs, and then go from there.
Rinse and repeat
This process is going to require at least a couple of rounds of iteration and testing. Yes, you could pick the largest instance type and not run into any issues, but that’s not going to be cost-effective. Through repeated testing, you should be able to find an appropriate instance type for your jobs without paying for more processing power than you really need.
There are many instance types to choose from, and it’s only through repeated testing that you can tell which ones are right for you. Keep in mind that choosing an instance type is an ongoing process, and the instance type that is right for you may change.
From time to time, AWS adds new generations of hardware. Even if you already have some instance types that work well for you, it’s usually worth benchmarking against new instance types when they become available. This should result in a much more efficient cloud farm.