AWS Open Source Blog
Run Open Source FFmpeg at Lower Cost and Better Performance on a VT1 Instance for VOD Encoding Workloads
FFmpeg is an open source tool commonly used by media technology companies to encode and transcode video and audio formats. FFmpeg users can now run their video on demand (VOD) encoding workloads on a cost-efficient Amazon Web Services (AWS) instance, because AWS offers VT1 instances on Amazon Elastic Compute Cloud (Amazon EC2).
VT1 offers improved visual quality for 4K video, support for a newer version of FFmpeg (4.4), expanded OS/kernel support, and bug fixes. These instances are powered by the AMD-Xilinx Alveo U30 media accelerator. Xilinx makes it possible to add a single line to FFmpeg and have the Alveo U30 do the transcoding work. The Xilinx Video SDK includes an enhanced version of FFmpeg that communicates with the hardware-accelerated transcode pipeline in Xilinx devices, delivering up to 30% lower cost per stream than Amazon EC2 GPU-based instances and up to 60% lower cost per stream than Amazon EC2 CPU-based instances.
Companies typically use EC2 CPU instances such as the C5 and C6 coupled with FFmpeg for their VOD encoding workloads. These workloads can be costly in cases where companies encode thousands of VOD assets. The cost of an EC2 workload is influenced by the number of concurrent encoding jobs that an instance can support and this subsequently affects the time it takes to encode targeted outputs. As VOD libraries expand, companies typically auto scale to increase the size or number of C5 and C6 instances or allow the instances to operate longer. In both cases, these workloads experience an increase in cost. Important note: There is no additional charge for AWS Auto Scaling. You pay only for the AWS resources needed to run your applications and Amazon CloudWatch monitoring fees.
Amazon EC2 VT1 instances are designed to accelerate real-time video transcoding and deliver low-cost transcoding for live video streams. VT1 is also a cost-effective and performance-enhancing alternative for VOD encoding workloads. Using FFmpeg as the transcoding tool, AWS evaluated VT1, C5, and C6 instances to compare price performance and encoding speed for VOD assets. When compared to C5 and C6 instances, VT1 instances can achieve up to 75% cost savings. The results show that you could operate two VT1 instances for the price of one C5 or C6 instance. Additionally, the VT1 XMA (Xilinx U30) codec completed the targeted outputs 15.709 seconds faster than the C5 x264 (CPU) codec and 12.582 seconds faster than the C6 instance when transcoding an adaptive bitrate (ABR) stack for a 13-second 4K VOD file.
Benchmarking Method
First, let’s determine the best instance type to use for our VOD workload. C5 and C6 instances are commonly used for transcoding. We used C5.9xl and C6i.8xl instances and compared them against VT1.3xl instances to transcode 4K and 1080p VOD assets. The two assets were encoded into the output targets shown in the section on Evaluation ABR targets. For each of the VT1.3xl, C5.9xl, and C6i.8xl instances, we measured the time it took to complete the encode of all output targets.
As shown here in the screenshot from the AWS console for various instance types, VT1.3xl is the smallest instance type in the VT1 family. Even though VT1.6xl compares closely to C5 and C6 in terms of CPU/memory, we chose VT1.3xl for a closer price/performance comparison.
Input data points
Sample input content
The following table summarizes the key parameters of the source content video files used in measuring the encoding performance for the benchmarking tests.
Clip Name | Frame Count | Duration | Frame Rate | Codec | Resolution | Chroma Sampling |
1080p | 43092 | 12 mins | 60 | H.264, High Profile | 1920×1080 | 4:2:0 YUV |
4K | 776 | 13 secs | 60 | H.264, High Profile | 3840×2160 | 4:2:0 YUV |
Evaluation Adaptive Bit Rate (ABR) targets
Adaptive bitrate streaming (ABR or ABS) is technology designed to stream files efficiently over HTTP networks. Multiple files of the same content, encoded at different sizes, are offered to a user’s video player, and the client chooses the most suitable file to play back on the device. This involves transcoding a single input stream to multiple output formats optimized for different viewing resolutions.
For the benchmarking tests, the 4K and 1080p input files were transcoded to various target resolutions that can be used to support different device and network capabilities at their native resolution: 1080p, 720p, 540p, and 360p. The bitrate (br) in the graphic shown here indicates the bitrate associated with each target resolution. For example, a 4K input file was transcoded to 360p resolution at a bitrate of 640.
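To make the client-side choice described above concrete, here is a minimal Python sketch of how a player might pick a rendition from an ABR ladder. The four resolutions come from this post; the bitrates (including reading the 360p value of 640 as kbps), the `Rendition` type, the `pick_rendition` helper, and the 0.8 safety factor are all illustrative assumptions rather than the settings used in the benchmark or the logic of any particular player.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rendition:
    name: str
    height: int
    bitrate_kbps: int  # assumed unit; the post does not state one


# Hypothetical ladder: resolutions are from the post, bitrates are placeholders.
LADDER = [
    Rendition("1080p", 1080, 5000),
    Rendition("720p", 720, 3000),
    Rendition("540p", 540, 1500),
    Rendition("360p", 360, 640),
]


def pick_rendition(ladder, bandwidth_kbps, safety=0.8):
    """Pick the highest-bitrate rendition that fits within a fraction
    of the measured bandwidth; fall back to the lowest one otherwise."""
    budget = bandwidth_kbps * safety
    candidates = [r for r in ladder if r.bitrate_kbps <= budget]
    if candidates:
        return max(candidates, key=lambda r: r.bitrate_kbps)
    return min(ladder, key=lambda r: r.bitrate_kbps)


if __name__ == "__main__":
    for bandwidth in (10000, 4000, 2000, 500):
        print(bandwidth, "kbps ->", pick_rendition(LADDER, bandwidth).name)
```

Real players typically also consider buffer level and screen size; the sketch only shows why several renditions of the same asset are needed.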
Output results
Target duration analysis
The VT1.3xl instance completed the targeted encodes 15.709 seconds faster than the C5.9xl instance and 12.582 seconds faster than the C6i.8xl instance. The results in the following tables show that the VT1.3xl instance has better speed and price performance than the C5.9xl and C6i.8xl instances.
% Price Performance = {(Baseline Price Performance – VT1 Price Performance) / Baseline Price Performance} * 100
% Speed = {(Baseline duration – VT1 duration) / Baseline duration} * 100
Here, Baseline refers to the C5 or C6 instance being compared against VT1.
H.264 4K Clip (13 seconds duration) | VT1.3xl | C5.9xl | C6i.8xl |
Codec | mpsoc_vcu_h264 | x264 | x264 |
Duration to complete ABR targets, in seconds (see Evaluation ABR targets above) | 14.47632 | 30.186221 | 27.05856 |
Speed Compared to VT1.3xl (%) | N/A | 52.043284 % | 46.500035 % |
Instance Cost ($/hour) | $0.65 | $1.53 | $1.22 |
Instance Cost ($/second) | $0.00018 | $0.00043 | $0.000338 |
Price Performance: $/(clip transcoded) | $0.002605 | $0.01298 | $0.009145 |
Price Performance Compared to VT1.3xl (%) | N/A | 79.930662 % | 71.514488 % |
H.264 1080p Clip (12 minutes duration) | VT1.3xl | C5.9xl | C6i.8xl |
Codec | mpsoc_vcu_h264 | x264 | x264 |
Duration to complete ABR targets, in seconds (see Evaluation ABR targets above) | 490.82074 | 837.20001 | 762.63252 |
Speed Compared to VT1.3xl (%) | N/A | 41.373538 % | 35.641252 % |
Instance Cost ($/hour) | $0.65 | $1.53 | $1.22 |
Instance Cost ($/second) | $0.00018 | $0.00043 | $0.000338 |
Price Performance: $/(clip transcoded) | $0.0883 | $0.3599 | $0.2577 |
Price Performance Compared to VT1.3xl (%) | N/A | 75.465407 % | 65.735351 % |
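The percentages in the tables above can be reproduced with a few lines of arithmetic. The following Python sketch applies the two formulas to the 4K clip, using the durations and the rounded per-second prices exactly as they appear in the table. The function and variable names are illustrative, and because the table works from rounded intermediate values, the price-performance results agree only to within rounding.

```python
def pct_improvement(baseline: float, vt1: float) -> float:
    """Percentage by which the VT1 figure undercuts the baseline figure."""
    return (baseline - vt1) / baseline * 100.0


def cost_per_clip(duration_s: float, price_per_second: float) -> float:
    """Cost of one ABR transcode: wall-clock seconds times $/second."""
    return duration_s * price_per_second


# 4K clip figures copied from the table above.
DURATION_S = {"VT1.3xl": 14.47632, "C5.9xl": 30.186221, "C6i.8xl": 27.05856}
PRICE_PER_S = {"VT1.3xl": 0.00018, "C5.9xl": 0.00043, "C6i.8xl": 0.000338}


def compare(baseline: str, vt1: str = "VT1.3xl"):
    """Return (speed %, price performance %) of VT1 relative to a baseline."""
    speed = pct_improvement(DURATION_S[baseline], DURATION_S[vt1])
    price = pct_improvement(
        cost_per_clip(DURATION_S[baseline], PRICE_PER_S[baseline]),
        cost_per_clip(DURATION_S[vt1], PRICE_PER_S[vt1]),
    )
    return speed, price


if __name__ == "__main__":
    for name in ("C5.9xl", "C6i.8xl"):
        speed, price = compare(name)
        print(f"{name}: speed {speed:.2f} %, price performance {price:.2f} %")
```

Swapping in the 1080p durations gives the second table's figures in the same way.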
The following section explains the encoding parameters used for testing. FFmpeg was installed on 1 x C5.9xl, 1 x C6i.8xl, and 1 x VT1.3xl instances. The two input files described under Sample input content were run in parallel on each instance type, and the total duration to complete the transcoding to the output target resolutions was calculated.
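The post does not say how the parallel runs were timed. One plausible way to measure total wall-clock duration for jobs started together is sketched below in Python. The `time_parallel_jobs` helper is hypothetical, and the demo uses short placeholder commands instead of real FFmpeg invocations.

```python
import subprocess
import sys
import time


def time_parallel_jobs(commands):
    """Start every command at once, wait for all of them to exit, and
    return (wall_clock_seconds, list_of_return_codes)."""
    start = time.perf_counter()
    procs = [subprocess.Popen(cmd) for cmd in commands]
    return_codes = [p.wait() for p in procs]
    elapsed = time.perf_counter() - start
    return elapsed, return_codes


if __name__ == "__main__":
    # Placeholder jobs; in practice each entry would be an ffmpeg argument list.
    jobs = [
        [sys.executable, "-c", "import time; time.sleep(0.2)"],
        [sys.executable, "-c", "import time; time.sleep(0.2)"],
    ]
    elapsed, codes = time_parallel_jobs(jobs)
    print(f"all jobs finished in {elapsed:.3f} s, return codes {codes}")
```

Because the clock stops only after the slowest job exits, the reported figure is the time to complete every target, which is the quantity the duration rows in the tables describe.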
Technical specifications
- EC2 Instances: 1 x C5.9xl, 1 x C6i.8xl, 1 x VT1.3xl
- Video framework: FFmpeg
- Video codecs: x264 (CPU), XMA (Xilinx U30)
- Quality objective: x264 faster preset
- Operating system
  - For C5 and C6 (x264): Amazon Linux 2 (Linux kernel 4.14)
  - For VT1 (Xilinx): Amazon Linux 2 (Linux kernel 5.4.0-1038-aws)
Video codec settings for encoding performance tests
Instance | C5, C6 | VT1 |
Codec | x264 | mpsoc_vcu_h264 |
Preset | Faster | n/a |
Output Bitrate (CBR) | See Evaluation Targets above | See Evaluation Targets above |
Chroma Subsampling | YUV 4:2:0 | YUV 4:2:0 |
Color Bit Depth | 8 bits | 8 bits |
Profile | High | n/a |
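As an illustration of how the x264 column above could translate into an FFmpeg invocation, the Python sketch below builds a single command that produces all four ABR outputs. It is not the command used in the benchmark, which the post does not show. The bitrates are placeholders (only the 360p value of 640 appears in the post, and kbps is an assumed unit), the `build_x264_abr_command` helper and output file names are hypothetical, and setting `-minrate`, `-maxrate`, and `-b:v` to the same value only approximates constant bitrate. The VT1 column would need the Xilinx Video SDK build of FFmpeg and is not sketched here.

```python
import shlex

# (height, bitrate in kbps): heights from the post, bitrates are placeholders.
LADDER = [(1080, 5000), (720, 3000), (540, 1500), (360, 640)]


def build_x264_abr_command(src, ladder, out_prefix="out"):
    """Build one ffmpeg argument list with an x264 output per ladder rung."""
    cmd = ["ffmpeg", "-y", "-i", src]
    for height, kbps in ladder:
        cmd += [
            "-map", "0:v:0",              # first video stream of the input
            "-vf", f"scale=-2:{height}",  # keep aspect ratio, even width
            "-c:v", "libx264",
            "-preset", "faster",
            "-profile:v", "high",
            "-pix_fmt", "yuv420p",        # 8-bit 4:2:0
            "-b:v", f"{kbps}k",
            "-minrate", f"{kbps}k",
            "-maxrate", f"{kbps}k",
            "-bufsize", f"{2 * kbps}k",
            "-an",                        # video only
            f"{out_prefix}_{height}p.mp4",
        ]
    return cmd


if __name__ == "__main__":
    print(shlex.join(build_x264_abr_command("input_4k.mp4", LADDER)))
```

Producing every rung from one command decodes the source once, which is one reason a whole ABR stack is usually timed as a single job.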
Conclusion
Amazon EC2 VT1 instances are typically used for live real-time encoding; however, this blog post demonstrates the VOD encoding and price performance advantages of VT1 when compared to C5 and C6 EC2 instances. VT1 instances can encode VOD assets up to 52% faster and achieve up to a 75% reduction in cost when compared to C5 and C6 instances. VT1 is best suited to VOD encoding workloads where the time it takes to complete outputs needs to be short. Please visit the Amazon EC2 VT1 instances page for more details.