AWS for M&E Blog

Converting frame rates in the cloud with InSync FrameFormer and AWS Elemental MediaConvert

Authored by Paola Hobson, Managing Director at InSync Technology, Ltd. The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

 

Frame rate conversion

Since the earliest days of TV broadcasting, global viewers have wanted to watch live breaking news, international sporting events, entertainment, and cultural programming from other countries. However, simple international program exchange is difficult due to the huge differences in TV standards around the world. Similarly, delivering movies in formats suitable for home TV viewing is complex and inherently requires both format and frame rate conversion for audiences in all regions.

Format and standards converters are necessary to maintain the most faithful reproduction of the original content for each audience, at least to the best approximation available for the viewer’s display device. Improper standards conversion leads to blurred content, motion judder, and jagged diagonals. Dissatisfaction with viewing quality contributes directly to subscriber churn and loss of operator revenue. High-quality frame rate conversion is essential for every content owner seeking global monetization of their assets. This blog explains how to achieve consistent high-quality results using motion-compensated frame rate conversion from FrameFormer and AWS Elemental MediaConvert.

Why do we need frame rate conversion?

You perceive motion when watching a movie or video content because your viewing device displays a certain number of individual images, or frames, each second on the screen to create the illusion of motion. In the US, for example, your home TV screen refreshes at around 29.97 frames per second (fps).

If you’re watching a US-produced TV show or a made-for-TV movie, the content was produced at 29.97 fps. That exactly matches your home TV display and you enjoy the subjective effect of smooth motion.

In Europe, by contrast, the typical television display standard is 25 fps. And if you're watching content on a laptop, tablet, or smartphone anywhere in the world, the screen refresh rate could be anything from 60 to 120 fps, while gaming displays might run at 144 fps and above.

Content creators produce video in a huge range of frame rates including 23.98, 24, 25, 29.97, 30, 50, 59.94, and 60 fps. Frame rate conversion is required any time the frame rate used in production and the frame rate of the display device are different. A show produced at 29.97 fps will not display correctly on 25-fps display devices, for example, without frame rate conversion (standards conversion). Poorly done standards conversion causes visible artifacts including irregular or jerky motion, pulsing or flickering of detailed areas of the scene, object shadowing or “ghosting,” or other visible defects.

Low complexity frame rate conversion

Broadcast services use specific frame rates. European TV broadcasters transmit video at 25 frames per second, and European TV viewers will have 25-fps sets in their homes. If a European broadcaster were to air an American talk show produced at 29.97 fps, it would be theoretically possible for the European broadcaster to modify the metadata in the video file so that it would appear to be a 25-fps programme. This would, however, cause the content to play back more slowly and would adversely affect the speed of motion of the objects in the scene. It would also distort the audio playback. This practice of “off-speeding” is used in specific, limited situations with only small frame rate changes, such as from 23.98 fps to 25 fps. In these instances, the resulting change in audio is imperceptible. The broadcaster would need to accept a 4% change in program length, however.
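To see where that 4% figure comes from, here is a quick back-of-the-envelope check in Python (illustration only):

```python
# Back-of-the-envelope check of the off-speed change from 23.98 fps to 25 fps.
source_fps = 24000 / 1001          # "23.98 fps" expressed exactly
target_fps = 25.0

speed_factor = target_fps / source_fps          # ~1.04: everything plays ~4% faster
original_minutes = 60.0
off_sped_minutes = original_minutes / speed_factor

print(f"Playback speed factor: {speed_factor:.4f}")
print(f"A {original_minutes:.0f}-minute programme now runs {off_sped_minutes:.1f} minutes")
```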

When streaming video to mobile devices, the frame rate could be any of those listed above. The streaming service relies on the software player to match the source frame rate to the device frame rate. For these conversions, the simplest frame rate converters copy frames from input to output wherever the input and output presentation times are close, and skip or duplicate frames to maintain the required overall frame rate. AWS Elemental MediaConvert includes an option under the frame rate conversion algorithm menu list to drop duplicate frames (Figure 1). AWS Elemental MediaConvert is a file-based video transcoding service with broadcast-grade features that lets you easily create video-on-demand (VOD) content for broadcast and multiscreen delivery at scale.

Figure 1: Drop-duplicate option

This drop/repeat approach is useful in certain situations but can introduce problems in others. Its drawbacks include unnatural and discontinuous motion, audio artifacts where audio packets are lost or repeated at each frame skip or repeat, and corruption of metadata; closed caption packets, for example, could be lost or repeated. While these might not be an issue for user-generated video or enterprise video applications, professional video applications typically require other, more complex techniques.
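For illustration, a minimal sketch of the drop/repeat idea is shown below; it is not how MediaConvert is implemented, but it shows why some input frames disappear and others are shown twice:

```python
# Minimal sketch of a drop/repeat converter: each output frame reuses whichever
# input frame is nearest in time, so some inputs are skipped and others repeat.
def drop_repeat_mapping(in_fps, out_fps, duration_s=1.0):
    n_out = round(out_fps * duration_s)
    mapping = []
    for k in range(n_out):
        t = k / out_fps                    # presentation time of output frame k
        mapping.append(round(t * in_fps))  # nearest input frame at that time
    return mapping

# 29.97 fps -> 25 fps over one second: note the missing input frame numbers,
# which is where the irregular motion described above comes from.
print(drop_repeat_mapping(30000 / 1001, 25))
```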

Motion-compensated conversion – no compromises

Linear interpolation is another simple way to create new frames. MediaConvert includes an interpolate option as shown in Figure 1 above. In the simplest sense, linear interpolation uses pixels from two input frames to create the pixels in a new frame, lying temporally between them. Simple linear interpolation using a weighted sum of existing pixels to generate new output pixels gets round some of the problems associated with frame drop/repeat, but can still cause picture quality problems such as blur, loss of resolution, and unnatural movement of objects, also known as judder.
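In code, the weighted sum amounts to little more than a single line per output frame. The sketch below (NumPy, illustration only) shows the idea and why it blurs moving objects:

```python
import numpy as np

def blend_frames(frame_a, frame_b, t):
    """Create a new frame at fractional position t (0 to 1) between two input
    frames by a weighted sum of co-sited pixels. Anything that moved between
    the two frames is mixed from both positions, which appears as blur or
    double imaging rather than clean motion."""
    a = frame_a.astype(np.float32)
    b = frame_b.astype(np.float32)
    return (1.0 - t) * a + t * b

# Example: a new frame one third of the way from frame_a to frame_b
# new_frame = blend_frames(frame_a, frame_b, 1.0 / 3.0)
```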

Even allowing for motion adaptation, where different processing is applied in stationary and moving areas of the image, linear frame rate conversion is only a compromise. In practice, real-life images are rarely entirely stationary, and common effects such as lighting changes and image noise can all contribute to false detection of motion, which inevitably reduces quality in the output image.

The most reliable way to achieve high-quality frame rate conversion and avoid undesirable visual or audible artifacts is to use motion compensation. A motion-compensated frame rate converter calculates the motion between frames in the content, and works out where to move objects when creating new frames in between, as illustrated in Figure 2.

 

Figure 2: Illustration of motion-compensated frame rate conversion

As shown in Figure 2, if we can calculate the change in the object’s position between frames 1 and 2, and we know the time interval between those frames, and if we assume that the object moves at a constant speed, we can work out where the object should be at any other time interval. A motion-compensated converter can thus reproduce the object within any interpolated or re-timed frames. In this way, all picture objects remain sharp and in focus, and their motion is presented smoothly without judder or irregular movement.
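Under the constant-speed assumption in Figure 2, the prediction itself is simple arithmetic. The following sketch (illustration only, with hypothetical coordinates) shows the calculation:

```python
def interpolate_position(pos1, pos2, t1, t2, t):
    """Constant-speed assumption from Figure 2: given an object's position in
    frame 1 (at time t1) and frame 2 (at time t2), predict where it should sit
    in a new frame created at time t, so its pixels can be drawn there."""
    alpha = (t - t1) / (t2 - t1)
    return tuple(p1 + alpha * (p2 - p1) for p1, p2 in zip(pos1, pos2))

# Hypothetical example: an object at x=120 in frame 1 and x=132 in frame 2,
# with a new frame needed 40% of the way between them.
print(interpolate_position((120, 80), (132, 80), t1=0.00, t2=0.04, t=0.016))
# -> (124.8, 80.0)
```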

FrameFormer motion-compensated conversion

MediaConvert includes the option of using FrameFormer to perform high-quality frame rate conversion using sophisticated processing to ensure consistent results. Creating new pictures at different temporal intervals requires extremely accurate calculation of the motion vectors. In the simple case illustrated in Figure 2, a single object is moving at a constant speed in front of a plain background. In practice, real TV programs contain multiple objects that move at various speeds, occlude each other, move into and out of the scene, and move into and out of the camera field of view. Even the assumption of constant speed is a huge simplification.

 

Figure 3: Illustration of different types of motion in a typical scene

Figure 3 illustrates an example of typical motion, where arrows of various lengths show how each person is walking in a different direction and at a different speed. People cross one another, come into and out of the shot, and some walk towards the camera and appear to get larger while others walk away from the camera and appear to get smaller. Furthermore, although we cannot see this in a still image, the camera could be panning across the shot, meaning that there is global as well as local motion. Typical scenes also include rotating objects, camera zoom, and special effects. Further complications arise from moving graphics superimposed over the picture, as in the case of titles and credits.

FrameFormer uses Phase Correlation to estimate motion. The Phase Correlation method uses the Fourier transform to convert image data into the frequency domain. A Fourier transform is a mathematical process that decomposes a time signal into its constituent frequencies. A two-dimensional Fourier transform applied to image data (spatial domain) provides information about the picture’s vertical and horizontal phase and frequency detail. The magnitude values are normalised so that they all contribute equally. The size and direction of motions present can then be obtained by subtracting the phases obtained from two sequential frames and transforming the result back to the spatial domain.
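The core measurement can be sketched in a few lines of NumPy. The function below is a textbook, whole-frame phase correlation, not InSync's proprietary implementation:

```python
import numpy as np

def phase_correlation_shift(frame_a, frame_b):
    """Textbook phase correlation between two grayscale frames: normalise the
    cross-power spectrum so only phase differences remain, then look for the
    correlation peak, whose position gives the dominant motion in pixels."""
    A = np.fft.fft2(frame_a)
    B = np.fft.fft2(frame_b)
    cross = A * np.conj(B)
    cross /= np.abs(cross) + 1e-9          # keep phase, discard magnitude
    corr = np.fft.ifft2(cross).real        # back to the spatial domain
    peak = np.unravel_index(np.argmax(corr), corr.shape)
    # Peaks beyond the midpoint wrap around and represent negative shifts.
    return tuple(p if p <= s // 2 else p - s for p, s in zip(peak, corr.shape))
```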

Phase Correlation is insufficient to perform motion compensation by itself. This method identifies motions present within the image but does not define which areas in the image have that motion. Therefore, the measured motions must be mapped to specific regions within the image that typically represent separate real-world objects.
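One common way to perform that mapping, sketched below for illustration (again, a simplified approach rather than InSync's actual method), is to treat each measured motion as a candidate vector and assign to each block of the image whichever candidate matches the picture data best:

```python
import numpy as np

def assign_motion_to_blocks(frame_a, frame_b, candidates, block=32):
    """Given candidate motions measured for the whole image, test each one
    against each block of the picture and keep whichever candidate gives the
    lowest matching error. The result is a crude per-block motion field."""
    h, w = frame_a.shape
    field = np.zeros((h // block, w // block, 2), dtype=int)
    for by in range(h // block):
        for bx in range(w // block):
            y0, x0 = by * block, bx * block
            ref = frame_a[y0:y0 + block, x0:x0 + block].astype(np.float32)
            best, best_err = (0, 0), np.inf
            for dy, dx in candidates:
                y1, x1 = y0 + dy, x0 + dx
                if 0 <= y1 and y1 + block <= h and 0 <= x1 and x1 + block <= w:
                    cand = frame_b[y1:y1 + block, x1:x1 + block].astype(np.float32)
                    err = np.abs(ref - cand).sum()
                    if err < best_err:
                        best, best_err = (dy, dx), err
            field[by, bx] = best
    return field
```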

Motion estimation is more difficult with noisy, low resolution, or blurred content. Other content properties such as shot changes, concealed and revealed picture elements, brightness changes, and the presence of abrupt picture boundaries add further complexity to the motion estimation analysis. FrameFormer applies a variety of proprietary processing steps to improve accuracy and resilience and to account for these typical content challenges.

Using FrameFormer

In MediaConvert, when you select an output frame rate that is different from the input frame rate, you have access to the frame rate conversion algorithm menu list. Simply choose FrameFormer for motion-compensated frame rate conversion.
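If you create jobs through the MediaConvert API rather than the console, the same choice appears in the output's video codec settings. The fragment below is a sketch using the AWS SDK for Python (boto3); the region, the H.264 codec, and the 25-fps target are example values, and the rest of the job settings (IAM role, inputs, output groups) are omitted:

```python
import boto3

# MediaConvert uses account-specific endpoints; discover yours first.
mc = boto3.client("mediaconvert", region_name="us-east-1")
endpoint = mc.describe_endpoints()["Endpoints"][0]["Url"]
mc = boto3.client("mediaconvert", region_name="us-east-1", endpoint_url=endpoint)

# Fragment of one output's video settings: convert to 25 fps with FrameFormer.
# This dictionary would sit under Settings > OutputGroups > Outputs >
# VideoDescription in the full create_job request.
video_description = {
    "CodecSettings": {
        "Codec": "H_264",
        "H264Settings": {
            "FramerateControl": "SPECIFIED",
            "FramerateNumerator": 25,
            "FramerateDenominator": 1,
            # Other options include "DUPLICATE_DROP" and "INTERPOLATE"
            "FramerateConversionAlgorithm": "FRAMEFORMER",
        },
    },
}
```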

Setting frame rate

FrameFormer generates an output sequence at exactly the frame rate needed for your workflow. Dealing with fractional frame rates requires additional precision. The frame rate often referred to as "59 fps" is actually 60/1.001 fps. If you enter "59" into the frame rate selection field, FrameFormer generates an output sequence at exactly 59 fps. Unexpected errors can occur when presenting 59 fps to a 60/1.001 fps display device or when ingesting 59 fps into a 60/1.001 fps workflow for further editing. These errors include motion discontinuities or failure to play the video.
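The size of the mismatch is easy to quantify (plain Python, illustration only):

```python
# "59.94 fps" is really 60000/1001 fps; a plain "59" is a different rate.
exact_fps = 60000 / 1001        # ~59.94006 fps
entered_fps = 59.0

hour = 3600                     # seconds
shortfall = (exact_fps - entered_fps) * hour
print(f"Frame count mismatch over one hour: {shortfall:.0f} frames "
      f"(~{shortfall / exact_fps:.0f} seconds of content)")
```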

As shown in Figure 4, you can choose a frame rate from the drop-down list, or if you are working with a fractional frame rate, you can enter the fraction in the available fields. For example, you would enter 60000/1001 for 59.94 fps content.

Figure 4: Entering fractional frame rates

Conclusion

Since frame rate conversion is critically important to maintaining picture quality when distributing content globally, it's important to use a professional solution such as FrameFormer within MediaConvert. FrameFormer makes the all-important elements of motion-compensated frame rate conversion available from within the MediaConvert console. FrameFormer is the result of many years of standards conversion experience, giving you confidence in achieving the results your viewers demand.