AWS Media Blog

Guest post: Machine learning and corporate videos in today’s business video world

Guest post by Steve Vonder Haar, senior analyst with Wainhouse Research covering the enterprise video industry. The content and opinions in this post are those of the third-party author and AWS is not responsible for the content or accuracy of this post.

The Reese’s Peanut Butter Cup Effect seems to be in full force in today’s enterprise.

For this little analogy, it’s machine learning and online video that are playing the role of chocolate and peanut butter: Two great tastes that unexpectedly taste great together.

To some, it may seem like an unlikely pairing. Machine learning is becoming a catch-all of sorts for describing how algorithms can absorb and sort through massive amounts of data to help individuals make more informed decisions. Video offers a venue for executives to communicate in a more engaging way with large, far-flung audiences.

Put them together, though, and you have the ingredients for a technology blend that helps organizations accomplish far more than would be possible by implementing each of these technologies on a stand-alone basis.

Machine learning unlocks the value of video in a way that few other technologies can do on a cost-effective basis. It automates the conversion of the video’s narrative into a text format that helps computers make sense of this untapped, rich vein of information. By creating accurate transcriptions, machine learning boosts video’s accessibility by enabling enhanced searchability and real-time translation.

Low Awareness of the Benefits of Machine Learning for Corporate Streaming

Given the potential for such profound change made possible by the combination of machine learning and video, corporate awareness of the promise of bringing these capabilities together remains startlingly low.

One of the big challenges facing technology vendors is that many end users remain largely unaware of the machine learning capabilities that are making their way to the market today. Many workers, for instance, are not familiar with machine learning-driven solutions that automatically convert speech from videos into written text. As illustrated in Figure 1, more than seven out of 10 individuals participating in a WR survey of more than 2,000 end users in the fourth quarter of 2018 say they agree with the statement, “I had not previously been aware that such ‘speech-to-text’ features were available for use with business video.”

In this case, though, ignorance is not bliss. Those that begin to develop an understanding of how machine learning can work hand-in-hand with video have an edge in identifying how the combined technologies can be implemented to create business advantage.

Applications of ML-driven Automated Speech-to-Text and Translation

To comprehend the power of the convergence of machine learning and video, one must first understand the capabilities of “speech-to-text” solutions. Software-based speech-to-text capabilities make it possible to automatically convert the words spoken during a video presentation into text that can be presented as on-screen subtitles. While such captioning makes it possible for those with hearing loss to follow along with a video presentation, it offers other benefits, as well. When a video has on-screen subtitles, it offers viewers multiple modes for absorbing – and understanding – its message.

Such automated transcriptions, which can be produced for pennies on the dollar compared with traditional manual transcription services, lay the foundation for a wide array of add-on digital applications. Essentially, all of the software solutions that have made it possible to search and manipulate text and images in the cyber realm now can be put to work in enhancing the value (and accessibility) of online video content, as well.

Take the case of automatic translation services. For years, large technology vendors have worked on improving the accuracy of solutions that can take text-based input in one language and translate it into text in another language. When “speech-to-text” is applied to video, the text output can be submitted to translation services to create foreign language captioning for video presentations on a real-time basis.
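As a rough illustration of the flow described above, the sketch below takes timed text segments (the kind of output a speech-to-text step might produce), passes each one through a translation step, and formats the result as SRT-style captions. This is a minimal sketch under stated assumptions, not any vendor's implementation: the `Segment` type, the `to_srt` helper, and the `translate` callable are hypothetical names introduced here for illustration, and the toy dictionary stands in for a real translation service.

```python
from dataclasses import dataclass
from typing import Callable, Iterable


@dataclass
class Segment:
    """One timed piece of transcript (hypothetical speech-to-text output)."""
    start: float  # seconds from the start of the video
    end: float    # seconds from the start of the video
    text: str


def srt_timestamp(seconds: float) -> str:
    """Format a time offset in seconds as an SRT timestamp (HH:MM:SS,mmm)."""
    millis = int(round(seconds * 1000))
    hours, rem = divmod(millis, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def to_srt(segments: Iterable[Segment], translate: Callable[[str], str]) -> str:
    """Translate each segment's text and join the results as SRT caption blocks."""
    blocks = []
    for index, seg in enumerate(segments, start=1):
        blocks.append(
            f"{index}\n"
            f"{srt_timestamp(seg.start)} --> {srt_timestamp(seg.end)}\n"
            f"{translate(seg.text)}\n"
        )
    return "\n".join(blocks)


# Placeholder translation step: a real deployment would call a translation
# service here. This toy lookup exists only to make the sketch runnable.
TOY_DICTIONARY = {"Welcome, everyone.": "Bienvenidos a todos."}


def toy_translate(text: str) -> str:
    return TOY_DICTIONARY.get(text, text)


if __name__ == "__main__":
    transcript = [
        Segment(0.0, 2.5, "Welcome, everyone."),
        Segment(2.5, 5.0, "Thank you."),
    ]
    print(to_srt(transcript, toy_translate))
```

Keeping the translation step as a plain callable means the same caption formatting can be reused for any target language, or with no translation at all for same-language subtitles.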

High Interest Levels in Translating Speech-to-Text in Corporate Streaming Videos

When Wainhouse Research asked survey respondents about their interest in leveraging speech-to-text to create foreign-language translations of video content, workers reported substantial interest in these capabilities. Among all 2,002 respondents participating in the fourth quarter 2018 survey, 35% say they “strongly agree” with the statement, “I believe that my organization would be highly interested in leveraging speech-to-text output to generate foreign-language closed caption content for our videos.” Another 41% say they “somewhat agree” regarding their organization’s interest in speech-to-text enabled translation.

Interest levels in these capabilities skyrocket among those at the highest levels of the corporate hierarchy. Among C-Level executives surveyed by Wainhouse Research (refer to Fig. 2), 63% say they “strongly agree” that they would be interested in leveraging speech-to-text solutions to create foreign-language translations. Only one in 10 of these C-Level executives cited no interest in these translation capabilities.

It’s little wonder that interest in these speech-to-text capabilities is so high. At their core, these automated approaches to captioning and translation enable a form of global outreach and communications that would not be possible in any other medium without significant investments in translators and other support personnel.

Benefits to Global Companies

Armed with these technical translation capabilities, multinational companies can extend the reach and accessibility of messages from top leadership. An all-hands employee meeting that previously would have been of interest only to employees speaking the same language as the CEO becomes relevant even to employees abroad who speak a different language. Those workers can watch the same presentation and understand the messages from top executives via translated captions while still experiencing the tone and energy of company leaders as they share their message.

Indeed, for companies with global operations, the process of combining speech-to-text capabilities with video can foster a whole range of enhancements in communicating with their worldwide audiences. When used in employee training, product launches or industry briefings, video enhanced with translated captions has a broader reach with the power to disseminate an organization’s message in a more-standardized manner than would otherwise be possible.

Of course, we would be selling machine learning short if we view it only as a tool that drives increased access to video and boosts viewership around the globe. It also can be employed to personalize content consumption, streamlining the process of getting the right videos to the right workers at the right time.

So, we may just have to look beyond peanut butter cups. The blend of machine learning and video just may offer enough to create a full five-course meal of business benefit. In our next blog posting, we’ll do a deeper dive into how machine learning can be put to work to optimize the value of the video that workers are watching.

AWS Editorial note: for information on AWS’ Live Streaming with Automated Multi-Language Subtitling solution which helps customers automate the creation of captions in multiple languages for live streams, click here.

John Lai

John Lai is an Industry Solutions Marketing Manager for AWS Elemental