AWS Machine Learning Blog

Amazon Polly powers Nexmo’s next-gen text-to-speech use cases

This is a guest blog post by Roland Selmer, Product Director, Voice and RTC at Nexmo, the Vonage API Platform. In their own words, Nexmo “enables enterprises to reimagine their digital customer experiences by providing them with the tools they need to easily communicate information to customers in real-time through text messaging, chat, social media and voice.”

As a cloud communications provider that allows businesses to integrate communications capabilities into their applications, Nexmo, the Vonage API Platform, needed a text-to-speech (TTS) solution to help deliver the many synthesized speech use cases we enable for our customers. The solution that we chose had to meet our technological requirements and product philosophy to power Nexmo’s global TTS offerings.

Amazon Polly met all these criteria perfectly. This powerful service is now the main engine at the core of all of Nexmo’s synthesized speech use cases, delivering broad coverage in languages and voices.

Nexmo use cases powered by Amazon Polly

At Nexmo, we’re big believers in voice as an interface for application-to-person (A2P) communication, and we have enabled our customers to implement this most natural of communication modes into their applications primarily using Amazon Polly as the backend. Specifically, our customers in various industries have been able to leverage Amazon Polly-powered TTS to achieve better business outcomes through these key use cases:

  • Voice broadcast
  • Critical voice alerts
  • Inbound call whisper
  • Failover voice delivery of PIN codes in two-factor authentication (2FA)

Voice broadcast:

The voice broadcast use case depends on the scalability and language support of Amazon Polly. Businesses can efficiently and cost-effectively engage large audiences around the world by broadcasting A2P marketing messages via outbound TTS phone calls.

A Nexmo customer’s voice broadcast application calls a list of phone numbers from a database. If those numbers are provisioned local numbers from Nexmo, every call recipient will see a local number as the caller ID, regardless of where the call was initiated. When they answer, they will hear the TTS message, which could include information that has been personalized for the recipient.
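The personalization step described above can be sketched roughly as follows. This is a minimal illustration, not Nexmo's implementation: the function name, the recipient fields, and the phone numbers are hypothetical, and the "talk" action is modeled on Nexmo's JSON call-control format, so field names should be checked against the current API documentation.

```python
from string import Template


def build_broadcast_calls(recipients, template, from_number):
    """Return one call description per recipient, each with a personalized TTS message."""
    message = Template(template)
    calls = []
    for person in recipients:
        # Fill placeholders such as $name from the recipient record.
        text = message.safe_substitute(person)
        calls.append({
            "to": person["number"],
            "from": from_number,  # assumed to be a provisioned local number
            "ncco": [{"action": "talk", "text": text}],
        })
    return calls


# Hypothetical recipient records, as they might come from a customer database.
recipients = [
    {"number": "14155550100", "name": "Ana", "offer": "20% off"},
    {"number": "442079460000", "name": "Ben", "offer": "free shipping"},
]
calls = build_broadcast_calls(
    recipients, "Hello $name, this week you get $offer.", "14155550123"
)
```

Each entry in `calls` would then be handed to whatever code actually places the outbound call.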

Critical voice alerts:

Some TTS communications are intended to elicit timely responses to critical issues, ranging from disruptions to a customer’s service, to internal business issues, to extreme weather that threatens the safety of large populations. For these, voice-based critical alerts deliver urgent messages worldwide via phone calls. The customer’s app initiates simultaneous calls to everyone who needs to know about an event that has happened or is about to happen, and plays a recorded or text-to-speech message to convey the alert. The customer can also enable the app to track who has received the message via a simple interactive voice response (IVR) prompt that asks the recipient to press a key to acknowledge receipt.
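As a rough sketch of the acknowledgment flow, an alert could be expressed as a spoken message followed by a keypress prompt, with the app recording who responded. The helper names, the acknowledgment URL, and the way the pressed digit reaches `record_response` are assumptions for illustration; the "talk" and "input" actions are modeled on Nexmo's JSON call-control format and should be verified against the API reference.

```python
def build_alert_ncco(alert_text, event_url):
    """Speak the alert, then collect a single keypress as acknowledgment."""
    return [
        {"action": "talk", "text": f"{alert_text} Press 1 to acknowledge."},
        {"action": "input", "maxDigits": 1, "eventUrl": [event_url]},
    ]


def record_response(acknowledged, recipient, digits):
    """Mark a recipient as having acknowledged if they pressed 1."""
    if digits == "1":
        acknowledged[recipient] = True
    return acknowledged


recipients = ["14155550100", "14155550101"]
acknowledged = {number: False for number in recipients}
ncco = build_alert_ncco("Service outage in your area.", "https://example.com/ack")

# Simulate the first recipient pressing 1; the second never responds.
record_response(acknowledged, "14155550100", "1")
pending = [number for number, ok in acknowledged.items() if not ok]
```

The `pending` list is what an app might use to decide whom to call again.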

Inbound call whisper:

In the inbound call whisper use case, businesses associate a Nexmo virtual number with a specific ad campaign. When a prospect makes an inbound call to one of the numbers, the business’s speech-enabled application routes it to an available agent and, before connecting the two parties, plays the agent an audible message about the campaign that prompted the call. The agent can pull up the right ad campaign script and be prepared to engage the caller with the proper context, leading to a more effective interaction.
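The number-to-campaign association at the heart of this use case might look something like the sketch below. The campaign names, numbers, and the `whisper_text` helper are all hypothetical; the returned string is what would be passed to the TTS engine and played to the agent.

```python
# Hypothetical mapping of virtual numbers to the ad campaigns they were published in.
CAMPAIGNS = {
    "14155550111": "Spring Sale radio ad",
    "14155550112": "Search ads for home insurance",
}


def whisper_text(called_number, campaigns):
    """Return the message to speak to the agent before the caller is connected."""
    campaign = campaigns.get(called_number)
    if campaign is None:
        return "Incoming call from an unrecognized campaign number."
    return f"Incoming call about the campaign: {campaign}."


message = whisper_text("14155550111", CAMPAIGNS)
```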

Failover voice delivery of PIN codes in 2FA:

Perhaps most notably, Amazon Polly plays a critical role in Nexmo’s full-service 2FA solution, Verify. Beyond authenticating new users through mobile phone verification—for example, by sending a PIN code to a prospective registrant who then enters the PIN into an app or web service—Verify can employ speech-enabled delivery for PIN codes when the initial text-based verification attempt fails.

Using our patented failover logic, Verify selects the optimal delivery channel and failover sequence, which can include delivering the PIN code as a voice or TTS message.
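Verify's actual failover logic is proprietary and is not described here. Purely to illustrate the general idea of channel failover, a simplified ordered fallback could be sketched as follows; the channel names and sender functions are hypothetical stand-ins.

```python
def deliver_pin(pin, number, channels):
    """Try each (name, send) channel in order; stop at the first that succeeds."""
    attempts = []
    for name, send in channels:
        ok = send(number, pin)
        attempts.append((name, ok))
        if ok:
            return name, attempts
    return None, attempts


def spoken_pin(pin):
    """Format a PIN so a TTS engine reads it digit by digit with short pauses."""
    return "Your verification code is " + ", ".join(pin) + "."


# Stub senders for illustration: the SMS attempt fails, the voice call succeeds.
def send_sms(number, pin):
    return False


def send_voice_tts(number, pin):
    return True


used, attempts = deliver_pin(
    "4821", "14155550100", [("sms", send_sms), ("voice_tts", send_voice_tts)]
)
```

A real system would also need timeouts and delivery receipts to decide when an attempt has failed; those are omitted here.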

Our customers have seen a marked improvement in 2FA conversions when moving to the speech-enabled Verify solution from a text-only one. For example, BitQuick, the leading cash-for-Bitcoin marketplace, was able to increase its order success rate from 35% to 55% by using Nexmo Verify, while also doubling the overall transaction volume during its initial 60 days of deployment.

The Nexmo platform gives customers the ability to programmatically augment their communications apps. Developers can enhance any of the use cases I mentioned with features such as simple IVR to capture individual responses and feedback, as well as business logic to retry calls or leave voice messages if the recipient doesn’t answer. With the Speech Synthesis Markup Language (SSML) support that Amazon Polly provides, developers can also manipulate the characteristics of their applications’ synthetic speech to make it sound more human. Customers can monitor the effectiveness of a given use case using reports in the Nexmo Dashboard.
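To give a sense of what SSML control looks like, here is a small sketch that builds an SSML document for reading out a PIN: a pause after the greeting, a slower speaking rate, and digit-by-digit pronunciation. The `pin_ssml` helper and its defaults are hypothetical; the `break`, `prosody`, and `say-as` tags are standard SSML elements that Amazon Polly supports.

```python
from xml.sax.saxutils import escape


def pin_ssml(greeting, pin, rate="slow", pause_ms=400):
    """Build SSML that pauses after the greeting, then reads the PIN slowly, digit by digit."""
    return (
        "<speak>"
        f"{escape(greeting)}"
        f'<break time="{pause_ms}ms"/>'
        f'<prosody rate="{rate}">'
        f'<say-as interpret-as="digits">{escape(pin)}</say-as>'
        "</prosody>"
        "</speak>"
    )


ssml = pin_ssml("Your code is", "4821")
```

Escaping the text matters because characters such as `&` or `<` in a message would otherwise produce invalid SSML.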

Amazon Polly building block model aligns with Nexmo

Amazon Polly checks all the boxes for the technological specifications Nexmo required for our TTS use cases. The highly scalable AWS Cloud infrastructure, the high availability of the Amazon Polly service, and the broad language support made Amazon Polly a logical choice. But there’s a philosophical alignment between the Amazon Polly model and the Nexmo model that also makes for a perfect match.

Just as Nexmo democratizes global telephony by abstracting its complexities and offering direct access to the infrastructure via APIs, Amazon democratizes the conversion of text to synthesized speech through the Amazon Polly service. By providing Amazon Polly through a REST API, AWS made it very easy to integrate into our service.
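As an illustration of how small that integration surface is, the sketch below prepares a request for the Amazon Polly `SynthesizeSpeech` operation and reads back the audio. It is not Nexmo's integration code: the helper names and default voice are assumptions, and the client is passed in so the example does not require AWS credentials to read. With the AWS SDK for Python, the client would be created with `boto3.client("polly")`.

```python
def polly_request(text, voice_id="Joanna", ssml=False, output_format="mp3"):
    """Build the keyword arguments for a SynthesizeSpeech call."""
    return {
        "Text": text,
        "TextType": "ssml" if ssml else "text",
        "OutputFormat": output_format,
        "VoiceId": voice_id,
    }


def synthesize(client, text, **options):
    """Call Polly through the given client and return the audio bytes."""
    response = client.synthesize_speech(**polly_request(text, **options))
    return response["AudioStream"].read()


params = polly_request("<speak>Hello</speak>", ssml=True)

# Example usage (requires boto3 and AWS credentials):
#   import boto3
#   audio = synthesize(boto3.client("polly"), "Hello from Amazon Polly")
```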

Because Nexmo is a global platform, broad language support is essential for offering our TTS use cases in the native languages of customers around the world.

Our per-second billing is another benefit we can offer our TTS customers, thanks in large part to the Amazon Polly pricing model. Because we pay only for what we use and don’t have to absorb any upfront costs, we are able to offer the same low-cost benefits to our customers.

As businesses increasingly use the voice interface to communicate with customers, we feel well prepared to power their text-to-speech use cases with Amazon Polly as our engine.