AWS Machine Learning Blog

How Astro Built Astrobot Voice, a Chatbot for Email

This is a guest post by Roland Schemers, CTO of Astro Technology, Inc. Astro, in their own words, “creates modern email apps for Mac, iOS and Android, powered by artificial intelligence, built for people and teams. With Astrobot Voice, an in-app email voice assistant, you can now read, manage, and reply to emails without leaving Astro’s apps.”

Recently, Astro launched Astrobot Voice, the first in-app email voice assistant. This means you can now read, manage, and reply to emails without leaving Astro’s iOS or Android apps.

After Astro launched an Amazon Alexa skill in June, we were eager to enable more people to manage email with voice. In this post, we’ll give a technical breakdown of why we went down this path, how we accomplished this, and what technology we used.

Why build in-app voice?

We’re owners and fans of Amazon Echo. In fact, we give an Echo Dot to every new Astro employee, both as a way to say “welcome” and to dogfood our own Alexa skill. After seeing the skill’s success, we looked for ways to engage with more people in more places, and decided to explore the feasibility of building in-app voice.

Selecting software

When determining how to build in-app voice, we considered a number of options, but had a few goals in mind:

  1. Reuse as much code and logic as possible from either our text-based assistant (running on api.ai) or our Alexa skill.
  2. Create a smooth user experience with accurate voice recognition.
  3. Let the server do the heavy lifting.

The first goal was important given our timeline and engineering resources. We are a small startup and time savings like these go a long way.

The second goal of creating a smooth user experience was particularly challenging. Amazon Alexa has a definite leg up in natural language processing because of its scale, so to build an experience that felt just as accurate, we knew we wanted to leverage AWS services and the deep learning technologies behind them.

For the third goal, we knew Astrobot Voice would require a combination of OS-level APIs and server-side development. For our initial implementation, we decided to make sure the server was doing most of the heavy lifting, while keeping costs in mind. Having the server do most of the work means our iOS and Android apps share the same code, and we can change the conversation flow on the server without pushing updated versions of the Astro apps to the app stores.

The stack

iOS

For iOS, we used SFSpeechRecognizer for voice recognition and AVSpeechSynthesizer for spoken replies. SFSpeechRecognizer is only available on iOS 10+, so Astrobot Voice is only available on iOS 10 and 11. This could be a limiting factor for some app developers, but it works for us.

Android

For Android, we used the standard Android APIs for voice: SpeechRecognizer for voice recognition and TextToSpeech for spoken replies.
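
To show how these two APIs fit together, here’s a minimal Kotlin sketch; the VoiceIO class and its onTranscript callback are illustrative names for this post, not our production code. (The app also needs the RECORD_AUDIO permission.)

```kotlin
import android.content.Context
import android.content.Intent
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer
import android.speech.tts.TextToSpeech
import java.util.Locale

// Minimal sketch wiring the two standard Android voice APIs together.
// onTranscript is a hypothetical callback that hands the recognized text
// to the rest of the app (for example, to send to the server).
class VoiceIO(context: Context, private val onTranscript: (String) -> Unit) {

    private val recognizer = SpeechRecognizer.createSpeechRecognizer(context).apply {
        setRecognitionListener(object : RecognitionListener {
            override fun onResults(results: Bundle) {
                // The recognizer returns a ranked list of candidate transcripts.
                results.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
                    ?.firstOrNull()
                    ?.let(onTranscript)
            }
            // The remaining callbacks are no-ops in this sketch.
            override fun onReadyForSpeech(params: Bundle?) {}
            override fun onBeginningOfSpeech() {}
            override fun onRmsChanged(rmsdB: Float) {}
            override fun onBufferReceived(buffer: ByteArray?) {}
            override fun onEndOfSpeech() {}
            override fun onError(error: Int) {}
            override fun onPartialResults(partialResults: Bundle?) {}
            override fun onEvent(eventType: Int, params: Bundle?) {}
        })
    }

    private val tts = TextToSpeech(context) { /* init status callback */ }

    fun listen() {
        val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
            putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL,
                RecognizerIntent.LANGUAGE_MODEL_FREE_FORM)
            putExtra(RecognizerIntent.EXTRA_LANGUAGE,
                Locale.getDefault().toLanguageTag())
        }
        recognizer.startListening(intent)
    }

    fun speak(reply: String) {
        tts.speak(reply, TextToSpeech.QUEUE_FLUSH, null, "astrobot-reply")
    }
}
```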

For both iOS and Android, we had the option of sending the server either recorded audio or a transcribed text string. We went with the latter due to cost, time constraints, and latency.
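
In practice, that means the client only ships a short JSON payload once on-device recognition finishes. A sketch, assuming a hypothetical endpoint (our actual API isn’t public):

```kotlin
import java.net.HttpURLConnection
import java.net.URL

// Hypothetical endpoint and payload shape, for illustration only.
// The point is that only a short text string crosses the wire, not audio.
fun sendTranscript(transcript: String): String {
    val connection = URL("https://api.example.com/astrobot/message")
        .openConnection() as HttpURLConnection
    connection.requestMethod = "POST"
    connection.doOutput = true
    connection.setRequestProperty("Content-Type", "application/json")
    // Naive JSON building is fine for a sketch; use a real serializer in production.
    connection.outputStream.use { it.write("""{"text": "$transcript"}""".toByteArray()) }
    // The server's reply text is then handed to text-to-speech on the device.
    return connection.inputStream.bufferedReader().use { it.readText() }
}
```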

Server

On the server side, we used Amazon Lex. Choosing Amazon Lex over api.ai let us reuse and share a lot of the logic we had already built for Alexa. While we could have reused some of the logic from the text-based version of Astrobot, we ultimately decided we’d save more time and provide a better experience using Amazon Lex. We estimate this decision saved us two to four weeks of a single developer’s time, and as we further develop Astrobot Voice and our Alexa skill, it will keep paying off.
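
Handing an utterance to Amazon Lex from the server is a single runtime call. Here’s a Kotlin sketch using the AWS SDK for Java; the bot name and alias are placeholders, not our real configuration:

```kotlin
import com.amazonaws.services.lexruntime.AmazonLexRuntimeClientBuilder
import com.amazonaws.services.lexruntime.model.PostTextRequest

// Minimal sketch of passing a transcribed utterance to Amazon Lex.
// "Astrobot" and "prod" are placeholder names for illustration.
fun detectIntent(userId: String, utterance: String): String {
    val lex = AmazonLexRuntimeClientBuilder.defaultClient()
    val result = lex.postText(
        PostTextRequest()
            .withBotName("Astrobot")
            .withBotAlias("prod")
            .withUserId(userId)      // lets Lex track multi-turn dialog state per user
            .withInputText(utterance)
    )
    // result.intentName, result.slots, and result.dialogState drive the
    // server-side flow; result.message is the reply the app speaks aloud.
    return result.message ?: ""
}
```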

In the future, when we offer a paid version of Astro (currently our apps are free), we plan to take advantage of more AWS services by replacing our on-device speech recognition with Amazon Lex for voice input and Amazon Polly for voice output. This will improve the quality of the bot experience.

Here’s the flow and an architecture diagram of how these services work together to create the Astrobot Voice experience:

Advice to in-app voice developers

First, make sure your service or app lends itself to voice. Given the current state of voice technology, adding voice to certain apps likely isn’t a priority: despite the availability of natural language understanding and automatic speech recognition, voice still isn’t the default way of using apps. Your use case needs to be a fairly obvious one to get traction and bubble to the top of your product roadmap. We saw a very clear use case for email and voice at home (for example, while getting ready for the day) or in the car.

Second, we recommend really considering the technology you use, and not reinventing the wheel yourself. There are lots of services and resources that make the development process easier and help you get an MVP out into the world. The flip side is making sure you have a good abstraction on the server side, so you’re not locked into a particular service for intent detection. These services are still fairly new and constantly evolving, so you might eventually (or even quickly) need to switch. For Astrobot (both voice- and text-based), we’ve tried Luis.ai, wit.ai, api.ai, and now Amazon Lex without needing to make significant changes to our server logic, as the sketch below shows.
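
The abstraction doesn’t need to be elaborate. A thin Kotlin interface like the following is enough to keep vendor SDKs at the edges of the codebase; the type and class names here are illustrative, not our actual code:

```kotlin
import com.amazonaws.services.lexruntime.AmazonLexRuntimeClientBuilder
import com.amazonaws.services.lexruntime.model.PostTextRequest

// Vendor-neutral result of one turn of intent detection.
data class DetectedIntent(
    val name: String,
    val slots: Map<String, String>,
    val reply: String?
)

// The rest of the server logic depends only on this interface,
// never on a vendor SDK directly.
interface IntentDetector {
    fun detect(userId: String, utterance: String): DetectedIntent
}

// One adapter per vendor: swapping api.ai for Amazon Lex means writing
// a new adapter, not rewriting the bot's flow.
class LexIntentDetector(
    private val botName: String,
    private val botAlias: String
) : IntentDetector {
    private val lex = AmazonLexRuntimeClientBuilder.defaultClient()

    override fun detect(userId: String, utterance: String): DetectedIntent {
        val result = lex.postText(
            PostTextRequest()
                .withBotName(botName)
                .withBotAlias(botAlias)
                .withUserId(userId)
                .withInputText(utterance)
        )
        return DetectedIntent(
            name = result.intentName ?: "unknown",
            slots = result.slots ?: emptyMap(),
            reply = result.message
        )
    }
}
```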

We’re excited to be the first email app with a built-in voice assistant, and we look forward to seeing other apps make progress on voice. In many cases, voice is a much faster way to retrieve and create information, and we’re eager to see what’s next.