AWS Machine Learning Blog

How to approach conversation design with Amazon Lex: Building and testing (Part 3)

In parts one and two of our guide to conversation design with Amazon Lex, we discussed how to gather requirements for your conversational AI application and draft conversational flows. In this post, we help you bring all the pieces together. You’ll learn how to draft an interaction model that delivers natural conversational experiences, and how to test and tune your application.

This is the final post in a series on conversational design.

Draft an interaction model

In the second post of this series, you identified some use cases that you wanted to automate and wrote sample interactions between a user and your application. In this post, we use those use cases to build an interaction model, the structure that Amazon Lex works from. But first, let’s review some important definitions.

An interaction model defines the tasks that your application can help a user accomplish in a structure that a bot can understand. In your interaction model, you define the five major components for Amazon Lex: intents, sample utterances, slot names, slot values, and slot synonyms. You can make an interaction model (also called a bot definition) in whatever format works best for you and your team, such as in a spreadsheet or directly on the Amazon Lex console.

We start by defining intents. An intent is a user’s goal—what is the user trying to accomplish when they interact with your bot? Continuing with the banking bot example, a user goal might be something like blocking a card or opening a bank account. For example, in the following dialog, the user’s intent is to set up a recurring payment.

| Turn | Participant | Dialog | Notes |
|------|-------------|--------|-------|
| 1 | Amazon Lex | Thanks for contacting Your Favorite Loan Company. What can I help you with today? | |
| 1 | Customer | I wanna set up a recurring payment. | |
| 2 | Amazon Lex | Your current balance is [$200.00]. That includes an overdue balance of [$100.00]. A payment of [$200.00] is due on October 15th. To set up a recurring payment, you’ll need the routing number and the account number. Do you have that information ready? | Can customers change their debit date? Or only the account? |
| 2 | Customer | No | |
| 3 | Amazon Lex | Okay. I can wait. Let me know when you have the information. | |
| 3 | Customer | I’m ready. | |
| 4 | Amazon Lex | To start, what’s the 9-digit routing number? | |
| 4 | Customer | 999-999-999 | |
| 5 | Amazon Lex | And what’s the account number? | |
| 5 | Customer | 1234 567 890 | |

Each intent should come directly from a use case that you previously established in the earlier steps. We go into more detail about how to define intents later in this post.

Next, a prompt label is a name, defined by the application’s designer, that maps to something the bot says.

A sample utterance is something the user says to the bot that is defined in the interaction model to help the bot classify customer intent. For example, if you’re creating an intent for opening a bank account, you’d likely want to include utterances like “open an account,” “help with opening an account,” or “How can I open a bank account?” The idea behind sample utterances is that by defining a class of utterances with similar semantic content, the bot can use these to make an educated guess about what the user’s goal is. Even if you don’t define every possible utterance (and you shouldn’t), the bot can guess what the user is trying to do.
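To see how these definitions end up in a bot, here is a minimal sketch of one way such an intent might be registered using the Lex V2 model-building API via boto3. The intent name is hypothetical, and the bot ID, version, and locale are placeholders; treat this as an illustration rather than the post’s prescribed workflow.

```python
import boto3

lex = boto3.client("lexv2-models")

# Placeholders: supply your bot's real ID, draft version, and locale.
response = lex.create_intent(
    botId="BOT_ID",
    botVersion="DRAFT",
    localeId="en_US",
    intentName="OpenBankAccountIntent",  # hypothetical intent name
    sampleUtterances=[
        {"utterance": "open an account"},
        {"utterance": "help with opening an account"},
        {"utterance": "How can I open a bank account"},
    ],
)
print(response["intentId"])
```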

A slot is a piece of information that the user provides to accomplish their goal. For example, if a customer wants to open a bank account, we need to know the type of account. We can use a slot to collect those account types, and name it something that builders will understand, like AccountType. Slots can be either required or optional, depending on the use case. For example, you might need a required slot like BirthDate to authenticate your user, but collect an optional slot like AccountType to disambiguate between the different accounts a user might have. Slot values are the pieces of information that you want the bot to recognize as a slot, like “checking” or “savings.” Synonyms are alternate ways of saying a slot value, like “ISA” or “deposit account.”
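As a concrete sketch, here is one way an AccountType slot type with values and synonyms might be created through the Lex V2 model-building API via boto3. Again, the bot ID, version, and locale are placeholders, and the values are illustrative.

```python
import boto3

lex = boto3.client("lexv2-models")

# Placeholders: supply your bot's real ID, draft version, and locale.
response = lex.create_slot_type(
    botId="BOT_ID",
    botVersion="DRAFT",
    localeId="en_US",
    slotTypeName="AccountType",
    # Resolve synonyms ("ISA", "deposit account") to their primary value.
    valueSelectionSetting={"resolutionStrategy": "TopResolution"},
    slotTypeValues=[
        {
            "sampleValue": {"value": "savings"},
            "synonyms": [{"value": "ISA"}, {"value": "deposit account"}],
        },
        {"sampleValue": {"value": "checking"}},
    ],
)
print(response["slotTypeId"])
```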

Finally, a slot corresponding utterance is an utterance that contains a slot value, but doesn’t contain an intent, such as “to my savings account” or “it’s for my savings account.” In these utterances, you can’t tell what the user is trying to accomplish without the context of the rest of the conversation, but they do contain valuable slot information that you need the bot to capture.

The bot also has some available actions, such as ElicitIntent and ElicitSlot, which indicate whether the bot is trying to capture the user’s intent or to gather slot information.
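When an AWS Lambda code hook drives the dialog, these actions appear in the response the function returns. The following is a minimal sketch of an ElicitSlot response in the Lex V2 Lambda format; the helper name is ours, not part of the Lex API.

```python
def elicit_slot(intent: dict, slot_to_elicit: str, message: str) -> dict:
    """Build a Lex V2 code-hook response that asks the user for one slot."""
    return {
        "sessionState": {
            "dialogAction": {"type": "ElicitSlot", "slotToElicit": slot_to_elicit},
            "intent": intent,
        },
        "messages": [{"contentType": "PlainText", "content": message}],
    }
```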

Now that you’ve defined the values for all these components and put all those pieces together, you’ve created the first draft of an interaction model. Here’s an example, complete with the bot’s available actions.
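Interaction model formats vary from team to team, so the following is just one illustrative sketch, in Python, of how the recurring-payment example might be captured. The intent name, utterances, and prompt labels are invented for illustration, not taken from a real Lex bot definition.

```python
# First-draft interaction model for the recurring payment use case.
# Names, utterances, and prompt labels are illustrative only.
interaction_model = {
    "intents": {
        "SetUpRecurringPaymentIntent": {
            "sampleUtterances": [
                "I wanna set up a recurring payment",
                "set up automatic payments",
                "help me pay my bill every month",
            ],
            "slots": {
                "RoutingNumber": {
                    "slotType": "AMAZON.Number",
                    "required": True,
                    # Prompt label -> what the bot says (action: ElicitSlot).
                    "promptLabel": "ElicitRoutingNumber",
                    "prompt": "To start, what's the 9-digit routing number?",
                },
                "AccountNumber": {
                    "slotType": "AMAZON.Number",
                    "required": True,
                    "promptLabel": "ElicitAccountNumber",
                    "prompt": "And what's the account number?",
                },
            },
        },
    },
    # Greeting prompt -> bot action ElicitIntent.
    "greeting": "Thanks for contacting Your Favorite Loan Company. "
                "What can I help you with today?",
}
```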

Turn user stories into intents and slots

From your user stories, you’ve identified the use cases that you want your application to be able to help your users fulfill, such as blocking a bank card due to fraud or opening a new credit card. Make a complete list of all use cases that you developed. Now, it’s time to work backwards from the use cases to create user intents.

Start by getting a group of people together from all different teams of your organization—business analysts, technical pros, and leadership team members should all be present. Ask each person to create a list of possible things that they might reasonably say to a human agent or to an AI application for help with their use case. For example, if your use case is to open an account, you might list things like “I’d like to open a bank account,” “Can you help me open a new account?” or “Opening a savings account.” Be flexible with what you write. Have each person write 10–20 utterances per use case. Keep in mind the variety available in human language:

  • Verb variation – Open, start, begin, get started, establish
  • Noun variation – Account, savings, first credit card, new customer
  • Phrase or full sentence – Open account, I’d like to open an account
  • Statements versus questions – I want to open an account, Can you help me with getting started with a new account?
  • Implicit understanding – I’m a new customer, Help for new customers
  • Tone (formal or informal) – I need some assistance with opening a new high-yield savings account, I wanna open a new card

Now, compare your lists with each other. Combine all the utterances into a team-wide list, and organize them with the most frequent utterances first. You can use these as a head start on your sample utterances. Try to classify each utterance into a single use case. This might seem easy or obvious, since you just created these utterances directly from a list of use cases, but you might be surprised by how ambiguous human language can be.
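One lightweight way to merge and rank the team-wide list, assuming the utterances were collected as plain strings, is a simple frequency count:

```python
from collections import Counter

# Utterances collected from each team member, flattened into one list.
team_utterances = [
    "I'd like to open a bank account",
    "open an account",
    "Open an account",
    "can you help me open a new account",
    "I'm a new customer",
]

# Organize the team-wide list with the most frequent utterances first.
by_frequency = Counter(u.lower().strip() for u in team_utterances)
for utterance, count in by_frequency.most_common():
    print(f"{count:>3}  {utterance}")
```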

Now that you have your utterances and your use cases, decide which ones you want to turn into intents for your bot. Again, this requires input from your team to complete successfully, but here are some basic strategies. In general, each use case that you created and classified utterances for should become an intent. If you’re running into lots of ambiguity and having trouble classifying utterances, make a judgment call with your team about how to handle those tricky cases. You can merge use cases into a single intent if their utterances are too similar, or split use cases into more fine-grained intents if a single intent has too much variety in its utterances to classify them successfully.

Another strategy for dealing with these ambiguous utterances is to use slots. If you have an assortment of similarly defined intents, like OpenACreditCard and OpenADebitCard, you might find that utterances like “open a card” cause confusion in the model. After all, even a human can’t tell from that utterance alone whether the card is a credit or a debit card. You can use slots to help by defining the card type in the model as a required piece of information, so that the bot looks for the words “credit” or “debit” in the utterance. Then, if that slot isn’t filled, use that information to surface a disambiguation prompt like “Would you like to open a new credit card or a new debit card?” to gather the necessary information. Keep a running list of utterances that are difficult to classify, and use them in testing to see how users navigate these tricky situations.
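As a sketch of how that disambiguation might look in a Lambda code hook, assuming a hypothetical OpenACard intent with a required CardType slot (all names here are illustrative, not from a real bot):

```python
def handle_open_card(event: dict) -> dict:
    """Code-hook sketch: disambiguate "open a card" when CardType is missing.

    Assumes a hypothetical OpenACard intent with a required CardType slot
    whose values are "credit" and "debit".
    """
    intent = event["sessionState"]["intent"]

    if not intent["slots"].get("CardType"):
        # Slot unfilled: ask a disambiguation question instead of guessing.
        return {
            "sessionState": {
                "dialogAction": {"type": "ElicitSlot", "slotToElicit": "CardType"},
                "intent": intent,
            },
            "messages": [{
                "contentType": "PlainText",
                "content": "Would you like to open a new credit card "
                           "or a new debit card?",
            }],
        }

    # CardType present: hand the conversation back to the bot's normal flow.
    return {"sessionState": {"dialogAction": {"type": "Delegate"}, "intent": intent}}
```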

Remember that design is an iterative process and that no single interaction model will be perfect on the first try. This is why we continue with the next steps of prototyping and testing in order to build a successful conversational application.

Prototype your design

Given the often ambiguous nature of designing a conversational AI system, prototyping your design is crucial. Prototyping is a great way to gather meaningful feedback from real users in realistic contexts. In a design prototype, you want to build a simple way to test your design and gather feedback, without investing too much time building the software, because the design isn’t even finalized yet.

Following our example from earlier, we can build out a simple prototype to evaluate our user experience and amend our design as needed. Let’s build a mini-prototype with two intents: ReportCreditCardFraudIntent and OpenANewCreditCardAccountIntent.

ReportCreditCardFraudIntent

  • Unknown charge on my account
  • I think someone stole my card
  • Credit card fraud department
  • Fraudulent charges on my account

OpenANewCreditCardAccountIntent

  • Open a new account
  • Help with opening a credit card
  • Open a credit card account
  • I want to open a credit card

Before we even build these intents on the Amazon Lex console, we can make a prototype to make sure that we’re covering the most common utterances that a user might say. One simple way to do this is to engage a few potential end users, give them a scenario (like pretending their card was stolen), and have them provide a few utterances. You can match this against what you’ve outlined and collected with your team, and use this data to enhance your design. You might find that users are very unlikely to just say “credit card” at the opening menu, or you might find that it’s the most common utterance. Gathering information from a likely pool of users helps you understand your customers better and makes your design more robust.
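As a rough sketch, you could even score the utterances gathered from test users against your drafted samples to spot coverage gaps. The string-similarity measure below is a crude stand-in for Lex’s actual NLU; it only flags user utterances that look far from anything you’ve drafted.

```python
from difflib import SequenceMatcher

# Utterances drafted for each intent in the mini-prototype above.
drafted = {
    "ReportCreditCardFraudIntent": [
        "unknown charge on my account",
        "i think someone stole my card",
        "credit card fraud department",
        "fraudulent charges on my account",
    ],
    "OpenANewCreditCardAccountIntent": [
        "open a new account",
        "help with opening a credit card",
        "open a credit card account",
        "i want to open a credit card",
    ],
}

def closest_intent(utterance: str) -> tuple:
    """Return (intent, score) for the drafted sample most similar to the utterance."""
    best = ("", 0.0)
    for intent, samples in drafted.items():
        for sample in samples:
            score = SequenceMatcher(None, utterance.lower(), sample).ratio()
            if score > best[1]:
                best = (intent, score)
    return best

# Utterances gathered from test users during prototyping.
for heard in ["someone took my card", "credit card"]:
    intent, score = closest_intent(heard)
    print(f"{heard!r} -> {intent} (similarity {score:.2f})")
```

A low best score suggests an utterance class your drafted samples don’t yet cover.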

These lightweight exercises are a quick way to test your initial designs before committing to much code. Other approaches to prototyping your design include Wizard of Oz testing (where the designer plays the role of the bot opposite a user who doesn’t necessarily know they’re talking to a human) and visual prototypes that help you picture the best experience (like a video simulating a chat window).

Test and tune your bot

Now that you’ve gathered all the different elements of your design, and the experience has been built and integrated, you can start testing.

The first step is to test against the design documentation you’ve put together (the sample dialogs, conversation flows, and interaction model). Thoroughly test all the different intents, slots, slot values, paths, and error handling flows that you’ve designed, going step by step through each one. The following is an example list of things to test:

  • Intent classification – Is the bot correctly predicting the intent for all utterances?
  • Slot values – Is the bot correctly recognizing all the possible slot values? For example, if you’re using a slot with phone numbers over voice, does the bot recognize both “one zero zero” and “one hundred” as valid inputs?
  • Error handling – Are there places in the flow where you get stuck in a loop? Does the bot correctly recover if some kind of error occurs?
  • Prompts – Are the prompts eliciting the expected response? Is the wording clear and understandable for all users?

The following is a sample test plan for a call center bot that you can use to guide your own testing.

| Test ID | Scenario | Steps to test | Utterance | Successful? |
|---------|----------|---------------|-----------|-------------|
| Sample_100 | You notice a fraudulent charge on your account | Call number | | yes |
| Sample_100 | | Say “credit card fraud” | Credit card fraud | yes |
| Sample_100 | | Say or enter date of birth when prompted | January 1, 1980 | yes |

After testing, you may find that your bot requires some tuning. Go through your interaction model to add commonly missed utterances and new intents or slots, and change the wording of problematic prompts that are losing users along the way. This is a great place to explore an automated testing framework to expedite the process, as sketched below, but manual testing offers different insights about the user experience that can help alert you to usability defects before launch.
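As one illustration of what automated testing could look like, the following sketch drives a Lex V2 bot through the runtime API with boto3 and checks the predicted intent against the test plan. The bot and alias IDs, test utterances, and expected intent names are placeholders.

```python
import boto3

lex_runtime = boto3.client("lexv2-runtime")

# Test cases from the plan: (utterance, expected intent). Values are placeholders.
TEST_CASES = [
    ("credit card fraud", "ReportCreditCardFraudIntent"),
    ("I want to open a credit card", "OpenANewCreditCardAccountIntent"),
]

def run_tests(bot_id: str, bot_alias_id: str, locale_id: str = "en_US") -> None:
    """Send each test utterance to the bot and compare the classified intent."""
    for i, (utterance, expected) in enumerate(TEST_CASES):
        response = lex_runtime.recognize_text(
            botId=bot_id,
            botAliasId=bot_alias_id,
            localeId=locale_id,
            sessionId=f"test-session-{i}",  # fresh session per case
            text=utterance,
        )
        predicted = response["sessionState"]["intent"]["name"]
        status = "PASS" if predicted == expected else "FAIL"
        print(f"{status}: {utterance!r} -> {predicted} (expected {expected})")

# Example invocation with placeholder IDs:
# run_tests("BOT_ID", "ALIAS_ID")
```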

Finally, you should also give your users a way to test what you’ve built against the business requirements that you defined in part one of this series. Before you launch your application to production, you need to make sure it handles all customer requests and fulfills the business requirements that you received. Before beginning user testing, define the test plan with all stakeholders so it’s clear to everyone on the team how you define success. At this point, make sure you’ve developed your application in an environment as close as possible to the production environment, so that feedback from this testing provides insight for production. Provide testers with the test plan and document the results clearly, so that it’s easy to use the data from testing to decide how best to move forward.

After you’ve launched your application, the work isn’t done! Design is an iterative process and continually requires fresh perspectives to improve. As part of the business requirements, you should define how you’ll monitor the health of the system to identify issues, such as missed utterances. For example, you might explore an analytics framework dashboard or a business intelligence dashboard to help spot gaps in utterance coverage or places where users exit early. Use this information to improve your interaction model, test the new design, and ultimately, tune your application.
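One hedged sketch of how missed utterances might be surfaced for such a dashboard: if your bot routes unrecognized input to a fallback intent and a Lambda code hook sees every turn, the hook could emit a CloudWatch metric each time the fallback fires. The namespace and metric name here are hypothetical.

```python
import boto3

cloudwatch = boto3.client("cloudwatch")

def record_missed_utterance(event: dict) -> None:
    """Emit a CloudWatch metric when the bot's fallback intent fires.

    Sketch only: assumes unrecognized utterances are routed to a fallback
    intent named "FallbackIntent" and that this code hook sees every turn.
    """
    intent_name = event["sessionState"]["intent"]["name"]
    if intent_name == "FallbackIntent":
        cloudwatch.put_metric_data(
            Namespace="MyBot/Conversations",  # hypothetical namespace
            MetricData=[{
                "MetricName": "MissedUtterances",
                "Value": 1.0,
                "Unit": "Count",
            }],
        )
```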

Conclusion

In this series, we covered all the important basics for creating a great conversational experience using Amazon Lex. We encourage you to test and iterate through your design multiple times to ensure the best possible customer experience. Keeping these best practices in mind, we hope you explore all the different and creative ways that humans interface with the technology around us.

And remember that we at AWS Professional Services and our extensive AWS Partner Network are available to help you and your team through the process. Whether you’re only in need of consultation and advice, or whether you need full access to a designer, our goal is to help you achieve the best conversational interface for you and your customers.


About the Authors

Nancy Clarke is a Conversation Designer with the AWS Professional Services Natural Language AI team. When she’s not at her desk, you’ll find her gardening, hiking, or re-reading the Lord of the Rings for the billionth time.

Rosie Connolly is a Conversation Designer with the AWS Professional Services Natural Language AI team. A linguist by training, she has worked with language in some form for over 15 years. When she’s not working with customers, she enjoys running, reading, and dreaming of her future on American Ninja Warrior.

Claire Mitchell is a Design Strategy Lead with the AWS Professional Services Emerging Technologies Intelligence Practice—Solutions team. Occasionally she spends time exploring speculative design practices, textiles, and playing the drums.