AWS Machine Learning Blog

Managing your expenses with Amazon Lex

This is a guest post by Rob Whelan, Solutions Architect at Relus Cloud.

When people ask me what impact artificial intelligence (AI) will have in the enterprise, I like to say that AI will relieve people from doing repetitive things. We’re not wired to do mindless tasks over and over—but computers are. That’s why self-driving cars are so appealing. Driving a car is repetitive and boring, not to mention dangerous. One day, a computer will be able to handle it.

One thing that many of us do is submit expenses for reimbursement. If you’re like me, you might not use an image recognition app that takes a picture of the receipt and figures it all out from there. Sometimes you don’t have a paper receipt at all. For example, this can happen when you go to coffee shops that use Square. Rather than let my receipts pile up for an hours-long paperwork exercise that I dread, I’d rather just send a quick text to a bot.

So for this project, we’re going to build an expense-recording bot. You text it a short description of your expense and an image of the receipt, using just SMS and MMS, and it transforms them into structured, expense report-worthy data that can be downloaded from Amazon DynamoDB as a CSV file.

We will use Amazon Lex for the bot, AWS Lambda for validating the input, Amazon DynamoDB for storing the data, and Twilio for SMS / MMS.

Building the bot

We start by building the bot. Open the AWS Management Console and create the bot in the Amazon Lex console. Call it “ExpenseBot” and click into it. Or, you can import the bot using this JSON file: ExpenseBot_Export.json

Bots in Amazon Lex have a few important components. The most important is the intent. Think of an intent as an action that the bot can perform for the user. The strength of a bot is that it can allow a user to fill out a survey-like set of questions, one question at a time, using natural conversation. It’s better than making users navigate a complex online form, which can be a hassle.

Bot Components

There are a few things to work through on this screen:

Intents: An intent is an action to be taken, which is triggered by different phrases. For example, “I spent $30 at Starbucks for breakfast” and “30 for breakfast” both trigger the same intent, RecordExpense, because you are trying to record an expense.

Create an intent called RecordExpense. You can have many different intents for a bot, each of which represents a different action. You just need to make sure your bot can differentiate among them depending on what the user is saying.

Slots: A slot is a placeholder that the bot will try to fill with a value. For an expense report, we need data like the expense amount, the category (meals, travel, lodging, etc.), and whether it is billable. There are two types of slots: custom and built-in. Built-in slots are pre-configured data types for common use cases. For example, the AMAZON.DATE built-in slot type recognizes dates. You create custom slot types by choosing + under Slot Types. There are two ways to configure custom slot types to capture user input:

  1. Expand Values: With this option, the example slot values you provide are used as training data, and the slot resolves to the value the user provides if it is similar to the slot values and synonyms. This lets the bot capture free-form data like ExpenseProject, which could be any of hundreds of projects. For ExpenseProject, create a slot type with a few sample values and choose Expand Values so that it can capture future, free-form project names.
  2. Restrict to slot values and Synonyms: Choose this to resolve slots to a specific value. So in our case, our expense system has categories like “Meals & Entertainment” but we wouldn’t expect the user to type all that out. Let them type “meals” and “food” and have it resolve to “Meals & Entertainment”. Follow the next screenshot to fill out the ExpenseCategory Slot Type, choosing Restrict to Slot Values and Synonyms for Slot Resolution.
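As a rough illustration, the two resolution options correspond to the valueSelectionStrategy field of a slot type when it is defined through the Amazon Lex model-building API instead of the console: ORIGINAL_VALUE for Expand Values and TOP_RESOLUTION for Restrict to Slot Values and Synonyms. The sample values and synonyms below are assumptions for this sketch, not the contents of the exported bot.

```javascript
// Hypothetical slot type definitions, in the shape accepted by the
// Amazon Lex model-building API's putSlotType operation.
const expenseCategorySlotType = {
  name: 'ExpenseCategory',
  // Restrict to Slot Values and Synonyms: resolve to the canonical value.
  valueSelectionStrategy: 'TOP_RESOLUTION',
  enumerationValues: [
    { value: 'Meals & Entertainment', synonyms: ['meals', 'food', 'breakfast'] },
    { value: 'Travel', synonyms: ['flight', 'taxi', 'mileage'] },
    { value: 'Other', synonyms: ['misc'] },
  ],
};

const expenseProjectSlotType = {
  name: 'ExpenseProject',
  // Expand Values: the sample values act as training data, and the
  // user's own wording is kept as the slot value.
  valueSelectionStrategy: 'ORIGINAL_VALUE',
  enumerationValues: [{ value: 'Acme migration' }, { value: 'Internal training' }],
};
```

With a definition like the first one, a user who types “food” would have the slot resolve to “Meals & Entertainment”.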

Now that we have created some custom slot types, let’s start adding them to the intent. Scroll to “Slots” and add these slot types. Notice that the slot type selected in the drop-down list for ExpenseAmount is AMAZON.NUMBER. It’s a special data type, built by AWS, that helps resolve numbers. Amazon has many built-in types, and they are prefixed with AMAZON in all capital letters. Examples are AMAZON.NUMBER and AMAZON.DATE; the latter frees you from doing a lot of work to recognize various date formats. If you’ve spent any time developing, you know dates are really hard to manage! The other slot types, like ExpenseCategory and ExpenseProject, are custom, so you as the developer are responsible for resolving the values.

Make slot types for the other data we’ll need:

  • ExpenseImage: check “expand values” for resolution and put in a few sample Amazon S3 URLs. Our bot will be storing the images on Amazon S3.
  • ExpenseIsBillable: check “restrict to slot values and synonyms”. We want to resolve to either “true” or “false” for the values, but we don’t expect the user to say “true” or “false” in their utterances because it will sound unnatural. Instead, have “yes”, “billable”, “y”, “yep” resolve to “true” and “no”, “n” resolve to “false”. Note that in more complex bots, a custom Boolean slot like this can lead to ambiguity between slot values and responses to confirmation prompts, but that in this simple bot, the design works.

Now, check the Required box next to ExpenseAmount and ExpenseCategory. We’re making them required because that’s the minimum we’ll need for recording an expense. This data needs to be collected before the intent can be fulfilled. For example, you can’t record an expense without the cost of the expense, but you could leave out the ExpenseIsBillable slot and just set a default value of “false”. While you cannot set default slot values in the console, you can write some custom code in the fulfillment Lambda function to set empty slots to default values.
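Here is a minimal sketch of what that default-setting code could look like. The names DEFAULT_SLOT_VALUES and applySlotDefaults are made up for this illustration and are not part of the attached code samples.

```javascript
// Fill optional slots that the user left empty with default values.
// DEFAULT_SLOT_VALUES is a hypothetical name; 'false' matches the
// resolved value of the ExpenseIsBillable slot type described above.
const DEFAULT_SLOT_VALUES = { ExpenseIsBillable: 'false' };

function applySlotDefaults(slots) {
  // Copy first so the caller's slots object is not modified.
  const filled = Object.assign({}, slots);
  Object.keys(DEFAULT_SLOT_VALUES).forEach((name) => {
    if (filled[name] === null || filled[name] === undefined) {
      filled[name] = DEFAULT_SLOT_VALUES[name];
    }
  });
  return filled;
}
```

You would call applySlotDefaults(intentRequest.currentIntent.slots) at the start of fulfillment, so anything the user did not mention is recorded with a known value.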

Utterances: An utterance is something the user is likely to say to express an intent. Since language is so flexible, there are many ways to say the same thing. For example, you could say “I spent $30 at Starbucks for breakfast,” “30 for a Starbucks meal”, or “Starbucks breakfast $30”. For our purposes, you’re saying the same thing. Amazon Lex encourages you to list many variations of what you anticipate a user will say.

The sentence “I spent $30 at Starbucks for breakfast” is an utterance that will trigger the RecordExpense intent. A note of caution: the utterances have to be unique, or else Amazon Lex will not be able to build the bot. When you create the utterance, you state where you think certain slots will appear in the phrases. They are color-coded so you can tell if you missed a slot while designing your intent.

A note on utterances in a bot with multiple intents: Choosing utterances is both an art and a science. The art is that you must anticipate how users will say things. The science is that utterances must be unique across intents (you can’t have the same utterance trigger two different actions), or the bot may choose the wrong intent. We recommend putting all similar utterances in the same intent.
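If you keep your sample utterances in a file, a small script can catch overlaps before you build the bot. This is only a sketch: the function name and the second intent, ListExpenses, are hypothetical, and the check only finds exact matches after trimming and lowercasing.

```javascript
// Report utterances that appear under more than one intent.
// `intents` maps an intent name to its list of sample utterances.
function findSharedUtterances(intents) {
  const owners = new Map(); // normalized utterance -> Set of intent names
  Object.keys(intents).forEach((intentName) => {
    intents[intentName].forEach((utterance) => {
      const key = utterance.trim().toLowerCase();
      if (!owners.has(key)) owners.set(key, new Set());
      owners.get(key).add(intentName);
    });
  });
  const shared = [];
  owners.forEach((names, key) => {
    if (names.size > 1) shared.push({ utterance: key, intents: Array.from(names) });
  });
  return shared;
}

const sampleIntents = {
  RecordExpense: [
    'I spent {ExpenseAmount} at Starbucks for {ExpenseCategory}',
    '{ExpenseAmount} for {ExpenseCategory}',
  ],
  // Hypothetical second intent that accidentally reuses an utterance.
  ListExpenses: ['show my expenses', '{ExpenseAmount} for {ExpenseCategory}'],
};
console.log(findSharedUtterances(sampleIntents));
```

Here the shared utterance “{ExpenseAmount} for {ExpenseCategory}” is reported for both intents, which tells you to keep it in only one of them.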

Validation: Sometimes you need custom validation outside of what Amazon Lex can do. For example, you might check to see if the project entered actually exists in your system before you record it. Amazon Lex lets you use AWS Lambda functions to validate the input. We highly recommend that you validate all inputs because it protects your backend systems from bad data. What might be appropriate in natural language might not be good enough for an API.
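To make the project check concrete, here is a minimal sketch of a validation step that re-prompts for the project when it is not recognized. KNOWN_PROJECTS, elicitSlot, and validateProject are names invented for this example, and the hard-coded project list stands in for a lookup against your real expense system.

```javascript
// Hypothetical list of valid projects; a real bot would look these up
// in the expense system instead of hard-coding them.
const KNOWN_PROJECTS = ['acme migration', 'internal training'];

// Build an Amazon Lex ElicitSlot response that re-prompts for one slot.
function elicitSlot(sessionAttributes, intentName, slots, slotToElicit, text) {
  return {
    sessionAttributes,
    dialogAction: {
      type: 'ElicitSlot',
      intentName,
      slots,
      slotToElicit,
      message: { contentType: 'PlainText', content: text },
    },
  };
}

// Return an ElicitSlot response if the project is unknown, or null if
// the project slot is empty or valid.
function validateProject(intentRequest) {
  const { name, slots } = intentRequest.currentIntent;
  const project = slots.ExpenseProject;
  if (project === null || project === undefined) return null;
  if (KNOWN_PROJECTS.includes(project.trim().toLowerCase())) return null;
  // Clear the bad value so Amazon Lex asks for it again.
  const cleared = Object.assign({}, slots, { ExpenseProject: null });
  return elicitSlot(intentRequest.sessionAttributes, name, cleared, 'ExpenseProject',
    `I couldn't find a project called "${project}". Which project is this for?`);
}
```

When validateProject returns a response, the validation function would send it back to Amazon Lex instead of continuing; when it returns null, the normal flow carries on.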

Choose the drop-down arrow next to Lambda initialization and validation to see this:

We’ve created a Lambda function called ExpenseBotValidationHandler. Create that function in the Lambda console.

Scroll down to the Function Code section, select Edit code Inline for the Code entry type, and Node.js 6.10 for the Runtime, and paste the AWS Lambda code, found in the code samples attached to this blog post.

The following code excerpt, from recordExpense(), reads as follows: If we have the expense amount, category, and receipt, then we confirm the intent. If not, we delegate back to the bot to seek more info.

// expense-bot-validation-light
    if (source === 'DialogCodeHook') {
        // Perform basic validation on the supplied input slots. Delegate back to Amazon Lex to prompt for anything that is still missing.
        const slots = intentRequest.currentIntent.slots;
        // sessionAttributes is null when the caller did not send any.
        const sessionAttributes = intentRequest.sessionAttributes || {};

        // The receipt arrives as a session attribute; copy its URL into the slot.
        if (sessionAttributes.receiptUrl !== undefined) {
            slots.ExpenseImage = sessionAttributes.receiptUrl;
        }

        const { ExpenseAmount, ExpenseCategory, ExpenseImage } = slots;

        if (ExpenseAmount !== null && ExpenseCategory !== null && ExpenseImage !== null) {
            // Amount, category, and receipt are all present: ask the user to confirm.
            callback(confirmIntent(sessionAttributes, intentRequest.currentIntent.name, slots));
        } else {
            // Something is missing: let Amazon Lex decide what to ask next.
            callback(delegate(slots));
        }

        return;
    }
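The excerpt relies on two helpers, confirmIntent and delegate, whose definitions are in the attached code samples and are not shown here. As a rough sketch of what such helpers might return (an assumption based on the Amazon Lex Lambda response format, not the author’s actual code):

```javascript
// Ask Amazon Lex to run the intent's confirmation prompt. No message is
// set here, so this sketch assumes the intent has a confirmation prompt
// configured in the console.
function confirmIntent(sessionAttributes, intentName, slots) {
  return {
    sessionAttributes,
    dialogAction: { type: 'ConfirmIntent', intentName, slots },
  };
}

// Hand control back to Amazon Lex so that it chooses the next step,
// usually prompting for the next missing required slot.
function delegate(slots) {
  return { dialogAction: { type: 'Delegate', slots } };
}
```

Both helpers only build the response object; the callback in the excerpt is what sends it back to Amazon Lex.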

Lambda configuration – Validation code hook and fulfillment

This AWS Lambda function must have an AWS Identity and Access Management (IAM) role with read/write permissions to Amazon CloudWatch, Amazon CloudWatch Logs, and Amazon DynamoDB. These permissions are listed to the right underneath “ExpenseBotValidationHandler”. Pick an existing execution role that has these permissions, or create one. Leave all other defaults as they are, and save the function. It should now appear back in your intent as an option to use for validation.

By the way, this is the payload sent to the Lambda function during validation, so you can see what attributes are available to you.

// from the Lambda function’s CloudWatch logs
{ messageVersion: '1.0',
  invocationSource: 'DialogCodeHook',
  userId: '1xxxxxx5656', // the user ID we set when we called the bot. This is the user’s cell number.
  sessionAttributes: {
    receiptUrl: 'https:.....'
  }, // extra data that persists for the life of the interaction
  requestAttributes: null,
  bot: { name: 'ExpenseBot', alias: 'Production', version: '3' }, // useful for debugging to see which version of the bot you are using
  outputDialogMode: 'Text',
  currentIntent:
   { name: 'RecordExpense',
     slots:
      { ExpenseProject: null,
        ExpenseAmount: '13',
        ExpenseIsBillable: null,
        ExpenseReceipt: null, // attach the URL of the receipt image
        ExpenseCategory: 'Meals' } },
  inputTranscript: '13 for meals ' } // the original utterance

In our example, we need to attach the URL for where the receipt image is stored, and pass that in via session attributes, because you can’t text just an image to the Amazon Lex bot. So, before we finalize the intent, we will perform our validation by looking to make sure an image has been sent, and it is attached to the ExpenseReceipt slot.

Prompting the user for missing slots

For each required slot, you can add additional prompts to retrieve the data if the user hasn’t supplied it yet. So if a user types, “13.50” instead of “13.50 for meals”, the bot will respond with “$13.50 for which category? Meals, travel, or other?” To do this, click the gear to the right of the prompt input for ExpenseCategory, and add the prompts:

  • {ExpenseAmount} for which category? Meals, travel, or other?
  • Which category should I file this under?

And add these prompts for ExpenseAmount:

  • How much was the {ExpenseCategory} expense?
  • How much did this cost?

You can add several prompts, and Amazon Lex will select one of them at random.

A note on user experience

Discoverability: It’s good to give users direction when they are getting prompted, because with bots, discoverability in the design sense is difficult. For example, if you prompt the user for a category, give them a few ideas, like “meals, travel, or other?” This reduces the users’ cognitive load because they are picking from a menu instead of creating from scratch.

User feedback: And what do {ExpenseAmount} and {ExpenseCategory} mean in the prompts? One of the magic things about Amazon Lex is that it will just fill in those data fields automatically if the user has already supplied them. For example, the user could say, “$13 billable” and Amazon Lex will replace {ExpenseAmount} in “{ExpenseAmount} for which category? Meals, travel, or other?” with $13, and then respond to the user with, “$13 for which category? Meals, travel, or other?” when prompting for the category. This gives the user some feedback.
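If it helps to picture the substitution, here is a toy version of it. Amazon Lex does this for you, so you never write this code in the bot; fillPrompt is a made-up name and this is not how the service is implemented internally.

```javascript
// Replace each {SlotName} placeholder with the slot's current value.
// Placeholders whose slot is still empty are left untouched.
function fillPrompt(template, slots) {
  return template.replace(/\{(\w+)\}/g, (placeholder, slotName) => {
    const value = slots[slotName];
    return value === null || value === undefined ? placeholder : String(value);
  });
}

console.log(fillPrompt(
  '{ExpenseAmount} for which category? Meals, travel, or other?',
  { ExpenseAmount: '$13', ExpenseCategory: null }
));
// prints "$13 for which category? Meals, travel, or other?"
```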

Now that we have everything, we’ll send a confirmation prompt to let the user know the session is about to close out. I almost always use these so that the user has feedback that this action is completed. It also gives the user a chance to step in and correct the bot if information was not gathered correctly.

If the user confirms with a “yes,” then the bot tries to fulfill the intent. All the slots are sent to a function to carry out the task. You can use the same Lambda function as for the validation, and just check for an invocation source of FulfillmentCodeHook instead of DialogCodeHook. In our case, we will build out some code to insert this data into DynamoDB.
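Here is a minimal sketch of what the fulfillment branch could look like. It is not the code from the attached samples: the function names, the attribute names other than the “user” and “timestamp” keys, the ISO-8601 timestamp format, and the injected putItem function are all assumptions made for this illustration.

```javascript
// Build the DynamoDB item for one confirmed expense.
function buildExpenseItem(intentRequest, now) {
  const slots = intentRequest.currentIntent.slots;
  const item = {
    user: intentRequest.userId,     // partition key: the user's cell number
    timestamp: now.toISOString(),   // sort key
    amount: Number(slots.ExpenseAmount),
    category: slots.ExpenseCategory,
    project: slots.ExpenseProject,
    isBillable: slots.ExpenseIsBillable === 'true',
    receiptUrl: slots.ExpenseImage,
  };
  // Drop attributes for slots the user never filled.
  Object.keys(item).forEach((key) => {
    if (item[key] === null || item[key] === undefined) delete item[key];
  });
  return item;
}

// Handle the FulfillmentCodeHook invocation. `putItem` is injected so the
// sketch does not depend on a particular AWS SDK version; it receives
// { TableName, Item } and returns a promise.
function fulfillExpense(intentRequest, putItem, now) {
  const item = buildExpenseItem(intentRequest, now || new Date());
  return putItem({ TableName: 'expense-items', Item: item }).then(() => ({
    sessionAttributes: intentRequest.sessionAttributes,
    dialogAction: {
      type: 'Close',
      fulfillmentState: 'Fulfilled',
      message: {
        contentType: 'PlainText',
        content: `Recorded ${item.amount} under ${item.category}.`,
      },
    },
  }));
}
```

With the AWS SDK for JavaScript, putItem could be something like (params) => new AWS.DynamoDB.DocumentClient().put(params).promise(). Passing it in as a parameter also makes the function easy to test without touching a real table.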

Bugs and troubleshooting

Now we’re ready to build the bot and test it. Choose “Build”, then try the bot in the test window off to the side. You can see how it gets really crazy with situations you didn’t expect! Here’s an example.

In this case it was not recognizing “food” as a synonym for the Meals category.

Typical problems and how to address them:

  • Unanticipated utterances – Users might phrase things differently than expected. This is pretty common, and Amazon Lex captures these in logs as Missed Utterances. They are extremely useful to review so you see how people are actually using your bot. To see them, open the Monitoring tab, then choose Utterances under Tables on the left, and then choose Missed.
  • Session timeout problems – Amazon Lex allows you to set a session timeout variable, where all the slots and session state are forgotten after a period of time; the timer resets with every user interaction. Choose this variable carefully and be aware of it as you test your bot. If you test your bot using the internal testing widget shown above, you can “reset” the session to avoid problems, by clicking “Clear chat history” in the test widget.
  • Broken AWS Lambda integrations – Like all software projects, it’s best to start simple and ensure you know what is being passed to AWS Lambda, and what is being sent back to your bot. The Amazon Lex documentation is very good on this. Use lots of logging and look closely at your CloudWatch logs. Look carefully at DialogState and learn the different paths forward Amazon Lex has at each state.
  • Versioning – Amazon Lex allows you to build different versions of your app. You need to be clear about which version you are editing. In development, it’s best to simply work on the $LATEST version of the bot. When you build a bot, you can test it with the internal widget. But you cannot test through an integration channel like Slack or Twilio until you publish the bot. Building happens before publishing. After the bot is published, it is recommended that you perform automated testing just as you would with any other deployed application.
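For the Lambda integration problems in particular, it can help to log every request and response in one place. The wrapper below is a sketch of that idea; withLogging is a made-up name, and the optional log parameter exists only so the output can be captured in a test.

```javascript
// Wrap a Lex code-hook handler so every request and response is logged.
// console.log output from a Lambda function ends up in CloudWatch Logs.
function withLogging(handler, log) {
  const write = log || console.log;
  return (event, context, callback) => {
    write(`request source=${event.invocationSource} slots=${JSON.stringify(event.currentIntent.slots)}`);
    handler(event, context, (err, response) => {
      if (err) {
        write(`error ${err.message}`);
      } else {
        write(`response dialogAction=${JSON.stringify(response.dialogAction)}`);
      }
      callback(err, response);
    });
  };
}
```

You would export the wrapped function as the Lambda handler, for example exports.handler = withLogging(myHandler), where myHandler is your existing handler.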

Deployment

When we’ve worked out the bugs, it’s time to deploy it. Choose Publish and choose an Alias. The alias feature allows you to have multiple versions of your bot in production.

After it’s deployed, there are a few options for interacting with your bot:

  1. Channel integrations. You can integrate with Facebook, Slack, Kik, and Twilio SMS. In fact, you can do all of these at once if you want the bot to be accessible from all of them. These channel integrations work well out of the box.
  2. SDKs for programmatic access. AWS maintains SDKs for many programming languages and frameworks, such as iOS and Android for mobile development, and Node.js, Python, Ruby, and many others for server-side development. For our use case, we will need MMS – texting images and not just text – so we can’t use the Twilio SMS integration, which is limited to text only. We need to use the AWS SDK for JavaScript in Node.js directly from Twilio.

Let’s go over to Twilio and get set up.

Twilio Functions is still in beta at the time of this writing but I have had solid performance with it. We need a function to call our Amazon Lex bot after we send a text to the number.

This code snippet shows how to call our bot from the Node.js library. Set the bot params, then instantiate a LexRuntime client and call its postText method. This calls Amazon Lex, which returns a message in data.message. Pass that into a twiml object, which is how you send a return SMS back to the user. Note that scale was not a consideration for this illustration, so this code might not handle, say, 1,000 concurrent users.

// twilio-function-handle-expenses.js (hosted in Twilio.com Functions)
const lexruntime = new AWS.LexRuntime(config);
…
    let params = {
        botAlias: 'Production',
        botName: 'ExpenseBot',
        inputText: Body,
        userId: From.slice(1, 12), // cell number for the user, without the leading +
        sessionAttributes: {},
    };
…
    lexruntime.postText(params, function (err, data) {
        if (err) {
            // An error occurred: log it and end the function so it does not hang until it times out.
            console.log(err, err.stack);
            callback(err);
        } else {
            // Successful response: reply to the user with the bot's message.
            console.log(data);
            let twiml = new Twilio.twiml.MessagingResponse();
            twiml.message(data.message);
            callback(null, twiml);
        }
    });

And be sure to include the aws-sdk node library for your code to access it. I like the user interface for managing external libraries. You just declare them here, and they get included as opposed to compressing all the libraries into a .zip.

Then, you place your AWS access tokens here. They will be available under the context variable inside your function.

The AWS SDK provides the LexRuntime class to access your bot.

    const options = {
        apiVersion: '2016-11-28',
        accessKeyId: context.accessKeyId,
        secretAccessKey: context.secretAccessKey,
        region: 'us-east-1',
    };

    const lexruntime = new AWS.LexRuntime(options);

    const { Body, From, NumMedia } = event;

    let params = {
        botAlias: 'Production',
        botName: 'ExpenseBot',
        inputText: Body,
        userId: From.slice(1, 12),
        sessionAttributes: {},
    };

This params object states which bot you want to use, and which alias. The Body variable holds the actual text content from the SMS message. I’m setting userId to the phone number the text came from, minus the initial + sign, because Amazon Lex rejects a user ID that contains a + sign.
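The snippets above leave sessionAttributes empty, but the validation function expects the receipt URL in sessionAttributes.receiptUrl. One way to fill it in is sketched below. It assumes the Twilio webhook parameters NumMedia and MediaUrl0 and passes the Twilio media URL straight through; the real function may first copy the image to Amazon S3. The function names are made up for this example.

```javascript
// Convert a Twilio "From" number such as "+15125550100" into a Lex user ID
// by dropping every character that is not a letter, digit, ".", "_", ":" or "-".
function toLexUserId(from) {
  return from.replace(/[^0-9a-zA-Z._:-]/g, '');
}

// Build postText parameters from a Twilio webhook event. Passing the
// MediaUrl0 value straight through as receiptUrl is an assumption; the
// original function may first copy the image to Amazon S3.
function buildLexParams(event) {
  const sessionAttributes = {};
  if (Number(event.NumMedia) > 0 && event.MediaUrl0) {
    sessionAttributes.receiptUrl = event.MediaUrl0;
  }
  return {
    botAlias: 'Production',
    botName: 'ExpenseBot',
    inputText: event.Body,
    userId: toLexUserId(event.From),
    sessionAttributes,
  };
}
```

Compared with From.slice(1,12), stripping disallowed characters does not depend on the phone number having a fixed length.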

Amazon DynamoDB for storing data

To build the DynamoDB table, create a table called “expense-items” with a partition key of “user,” which will be the cell number of the user, and a sort key of “timestamp.”

Leave the defaults as they are. Or, for a bonus, create a Local Secondary Index (LSI) so you can query more efficiently by project: set the Project as the range key. Keep in mind that an LSI can only be added when the table is created.
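If you would rather script the table than use the console, the definition might look roughly like this. It is a sketch in the shape of the DynamoDB CreateTable parameters; the string key types, the “project” attribute name, the index name, and the throughput numbers are all assumptions.

```javascript
// Hypothetical CreateTable parameters for the expense-items table.
const expenseTableParams = {
  TableName: 'expense-items',
  AttributeDefinitions: [
    { AttributeName: 'user', AttributeType: 'S' },
    { AttributeName: 'timestamp', AttributeType: 'S' },
    { AttributeName: 'project', AttributeType: 'S' },
  ],
  KeySchema: [
    { AttributeName: 'user', KeyType: 'HASH' },       // partition key
    { AttributeName: 'timestamp', KeyType: 'RANGE' }, // sort key
  ],
  LocalSecondaryIndexes: [
    {
      IndexName: 'user-project-index',
      KeySchema: [
        { AttributeName: 'user', KeyType: 'HASH' },     // same partition key as the table
        { AttributeName: 'project', KeyType: 'RANGE' }, // alternate sort key
      ],
      Projection: { ProjectionType: 'ALL' },
    },
  ],
  ProvisionedThroughput: { ReadCapacityUnits: 5, WriteCapacityUnits: 5 },
};
```

The index reuses the table’s partition key and swaps in the project as the sort key, which is what lets you ask for one user’s expenses on a given project.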

Finally, we send off the message to Amazon Lex, and the Amazon Lex response to the user is in the data.message attribute. Twilio uses TwiML (Twilio Markup Language) to describe the messages sent through its system. And voila! We are recording expenses from my Extra Boost Whey Protein Amazing Greens Jamba Juice, with our bot.

And here is the structured data in our DynamoDB table, ready for exporting into any other analysis tool.

To export this data as a CSV, choose the checkbox to select all items, then choose the Actions drop-down list, and select Export to .csv.
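If you ever want to produce the CSV from a script instead of the console, the conversion itself is small. This sketch assumes you have already fetched the items as plain objects; toCsv and the column names are made up for the example.

```javascript
// Convert an array of flat item objects into CSV text.
function toCsv(items, columns) {
  const escape = (value) => {
    const text = value === null || value === undefined ? '' : String(value);
    // Quote fields that contain a comma, a double quote, or a line break.
    return /[",\r\n]/.test(text) ? `"${text.replace(/"/g, '""')}"` : text;
  };
  const rows = items.map((item) => columns.map((c) => escape(item[c])).join(','));
  return [columns.join(',')].concat(rows).join('\n');
}
```

The quoting matters for values like project names that contain commas, which would otherwise shift columns when the file is opened in a spreadsheet.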

Also, a quick reminder to delete the resources you created for this project, such as the bot, the Lambda function, and the DynamoDB table, when you no longer need them. Once they are deleted, you will stop incurring charges for them.

Conclusion

We’ve learned a lot in this blog post! We started with a business challenge – travelers dealing with the repetitive work of recording their expenses. We thought about how people might capture this data more easily by texting natural language to describe the expense. We went over the core components of an Amazon Lex bot – slots, which are pieces of data we want to extract; utterances, which are phrases the user will say; and intents, which are actions that the user wants to perform. We went over Lambda functions for input validation. Then, we talked about how to test and troubleshoot common issues. Finally, we went through deployment options and even got over to Twilio to build a custom function that integrates with their SMS service. Once the bot was deployed, it reduced the number of steps in a repetitive process, and we saw that the bot interface offered a simpler, conversational approach to filing expenses!

What other annoying business processes can we simplify with bots? Please contact me – I love to talk bots.


About the Author

As a Solutions Architect with Relus Cloud, Rob Whelan helps customers think through and act upon what is possible in the AWS ecosystem. Rob studied mathematics as an undergrad, served as a submarine officer in the US Navy, and co-founded a healthcare communications company before coming to Relus Cloud. He is based in Austin, Texas.