AWS Machine Learning Blog

Build a document search bot using Amazon Lex and Amazon Elasticsearch Service

People spend a lot of time searching documents. First you go to your document store and then you search for relevant documents. If you’re looking for a text inside the document, then you need to do another search.

In this blog post we’ll describe how you can search for a document using voice or text.  The search results will also show the first few characters from the matching paragraph in the searched document. Later, we will integrate the bot with a web application. You could use this search capability in many practical scenarios. For example, if your technician is working on a faulty machine at a customer’s business, he or she could just use the chatbot to search for the necessary documentation. We discussed chat bots in our QnA bot blog post, but for this blog post we want to show you dialogue management using elicit intent and import/export features. We’ll also show you how you can create a document store using Amazon S3, index the document in Amazon Elasticsearch Service (Amazon ES), and automate the whole process. The final bot will look similar to the following screenshot:

Architecture

The architecture has a document store that was developed using Amazon S3 and the documents are indexed using Amazon ES. The indexing is done by an AWS Lambda function that is subscribed to the Amazon S3 object creation event. Another AWS Lambda function is used for fulfilling the Amazon Lex intent. Finally, the web application integrates with Amazon Lex. The Amazon Cognito service authenticates and authorizes calls to the Amazon Lex service from the web application.

Components used

  • Amazon Lex—To build the conversational interface for our search bot.
  • AWS Lambda—To fulfill the intent given by Amazon Lex service you use the function for fulfillment.
  • Amazon Elasticsearch Service—To index the product FAQ and installation documents.
  • Amazon S3—To store documents. The product manager or owner can upload documents from this document repository.
  • AWS Lambda—To index the document. The function for indexing is triggered when a document is uploaded to Amazon S3. The integration is done through Amazon S3 notifications.
  • Amazon CloudWatch —To monitor the whole solution.
  • Amazon S3—To host the website where our bot will appear.
  • Amazon Cognito—To authenticating users.
  • AWS Identity and Access Management (IAM)—To allow or deny access to AWS resources.

Let’s start building the bot.

Building the search bot

You will build the bot in three stages.

Stage 1 – Automating the document upload flow

The solution uses Amazon S3 as the document store for all product documents. When a user uploads a document to Amazon S3, an AWS Lambda function will be triggered that will index the document in Amazon ES.

You can use this AWS CloudFormation template to set up stage 1. To launch this template in US-East-1 click the Launch Stack button:

The CloudFormation stack will create an Amazon S3 bucket, an Amazon Elasticsearch cluster, two AWS Lambda functions, and any relevant IAM roles to access Amazon Elasticsearch and S3 buckets. Give the CloudFormation stack a name using all lower case letters because the name will be used to create the S3 bucket.

When you run the AWS CloudFormation stack, you will see all the created resources in the Outputs tab of the AWS CloudFormation console.

Download a sample PDF document from here and upload the document to the DocumentStoreS3bucket S3 bucket. You could also upload a Word document or a PowerPoint presentation. The object creation event on S3 bucket notifies the IndexDocumentLambda function, which in turns converts the document to base64 encoding using the inbuilt Tika library in  Amazon ES 6.2 and indexes it in Amazon ES. When you have a closer look at IndexDocument Lambda function. You can see how the http request is prepared to index document on the Amazon ES domain.

The following code snippet is from the putDocumentToES method.

var req = new AWS.HttpRequest(endpoint);
var attachKey = key + '?pipeline=attachment';

var bodyString = JSON.stringify({
"data": doc,
"filePath" : objURL
});

req.method = 'PUT';
req.path = path.join('/', esDomain.index, esDomain.doctype, attachKey);
req.region = esDomain.region;
req.body = bodyString;
req.headers['presigned-expires'] = false;
req.headers['Host'] = endpoint.host;
req.headers['content-type'] = 'application/json';

Stage 2 – Building the search bot

Go to the Amazon Lex console and import the bot by using the zip file available here.

After import, you will see there are two intents. “Welcome” and “DocumentSearch.” The first intent is used to greet the user and the second intent is for searching documents. We used the import method to create a bot to save some time and give you a default template to start with. If you want to add the intents manually then you can do it by following the documentation here and copy the utterances from the zipped file. Now, go ahead and configure the DocumentSearch AWS Lambda function that was created as part of Step 1 for both WelcomeIntent and DocumentSearch intents in your bot. You need to select the ‘WelcomeIntent’ intent from the left menu, and go to Fulfillment section under it, choose the AWS Lambda function option and then select the fulfillment lambda function, the function name should look like this ‘<your CloudFormation Stackname>-DocumentSearchBotFullfillment’ and save the intent. Similarly, configure the same lambda function <your CloudFormation Stackname>-DocumentSearchBotFullfillment for the ‘DocumentSearch’ intent and save the intent.

Now, go ahead and build the bot by clicking Build. It should take few seconds for the model to build. On successful build, you should see this message:

You are all set to test the bot. Go to the Test Bot section on the right side of the screen and enter ‘hi’. The conversion will look something like the screenshot that follows. The bot will search the text inside the document and will give the first sentence from the paragraph. It will also send the complete document link, which is the Amazon S3 link in this case.

You can look at the queryES method in the LexFullfillmentLambda function to see how the search happens. Amazon ES also sends the search score that you can use to decide which relevant results to show to the user.

Congratulations! You just built a document search bot using Amazon Lex and Amazon ES.

Stage 3- Integrate it with the web UI

Now your bot is ready to be deployed. You can choose to deploy it on a mobile application or on messaging platforms like Facebook, Slack, and Twilio by using instructions at this link.  You can also use this  blog that shows you how you can integrate your Amazon Lex bot with a web application. It gives you an AWS CloudFormation template to deploy the bot.

How to take it to next level?

You can integrate your bot with Amazon Connect. Amazon Connect helps you in building a cloud-based contact center quickly and easily. There is a direct integration available with AWS services like Amazon Lex. This allows any user to call a designated phone number and the entire communication can be directed to the bot. This makes an interesting use case in which a user can now call the phone number and search the document using voice and the bot can send the document via email. For integration steps, you can have a look here.

Considerations

We created this solution to show you the power of Amazon Lex and AWS Lambda, but there are few things to consider before you take something like this to production.

  • Consider scenarios in which multiple documents can be output for a text-based search and show the output accordingly.
  • Add security around your solution to protect from anonymous access and also meet compliance requirements.
  • Follow best practices to index document at scale.

Clean up

You can delete the entire CloudFormation stack by selecting your stack in the AWS CloudFormation console and choosing the Delete Stack option on the Actions menu. It will delete all the resources created as part of this blog.

Additional reading

Conclusion

Today, we showed how you can integrate Amazon ES with Amazon Lex to build a powerful document search chatbot. We also showed you how you can build a document store using Amazon S3 and Amazon ES and how you can do dialog management with multiple intents.

About the Authors

Akash Jain and Rahul Kulkarni are Partner Solutions Architects. They work with our partners to convey best practices for application and systems design on AWS.