AWS Machine Learning Blog

Enhancing enterprise search with Amazon Kendra

Amazon Kendra is an easy-to-use enterprise search service that allows you to add search capabilities to your applications so end-users can easily find information stored in different data sources within your company. This could include invoices, business documents, technical manuals, sales reports, corporate glossaries, internal websites, and more. You can harvest this information from storage solutions like Amazon Simple Storage Service (Amazon S3) and OneDrive; applications such as Salesforce, SharePoint, and ServiceNow; or relational databases like Amazon Relational Database Service (Amazon RDS).

When you type a question, the service uses machine learning (ML) algorithms to understand the context and return the most relevant results, whether that’s a precise answer or an entire document. You don’t need any ML experience to do this, and Amazon Kendra also provides the code that you need to integrate search with your new or existing applications.

This post shows you how to create your internal enterprise search by using the capabilities of Amazon Kendra. This enables you to build a solution to create and query your own search index. For this post, you use Amazon.com help documents in HTML format as the data source, but Amazon Kendra also supports MS Office (.doc, .ppt), PDF, and text formats.

Overview of solution

This post provides the steps to help you create an enterprise search engine on AWS using Amazon Kendra. You can provision a new Amazon Kendra index in under an hour without much technical depth or ML experience.

The post also demonstrates how to configure Amazon Kendra for a customized experience by adding FAQs, deploying Amazon Kendra in custom applications, and synchronizing data sources. The subsequent sections cover each of these topics.

Prerequisites

For this walkthrough, you should have the following prerequisites:

Creating and configuring your document repository

Before you can create an index in Amazon Kendra, you need to load documents into an S3 bucket. This section contains instructions to create an S3 bucket, get the files, and load them into the bucket. After completing all the steps in this section, you have a data source that Amazon Kendra can use.

  1. On the AWS Management Console, in the Region list, choose US East (N. Virginia) or any Region of your choice that Amazon Kendra is available in.
  2. Choose Services.
  3. Under Storage, choose S3.
  4. On the Amazon S3 console, choose Create bucket.
  5. Under General configuration, provide the following information:
    • Bucket name: kendrapost-{your account id}.
    • Region: Choose the same Region that you use to deploy your Amazon Kendra index (this post uses US East (N. Virginia) us-east-1).
  6. Under Bucket settings for Block Public Access, leave everything with the default values.
  7. Under Advanced settings, leave everything with the default values.
  8. Choose Create bucket.
  9. Download amazon_help_docs.zip and unzip the files.
  10. On the Amazon S3 console, select the bucket that you just created and choose Upload.
  11. Upload the unzipped files.

Inside your bucket, you should now see two folders: amazon_help_docs (with 3,100 objects) and faqs (with one object).
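
If you prefer to script the bucket setup instead of using the console, the steps above could be sketched in Python roughly as follows. This is a minimal sketch, not part of the original walkthrough: it assumes you pass in a boto3 S3 client, and the helper names (bucket_request, object_keys, create_and_upload) are hypothetical.

```python
from pathlib import Path
from typing import Any, Dict, Iterator, Tuple


def bucket_request(account_id: str, region: str) -> Dict[str, Any]:
    """Build keyword arguments for the S3 create_bucket call."""
    request: Dict[str, Any] = {"Bucket": f"kendrapost-{account_id}"}
    # us-east-1 is the S3 default and must not be sent as a LocationConstraint.
    if region != "us-east-1":
        request["CreateBucketConfiguration"] = {"LocationConstraint": region}
    return request


def object_keys(local_root: Path) -> Iterator[Tuple[Path, str]]:
    """Yield (local file, S3 key) pairs that preserve the folder layout."""
    for path in sorted(local_root.rglob("*")):
        if path.is_file():
            yield path, path.relative_to(local_root).as_posix()


def create_and_upload(s3_client: Any, account_id: str, region: str, local_root: Path) -> int:
    """Create the bucket, upload every file under local_root, return the file count."""
    request = bucket_request(account_id, region)
    s3_client.create_bucket(**request)
    uploaded = 0
    for path, key in object_keys(local_root):
        s3_client.upload_file(str(path), request["Bucket"], key)
        uploaded += 1
    return uploaded
```

With boto3 installed and credentials configured, you would call create_and_upload(boto3.client("s3", region_name="us-east-1"), your_account_id, "us-east-1", Path("unzipped_folder")), where the folder path is whatever location you unzipped the files to.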

The following screenshot shows the contents of amazon_help_docs.

The following screenshot shows the contents of faqs.

Creating an index

An index is the Amazon Kendra component that provides search results for documents and frequently asked questions. After completing all the steps in this section, you have an index ready to consume documents from different data sources. For more information about indexes, see Index.

To create your first Amazon Kendra index, complete the following steps:

  1. On the console, choose Services.
  2. Under Machine Learning, choose Amazon Kendra.
  3. On the Amazon Kendra main page, choose Create an Index.
  4. In the Index details section, for Index name, enter kendra-blog-index.
  5. For Description, enter My first Kendra index.
  6. For IAM role, choose Create a new role.
  7. For Role name, enter -index-role (your role name has the prefix AmazonKendra-YourRegion-).
  8. For Encryption, don’t select Use an AWS KMS managed encryption key.

(Your data is encrypted with an Amazon Kendra-owned key by default.)

  1. Choose Next.

For more information about the IAM roles Amazon Kendra creates, see Prerequisites.

Amazon Kendra offers two editions. Kendra Enterprise Edition provides a high-availability service for production workloads. Kendra Developer Edition is suited for building a proof-of-concept and experimentation. For this post, you use the Developer edition.

  1. In the Provisioning editions section, select Developer edition.
  2. Choose Create.

For more information on the free tier, document size limits, and total storage for each Amazon Kendra edition, see Amazon Kendra pricing.

The index creation process can take up to 30 minutes. When the creation process is complete, you see a message at the top of the page that you successfully created your index.
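
The same index can also be requested through the API. The following is a minimal sketch under a few assumptions: you pass in a boto3 Kendra client, and the IAM role already exists (the console creates it for you, the API does not). The function names and the polling interval are illustrative; the 1,800-second timeout mirrors the 30-minute estimate above.

```python
import time
from typing import Any, Callable


def create_index(kendra: Any, role_arn: str) -> str:
    """Request a Developer Edition index and return its ID."""
    response = kendra.create_index(
        Name="kendra-blog-index",
        Description="My first Kendra index",
        Edition="DEVELOPER_EDITION",
        RoleArn=role_arn,  # assumption: an existing role that Amazon Kendra can assume
    )
    return response["Id"]


def wait_until_active(
    kendra: Any,
    index_id: str,
    poll_seconds: float = 60.0,
    timeout_seconds: float = 1800.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Poll describe_index until the index is ACTIVE, fails, or times out."""
    waited = 0.0
    while True:
        status = kendra.describe_index(Id=index_id)["Status"]
        if status == "ACTIVE":
            return
        if status == "FAILED":
            raise RuntimeError(f"Index {index_id} failed to create")
        if waited >= timeout_seconds:
            raise TimeoutError(f"Index {index_id} still {status} after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

The sleep function is injectable only so that the polling loop can be exercised without real waiting.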

Adding a data source

A data source is a location that stores the documents for indexing. You can synchronize data sources automatically with an Amazon Kendra index to make sure that searches correctly reflect new, updated, or deleted documents in the source repositories.

After completing all the steps in this section, you have a data source linked to Amazon Kendra. For more information, see Adding documents from a data source.

Before continuing, make sure that the index creation is complete and the index shows as Active.

  1. On the kendra-blog-index page, choose Add data sources.

Amazon Kendra supports six types of data sources: Amazon S3, SharePoint Online, ServiceNow, OneDrive, Salesforce online, and Amazon RDS. For this post, you use Amazon S3.

  1. Under Amazon S3, choose Add connector.

For more information about the different data sources that Amazon Kendra supports, see Adding documents from a data source.

  1. In the Define attributes section, for Data source name, enter amazon_help_docs.
  2. For Description, enter AWS services documentation.
  3. Choose Next.
  4. In the Configure settings section, for Enter the data source location, enter the S3 bucket you created: kendrapost-{your account id}.
  5. Leave Metadata files prefix folder location empty.

By default, metadata files are stored in the same directory as the documents. If you want to place these files in a different folder, you can add a prefix. For more information, see S3 document metadata.

  1. For Select decryption key, leave it deselected.
  2. For Role name, enter source-role (your role name is prefixed with AmazonKendra-).
  3. For Additional configuration, you can add a pattern to include or exclude certain folders or files. For this post, keep the default values.
  4. For Frequency, choose Run on demand.

This step defines the frequency with which the data source is synchronized with the Amazon Kendra index. For this walkthrough, you do this manually (one time only).

  1. Choose Next.
  2. On the Review and create page, choose Create.
  3. After you create the data source, choose Sync now to synchronize the documents with the Amazon Kendra index.

The duration of this process depends on the number of documents that you index. For this use case, it may take 15 minutes, after which you should see a message that the sync was successful.

In the Sync run history section, you can see that 3,099 documents were synchronized.
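
For reference, the data source steps above map onto three Amazon Kendra API calls. The sketch below assumes a boto3 Kendra client and an existing IAM role ARN; the function names are hypothetical. Leaving out the Schedule parameter corresponds to choosing Run on demand.

```python
from typing import Any, Dict, List, Tuple


def s3_data_source_request(index_id: str, bucket: str, role_arn: str) -> Dict[str, Any]:
    """Build keyword arguments for create_data_source with an S3 connector."""
    return {
        "IndexId": index_id,
        "Name": "amazon_help_docs",
        "Description": "AWS services documentation",
        "Type": "S3",
        "Configuration": {"S3Configuration": {"BucketName": bucket}},
        "RoleArn": role_arn,
        # No "Schedule" key: the data source is then synchronized only on demand.
    }


def create_and_sync(kendra: Any, index_id: str, bucket: str, role_arn: str) -> str:
    """Create the data source, start one sync job, and return the data source ID."""
    request = s3_data_source_request(index_id, bucket, role_arn)
    data_source_id = kendra.create_data_source(**request)["Id"]
    kendra.start_data_source_sync_job(Id=data_source_id, IndexId=index_id)
    return data_source_id


def sync_history(kendra: Any, index_id: str, data_source_id: str) -> List[Tuple[str, str]]:
    """Return (status, documents added) for each sync job the service reports."""
    response = kendra.list_data_source_sync_jobs(Id=data_source_id, IndexId=index_id)
    return [
        (job.get("Status", ""), job.get("Metrics", {}).get("DocumentsAdded", ""))
        for job in response.get("History", [])
    ]
```

Note that the API reports the document counts as strings, so convert them before doing arithmetic on them.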

Exploring the search index using the search console

The goal of this section is to let you explore possible search queries via the built-in Amazon Kendra console.

To search the index you created above, complete the following steps:

  1. Under Indexes, choose kendra-blog-index.
  2. Choose Search console.

Kendra can answer three types of questions: factoid, descriptive, and keyword. For more information, see Amazon Kendra FAQs. You can ask some questions using the Amazon.com help documents that you uploaded earlier.

In the search field, enter What is Amazon music unlimited?

With a factoid question (who, what, when, where), Amazon Kendra can return a direct answer along with a link to the source document.

As a keyword search, enter shipping rates to Canada. The following screenshot shows the answer Amazon Kendra gives.
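
The same questions can be sent to the index programmatically. The following sketch assumes a boto3 Kendra client and your index ID; search and summarize are hypothetical helper names, and the fields they read (Type, DocumentTitle, DocumentExcerpt, DocumentURI) come from the Query API response.

```python
from typing import Any, Dict, List


def summarize(item: Dict[str, Any]) -> Dict[str, str]:
    """Reduce one Query result item to the fields a results list usually shows."""
    return {
        "type": item.get("Type", ""),
        "title": item.get("DocumentTitle", {}).get("Text", ""),
        "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
        "uri": item.get("DocumentURI", ""),
    }


def search(kendra: Any, index_id: str, question: str) -> List[Dict[str, str]]:
    """Run a natural-language or keyword query and return summarized results."""
    response = kendra.query(IndexId=index_id, QueryText=question)
    return [summarize(item) for item in response.get("ResultItems", [])]
```

For example, search(kendra, index_id, "What is Amazon music unlimited?") and search(kendra, index_id, "shipping rates to Canada") would send the two queries used in this section.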

Adding FAQs

You can also upload a list of FAQs to provide direct answers to common questions your end-users ask. To do this, you need to load a .csv file with the information related to the questions. This section contains instructions to create and configure that file and load it into Amazon Kendra.

  1. On the Amazon Kendra console, navigate to your index.
  2. Under Data management, choose FAQs.
  3. Choose Add FAQ.
  4. In the Define FAQ project section, for FAQ name, enter kendra-post-faq.
  5. For Description, enter My first FAQ list.

Amazon Kendra accepts .csv files formatted with each row beginning with a question followed by its answer. For example, see the following table.

Question | Answer | URL (optional)
What is the height of the Space Needle? | 605 feet | https://www.spaceneedle.com/
How tall is the Space Needle? | 605 feet | https://www.spaceneedle.com/
What is the height of the CN Tower? | 1815 feet | https://www.cntower.ca/
How tall is the CN Tower? | 1815 feet | https://www.cntower.ca/
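
If you maintain your questions in another system, you can generate a file in this format with a few lines of Python. This is a minimal sketch using only the standard library; faq_csv is a hypothetical helper name, and quoting every field is a choice made here so that commas inside answers stay in a single column.

```python
import csv
import io
from typing import Iterable, Tuple


def faq_csv(rows: Iterable[Tuple[str, str, str]]) -> str:
    """Serialize (question, answer, url) rows with every field quoted."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, quoting=csv.QUOTE_ALL, lineterminator="\n")
    for question, answer, url in rows:
        # Trim stray whitespace so URLs and answers do not start with a space.
        writer.writerow([question.strip(), answer.strip(), url.strip()])
    return buffer.getvalue()


if __name__ == "__main__":
    print(
        faq_csv(
            [
                ("What is the height of the Space Needle?", "605 feet", "https://www.spaceneedle.com/"),
                ("How tall is the CN Tower?", "1815 feet", "https://www.cntower.ca/"),
            ]
        ),
        end="",
    )
```

The URL column is optional, so pass an empty string when a question has no source link.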

The .csv file included for this use case looks like the following:

"How do I sign up for the Amazon Prime free Trial?","To sign up for the Amazon Prime free trial, your account must have a current, valid credit card. Payment options such as an Amazon.com Corporate Line of Credit, checking accounts, pre-paid credit cards, or gift cards cannot be used.","https://www.amazon.com/gp/help/customer/display.html/ref=hp_left_v4_sib?ie=UTF8&nodeId=201910190"

  1. Under FAQ settings, for S3, enter s3://kendrapost-{your account id}/faqs/kendrapost.csv.
  2. For IAM role, choose Create a new role.
  3. For Role name, enter faqs-role (your role name is prefixed with AmazonKendra-).
  4. Choose Add.
  5. Wait until you see the status show as Active.
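
The FAQ steps above correspond to a single CreateFaq call. The sketch below assumes a boto3 Kendra client and an existing IAM role ARN with read access to the file; parse_s3_uri and create_faq are hypothetical helper names.

```python
from typing import Any, Tuple


def parse_s3_uri(uri: str) -> Tuple[str, str]:
    """Split s3://bucket/key into (bucket, key), rejecting anything else."""
    scheme, _, rest = uri.partition("://")
    bucket, _, key = rest.partition("/")
    if scheme != "s3" or not bucket or not key:
        raise ValueError(f"Not an S3 object URI: {uri!r}")
    return bucket, key


def create_faq(kendra: Any, index_id: str, s3_uri: str, role_arn: str) -> str:
    """Register the FAQ file with the index and return the FAQ ID."""
    bucket, key = parse_s3_uri(s3_uri)
    response = kendra.create_faq(
        IndexId=index_id,
        Name="kendra-post-faq",
        Description="My first FAQ list",
        S3Path={"Bucket": bucket, "Key": key},
        RoleArn=role_arn,  # assumption: an existing role, not created by this call
    )
    return response["Id"]
```

As in the console, the FAQ is usable once its status becomes Active, which you can check with the client's describe_faq call.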

You can now see how the FAQ works on the search console.

  1. Under Indexes, choose your index.
  2. Under Data management, choose Search console.
  3. In the search field, enter How do I sign up for the Amazon Prime free Trial?
  4. The following screenshot shows that Amazon Kendra added the FAQ that you uploaded previously to the results list, and provides an answer and a link to the related documentation.

Using Amazon Kendra in your own applications

You can add the following components from the search console in your application:

  • Main search page – The main page that contains all the components. This is where you integrate your application with the Amazon Kendra API.
  • Search bar – The component where you enter a search term and that calls the search function.
  • Results – The component that displays the results from Amazon Kendra. It has three components: suggested answers, FAQ results, and recommended documents.
  • Pagination – The component that paginates the response from Amazon Kendra.
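
To make the Results and Pagination components more concrete, the following sketch shows one way the data behind them could be prepared from a Query response. It is written in Python for brevity rather than in the sample app's JavaScript, and the mapping from result types to the three sections is an assumption about how the console groups them, not a statement of how the sample code is written.

```python
import math
from typing import Any, Dict, Iterable, List

# Assumed mapping from Query result types to the three result sections.
SECTIONS = {
    "ANSWER": "suggested_answers",
    "QUESTION_ANSWER": "faq_results",
    "DOCUMENT": "recommended_documents",
}


def group_results(result_items: Iterable[Dict[str, Any]]) -> Dict[str, List[Dict[str, Any]]]:
    """Bucket result items by section, keeping their original order."""
    groups: Dict[str, List[Dict[str, Any]]] = {name: [] for name in SECTIONS.values()}
    for item in result_items:
        section = SECTIONS.get(item.get("Type", ""))
        if section is not None:
            groups[section].append(item)
    return groups


def page_count(total_results: int, page_size: int) -> int:
    """Number of pages needed to show total_results at page_size per page."""
    if page_size <= 0:
        raise ValueError("page_size must be positive")
    return math.ceil(total_results / page_size)
```

A pagination component would then request a given page by passing the PageNumber and PageSize parameters of the Query API.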

Amazon Kendra provides source code that you can deploy on your website. This is offered free of charge under a modified MIT license so you can use it as is or change it for your own needs.

This section contains instructions to deploy Amazon Kendra search to your website. You use a Node.js demo application that runs locally on your machine. This use case is based on a macOS environment.

To run this demo, you need the following components:

  1. Download amazon_aws-kendra-sample-app-master.zip and unzip the file.
  2. Open a terminal window and go to the aws-kendra-sample-app-master folder:
    cd /{folder path}/aws-kendra-sample-app-master
  3. Create a copy of the .env.development.local.example file as .env.development.local:
    cp .env.development.local.example .env.development.local
  4. Edit the .env.development.local file and add the following connection parameters:
    • REACT_APP_INDEX – Your Amazon Kendra index ID (you can find this ID on the Index home page)
    • REACT_APP_AWS_ACCESS_KEY_ID – Your account access key
    • REACT_APP_AWS_SECRET_ACCESS_KEY – Your account secret access key
    • REACT_APP_AWS_SESSION_TOKEN – Leave it blank for this use case
    • REACT_APP_AWS_DEFAULT_REGION – The Region that you used to deploy the Kendra index (for example, us-east-1)
  5. Save the changes.
  6. Install the Node.js dependencies:
    npm install
  7. Launch the local development server:
    npm start
  8. View the demo app at http://localhost:3000/. You should see the following screenshot.
  9. Enter the same question you used to test the FAQs: How do I sign up for the Amazon Prime free Trial?
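
If the demo app starts but searches fail, a common cause is a missing value in .env.development.local. The following sketch checks the file contents for the parameters listed in step 4. It assumes simple KEY=value lines; parse_env and missing_keys are hypothetical helper names, and the session token is treated as optional because this walkthrough leaves it blank.

```python
from typing import Dict, List

# Parameters from step 4 that must be non-empty for this walkthrough.
REQUIRED_KEYS = (
    "REACT_APP_INDEX",
    "REACT_APP_AWS_ACCESS_KEY_ID",
    "REACT_APP_AWS_SECRET_ACCESS_KEY",
    "REACT_APP_AWS_DEFAULT_REGION",
)


def parse_env(text: str) -> Dict[str, str]:
    """Parse KEY=value lines, skipping blanks, comments, and malformed lines."""
    values: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop surrounding whitespace and optional single or double quotes.
        values[key.strip()] = value.strip().strip("\"'")
    return values


def missing_keys(text: str) -> List[str]:
    """Return the required parameters that are absent or empty."""
    values = parse_env(text)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

You could run it as missing_keys(open(".env.development.local").read()) from the aws-kendra-sample-app-master folder; an empty list means all four parameters are set.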

The following screenshot shows that the result is the same as the one you got from the Amazon Kendra console, even though the demo webpage is running locally on your machine.

Cleaning up

To avoid incurring future charges and to clean out unused roles and policies, delete the resources you created: the Amazon Kendra index, S3 bucket, and corresponding IAM roles.

  1. To delete the Amazon Kendra index, under Indexes, choose kendra-blog-index.
  2. In the index settings section, from the Actions drop-down menu, choose Delete.
  3. To confirm deletion, enter Delete in the field and choose Delete.

Wait until you get the confirmation message; the process can take up to 15 minutes.
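
The index deletion can also be scripted. The sketch below assumes a boto3 Kendra client; the function names are hypothetical, and the 900-second timeout mirrors the 15-minute estimate above. It confirms deletion by checking that the index no longer appears in the list of indexes.

```python
import time
from typing import Any, Callable


def index_exists(kendra: Any, index_id: str) -> bool:
    """Return True if index_id appears on any page of list_indices."""
    token = None
    while True:
        kwargs = {"NextToken": token} if token else {}
        page = kendra.list_indices(**kwargs)
        items = page.get("IndexConfigurationSummaryItems", [])
        if any(item.get("Id") == index_id for item in items):
            return True
        token = page.get("NextToken")
        if not token:
            return False


def delete_index_and_wait(
    kendra: Any,
    index_id: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 900.0,
    sleep: Callable[[float], None] = time.sleep,
) -> None:
    """Delete the index, then poll until it is no longer listed."""
    kendra.delete_index(Id=index_id)
    waited = 0.0
    while index_exists(kendra, index_id):
        if waited >= timeout_seconds:
            raise TimeoutError(f"Index {index_id} still listed after {waited:.0f}s")
        sleep(poll_seconds)
        waited += poll_seconds
```

This covers only the index; the S3 bucket and IAM roles still need to be removed separately.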

For instructions on deleting your S3 bucket, see How do I delete an S3 Bucket?

Conclusion

In this post, you learned how to use Amazon Kendra to deploy an enterprise search service. You can use Amazon Kendra to improve the search experience in your company, powered by ML. You can enable fast lookup of your documents using natural language, without any previous ML/AI experience. For more information about Amazon Kendra, see AWS re:Invent 2019 – Keynote with Andy Jassy on YouTube, Amazon Kendra FAQs, and What is Amazon Kendra?


About the Author

Leonardo Gómez is a Big Data Specialist Solutions Architect at AWS. Based in Toronto, Canada, he works with customers across Canada to design and build big data architectures.