AWS Machine Learning Blog

Getting started with the Amazon Kendra Box connector

Amazon Kendra is a highly accurate and easy-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides.

For many organizations, Box Content Cloud is a core part of their content storage and lifecycle management strategy. An enterprise Box account often contains a treasure trove of assets, such as documents, presentations, knowledge articles, and more. Now, with the new Amazon Kendra data source connector for Box, these assets and any associated tasks or comments can be indexed by Amazon Kendra’s intelligent search service to reveal content and unlock answers in response to users’ queries.

In this post, we show you how to set up the new Amazon Kendra Box connector to selectively index content from your Box Enterprise repository.

Solution overview

The solution consists of the following high-level steps:

  1. Create a Box app for Amazon Kendra via the Box Developer Console.
  2. Add sample documents to your Box account.
  3. Create a Box data source via the Amazon Kendra console.
  4. Index the sample documents from the Box account.

Prerequisites

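To try out the Amazon Kendra connector for Box, you need an AWS account with permissions to use Amazon Kendra and create IAM roles, and a Box Enterprise account in which you can create apps.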

Create a Box app for Amazon Kendra

Before you configure an Amazon Kendra Box data source connector, you must first create a Box app.

  1. Log in to the Box Enterprise Developer Console.
  2. Choose Create New App.
  3. Choose Custom App.
  4. Choose Server Authentication (with JWT).
  5. Enter a name for your app. For example, KendraConnector.
  6. Choose Create App.
  7. In your created app in My Apps, choose the Configuration tab.
  8. In the App Access Level section, choose App + Enterprise Access.
  9. In the Application Scopes section, check that the following permissions are enabled:
    1. Write all files and folders stored in Box
    2. Manage users
    3. Manage groups
    4. Manage enterprise properties
  10. In the Advanced Features section, select Make API calls using the as-user header.
  11. In the Add and Manage Public Keys section, choose Generate a Public/Private Keypair.

This requires two-step verification. A JSON text file is downloaded to your computer.

  1. Choose OK to accept this download.
  2. Choose Save Changes.
  3. On the Authorization tab, choose Review and Submit.
  4. Select Submit app within this enterprise and choose Submit.

Your Box Enterprise owner needs to approve the app before you can use it.

Go to the downloads directory on your computer to review the downloaded JSON file. It contains the client ID, client secret, public key ID, private key, passphrase, and enterprise ID. You need these values to create the Box data source in a later step.
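If you prefer to read these values with a script rather than copy them by hand, the following is a minimal sketch. It assumes the nested layout that Box typically uses for JWT app settings files (a boxAppSettings object that contains appAuth); this post doesn't show the file layout, so check your own download. The function name and the placeholder values are hypothetical.

```python
import json


def load_box_settings(config_text: str) -> dict:
    """Flatten a Box JWT app settings file into the six values used later.

    Assumes the nested layout: boxAppSettings -> appAuth, plus a
    top-level enterpriseID. Adjust the keys if your file differs.
    """
    config = json.loads(config_text)
    app = config["boxAppSettings"]
    auth = app["appAuth"]
    return {
        "clientID": app["clientID"],
        "clientSecret": app["clientSecret"],
        "publicKeyID": auth["publicKeyID"],
        "privateKey": auth["privateKey"],
        "passphrase": auth["passphrase"],
        "enterpriseID": config["enterpriseID"],
    }


# Placeholder values only; a real file comes from the Box Developer Console.
SAMPLE_CONFIG = json.dumps({
    "boxAppSettings": {
        "clientID": "example-client-id",
        "clientSecret": "example-client-secret",
        "appAuth": {
            "publicKeyID": "example-key-id",
            "privateKey": "example-private-key",
            "passphrase": "example-passphrase",
        },
    },
    "enterpriseID": "0000000",
})

settings = load_box_settings(SAMPLE_CONFIG)
print(sorted(settings))
```

Treat the file like any other credential: keep it out of source control and delete local copies when you no longer need them.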

Add sample documents to your Box account

In this step, you upload sample documents to your Box account. Later, we use the Amazon Kendra Box data source to crawl and index these documents.

  1. Download AWS_Whitepapers.zip to your computer.
  2. Extract the files to a folder called AWS_Whitepapers.
  3. Upload the AWS_Whitepapers folder to your Box account.
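If you want to script the extraction step, the following sketch uses only the Python standard library. It assumes the archive has already been downloaded to the path you pass in; the function name is hypothetical, and the upload to Box still happens through the Box interface.

```python
import zipfile
from pathlib import Path


def extract_whitepapers(zip_path, target_dir="AWS_Whitepapers"):
    """Extract the archive into target_dir and return the extracted file paths."""
    target = Path(target_dir)
    target.mkdir(parents=True, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(target)
    # List what was extracted so you can confirm it before uploading.
    return sorted(
        p.relative_to(target).as_posix() for p in target.rglob("*") if p.is_file()
    )


# Example (assumes the archive is in the current directory):
# files = extract_whitepapers("AWS_Whitepapers.zip")
# print(len(files), "files ready to upload")
```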

Create a Box data source

To add a data source to your Amazon Kendra index using the Box connector, you can use an existing Amazon Kendra index, or create a new Amazon Kendra index. Then complete the following steps to create a Box data source:

  1. On the Amazon Kendra console, choose Indexes in the navigation pane.
  2. From the list of indexes, choose the index that you want to add the data source to.
  3. Choose Add data sources.
  4. From the list of data source connectors, choose Add connector under Box.
  5. On the Specify data source details page, enter a data source name and optional description.
  6. Choose Next.
  7. Open the JSON file you downloaded from the Box Developer Console.

It contains values for clientID, clientSecret, publicKeyID, privateKey, passphrase, and enterpriseID.

  1. On the Define access and security page, in the Source section, for Box enterprise ID, enter the value of the enterpriseID field.
  2. In the Authentication section, under AWS Secrets Manager secret, choose Create and add a new secret.
  3. For Secret name, enter a name for the secret, for example, boxsecret1.
  4. For the remaining fields, enter the corresponding values from the downloaded JSON file.
  5. Choose Save and add secret.
  6. In the IAM role section, choose Create a new role (Recommended) and enter a role name, for example, box-role.

For more information on the required permissions to include in the IAM role, see IAM roles for data sources.
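The console creates the secret for you, but the same step can be sketched with the AWS SDK for Python (Boto3). This example assumes the secret's key names match the field names in the downloaded JSON file (clientID, clientSecret, publicKeyID, privateKey, and passphrase); verify the expected names in the Amazon Kendra Developer Guide before relying on them. The helper names are hypothetical.

```python
import json

# Assumed key names, taken from the fields of the downloaded Box JSON file.
REQUIRED_KEYS = ("clientID", "clientSecret", "publicKeyID", "privateKey", "passphrase")


def build_box_secret(settings: dict) -> str:
    """Return the secret payload as a JSON string, failing fast on missing fields."""
    missing = [key for key in REQUIRED_KEYS if not settings.get(key)]
    if missing:
        raise ValueError(f"missing Box credential fields: {missing}")
    return json.dumps({key: settings[key] for key in REQUIRED_KEYS})


def create_box_secret(secret_name: str, settings: dict, region_name=None) -> str:
    """Create the secret in AWS Secrets Manager and return its ARN."""
    import boto3  # imported here so build_box_secret has no AWS dependency

    client = boto3.client("secretsmanager", region_name=region_name)
    response = client.create_secret(
        Name=secret_name,
        SecretString=build_box_secret(settings),
    )
    return response["ARN"]


# Example (requires AWS credentials):
# secret_arn = create_box_secret("boxsecret1", settings)
```

The enterprise ID is deliberately left out of the secret because the console asks for it separately in the Source section.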

  1. Choose Next.
  2. On the Configure sync settings page, in the Sync scope section, you can include Box web links, comments, and tasks in your index, in addition to file contents. Use the default setting (unchecked) for this post.
  3. For Additional configuration (change log) – optional, use the default setting (unchecked).
  4. For Additional configuration (regex patterns) – optional, choose Include patterns.
  5. For Type, choose Path.
  6. For Path – optional, enter the path to the sample documents you uploaded earlier: AWS_Whitepapers/.
  7. Choose Add.
  8. In the Sync run schedule section, choose Run on demand.
  9. Choose Next.
  10. On the Set fields mapping page, you can define how the data source maps attributes from Box objects to your index. Use the default settings for this post.
  11. Choose Next.
  12. On the Review and create page, review the details of your Box data source.
  13. To make changes, choose the Edit button next to the item that you want to change.
  14. When you’re done, choose Add data source to add your Box data source.

After you choose Add data source, Amazon Kendra starts creating the data source. It can take several minutes for the data source to be created. When it’s complete, the status of the data source changes from Creating to Active.
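The console steps above correspond roughly to one CreateDataSource call. The following is a sketch using Boto3, not a definitive mapping: in particular, how the console's Path type translates into an inclusion pattern is an assumption here, and the helper names, IDs, and ARNs are hypothetical. Leaving out a schedule matches the Run on demand setting.

```python
import time


def build_box_data_source_request(index_id, name, enterprise_id, secret_arn,
                                  role_arn, include_pattern="AWS_Whitepapers/"):
    """Build the CreateDataSource request that mirrors the console choices."""
    return {
        "IndexId": index_id,
        "Name": name,
        "Type": "BOX",
        "RoleArn": role_arn,
        # No "Schedule" key: the data source syncs only when started manually.
        "Configuration": {
            "BoxConfiguration": {
                "EnterpriseId": enterprise_id,
                "SecretArn": secret_arn,
                "UseChangeLog": False,    # change log left unchecked
                "CrawlComments": False,   # sync scope defaults
                "CrawlTasks": False,
                "CrawlWebLinks": False,
                # Assumption: the console Path filter becomes a pattern like this.
                "InclusionPatterns": [include_pattern],
            }
        },
    }


def create_and_wait(kendra, request, poll_seconds=15, max_polls=60):
    """Create the data source, then poll until it leaves the CREATING state."""
    data_source_id = kendra.create_data_source(**request)["Id"]
    for _ in range(max_polls):
        status = kendra.describe_data_source(
            Id=data_source_id, IndexId=request["IndexId"]
        )["Status"]
        if status == "ACTIVE":
            return data_source_id
        if status == "FAILED":
            raise RuntimeError("data source creation failed")
        time.sleep(poll_seconds)
    raise TimeoutError("data source did not become active in time")


# Example (requires AWS credentials; all identifiers are placeholders):
# import boto3
# kendra = boto3.client("kendra")
# request = build_box_data_source_request(
#     "your-index-id", "box-data-source", "your-enterprise-id",
#     "your-secret-arn", "your-role-arn")
# data_source_id = create_and_wait(kendra, request)
```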

Index sample documents from the Box account

You configured the data source sync run schedule to run on demand, so you need to start it manually.

  1. On the Amazon Kendra console, navigate to your index.
  2. Choose your new data source.
  3. Choose Sync now.

The current sync state changes to Syncing – crawling, then to Syncing – indexing.

After about 10 minutes, the current sync state changes to Idle, the last sync status changes to Successful, and the Sync run history panel shows more details, including the number of documents added.
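You can also start and monitor the sync from code. The following sketch assumes a Boto3 Amazon Kendra client is passed in; the function name, polling interval, and the set of statuses treated as final are our own choices for illustration.

```python
import time

# Statuses after which the sync job is no longer running.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "INCOMPLETE", "ABORTED"}


def sync_and_wait(kendra, index_id, data_source_id, poll_seconds=30, max_polls=60):
    """Start a sync job and poll the sync history until the job finishes."""
    execution_id = kendra.start_data_source_sync_job(
        Id=data_source_id, IndexId=index_id
    )["ExecutionId"]
    for _ in range(max_polls):
        history = kendra.list_data_source_sync_jobs(
            Id=data_source_id, IndexId=index_id
        )["History"]
        # Find the job we started; it may not be listed immediately.
        job = next((j for j in history if j.get("ExecutionId") == execution_id), None)
        if job and job["Status"] in TERMINAL_STATUSES:
            return job
        time.sleep(poll_seconds)
    raise TimeoutError("sync job did not finish in time")


# Example (requires AWS credentials; identifiers are placeholders):
# import boto3
# job = sync_and_wait(boto3.client("kendra"), "your-index-id", "your-data-source-id")
# print(job["Status"], job.get("Metrics", {}).get("DocumentsAdded"))
```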

Test the solution

Now that you have ingested the AWS whitepapers from your Box account into your Amazon Kendra index, you can test some queries.

  1. On the Amazon Kendra console, choose Search indexed content in the navigation pane.
  2. In the query field, enter a test query, such as What databases are offered by AWS?

You can try your own queries too.
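The same test query can be sent through the Query API. The following is a minimal sketch that assumes a Boto3 Amazon Kendra client is passed in; the function name and the shape of the returned summaries are hypothetical.

```python
def search(kendra, index_id, query_text, max_results=5):
    """Run a query and return a compact summary of each result item."""
    response = kendra.query(
        IndexId=index_id,
        QueryText=query_text,
        PageSize=max_results,
    )
    results = []
    for item in response.get("ResultItems", []):
        results.append({
            "type": item.get("Type"),
            "title": item.get("DocumentTitle", {}).get("Text", ""),
            "excerpt": item.get("DocumentExcerpt", {}).get("Text", ""),
            "uri": item.get("DocumentURI", ""),
        })
    return results


# Example (requires AWS credentials; the index ID is a placeholder):
# import boto3
# hits = search(boto3.client("kendra"), "your-index-id",
#               "What databases are offered by AWS?")
# for hit in hits:
#     print(hit["type"], hit["title"], hit["uri"])
```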

Congratulations! You have successfully used Amazon Kendra to surface answers and insights based on the content indexed from your Box account.

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution.

  1. If you created a new Amazon Kendra index while testing this solution, delete it.
  2. If you added a new data source using the Amazon Kendra connector for Box, delete that data source.
  3. Delete the AWS_Whitepapers folder and its contents from your Box account.
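The AWS side of the cleanup can be sketched with Boto3 as follows. The function name is hypothetical, and the sketch assumes that deleting an index also removes the data sources attached to it, so it deletes only one or the other. Deleting the folder from Box remains a manual step.

```python
def clean_up(kendra, index_id, data_source_id=None, delete_index=False):
    """Delete the index if you created one for this post; otherwise delete
    only the Box data source. Returns the list of resources deleted."""
    deleted = []
    if delete_index:
        # Assumption: the index's data sources are removed along with it.
        kendra.delete_index(Id=index_id)
        deleted.append("index")
    elif data_source_id:
        kendra.delete_data_source(Id=data_source_id, IndexId=index_id)
        deleted.append("data_source")
    return deleted


# Example (requires AWS credentials; identifiers are placeholders):
# import boto3
# clean_up(boto3.client("kendra"), "your-index-id",
#          data_source_id="your-data-source-id")
```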

Conclusion

With the Amazon Kendra Box connector, organizations can make invaluable information trapped in their Box accounts available to their users securely using intelligent search powered by Amazon Kendra.

In this post, we introduced you to the basics, but there are many additional features that we didn’t cover. For example:

  • You can enable user-based access control for your Amazon Kendra index, and restrict access to Box documents based on the access controls you have already configured in Box
  • You can index additional Box object types, such as tasks, comments, and web links
  • You can map Box object attributes to Amazon Kendra index attributes, and enable them for faceting, search, and display in the search results
  • You can integrate the Box data source with the Custom Document Enrichment (CDE) capability in Amazon Kendra to perform additional attribute mapping logic and even custom content transformation during ingestion

To learn about these possibilities and more, refer to the Amazon Kendra Developer Guide.


About the Authors

Bob Strahan is a Principal Solutions Architect in the AWS Language AI Services team.