Intelligently search Alfresco content using Amazon Kendra

Amazon Kendra is an intelligent search service powered by machine learning (ML). With Amazon Kendra, you can easily aggregate content from a variety of content repositories into a centralized index that lets you quickly search all your enterprise data and find the most accurate answer. Many organizations use the content management platform Alfresco to store their content. One of the key requirements for many enterprise customers using Alfresco is the ability to easily and securely find accurate information across all the documents in the data source.

We are excited to announce the public preview of the Amazon Kendra Alfresco connector. You can index Alfresco content, filter the types of content you want to index, and easily search your data in Alfresco with Amazon Kendra intelligent search and its Alfresco OnPrem connector.

This post shows you how to configure the Amazon Kendra Alfresco OnPrem connector as a data source for your Amazon Kendra index and search your Alfresco documents. Depending on how you configure the connector, each sync crawls and indexes different types of Alfresco content, such as wikis and blogs. The connector also ingests the access control list (ACL) information for each file. The ACL information is used for user context filtering, so that search results for a query include only the documents the user is authorized to access.

Prerequisites

To try out the Amazon Kendra connector for Alfresco using this post as a reference, you need the following:

An AWS account with privileges to create AWS Identity and Access Management (IAM) roles and policies. For more information, see Overview of access management: Permissions and policies and IAM roles for Alfresco data sources.
Basic knowledge of AWS and working knowledge of Alfresco administration.
Alfresco OnPrem set up with a user added to the ALFRESCO_ADMINISTRATORS group. We will store the admin user name and password in AWS Secrets Manager.
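
If you prefer to create the Secrets Manager secret programmatically rather than on the console, the following is a minimal sketch using the AWS SDK for Python (Boto3). The function name, the secret name, and the "username" and "password" key names are assumptions for illustration; check the Amazon Kendra Developer Guide for the exact keys the connector expects.

```python
import json


def create_alfresco_secret(secrets_client, name, username, password):
    """Store Alfresco admin credentials as a JSON secret and return its ARN.

    secrets_client is expected to be boto3.client("secretsmanager").
    The "username"/"password" key names are an assumption, not confirmed
    by this post.
    """
    secret_string = json.dumps({"username": username, "password": password})
    response = secrets_client.create_secret(Name=name, SecretString=secret_string)
    return response["ARN"]


# Example usage (requires boto3 and AWS credentials; names are hypothetical):
#   import boto3
#   arn = create_alfresco_secret(
#       boto3.client("secretsmanager"), "alfresco-admin", "admin", "..."
#   )
```

The function takes the client as a parameter so that it can be exercised with a stub before running it against a real account.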

Configure the data source using the Amazon Kendra connector for Alfresco

To add a data source to your Amazon Kendra index using the Alfresco OnPrem connector, you can use an existing index or create a new index. Then complete the following steps. For more information on this topic, refer to the Amazon Kendra Developer Guide.

On the Amazon Kendra console, open your index and choose Data sources in the navigation pane.
Choose Add data source.
Under Alfresco, choose Add connector.
In the Specify data source details section, enter a name and description and choose Next.
In the Define access and security section, for Alfresco site URL, enter the Alfresco host name.
To configure the SSL certificates, you can use a self-signed certificate for this setup. For example, openssl x509 -in pattern.pem -out alfresco.crt writes the certificate in pattern.pem to alfresco.crt. Add this certificate to an Amazon Simple Storage Service (Amazon S3) bucket, then choose Browse S3 and choose the S3 bucket with the SSL certificate.
For Site ID, enter the Alfresco site ID where you want to search documents.
Under Authentication, you have two options:
1. Use Secrets Manager to create new Alfresco authentication credentials. You need an Alfresco admin user name and password.
2. Use an existing Secrets Manager secret that has the Alfresco authentication credentials you want the connector to access.
Choose Save and add secret.
For IAM role, choose Create a new role or choose an existing IAM role configured with appropriate IAM policies to access the Secrets Manager secret, Amazon Kendra index, and data source.
Choose Next.
In the Configure sync settings section, provide information about your sync scope and run schedule.
You can include the files to be crawled using inclusion patterns or exclude them using exclusion patterns.
Choose Next.
In the Set field mappings section, you can optionally configure the field mappings to specify how the Alfresco field names are mapped to Amazon Kendra attributes or facets.
Choose Next.
Review your settings and confirm to add the data source.
After the data source is added, choose Data sources in the navigation pane, select the newly added data source, and choose Sync now to start data source synchronization with the Amazon Kendra index.
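
The Sync now action also has an API equivalent. The following is a minimal sketch, assuming a Boto3 Amazon Kendra client and that you already know your index ID and data source ID; the function name is hypothetical.

```python
def start_sync(kendra_client, index_id, data_source_id):
    """Start a data source sync job and return its execution ID.

    kendra_client is expected to be boto3.client("kendra").
    """
    response = kendra_client.start_data_source_sync_job(
        Id=data_source_id,
        IndexId=index_id,
    )
    return response["ExecutionId"]


# Example usage (requires boto3 and AWS credentials; IDs are placeholders):
#   import boto3
#   execution_id = start_sync(boto3.client("kendra"), "<index-id>", "<data-source-id>")
```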

The sync process can take about 10–15 minutes. After it completes, you can search the indexed Alfresco content using the search console or a search application. Optionally, you can filter search results by ACL by completing the following additional steps.
Go to the index page that you created and on the User access control tab, choose Edit settings.
Under Access control settings, select Yes.
For Token type, choose JSON.
Choose Next.
Choose Update.
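
The same access control settings can be applied with the UpdateIndex API. The following is a minimal sketch, assuming a Boto3 Amazon Kendra client. The function names and the default "username" and "groups" field names are assumptions for illustration; use the field names that appear in your own JSON tokens.

```python
def build_access_control_request(index_id, user_field="username", group_field="groups"):
    """Build UpdateIndex arguments that enable JSON token-based access control.

    The default field names are assumptions, not values from this post.
    """
    return {
        "Id": index_id,
        "UserContextPolicy": "USER_TOKEN",
        "UserTokenConfigurations": [
            {
                "JsonTokenTypeConfiguration": {
                    "UserNameAttributeField": user_field,
                    "GroupAttributeField": group_field,
                }
            }
        ],
    }


def enable_access_control(kendra_client, index_id, **field_names):
    """Apply the settings; kendra_client is expected to be boto3.client("kendra")."""
    request = build_access_control_request(index_id, **field_names)
    kendra_client.update_index(**request)
    return request
```

Building the request separately from sending it makes it easy to inspect the settings before changing the index.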

Wait a few minutes for the index to reflect the changes. Now let’s see how you can perform intelligent search with Amazon Kendra.

Perform intelligent search with Amazon Kendra

Before you try searching on the Amazon Kendra console or using the API, make sure that the data source sync is complete. To check, view the data sources and verify if the last sync was successful.
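
You can also check the sync status through the API. The following is a minimal sketch, assuming a Boto3 Amazon Kendra client; the function name is hypothetical, and for brevity it reads only the first page of the sync history.

```python
def last_sync_status(kendra_client, index_id, data_source_id):
    """Return the status of the most recent sync job, or None if there is none.

    kendra_client is expected to be boto3.client("kendra").
    Only the first page of results is inspected.
    """
    response = kendra_client.list_data_source_sync_jobs(
        Id=data_source_id,
        IndexId=index_id,
    )
    history = response.get("History", [])
    if not history:
        return None
    # Pick the job that started last, regardless of the order returned.
    latest = max(history, key=lambda job: job["StartTime"])
    return latest["Status"]


# Example usage (requires boto3 and AWS credentials; IDs are placeholders):
#   import boto3
#   status = last_sync_status(boto3.client("kendra"), "<index-id>", "<data-source-id>")
#   if status != "SUCCEEDED":
#       print("Last sync has not succeeded yet:", status)
```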

To start your search, on the Amazon Kendra console, choose Search indexed content in the navigation pane.
You’re redirected to the Amazon Kendra Search console. Now you can search information from the Alfresco documents you indexed using Amazon Kendra.
For this post, we search for a document stored in Alfresco, AWS.
Expand Test query with an access token and choose Apply token.
For Username, enter the email address associated with your Alfresco account.
Choose Apply.

Now the user can only see the content they have access to. In our example, user test@amazon.com doesn’t have access to any documents on Alfresco, so none are visible.
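
The same user-filtered search can be run through the Query API. The following is a minimal sketch, assuming a Boto3 Amazon Kendra client; the function name is hypothetical. Passing the user as UserContext UserId is one way to supply user context, and your setup may require a token instead.

```python
def search_as_user(kendra_client, index_id, query_text, user_id):
    """Run a query filtered by the given user's access.

    Returns a list of (title, uri) tuples.
    kendra_client is expected to be boto3.client("kendra").
    """
    response = kendra_client.query(
        IndexId=index_id,
        QueryText=query_text,
        UserContext={"UserId": user_id},
    )
    return [
        (
            item.get("DocumentTitle", {}).get("Text", ""),
            item.get("DocumentURI", ""),
        )
        for item in response.get("ResultItems", [])
    ]


# Example usage (requires boto3 and AWS credentials; the index ID is a placeholder):
#   import boto3
#   results = search_as_user(boto3.client("kendra"), "<index-id>", "AWS", "test@amazon.com")
```

For a user with no access to any indexed Alfresco document, such as test@amazon.com in this example, the returned list would be expected to be empty.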

Limitations

The connector has the following limitations:

As of this writing, we only support Alfresco OnPrem. Alfresco PaaS is not supported.
The connector doesn’t crawl the following entities: calendars, discussions, data lists, links, and system files.
During public preview, we only support basic authentication. For support for other forms of authentication, please contact your Amazon representative.

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it. If you only added a new data source using the Amazon Kendra connector for Alfresco, delete that data source.
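
The cleanup can also be scripted. The following is a minimal sketch, assuming a Boto3 Amazon Kendra client; the function name and its delete_index flag are hypothetical. Deleting an index is irreversible, so the sketch deletes only the data source unless you explicitly ask for the index to be removed.

```python
def clean_up(kendra_client, index_id, data_source_id=None, delete_index=False):
    """Delete the test resources and return a label for what was deleted.

    kendra_client is expected to be boto3.client("kendra").
    """
    if delete_index:
        # Use this only if the index was created just for this walkthrough.
        kendra_client.delete_index(Id=index_id)
        return "index"
    if data_source_id is not None:
        # Keep the index and remove only the Alfresco data source.
        kendra_client.delete_data_source(Id=data_source_id, IndexId=index_id)
        return "data_source"
    return None
```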

Conclusion

With the Amazon Kendra Alfresco connector, your organization can search its Alfresco content securely using intelligent search powered by Amazon Kendra.

To learn more about the Amazon Kendra Alfresco connector, refer to the Amazon Kendra Developer Guide.

For more information on other Amazon Kendra built-in connectors to popular data sources, refer to Amazon Kendra native connectors.

About the author

Vikas Shah is an Enterprise Solutions Architect at Amazon Web Services. He is a technology enthusiast who enjoys helping customers find innovative solutions to complex business challenges. His areas of interest are ML, IoT, robotics, and storage. In his spare time, Vikas enjoys building robots, hiking, and traveling.

AWS Machine Learning Blog