Automate Amazon ES synonym file updates
September 8, 2021: Amazon Elasticsearch Service has been renamed to Amazon OpenSearch Service.
Search engines retrieve relevant content from a collection of documents. Retrieval becomes harder when users don’t enter the exact words that appear in that content. For example, you might need to find the right item in a catalog of products, or the correct provider in a list of service providers. The most common way to specify a query is through a text box. If you enter the wrong terms, you won’t match the right items and won’t get the best results.
Synonyms improve search results by mapping several words to a single indexable term. In Amazon OpenSearch Service, you can provide synonyms for keywords that your application’s users may look for. For example, your website may provide medical practitioner searches, and your users may search for “child’s doctor” instead of “pediatrician.” Mapping the two terms together enables either search term to match documents that contain the term “pediatrician.” You can achieve this behavior with synonym files. Amazon OpenSearch Service custom packages allow you to upload synonym files that define the synonyms in your catalog. One best practice is to manage the synonyms in Amazon Relational Database Service (Amazon RDS). You then need to deploy the synonyms to your Amazon OpenSearch Service domain. You can do this with AWS Lambda and Amazon Simple Storage Service (Amazon S3).
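To make the idea concrete, the following is a minimal sketch of how one line of a synonym file can be rendered. It relies on the Solr-style format that Elasticsearch synonym files accept: a comma-separated list marks terms as equivalent, and `=>` maps the terms on the left to the term on the right. The helper name `synonym_rule` is hypothetical and only serves this illustration.

```python
def synonym_rule(terms, target=None):
    """Render one line of a Solr-style synonym file.

    Without `target`, the terms are treated as equivalent ("a, b").
    With `target`, the terms are mapped explicitly ("a, b => target").
    """
    cleaned = [t.strip() for t in terms if t and t.strip()]
    if not cleaned:
        raise ValueError("at least one non-empty term is required")
    left = ", ".join(cleaned)
    if target is None:
        return left
    return f"{left} => {target.strip()}"


# Equivalent synonyms: either term matches documents containing the other.
equivalent = synonym_rule(["child's doctor", "pediatrician"])

# Explicit mapping: the left-hand term is rewritten to the right-hand term.
mapped = synonym_rule(["child's doctor"], target="pediatrician")
```

Which of the two forms fits better depends on whether you want both terms indexed as equals or want every variant normalized to one canonical term.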
In this post, we discuss an approach using Amazon Aurora and Lambda functions to automate updating synonym files for improved search results.
Overview of solution
Amazon OpenSearch Service is a fully managed service that makes it easy to deploy, secure, and run Elasticsearch cost-effectively and at scale. You can build, monitor, and troubleshoot your applications using the tools you love, at the scale you need. The service supports open-source Elasticsearch API operations, managed Kibana, and integration with Logstash and many AWS services, along with built-in alerting and SQL querying.
For search engineers, the synonym file’s content is usually stored within a database or in a data lake. You may have data in tabular format in Amazon RDS (in this case, we use Amazon Aurora MySQL). When updates to the synonym data table occur, the change triggers a Lambda function that pushes data to Amazon S3. The S3 event triggers a second function, which pushes the synonym file from Amazon S3 to Amazon OpenSearch Service. This architecture automates the entire synonym file update process.
To achieve this architecture, we complete the following high-level steps:
- Create a stored procedure to trigger the Lambda function.
- Write a Lambda function to verify data changes and push them to Amazon S3.
- Write a Lambda function to update the synonym file in Amazon OpenSearch Service.
- Test the data flow.
We discuss each step in detail in the next sections.
Make sure you complete the following prerequisites:
- Configure an Amazon OpenSearch Service domain. We use a domain running Elasticsearch version 7.9 for this architecture.
- Set up an Aurora MySQL database. For more information, see Configuring your Amazon Aurora DB cluster.
Create a stored procedure to trigger a Lambda function
You can invoke a Lambda function from an Aurora MySQL database cluster using a native function or a stored procedure.
The following script creates an example synonym data table:
You can now populate the table with sample data. To generate sample data in your table, run the following script:
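As a rough sketch of what such a table and its sample rows might look like, the following assumes a hypothetical `synonym_data` table with one row per (base term, synonym) pair. The table name, column names, and sample values are assumptions made for illustration. The SQL is kept simple enough to run on MySQL as well; here it is executed against an in-memory SQLite database purely as a local stand-in, so the schema can be tried without an Aurora cluster.

```python
import sqlite3

# Hypothetical schema: one row per (base term, synonym) pair.
CREATE_TABLE_SQL = """
CREATE TABLE synonym_data (
    id        INT PRIMARY KEY,
    base_term VARCHAR(100) NOT NULL,
    synonym   VARCHAR(100) NOT NULL
)
"""

# Illustrative sample rows, not real catalog data.
SAMPLE_ROWS = [
    (1, "pediatrician", "child's doctor"),
    (2, "pediatrician", "kids doctor"),
    (3, "physician", "doctor"),
]

# sqlite3 uses "?" placeholders; MySQL drivers typically use "%s" instead.
INSERT_SQL = "INSERT INTO synonym_data (id, base_term, synonym) VALUES (?, ?, ?)"


def build_sample_table(conn):
    """Create the example table and load the sample rows."""
    conn.execute(CREATE_TABLE_SQL)
    conn.executemany(INSERT_SQL, SAMPLE_ROWS)
    conn.commit()


conn = sqlite3.connect(":memory:")  # local stand-in for the Aurora MySQL database
build_sample_table(conn)
rows = conn.execute(
    "SELECT base_term, synonym FROM synonym_data ORDER BY id"
).fetchall()
```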
Create a Lambda function
You can use two different methods to send data from Aurora to Amazon S3: a Lambda function or SELECT INTO OUTFILE S3.
To demonstrate the ease of setting up integration between multiple AWS services, we use a Lambda function that is called every time a tracked change occurs in the database table. This function passes the data to Amazon S3. First, create an S3 bucket in which the Lambda function stores the synonym file.
When you create your function, make sure its AWS Identity and Access Management (IAM) execution role has the permissions it needs on the S3 bucket where you store the synonyms.txt file. By default, Lambda creates an execution role with minimal permissions when you create a function on the Lambda console, so you must grant the S3 permissions yourself. The following is the Python code to create the synonyms.txt file in S3:
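A minimal sketch of such a function might look like the following. It makes several assumptions that you should adapt: the invoking procedure sends a JSON payload with hypothetical `base_term` and `synonym` keys for the changed row, the bucket name comes from a hypothetical `SYNONYM_BUCKET` environment variable, and only inserts are handled (updates and deletes are not). The function reads the current synonyms.txt, merges the new synonym into the matching line, and writes the file back only if something changed. The optional `s3_client` parameter exists only so the handler can be exercised locally with a stub.

```python
import os

# Hypothetical configuration; replace with your own bucket name.
BUCKET = os.environ.get("SYNONYM_BUCKET", "example-synonym-bucket")
KEY = "synonyms.txt"


def merge_synonym(existing_text, base_term, synonym):
    """Add `synonym` to the line whose first term is `base_term`."""
    groups = {}  # base term -> ordered list of its synonyms
    for line in existing_text.splitlines():
        terms = [t.strip() for t in line.split(",") if t.strip()]
        if terms:
            groups[terms[0]] = terms[1:]
    synonyms = groups.setdefault(base_term, [])
    if synonym not in synonyms:
        synonyms.append(synonym)
    return "".join(
        ", ".join([base] + syns) + "\n" for base, syns in groups.items()
    )


def lambda_handler(event, context, s3_client=None):
    if s3_client is None:
        import boto3  # imported lazily so the merge logic can be tested offline
        s3_client = boto3.client("s3")

    # Load the current file; treat a missing object as an empty file.
    try:
        obj = s3_client.get_object(Bucket=BUCKET, Key=KEY)
        current = obj["Body"].read().decode("utf-8")
    except s3_client.exceptions.NoSuchKey:
        current = ""

    updated = merge_synonym(current, event["base_term"], event["synonym"])
    changed = updated != current
    if changed:
        # Write only when the content differs, to avoid needless S3 events.
        s3_client.put_object(Bucket=BUCKET, Key=KEY, Body=updated.encode("utf-8"))
    return {"bucket": BUCKET, "key": KEY, "changed": changed}
```

With this design, the execution role needs at least `s3:GetObject` and `s3:PutObject` on the synonyms.txt object. Skipping the write when nothing changed keeps the downstream S3 event from firing for duplicate rows.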
Note the Amazon Resource Name (ARN) of this Lambda function to use in a later step.
Give Aurora permissions to invoke a Lambda function
To give Aurora permissions to invoke your function, you must attach an IAM role with the appropriate permissions to the cluster. For more information, see Invoking a Lambda function from an Amazon Aurora DB cluster.
When you’re finished, the Aurora database has permission to invoke the Lambda function.
Create a stored procedure and a trigger in Aurora
To create a new stored procedure, return to MySQL Workbench. Change the ARN in the following code to your Lambda function’s ARN before running the procedure:
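As a sketch of what the procedure and trigger could look like, the following Python helper renders the SQL with your function’s ARN substituted, so the ARN is changed in one place. It assumes the `mysql.lambda_async` stored procedure that Aurora MySQL provides for asynchronous invocation; newer Aurora MySQL versions deprecate it in favor of native functions, so check which form your cluster version supports. The procedure name, trigger name, table name, column names, and payload keys are all hypothetical, and the trigger covers inserts only.

```python
# SQL template; __LAMBDA_ARN__ is replaced with the real function ARN.
# DELIMITER is a client command understood by MySQL Workbench and the mysql CLI.
SQL_TEMPLATE = """
DROP PROCEDURE IF EXISTS push_synonym_change;
DELIMITER ;;
CREATE PROCEDURE push_synonym_change(
    IN p_base_term VARCHAR(100),
    IN p_synonym   VARCHAR(100)
)
BEGIN
    -- Invoke the Lambda function asynchronously with a JSON payload.
    -- CONCAT does not escape double quotes inside the values.
    CALL mysql.lambda_async(
        '__LAMBDA_ARN__',
        CONCAT('{"base_term":"', p_base_term, '","synonym":"', p_synonym, '"}')
    );
END;;
DELIMITER ;

CREATE TRIGGER synonym_data_after_insert
AFTER INSERT ON synonym_data
FOR EACH ROW
    CALL push_synonym_change(NEW.base_term, NEW.synonym);
"""


def build_trigger_sql(lambda_arn):
    """Return the procedure and trigger SQL for the given Lambda function ARN."""
    if not lambda_arn.startswith("arn:aws:lambda:"):
        raise ValueError("expected a Lambda function ARN")
    if "'" in lambda_arn:
        raise ValueError("ARN must not contain a single quote")
    return SQL_TEMPLATE.replace("__LAMBDA_ARN__", lambda_arn)


# Placeholder ARN for illustration; use the ARN you noted earlier.
example_sql = build_trigger_sql(
    "arn:aws:lambda:us-east-1:123456789012:function:example-synonym-function"
)
```

Printing `example_sql` gives a script you can paste into MySQL Workbench. The payload shape must match what the receiving Lambda function reads from its event.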