AWS Partner Network (APN) Blog

How Crayon Uses AWS Language Technologies to Build Intelligent Decision Support Systems

By Ashith Bhandary, Pouya Ghiasnezhad Omran, and Kiriti Yelamanchali, Data Scientists – Crayon
By Prithy Yathavamurthy, Head of Language Technologies, APJ – Crayon
By Armin Haller, Director CoE Data & AI, APAC – Crayon
By Ling Chang and Vasileios Vonikakis, AI/ML Specialists – AWS


In today’s digital age, we have access to an unprecedented amount of information, and it can be overwhelming to keep up with the latest news and events. An intelligent decision support system (IDSS) offers an efficient way for organizations to stay informed on the latest news across different fields, such as real estate, business, finance, geopolitics, science, and engineering.

With an IDSS using language technologies on Amazon Web Services (AWS), companies can easily find relevant articles, industry reports, and other valuable information to help them make informed business decisions.

In this post, we will discuss how Crayon developed an IDSS tool for Savills Vietnam, allowing the leadership team to stay up-to-date on the latest real estate trends and opportunities in the region. Savills is a global real estate services company with offices around the world. As a real estate specialist, staying current is of paramount importance since information can help businesses identify new opportunities.

“As an organization that needs to sift through a lot of information from different sources in different languages, it was clear that having an intelligent decision support system was going to help my staff increase productivity,” said Matthew Powell, Director of Savills Hanoi.

Crayon is an AWS Premier Tier Services Partner and AWS Marketplace Seller with Competencies in Machine Learning, DevOps, and other key areas. Crayon is a customer-centric innovation and IT services company that provides guidance on clients’ business needs and budgets with software, cloud, artificial intelligence (AI), and big data.

Components of an Intelligent Search Engine

An IDSS tool is based on an intelligent search engine (ISE), which offers natural language and semantic understanding of queries, going beyond simple keyword-based searching. Questions can be asked in natural language, and results can be more relevant to the user. Figure 1 shows a high-level diagram of an IDSS based on an intelligent search engine.

Figure 1 – High-level diagram of an IDSS.

Data is ingested into the system by crawling various sources, such as news agencies, blogs, portals, reports, and social media. The ingested unstructured text is pre-processed and enriched to extract important information that assists the answering process later. This can include classifying documents into different categories and extracting relevant entities or metadata.

The enriched documents are then indexed into the ISE, and users can ask questions in natural language and receive direct answers or relevant excerpts that are related to their questions. Users can also provide feedback on the relevance of the answers (via thumbs up/down, for example), allowing the ISE to improve over time.

Solution Requirements and Approach

Following a similar approach to Figure 1, the Savills team needed to ingest daily CommSights news articles, in Vietnamese, covering different categories, including commercial leasing, industrial, residential, and disaster news.

The team needed to differentiate between categories and required a filtering mechanism that would enable users to quickly narrow down their search to any of these topics. They also required the search to be conducted using natural language to quickly and seamlessly provide the necessary insights to users.

To address these requirements, Crayon opted to base its IDSS on Amazon Kendra, an intelligent search service that uses natural language processing (NLP) and advanced machine learning (ML) algorithms to return specific answers to search questions from data. Unlike traditional keyword-based search, Amazon Kendra uses its semantic and contextual understanding capabilities to decide whether a document is relevant to a search query. It returns specific answers to questions, giving users an experience that’s close to interacting with a human expert.

To categorize the incoming documents into different categories, Crayon used Amazon Comprehend, which leverages NLP to extract insights about the content of documents. It develops insights by recognizing entities, key phrases, language, sentiments, and other common elements in a document. Using Amazon Comprehend, Crayon trained a custom document classifier using a set of categorized historical documents.

The classifier achieves an F1 score of 95% and can classify new incoming documents into different key topics. The output of the document classifier (class metadata) is included in Amazon Kendra, providing users with filtering options to easily narrow down their search to a specific topic.
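As an illustration, classifying a translated article against a custom Amazon Comprehend classifier endpoint might look like the sketch below. The function names, the endpoint ARN placeholder, and the region are our assumptions for illustration, not details of the Savills implementation.

```python
def classify_article(text, endpoint_arn, region="ap-southeast-1"):
    """Classify one translated article with a custom Comprehend endpoint.

    endpoint_arn is a placeholder for the ARN of a real-time endpoint
    created from the trained custom classifier.
    """
    import boto3  # deferred so the pure helper below stays usable without AWS

    client = boto3.client("comprehend", region_name=region)
    resp = client.classify_document(Text=text, EndpointArn=endpoint_arn)
    return top_class(resp["Classes"])


def top_class(classes):
    """Pick the label with the highest confidence score as the document topic."""
    return max(classes, key=lambda c: c["Score"])["Name"]
```

The returned topic label can then be written alongside the document as metadata for the search index.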

For example, if the Savills commercial leasing team needs to know more about the latest commercial leasing news, it can narrow down its search to only this particular document category by utilizing the built-in filtering functionality of Amazon Kendra.

To translate the ingested Vietnamese documents to English, Crayon used Amazon Translate, a text translation service that uses advanced ML technologies to provide high-quality translation on demand.
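A translation call of this kind could be sketched as follows. Note that the Amazon Translate TranslateText operation accepts up to 10,000 bytes per request, so longer articles need to be split first; the chunking helper and all names here are illustrative assumptions.

```python
def translate_article(text, source="vi", target="en", region="ap-southeast-1"):
    """Translate one article from Vietnamese to English via Amazon Translate."""
    import boto3  # deferred so the chunking helper stays usable without AWS

    client = boto3.client("translate", region_name=region)
    translated = []
    for chunk in chunk_text(text):
        resp = client.translate_text(
            Text=chunk,
            SourceLanguageCode=source,
            TargetLanguageCode=target,
        )
        translated.append(resp["TranslatedText"])
    return "\n".join(translated)


def chunk_text(text, max_bytes=9500):
    """Split text on paragraph boundaries so each chunk fits the per-call limit."""
    chunks, current = [], ""
    for para in text.split("\n"):
        candidate = (current + "\n" + para).strip()
        if len(candidate.encode("utf-8")) > max_bytes and current:
            chunks.append(current)
            current = para
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```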

Implementation

Figure 2 shows a diagram of the IDSS architecture that Crayon implemented for the customer.

Figure 2 – Architecture of the Savills IDSS system.

In the first stage, the ingested documents go through a pre-processing and enrichment phase before they are ingested into Amazon Kendra. An AWS Lambda function acts as an orchestrator during this phase.

  1. Incoming documents are passed to Amazon Translate to be translated from Vietnamese to English. The translated documents are stored in an Amazon Simple Storage Service (Amazon S3) bucket.
  2. The translated documents are also classified into different topics by a custom document classifier in Amazon Comprehend, which has been trained on historical documents across different categories. The extracted metadata (document topics) are also stored in the same S3 bucket.
  3. The English documents, along with their extracted metadata, are ingested periodically to an Amazon Kendra index. Kendra offers a large number of built-in connectors including Amazon S3. Periodic ingestion of new documents can be easily scheduled from within Kendra based on the frequency of the new incoming documents.
  4. In the next retrieval phase, users authenticate through Amazon Cognito and use the IDSS web application. Through the search user interface (UI), they can submit questions in natural language and narrow down their search by filtering on specific topics of interest. Each question is propagated through Amazon API Gateway and AWS Lambda to Amazon Kendra.
  5. Answer generation:
    1. Based on the filtered topics and the question, Amazon Kendra will either return a specific answer or multiple relevant text excerpts from the knowledge base.
    2. Optionally, an LLM can be used to further improve the answers returned by Kendra. This could include summarization of all text excerpts or synthesizing an answer to the asked question by using the consolidated text excerpts as context (similar to what you can learn in this AWS blog post). The LLM can be hosted in an Amazon SageMaker endpoint, using SageMaker JumpStart, or interfaced through an API using Amazon Bedrock.
    3. The final answer (either directly from Kendra or from the LLM) is returned to the user.
  6. Optionally, an Amazon QuickSight interactive dashboard can be used to display consolidated statistics about the ingested documents (topic ratio of the ingested documents) or search-related analytics that Amazon Kendra offers natively (click-through rate, top queries, top clicked documents, total queries).
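A retrieval query like the one in step 4 could be sketched as below. The `_category` key assumes the classifier output was stored in Kendra's built-in category attribute; the function names, index ID placeholder, and region are our assumptions.

```python
def search_idss(question, topic=None, index_id="YOUR-KENDRA-INDEX-ID",
                region="ap-southeast-1"):
    """Query the Kendra index, optionally restricted to one document topic."""
    import boto3  # deferred so the query builder stays usable without AWS

    kendra = boto3.client("kendra", region_name=region)
    return kendra.query(**build_query(question, topic, index_id))


def build_query(question, topic, index_id):
    """Assemble Kendra Query parameters, adding a topic filter when one is selected."""
    params = {"IndexId": index_id, "QueryText": question}
    if topic:
        # Filter on the category attribute written during the enrichment phase
        params["AttributeFilter"] = {
            "EqualsTo": {
                "Key": "_category",
                "Value": {"StringValue": topic},
            }
        }
    return params
```

The response contains either a direct answer or a ranked list of relevant excerpts, which the web application renders as in Figures 3a and 3b.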

As an alternative for the pre-processing and enrichment phase, instead of using Amazon Translate and Amazon Comprehend, an LLM could both translate from Vietnamese to English and classify documents into different topics, thus simplifying the architecture.
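Such an LLM-based enrichment step might be sketched with the Amazon Bedrock Converse API as follows. The model ID placeholder, prompt wording, and function names are our assumptions; a production version would also validate the model's JSON output.

```python
def enrich_with_llm(article_vi, topics, model_id="YOUR-BEDROCK-MODEL-ID",
                    region="ap-southeast-1"):
    """Translate and classify one article in a single LLM call via Amazon Bedrock."""
    import json
    import boto3  # deferred so the prompt builder stays usable without AWS

    bedrock = boto3.client("bedrock-runtime", region_name=region)
    resp = bedrock.converse(
        modelId=model_id,
        messages=[{"role": "user", "content": [{"text": build_prompt(article_vi, topics)}]}],
    )
    # Assumes the model follows the instruction to respond with bare JSON
    return json.loads(resp["output"]["message"]["content"][0]["text"])


def build_prompt(article, topics):
    """Compose a single prompt that asks for translation plus topic classification."""
    return (
        "Translate the following Vietnamese news article to English and classify it "
        f"into exactly one of these topics: {', '.join(topics)}. "
        'Respond only with JSON of the form {"translation": "...", "topic": "..."}.\n\n'
        + article
    )
```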

The two figures below depict an example of the actual IDSS system built for Savills. After a user is signed in, they can ask any question in the main search box.

On the left, they can filter results across different topics (which were extracted during the enrichment phase). In the main part of the page, relevant documents (Figure 3a) or a direct suggested answer (Figure 3b) will appear. Users can click the title of each retrieved result and be taken to the original document.

Below each item, thumbs up/down icons allow the users to indicate whether this particular result is helpful or not. This direct feedback is important because it closes the loop, allowing Amazon Kendra to improve its responses over time.
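The thumbs up/down clicks can be relayed to Amazon Kendra through its SubmitFeedback API, for example along these lines (function names, index ID placeholder, and region are our assumptions):

```python
def send_feedback(query_id, result_id, helpful, index_id="YOUR-KENDRA-INDEX-ID",
                  region="ap-southeast-1"):
    """Forward a thumbs up/down click on one search result to Amazon Kendra."""
    import boto3  # deferred so the mapping helper stays usable without AWS

    kendra = boto3.client("kendra", region_name=region)
    kendra.submit_feedback(
        IndexId=index_id,
        QueryId=query_id,  # returned by the original Query call
        RelevanceFeedbackItems=[feedback_item(result_id, helpful)],
    )


def feedback_item(result_id, helpful):
    """Map a thumbs up/down click to Kendra's relevance feedback format."""
    return {
        "ResultId": result_id,
        "RelevanceValue": "RELEVANT" if helpful else "NOT_RELEVANT",
    }
```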

Figure 3a – Snapshot of the IDSS showing retrieved relevant articles, along with filtering topics.

Figure 3b – Snapshot of the IDSS showing a suggested answer, along with filtering topics.

Conclusion

Crayon’s intelligent decision support system (IDSS) provides real estate professionals at Savills with a convenient and efficient way to stay up-to-date with the latest industry news. It is a valuable tool for professionals looking to make informed business decisions and stay ahead of the curve in their industry.

By incorporating natural language processing, machine learning, and search indexing, the solution’s intelligent search engine offers relevant and accurate search results to users. With the ability to classify news articles into various key topics, users can easily filter their search to find the information they need.

Worldwide, Crayon has established strong generative AI capabilities and an operating model within its AI Centers of Excellence, which count 150 technical specialists. Crayon helps customers harness the power of generative AI algorithms and large language models to generate high-quality content, automate tasks, and enhance decision making.



Crayon – AWS Partner Spotlight

Crayon is an AWS Premier Tier Services Partner and customer-centric innovation and IT services company that provides guidance on clients’ business needs and budgets with software, cloud, artificial intelligence (AI), and big data.

Contact Crayon | Partner Overview | AWS Marketplace | Case Studies