AWS Machine Learning Blog

Enhance Amazon Lex with LLMs and improve the FAQ experience using URL ingestion

In today’s digital world, most consumers would rather find answers to their customer service questions on their own than take the time to reach out to businesses or service providers. This blog post explores an innovative solution to build a question and answer chatbot in Amazon Lex that uses the existing FAQs from your website. This AI-powered tool can provide quick, accurate responses to real-world inquiries, allowing customers to quickly and easily solve common problems on their own.

Single URL ingestion

Many enterprises publish a set of answers to FAQs for their customers on their website. In this case, we want to offer customers a chatbot that can answer their questions from our published FAQs. In the blog post titled Enhance Amazon Lex with conversational FAQ features using LLMs, we demonstrated how you can use a combination of Amazon Lex and LlamaIndex to build a chatbot powered by your existing knowledge sources, such as PDF or Word documents. To support a simple FAQ experience based on a website of FAQs, we need to create an ingestion process that can crawl the website and create embeddings that LlamaIndex can use to answer customer questions. In this case, we build on the bot created in the previous blog post, which queries those embeddings with a user’s utterance and returns the answer from the website FAQs.

The following diagram shows how the ingestion process and the Amazon Lex bot work together for our solution.

In the solution workflow, the website with FAQs is ingested via AWS Lambda. This Lambda function crawls the website and stores the resulting text in an Amazon Simple Storage Service (Amazon S3) bucket. The S3 bucket then triggers a Lambda function that uses LlamaIndex to create embeddings that are stored in Amazon S3. When a question from an end-user arrives, such as “What is your return policy?”, the Amazon Lex bot uses its Lambda function to query the embeddings using a RAG-based approach with LlamaIndex. For more information about this approach and the prerequisites, refer to the blog post Enhance Amazon Lex with conversational FAQ features using LLMs.
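As a rough illustration of the embedding step in this workflow, the following is a minimal sketch of what the S3-triggered Lambda function could look like. The text bucket name and handler wiring are hypothetical placeholders (the index bucket matches the one the bot’s fulfillment Lambda reads from later in this post), so your deployment may differ.

import os
import boto3
from llama_index import GPTVectorStoreIndex, SimpleDirectoryReader

s3 = boto3.client("s3")

# Hypothetical bucket holding the crawled text; substitute the bucket from your stack.
TEXT_BUCKET = "faq-bot-crawled-text-001"
# Bucket the bot's fulfillment Lambda downloads the index from.
INDEX_BUCKET = "faq-bot-storage-001"


def lambda_handler(event, context):
    # Download the crawled text files written by the crawler Lambda.
    os.makedirs("/tmp/docs", exist_ok=True)
    for item in s3.list_objects_v2(Bucket=TEXT_BUCKET).get("Contents", []):
        key = item["Key"]
        s3.download_file(TEXT_BUCKET, key, "/tmp/docs/" + os.path.basename(key))

    # Create embeddings by building a vector index over the crawled documents.
    documents = SimpleDirectoryReader("/tmp/docs").load_data()
    index = GPTVectorStoreIndex.from_documents(documents)

    # Persist the index locally, then copy it to S3 for the bot to load at query time.
    index.storage_context.persist(persist_dir="/tmp/index")
    for file_name in os.listdir("/tmp/index"):
        s3.upload_file("/tmp/index/" + file_name, INDEX_BUCKET, file_name)

    return {"status": "index updated"}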

After the prerequisites from the aforementioned blog post are complete, the first step is to ingest the FAQs into a document repository that can be vectorized and indexed by LlamaIndex. The following code shows how to accomplish this:

import logging
import sys
import requests
import html2text
from llama_index.readers.schema.base import Document
from llama_index import GPTVectorStoreIndex
from typing import List

logging.basicConfig(stream=sys.stdout, level=logging.INFO)
logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))


class EZWebLoader:

    def __init__(self, default_header: dict = None):
        # html2text converts the raw HTML of a page into plain text for indexing
        self._html_to_text_parser = html2text
        if default_header is None:
            self._default_header = {"User-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}
        else:
            self._default_header = default_header

    def load_data(self, urls: List[str], headers: dict = None) -> List[Document]:
        if headers is None:
            headers = self._default_header

        documents = []
        for url in urls:
            # Fetch the page, reduce the HTML to text, and wrap it in a LlamaIndex Document
            response = requests.get(url, headers=headers).text
            response = self._html_to_text_parser.html2text(response)
            documents.append(Document(response))
        return documents

url = "http://www.zappos.com/general-questions"
loader = EZWebLoader()
documents = loader.load_data([url])
index = GPTVectorStoreIndex.from_documents(documents)

In the preceding example, we take a predefined FAQ website URL from Zappos and ingest it using the EZWebLoader class. With this class, we navigate to the URL and load all the questions on the page into an index. We can now ask a question like “Does Zappos have gift cards?” and get the answer directly from our FAQs on the website. The following screenshot shows the Amazon Lex bot test console answering that question from the FAQs.
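Before wiring the index into the bot, you can sanity-check it directly with the LlamaIndex query engine. The following is a minimal sketch of that check, run against the index built in the preceding code.

# Query the in-memory index directly to verify the ingested FAQs.
query_engine = index.as_query_engine()
response = query_engine.query("Does Zappos have gift cards?")
print(response.response)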

We were able to achieve this because we had crawled the URL in the first step and created embeddings that LlamaIndex could use to search for the answer to our question. Our bot’s Lambda function shows how this search is run whenever the fallback intent is returned:

import time
import json
import os
import logging
import boto3
from llama_index import StorageContext, load_index_from_storage

logger = logging.getLogger()
logger.setLevel(logging.DEBUG)


def download_docstore():
    # Create an S3 client
    s3 = boto3.client('s3')

    # List all objects in the S3 bucket and download each one
    try:
        bucket_name = 'faq-bot-storage-001'
        s3_response = s3.list_objects_v2(Bucket=bucket_name)

        if 'Contents' in s3_response:
            for item in s3_response['Contents']:
                file_name = item['Key']
                logger.debug("Downloading to /tmp/" + file_name)
                s3.download_file(bucket_name, file_name, '/tmp/' + file_name)

            logger.debug('All files downloaded from S3 and written to local filesystem.')

    except Exception as e:
        logger.error(e)
        raise e


# download the doc store locally
download_docstore()

storage_context = StorageContext.from_defaults(persist_dir="/tmp/")
# load index
index = load_index_from_storage(storage_context)
query_engine = index.as_query_engine()


def lambda_handler(event, context):
    """
    Route the incoming request based on intent.
    The JSON body of the request is provided in the event slot.
    """
    # By default, treat the user request as coming from the America/New_York time zone.
    os.environ['TZ'] = 'America/New_York'
    time.tzset()
    logger.debug("===== START LEX FULFILLMENT ====")
    logger.debug(event)

    # Pull the intent (and any slots) from the Lex session state
    intent = event["sessionState"]["intent"]
    slots = {}
    if "slots" in intent and intent["slots"] is not None:
        slots = intent["slots"]

    dialogaction = {"type": "Delegate"}
    message = []
    if intent["name"].lower() == "fallbackintent":
        # execute a query against the index using the input given by the user
        response = query_engine.query(event["inputTranscript"]).response.strip()
        dialogaction["type"] = "Close"
        message.append({'content': f'{response}', 'contentType': 'PlainText'})

    final_response = {
        "sessionState": {
            "dialogAction": dialogaction,
            "intent": intent
        },
        "messages": message
    }
    
    logger.debug(json.dumps(final_response, indent=1))
    logger.debug("===== END LEX FULFILLMENT ====")
    
    return final_response

This solution works well when a single webpage has all the answers. However, most FAQ sites are not built on a single page. For instance, in our Zappos example, if we ask the question “Do you have a price matching policy?”, then we get a less-than-satisfactory answer, as shown in the following screenshot.

In the preceding interaction, the price-matching policy answer isn’t helpful for our user. This answer is short because the FAQ referenced is a link to a specific page about the price matching policy and our web crawl was only for the single page. Achieving better answers will mean crawling these links as well. The next section shows how to get answers to questions that require two or more levels of page depth.

N-level crawling

When we crawl a web page for FAQ knowledge, the information we want can be contained in linked pages. For example, when we ask our Zappos bot “Do you have a price matching policy?”, the answer is “Yes, please visit <link> to learn more.” If someone asks “What is your price matching policy?”, we want to give a complete answer with the policy itself. Achieving this means we need to traverse links to get the actual information for our end-user. During the ingestion process, we can use our web loader to find the anchor links to other HTML pages and then traverse them. The following code change to our web crawler allows us to find links in the pages we crawl. It also includes some additional logic to avoid circular crawling and to allow filtering by a URL prefix.

import logging
import requests
import html2text
from llama_index.readers.schema.base import Document
from typing import List
import re


def find_http_urls_in_parentheses(s: str, prefix: str = None):
    # Match URLs that appear in markdown-style parentheses, e.g. (https://...)
    pattern = r'\((https?://[^)]+)\)'
    urls = re.findall(pattern, s)

    if prefix is not None:
        # Keep only the URLs that start with the given prefix
        matched = [url for url in urls if str(url).startswith(prefix)]
    else:
        matched = urls

    return list(set(matched)) # remove duplicates by converting to a set, then back to a list



class EZWebLoader:

    def __init__(self, default_header: dict = None):
        self._html_to_text_parser = html2text
        if default_header is None:
            self._default_header = {"User-agent":"Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/47.0.2526.80 Safari/537.36"}
        else:
            self._default_header = default_header

    def load_data(self,
        urls: List[str],
        num_levels: int = 0,
        level_prefix: str = None,
        headers: dict = None) -> List[Document]:

        logging.info(f"Number of urls: {len(urls)}.")

        if headers is None:
            headers = self._default_header

        documents = []
        visited = {}
        for url in urls:
            q = [url]  # queue of pages to crawl; grows as new links are discovered
            depth = num_levels
            for page in q:
                if page not in visited: #prevent cycles by checking to see if we already crawled a link
                    logging.info(f"Crawling {page}")
                    visited[page] = True #add entry to visited to prevent re-crawling pages
                    response = requests.get(page, headers=headers).text
                    response = self._html_to_text_parser.html2text(response) #reduce html to text
                    documents.append(Document(response))
                    if depth > 0:
                        #crawl linked pages
                        ingest_urls = find_http_urls_in_parentheses(response, level_prefix)
                        logging.info(f"Found {len(ingest_urls)} pages to crawl.")
                        q.extend(ingest_urls)
                        depth -= 1 #reduce the depth counter so we go only num_levels deep in our crawl
                else:
                    logging.info(f"Skipping {page} as it has already been crawled")
        logging.info(f"Number of documents: {len(documents)}.")
        return documents

url = "http://www.zappos.com/general-questions"
loader = EZWebLoader()
#crawl the site with 1 level depth and prefix of "/c/" for customer service root
documents = loader.load_data([url], num_levels=1, level_prefix="https://www.zappos.com/c/")
index = GPTVectorStoreIndex.from_documents(documents)

In the preceding code, we introduce the ability to crawl N levels deep, and we give a prefix that allows us to restrict crawling to pages that begin with a certain URL pattern. In our Zappos example, the customer service pages are all rooted at zappos.com/c, so we include that as a prefix to limit our crawls to a smaller, more relevant subset. The code shows how we can ingest up to two levels deep. Our bot’s Lambda logic remains the same because nothing has changed except that the crawler ingests more documents.
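To make the prefix filter concrete, here is a small, hypothetical example of how find_http_urls_in_parentheses behaves on the markdown-style links that html2text produces; the sample text and link targets are made up for illustration.

# Hypothetical html2text output containing markdown-style links.
sample_text = (
    "See our [price matching policy](https://www.zappos.com/c/price-matching) "
    "or visit [an external partner](https://example.com/other) for details."
)

# Without a prefix, both URLs are candidates for crawling.
print(find_http_urls_in_parentheses(sample_text))

# With the customer service prefix, only the zappos.com/c/ page is kept.
print(find_http_urls_in_parentheses(sample_text, prefix="https://www.zappos.com/c/"))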

We now have all the documents indexed and we can ask a more detailed question. In the following screenshot, our bot provides the correct answer to the question “Do you have a price matching policy?”

We now have a complete answer to our question about price matching. Instead of simply being told “Yes see our policy,” it gives us the details from the second-level crawl.

Clean up

To avoid incurring future expenses, delete all the resources that were deployed as part of this exercise. We have provided a script to shut down the SageMaker endpoint gracefully; usage details are in the README. Additionally, to remove all the other resources, you can run cdk destroy in the same directory as the other cdk commands to deprovision all the resources in your stack.
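If you want to script the endpoint teardown yourself rather than use the provided script, the following is a minimal sketch using boto3; the endpoint and endpoint configuration names are hypothetical placeholders, so substitute the names from your deployment.

import boto3

sagemaker = boto3.client("sagemaker")

# Hypothetical names; use the endpoint and config created by your stack.
endpoint_name = "faq-bot-llm-endpoint"
endpoint_config_name = "faq-bot-llm-endpoint-config"

# Delete the endpoint first, then its configuration, to stop incurring charges.
sagemaker.delete_endpoint(EndpointName=endpoint_name)
sagemaker.delete_endpoint_config(EndpointConfigName=endpoint_config_name)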

Conclusion

The ability to ingest a set of FAQs into a chatbot enables your customers to find the answers to their questions with straightforward, natural language queries. By combining the built-in support in Amazon Lex for fallback handling with a RAG solution such as LlamaIndex, we can provide a quick path for our customers to get satisfying, curated, and approved answers to FAQs. By applying N-level crawling in our solution, we can allow for answers that span multiple FAQ links and provide deeper answers to our customers’ queries. By following these steps, you can seamlessly incorporate powerful LLM-based Q&A capabilities and efficient URL ingestion into your Amazon Lex chatbot. This results in more accurate, comprehensive, and contextually aware interactions with users.


About the authors

Max Henkel-Wallace is a Software Development Engineer at AWS Lex. He enjoys leveraging technology to maximize customer success. Outside of work, he is passionate about cooking, spending time with friends, and backpacking.

Song Feng is a Senior Applied Scientist at AWS AI Labs, specializing in Natural Language Processing and Artificial Intelligence. Her research explores various aspects of these fields including document-grounded dialogue modeling, reasoning for task-oriented dialogues, and interactive text generation using multimodal data.

John Baker is a Principal SDE at AWS where he works on Natural Language Processing, Large Language Models and other ML/AI related projects. He has been with Amazon for 9+ years and has worked across AWS, Alexa and Amazon.com. In his spare time, John enjoys skiing and other outdoor activities throughout the Pacific Northwest.