AWS Machine Learning Blog

Accenture creates a custom memory-persistent conversational user experience using Amazon Q Business

Traditionally, finding relevant information from documents has been a time-consuming and often frustrating process. Manually sifting through pages upon pages of text, searching for specific details, and synthesizing the information into coherent summaries can be a daunting task. This inefficiency not only hinders productivity but also increases the risk of overlooking critical insights buried within the document’s depths.

Imagine a scenario where a call center agent needs to quickly analyze multiple documents to provide summaries for clients. Previously, this process would involve painstakingly navigating through each document, a task that is both time-consuming and prone to human error.

With the advent of chatbots in the conversational artificial intelligence (AI) domain, you can now upload your documents through an intuitive interface and initiate a conversation by asking specific questions related to your inquiries. The chatbot then analyzes the uploaded documents, using advanced natural language processing (NLP) and machine learning (ML) technologies to provide comprehensive summaries tailored to your questions.

However, the true power lies in the chatbot’s ability to preserve context throughout the conversation. As you navigate through the discussion, the chatbot should maintain a memory of previous interactions, allowing you to review past discussions and retrieve specific details as needed. This seamless experience makes sure you can effortlessly explore the depths of your documents without losing track of the conversation’s flow.

Amazon Q Business is a generative AI-powered assistant that can answer questions, provide summaries, generate content, and securely complete tasks based on data and information in your enterprise systems. It empowers employees to be more creative, data-driven, efficient, prepared, and productive.

This post demonstrates how Accenture used Amazon Q Business to implement a chatbot application that offers straightforward attachment and conversation ID management. This solution can speed up your development workflow, and you can use it without crowding your application code.

“Amazon Q Business distinguishes itself by delivering personalized AI assistance through seamless integration with diverse data sources. It offers accurate, context-specific responses, contrasting with foundation models that typically require complex setup for similar levels of personalization. Amazon Q Business real-time, tailored solutions drive enhanced decision-making and operational efficiency in enterprise settings, making it superior for immediate, actionable insights”

– Dominik Juran, Cloud Architect, Accenture

Solution overview

In this use case, an insurance provider uses a Retrieval Augmented Generation (RAG) based large language model (LLM) implementation to upload and compare policy documents efficiently. Policy documents are preprocessed and stored, allowing the system to retrieve relevant sections based on input queries. This enhances the accuracy, transparency, and speed of policy comparison, making sure clients receive the best coverage options.

This solution augments an Amazon Q Business application with persistent memory and context tracking throughout conversations. As users pose follow-up questions, Amazon Q Business can continually refine responses while recalling previous interactions. This preserves conversational flow when navigating in-depth inquiries.

At the core of this use case lies the creation of a custom Python class for Amazon Q Business, which streamlines the development workflow for this solution. This class offers robust document management capabilities, keeping track of attachments already shared within a conversation as well as new uploads to the Streamlit application. Additionally, it maintains an internal state to persist conversation IDs for future interactions, providing a seamless user experience.

The solution involves developing a web application using Streamlit, Python, and AWS services, featuring a chat interface where users can interact with an AI assistant to ask questions or upload PDF documents for analysis. Behind the scenes, the application uses Amazon Q Business for conversation history management, vectorizing the knowledge base, context creation, and NLP. The integration of these technologies allows for seamless communication between the user and the AI assistant, enabling tasks such as document summarization, question answering, and comparison of multiple documents based on the documents attached in real time.

The code uses Amazon Q Business APIs to interact with Amazon Q Business and send and receive messages within a conversation, specifically the qbusiness client from the boto3 library.

In this use case, we used the German language to test our RAG LLM implementation on 10 different documents and 10 different use cases. Policy documents were preprocessed and stored, enabling accurate retrieval of relevant sections based on input queries. This testing demonstrated the system’s accuracy and effectiveness in handling German language policy comparisons.

The following is a code snippet:

import boto3
import json
from botocore.exceptions import ClientError
from os import environ

class AmazonQHandler:
    def __init__(self, application_id, user_id, conversation_id, system_message_id):
        self.application_id = application_id
        self.user_id = user_id
        self.qbusiness = boto3.client('qbusiness')
        self.prompt_engineering_instruction = "Ansage: Auf Deutsch, und nur mit den nötigsten Wörter ohne ganze Sätze antworten, bitte"
        self.parent_message_id = system_message_id
        self.conversation_id = conversation_id

    def process_message(self, initial_message, input_text):
        print('Please ask as many questions as you want. At the end of the session write exit\n')
        message = f'{self.prompt_engineering_instruction}: {input_text}'
        return message


def send_message(self, input_text, uploaded_file_names=[]):
        attachments = []
        message = f'{self.prompt_engineering_instruction}: {input_text}'
        if len(uploaded_file_names) > 0:
            for file_name in uploaded_file_names:
                in_file = open(file_name, "rb")
                data =
                    'data': data,
                    'name': file_name

        if self.conversation_id:
            print("we are in if part of send_message")
            if len(attachments) > 0:
                resp = self.qbusiness.chat_sync(
                resp = self.qbusiness.chat_sync(
            if len(attachments) > 0:
                resp = self.qbusiness.chat_sync(
                resp = self.qbusiness.chat_sync(
            self.conversation_id = resp.get("conversationId")

        print(f'Amazon Q: "{resp.get("systemMessage")}"\n')
        self.parent_message_id = resp.get("systemMessageId")
        return resp.get("systemMessage")

if __name__ == '__main__':
    application_id = environ.get("APPLICATION_ID", "a392f5e9-50ed-4f93-bcad-6f8a26a8212d")
    user_id = environ.get("USER_ID", "AmazonQ-Administrator")

    amazon_q_handler = AmazonQHandler(application_id, user_id)

The architectural flow of this solution is shown in the following diagram.

Q business

The workflow consists of the following steps:

  1. The LLM wrapper application code is containerized using AWS CodePipeline, a fully managed continuous delivery service that automates the build, test, and deploy phases of the software release process.
  2. The application is deployed to Amazon Elastic Container Service (Amazon ECS), a highly scalable and reliable container orchestration service that provides optimal resource utilization and high availability. Because we were making the calls from a Flask-based ECS task running Streamlit to Amazon Q Business, we used Amazon Cognito user pools rather than AWS IAM Identity Center to authenticate users for simplicity, and we hadn’t experimented with IAM Identity Center on Amazon Q Business at the time. For instructions to set up IAM Identity Center integration with Amazon Q Business, refer to Setting up Amazon Q Business with IAM Identity Center as identity provider.
  3. Users authenticate through an Amazon Cognito UI, a secure user directory that scales to millions of users and integrates with various identity providers.
  4. A Streamlit application running on Amazon ECS receives the authenticated user’s request.
  5. An instance of the custom AmazonQ class is initiated. If an ongoing Amazon Q Business conversation is present, the correct conversation ID is persisted, providing continuity. If no existing conversation is found, a new conversation is initiated.
  6. Documents attached to the Streamlit state are passed to the instance of the AmazonQ class, which keeps track of the delta between the documents already attached to the conversation ID and the documents yet to be shared. This approach respects and optimizes the five-attachment limit imposed by Amazon Q Business. To simplify and avoid repetitions in the middleware library code we are maintaining on the Streamlit application, we decided to write a custom wrapper class for the Amazon Q Business calls, which keeps the attachment and conversation history management in itself as class variables (as opposed to state-based management on the Streamlit level).
  7. Our wrapper Python class encapsulating the Amazon Q Business instance parses and returns the answers based on the conversation ID and the dynamically provided context derived from the user’s question.
  8. Amazon ECS serves the answer to the authenticated user, providing a secure and scalable delivery of the response.


This solution has the following prerequisites:

  • You must have an AWS account where you will be able to create access keys and configure services like Amazon Simple Storage Service (Amazon S3) and Amazon Q Business
  • Python must be installed on the environment, as well as all the necessary libraries such as boto3
  • It is assumed that you have Streamlit library installed for Python, along with all the necessary settings

Deploy the solution

The deployment process entails provisioning the required AWS infrastructure, configuring environment variables, and deploying the application code. This is accomplished by using AWS services such as CodePipeline and Amazon ECS for container orchestration and Amazon Q Business for NLP.

Additionally, Amazon Cognito is integrated with Amazon ECS using the AWS Cloud Development Kit (AWS CDK) and user pools are used for user authentication and management. After deployment, you can access the application through a web browser. Amazon Q Business is called from the ECS task. It is crucial to establish proper access permissions and security measures to safeguard user data and uphold the application’s integrity.

We use AWS CDK to deploy a web application using Amazon ECS with AWS Fargate, Amazon Cognito for user authentication, and AWS Certificate Manager for SSL/TLS certificates.

To deploy the infrastructure, run the following commands:

  • npm install to install dependencies
  • npm run build to build the TypeScript code
  • npx cdk synth to synthesize the AWS CloudFormation template
  • npx cdk deploy to deploy the infrastructure

The following screenshot shows our deployed CloudFormation stack.

UI demonstration

The following screenshot shows the home page when a user opens the application in a web browser.

The following screenshot shows an example response from Amazon Q Business when no file was uploaded and no relevant answer to the question was found.

The following screenshot illustrates the entire application flow, where the user asked a question before a file was uploaded, then uploaded a file, and asked the same question again. The response from Amazon Q Business after uploading the file is different from the first query (for testing purposes, we used a very simple file with randomly generated text in PDF format).

Solution benefits

This solution offers the following benefits:

  • Efficiency – Automation enhances productivity by streamlining document analysis, saving time, and optimizing resources
  • Accuracy – Advanced techniques provide precise data extraction and interpretation, reducing errors and improving reliability
  • User-friendly experience – The intuitive interface and conversational design make it accessible to all users, encouraging adoption and straightforward integration into workflows

This containerized architecture allows the solution to scale seamlessly while optimizing request throughput. Persisting the conversation state enhances precision by continuously expanding dialog context. Overall, this solution can help you balance performance with the fidelity of a persistent, context-aware AI assistant through Amazon Q Business.

Clean up

After deployment, you should implement a thorough cleanup plan to maintain efficient resource management and mitigate unnecessary costs, particularly concerning the AWS services used in the deployment process. This plan should include the following key steps:

  • Delete AWS resources – Identify and delete any unused AWS resources, such as EC2 instances, ECS clusters, and other infrastructure provisioned for the application deployment. This can be achieved through the AWS Management Console or AWS Command Line Interface (AWS CLI).
  • Delete CodeCommit repositories – Remove any CodeCommit repositories created for storing the application’s source code. This helps declutter the repository list and prevents additional charges for unused repositories.
  • Review and adjust CodePipeline configuration – Review the configuration of CodePipeline and make sure there are no active pipelines associated with the deployed application. If pipelines are no longer required, consider deleting them to prevent unnecessary runs and associated costs.
  • Evaluate Amazon Cognito user pools – Evaluate the user pools configured in Amazon Cognito and remove any unnecessary pools or configurations. Adjust the settings to optimize costs and adhere to the application’s user management requirements.

By diligently implementing these cleanup procedures, you can effectively minimize expenses, optimize resource usage, and maintain a tidy environment for future development iterations or deployments. Additionally, regular review and adjustment of AWS services and configurations is recommended to provide ongoing cost-effectiveness and operational efficiency.

If the solution runs in AWS Amplify or is provisioned by the AWS CDK, you don’t need to take care of removing everything described in this section; deleting the Amplify application or AWS CDK stack is enough to get rid all of the resources associated with the application.


In this post, we showcased how Accenture created a custom memory-persistent conversational assistant using AWS generative AI services. The solution can cater to clients developing end-to-end conversational persistent chatbot applications at a large scale following the provided architectural practices and guidelines.

The joint effort between Accenture and AWS builds on the 15-year strategic relationship between the companies and uses the same proven mechanisms and accelerators built by the Accenture AWS Business Group (AABG). Connect with the AABG team at to drive business outcomes by transforming to an intelligent data enterprise on AWS.

For further information about generative AI on AWS using Amazon Bedrock or Amazon Q Business, we recommend the following resources:

You can also sign up for the AWS generative AI newsletter, which includes educational resources, post posts, and service updates.

About the Authors

Dominik Juran works as a full stack developer at Accenture with a focus on AWS technologies and AI. He also has a passion for ice hockey.

Milica Bozic works as Cloud Engineer at Accenture, specializing in AWS Cloud solutions for the specific needs of clients with background in telecommunications, particularly 4G and 5G technologies. Mili is passionate about art, books, and movement training, finding inspiration in creative expression and physical activity.

Zdenko Estok works as a cloud architect and DevOps engineer at Accenture. He works with AABG to develop and implement innovative cloud solutions, and specializes in infrastructure as code and cloud security. Zdenko likes to bike to the office and enjoys pleasant walks in nature.

Selimcan “Can” Sakar is a cloud first developer and solution architect at Accenture with a focus on artificial intelligence and a passion for watching models converge.

Shikhar Kwatra is a Sr. AI/ML Specialist Solutions Architect at Amazon Web Services, working with leading Global System Integrators. He has earned the title of one of the Youngest Indian Master Inventors with over 500 patents in the AI/ML and IoT domains. Shikhar aids in architecting, building, and maintaining cost-efficient, scalable cloud environments for the organization, and supports the GSI partner in building strategic industry solutions on AWS. Shikhar enjoys playing guitar, composing music, and practicing mindfulness in his spare time.