AWS Machine Learning Blog
Use Amazon Q to find answers on Google Drive in an enterprise
Amazon Q Business is a generative AI-powered assistant designed to enhance enterprise operations. It’s a fully managed service that provides accurate answers to users’ questions while adhering to the security and access restrictions of the content. You can tailor Amazon Q Business to your specific business needs by connecting it to your company’s information and enterprise systems using built-in connectors for a variety of enterprise data sources. Through a web interface, users in various roles, such as marketing managers, project managers, and sales representatives, can have tailored conversations, solve business problems, generate content, take action, and more. The service aims to help employees work smarter, move faster, and drive significant impact by providing immediate and relevant information for their tasks.
One such enterprise data repository you can use to store and manage content is Google Drive. Google Drive is a cloud-based storage service that provides a centralized location for storing digital assets, including documents, knowledge articles, and spreadsheets. This service helps your teams collaborate effectively by enabling the sharing and organization of important files across the enterprise. To use Google Drive within Amazon Q Business, you can configure the Amazon Q Business Google Drive connector. This connector allows Amazon Q Business to securely index files stored in Google Drive using access control lists (ACLs). These ACLs make sure that users only access the documents they’re permitted to view, allowing them to ask questions and retrieve information relevant to their work directly through Amazon Q Business.
This post covers the steps to configure the Amazon Q Business Google Drive connector, including authentication setup and verifying the secure indexing of your Google Drive content.
Index Google Drive documents using the Amazon Q Google Drive connector
The Amazon Q Google Drive connector can index Google Drive documents hosted in a Google Workspace account. The connector can’t index documents stored on Google Drive in a personal Gmail account. Amazon Q Business can authenticate with your Google Workspace using either a service account or OAuth 2.0 authentication. A service account enables indexing files for user accounts across an enterprise in a Google Workspace, whereas OAuth 2.0 authentication allows for crawling and indexing files in a single Google Workspace account. This post shows you how to configure Amazon Q Business to authenticate using a Google service account.
Google prescribes that to index multiple users’ documents, the crawler must be able to authenticate with a service account that has domain-wide delegation. This allows the connector to index the documents of all users in your drive and shared drives. Amazon Q Business connectors crawl and index only the content that the Amazon Q Business application administrator explicitly configures; administrators can specify the paths to crawl, specific file name patterns, or file types. Amazon Q Business doesn’t use customer data to train any models, and all customer data is indexed only in the customer account.
You can configure the Amazon Q Google Drive connector to crawl and index file types supported by Amazon Q Business. During the crawling phase, Google Docs documents are exported as Microsoft Word files and Google Sheets documents are exported as Microsoft Excel files.
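The export behavior described above can be pictured as a lookup from Google-native MIME types to their Microsoft Office equivalents. The following is a minimal sketch, not the connector's actual implementation; the function name `export_mime_type` is hypothetical, while the MIME type strings are the standard Google Drive and Office Open XML identifiers.

```python
# Standard MIME types for Google-native files and their Office Open XML counterparts.
GOOGLE_DOC = "application/vnd.google-apps.document"
GOOGLE_SHEET = "application/vnd.google-apps.spreadsheet"

EXPORT_FORMATS = {
    GOOGLE_DOC: "application/vnd.openxmlformats-officedocument.wordprocessingml.document",
    GOOGLE_SHEET: "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
}


def export_mime_type(source_mime_type: str) -> str:
    """Return the format a file would be exported as (hypothetical helper).

    Google Docs map to Word and Google Sheets map to Excel; any other
    type (for example, a PDF) is passed through unchanged.
    """
    return EXPORT_FORMATS.get(source_mime_type, source_mime_type)
```

For example, `export_mime_type(GOOGLE_SHEET)` returns the Excel MIME type, and a PDF's MIME type comes back as is.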
Metadata
Every document has structural attributes—or metadata—attached to it. Document attributes can include information such as document title, document author, time created, time updated, and document type.
When you connect Amazon Q Business to a data source, it automatically maps specific data source document attributes to fields within an Amazon Q Business index. If a document attribute in your data source doesn’t have an attribute mapping already available, or if you want to map additional document attributes to index fields, you can use the custom field mappings to specify how a data source attribute maps to an Amazon Q Business index field. You can create field mappings by editing your data source after your application and retriever are created.
There are four default metadata attributes indexed for each Google Drive document: authors, source URL, creation date, and last update date. You can also select additional reserved data field mappings.
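To make the idea of field mappings concrete, the following sketch shows how data source attributes could be translated to index fields. It is illustrative only: the data source attribute names (`authors`, `sourceUrl`, `createdAt`, `lastUpdatedAt`) and the helper `map_attributes` are assumptions for this example, and the target names follow the reserved-field naming style (`_authors`, `_source_uri`, and so on). Check the connector documentation for the exact names before relying on them.

```python
# Assumed default mappings: data source attribute -> index field.
DEFAULT_FIELD_MAPPINGS = {
    "authors": "_authors",
    "sourceUrl": "_source_uri",
    "createdAt": "_created_at",
    "lastUpdatedAt": "_last_updated_at",
}


def map_attributes(document_attributes, custom_mappings=None):
    """Translate data source attributes into index fields (hypothetical helper).

    Attributes without a default or custom mapping are dropped, which mirrors
    the idea that unmapped attributes are not indexed.
    """
    mappings = {**DEFAULT_FIELD_MAPPINGS, **(custom_mappings or {})}
    return {
        mappings[name]: value
        for name, value in document_attributes.items()
        if name in mappings
    }
```

Passing `custom_mappings={"department": "department"}` would carry a hypothetical `department` attribute into the index alongside the four defaults.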
Amazon Q Business crawls Google Drive ACLs defined in a Google Workspace for document security. Google Workspace users and groups are mapped to the _user_id and _group_ids fields associated with the Amazon Q Business application in AWS IAM Identity Center. These user and group associations are persisted in the user store associated with the Amazon Q Business index created for crawled Google Drive documents.
Overview of ACLs in Amazon Q Business
In the context of knowledge management and generative AI chatbot applications, ACLs play a crucial role in managing who can access information and what actions they can perform within the system. They also facilitate knowledge sharing within specific groups or teams while restricting access to others.
In this solution, we deploy an Amazon Q web experience to demonstrate that two business users can only ask questions about documents they have access to according to the ACL. With the Amazon Q Business Google Drive connector, the Google Workspace ACL will be ingested with documents. This enables Amazon Q Business to control the scope of documents that each user can access in the Amazon Q web experience.
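The effect of ACL-based filtering can be sketched in a few lines. This is a conceptual model under stated assumptions, not how Amazon Q Business is implemented: the `IndexedDocument` class, the `visible_documents` function, and the example user and group identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class IndexedDocument:
    """A document plus the users and groups its ACL grants access to."""
    title: str
    allowed_users: frozenset = frozenset()
    allowed_groups: frozenset = frozenset()


def visible_documents(documents, user_id, group_ids):
    """Return titles the user may see, either directly or through a group."""
    groups = set(group_ids)
    return [
        doc.title
        for doc in documents
        if user_id in doc.allowed_users or groups & doc.allowed_groups
    ]


# Hypothetical index: user2 owns two reports and shares one with group1.
INDEX = [
    IndexedDocument("2020 annual report", frozenset({"user1@example.com"})),
    IndexedDocument("2021 annual report", frozenset({"user2@example.com"}),
                    frozenset({"group1@example.com"})),
    IndexedDocument("2022 annual report", frozenset({"user2@example.com"})),
]
```

A user in `group1@example.com` sees the shared 2021 report in addition to their own files, while documents owned only by another user stay out of scope for their questions.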
Authentication types
An Amazon Q Business application requires you to use IAM Identity Center to manage user access. Although it’s recommended to have an IAM Identity Center instance configured (with users federated and groups added) before you start, you can also choose to create and configure an IAM Identity Center instance for your Amazon Q Business application using the Amazon Q console.
You can also add users to your IAM Identity Center instance from the Amazon Q Business console, if you aren’t federating identity. When you add a new user, make sure that the user is enabled in your IAM Identity Center instance and that they have verified their email ID. They need to complete these steps before they can log in to your Amazon Q Business web experience.
Your identity source in IAM Identity Center defines where your users and groups are managed. After you configure your identity source, you can look up users or groups to grant them single sign-on access to AWS accounts, applications, or both.
You can have only one identity source per organization in AWS Organizations. You can choose one of the following as your identity source:
- IAM Identity Center directory – When you enable IAM Identity Center for the first time, it’s automatically configured with an IAM Identity Center directory as your default identity source. This is where you create your users and groups, and assign their level of access to your AWS accounts and applications. For more details, see Manage identities in IAM Identity Center.
- Active Directory – Choose this option if you want to continue managing users in either your AWS Managed Microsoft AD directory using AWS Directory Service or your self-managed directory in Active Directory (AD).
- External identity provider – Choose this option if you want to manage users in other external identity providers (IdPs) through the SAML 2.0 standard, such as Okta.
- IAM identity provider – Amazon Q Business applications can now federate with an enterprise’s IAM IdP. For more information, refer to Build private and secure enterprise generative AI applications with Amazon Q Business using IAM Federation.
Overview of solution
With Amazon Q Business, you can configure multiple data sources to provide a central place to search across your document repository. For our solution, we demonstrate how to index Google Drive data using the Amazon Q Business Google Drive connector. We complete the following steps:
- Configure Google Workspace prerequisites.
- Configure an Amazon Q Business application.
- Connect Google Drive to Amazon Q Business.
- Create users and index the data in the Google Drive.
- Run a sample query to test the solution.
Configure Google Workspace prerequisites
For this solution, Amazon Q will connect to a Google Workspace and crawl Google Drive documents owned by business users in different groups using a service account. Complete the following steps to configure your Google Workspace:
- Log in to the Google API console as an admin user.
- Choose the dropdown menu next to the search box, then choose New Project.
- Enter the project name, choose the Google organization, and choose Create.
The Google Drive and Admin SDK APIs need to be enabled for Amazon Q to crawl Google Drive files.
- Search for each API on the Google Cloud console and choose Enable.
- Search for Service Accounts to access the IAM & Admin navigation pane and choose Create Service Account.
- Enter the service account name, service account ID, and description, and choose Done.
- Choose the email of the service account created in the previous step.
- On the Keys tab, choose Add Key, then choose Create New Key.
- For Key type, select JSON, and choose Create to download and locally save a new private key.
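Before using the downloaded key, it can help to confirm that the file is a well-formed service account key. The following is a minimal sketch; the helper name `load_service_account_key` is hypothetical, and the field names checked (`type`, `client_email`, `client_id`, `private_key`) are the ones Google includes in service account JSON key files.

```python
import json

# Fields this walkthrough relies on later (client ID for delegation,
# client email and private key for the connector secret).
REQUIRED_KEY_FIELDS = ("type", "client_email", "client_id", "private_key")


def load_service_account_key(raw_json: str) -> dict:
    """Parse a service account key and fail early if it looks incomplete."""
    key = json.loads(raw_json)
    missing = [name for name in REQUIRED_KEY_FIELDS if not key.get(name)]
    if missing:
        raise ValueError(f"Key file is missing fields: {', '.join(missing)}")
    if key["type"] != "service_account":
        raise ValueError(f"Unexpected key type: {key['type']!r}")
    return key
```

You could call it with the file contents, for example `load_service_account_key(open("key.json").read())`, where `key.json` is a placeholder path for the file you saved.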
Now we enable domain-wide delegation for the five required API scopes on the Domain-wide Delegation page.
- Choose Add new.
- Add the following comma-delimited API scopes for the client ID of the service account whose private key you created in the previous step:
https://www.googleapis.com/auth/drive.readonly,
https://www.googleapis.com/auth/drive.metadata.readonly,
https://www.googleapis.com/auth/admin.directory.group.readonly,
https://www.googleapis.com/auth/admin.directory.user.readonly,
https://www.googleapis.com/auth/cloud-platform
- Choose Authorize.
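Because the scopes must be entered as a single comma-delimited value, it can be convenient to assemble that string programmatically and avoid stray whitespace or a trailing comma. The following sketch uses the five scopes listed above; the function name `delegation_scope_string` is hypothetical.

```python
# The five API scopes required for domain-wide delegation in this walkthrough.
REQUIRED_SCOPES = (
    "https://www.googleapis.com/auth/drive.readonly",
    "https://www.googleapis.com/auth/drive.metadata.readonly",
    "https://www.googleapis.com/auth/admin.directory.group.readonly",
    "https://www.googleapis.com/auth/admin.directory.user.readonly",
    "https://www.googleapis.com/auth/cloud-platform",
)


def delegation_scope_string(scopes=REQUIRED_SCOPES) -> str:
    """Join scopes into one comma-delimited string with no spaces."""
    return ",".join(scope.strip() for scope in scopes)
```

The resulting string can be pasted into the OAuth scopes field on the Domain-wide Delegation page.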
Now we create users and add them to groups.
- Navigate to the Google Workspace Admin console and choose Users in the navigation pane.
- Choose Add new user to create two new business users.
- Choose Groups in the navigation pane.
- Choose Create group to create two Google groups and add one business user to each group.
- Upload files that Amazon Q supports into each business user’s Google Drive.
In this solution, we upload the Amazon 2020 annual report to the first business user’s Google Drive and upload the Amazon 2021 annual report and Amazon 2022 annual report to the second business user’s Google Drive.
The business user that uploaded the Amazon 2021 annual report can also share it with the other business user’s Google group.
- Choose the options menu (three vertical dots) for the Google Drive file and choose Share.
- Enter the name of the other Google group and choose Send.
Create an Amazon Q Business application with a Google Drive connector
An Amazon Q Business application needs to be created with a Google Drive connector to crawl and index Google Drive files. To create an Amazon Q application, complete the following steps:
- On the Amazon Q console, choose Applications in the navigation pane.
- Choose Create application.
- For Application name, enter a name.
- Leave application configuration settings as defaults.
- Choose Create.
- After the application is created, choose Data Sources.
- Then choose Select retriever and Confirm to use a Native retriever and Enterprise provisioning.
- After confirming retriever settings, choose Add data source, and then choose the plus sign next to Google Drive.
- Under Name and description, enter a data source name and optional description.
- Under Authentication, select Google service account and choose Create a new secret from the AWS Secrets Manager secret drop down to create an AWS Secrets Manager secret.
- Enter a secret name, admin account email, client email, and the JSON key you downloaded earlier, then choose Save.
- Under IAM role, choose Create a new service role.
- Under Additional Configuration, choose User email, and add the two recently created Google Workspace business user email addresses.
- Under Sync run schedule, for Frequency, choose Run on demand.
- Choose Add data source.
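If you prefer to prepare the secret outside the console, the values entered in the steps above can be assembled from the downloaded key file. This is a sketch under assumptions: the secret field names `adminAccountEmail`, `clientEmail`, and `privateKey` are assumed for illustration and should be verified against the connector documentation, and `build_connector_secret` is a hypothetical helper.

```python
import json


def build_connector_secret(service_account_key: dict, admin_account_email: str) -> str:
    """Build a JSON secret string from a parsed service account key.

    Assumed secret field names: adminAccountEmail, clientEmail, privateKey.
    """
    secret = {
        "adminAccountEmail": admin_account_email,
        "clientEmail": service_account_key["client_email"],
        "privateKey": service_account_key["private_key"],
    }
    return json.dumps(secret)
```

The returned string could then be stored with the Secrets Manager `create_secret` API (for example, through the `SecretString` parameter in boto3), using a secret name of your choosing.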
Create and manage users
To create an Amazon Q web experience accessible by Google Workspace users, you need to create corresponding users in IAM Identity Center. Amazon Q applications are only accessible by IAM Identity Center users with user identities that own indexed documents. To create the IAM Identity Center users, complete the following steps:
- On the IAM Identity Center console, choose Users in the navigation pane.
- Choose Add user.
- Create IAM Identity Center users that mirror your Google Workspace users by entering the required user information.
- Accept the IAM Identity Center invitation sent through email to each new business user and set each business user’s IAM Identity Center password.
- On the Amazon Q Business console, navigate to the application with the Google Drive data source.
- Choose Manage user access.
- Choose Add groups and users, select Assign existing users and groups, and choose Next.
- Assign users to the Amazon Q application, choose Assign, and choose Confirm if each business user is subscribed to Q Business Pro.
After you add IAM Identity Center users to your Amazon Q application, its web experience URL will appear in the Q Business applications list. You can use the URL to connect to the Amazon Q web experience with either of your Google business users. By default, each user can only ask questions about documents in their Google Drive.
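The console steps for adding users can also be approximated with the IAM Identity Center identity store API. The following sketch only builds the request parameters for the `create_user` operation; the helper name `identity_center_user_request` and the example values are hypothetical, and using the email address as the user name is a choice made here so the identity lines up with the Google Workspace user.

```python
def identity_center_user_request(identity_store_id, email, given_name, family_name):
    """Build keyword arguments for the identity store create_user operation.

    The email doubles as the user name so the IAM Identity Center user
    mirrors the corresponding Google Workspace user.
    """
    return {
        "IdentityStoreId": identity_store_id,
        "UserName": email,
        "DisplayName": f"{given_name} {family_name}",
        "Name": {"GivenName": given_name, "FamilyName": family_name},
        "Emails": [{"Value": email, "Type": "work", "Primary": True}],
    }
```

With boto3 installed and credentials configured, the result could be passed as `boto3.client("identitystore").create_user(**request)`. Users created this way may still need to verify their email and set a password before they can sign in.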
Run sample queries in Amazon Q
To test the Amazon Q application with the Amazon annual reports you uploaded to Google Drive, complete the following steps:
- On the Amazon Q Business console, navigate to the data source you created.
- Run an on-demand sync of the data source by choosing Sync now.
- Navigate to the web experience URL in a new private browser window and log in as the first business user.
- Ask Amazon Q a question, such as how many employees work at Amazon.
The source documents should be the Amazon 2020 and 2021 annual reports, assuming the first business user uploaded the Amazon 2020 annual report and the second business user shared the Amazon 2021 annual report with the first business user.
- Navigate to the web experience URL in a new private browser window and log in as the second business user.
- Ask Amazon Q the same question (how many employees work at Amazon).
The source documents should be the Amazon 2021 and 2022 annual reports.
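The on-demand sync in the preceding steps can also be triggered through the Amazon Q Business API. The following is a sketch, assuming you pass in a boto3 `qbusiness` client and already know the application, index, and data source IDs; the function names `start_sync` and `sync_status` are hypothetical wrappers around the `start_data_source_sync_job` and `list_data_source_sync_jobs` operations.

```python
def start_sync(qbusiness, application_id, index_id, data_source_id):
    """Start a data source sync job and return its execution ID."""
    response = qbusiness.start_data_source_sync_job(
        applicationId=application_id,
        indexId=index_id,
        dataSourceId=data_source_id,
    )
    return response["executionId"]


def sync_status(qbusiness, application_id, index_id, data_source_id, execution_id):
    """Look up the status of one sync job, or None if it is not listed yet."""
    response = qbusiness.list_data_source_sync_jobs(
        applicationId=application_id,
        indexId=index_id,
        dataSourceId=data_source_id,
    )
    for job in response.get("history", []):
        if job.get("executionId") == execution_id:
            return job.get("status")
    return None
```

Taking the client as a parameter keeps the helpers easy to exercise with a stub. In real use you would create it with `boto3.client("qbusiness")` and wait for the sync to finish before asking questions in the web experience.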
Troubleshooting
In this section, we share some common issues and troubleshooting tips.
IAM Identity Center login error
You might receive an error on the IAM Identity Center login page that says “We couldn’t verify your sign-in credentials.”
To troubleshoot, complete the following steps:
- Confirm that the business users that mirror the Google Workspace users were created in IAM Identity Center.
- If the users exist, navigate to the user in IAM Identity Center and choose Reset password, then select Generate a one-time password and share the password with the user.
A password will be provided for login and the user will be asked to change their password after a successful login.
Google Drive data source crawling or indexing failure
If the Google Drive data source crawling or indexing fails, complete the following steps:
- Confirm the business users provisioned in the Google Workspace are members of the Google groups.
- Inspect the Amazon CloudWatch logs for the last time the Google Drive data source was crawled for users with Google Drive files in the Google Workspace.
- If the crawler didn’t successfully log the indexing of an expected user’s files, check the IAM Identity Center users, then compare the attributes in the Secrets Manager secret to the corresponding Google Workspace attributes, including client ID, service account email, and service account private key.
- Use the Amazon Q Business document-level sync reports to confirm the intended Google Drive documents were indexed by Amazon Q.
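The attribute comparison described in the steps above can be partly automated. The following sketch compares a parsed secret against the downloaded service account key and reports which fields differ; `secret_mismatches` is a hypothetical helper, and the secret field names `clientEmail` and `privateKey` are assumptions that should be checked against how your secret is actually stored.

```python
def secret_mismatches(secret: dict, service_account_key: dict) -> list:
    """Return the secret fields that don't match the service account key.

    Values are compared after stripping surrounding whitespace, because a
    pasted private key can easily gain or lose a trailing newline.
    """
    expected = {
        "clientEmail": service_account_key.get("client_email", ""),
        "privateKey": service_account_key.get("private_key", ""),
    }
    return [
        name
        for name, value in expected.items()
        if str(secret.get(name, "")).strip() != str(value).strip()
    ]
```

An empty list suggests the secret and key file agree on these fields; any names returned point to values worth re-entering.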
Google Drive data source crawling and indexing job doesn’t crawl and index documents
If the Google Drive data source crawling and indexing job doesn’t crawl and index any documents, complete the following steps:
- Confirm the business users provisioned in the Google Workspace are members of the Google groups.
- Confirm there are IAM Identity Center users that mirror the Google Workspace users.
- Confirm both IAM Identity Center users subscribe to Q Business Pro.
- Confirm the Google Workspace admin user has enabled the Google Drive API.
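Checking that every Google Workspace user has a mirroring IAM Identity Center user amounts to a set difference on email addresses. The following is a minimal sketch; the function name `missing_identity_center_users` is hypothetical, and it assumes you have already exported both lists of email addresses.

```python
def missing_identity_center_users(workspace_emails, identity_center_emails):
    """Return Workspace emails that have no matching IAM Identity Center user.

    Emails are compared case-insensitively with surrounding whitespace removed.
    """
    known = {email.strip().lower() for email in identity_center_emails}
    candidates = {email.strip().lower() for email in workspace_emails}
    return sorted(candidates - known)
```

Any address returned identifies a user whose documents may be indexed but who can't yet sign in to the web experience.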
Amazon Q web experience doesn’t return expected answers from the expected source
If the Amazon Q web experience doesn’t return expected answers from the expected source, complete the following steps:
- Upload the expected source document into an Amazon Q Business chat session by choosing the paperclip icon in the Amazon Q chat interface and then choosing the file.
After you upload the document into the session, if the expected answers are generated from the uploaded document, this indicates that the document wasn’t successfully indexed from the Google Drive data source.
- If Amazon Q doesn’t return the expected answer for the uploaded document, modify the prompt used to ask the question.
Clean up
To avoid incurring additional costs, clean up the resources you created for this solution. Deleting the Amazon Q application also removes the associated index and data connectors. However, any Secrets Manager secrets created during the Amazon Q application setup must be removed separately.
Complete the following steps to delete the Amazon Q application, secret, and IAM Identity Center users in your AWS account:
- On the Amazon Q Business console, choose Applications in the navigation pane.
- Select the application that you created and on the Actions menu, choose Delete and confirm the deletion.
- On the Secrets Manager console, choose Secrets in the navigation pane.
- Select the secret that was created for the Google Drive connector and on the Actions menu, choose Delete.
- Specify the waiting period as 7 days and choose Schedule deletion.
- On the IAM Identity Center console, choose Users in the navigation pane.
- Select the two users that you created and choose Delete users to remove these users.
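The same cleanup can be scripted. The following sketch assumes you pass in boto3 clients for `qbusiness`, `secretsmanager`, and `identitystore`, and that you know the relevant IDs; the function name `clean_up` is hypothetical. It calls the `delete_application`, `delete_secret`, and `delete_user` operations, using the 7-day recovery window from the steps above.

```python
def clean_up(qbusiness, secretsmanager, identitystore,
             application_id, secret_id, identity_store_id, user_ids):
    """Delete the application, schedule secret deletion, and remove users."""
    # Deleting the application also removes its index and data connectors.
    qbusiness.delete_application(applicationId=application_id)

    # The secret is not removed with the application; schedule it separately.
    secretsmanager.delete_secret(SecretId=secret_id, RecoveryWindowInDays=7)

    # Remove the IAM Identity Center users created for this walkthrough.
    for user_id in user_ids:
        identitystore.delete_user(IdentityStoreId=identity_store_id, UserId=user_id)
```

Because these calls are destructive, double-check the IDs before running anything like this against a real account.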
Additionally, you should remove the business users added to your Google Workspace during the implementation of this solution because Google Workspace costs are billed on a per-user basis.
Conclusion
In this post, you created an Amazon Q application that indexed Google Drive documents using the Google Drive connector. You were able to connect to the Amazon Q conversational interface as each of your business users and ask questions about the documents each user could access in accordance with the ACL.
You can continue to experiment by adding more PDF documents to your business users’ Google Drives and re-syncing your Amazon Q Google Drive data source.
Amazon Q Business offers other connectors, such as for Confluence Cloud. To learn more about the Amazon Q Business Confluence Cloud connector, refer to Connecting Confluence (Cloud) to Amazon Q Business.
About the Authors
Glen Ireland is a Senior Enterprise Account Engineer at AWS in the Worldwide Public Sector. Glen’s areas of focus include empowering customers interested in building generative AI solutions using Amazon Q.
Julia Hu is a Specialist Solutions Architect who helps AWS customers and partners build generative AI solutions using Amazon Q Business on AWS. Julia has over 4 years of experience developing solutions for customers adopting AWS services on the forefront of cloud technology.