AWS Marketplace

Transform enterprise search and knowledge discovery with Glean and Amazon Bedrock

Enterprise search and knowledge discovery are critical to the success of any business and become particularly challenging as businesses grow. Enterprise data may spread across a wide range of sources. These data sources frequently include chat applications like Slack or Microsoft Teams, storage solutions such as Google Drive or Microsoft OneDrive, and enterprise solutions like Atlassian Jira, Confluence, SalesForce, or WorkDay. Increasingly, businesses have the need to unlock and gain insights not just from a single source, but tie all these sources together.

Creating an enterprise-ready search and knowledge discovery experience for your employees requires a robust solution that will provide precise, permissions-aware, and personalized answers. When implementing such solutions, organizations often encounter challenges in establishing connections to diverse enterprise data sources, as well as efficiently indexing and organizing this data for search, ranking, and knowledge discovery.

In this blog post, we introduce you to Glean – an enterprise-ready search and knowledge discovery solution that’s tailor-made for the enterprise workplace. Glean has been adopted by leading enterprise customers, including Databricks, Okta, and Grammarly, to solve their internal search and knowledge discovery needs. Now available in AWS Marketplace, Glean uses powerful large language models (LLMs) hosted by Amazon Bedrock to deliver generative AI solutions to the millions of customers building on AWS.

Solution overview

Glean is easy to set up and connects to over 100 different data sources. You download an AWS CloudFormation template from AWS Marketplace listing. The template will set up the bootstrap environment in your AWS account that allows Glean Central to have very limited access to your AWS account. Glean Central is a service orchestrating the deployment of the Glean instance in each customer’s AWS account.

After the template is installed, Glean Central will invoke an AWS Lambda function that is used to set up all of the resources needed by Glean. Glean Central will trigger this AWS Lambda function to launch AWS CodeBuild projects that set up additional resources in your AWS account. The CodeBuild projects will launch an Amazon Elastic Kubernetes Service (Amazon EKS) cluster for the Glean server, as well as Internet Gateway, Network Address Translation Gateway (NAT Gateway), AWS Web Application Firewall (WAF), Elastic Load Balancing (ELB), Amazon Relational Database Service (RDS), and Amazon S3. The CodeBuild projects will also fetch and deploy container images to the EKS cluster.

The following reference architecture illustrates how these AWS services work together with Glean.

Glean delivers enterprise search through vector embeddings and the knowledge graph stored in the search index and Amazon RDS databases. Crawlers and connectors enable Glean to connect to a diverse set of data sources that house your organization’s data and knowledge. Once a data source is added to Glean, the crawlers and connectors pull the data and store it in Amazon RDS and Amazon S3 that are launched in your own account. The data is encrypted in transit and at rest. The data is then converted into vector embeddings by models trained on Amazon SageMaker using an Apache Flink data processing pipeline.

When you send a search query to Glean, the query engine breaks down the query and searches the search index. Once receiving the search results, the query engine ranks and returns the search results. When you query Glean Chat, the query engine further sends the search results along with the query to the LLM on Amazon Bedrock, and returns the answer. Glean Chat is Glean’s generative AI solution, which will be discussed in detail later in this post.

Glean’s knowledge graph

Glean’s knowledge graph understands all the contents, people, and activity in your organization. The following diagram shows an example of Glean’s knowledge graph personalized for a user in order to deliver maximum relevancy. Through the personalized knowledge graph, every search result can be naturally tuned for relevancy. It considers factors such as the popularity of certain files among users in certain teams, recency of creation or alteration, along with user behavior patterns and signals to help reduce your chances of repeated searches and queries.

Glean Chat

Glean Chat is Glean’s enterprise-ready generative AI solution – a Retrieval Augmented Generation (RAG) chatbot that integrates Glean’s core search function with LLMs on Amazon Bedrock. RAG reduces the risk of hallucinations and ensures answers generated by AI are updated and grounded in your company’s data. You can generate actionable insights, new contents, and precise answers for everyday work without concern for the legitimacy and accuracy of the results.

The following screenshot shows a Q&A example of Glean Chat. When asked, “What’s the status of the citations project in Glean Chat?” Glean searched the latest and most relevant data from Slack conversations, as well as Pull Requests, or code changes, from GitHub.

Solution walkthrough: Transform enterprise search and knowledge discovery with Glean and Amazon Bedrock

Prerequisites

You need an empty AWS account to install the CloudFormation bootstrap template to set up Glean.

You need to contact Glean sales through AWS Marketplace. Once agreed on pricing, you download and install Glean’s CloudFormation bootstrap template. Glean Central will then deploy all necessary infrastructure for the system. You will receive an email from Glean when your instance is ready to use.

Sign in to Glean

After completing the deployment, you sign in to the Glean application and start to configure your Glean workspace. A usual first step would be to connect to your preferred single sign-on (SSO) provider if you use one.

Connect Glean to a data source

You now are able to connect to over 100 enterprise applications without any engineering help or requirements. We show you how to connect Glean to your organization’s Slack instance.

Glean requires authentication to the Slack instance in order to fetch relevant information from Slack. Follow the authorized link in the workspace setup and follow the on-screen instructions. The following screenshot displays the page for setup and authentication.

Using Glean Chat

We show how Glean Chat helps employees with different roles in their everyday work. With related references and regularly updated data sources, employees can work without worrying about the legitimacy and accuracy of the generated results.

The following screenshot shows the conversation between an engineer and Glean Chat. In response to the engineer’s question, “What’s the status of this week’s backend release?” Glean crafts an answer based on the appropriate Slack conversations, a Google Doc of release schedules, GitHub pull requests, and an Atlassian Confluence page on the release process. It includes hyperlinks to its sources.

The following screenshot shows another example. Someone from your customer support team asks Glean Chat to write up a bug report for a new issue. Glean quickly generates a bug report that pulls in discussions on the issue from the support team’s Slack channel and a ZenDesk ticket that is created during a live call. Again, it includes hyperlinks to its references.

In our last example use case, picture a sales manager asking Glean Chat about the latest status of a large sales deal. Glean generates a quick summary showing where we are with that account and the key blockers that are holding it up.

Clean up

You can go to the AWS CloudFormation console and delete the stack created by Glean. Note this stack only comprises the bootstrapping resources needed by Glean Central to set up the account and deploy software upgrades. You can delete the other resources created by Glean (e.g. Amazon EKS cluster, VPC resources, etc.) as needed. Consider backing up your data in Amazon RDS and Amazon S3 if necessary.

Responsible and dependable AI

Glean utilizes Amazon Bedrock, a fully managed service that enables AWS customers to easily build generative AI applications by accessing foundation models through API calls. This reduces concerns about provisioning and maintaining infrastructure. Amazon Bedrock allows you to easily customize models with fine-tuning, native support, or RAG, while maintaining the privacy and security of your data both in transit and at rest.

Furthermore, Amazon Bedrock aligns with Glean’s continued commitment to responsible AI–enabling users worldwide to take advantage of the best of artificial intelligence while curbing hallucinations and ensuring strict data security.

Conclusion

Glean and Amazon Bedrock provides a simple way to integrate responsible and dependable generative AI into the workplace. Glean’s out-of-the-box capabilities can help your teams improve communication and productivity in the digital workplace by ensuring employees have all the right information–at all times, across all applications.

To get started with Glean, visit AWS Marketplace. To learn more about how Glean operates and is implemented throughout your organization, refer to our eBook, How Glean search works.

About the authors

Arvind Jain

Arvind Jain is Glean’s CEO. He founded Glean to make it easy for people to find the information they need to be more productive and happier at work. Prior to Glean, Arvind co-founded Rubrik, one of the fastest growing companies in cloud data management, and served as a Distinguished Engineer at Google, where he spent over a decade leading various teams in Search, Maps, and YouTube.

Connor Lafferty

Connor is a software engineer at Glean focusing on platform infrastructure.

Stephen Chu

Stephen is an engineer manager working on infrastructure at Glean.

Peter Kim

Peter is the content marketing manager at Glean.

Qiong Zhang

Qiong (Jo) Zhang is a Senior Partner Solutions Architect at Amazon Web Services (AWS), specializing in AI/ML. She received her Ph. D. Degree in Computer Science from The University of Texas at Dallas. She holds 30+ patents and has co-authored 100+ journal/conference papers. She is also the recipient of the Best Paper Award at IEEE NetSoft 2016, IEEE ICC 2011, ONDM 2010, and IEEE GLOBECOM 2005.

Cole Calistra

Cole Calistra is a Principal Startup Solutions Architect at Amazon Web Services (AWS), leveraging over two decades of experience as a tech leader and entrepreneur. He regularly partners with AI/ML startups to help them build, deploy and scale innovative solutions on AWS, drawing on his expertise in cloud architecture and artificial intelligence. Prior to AWS, Cole has served as CTO of a venture-backed health startup, CTO and co-founder of an AI/ML startup, and held Director level Architecture roles at Fortune 500 retailers.

Shane Thompson

Shane Thompson is a Sr. Machine Learning Solutions architect working in the Generative AI Startups team at Amazon Web Services (AWS). He specializes in leveraging AI and ML to drive innovation and develop solutions on AWS. With over 19 years of experience in the technology field he brings a diverse perspective to his role, working on problems ranging from designing complex systems for SaaS organizations to deploying ML algorithms to optimize micro mobility scooters in cities. In his free time, Shane loves to spend time with his family and travel around the world.