AWS Cloud Operations Blog
Using Amazon Q Business to streamline your operations
Amazon Q is a new generative artificial intelligence (AI)-powered assistant designed for work that can be tailored to your business. You can use Amazon Q to have conversations, solve problems, generate content, gain insights, and take action by connecting to your company’s information repositories, code, data, and enterprise systems. Amazon Q provides immediate, relevant information and advice to employees to streamline tasks, accelerate decision-making and problem-solving, and help spark creativity and innovation at work.
In this blog post, we will show you how Amazon Q can be applied to support your operations team during an issue or an outage. An organization runs both internal and external applications on different services with multiple dependencies. The application/DevOps team creates runbooks that outline details about the application, its dependencies, and information that helps the operations team understand, troubleshoot, and resolve issues. The primary goal of the operations team is to expedite application recovery. Recovery steps could include identifying the key infrastructure components of the application, troubleshooting, failing over to a disaster recovery environment, and escalating to application owners. All the required actions should be performed as quickly as possible to reduce the MTTI (Mean Time To Identify) and MTTR (Mean Time To Recovery).
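To make these two metrics concrete, the following is a minimal sketch of how they could be computed from an incident log. The timestamps, the `incidents` variable, and the `mean_minutes` helper are invented for illustration, and both durations are assumed to be measured from the start of customer impact.

```shell
#!/usr/bin/env bash
# Hypothetical incident log: one incident per line, as Unix epoch seconds.
# Columns: impact_started  cause_identified  service_recovered
incidents="
1700000000 1700000900 1700002700
1700100000 1700100300 1700101500
"

# Print the mean number of minutes between column 1 (impact started) and
# the given column (2 = identified, 3 = recovered), skipping blank lines.
mean_minutes() {
  printf '%s\n' "$incidents" |
    awk -v col="$1" 'NF == 3 { total += $col - $1; n++ }
                     END { if (n) print total / n / 60 }'
}

echo "MTTI: $(mean_minutes 2) minutes"
echo "MTTR: $(mean_minutes 3) minutes"
```

With the two sample incidents above, this reports an MTTI of 10 minutes and an MTTR of 35 minutes.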
Amazon Q offers user-based plans, so you get features, pricing, and options tailored to how you use the product. Amazon Q can adapt its interactions to each individual user based on the existing identities, roles, and permissions of your business. AWS never uses customer content from Amazon Q to train the underlying models. In other words, your company information remains secure and private.
Customers can use Amazon Q Business to build an application that can help reduce MTTI and MTTR. This application can be connected to a centralized Amazon S3 bucket containing the application runbooks. It can also be connected to AWS documentation for services used by the applications to assist further in understanding and resolving issues.
Sample application
For this blog post, we will use the PetAdoption application that is available on GitHub. It is built using a microservice architecture, and different components of the application are deployed on various services, such as Amazon Elastic Kubernetes Service, Amazon Elastic Container Service, AWS Lambda, Amazon API Gateway, Amazon DynamoDB, Amazon Simple Queue Service, Amazon Simple Notification Service, and AWS Step Functions. The application architecture is shown in the following diagram.
Building the Amazon Q Business Application
Prerequisites
- AWS IAM Identity Center as the SAML 2.0-compliant identity provider (IdP). Please ensure that you have enabled an IAM Identity Center instance, provisioned at least one user, and provided each user with a valid email address. For more details, see Configure user access with the default IAM Identity Center directory.
- An Amazon S3 bucket that will act as a central repository to store your application runbooks.
- A sample runbook uploaded to the S3 bucket. The sample runbook captures known issues and other application information required to help the operations team with triage and escalation. You can upload it using AWS CloudShell and the commands listed below, or by manually copying the runbook and uploading it to the S3 bucket using the S3 console.
cat << EOF > petadoption-runbook.doc
Application Name: PetAdoptions Production Application
Application Description: This application enables people to easily adopt pets. It is a digital marketplace of over 10,000 animal shelters and rescue groups across the US, Canada, and Mexico. As the leading pet adoption platform, this application has helped millions of pets find their forever homes.
Account info: 111111111111 (Primary Account), 222222222222 (DR Account)
Application Owners:
Development Lead: Puneeth Komaragiri
Development Manager: Vikram Venkataraman
Program/Product Manager: Puneeth Komaragiri
On-call Alias: petadoptionsoncall@example.com
Severity: Critical
Public Facing: Yes
DR/Backup Environment: Yes; (us-west-2 is the backup/DR for us-east-1 env)
Core AWS Services used:
* S3
* EKS
* DynamoDB
* ELB
* SQS
* SNS
* CloudWatch
* CloudFront
Regions: us-east-1 (Northern Virginia) & us-west-2 (Oregon)
Core infrastructure components:
DynamoDB Global Table: pet-adoption-table
EKS Clusters: PetSite-FrontEnd, PetSearch-API, PetListAdoptions-API, PayForAdoptions-API, PetAdoptionStatusUpdater, PetAdoptionsHistory-API
Lambda Functions: PriceLessThan55, PriceGreaterThan55, PetAdoptionStatusUpdater
S3 Bucket: petadoptionss3bucket
Previously Occurred/ Known Issues & Fixes:
In very rare cases, you might encounter a behavior where the site does not show any pet images. Click on Perform Housekeeping in the PetSite home page upper right corner.
Failing over to DR Region:
Description: The RTO (Recovery Time Objective) and RPO (Recovery Point Objective) requirement for this application is 45 minutes. The application needs to be failed over to the active DR region in us-west-2 if the outage lasts more than 30 minutes. The application will be failed back after 24 hours of observing the primary region.
Procedure: To fail over to the DR region, the user will need to run the "DR-FAILOVER" workflow from the Central-DevOps-Account.
Troubleshooting:
* Is there an AWS Outage?
* Check https://health.aws.amazon.com/health/status for AWS service health.
* How to reach AWS?
* If Application is Down for customers, Cut an AWS Support ticket using link https://console.aws.amazon.com/support
* See https://docs.aws.amazon.com/awssupport/latest/user/case-management.html for support case severity
* Always open a Phone/Chat case for high, Urgent & Critical severity cases
* Reach out to the Account Team Alias Email (sampleaccountteamemail@amazon.com) for additional help.
* How to escalate to the Application team?
* Please reach out to the on-call via phone/email oncallpetadoptionapplication@example.com
---
EOF
aws s3 cp petadoption-runbook.doc s3://<Your S3 Bucket used as Datasource>/
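Because Amazon Q can only answer from what the runbook contains, it may be worth checking that the file has the expected sections before uploading it. The following is a minimal sketch of such a check. `check_runbook` is a hypothetical helper, not part of the AWS CLI, and the section names are the ones used in the sample runbook above.

```shell
#!/usr/bin/env bash
# Hypothetical pre-upload check: confirm the runbook contains the sections
# the operations team is likely to ask about. Returns 0 when every section
# is present and 1 otherwise, listing the missing ones on stderr.
check_runbook() {
  local file=$1
  local missing=0
  local section
  for section in "Application Owners:" "On-call Alias:" \
                 "Failing over to DR Region:" "Troubleshooting:"; do
    if ! grep -qF "$section" "$file"; then
      echo "missing section: $section" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

You could then run `check_runbook petadoption-runbook.doc` before the `aws s3 cp` command above and upload only when it succeeds.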
Creating the Amazon Q Business application
First, let’s create a new application in the Amazon Q console and name it petadoption-ops-app. For the access management prompt, let’s choose the recommended path, which is using IAM Identity Center.
In the next step, we will choose the retriever for the Amazon Q application. We will use the native retriever, which creates an Amazon Q Business index that can connect to the Amazon Q Business supported data sources that you choose.
In the next step, we will add the data sources that contain the data required for this use case. For this blog post, we will use two data sources: an Amazon S3 bucket and a Web crawler. Let’s first add the Web crawler data source. We will name the data source aws-core-services-crawlers and add the URLs listed below.
Sample AWS documentation links:
[EKS Best Practices] : https://aws.github.io/aws-eks-best-practices/
[EKS Knowledge Center Articles] : https://repost.aws/knowledge-center/all?view=all&search=EKS&sort=recent
[Load Balancer Troubleshooting] : https://docs.aws.amazon.com/elasticloadbalancing/latest/application/load-balancer-troubleshooting.html
[EC2 Troubleshooting] : https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/ec2-instance-troubleshoot.html
[Lambda Troubleshooting] : https://docs.aws.amazon.com/lambda/latest/dg/lambda-troubleshooting.html
[S3 Troubleshooting] : https://docs.aws.amazon.com/AmazonS3/latest/userguide/troubleshooting.html
[DynamoDB Troubleshooting] : https://docs.aws.amazon.com/amazondynamodb/latest/developerguide/Troubleshooting.html
[AWS Support Case] : https://docs.aws.amazon.com/awssupport/latest/user/case-management.html
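If you keep the seed URLs in a file, a quick check before pasting them into the console can catch typos. The sketch below is only an illustration: the temporary seed file and the `invalid_seed_urls` helper are hypothetical, and the check only verifies that each non-empty line starts with `https://`.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every non-empty line of the given file that
# does not start with https://. Prints nothing when all lines look valid.
invalid_seed_urls() {
  grep -v '^[[:space:]]*$' "$1" | grep -v '^https://' || true
}

# Illustrative seed file holding two of the URLs listed above.
seed_file=$(mktemp)
cat > "$seed_file" << 'EOF'
https://aws.github.io/aws-eks-best-practices/
https://docs.aws.amazon.com/lambda/latest/dg/lambda-troubleshooting.html
EOF

if [ -z "$(invalid_seed_urls "$seed_file")" ]; then
  echo "all seed URLs use HTTPS"
fi
```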
Other options can be left as default. Please choose the Create a new service role option from the dropdown for the IAM role as shown in the screenshot below.
For the sync run schedule, you can choose the frequency depending on the rate at which the data changes.
Next, let’s add the Amazon S3 bucket that was identified in the prerequisites as a data source. Please choose the Create a new service role option from the dropdown for the IAM role as shown in the screenshot below.
In the sync scope, specify the S3 bucket that was created as part of the prerequisites. For the sync run schedule, you can choose the frequency depending on the rate at which the data changes.
Once you have added both the data sources, click on ‘Next’.
In this step, we will add users and groups from your IAM Identity Center directory. Let’s click on Add Users and select Assign existing users and groups. Now, let’s search for and select the user or group that was created as part of the prerequisites.
Click on Create Application.
You should see the application status as Created successfully with the Web experience URL.
Now, open the newly created application, select each of the data sources, and click the Sync now button to initiate the data sync. The data sync might take a few minutes. After syncing, each data source should show a Completed sync status, as shown below:
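The sync status can also be checked from a script. The following is a rough sketch, assuming your AWS CLI version includes the `qbusiness` commands and that `list-data-source-sync-jobs` reports statuses such as `SUCCEEDED`, `SYNCING`, and `FAILED`. The three ID arguments are placeholders for values from your own application, and `classify_sync_status` and `latest_sync_status` are hypothetical helpers.

```shell
#!/usr/bin/env bash
# Map a sync job status string to a coarse state: done, running, or failed.
# The status names are assumptions about what the service returns; anything
# unrecognized is reported as unknown.
classify_sync_status() {
  case "$1" in
    SUCCEEDED)                echo "done" ;;
    SYNCING|SYNCING_INDEXING) echo "running" ;;
    FAILED|ABORTED)           echo "failed" ;;
    *)                        echo "unknown" ;;
  esac
}

# Fetch the status of the first sync job returned for a data source.
# Arguments: application ID, index ID, data source ID (all placeholders).
# Requires AWS credentials, so it is defined here but not called.
latest_sync_status() {
  aws qbusiness list-data-source-sync-jobs \
    --application-id "$1" --index-id "$2" --data-source-id "$3" \
    --query 'history[0].status' --output text
}

# Example (fill in your own IDs before uncommenting):
# classify_sync_status "$(latest_sync_status APP_ID INDEX_ID DATA_SOURCE_ID)"
```

Keeping the classification separate from the AWS call makes it easy to adjust if the status names in your environment differ from the ones assumed here.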
Accessing the Amazon Q Application’s web experience endpoint
In the next steps, we will interact with the petadoption-ops-app application to get insights into the PetAdoption application.
Click the Web experience settings tab in the Amazon Q application console to copy the deployed URL of the application.
Use your browser to access the deployed URL. It should take you to IAM Identity Center for authentication. After you authenticate, you should see the user interface of the Amazon Q application, which looks like the screenshot below:
Let’s see the petadoption-ops-app in action. Let’s assume you are a new member of the SRE team and you are supporting the PetAdoption application. Let’s start by asking for an overview of the application.
The petadoption-ops-app was able to crawl through the data sources and provide a quick summary of the PetAdoption application.
Next, let’s say you are seeing errors specific to Amazon EKS and would like to know which components of the application run on EKS and who the respective application contacts are.
Now that we have the necessary information, let’s share the error messages we see to get the root cause of the issue.
The petadoption-ops-app was able to provide insight into the potential root cause of the issue and the metrics that need to be captured to monitor throttling on the API server.
Let’s say the PetAdoption application is down and you have to switch the application to your DR site. You are not sure of the process. Let’s try this scenario with petadoption-ops-app.
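The failover rule in the sample runbook is simple enough to express as a check. The following is a minimal sketch based on the runbook’s thresholds (fail over when the outage lasts more than 30 minutes, against a 45-minute RTO). `should_failover` and `minutes_to_rto` are hypothetical helpers that only support the decision; the failover itself is still performed by the DR-FAILOVER workflow.

```shell
#!/usr/bin/env bash
FAILOVER_AFTER_MINUTES=30   # sample runbook: fail over if the outage lasts more than 30 minutes
RTO_MINUTES=45              # sample runbook: recovery time objective

# Succeed (exit 0) when the outage has lasted longer than the failover threshold.
should_failover() {
  [ "$1" -gt "$FAILOVER_AFTER_MINUTES" ]
}

# Print the minutes left before the RTO is breached (never below zero).
minutes_to_rto() {
  local left=$(( RTO_MINUTES - $1 ))
  if [ "$left" -lt 0 ]; then
    left=0
  fi
  echo "$left"
}

outage_minutes=35   # illustrative value
if should_failover "$outage_minutes"; then
  echo "Run the DR-FAILOVER workflow; $(minutes_to_rto "$outage_minutes") minutes left before the RTO is breached."
else
  echo "Keep troubleshooting in the primary region."
fi
```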
Finally, let’s ask the petadoption-ops-app for details about opening an AWS Support case.
Conclusion
The purpose of this blog post is to encourage you to think about different ways you can use Amazon Q to enable your teams to operate more effectively. In this instance, the Amazon Q Business application can be further enhanced by connecting it to more data sources, such as your content repositories, business applications, and collaboration tools. You can also use the same application for change management, where operational teams often rely on runbooks to execute specific procedures. You can learn more about Amazon Q using the links below.
Learn more
Amazon Q main product page
Amazon Q details for IT pros and developers
Get started with Amazon Q
Read more about Amazon Q
Introducing Amazon Q, a new generative AI-powered assistant (preview)
Improve developer productivity with generative-AI powered Amazon Q in Amazon CodeCatalyst (preview)
Upgrade your Java applications with Amazon Q Code Transformation (preview)
New generative AI features in Amazon Connect, including Amazon Q, facilitate improved contact center service
New Amazon Q in QuickSight uses generative AI assistance for quicker, easier data insights (preview)
Amazon Q brings generative AI-powered assistance to IT pros and developers (preview)