AWS Business Intelligence Blog
Monitor and optimize your Amazon Bedrock usage with Amazon Athena and Amazon QuickSight
Amazon Bedrock assists in generative AI application development by providing API access to foundation models (FMs). Using Amazon Bedrock, you can improve the performance of your applications by tailoring the prompt or switching between FMs. These adjustments, while potentially improving output quality, can impact both latency and cost.
Monitoring these crucial metrics is essential for maintaining an optimal end user experience and detecting changes early. For instance, modifying a prompt might boost the output accuracy of the application while simultaneously increasing the response time and the operational costs. By monitoring these metrics, you can detect the impact of such changes and respond quickly.
In this post, I will show you how to use Amazon Bedrock model invocation logs to enhance the observability of model usage by using Amazon Athena for efficient log querying and Amazon QuickSight for insightful visualizations.
Solution overview
The solution uses QuickSight for analyzing and visualizing Amazon Bedrock model invocation logs to get insights on metrics such as latency and token count. The services used in this solution include:
- Amazon Bedrock: A fully managed, serverless service that provides access to multiple high-performing FMs from leading AI companies through a single API. It offers tools for building generative AI applications with security, privacy, and responsible AI features.
- Amazon Simple Storage Service (Amazon S3): An object storage service that offers industry-leading scalability, data availability, security, and performance.
- Amazon Athena: An interactive query service that enables users to analyze data stored in Amazon S3 using standard SQL, without the need for complex extract, transform, and load (ETL) processes.
- Amazon QuickSight: A comprehensive business intelligence (BI) service that enables organizations to unify their data analysis and visualization capabilities at scale.
The following figure shows the high-level Bedrock log monitoring flow that is explained in this post.
The process shown in the figure is:
- Users invoke models in Amazon Bedrock
- The invocations are written into an S3 bucket by Amazon Bedrock
- The logs are queried by Athena
- The invocation data is visualized by QuickSight
Solution walkthrough
In this post, you will walk through the following steps:
- Configure Amazon Bedrock for logging to an S3 bucket
- Create an Athena table for querying the logs
- Create a QuickSight analysis
Prerequisites
For this walkthrough, you need to have the following prerequisites in place:
- An AWS account
- At least one FM with text modality enabled. In this post, I will use Anthropic’s Claude 3 Sonnet.
- An Amazon QuickSight account with admin permissions
- An S3 bucket to use as the logging destination in the next step
Configure Amazon Bedrock for logging to an S3 bucket
To put logs into an S3 bucket, enable the S3 logging feature within Amazon Bedrock.
- Go to the AWS Management Console for Amazon Bedrock and choose Settings from the navigation pane.
- In the settings page, turn on Model invocation logging, select Text as the data type to include with the logs, select S3 only as the logging destination, and enter or select an S3 bucket as the logging destination.
- After enabling the logging, go to the Chat/text playground from the left panel. Select a model and run a few prompts.
- Now, view the S3 bucket configured for Amazon Bedrock logging in the console. You can find some log files in JSON format in the bucket. Log files are typically under an S3 bucket prefix (folder) named
AWSLogs/<your-aws-account-id>/BedrockModelInvocationLogs/<aws-region>/year/month/date/hour/<log-file-name>.json.gz
Log creation might take around a minute.
An invocation log file looks like the following JSON document. To include the inputBodyJson and outputBodyJson fields in the log data as shown in the document, the Text data type must be selected when configuring the log settings in the Amazon Bedrock console. Note that this is an example for Anthropic’s Claude 3 Sonnet; some fields might differ for other models.
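The sample below is an illustrative sketch rather than a verbatim log: the values are invented, and the latencyMs field and the exact field set are assumptions, so confirm the field names against a log file from your own bucket.

```json
{
  "timestamp": "2024-06-01T12:34:56Z",
  "accountId": "111122223333",
  "identity": {
    "arn": "arn:aws:sts::111122223333:assumed-role/my-app-role/session"
  },
  "region": "us-east-1",
  "requestId": "11111111-2222-3333-4444-555555555555",
  "operation": "InvokeModel",
  "modelId": "anthropic.claude-3-sonnet-20240229-v1:0",
  "input": {
    "inputContentType": "application/json",
    "inputBodyJson": {
      "anthropic_version": "bedrock-2023-05-31",
      "max_tokens": 512,
      "temperature": 0.5,
      "top_p": 0.9,
      "messages": [
        { "role": "user", "content": "Summarize this document ..." }
      ]
    },
    "inputTokenCount": 25
  },
  "output": {
    "outputContentType": "application/json",
    "outputBodyJson": {
      "content": [ { "type": "text", "text": "Here is a summary ..." } ],
      "stop_reason": "end_turn"
    },
    "outputTokenCount": 180
  },
  "latencyMs": 1860
}
```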
Create an Athena table for querying the logs
In this part, I show you how to use Amazon Athena to query the logs created by Amazon Bedrock.
First, go to the Athena console and choose Query editor from the navigation pane. Then, follow these steps, copying each query into the Query editor and choosing Run (example queries for the steps are shown after this list):
- Create a database in Athena: Run a query that creates a database for the logs. If it’s your first time using Athena, follow the Create a database documentation for the necessary settings and database creation.
- Create a table in the database: The table creation query uses the Amazon Bedrock invocation logging JSON format. In the query, <bucket-created-for-S3-logging> is the S3 bucket that was configured as the logging destination in the Amazon Bedrock settings, <account-id> is the AWS account ID of the account, and <region> is the AWS Region where Amazon Bedrock logging is configured. The full Amazon S3 path of the logging data can be retrieved from the S3 console. See Create a table for more information about creating a table.
The query partitions the data based on the datehour format used by Amazon Bedrock when creating the logs. Partitioning restricts the amount of data scanned by each query according to the chosen time interval. See Partition your data for more information about Athena data partitioning.
- Test the table: The final SQL query retrieves Amazon Bedrock invocations together with insights such as the timestamp, the model ID of the invoked model, the latency in milliseconds, and the input and output token counts, starting from a specific time.
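The following sketches show what the queries for these steps could look like. The database name bedrock_logs, the table name bedrock_invocation_logs, the latencyMs column, and the partition start date are illustrative assumptions; adjust them to match the field names in your own log files and your naming conventions.

Create the database:

```sql
CREATE DATABASE IF NOT EXISTS bedrock_logs;
```

Create the table, using partition projection on the datehour portion of the S3 key so that queries can prune data by time range:

```sql
CREATE EXTERNAL TABLE IF NOT EXISTS bedrock_logs.bedrock_invocation_logs (
  `timestamp` string,
  `accountId` string,
  `region`    string,
  `requestId` string,
  `operation` string,
  `modelId`   string,
  `identity`  struct<arn:string>,
  `input`     struct<inputContentType:string, inputTokenCount:int>,
  `output`    struct<outputContentType:string, outputTokenCount:int>,
  `latencyMs` bigint COMMENT 'assumed field name; confirm against your log files'
)
PARTITIONED BY (`datehour` string)
ROW FORMAT SERDE 'org.openx.data.jsonserde.JsonSerDe'
WITH SERDEPROPERTIES ('ignore.malformed.json' = 'true')
LOCATION 's3://<bucket-created-for-S3-logging>/AWSLogs/<account-id>/BedrockModelInvocationLogs/<region>/'
TBLPROPERTIES (
  'projection.enabled' = 'true',
  'projection.datehour.type' = 'date',
  'projection.datehour.format' = 'yyyy/MM/dd/HH',
  'projection.datehour.range' = '2024/01/01/00,NOW',
  'projection.datehour.interval' = '1',
  'projection.datehour.interval.unit' = 'HOURS',
  'storage.location.template' = 's3://<bucket-created-for-S3-logging>/AWSLogs/<account-id>/BedrockModelInvocationLogs/<region>/${datehour}/'
);
```

Test the table:

```sql
SELECT
  "timestamp",
  modelId,
  identity.arn            AS caller_arn,
  latencyMs,
  input.inputTokenCount   AS input_tokens,
  output.outputTokenCount AS output_tokens
FROM bedrock_logs.bedrock_invocation_logs
WHERE datehour >= '2024/06/01/00'
ORDER BY "timestamp" DESC
LIMIT 100;
```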
The query result looks like the following figure and includes fields such as the timestamp, model ID, ARN, latency, and input and output token counts (some fields were omitted from the query for better visualization):
Visualizing the log data in QuickSight
The next step is to use Amazon QuickSight to visualize the log data from Athena.
- First, authorize QuickSight to connect to Athena as shown in Authorizing connections to Amazon Athena.
- After the authorization, go to the QuickSight console.
- Choose Datasets and then choose New dataset.
- In the Create a Dataset page, select Athena.
- Enter a Data source name such as bedrock_logs_dataset and leave the workgroup as [primary]. Then choose Create data source.
- In the Choose your table dialog, select Use custom SQL query.
- Enter a name for the query such as bedrock-logs-query, copy the SQL query shown after these steps and paste it into the query field, then choose Confirm query.
- In the Finish dataset creation dialog, select Import to SPICE for quicker analytics and choose Visualize. A dataset will be created and a visualization will be shown in the Analyses page of the QuickSight console.
- In the data visualization view, select Line chart as the visual type, add timestamp to the X axis, and add latencyms to the Value field. By default, the analysis sets latencyms to Count and timestamp to Day. You can change these settings by choosing the three dots next to each value and modifying Aggregate. In the following image, timestamp is set to Minute and latencyms is set to Average. You can run multiple Amazon Bedrock invocations at different times to see a more meaningful time series graph.
- After setting up the analysis, you can choose PUBLISH to publish it as a QuickSight dashboard.
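For reference, the custom SQL for the dataset can be a flattened version of the Athena test query from the previous section. The following is a sketch that assumes the bedrock_logs.bedrock_invocation_logs table created earlier, with the latencyms column again being an assumption to verify against your logs; it also assumes the log timestamp is an ISO 8601 string.

```sql
SELECT
  CAST(from_iso8601_timestamp("timestamp") AS timestamp) AS "timestamp",
  modelId                  AS modelid,
  identity.arn             AS calleridentity,
  latencyMs                AS latencyms, -- assumed latency column; confirm against your logs
  input.inputTokenCount    AS inputtokens,
  output.outputTokenCount  AS outputtokens
FROM bedrock_logs.bedrock_invocation_logs
WHERE datehour >= '2024/06/01/00'
```

Casting the timestamp to a date type in the query lets QuickSight treat it as a date field, so it can be aggregated by minute, hour, or day on the X axis.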
Using the insights
As the sample Amazon Bedrock invocation log JSON document shared previously shows, you can derive many insights, such as:
- Input, output, and total token usage: You can use the input and output token usage to calculate the cost of an invocation. See the Amazon Bedrock pricing page for detailed pricing information based on token utilization. For example, an SQL query can be formed to get the average token usage and calculate the average price of an invocation, as shown after this list.
- Invocation latency in milliseconds: The latency gives the time it took to generate the response. You can monitor this metric to keep track of the user experience.
- Model ID and inference config such as temperature, topP, and topK: These parameters can be correlated to the latency and token usage. You can monitor these metrics continuously to help improve the user experience and optimize the cost of the application.
- Identity of the caller: The identity of the caller is the AWS Identity and Access Management (IAM) role used for the invocation. In a large language model (LLM) pipeline where multiple AWS Lambda functions are used, you can calculate the metrics for each Lambda function separately. For example, in intelligent document processing (IDP) applications, you might want to invoke a separate LLM for document classification, optical character recognition (OCR), data extraction, and summarization processes. If each of the Lambda functions uses a separate IAM role as a recommended best practice, you can use the identity field to determine which Lambda function invoked the LLM. This helps you calculate the metrics for each of these processes separately.
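As an example of the first point, a query along the following lines can estimate the average token usage and an approximate cost per invocation. It is a sketch: the per-1,000-token prices are placeholders that must be replaced with the current values from the Amazon Bedrock pricing page for your model and Region, and the table and column names follow the earlier assumptions.

```sql
SELECT
  modelId,
  identity.arn AS caller_arn,
  COUNT(*)                     AS invocations,
  AVG(input.inputTokenCount)   AS avg_input_tokens,
  AVG(output.outputTokenCount) AS avg_output_tokens,
  AVG(
      input.inputTokenCount  / 1000.0 * 0.003  -- placeholder input price per 1,000 tokens
    + output.outputTokenCount / 1000.0 * 0.015 -- placeholder output price per 1,000 tokens
  ) AS avg_estimated_cost_usd
FROM bedrock_logs.bedrock_invocation_logs
WHERE datehour >= '2024/06/01/00'
GROUP BY 1, 2;
```

Grouping by identity.arn also gives you the per-function breakdown described in the last point when each Lambda function uses its own IAM role.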
Using Athena federated queries to enhance the insights
Athena lets you query data in S3 buckets without ETL processes. The Athena Federated Query feature lets you query other data sources, including Amazon DynamoDB, PostgreSQL, Amazon OpenSearch Service, Amazon Neptune, and many others. You can combine Amazon Bedrock logs with data from these sources using federated queries. For example, you can combine Amazon Bedrock log data such as the prompt, token count, latency, and inference configuration parameters with user feedback data stored in another data source. This can help you measure user satisfaction while optimizing the cost of the solution by evaluating different prompts.
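As an illustration only, the following sketch joins the invocation logs with a hypothetical user_feedback table exposed through a federated data source named feedback_db (for example, a PostgreSQL connector), assuming the application stores the Amazon Bedrock request ID alongside each feedback record. The data source, table, and column names are all assumptions.

```sql
SELECT
  l.modelId,
  AVG(f.rating)                  AS avg_user_rating, -- hypothetical feedback score
  AVG(l.input.inputTokenCount)   AS avg_input_tokens,
  AVG(l.output.outputTokenCount) AS avg_output_tokens
FROM bedrock_logs.bedrock_invocation_logs l
JOIN feedback_db.public.user_feedback f
  ON l.requestId = f.request_id
WHERE l.datehour >= '2024/06/01/00'
GROUP BY l.modelId;
```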
A sample QuickSight dashboard
In the following image, you can see a sample dashboard that gives insights into the cost, token usage, and feedback in a single view. The sample application processes four types of documents that users upload to the system. The dashboard gives details for each type of document separately by using the IAM role Amazon Resource Name (ARN) of the Lambda functions.
The dashboard shows that user feedback improved while the average token usage and cost increased over time. This is because of changes that were made to the prompts to improve accuracy.
Clean up
To avoid incurring future charges:
- Delete the logs created by Amazon Bedrock from the S3 bucket used for logging.
- Disable the model invocation logging from the Amazon Bedrock console.
Conclusion
In this post, I showed you how to enhance observability of Amazon Bedrock usage by using model invocation logs, Amazon Athena, and Amazon QuickSight. You walked through configuring Amazon Bedrock logging, querying logs with Athena, and creating insightful visualizations in QuickSight. By using this approach, you can monitor crucial metrics such as latency, token usage, and costs, allowing you to optimize your generative AI applications’ performance and user experience.
You can use this solution to gain valuable insights into your Amazon Bedrock model usage patterns, identify areas for improvement, and make data-driven decisions to enhance your AI-powered applications. As you continue to develop and refine your generative AI solutions, maintaining strong observability will be key to ensuring optimal performance and cost-efficiency.
You can read more about monitoring Amazon Bedrock in Monitor the health and performance of Amazon Bedrock. I also recommend the Monitoring Generative AI applications using Amazon Bedrock and Amazon CloudWatch integration post to learn about using Amazon CloudWatch for Amazon Bedrock logging as an alternative to Amazon S3 logging. You can use CloudWatch for an off-the-shelf dashboard or use S3 with Athena and QuickSight for more complex use cases such as custom dashboards or enrichment of the data with other data sources.
About the Author
Ozan Cihangir is a Prototyping Engineer at AWS. He helps customers to build innovative solutions for their emerging technology projects in the cloud.