Guidance for a Multi-Tenant, Generative AI Gateway with Cost and Usage Tracking on AWS
Overview
Please note: [Disclaimer]
How it works
Configure a multi-tenant SaaS model for generative AI
This architecture diagram shows how to configure an internal Software-as-a-Service (SaaS) model for access to AI models. The next section shows how model usage and costs can be managed and tracked for each tenant.

Track cost and usage for generative AI models
This architecture diagram shows how application inference profiles can be used to track and manage model usage and associated costs for each tenant. Tenants can represent individual users, projects, teams, or departments within an organization.
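As a rough illustration of the per-tenant tracking described above, the sketch below builds a request for one application inference profile per tenant and tags it so that usage can be attributed to that tenant. It assumes the boto3 `bedrock` client and its `create_inference_profile` operation. The profile naming scheme, the `tenant` tag key, and the example model ARN are hypothetical choices for illustration and are not taken from this Guidance's sample code.

```python
from typing import Any


def build_inference_profile_request(tenant_id: str, model_arn: str) -> dict[str, Any]:
    """Build keyword arguments for bedrock.create_inference_profile for one tenant.

    The name format and the "tenant" tag key are assumptions for illustration.
    """
    return {
        "inferenceProfileName": f"tenant-{tenant_id}",
        "description": f"Application inference profile for tenant {tenant_id}",
        # Copy from a foundation model (or system-defined inference profile) ARN.
        "modelSource": {"copyFrom": model_arn},
        # Tagging the profile lets cost reports be grouped by tenant.
        "tags": [{"key": "tenant", "value": tenant_id}],
    }


def create_tenant_profile(tenant_id: str, model_arn: str) -> str:
    """Create the profile in AWS and return its ARN (requires boto3 and credentials)."""
    import boto3  # imported lazily so the request builder works without boto3

    bedrock = boto3.client("bedrock")
    response = bedrock.create_inference_profile(
        **build_inference_profile_request(tenant_id, model_arn)
    )
    return response["inferenceProfileArn"]


# Example model ARN, used here only as a placeholder.
EXAMPLE_MODEL_ARN = (
    "arn:aws:bedrock:us-east-1::foundation-model/"
    "anthropic.claude-3-haiku-20240307-v1:0"
)
request = build_inference_profile_request("team-a", EXAMPLE_MODEL_ARN)
```

Each tenant would then invoke the model through its own profile ARN instead of the base model ID, so that usage is recorded against that profile and its tags.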

Deploy with confidence
Ready to deploy? The sample code on GitHub includes detailed instructions for deploying this Guidance as-is or customizing it to fit your needs.
Well-Architected Pillars
The architecture diagrams above are an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
Operational Excellence
Amazon Bedrock, Lambda, DynamoDB, and Athena help you achieve operational excellence through automated scaling and simplified deployment and management, with DynamoDB providing a fully managed and scalable database. Together, these services allow the SaaS layer to handle increasing demand, provide comprehensive usage and cost reporting, and respond more quickly to changing business requirements.
Security
AWS Identity and Access Management (IAM), CloudTrail, the VPC, and the SaaS layer allow you to take advantage of cloud technologies to protect your data, systems, and assets, improving your security posture. Specifically, IAM helps ensure that users and services have only the permissions they need; CloudTrail provides a comprehensive audit trail; the VPC provides a secure network boundary; and the SaaS layer builds on managed services with built-in security controls. Together, these services help protect this Guidance from potential threats and help preserve the confidentiality, integrity, and availability of your data and resources.
Reliability
DynamoDB and Amazon S3 provide scalable and durable data storage for the SaaS service, supporting the long-term availability and integrity of critical usage and cost data. Lambda offers serverless, event-driven architectures, automatically scaling to handle increasing workloads and providing reliable and responsive performance. Athena provides serverless, scalable access to analyze data stored in Amazon S3, enabling reliable reporting and insights. Amazon Bedrock abstracts away the complexity of managing the foundation model infrastructure, improving the overall reliability of the SaaS service. Collectively, these AWS services establish a robust and resilient foundation for the internal SaaS service.
Performance Efficiency
The internal SaaS service uses Lambda, DynamoDB, Athena, and Amazon Bedrock to improve performance efficiency. Lambda enables serverless, event-driven architectures and automatically scales compute resources based on demand. DynamoDB, a fully managed NoSQL database service, provides low-latency storage and retrieval for the SaaS service's usage and cost data. Athena, a serverless, interactive query service, enables efficient and scalable analysis of the usage and cost data stored in Amazon S3. Amazon Bedrock, a managed service for accessing and running foundation models, gives the SaaS service efficient access to the latest AI models. By relying on serverless and managed services, the internal SaaS service can scale and perform effectively without the overhead of managing the underlying infrastructure.
Cost Optimization
Amazon Bedrock, DynamoDB, Athena, and Amazon S3 collectively optimize costs for the SaaS layer in several ways. Amazon Bedrock is a managed service that simplifies the deployment and running of foundation models. It abstracts away the infrastructure management tasks, allowing the SaaS service to focus on its core functionality without the need to manage the underlying resources. This helps reduce costs associated with infrastructure management and enables the SaaS service to scale efficiently as its usage grows.
DynamoDB is a fully managed NoSQL database service that provides a cost-efficient data storage solution for the SaaS service's usage and cost data. The scalability and automatic provisioning capabilities of DynamoDB help ensure that the SaaS service only incurs charges for the resources it actually consumes, minimizing over-provisioning and reducing storage costs.
Athena is a serverless, interactive query service that enables efficient and cost-effective analysis of the SaaS service's usage and cost data stored in Amazon S3. The pay-per-query pricing model of Athena allows the SaaS service to run ad-hoc queries and generate reports without incurring the overhead of managing a separate data warehouse. This helps reduce the costs associated with data storage and analysis.
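To make the per-tenant reporting concrete, the following minimal sketch aggregates token usage into a cost per tenant, roughly what a grouped Athena query over the usage data would compute. The `UsageRecord` fields and the per-1,000-token prices are hypothetical and for illustration only. They do not reflect this Guidance's actual data schema or real Amazon Bedrock pricing.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageRecord:
    """One model invocation attributed to a tenant (hypothetical schema)."""
    tenant: str
    input_tokens: int
    output_tokens: int


def cost_per_tenant(
    records: list[UsageRecord],
    input_price_per_1k: float,
    output_price_per_1k: float,
) -> dict[str, float]:
    """Sum the estimated cost of each tenant's invocations."""
    totals: dict[str, float] = defaultdict(float)
    for record in records:
        totals[record.tenant] += (
            record.input_tokens / 1000 * input_price_per_1k
            + record.output_tokens / 1000 * output_price_per_1k
        )
    return dict(totals)


# Made-up usage and prices, chosen only to keep the arithmetic easy to follow.
records = [
    UsageRecord("team-a", input_tokens=1000, output_tokens=500),
    UsageRecord("team-a", input_tokens=2000, output_tokens=0),
    UsageRecord("team-b", input_tokens=0, output_tokens=1000),
]
costs = cost_per_tenant(records, input_price_per_1k=0.5, output_price_per_1k=1.5)
```

With these made-up numbers, `costs` attributes 2.25 to `team-a` and 1.5 to `team-b`.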
Lastly, Amazon S3 is a scalable and durable object storage service that provides a cost-optimized way to store the SaaS service's usage and cost data, as well as other operational logs and artifacts. The flexible pricing options of Amazon S3 and the ability to implement lifecycle policies help minimize storage costs by moving data to lower-cost storage classes as it ages and by expiring data that is no longer needed.
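The lifecycle policies mentioned above could look something like the sketch below, which builds a rule that moves objects under a prefix to a lower-cost storage class and later expires them. It assumes the boto3 S3 client and its `put_bucket_lifecycle_configuration` operation. The rule ID, prefix, and day thresholds are hypothetical values for illustration and are not taken from this Guidance.

```python
from typing import Any


def build_lifecycle_configuration(
    prefix: str, transition_days: int = 30, expire_days: int = 365
) -> dict[str, Any]:
    """Build an S3 lifecycle configuration for usage and cost data under a prefix."""
    if expire_days <= transition_days:
        raise ValueError("expire_days must be greater than transition_days")
    return {
        "Rules": [
            {
                "ID": "usage-data-lifecycle",  # hypothetical rule name
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Move older objects to the Standard-IA storage class.
                "Transitions": [
                    {"Days": transition_days, "StorageClass": "STANDARD_IA"}
                ],
                # Delete objects once they are no longer needed.
                "Expiration": {"Days": expire_days},
            }
        ]
    }


def apply_lifecycle(bucket: str, prefix: str) -> None:
    """Apply the rule to a bucket (requires boto3 and credentials)."""
    import boto3  # imported lazily so the builder works without boto3

    s3 = boto3.client("s3")
    s3.put_bucket_lifecycle_configuration(
        Bucket=bucket,
        LifecycleConfiguration=build_lifecycle_configuration(prefix),
    )


config = build_lifecycle_configuration("usage-logs/")
```

Appropriate thresholds depend on how long usage data must stay queryable, so the defaults here are placeholders only.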
Sustainability
The services selected for this Guidance contribute to sustainability by providing a foundation for building and operating environmentally friendly and resource-efficient applications. Specifically, Amazon Bedrock enables sustainable AI-driven automation by optimizing resource utilization and minimizing energy consumption through efficient model inference and workload orchestration. AWS CloudFormation promotes sustainability by automating infrastructure provisioning and management, reducing the need for manual intervention and minimizing resource wastage. DynamoDB offers a serverless and highly scalable database solution, allowing users to optimize resource utilization and minimize energy consumption by automatically scaling resources based on demand. Lambda supports sustainability by providing a serverless compute environment that dynamically allocates resources in response to workload demands, eliminating the need for idle servers and reducing overall energy consumption. And finally, Amazon S3 promotes sustainability by offering durable and scalable storage, optimizing resource utilization, and minimizing energy consumption through efficient data storage and retrieval mechanisms.
Disclaimer