[SEO Subhead]
This Guidance demonstrates how to build an internal Software-as-a-Service (SaaS) platform that provides access to foundation models, like those available through Amazon Bedrock, to different business units or teams within your organization. It allows you to centrally manage and govern the use of these artificial intelligence (AI) models while still permitting diverse teams to utilize them. The internal SaaS layer is designed for internal use, not public-facing access, and provides a standardized mechanism for teams to use foundation models. This approach helps teams track their usage in near real-time and the aggregated usage once every day.
Please note: [Disclaimer]
Architecture Diagram

-
Configure a multi-tenant SaaS model for generative AI
-
Track cost and usage for generative AI models
-
Configure a multi-tenant SaaS model for generative AI
-
This architecture diagram shows how to configure an internal Software-as-a-Service (SaaS) model for access to AI models. The next tab shows how model usage and costs can be managed and tracked for each tenant.
Step 1
The Software-as-a-Service (SaaS) AWS Cloud Development Kit (CDK) template deploys all required resources to the AWS account. After deployment, a tenant’s application sends a POST request to Amazon API Gateway, invoking a specific model by passing the model_id as a request parameter and the appropriate model payload in the request body. The tenant can be an individual user, a specific project, a team, or even an entire department within a company. -
Track cost and usage for generative AI models
-
This architecture diagram shows how application inference profiles can be used to track and manage model usage and associated costs for each tenant. Tenants can represent individual users, projects, teams, or departments within an organization.
Step 1
AWS Organizations is used to manage multiple AWS accounts, enabling centralized governance, resource management, and cost allocation across different environments or departments for Amazon Bedrock and application inference profiles. Within this multi-account architecture, a tenant's application sends a POST request to API Gateway. API Gateway first validates the tenant’s API key, then invokes a Lambda authorizer function which can perform custom authentication and authorization logic based on the key and other request details before allowing access to the API endpoint.
Get Started

Deploy this Guidance
Well-Architected Pillars

The AWS Well-Architected Framework helps you understand the pros and cons of the decisions you make when building systems in the cloud. The six pillars of the Framework allow you to learn architectural best practices for designing and operating reliable, secure, efficient, cost-effective, and sustainable systems. Using the AWS Well-Architected Tool, available at no charge in the AWS Management Console, you can review your workloads against these best practices by answering a set of questions for each pillar.
The architecture diagram above is an example of a Solution created with Well-Architected best practices in mind. To be fully Well-Architected, you should follow as many Well-Architected best practices as possible.
-
Operational Excellence
Amazon Bedrock, Lambda, DynamoDB, and Athena are AWS services that help you achieve operational excellence through automated scaling as well as simplified deployment and management, providing a fully managed and scalable database. Together, these services allow the SaaS layer to manage increasing demands, provide comprehensive usage and cost reporting, and be more responsive to changing business requirements.
-
Security
AWS Identity and Access Management (IAM), CloudTrail, the VPC, and the SaaS layer allow you to take advantage of cloud technologies to protect your data, systems, and assets, improving your security posture. Specifically, IAM helps ensure users and services have the necessary permissions; CloudTrail provides a comprehensive audit trail; the VPC enhances security by providing a secure network boundary; and the SaaS offerings provide managed security services. These services collectively help protect this Guidance from potential threats and help protect the confidentiality, integrity, and availability of your data and resources.
-
Reliability
DynamoDB and Amazon S3 provide scalable and durable data storage for the SaaS service, supporting the long-term availability and integrity of critical usage and cost data. Lambda offers serverless, event-driven architectures, automatically scaling to handle increasing workloads and providing reliable and responsive performance. Athena provides serverless, scalable access to analyze data stored in Amazon S3, enabling reliable reporting and insights. Amazon Bedrock abstracts away the complexity of managing the foundation model infrastructure, improving the overall reliability of the SaaS service. Collectively, these AWS services establish a robust and resilient foundation for the internal SaaS service.
-
Performance Efficiency
The internal SaaS service utilizes Lambda, DynamoDB, Athena, and Amazon Bedrock to enhance its performance efficiency. Lambda enables the implementation of serverless, event-driven architectures, automatically scaling compute resources based on demand. DynamoDB, a fully managed NoSQL database service, provides low-latency data storage and retrieval capabilities for the SaaS service's usage and cost data. Athena, a serverless, interactive query service, enables efficient and scalable analysis of the SaaS service's usage and cost data stored in Amazon S3. Amazon Bedrock, a managed service for deploying and running foundation models, optimizes the performance and efficiency of the SaaS service's access to the latest AI models. By using these AWS services, the internal SaaS service can achieve high performance efficiency by taking advantage of serverless architectures and managed database services, allowing the service to scale and perform effectively without the overhead of managing the underlying infrastructure.
-
Cost Optimization
Amazon Bedrock, DynamoDB, Athena, and Amazon S3 collectively optimize costs for the SaaS layer in several ways. Amazon Bedrock is a managed service that simplifies the deployment and running of foundation models. It abstracts away the infrastructure management tasks, allowing the SaaS service to focus on its core functionality without the need to manage the underlying resources. This helps reduce costs associated with infrastructure management and enables the SaaS service to scale efficiently as its usage grows.
DynamoDB is a fully managed NoSQL database service that provides a cost-efficient data storage solution for the SaaS service's usage and cost data. The scalability and automatic provisioning capabilities of DynamoDB help ensure that the SaaS service only incurs charges for the resources it actually consumes, minimizing over-provisioning and reducing storage costs.
Athena is a serverless, interactive query service that enables efficient and cost-effective analysis of the SaaS service's usage and cost data stored in Amazon S3. The pay-per-query pricing model of Athena allows the SaaS service to run ad-hoc queries and generate reports without incurring the overhead of managing a separate data warehouse. This helps reduce the costs associated with data storage and analysis.
Lastly, Amazon S3 is a scalable and durable object storage service that provides a cost-optimized way for storing the SaaS service's usage and cost data, as well as other operational logs and artifacts. The flexible pricing options of Amazon S3 and the ability to implement lifecycle policies help minimize storage costs by automatically scaling resources based on usage patterns and deleting unused data.
-
Sustainability
The services selected for this Guidance contribute to sustainability by providing a foundation for building and operating environmentally friendly and resource-efficient applications. Specifically, Amazon Bedrock enables sustainable AI-driven automation by optimizing resource utilization and minimizing energy consumption through efficient model inference and workload orchestration. AWS CloudFormation promotes sustainability by automating infrastructure provisioning and management, reducing the need for manual intervention and minimizing resource wastage. DynamoDB offers a serverless and highly scalable database solution, allowing users to optimize resource utilization and minimize energy consumption by automatically scaling resources based on demand. Lambda supports sustainability by providing a serverless compute environment that dynamically allocates resources in response to workload demands, eliminating the need for idle servers and reducing overall energy consumption. And finally, Amazon S3 promotes sustainability by offering durable and scalable storage, optimizing resource utilization, and minimizing energy consumption through efficient data storage and retrieval mechanisms.
Related Content

Build an internal SaaS service with cost and usage tracking for foundation models on Amazon Bedrock
Build a multi-tenant generative AI environment for your enterprise on AWS
Disclaimer
The sample code; software libraries; command line tools; proofs of concept; templates; or other related technology (including any of the foregoing that are provided by our personnel) is provided to you as AWS Content under the AWS Customer Agreement, or the relevant written agreement between you and AWS (whichever applies). You should not use this AWS Content in your production accounts, or on production or other critical data. You are responsible for testing, securing, and optimizing the AWS Content, such as sample code, as appropriate for production grade use based on your specific quality control practices and standards. Deploying AWS Content may incur AWS charges for creating or using AWS chargeable resources, such as running Amazon EC2 instances or using Amazon S3 storage.
References to third-party services or organizations in this Guidance do not imply an endorsement, sponsorship, or affiliation between Amazon or AWS and the third party. Guidance from AWS is a technical starting point, and you can customize your integration with third-party services when you deploy the architecture.