Partitioning and Isolating Multi-Tenant SaaS Data with Amazon S3
By Kevin Hakanson, Sr. Solutions Architect – AWS World Wide Public Sector
By Tod Golding, Principal Partner Solutions Architect – AWS SaaS Factory
Many software-as-a-service (SaaS) applications store multi-tenant data with Amazon Simple Storage Service (Amazon S3). Landing multi-tenant data in Amazon S3 requires you to think about how tenant data will be distributed across buckets and keys without undermining the security, manageability, and performance of your SaaS solution.
In this post, we’ll explore the various strategies that can be applied when partitioning tenant data with Amazon S3. We’ll highlight the considerations that may influence how and when you apply these mechanisms in your own solution, and we’ll look at how this influences tenant isolation and the accessibility of S3 objects.
Amazon S3 Data Partitioning
As you look at different architecture patterns for representing multi-tenant data, you must make choices about how that data is organized. These multi-tenant storage mechanisms and patterns are typically referred to as data partitioning.
Each Amazon Web Services (AWS) storage technology typically has its own unique collection of data partitioning models. This applies to Amazon S3 when looking at how tenant objects can be organized to support the various needs of your solution.
It’s important to note the pattern you pick is influenced by a number of factors. Your anticipated total tenant count, the isolation model of your tenant environments, and your application access patterns are amongst the considerations that can influence the option you select.
In the sections that follow, we’ll look at common multi-tenant S3 strategies and highlight how and where these strategies are frequently applied.
Bucket-Per-Tenant Model
The most straightforward approach to partitioning tenant data with Amazon S3 is to assign a separate bucket per tenant. The diagram below provides an example of this model.
Figure 1 – The bucket-per-tenant model.
With this approach, each tenant would be assigned a bucket that holds its data. This bucket would be given a name that uniquely binds it to the tenant.
This model works well when you’re working with a smaller collection of tenants (tens or hundreds). However, it does not scale well for environments that need to support a much larger population of tenants. Amazon S3 has a default quota of 100 buckets and a hard quota of 1,000 buckets per AWS account.
The other consideration here is bucket naming. Since each S3 bucket name must be globally unique across all AWS accounts, a bucket-per-tenant model would require a naming convention that ensured your tenant bucket names would support this requirement. Since bucket names are public, you should generally avoid using names that include tenant-specific information.
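One way to satisfy both requirements is to derive each bucket name from a hash of the tenant identifier, so the name is deterministic and unique without exposing tenant details. The sketch below illustrates the idea; the name format and truncated hash length are assumptions for illustration, not a prescribed convention.

```python
import hashlib

def tenant_bucket_name(app_prefix: str, tenant_id: str) -> str:
    # Hash the tenant ID so the publicly visible bucket name leaks no
    # tenant-specific information, while staying deterministic per tenant.
    digest = hashlib.sha256(tenant_id.encode("utf-8")).hexdigest()[:16]
    # Bucket names must be 3-63 characters: lowercase letters, digits, hyphens.
    return f"{app_prefix}-tenant-{digest}"
```

Because the hash is one-way, an observer who discovers the bucket name cannot recover the tenant it belongs to, while your control plane can always recompute the name from the tenant record.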
Lastly, S3 bucket quotas are not exclusive to this instance of your SaaS application. You may have other environments (production, staging, development) and other AWS services that require dedicated buckets, all consuming some of your quota.
Overall, while there is some simplicity to this model, it’s also clear the bucket-per-tenant model introduces challenges that could impact the scale and agility of your SaaS environment.
Object Key Prefix-Per-Tenant Model
To achieve better scale and overcome some of the limitations of the bucket-per-tenant model, SaaS providers may use key name prefixes to associate objects with tenants. This approach allows you to scale to a much larger collection of tenants without compromising on the structure or organization of your data partitioning scheme.
Figure 2 – The prefix-per-tenant model.
Here, you will notice that two tenants are sharing a single bucket. Each tenant has a unique prefix which identifies the objects that belong to that tenant. The good news is there is no limit to the number of objects you can store in a bucket or the number of prefixes you can have.
One challenge that can surface is that the activity for your S3 keys is unlikely to be evenly distributed across your tenants. Your application can achieve at least 3,500 PUT/COPY/POST/DELETE or 5,500 GET/HEAD requests per second per partitioned prefix. You can increase your performance by sharding the prefixes of individual high-traffic tenants to better distribute the load.
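A sharded-prefix key scheme can be sketched as follows. The shard count and key layout here are illustrative choices, not a prescribed format; the important property is that an object's shard is stable and derived from the object name, so reads and writes for one tenant spread across several prefixes.

```python
import hashlib

def sharded_tenant_key(tenant_id: str, object_name: str, shards: int = 8) -> str:
    # Derive a stable shard number from the object name so a single
    # tenant's traffic is distributed across `shards` prefixes.
    shard = int(hashlib.sha256(object_name.encode("utf-8")).hexdigest(), 16) % shards
    return f"{tenant_id}/shard-{shard}/{object_name}"
```

Listing a tenant's objects then means listing each of the tenant's shard prefixes, which is a reasonable trade for evenly distributed request load.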
Database-Mapped Tenant Objects
In some instances, the object access patterns of your application could influence how you choose to partition your S3 objects. Imagine a scenario where you want to find the objects for a tenant that meet some application-specific criteria (e.g. find all tenant-1 objects that belong to project-a).
The idea here is that you move the searchable elements to a database and query that database to find references to S3 objects. The diagram below provides an example of this use case.
Figure 3 – Database-mapped tenant objects.
In this example, we’ve introduced an object access service in the diagram that will process requests for S3 objects. This service could be a microservice in your application that supports the ability to request S3 objects based on some range of criteria that are managed by your application.
Tenants will submit requests with their parameters, and the service will query a database that contains the tenant identifier along with the parameters needed for the query (such as ProjectID). These parameters are used to query the database and return references to specific tenant S3 objects.
Since we’re using the database to manage access to S3 objects, our example stores all tenants’ S3 objects in a flat, co-mingled structure. If the view into these objects is always through the lens of the database, then the tenant identifier column in the database can represent your partitioning model, with isolation applied based on the tenant identifier. That allows your S3 objects to be stored as a global pool of objects that are connected to tenants via this mapping database table. A flat structure requires unique object keys with random prefixes to avoid request throttling.
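A minimal sketch of this pattern follows, using an in-memory SQLite table as a stand-in for your application's mapping database (the schema and helper names are hypothetical; a real implementation might use Amazon DynamoDB or Amazon RDS).

```python
import sqlite3
import uuid

# In-memory SQLite stands in for the real mapping database.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE object_map (tenant_id TEXT, project_id TEXT, s3_key TEXT)")

def register_object(tenant_id: str, project_id: str, object_name: str) -> str:
    # A random prefix keeps the flat key space evenly distributed,
    # avoiding request throttling on hot prefixes.
    s3_key = f"{uuid.uuid4().hex}/{object_name}"
    conn.execute("INSERT INTO object_map VALUES (?, ?, ?)",
                 (tenant_id, project_id, s3_key))
    return s3_key

def keys_for(tenant_id: str, project_id: str) -> list:
    # All object lookups go through the mapping table, so the tenant
    # identifier column is effectively the partitioning model.
    rows = conn.execute(
        "SELECT s3_key FROM object_map WHERE tenant_id = ? AND project_id = ?",
        (tenant_id, project_id)).fetchall()
    return [r[0] for r in rows]
```

The object access service would call `keys_for` with the caller's tenant identifier taken from its authenticated context, never from request input, so the query itself enforces the tenant boundary.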
Additionally, this approach can be combined with the prefix-per-tenant model in a hybrid, where the tenant objects are also partitioned by prefix. Another approach to data partitioning uses S3 object tags to add tenant metadata to each stored object; note that Amazon S3 object tagging incurs additional cost.
If there are other access patterns outside the object access service, such as directly by a customer or an AWS service integration, then it would make sense to add either key prefixes or tags on your objects.
Supporting Tenant Isolation
Tenant isolation is one of the foundational topics that every SaaS provider must address. It’s how your architecture ensures one tenant is prevented from accessing the resources of another tenant. Failure here would represent a significant and potentially unrecoverable event for a SaaS business.
As part of choosing an S3 data partitioning model, you must also consider how a given partitioning model would influence the tenant isolation footprint of your solution. For the bucket-per-tenant and prefix-per-tenant strategies, you can define tenant-specific AWS Identity and Access Management (IAM) policies that will be used to prevent cross-tenant access to resources using service-specific resources, actions, and condition context keys.
The IAM policies that are used for your different partitioning model can be statically created or dynamically generated based on the needs of your SaaS environment and policy size limits. To learn more about this strategy, read Isolating SaaS Tenants with Dynamically Generated IAM Policies.
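As a sketch, a dynamically generated, tenant-scoped policy for the prefix-per-tenant model could look like the following. The bucket name and action list are illustrative; the key points are scoping object actions to the tenant's prefix and constraining `s3:ListBucket` with the `s3:prefix` condition key.

```python
def tenant_s3_policy(bucket: str, tenant_id: str) -> dict:
    # Object actions are limited to the tenant's key prefix; listing is
    # limited via the s3:prefix condition context key.
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "TenantObjectAccess",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{tenant_id}/*",
            },
            {
                "Sid": "TenantListAccess",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{tenant_id}/*"]}},
            },
        ],
    }
```

A policy like this would typically be attached to a scoped session (for example, via STS) when a tenant request is processed, rather than baked into a long-lived role per tenant.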
Endpoint-Based Partitioning and Isolation
Amazon S3 access points are named network endpoints that enable access to S3 objects. This moves us further away from thinking of our S3 data as a series of buckets and/or keys. Instead, the focus shifts to using an access point to control access to each tenant’s data. There is a default quota of 1,000 access points per AWS account.
This approach allows you to define endpoints for individual tenants with policies that can manage access to the objects that are associated with a given tenant. Access points use statically configured IAM policies which support S3 bucket name, object key prefix, or object tag restrictions.
This enables access to S3 objects by other AWS services or accounts while maintaining tenant isolation. Access points work with some, but not all, AWS services and features. They can also allow a SaaS customer direct access to their S3 objects from their own AWS accounts.
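A per-tenant access point policy could be generated along these lines. The access point name, account ID, and role ARN are assumptions for illustration; note that object resources in access point policies use the `accesspoint/<name>/object/<key>` ARN form rather than a bucket ARN.

```python
def tenant_access_point_policy(region: str, account_id: str,
                               access_point: str, tenant_role_arn: str) -> dict:
    # Grant the tenant's role object access through this access point only.
    ap_arn = f"arn:aws:s3:{region}:{account_id}:accesspoint/{access_point}"
    return {
        "Version": "2012-10-17",
        "Statement": [{
            "Sid": "TenantAccessPointObjects",
            "Effect": "Allow",
            "Principal": {"AWS": tenant_role_arn},
            "Action": ["s3:GetObject", "s3:PutObject"],
            "Resource": f"{ap_arn}/object/*",
        }],
    }
```

Because each tenant gets its own access point and policy, a SaaS customer's own AWS account can be named as the principal, enabling the direct cross-account access described above without widening the bucket policy itself.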
Securing Tenant Objects with Encryption Keys
The S3 partitioning model of your SaaS solution may also be influenced by additional security considerations. For some environments, the compliance and data sensitivity needs of an organization may require objects to be further protected through encryption.
Here, the focus is on how we can provide each tenant with a key that protects their data. In these scenarios, Amazon S3 can be used with the AWS Key Management Service (AWS KMS) to provide server-side encryption of S3 objects.
The S3 partitioning model you choose impacts how keys are applied. For example, with the bucket-per-tenant model, you can assign a unique encryption key for each bucket.
With the prefix-per-tenant model, you can still share a root encryption key using envelope encryption to separately encrypt each object. Envelope encryption is the practice of encrypting your plaintext data with a data key, and then encrypting that data key with a root key.
When your application code handles object write requests, you can specify the AWS KMS key to use for encryption by object. You can have up to 10,000 customer-managed keys in each region of your AWS account. You can also optionally provide an additional encryption context per object to support authenticated encryption.
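As a sketch, the per-object encryption parameters could be assembled like this before calling `put_object` (bucket, key, and KMS key ARN are placeholders). The encryption context is passed to S3 as base64-encoded JSON; binding the tenant identifier into it supports the authenticated encryption mentioned above.

```python
import base64
import json

def encrypted_put_args(bucket: str, key: str, body: bytes,
                       tenant_key_arn: str, tenant_id: str) -> dict:
    # SSEKMSEncryptionContext is a base64-encoded UTF-8 JSON string;
    # including the tenant ID ties the ciphertext to this tenant.
    context = base64.b64encode(
        json.dumps({"tenant": tenant_id}).encode("utf-8")).decode("ascii")
    return {
        "Bucket": bucket,
        "Key": key,
        "Body": body,
        "ServerSideEncryption": "aws:kms",
        "SSEKMSKeyId": tenant_key_arn,
        "SSEKMSEncryptionContext": context,
    }

# usage sketch: boto3.client("s3").put_object(**encrypted_put_args(...))
```

Decryption then requires both access to the tenant's KMS key and a matching encryption context, giving a second, cryptographic layer of tenant isolation on top of IAM policies.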
Amazon S3 Bucket Keys can reduce AWS KMS request costs by up to 99% when using AWS Key Management Service for server-side encryption (SSE-KMS). A bucket key is used for a time-limited period within S3, and S3 will only share a bucket key among objects encrypted by the same AWS KMS key. This helps you stay below AWS KMS API request quotas.
Encryption and IAM policies can be combined as part of your overall security and tenant isolation model.
Tenant Activity and Consumption
SaaS solutions are often sold in a pay-as-you-go model where the cost of a product is determined based on the consumption profile of a customer. Tracking cost information at the tenant level allows you to make consumption-based decisions on metering or pricing.
In the bucket-per-tenant model, you can track the storage cost of tenants by labeling the separate S3 buckets using cost allocation tags.
In the prefix-per-tenant approach, you can use Amazon S3 Inventory to track S3 consumption. It contains a list of the objects in the source bucket and metadata for each object. Metadata fields include the key name and size. However, the inventory list does not contain object tags. Inventory can be generated on a daily or weekly basis for an S3 bucket or a shared prefix.
You can set up Amazon S3 Event Notifications for inventory completion. Amazon S3 Inventory incurs additional cost.
You can also use Amazon Athena to quickly analyze and query your S3 inventory. With Athena, you would only pay for the queries you run and are charged based on the amount of data scanned by each query.
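Beyond Athena, a small script can also aggregate per-tenant storage directly from a CSV-formatted inventory report. The sketch below assumes the report was configured with bucket, key, and size fields and that keys use the prefix-per-tenant layout; real inventory reports have configurable fields, so treat the column order as an assumption.

```python
import csv
import io
from collections import defaultdict

def storage_bytes_by_tenant(inventory_csv: str) -> dict:
    # Sum object sizes per tenant, where the tenant is the first
    # path segment of each object key (prefix-per-tenant layout).
    totals = defaultdict(int)
    for bucket, key, size in csv.reader(io.StringIO(inventory_csv)):
        tenant = key.split("/", 1)[0]
        totals[tenant] += int(size)
    return dict(totals)
```

The resulting per-tenant byte totals can feed a metering or billing pipeline, for example when inventory completion triggers the S3 Event Notification mentioned above.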
Server access logging provides detailed records for the requests that are made to an S3 bucket. You can use these logs to profile your S3 bill for both API requests and data transfer costs of tenants. This represents another case where you can use Amazon Athena to analyze and query server access logs. Log record fields include time, bucket, key, bytes sent, and object size, but do not include object tags.
Note that these logs are delivered on a best-effort basis. It’s rare to lose log records, and the log record for a particular request may be delivered long after the request was actually processed.
Lifecycle Management Using Configuration Rules
With Amazon S3, you also have the option of defining the lifecycle of your S3 objects. In SaaS environments, you may choose to have different lifecycle policies applied to different tenants (based on tiers or other application requirements).
These S3 lifecycle management rules can be used to determine when an object transitions to another storage class or expires. For example, a premium-tier tenant might have different transition rules than a basic-tier tenant.
A lifecycle rule can apply to all or a subset of objects in a bucket based on the <Filter> element that you specify in the lifecycle rule. You can filter objects by key prefix, object tags, or a combination of both. An S3 lifecycle configuration can have up to 1,000 rules and this limit is not adjustable.
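As an illustration, tier-specific lifecycle rules filtered by tenant prefix could be generated like this. The tier names, transition days, and target storage class are assumptions for the sketch, not recommended values.

```python
def tier_lifecycle_rule(tenant_id: str, tier: str) -> dict:
    # Premium tenants keep objects in the default storage class longer
    # before transitioning to Glacier (illustrative values).
    transition_days = {"premium": 180, "basic": 30}
    return {
        "ID": f"{tenant_id}-{tier}-transition",
        "Filter": {"Prefix": f"{tenant_id}/"},
        "Status": "Enabled",
        "Transitions": [{
            "Days": transition_days[tier],
            "StorageClass": "GLACIER",
        }],
    }

# usage sketch:
# s3.put_bucket_lifecycle_configuration(
#     Bucket="saas-data",
#     LifecycleConfiguration={"Rules": [tier_lifecycle_rule("tenant-1", "premium")]})
```

Since a configuration is capped at 1,000 rules, a rule-per-tenant approach has the same scaling ceiling as bucket-per-tenant; for large tenant counts, consider one rule per tier combined with object tags instead.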
Additional S3 Bucket Configurations
All Amazon S3 buckets, including those used by multi-tenant SaaS applications, can benefit from the cost management and security configurations suggested in the AWS re:Invent 2021 session Deep dive on Amazon S3 security and access management.
- The Amazon S3 Intelligent-Tiering storage class is designed to optimize storage costs by automatically moving data to the most cost-effective access tier when access patterns change, without operational overhead or impact on performance.
- We recommend you disable access control lists (ACLs) on your S3 buckets.
- With S3 Block Public Access, account administrators and bucket owners can easily set up centralized controls to limit public access to their S3 resources that are enforced regardless of how the resources are created.
Conclusion
Storing multi-tenant data correctly with Amazon S3 is critical for many SaaS applications. This post presented multiple options for data partitioning along with their key considerations, and reviewed strategies that support tenant isolation, such as access policies and encryption. It also covered tenant activity and cost tracking, lifecycle management for objects, and additional bucket security configurations.
About AWS SaaS Factory
AWS SaaS Factory helps organizations at any stage of the SaaS journey. Whether looking to build new products, migrate existing applications, or optimize SaaS solutions on AWS, we can help. Visit the AWS SaaS Factory Insights Hub to discover more technical and business content and best practices.
SaaS builders are encouraged to reach out to their account representative to inquire about engagement models and to work with the AWS SaaS Factory team.
Sign up to stay informed about the latest SaaS on AWS news, resources, and events.