Multi-Tenant Storage with Amazon DynamoDB
Editor’s note: For the latest information, visit the DynamoDB website.
By Tod Golding, Partner Solutions Architect at AWS
If you’re designing a true multi-tenant software as a service (SaaS) solution, you’re likely to devote a significant amount of time to selecting a strategy for effectively partitioning your system’s tenant data.
On Amazon Web Services (AWS), your partitioning options mirror much of what you see in the wild. However, if you’re looking at using Amazon DynamoDB, you’ll find that the global, managed nature of this NoSQL database presents you with some new twists that will likely influence your approach.
Before we dig into the specifics of the DynamoDB options, let’s look at the traditional models that are generally applied to achieve tenant data partitioning. The list of partitioning solutions typically includes the following variations:
- Separate database – each tenant has a fully isolated database with its own representation of the data
- Shared database, separate schema – tenants all reside in the same database, but each tenant can have its own representation of the data
- Shared everything – tenants all reside in the same database and all leverage a universal representation of the data
These options all have their strengths and weaknesses. If, for example, you’d like to support the ability for tenants to have their own data customizations, you might want to lean toward a model that supports separate schemas. If that’s not the case, you’ll likely prefer a more unified schema. Security and isolation requirements are also key factors that could shape your strategy.
Ultimately, the specific needs of your solutions will steer you toward one or more of these approaches. In some cases, where a system is decomposed into more granular services, you may see situations where multiple strategies are applied. The requirements of each service may dictate which flavor of partitioning best suits that service.
With this as a backdrop, let’s look at how these partitioning models map to the different partitioning approaches that are available with DynamoDB.
Linked Account Partitioning (Separate Database)
This model is by far the most extreme of the available options. Its focus is on providing each tenant with its own table namespace and footprint with DynamoDB. While this seems like a fairly basic goal, it is not easily achieved. DynamoDB does not have the notion of an instance or some distinct, named construct that can be used to partition a collection of tables. In fact, all the tables you create in DynamoDB share a single namespace within a given AWS account and region.
Given these scoping characteristics, the best option for achieving this level of isolation is to introduce separate linked AWS accounts for each tenant. To leverage this approach, you need to start by enabling the AWS Consolidated Billing feature. This option allows you to have a parent payer account that is then linked to any number of child accounts.
Once the linked account mechanism is established, you can then provision a separate linked account for each new tenant (shown in the following diagram). These tenants would then have distinct AWS account IDs and, in turn, have a scoped view of DynamoDB tables that are owned by that account.
While this model has its advantages, it is often cumbersome to manage. It introduces a layer of complexity and automation to the tenant provisioning lifecycle. It also seems impractical and unwieldy for environments where there might be a large collection of tenants.
Caveats aside, there are some nice benefits that are natural byproducts of this model. Having this hard line between accounts makes it a bit simpler to manage the scope and schema of each tenant’s data. It also provides a rather natural model for evaluating and metering a tenant’s usage of AWS resources.
Tenant Table Name Partitioning (Shared Database, Separate Schema)
The linked account model represents a more concrete separation of tenant data. A less invasive approach would be to introduce a table naming scheme that adds a unique tenant context to each DynamoDB table. The following diagram represents a simplified version of this approach, prepending a tenant ID (T1, T2, and T3) to each table name to identify the tenant’s ownership of the table.
This model embraces all the freedoms that come with an isolated tenant scheme, allowing each tenant to have its own unique data representation. With this level of granularity, you’ll also find that this model aligns your tenants with other AWS constructs:
- AWS Identity and Access Management (IAM) policies can be scoped to individual tables, allowing you to constrain table access to a given tenant role.
- Amazon CloudWatch metrics can be captured at the table level, simplifying the aggregation of tenant metrics for storage activity.
- Read and write throughput is provisioned at the table level, allowing you to create distinct scaling policies for each tenant.
Provisioning can also be somewhat simpler under this model since each tenant’s tables can be created and managed independently.
The downside of this model tends to be more on the operational and management side. Clearly, with this approach, your operational views of a tenant will require some awareness of the tenant table naming scheme in order to filter and present information in a tenant-centric context. The approach also adds a layer of indirection to any code you might have that is metering tenant consumption of DynamoDB resources.
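As a rough illustration of the naming scheme and the table-level IAM scoping described above, here is a minimal Python sketch. The function names, the underscore separator, and the list of allowed actions are assumptions made for this example; they are not part of any AWS API. The code only builds a table name and a policy document as plain data, and it assumes tenant IDs never contain the separator character.

```python
def tenant_table_name(tenant_id: str, table: str) -> str:
    """Prefix a logical table name with the tenant ID, e.g. T1_Order."""
    return f"{tenant_id}_{table}"


def tenant_table_policy(tenant_id: str, region: str, account_id: str) -> dict:
    """Build an IAM policy document limited to one tenant's tables.

    Hypothetical helper: the action list below is only an example set.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "dynamodb:GetItem",
                    "dynamodb:PutItem",
                    "dynamodb:UpdateItem",
                    "dynamodb:DeleteItem",
                    "dynamodb:Query",
                ],
                # The wildcard matches every table whose name starts with
                # "<tenant_id>_", so T1 cannot reach tables named T2_*.
                "Resource": (
                    f"arn:aws:dynamodb:{region}:{account_id}"
                    f":table/{tenant_id}_*"
                ),
            }
        ],
    }
```

Including the separator in the wildcard matters: a pattern of `T1*` would also match tables that belong to a tenant named T10, while `T1_*` does not.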
Tenant Index Partitioning (Shared Everything)
Index-based partitioning is perhaps the most agile and common technique that is applied by SaaS developers. This approach places all the tenant data in the same table(s) and partitions it with a DynamoDB index. This is achieved by populating the hash key of an index with a tenant’s unique ID.
What this essentially means is that the keys that would typically be your hash key (Customer ID, Account ID, etc.) are now represented as range keys. The following example provides a simplified view of an index that introduces a tenant ID as a hash key. Here, the customer ID is now represented as a range key.
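In code, the key layout might look like the following sketch. It assumes the tenant ID is the hash key of the table’s primary key (a secondary index would follow the same pattern), and the attribute names `TenantId` and `CustomerId`, the function names, and the throughput values are placeholders chosen for this example. The dictionaries follow the parameter shapes of the DynamoDB CreateTable and Query operations, but nothing here calls AWS.

```python
def shared_table_definition(table_name: str) -> dict:
    """CreateTable-style parameters for a table shared by all tenants."""
    return {
        "TableName": table_name,
        "AttributeDefinitions": [
            {"AttributeName": "TenantId", "AttributeType": "S"},
            {"AttributeName": "CustomerId", "AttributeType": "S"},
        ],
        "KeySchema": [
            # The tenant ID partitions the data...
            {"AttributeName": "TenantId", "KeyType": "HASH"},
            # ...and the former hash key becomes the range key.
            {"AttributeName": "CustomerId", "KeyType": "RANGE"},
        ],
        # Placeholder capacity; shared by every tenant in the table.
        "ProvisionedThroughput": {
            "ReadCapacityUnits": 5,
            "WriteCapacityUnits": 5,
        },
    }


def tenant_query(table_name: str, tenant_id: str) -> dict:
    """Query-style parameters that return items for one tenant only."""
    return {
        "TableName": table_name,
        "KeyConditionExpression": "TenantId = :tenant",
        "ExpressionAttributeValues": {":tenant": {"S": tenant_id}},
    }
```

Because the hash key condition is always an equality match, a query built this way is confined to a single tenant’s items.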
This model, where the data for every tenant resides in a shared representation, simplifies many aspects of the multi-tenant model. It promotes a unified approach to managing and migrating the data for all tenants without requiring a table-by-table processing of the information. It also enables a simpler model for performing tenant-wide analytics of the data. This can be extremely helpful in assessing and profiling trends in the data.
Of course, there are also limitations with this model. Chief among these is the inability to have more granular, tenant-centric control over access, performance, and scaling. However, some may view this as an advantage since it allows you to have a more global set of policies that respond to the load of all tenants instead of absorbing the load of maintaining policies on a tenant-by-tenant basis. When you choose your partitioning approach, you’ll likely strike a balance between these tradeoffs.
Another consideration here is that this approach could be viewed as creating a single point of failure. Any problem with the shared table could affect the entire population of tenants.
Abstracting Client Access
Each technique outlined in this blog post requires some awareness of tenant context. Every attempt to access data for a tenant requires acquiring a unique tenant identifier and injecting that identifier into any requests to manage data in DynamoDB.
Of course, in most cases, end-users of the data should have no direct knowledge that their provider is a tenant of your service. Instead, the solution you build should introduce an abstraction layer that acquires and applies the tenant context to any DynamoDB interactions.
This data access layer will also enhance your ability to add security checks and business logic outside of your partitioning strategies, with minimal impact to end-users.
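One possible shape for such a layer is sketched below. Everything in it is hypothetical: `InMemoryStore` stands in for DynamoDB so the example runs without AWS access, and `TenantDataAccess`, its method names, and the `Customer` table are invented for illustration. The point is only that the tenant ID is bound once, when the layer is created, and then applied to every read and write.

```python
from typing import Any, Dict, List


class InMemoryStore:
    """Stand-in for DynamoDB, used only to keep this sketch runnable."""

    def __init__(self) -> None:
        self._tables: Dict[str, List[Dict[str, Any]]] = {}

    def put(self, table: str, item: Dict[str, Any]) -> None:
        self._tables.setdefault(table, []).append(dict(item))

    def query(self, table: str, key: str, value: Any) -> List[Dict[str, Any]]:
        return [dict(i) for i in self._tables.get(table, []) if i.get(key) == value]


class TenantDataAccess:
    """Applies a fixed tenant context to every storage call."""

    def __init__(self, store: InMemoryStore, tenant_id: str) -> None:
        if not tenant_id:
            raise ValueError("a tenant ID is required")
        self._store = store
        self._tenant_id = tenant_id

    def save_customer(self, customer: Dict[str, Any]) -> None:
        item = dict(customer)
        # Overwrite any caller-supplied tenant ID with the bound one.
        item["TenantId"] = self._tenant_id
        self._store.put("Customer", item)

    def list_customers(self) -> List[Dict[str, Any]]:
        # Callers never pass a tenant ID, so they cannot read across tenants.
        return self._store.query("Customer", "TenantId", self._tenant_id)
```

Code that calls `save_customer` or `list_customers` never handles the tenant ID directly, which keeps the partitioning strategy hidden behind the layer and leaves room to change it later.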
Supporting Multiple Environments
As you think about partitioning, you may also need to consider how the presence of multiple environments (development, QA, production, etc.) might influence your approach. Each partitioning model we’ve discussed here would require an additional mechanism to associate tables with a given environment.
The strategy for addressing this problem varies based on the partitioning scheme you’ve adopted. The linked account model is the least affected, since the provisioning process will likely just create separate accounts for each environment. However, with table name and index-based partitioning, you’ll need to introduce an additional qualifier to your naming scheme that will identify the environment associated with each table.
The key takeaway is that you need to think about whether and how environments might influence your entire build and deployment lifecycle. If you’re building for multiple environments, the context of those environments likely needs to be factored into your overall provisioning and naming scheme.
With the shift toward microservice architectures, teams are decomposing their SaaS solutions into small, autonomous services. A key tenet of this architectural model is that each service must encapsulate, manage, and own its representation of data. This means that each service can leverage whichever partitioning approach best aligns with the requirements and performance characteristics of that service.
The other factor to consider is how microservices might influence the identity of your DynamoDB tables. With each service owning its own storage, the provisioning process needs assurance that the tables it’s creating for a given service are guaranteed to be unique. This typically translates into adding some notion of the service’s identity into the actual name of the table.
A catalog manager service, for example, might have a table that is an amalgam of the tenant ID, the service name, and the logical table name. This may or may not be necessary, but it’s certainly another factor you’ll want to keep in mind as you think about the naming model you’ll use when tables are being provisioned.
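A naming helper along these lines could combine the qualifiers discussed so far: environment, tenant ID, service name, and logical table name. The function name, the ordering of the parts, and the underscore separator are assumptions made for this sketch. The validation reflects DynamoDB’s table naming rules (3 to 255 characters drawn from letters, digits, underscore, hyphen, and dot).

```python
import re

# DynamoDB table names: 3-255 characters from a-z, A-Z, 0-9, "_", "-", ".".
_TABLE_NAME = re.compile(r"[A-Za-z0-9_.-]{3,255}")


def qualified_table_name(
    environment: str, tenant_id: str, service: str, table: str, sep: str = "_"
) -> str:
    """Join the qualifiers into one table name, e.g. prod_T1_catalog_Product."""
    parts = [environment, tenant_id, service, table]
    if any(not part for part in parts):
        raise ValueError("every part of the table name must be non-empty")
    name = sep.join(parts)
    if not _TABLE_NAME.fullmatch(name):
        raise ValueError(f"not a valid DynamoDB table name: {name!r}")
    return name
```

For example, `qualified_table_name("prod", "T1", "catalog", "Product")` returns `prod_T1_catalog_Product`. With index-based partitioning the tenant ID lives in the key rather than the table name, so a variant of this helper would simply leave that part out.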
Agility vs. Isolation
It’s important to note that there is no single preferred model for the solutions that are outlined in this blog post. Each model has its merits and applicability to different problem domains.
That being said, it’s also important to consider agility when you’re building SaaS solutions. Agility is fundamental to the success of many SaaS organizations, and it’s essential that teams consider how each partitioning model might influence their ability to continually deploy and evolve both their applications and their business.
Each variation outlined here highlights some of the natural tension that exists in SaaS design. In picking a partitioning strategy, you must balance the simplicity and agility of a fully shared model with the security and variability offered by more isolated models.
The good news is that DynamoDB supports all the mechanisms you’ll need to implement each of the common partitioning models. As you dig deeper into DynamoDB, you’ll find that it actually aligns nicely with many of the core SaaS values.
As a managed service, DynamoDB allows you to shift the burden of management, scale, and availability directly to AWS. The schemaless nature of DynamoDB also enables a level of flexibility and agility that is crucial to many SaaS organizations.
Kicking the Tires
The best way to really understand the merits of each of these partitioning models is to simply dig in and get your hands dirty. It’s important to examine the overall provisioning lifecycle of each partitioning approach and determine how and where it would fit into a broader build and deployment lifecycle.
You’ll also want to look more carefully at how these partitioning models interact with AWS constructs. Each approach has nuances that can influence the experience you’ll get with the console, IAM roles, CloudWatch metrics, billing, and so on. Naturally, the fundamentals of how you’re isolating tenants and the requirements of your domain are also going to have a significant impact on the approach you choose.
Are you building SaaS on AWS? Check out the AWS SaaS Partner Program, an APN Program providing Technology Partners with support to build, launch, and grow SaaS solutions on AWS.