Optimizing SaaS Tenant Workflows and Costs
Designing robust software as a service (SaaS) solutions often requires developers to find new and creative strategies for optimizing their applications. Pouring all your customers into a shared, multi-tenant environment places an even higher premium on identifying opportunities to remove bottlenecks and improve each customer’s experience. The smallest flaw can have a compounding impact that cascades across all your tenants, creating a collective negative impression of your system’s overall stability.
As a SaaS developer, you can approach optimization from many angles. There are certainly basic things that can be done to tweak the configuration of your system’s footprint to improve performance. However, the emphasis in this blog post is on SaaS-specific optimizations that address the complexities introduced by a diverse set of tenants who might be imposing continually evolving loads on your system. In the world of SaaS, the makeup of your customers, their profiles, and their demands on your system can shift all the time. Each tenant might have unique usage patterns associated with how and when they exercise the moving parts of your SaaS environment. Our goal, then, is to identify strategies for reacting and responding to these tenant dynamics in real time.
Amazon Web Services (AWS) offers SaaS developers a rich collection of services that can be applied in a number of different permutations to help optimize a multi-tenant environment. These services represent the dials and knobs that can be leveraged to refine the performance and general profile of your SaaS offering. In this discussion, we will illustrate a few of these models to give you a better sense of what’s possible.
Not All Tenants Are Created Equal
Many SaaS application providers offer their solutions in a tiered model where tenants are provided access to varying levels of performance and functionality based on the tier they have selected. This tiering strategy is often fundamental to SaaS organizations, allowing them to align the cost footprint of tenants with the level of stress they apply and the amount of resources they consume.
You can imagine that the entry-level, $49-per-month tenant may have different expectations than the enterprise customer who is paying substantially more. In some cases, these tiers distinguish your tenants and become points of inflection for introducing optimizations.
Supporting tenant-level optimizations often requires a centralized tenant policy management service. This service enables the introduction of more targeted and more granular policies that can shape each tenant’s experience. With these policies in place, you’ll be able to tailor the performance of your system and align it with the varying requirements of each tenant tier.
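To make the idea concrete, here is a minimal, in-memory sketch of what a tenant policy management service's interface might look like. The tier names and policy fields are hypothetical; a real service would be a highly available, shared service backed by a durable store.

```python
class TenantPolicyManager:
    """Sketch of a centralized tenant policy service (in-memory stand-in)."""

    def __init__(self):
        # Policies keyed by tenant tier; the knobs shown are illustrative.
        self._tier_policies = {
            "basic":      {"max_requests_per_sec": 10,  "cache_enabled": False},
            "enterprise": {"max_requests_per_sec": 100, "cache_enabled": True},
        }
        self._tenant_tiers = {}  # tenant_id -> tier

    def register_tenant(self, tenant_id, tier):
        if tier not in self._tier_policies:
            raise ValueError(f"unknown tier: {tier}")
        self._tenant_tiers[tenant_id] = tier

    def policy_for(self, tenant_id):
        # Unregistered tenants fall back to the entry-level policy.
        tier = self._tenant_tiers.get(tenant_id, "basic")
        return self._tier_policies[tier]

# Example usage
mgr = TenantPolicyManager()
mgr.register_tenant("tenant-1", "basic")
mgr.register_tenant("tenant-2", "enterprise")
```

Any layer of the application can then consult `policy_for()` before deciding how to serve a request, which is what makes tier-specific behavior possible.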
In this blog post, we’ll discuss two optimization strategies that leverage tenant policies: workflow-driven optimization and load-driven optimization. These two examples illustrate how services and policies can be combined to achieve your optimization goals.
Workflow-Driven Optimization
AWS provides developers with a diverse set of storage technologies, each of which has its sweet spot for addressing the varying needs of your SaaS application. These storage solutions are prime candidates for optimizing the experience of your tenants, enabling a unique ability to match storage performance with specific workflows and tenant service-level agreements (SLAs).
The following diagram provides an example of how you might leverage a combination of AWS storage solutions to optimize your SaaS environment. The example looks at how two tenants could be optimized to support the three e-commerce workflows specified in the green box. The fact that users might have different performance expectations for each workflow is what really identifies this area as a candidate for optimization. For example, users might expect that getting their orders from yesterday would be pretty snappy. In contrast, getting orders from 30 days ago might be a bit less responsive, and they might expect that getting their entire order history would take even longer.
Now, if we’ve identified areas where there are clear lines in the user’s performance expectations, we have also identified natural opportunities to leverage separate AWS storage models for each of these scenarios. That’s what has been done here in the diagram.
Let’s start with Tenant 1 on the left. We have applied a separate AWS storage technology for each workflow: This tenant will use Amazon DynamoDB to retrieve orders from yesterday, Amazon Relational Database Service (Amazon RDS) to support requests for orders in the last 30 days, and Amazon Simple Storage Service (Amazon S3) to acquire the full order history.
Tenant 1 represents an entry-level customer who is at one of the starter tiers of our system. Tenant 2, on the other hand, has paid a premium to be in a higher tier and, as such, expects a higher level of throughput for its operations. To accommodate this experience, Tenant 2 will process all order requests for the last 30 days with Amazon DynamoDB, and will use Amazon RDS for the entire order history.
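The tier-to-storage assignments above can be expressed as a simple policy table. This is only a sketch: the tier and workflow names are made up, and the strings would be replaced by actual storage client adapters in a real system.

```python
# Sketch: resolving which storage service backs each workflow, per tier.
# Assignments mirror the Tenant 1 / Tenant 2 example; names are illustrative.
STORAGE_POLICY = {
    # (tier, workflow) -> storage service
    ("basic",   "orders_yesterday"): "dynamodb",
    ("basic",   "orders_last_30"):   "rds",
    ("basic",   "orders_history"):   "s3",
    ("premium", "orders_yesterday"): "dynamodb",
    ("premium", "orders_last_30"):   "dynamodb",
    ("premium", "orders_history"):   "rds",
}

def storage_for(tier, workflow):
    """Look up the storage backend a tenant's tier assigns to a workflow."""
    try:
        return STORAGE_POLICY[(tier, workflow)]
    except KeyError:
        raise ValueError(f"no storage policy for {tier}/{workflow}")
```

Because the mapping is data rather than code, promoting a tenant to a higher tier changes its storage behavior without any application changes.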
Don’t get lost in the details of which AWS services are better for optimization in this instance. Clearly, the specific needs and profile of your application and its storage footprint might be optimized differently. The real emphasis here is on moving away from viewing storage as an all-or-nothing choice. Instead, you can customize your storage optimization strategy based on the different workflows of your application.
The example I’ve outlined illustrates two dimensions of SaaS optimization. First, it stresses the importance of looking at application workflows in isolation. There are often cases where performance can be enhanced dramatically for a vital workflow without requiring you to apply this optimization globally. The more granular view opens the door to applying AWS services that best align with the needs of a given usage scenario. The other dimension the example highlights is the ability to vary your solution on a tenant-by-tenant basis.
You’ll notice that this example introduces the notion of a separate tenant policy management service that is consulted to determine which storage model is used by each tenant. This service would be created as a highly available, shared service that could be accessed by any tier of your solution. It is fundamental to enabling your ability to distinguish the value of each tier of your application and offer optimizations that are specific to a tenant’s profile.
This strategy also relies heavily on the introduction of an abstraction of your data access interface that hides the details of storage from your clients. Generally, this is a good strategy for storage in AWS, because it enables you to use alternate storage strategies without affecting the code of your clients.
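One way to structure that abstraction is a common interface with one adapter per storage technology. The backends below are stubs (a real implementation would wrap the DynamoDB, RDS, and S3 clients), and the resolver shown is a hypothetical stand-in for the tenant policy service.

```python
from abc import ABC, abstractmethod

class OrderStore(ABC):
    """Common interface that hides the storage choice from clients."""
    @abstractmethod
    def get_orders(self, tenant_id, window): ...

class DynamoOrderStore(OrderStore):
    def get_orders(self, tenant_id, window):
        return f"dynamodb:{tenant_id}:{window}"   # stub for a DynamoDB query

class RdsOrderStore(OrderStore):
    def get_orders(self, tenant_id, window):
        return f"rds:{tenant_id}:{window}"        # stub for an RDS query

class OrderDataAccess:
    """Clients call this; the backing store is resolved per tenant policy."""
    def __init__(self, resolver):
        self._resolver = resolver  # (tenant_id, window) -> OrderStore

    def get_orders(self, tenant_id, window):
        store = self._resolver(tenant_id, window)
        return store.get_orders(tenant_id, window)

# Illustrative resolver: recent orders go to DynamoDB, the rest to RDS.
def resolver(tenant_id, window):
    return DynamoOrderStore() if window == "yesterday" else RdsOrderStore()

dal = OrderDataAccess(resolver)
```

Swapping a tenant to a different storage model then only requires changing what the resolver returns; client code never sees the difference.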
Load-Driven Optimization
If you look across the spectrum of tenants who are using your system, you’ll often find, at a given moment in time, that a fraction of your tenants are placing the bulk of the load on your environment. As you look at optimization strategies, you should think about how you can selectively address the needs of this subset of tenants. The goal is to find a balance of adding capacity or throughput that’s just enough to satisfy the needs of these high-volume tenants without the cost of extending this optimization to everyone.
The diagram that follows provides an example of a load-driven optimization model. Here we have a subset of three tenants who are currently attempting to access a catalog of products from an e-commerce system. As you might suspect, the catalog represents relatively static data, so it can often be cached to enhance a shopper’s experience. While it may be ideal to cache every tenant’s catalog, we also know that some tenants’ online stores are much more active than others. In fact, in this example, it turns out that only 20% of our tenants are placing 80% of the load on our system. If we were to simply optimize for these target tenants we could cache their catalogs without needing to support this capability for all tenants.
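Identifying that high-load subset can be as simple as ranking tenants by request volume and keeping the smallest set that covers most of the traffic. The 80% threshold below is illustrative, not prescriptive.

```python
from collections import Counter

def high_load_tenants(request_counts, load_share=0.8):
    """Return the smallest set of tenants covering `load_share` of requests.

    `request_counts` maps tenant_id -> observed request count.
    """
    total = sum(request_counts.values())
    covered, selected = 0, []
    for tenant, count in Counter(request_counts).most_common():
        if covered >= load_share * total:
            break
        selected.append(tenant)
        covered += count
    return selected

# Example: one tenant generates 80% of the traffic.
counts = {"t1": 800, "t2": 100, "t3": 60, "t4": 40}
```

Running this periodically lets the caching decision track real traffic instead of a static tenant list.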
You’ll notice here that our diagram includes an Amazon RDS master and read replicas as well as an Amazon ElastiCache cluster. It also has a separate cache manager that manages the state and freshness of the cached data. This status is stored in a highly available repository (let’s say Amazon DynamoDB), keyed by the IDs of the tenants currently holding data in ElastiCache, along with timestamp data that indicates when each entry was last accessed.
Let’s look at how all these moving parts work together to support our caching strategy. The process starts with a request from Tenant 3 to view catalog data (step 1). The data access layer accepts this request and consults the cache status to determine if the tenant is already in the cache (step 2). In this example, Tenant 3 is not currently in the cache and, as a result, the data access layer acquires the catalog data from RDS (step 3). Now that the data has been acquired, the system will determine if this is a “high load” tenant and, if so, it will push the data to ElastiCache (step 4). The last step is to update the cache status to reflect the cached state of Tenant 3 (step 5). Now, when the next request for Tenant 3’s data arrives, the catalog data will be pulled from the cache instead of RDS.
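The five steps above can be sketched in a few lines. Plain dicts stand in for ElastiCache (`cache`), DynamoDB (`cache_status`), and RDS (`database`), and the high-load check is stubbed; in practice it would come from the cache manager's observed traffic data.

```python
import time

database = {"tenant-3": ["catalog item A", "catalog item B"]}  # "RDS"
cache = {}                                                     # "ElastiCache"
cache_status = {}                                              # "DynamoDB"

def is_high_load(tenant_id):
    return tenant_id == "tenant-3"   # stub for the cache manager's decision

def get_catalog(tenant_id):
    # Step 2: consult the cache status to see if the tenant is cached.
    if tenant_id in cache_status:
        cache_status[tenant_id]["last_access"] = time.time()
        return cache[tenant_id]
    # Step 3: cache miss -- read from the database.
    data = database[tenant_id]
    # Steps 4-5: for high-load tenants, push to cache and record status.
    if is_high_load(tenant_id):
        cache[tenant_id] = data
        cache_status[tenant_id] = {"last_access": time.time()}
    return data

first = get_catalog("tenant-3")    # served from the database, then cached
second = get_catalog("tenant-3")   # served from the cache
```

Note that low-load tenants still get correct results; they simply always take the database path.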
The key is to have a relatively intelligent cache manager. We certainly could have implemented this by having some manual tenant configuration that provided IDs of the tenants that would be cached. That approach, however, feels a bit static. Instead, this model relies on the cache manager to observe the activity of tenants and determine which tenants belong in the cache. This allows the cache to accurately represent and model the behavior of your tenants in real time. It also accommodates changes in tenant activity on a day-to-day and hour-by-hour basis.
Once this model is in place, you have the option to tune the cache manager policies based on any analytics data you may capture. For example, if you determine that more tenants need to be cached, you can simply increase the size of your ElastiCache cluster and update your cache manager to support a deeper queue of cached tenants.
The cache manager also provides the basic mechanisms needed to ensure that the cached items are up-to-date. If relevant data were to change in RDS, the cache manager could mark entries as “stale” and force them to reload on a subsequent request.
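A lazy invalidation scheme along those lines might look like this sketch, where a write marks the cached entry stale and the next read transparently reloads it from the source.

```python
# Dicts again stand in for the cache, its status table, and the database.
cache = {"tenant-3": ["old catalog"]}
cache_status = {"tenant-3": {"stale": False}}
database = {"tenant-3": ["old catalog"]}

def on_catalog_update(tenant_id, new_data):
    """Write path: persist the change and mark any cached copy stale."""
    database[tenant_id] = new_data
    if tenant_id in cache_status:
        cache_status[tenant_id]["stale"] = True   # invalidate lazily

def get_catalog(tenant_id):
    """Read path: serve fresh cache hits; otherwise reload from source."""
    status = cache_status.get(tenant_id)
    if status and not status["stale"]:
        return cache[tenant_id]
    data = database[tenant_id]                    # reload from source
    cache[tenant_id] = data
    if status:
        status["stale"] = False
    return data

on_catalog_update("tenant-3", ["updated catalog"])
result = get_catalog("tenant-3")
```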
Finding Optimization Opportunities
Analytics often represent one of your best sources for capturing and understanding how and where optimizations might be applied to your SaaS solution. Through analytics, you can build more comprehensive, more granular views of how tenants are imposing load on your system. This data often provides insights into trends and patterns of usage that may not be readily apparent.
To maximize your optimization potential, you’ll want to be sure you have tooling and instrumentation in place that will capture both resource consumption and application usage data at a tenant level. System-wide metrics are clearly useful to this process, but the ability to drill down and see tendencies at the tenant level provides an added dimension of data that can be much more helpful in determining where optimizations can add value.
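As a sketch of that kind of instrumentation, the hypothetical recorder below accumulates request counts and latency per tenant and operation. A real system would ship these measurements to an analytics pipeline rather than hold them in memory.

```python
from collections import defaultdict

class TenantMetrics:
    """In-memory sketch of per-tenant usage metrics (illustrative only)."""

    def __init__(self):
        self._requests = defaultdict(int)    # (tenant, op) -> call count
        self._latency = defaultdict(float)   # (tenant, op) -> total seconds

    def record(self, tenant_id, operation, duration_sec):
        key = (tenant_id, operation)
        self._requests[key] += 1
        self._latency[key] += duration_sec

    def avg_latency(self, tenant_id, operation):
        key = (tenant_id, operation)
        if self._requests[key] == 0:
            return 0.0
        return self._latency[key] / self._requests[key]

    def request_count(self, tenant_id):
        # Total calls across all operations for one tenant.
        return sum(n for (t, _), n in self._requests.items() if t == tenant_id)

metrics = TenantMetrics()
metrics.record("t1", "get_catalog", 0.02)
metrics.record("t1", "get_catalog", 0.04)
metrics.record("t2", "get_orders", 0.10)
```

Having both the aggregate view (`request_count`) and the drill-down view (`avg_latency` per operation) is what lets you spot which tenant and which workflow deserve an optimization.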
Optimizing for Cost
Optimization isn’t solely about improving customer experience. Cost is also a key element of any optimization strategy. For some SaaS businesses, it is essential to find creative ways to strike a balance between customer needs and the costs associated with meeting those needs. In the best of worlds, you may find opportunities to reduce costs in ways that have a minimal effect on your tenants’ experience.
In our storage optimization example, the strategy we adopted affected both performance and cost. Distributing storage across multiple AWS services can meaningfully reduce the overall storage costs of the application. The same mindset can be applied to compute, where you could align tenant consumption profiles with the various instance types and pricing models offered by AWS.
Leveraging AWS Services
In this post, I’ve touched on a few ways you might consider optimizing your SaaS solution on AWS. My goal was not to enumerate or classify all the possibilities, but rather to provide insights into the different dimensions of the SaaS optimization landscape.
AWS offers developers a diverse set of services that can be applied to the optimization equation. As you consider the entire surface area of the AWS stack of services, you’re likely to identify a number of different opportunities to enhance both the cost and performance profile of SaaS applications. These enhancements can improve your customers’ experience and provide you with more opportunities for targeting the specific needs of your tenants.