AWS Partner Network (APN) Blog

Testing SaaS Solutions on AWS

Tod Golding is a Partner Solutions Architect (SA) at AWS who focuses on SaaS.

The move to a software as a service (SaaS) delivery model is often motivated by a fundamental need for greater agility and customer responsiveness. SaaS providers often succeed and thrive based on their ability to rapidly release new features without compromising the stability of their solutions. Achieving this level of agility starts with a commitment to building a robust DevOps pipeline that includes a rich collection of automated tests. For SaaS providers, these automated tests are at the core of their ability to effectively assess the complex dimensions of multi-tenant load, performance, and security.

In this blog post, we’ll highlight the areas where SaaS can influence your approach to testing on AWS. In some cases, SaaS will simply extend your existing testing models (load, performance, and so on). In other cases, the multi-tenant nature of SaaS will introduce new considerations that will require new types of tests that exercise the SaaS-specific dimensions of your solution. The sections that follow examine each of these areas and provide insights into how expanding the scope of your tests can add value to SaaS environments.

SaaS Load/Performance Testing

In a multi-tenant universe, your tests go beyond simply ensuring that your system is healthy. They must also ensure that your system can respond effectively to the unexpected variations in tenant activity that are common in SaaS systems, and verify that your application’s scaling policies can keep up with the continually changing peaks and valleys of resource consumption. The unpredictability of SaaS loads, combined with the potential for cross-tenant performance degradation, makes the bar for SaaS load and performance testing much higher. Customers will certainly be unhappy if their system’s performance is periodically affected by the activities of other tenants.

For SaaS, then, the scope of testing reaches beyond performance. It’s about building a suite of tests that can effectively model and evaluate how your system will respond to the expected and the unexpected. In addition to ensuring that customers have a positive experience, your tests must also consider how cost-efficiently your system achieves scale. If you are over-allocating resources in response to activity, you’re likely hurting the bottom line of the business.

The following diagram presents an idealized view of how SaaS organizations prefer to model the connection between load and resource consumption. It shows actual tenant consumption in blue and allocated resources in red. In this model, the application’s resources are allocated and deallocated in lockstep with tenant activity. This is every SaaS architect’s dream: each tenant has a positive experience without any resources being over-committed.

The patterns in this chart represent a snapshot of time on a given day. Tomorrow’s view of this same snapshot could look very different. New tenants may have signed up that are pushing the load in entirely new ways. This means your tests must consider the spectrum of load profiles to verify that changes in tenant makeup and application usage won’t somehow break your scaling policies.

Given this consumption goal and the variability of tenant activity, you’ll need to think about how your tests can evaluate your system’s ability to meet these objectives. The following list identifies some specific areas where you might augment your load and performance testing strategy in a SaaS environment:

  • Cross-tenant impact tests – Create tests that simulate scenarios where a subset of your tenants places a disproportionate load on your system. The goal here is to determine how the system responds when load is not distributed evenly among tenants, and assess how this may affect overall tenant experience. If your system is decomposed into separately scalable services, you’ll want to create tests that validate the scaling policies for each service to ensure that they’re scaling on the right criteria.
  • Tenant consumption tests – Create a range of load profiles (e.g., flat, spiky, random) that track both resource and tenant activity metrics, and determine the delta between consumption and tenant activity. You can ultimately use this delta as part of a monitoring policy that could identify suboptimal resource consumption. You can also use this data with other testing data to see if you’ve sized your instances correctly, have IOPS configured correctly, and are optimizing your AWS footprint.
  • Tenant workflow tests – Use these tests to assess how the different workflows of your SaaS application respond to load in a multi-tenant context. The idea is to pick well-known workflows of your solution, and concentrate load on those workflows with multiple tenants to determine if these workflows create bottlenecks or over-allocation of resources in a multi-tenant setting.
  • Tenant onboarding tests – As tenants sign up for your system, you want to be sure they have a positive experience and that your onboarding flow is resilient, scalable, and efficient. This is especially true if your SaaS solution provisions infrastructure during the onboarding process. You’ll want to verify that a spike in activity doesn’t overwhelm the onboarding process. This is also an area where you may have dependencies on third-party integrations (billing, for example). You’ll likely want to validate that these integrations can meet their SLAs. In some cases, you may implement fallback strategies to handle potential outages of these integrations. In these cases, you’ll want to introduce tests that verify that these fault tolerance mechanisms are performing as expected.
  • API throttling tests – The idea of API throttling is not unique to SaaS solutions. In general, any API you publish should include the notion of throttling. With SaaS, you also need to consider how tenants at different tiers can impose load via your API. A tenant in a free tier, for example, may not be allowed to impose the same load as a tenant in the gold tier. The main goal here is to verify that the throttling policies associated with each tier are being successfully applied and enforced.
  • Data distribution tests – In most cases, SaaS tenant data will not be uniformly distributed. These variations in a tenant’s data profile can create an imbalance in your overall data footprint, and may affect both the performance and cost of your solution. To offset this dynamic, SaaS teams will typically introduce sharding policies that account for and manage these variations. Sharding policies are essential to the performance and cost profile of your solution, and, as such, they represent a prime candidate for testing. Data distribution tests allow you to verify that the sharding policies you’ve adopted will successfully distribute the different patterns of tenant data that your system may encounter. Having these tests in place early may help you avoid the high cost of migrating to a new partitioning model after you’ve already stored significant amounts of customer data.
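
As one illustration of the tenant consumption tests described in the list above, the following Python sketch generates flat, spiky, and random load profiles and measures the delta between allocated and consumed capacity. Everything in it is an assumption made for illustration: the profile shapes, the allocate function (a stand-in for the capacity your real scaling policy would report through your metrics), and the MAX_WASTE budget. It is not a definitive implementation.

```python
import math
import random

MAX_WASTE = 0.4  # hypothetical budget: at most 40% of allocated capacity unused


def flat_profile(steps, level=50):
    """Constant tenant load at every step."""
    return [level] * steps


def spiky_profile(steps, base=20, peak=100, every=10):
    """Low baseline with a one-step burst every `every` steps."""
    return [peak if i % every == 0 else base for i in range(steps)]


def random_profile(steps, low=10, high=100, seed=42):
    """Uniformly random load, seeded so that runs are repeatable."""
    rng = random.Random(seed)
    return [rng.randint(low, high) for _ in range(steps)]


def allocate(load, unit=25):
    """Stand-in scaling policy: round each step's load up to whole capacity units."""
    return [math.ceil(x / unit) * unit for x in load]


def waste_ratio(load, allocated):
    """Fraction of allocated capacity that tenant activity did not consume."""
    return (sum(allocated) - sum(load)) / sum(allocated)


def shortfall_steps(load, allocated):
    """Number of steps where allocation fell below tenant demand."""
    return sum(1 for used, cap in zip(load, allocated) if cap < used)


def test_allocation_tracks_consumption():
    profiles = {
        "flat": flat_profile(60),
        "spiky": spiky_profile(60),
        "random": random_profile(60),
    }
    for name, load in profiles.items():
        allocated = allocate(load)
        # Tenants should never be starved, and waste should stay within budget.
        assert shortfall_steps(load, allocated) == 0, name
        assert waste_ratio(load, allocated) <= MAX_WASTE, name
```

In a real test, the allocated series would come from the capacity metrics of your environment rather than from a function, and the same waste ratio could feed the monitoring policy mentioned above.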

As you can see, this test list is focused on ensuring that your SaaS solution will be able to handle load in a multi-tenant context. Load for SaaS is often unpredictable, and you will find that these tests often represent your best opportunity to uncover key load and performance issues before they impact one or all of your tenants. In some cases, these tests may also surface new points of inflection that may merit inclusion in the operational view of your system.
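
The API throttling tests from the list above could take a shape like the following Python sketch. It assumes a simple fixed-window limiter and hypothetical per-tier limits (TIER_LIMITS). The TieredThrottle class is only a stand-in for whatever throttling layer fronts your API, so treat it as a model of the test, not of any particular service.

```python
from collections import defaultdict

# Hypothetical requests-per-second limits for each tier.
TIER_LIMITS = {"free": 5, "silver": 20, "gold": 50}


class TieredThrottle:
    """Fixed-window rate limiter keyed by tenant, with limits set by tier."""

    def __init__(self, tier_limits):
        self.tier_limits = tier_limits
        self.counts = defaultdict(int)  # (tenant_id, window) -> accepted requests

    def allow(self, tenant_id, tier, now):
        key = (tenant_id, int(now))  # one-second windows
        if self.counts[key] >= self.tier_limits[tier]:
            return False  # a real API would typically answer HTTP 429 here
        self.counts[key] += 1
        return True


def accepted(throttle, tenant_id, tier, attempts, now):
    """Fire `attempts` requests inside one window and count the accepted ones."""
    return sum(throttle.allow(tenant_id, tier, now) for _ in range(attempts))


def test_each_tier_is_capped_at_its_own_limit():
    throttle = TieredThrottle(TIER_LIMITS)
    for tier, limit in TIER_LIMITS.items():
        tenant_id = f"tenant-{tier}"
        assert accepted(throttle, tenant_id, tier, attempts=100, now=0.0) == limit
        # The quota resets in the next window.
        assert accepted(throttle, tenant_id, tier, attempts=100, now=1.0) == limit


def test_one_tenant_cannot_exhaust_anothers_quota():
    throttle = TieredThrottle(TIER_LIMITS)
    accepted(throttle, "tenant-a", "free", attempts=100, now=0.0)
    assert accepted(throttle, "tenant-b", "free", attempts=100, now=0.0) == TIER_LIMITS["free"]
```

Against a deployed system, the same two assertions would be made by sending real requests with each tenant's credentials and counting the throttled responses.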

Tenant Isolation Testing

SaaS customers expect that every measure will be taken to ensure that their environments are secured and inaccessible by other tenants. To support this requirement, SaaS providers build in a number of policies and mechanisms to secure each tenant’s data and infrastructure. Introducing tests that continually validate the enforcement of these policies is essential to any SaaS provider.

Naturally, your isolation testing strategy will be shaped heavily by how you’ve partitioned your tenant infrastructure. Some SaaS environments run each tenant in their own isolated infrastructure while others run in a fully shared model. The mechanisms and strategies you use to validate your tenant isolation will vary based on the model you’ve adopted.

The introduction of AWS Identity and Access Management (IAM) policies provides an added layer of security to your SaaS solution. At the same time, it can add a bit of complexity to your testing model. It’s often difficult to find natural mechanisms to validate that your policies are performing as expected. This is typically addressed by introducing test scripts and API calls that attempt to access tenant resources, with specific emphasis on simulating attempts to cross tenant boundaries.

The following diagram provides one example of this model in action. It depicts a set of resources (Amazon Elastic Compute Cloud (Amazon EC2) instances, Amazon DynamoDB items, and Amazon Simple Storage Service (Amazon S3) buckets) that belong to two tenants. To enforce isolation of these tenant resources, this solution introduces separate IAM policies that will scope and limit access to each resource.

With these policies in place, your tests must now validate the policies. Imagine, for example, that a new feature introduces a dependency on a new AWS resource. When introducing this new resource, the team happens to overlook the need to create the corresponding IAM policies to prevent cross-tenant access to that resource. Now, with good tests in place, you should be able to detect this violation. Without these tests, you have no way of knowing that your tenant isolation model is being accurately applied.
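
One way to automate this kind of check is to ask the IAM policy simulator whether each tenant's principal would be allowed to act on another tenant's resources. The Python sketch below assumes that each tenant maps to its own IAM role (an assumption; the post does not say how tenants are mapped to principals) and that iam_client is a boto3 IAM client. The function name and the dictionary layout are hypothetical. For brevity it ignores paginated results, and the simulator does not replace tests that attempt real cross-tenant calls.

```python
def cross_tenant_violations(iam_client, tenant_role_arns, tenant_resources, actions):
    """Return (caller, owner, action, resource) tuples that IAM would allow
    even though the calling tenant does not own the resource.

    tenant_role_arns: {tenant_id: role ARN}             (hypothetical layout)
    tenant_resources: {tenant_id: [resource ARN, ...]}  (hypothetical layout)
    """
    violations = []
    for caller, role_arn in tenant_role_arns.items():
        for owner, resource_arns in tenant_resources.items():
            if owner == caller:
                continue  # only cross-tenant access is of interest here
            response = iam_client.simulate_principal_policy(
                PolicySourceArn=role_arn,
                ActionNames=actions,
                ResourceArns=resource_arns,
            )
            for result in response["EvaluationResults"]:
                if result["EvalDecision"] == "allowed":
                    violations.append(
                        (
                            caller,
                            owner,
                            result["EvalActionName"],
                            result["EvalResourceName"],
                        )
                    )
    return violations
```

A pipeline step could then assert that cross_tenant_violations(boto3.client("iam"), roles, resources, ["s3:GetObject", "dynamodb:GetItem"]) returns an empty list. If the list of tenant resources is generated from your provisioning data, a newly added resource type that lacks a scoping policy should surface here as a violation.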

As part of isolation testing, you may also want to introduce tests that validate the scope and access of specific application and management roles. For example, SaaS providers often have separate management consoles that have varying levels of access to tenant data. You’ll want to be sure to use tests that verify that the access levels of these roles match the scoping policies for each role.

Tenant Lifecycle Testing

The management of SaaS tenants requires you to consider the full lifecycle of events that may be part of a tenant’s experience. The following diagram provides a sampling of events that are often part of the overall tenant lifecycle.

The left side of this diagram shows the actions that tenants might take, and the right side shows some of the operations that a SaaS provider’s account management team might perform in response to those tenant actions.

The tests you would introduce here would validate that the system is correctly applying the policies of the new state as tenants go through each transition. If, for example, a tenant account is suspended or deactivated, you may have policies that determine how long data is retained for that tenant. These policies may also vary based on the tier of the tenant. Your tests would need to verify that these policies are working as expected.
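
A retention test of this kind might look like the following Python sketch. The tier names, the retention windows in RETENTION_DAYS, and the helper functions are all hypothetical. The point is only to show a test that exercises both sides of the retention boundary for every tier.

```python
from datetime import date, timedelta

# Hypothetical number of days that data is kept after deactivation, by tier.
RETENTION_DAYS = {"free": 30, "standard": 90, "premium": 365}


def purge_date(tier, deactivated_on):
    """First day on which a deactivated tenant's data may be deleted."""
    return deactivated_on + timedelta(days=RETENTION_DAYS[tier])


def should_purge(tier, deactivated_on, today):
    """True once the retention window for the tenant's tier has elapsed."""
    return today >= purge_date(tier, deactivated_on)


def test_retention_window_depends_on_tier():
    deactivated = date(2024, 1, 1)
    for tier, days in RETENTION_DAYS.items():
        last_kept_day = deactivated + timedelta(days=days - 1)
        # Data must survive through the last day of the window...
        assert not should_purge(tier, deactivated, last_kept_day)
        # ...and become eligible for deletion the day after.
        assert should_purge(tier, deactivated, last_kept_day + timedelta(days=1))
```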

A tenant’s ability to change tiers also represents a good candidate for testing, because a change in tiers would also change a tenant’s ability to access features or additional resources. You’ll also want to consider the user experience for tier changes. Does the tenant need to log out and start a new session before their tier change is recognized? All of these policies represent areas that should be covered by your tier tests.

Tier Boundary Testing

SaaS solutions are typically offered in a tier-based model where SaaS providers may limit access to features, the number of users, the size of data, and so on based on the plan a tenant has selected. The system will then meter consumption and apply policies to control the experience of each tenant.

This tiering scheme is a good candidate for testing in SaaS environments. SaaS teams should create tests that validate that the boundaries of each tier are being enforced. This typically requires simulating configuration and consumption patterns that will exceed the boundary of a tier and validating that the policies associated with that boundary are correctly triggered. The policies could include everything from limiting access to sending notifications.
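
As a minimal sketch of a tier boundary test, the Python below models a per-tier user limit and checks that exceeding it is rejected and triggers a notification. The Tenant class, the limits in TIER_MAX_USERS, and add_user are hypothetical stand-ins for your own metering and policy code.

```python
from dataclasses import dataclass, field

# Hypothetical user limits per tier; real values would come from your plan definitions.
TIER_MAX_USERS = {"free": 3, "standard": 25}


@dataclass
class Tenant:
    tenant_id: str
    tier: str
    users: set = field(default_factory=set)
    notifications: list = field(default_factory=list)


def add_user(tenant, user):
    """Add a user unless the tier's limit is reached; record a notification on rejection."""
    if user in tenant.users:
        return True  # already present, so no additional consumption
    if len(tenant.users) >= TIER_MAX_USERS[tenant.tier]:
        tenant.notifications.append(f"user limit reached for tier {tenant.tier}")
        return False
    tenant.users.add(user)
    return True


def test_user_limit_is_enforced_at_the_tier_boundary():
    for tier, limit in TIER_MAX_USERS.items():
        tenant = Tenant(tenant_id=f"tenant-{tier}", tier=tier)
        # Everything up to the boundary is accepted without notifications.
        assert all(add_user(tenant, f"user-{i}") for i in range(limit))
        assert tenant.notifications == []
        # The first request past the boundary is rejected and triggers the policy.
        assert add_user(tenant, "one-too-many") is False
        assert len(tenant.users) == limit
        assert len(tenant.notifications) == 1
```

The same pattern of filling up to the limit, stepping over it, and asserting on the policy's side effects applies to other boundaries such as data size or feature access.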

Fault Tolerance Testing

Fault tolerance is a general area of concern for all solutions. It’s also an area that is addressed in depth by the industry with solid guidance, frameworks, and tools. The bar for fault tolerance in SaaS applications is very high. If your customers are running on shared infrastructure and that environment is plagued by availability problems, these problems will be visible to your entire population of customers. Naturally, this can directly impact your success as a SaaS provider.

It’s beyond the scope of this blog post to dig into the various strategies for achieving better fault tolerance, but we recommend that you add this to the list of testing areas for your SaaS environment. SaaS providers should invest heavily in adopting strategies that can limit or control the scope of outages and introduce tests that validate that these mechanisms are performing as expected.
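
To make the idea concrete, here is one possible shape for such a test in Python, using the billing integration mentioned earlier in this post as the failing dependency. The billing client interface, the BillingUnavailable exception, the onboard_tenant function, and the "pending_billing" state are all hypothetical. The sketch only shows how a test double can force an outage and verify that the fallback path contains it.

```python
class BillingUnavailable(Exception):
    """Raised by the (hypothetical) billing client when the provider is down."""


class FailingBilling:
    """Test double that simulates a billing outage."""

    def create_account(self, tenant_id):
        raise BillingUnavailable(tenant_id)


class HealthyBilling:
    """Test double that records the accounts it was asked to create."""

    def __init__(self):
        self.accounts = []

    def create_account(self, tenant_id):
        self.accounts.append(tenant_id)


def onboard_tenant(tenant_id, billing, retry_queue):
    """Onboard a tenant; if billing is down, queue a retry instead of failing."""
    try:
        billing.create_account(tenant_id)
    except BillingUnavailable:
        retry_queue.append(tenant_id)
        return "pending_billing"
    return "active"


def test_onboarding_survives_billing_outage():
    retry_queue = []
    assert onboard_tenant("t1", FailingBilling(), retry_queue) == "pending_billing"
    assert retry_queue == ["t1"]


def test_onboarding_completes_when_billing_is_healthy():
    billing, retry_queue = HealthyBilling(), []
    assert onboard_tenant("t2", billing, retry_queue) == "active"
    assert billing.accounts == ["t2"]
    assert retry_queue == []
```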

Using Cloud Constructs

Much of the testing that we’ve outlined here is made simpler and more cost effective on AWS. With AWS, you can easily spin up environments and simulate loads against those environments. This allows you to introduce tests that mimic the various flavors of load and performance you can expect in your SaaS environments. Then, when you’re done, you can tear these environments down just as quickly as you created them.

Testing with a Multi-Tenant Mindset

SaaS multi-tenancy brings with it a new set of load, performance, isolation, and agility considerations—each of which adds new dimensions to your testing mindset. This blog post provided a sampling of considerations that might shape your approach to testing in a SaaS environment. Fortunately, testing SaaS solutions is a continually evolving area with a rich collection of AWS and partner tools. These tools can support your efforts to build a robust testing strategy that enhances the experience of your customers while still allowing you to optimize the consumption of your solution.