Optimizing Cost Per Tenant Visibility in SaaS Solutions

By Ujwal Bukka, Sr Partner Solutions Architect – AWS

One of the top challenges in operating a software-as-a-service (SaaS) solution is measuring the resource consumption of individual tenants in order to understand their usage patterns, attribute costs, and more. The dynamic nature of SaaS environments, along with shifting needs, makes this even more challenging.

In this blog, I will discuss strategies for measuring tenant resource consumption and give examples of how you can apply these strategies to your SaaS environment to arrive at a cost allocation. I will also briefly touch on how you can use the tenant resource consumption data you gather to optimize your SaaS architecture, improve the operational footprint of your SaaS environment, and drive business decisions.

Measuring tenant resource consumption is crucial for determining the cost of operating your SaaS environment, calculating cost per tenant, profiling the activity and consumption patterns of tenants, and gathering related insights.

These insights are valuable for business teams, who make strategic decisions about how to build, sell, and market your SaaS application, and for technical teams, who can use this data to make strategic decisions about how to design, scale, and operate the various components of your SaaS application.

Measuring Resource Consumption Challenges

Depending on the industry for which you are building your SaaS solution, you may have tenants who share resources (pool model) and tenants who require a separate set of resources just for them (silo model). While this affects how you measure tenant resource consumption, it is not the only dimension to consider.

In some cases, measuring resource consumption also involves application-level metrics, such as application programming interface (API) request counts and transaction counts, which are needed to calculate the cost per tenant. Refer to this blog to understand the challenges of measuring resource consumption and attributing cost in more detail.

Strategies to Measure Resource Consumption

Measuring resource consumption requires you, as a SaaS provider, to create a consumption mapping model that presents a clear view of how tenants are consuming the resources of your system, as shown in Figure 1.

Figure 1 – Tenant resource consumption mapping

The goal is to arrive at a collection of data that will allow you to allocate a percentage of consumption to each tenant of your system. Assembling this view of consumption can be challenging in a multi-tenant environment where tenants might share some or all of the system’s resources. Refer to this workshop to understand how you can collect tenant consumption data, derive the percentage of consumption for each tenant, and calculate cost per tenant.

There is no single model for measuring tenant resource consumption that fits all architectures. Rather, there are some common strategies you should consider when selecting an approach for your application. First, look at the overall cost profile of your SaaS environment and determine how your application is influencing the costs in your bill. You might observe that some parts of your architecture, such as data storage or compute, contribute most to your bill. To get the most significant cost insights, start by gathering data about these parts. There is little value in spending time on areas that contribute less to your bill.

To get visibility into the cost-heavy components, you can choose between two approaches, or combine them.

Coarse grained approach: a less invasive approach that is used to approximate tenant consumption.
Fine grained approach: an approach where you add metering instrumentation, publish cost-related events, and aggregate and summarize consumption metrics.

Both approaches can be used at the infrastructure (Amazon Web Services) level or at the application level. Choosing between them comes down to balancing the level of consumption detail you’re after against the complexity of instrumenting and capturing the data needed to attribute consumption. We recommend deciding which approach is most suitable at the component level (that is, per microservice or piece of functionality), rather than using one approach for the whole SaaS environment. Now that we have introduced the approaches, let’s review them in more detail.

Coarse Grained Approach

This approach is about approximating tenant consumption based on general tenant activity. You can apply the approximate tenant resource consumption to billing data to calculate the cost per tenant. For example, let’s say your SaaS application includes a data storage element.

Ideally, you would find out the amount of data stored by each tenant and then calculate cost per tenant. As an approximation, however, you can capture the number of active users for each tenant interacting with the data storage, treat that as tenant activity, infer approximate tenant consumption from it, and apply it to your billing data to calculate cost per tenant.

For example, if tenant one accounts for 10% of the total active users of your SaaS application, you can apportion approximately 10% of your SaaS application’s billing charges to tenant one.
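The apportioning arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function name, tenant names, user counts, and bill total are all hypothetical, and a real calculation would draw these values from your own activity and billing data.

```python
def apportion_by_active_users(active_users, total_cost):
    """Split total_cost across tenants in proportion to their active users."""
    total_users = sum(active_users.values())
    if total_users == 0:
        raise ValueError("no active users recorded")
    return {
        tenant: total_cost * users / total_users
        for tenant, users in active_users.items()
    }


# Hypothetical figures for illustration only.
active_users = {"tenant-1": 10, "tenant-2": 60, "tenant-3": 30}
cost_per_tenant = apportion_by_active_users(active_users, total_cost=1000.0)

for tenant, cost in cost_per_tenant.items():
    print(f"{tenant}: {cost:.2f}")
```

Here tenant-1 has 10 of the 100 active users, so it is apportioned 10% of the bill.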

This approach maps the number of users or the frequency of calls to tenant activity, infers tenant consumption from that activity, and uses the result to calculate cost per tenant. In a given SaaS application, the number of users or calls might not correlate precisely with consumption, but it can be a reasonable compromise.

You can also use AWS services like AWS Cost and Usage Reports, which contain a comprehensive set of cost and usage data, to approximate tenant resource consumption. If you tag your resources and activate these user-defined tags as cost allocation tags in the AWS Billing console, AWS Cost and Usage Reports will include your tags. You can then group this data by tag to arrive at approximate cost and usage values.
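As a rough sketch of the grouping step, the following Python reads report rows in CSV form and sums cost per tenant tag. The column names are assumptions: they follow the CSV naming pattern for unblended cost and user-defined tags, combined with a hypothetical tag key named TenantId. Check the headers of your own report before relying on them. The sample rows are made up.

```python
import csv
import io
from collections import defaultdict

# Assumed column names; verify against the headers in your own report.
COST_COLUMN = "lineItem/UnblendedCost"
TENANT_TAG_COLUMN = "resourceTags/user:TenantId"  # hypothetical tag key


def cost_by_tenant_tag(report_file):
    """Sum the cost column for each value of the tenant tag column."""
    totals = defaultdict(float)
    for row in csv.DictReader(report_file):
        # Rows without a tenant tag are collected under "untagged".
        tenant = row.get(TENANT_TAG_COLUMN) or "untagged"
        totals[tenant] += float(row[COST_COLUMN] or 0)
    return dict(totals)


# Made-up rows standing in for a downloaded report file.
sample_report = io.StringIO(
    "lineItem/UnblendedCost,resourceTags/user:TenantId\n"
    "12.50,tenant-1\n"
    "7.25,tenant-2\n"
    "2.50,tenant-1\n"
    "4.00,\n"
)
totals = cost_by_tenant_tag(sample_report)
print(totals)
```

Keeping an explicit "untagged" bucket makes it visible how much of the bill is not yet attributed to any tenant.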

Fine Grained Approach

This approach is about capturing detailed tenant data by using your application and/or AWS services. It starts by introducing metrics instrumentation across the stack of your solution, and collecting consumption metrics of various resources in your architecture. For example, Figure 2 shows how you can use metrics instrumentation infrastructure to collect application microservices consumption metrics and then measure tenant consumption.

Figure 2 – Capturing detailed tenant consumption with metrics instrumentation infrastructure

Here, application code will capture detailed metrics data about how tenants are consuming the service and its related resources. The application code will publish the metric data as an event and you will capture these events using the metrics instrumentation infrastructure made up of the following core AWS services:

Amazon CloudWatch
AWS Lambda
Amazon Kinesis Data Firehose
Amazon Simple Storage Service (Amazon S3)

You will then be able to aggregate these published events data and analyze them based on your own modeling strategy to arrive at a distribution of consumption across tenants and then derive cost per tenant.
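To make the idea of publishing and aggregating metric events more concrete, here is a minimal Python sketch. The event fields (tenant_id, service, metric_name, value, unit, timestamp) and both function names are hypothetical choices for illustration, not a format defined by any AWS service. In a real system the serialized events would be sent to your metrics instrumentation infrastructure rather than kept in a list.

```python
import json
import time
from collections import defaultdict


def build_metric_event(tenant_id, service, metric_name, value, unit):
    """Serialize one consumption metric together with its tenant context."""
    return json.dumps({
        "tenant_id": tenant_id,
        "service": service,
        "metric_name": metric_name,
        "value": value,
        "unit": unit,
        "timestamp": int(time.time()),
    })


def consumption_share(events, metric_name):
    """Return each tenant's fraction of the total for one metric."""
    totals = defaultdict(float)
    for raw in events:
        event = json.loads(raw)
        if event["metric_name"] == metric_name:
            totals[event["tenant_id"]] += event["value"]
    grand_total = sum(totals.values())
    if not grand_total:
        return {}
    return {tenant: value / grand_total for tenant, value in totals.items()}


# Hypothetical events from a hypothetical order-service microservice.
events = [
    build_metric_event("tenant-1", "order-service", "ExecutionTime", 300, "ms"),
    build_metric_event("tenant-2", "order-service", "ExecutionTime", 100, "ms"),
    build_metric_event("tenant-1", "order-service", "ExecutionTime", 100, "ms"),
]
shares = consumption_share(events, "ExecutionTime")
print(shares)
```

The resulting shares are the distribution of consumption across tenants, which you can then apply to the cost of the corresponding service.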

Depending on what best supports your needs, you can use other tools or technologies to build the metrics instrumentation infrastructure. Refer to this blog to understand how to build metrics instrumentation infrastructure and capture detailed metrics.

In some cases, you can collect fine grained data using only data gathered by AWS services like Amazon CloudWatch and AWS X-Ray (rather than using application code). For example, if your application uses Amazon DynamoDB, you can query your Amazon CloudWatch logs to get the list of calls that were made to the database, or use the Amazon DynamoDB API to get the number of capacity units consumed by each call to an Amazon DynamoDB table.
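As a sketch of the DynamoDB case, operations such as GetItem, PutItem, and Query accept a ReturnConsumedCapacity parameter and, when it is set, include a ConsumedCapacity element with a CapacityUnits value in the response. The hypothetical TenantCapacityTracker class below accumulates those values per tenant. It assumes your application already knows which tenant each call was made for, and it works on response-shaped dictionaries so that it runs without an AWS connection. Batch operations return a list of ConsumedCapacity entries, which this sketch does not handle.

```python
from collections import defaultdict


class TenantCapacityTracker:
    """Accumulate DynamoDB capacity units per tenant (hypothetical helper)."""

    def __init__(self):
        self._units = defaultdict(float)

    def record(self, tenant_id, response):
        # Single-item and query responses carry a ConsumedCapacity dict
        # when the request set ReturnConsumedCapacity.
        consumed = response.get("ConsumedCapacity") or {}
        self._units[tenant_id] += consumed.get("CapacityUnits", 0.0)

    def totals(self):
        return dict(self._units)


tracker = TenantCapacityTracker()

# Made-up dictionaries shaped like DynamoDB responses. In an application
# these would come from calls made with ReturnConsumedCapacity="TOTAL".
tracker.record("tenant-1", {"Item": {}, "ConsumedCapacity": {"TableName": "orders", "CapacityUnits": 0.5}})
tracker.record("tenant-2", {"ConsumedCapacity": {"TableName": "orders", "CapacityUnits": 2.0}})
tracker.record("tenant-1", {"ConsumedCapacity": {"TableName": "orders", "CapacityUnits": 1.0}})
tracker.record("tenant-2", {"Item": {}})  # capacity reporting not requested

capacity_by_tenant = tracker.totals()
print(capacity_by_tenant)
```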

Finally, group the metrics by tenant and store this information in your data store of choice. Refer to this blog to better understand how to implement this approach.

Siloed Resources

In some cases, due to compliance, regulatory, or isolation requirements, you may need to provision a separate set of resources just for one tenant. Depending on customer requirements, you can provision these silo resources in separate AWS accounts, or within a single AWS account using AWS constructs like Amazon Virtual Private Cloud (Amazon VPC) and AWS Identity and Access Management (IAM) policies and roles to enforce tenant isolation.

In a scenario of an account per tenant, measuring resource consumption is straightforward. All the resources consumed in that AWS account can be attributed to that tenant. That said, remember that this format does not provide fine grained visibility.

When provisioning silo resources in a single account, you can use AWS tags to tag the resources. Tagging the resources with a unique tenant ID helps you differentiate between each tenant’s data. A potential next step is to use AWS Cost and Usage Reports in that AWS account, which present a breakdown of your costs and resource usage by tag.

Pooled Resources

In a pooled component, multiple tenants share resources. Here, measuring resource consumption by tenant is challenging, but the strategies we discussed above come in handy. Always pick the areas that contribute the most to your AWS bill when measuring resource consumption. Then, based on your requirements, use a coarse grained or a fine grained approach to calculate resource consumption.

Depending on your SaaS solution’s architecture, choose the dimension of tenant activity that best models tenant resource consumption. For example, suppose you want to capture the amount of data stored by each tenant in a database. Here, you can apply the coarse grained approach described above and either query the database for the amount of data that belongs to a certain tenant, or capture the number of times a tenant has interacted with the database, to understand the approximate tenant consumption.

More often, you may need to capture more detailed information on tenant consumption. For example, if your SaaS environment has CPU-intensive workloads, you can use compute execution time to model tenant consumption.

Here, you can have your microservices call an abstracted library that publishes compute execution time metrics along with tenant context. As described for the fine grained approach in Figure 2, you can then build a metrics instrumentation process that captures this metric information and stores it. Then, depending on your strategy, you can use this metric data to calculate tenant resource consumption and cost per tenant.
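One way such an abstracted library might look is a small context manager that times a block of work and publishes the result with the tenant context. Everything named here (measure_execution_time, publish_metric, the in-memory published_metrics list, and the event fields) is hypothetical. A real publish_metric would forward the event to your metrics instrumentation infrastructure. Note that time.perf_counter measures elapsed wall-clock time, which is used here as a proxy for compute time.

```python
import time
from contextlib import contextmanager

# Hypothetical in-memory sink standing in for the metrics pipeline.
published_metrics = []


def publish_metric(event):
    """Placeholder publisher; a real one would send the event downstream."""
    published_metrics.append(event)


@contextmanager
def measure_execution_time(tenant_id, operation):
    """Time the enclosed block and publish the result with tenant context."""
    start = time.perf_counter()
    try:
        yield
    finally:
        # Publish even if the block raised, so failed work is still metered.
        elapsed_ms = (time.perf_counter() - start) * 1000
        publish_metric({
            "tenant_id": tenant_id,
            "operation": operation,
            "metric_name": "ExecutionTime",
            "value": elapsed_ms,
            "unit": "ms",
        })


# Usage inside a microservice handler, with a stand-in workload.
with measure_execution_time("tenant-1", "generate_report"):
    sum(i * i for i in range(10_000))

print(published_metrics[0]["tenant_id"], published_metrics[0]["unit"])
```

Wrapping the timing in a shared helper keeps the tenant context handling in one place, so individual microservices do not each reimplement it.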

AWS Usage Data Collection

A SaaS solution may use various AWS services. The way resources are provisioned and consumed by tenants varies by service. Hence, the approach to measure tenant resource consumption will vary too.

You may need to build a specific process for aggregating or collecting data generated by an AWS service which would then be used to measure tenant resource consumption. For example, for storage services, you may need to build a process which collects input/output operations per second (IOPS) and storage footprint data to measure tenant resource consumption. The approach will be different for other services like compute services.

Application-Level Usage Data Collection

As described previously, you can measure resource consumption by type of resource and by AWS service. But in some cases, it’s valuable to understand the resource consumption patterns from an application perspective. Application-level resource consumption provides more insight into how various tenants are consuming your overall architecture resources, and sheds light on application usage patterns.

For example, application-level resource consumption data collection involves capturing detailed application-level metrics such as the API call count per tenant, the number of items retrieved from the database per tenant, and the time a request takes to complete at the compute layer per tenant. A metrics collection and ingestion mechanism comes in handy here, as shown in this blog: the metrics instrumentation infrastructure can capture these metrics from different areas of your architecture and store them in a common repository.

Then, based on your business strategy, you will build a model where you will calculate total tenant consumption data by using some or all of these captured individual metrics. Use application-level tenant consumption data to calculate cost per tenant by using AWS cost analytics tools (like AWS Cost and Usage Reports, AWS Cost Explorer) or other partner tools like CloudZero or Stripe.
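As one possible shape for such a model, the Python sketch below combines several captured metrics into a single consumption share per tenant using weights, then applies the shares to a total cost. The metric names, weights, tenant figures, and cost are all hypothetical. The weighting itself is a business decision, and this weighted average is only one of many ways to combine metrics.

```python
def tenant_consumption_shares(metrics_by_tenant, weights):
    """Combine per-metric shares into one weighted share per tenant."""
    # Total of each weighted metric across all tenants.
    totals = {
        name: sum(m.get(name, 0) for m in metrics_by_tenant.values())
        for name in weights
    }
    # Ignore metrics with no recorded activity so shares still sum to 1.
    active = {name: w for name, w in weights.items() if totals[name] > 0}
    weight_sum = sum(active.values())
    if weight_sum == 0:
        raise ValueError("no consumption recorded for any weighted metric")
    return {
        tenant: sum(
            w * metrics.get(name, 0) / totals[name] for name, w in active.items()
        ) / weight_sum
        for tenant, metrics in metrics_by_tenant.items()
    }


# Hypothetical metrics and weights for illustration only.
metrics_by_tenant = {
    "tenant-1": {"api_calls": 800, "items_read": 200, "compute_ms": 7500},
    "tenant-2": {"api_calls": 200, "items_read": 800, "compute_ms": 2500},
}
weights = {"api_calls": 0.25, "items_read": 0.25, "compute_ms": 0.5}

shares = tenant_consumption_shares(metrics_by_tenant, weights)
total_cost = 2000.0  # hypothetical monthly cost of the shared services
cost_per_tenant = {tenant: total_cost * share for tenant, share in shares.items()}
print(cost_per_tenant)
```

With these figures, tenant-1 ends up with a 62.5% share and tenant-2 with 37.5%, because compute time carries half of the total weight.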

Ways to Use Tenant Resource Consumption Data

Tenant consumption data is valuable information. As we mentioned, this data helps not only with understanding how to charge your customers, but is also helpful for driving business decisions such as defining product tiers, understanding the popularity of existing or newly-launched product features, and shaping your product roadmap.

On the technical side, this data can be used to improve the operational footprint of your SaaS environment too. Ingesting this data into operational dashboards or views can provide different insights, such as the system’s ability to respond to the continuously changing loads of your multi-tenant environment. It can help you identify the tenants that are most actively consuming the system’s resources, and understand how tenants are placing load on the key elements of your architecture.

You can use this data to optimize your architecture and proactively manage tenant health, identify tenants that may be approaching SLA thresholds, identify tenants whose experience is being throttled, and identify other impending architectural issues. Refer to the “SaaS Metric Visualizations” section of this blog to understand in more detail how you can build different visualizations using tenant resource consumption data. Finally, you can also use tenant consumption data to optimize your current SaaS architecture in other ways, such as refining scaling, tiering, and throttling strategies, to create a more efficient multi-tenant architecture.

Conclusion

Although there is no single way to measure tenant resource consumption, you can use the strategies we discussed in this blog to build a solution that fits your architecture and provides insights into your tenant resource consumption. The AWS Well-Architected SaaS Lens Cost Optimization pillar (which extends the AWS Well-Architected Framework’s Cost Optimization pillar) provides additional guidance on this topic.

As a next step, you can use the SaaS Lens to evaluate your current architecture from a cost perspective and get recommendations for optimizing the cost of your SaaS solution. Based on feedback from SaaS Lens users, this is a common challenge SaaS providers are struggling with today.

About AWS SaaS Factory

AWS SaaS Factory helps organizations at any stage of the SaaS journey. Whether looking to build new products, migrate existing applications, or optimize SaaS solutions on AWS, we can help. Visit the AWS SaaS Factory Insights Hub to discover more technical and business content and best practices.

SaaS builders are encouraged to reach out to their account representative to inquire about engagement models and to work with the AWS SaaS Factory team.