
AWS Edge Services - Multi-tenant SaaS deployments
Multi-tenant deployments of CloudFront require careful design to meet business requirements such as flexibility, cost, scalability and operational overhead.
- How do you differentiate endpoints across tenants? By path (e.g. saas.com/tenant1, saas.com/tenant2)? By subdomain (e.g. tenant1.saas.com, tenant2.saas.com)? By domain name (e.g. www.tenant1.com, www.tenant2.com)? If you differentiate using domain name, do you provide apex domains to tenants (e.g. tenant1.com, tenant2.com)?
- Who controls the domain name? For example, do you support tenants managing their own DNS, and pointing to your endpoints? How does it shape the TLS certificate issuance workflow?
- Do tenants share the same cacheable content? If so, how do you serve shared cacheable content (e.g. from a single domain, or from the tenant domain)? How do you manage cache?
- How do you map tenants to your back-ends? Is it a single type of back-end (e.g. ALB) or a mix (e.g. some on ALBs and some on Lambdas)? Is it on the same host (e.g. single ALB or single S3 Bucket) or different hosts?
- How do you route tenant requests to the appropriate back-end? Do you need to rewrite the URL sent to the back-end, and if so, how?
- Do you provide different tiers of your solution? If so, how do you implement each tier?
- How do you roll out configuration changes at scale, in a way that balances speed and safety?
- How do you enable per-tenant visibility to facilitate troubleshooting, generate per-tenant billing, and enable analytics with a tenant dimension?
- How do you scale your tenant base while honoring relevant AWS quotas and controlling costs?
- CloudFront SaaS Manager. It is a native feature of CloudFront that simplifies the management of tenant endpoints at scale. Notably, it provides a parameterized template that can be implemented by thousands of tenant distributions, with the possibility to customize each tenant distribution (e.g. origin routing). It simplifies the TLS certificate issuance workflow with ACM's HTTP domain validation method. Pricing is tiered based on the number of tenants, e.g. beyond 200 tenants, it's $0.10 per distribution tenant per month.
- All tenants served from a single CloudFront distribution. This is a very simple solution to manage when all tenants share a single domain that you control (e.g. saas.com/tenant1, saas.com/tenant2 or tenant1.saas.com, tenant2.saas.com). However, it increases the blast radius of config changes, requires additional effort to get per-tenant visibility from access logs, and leaves little room for per-tenant customization.
- Each tenant is served from a dedicated CloudFront distribution. In this solution, you have the best control of each tenant's configuration, allowing you to implement advanced customizations. However, it requires more effort to manage at scale, such as sophisticated CI/CD pipelines to deploy changes while respecting the rates of the CloudFront control plane APIs, and careful management of CloudFront quotas (e.g. number of distributions per account).
- Delivering tenant traffic using Global Accelerator. In certain cases, using CloudFront to deliver tenant endpoints is not an option, for example if the endpoint is not HTTP(S), or if there is a strict requirement to manage TLS certificates within a specific AWS Region. In this case, Global Accelerator improves endpoint performance and security, while you remain fully responsible for managing everything else on your back-end, such as tenant routing, domain management, and TLS termination.
- Arc XP uses Lambda@Edge to route tenants across S3 Buckets. Note that this architecture can be optimized by using CloudFront Functions. At the time of writing that case study, CloudFront Functions did not support origin routing.
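The dedicated-distribution option above depends on deploying changes in batches so that the CloudFront control plane is not throttled. The following is a minimal sketch of that idea, not a production pipeline. The names `roll_out` and `apply_change`, the batch size and the pause are all hypothetical; `apply_change` stands in for whatever call your pipeline makes to update one distribution.

```python
import time
from typing import Callable, Iterable, List, Sequence


def batched(items: Sequence[str], size: int) -> Iterable[Sequence[str]]:
    """Yield consecutive slices of at most `size` items."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def roll_out(
    distribution_ids: Sequence[str],
    apply_change: Callable[[str], bool],  # hypothetical: updates one distribution
    batch_size: int = 5,                  # assumption: tune to your API rate limits
    pause_seconds: float = 1.0,           # assumption: pause between batches
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Apply a change batch by batch and stop after the first failing batch.

    Returns the IDs that were updated successfully.
    """
    updated: List[str] = []
    batches = list(batched(distribution_ids, batch_size))
    for index, batch in enumerate(batches):
        results = [(dist_id, apply_change(dist_id)) for dist_id in batch]
        updated.extend(dist_id for dist_id, ok in results if ok)
        if not all(ok for _, ok in results):
            break  # halt so a bad change does not reach later tenants
        if index < len(batches) - 1:
            sleep(pause_seconds)  # stay under the control-plane request rate
    return updated
```

Stopping at the first failing batch keeps the blast radius of a bad change limited to the tenants already processed, which is the main safety benefit of this model over a single shared distribution.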
Criterion | CloudFront SaaS Manager | All tenants served from a single CloudFront distribution | Each tenant is served from a dedicated CloudFront distribution |
---|---|---|---|
Change management | Simple, per tenant or across tenants. | Simple, but only possible for all tenants, thus increasing the blast radius of errors. | Possible per tenant or for all tenants, but requires automation with batched roll-outs to avoid throttling at CloudFront API level |
Domain management | Very flexible | More suited for a single domain (e.g. *.saas.com or saas.com) | Very flexible |
TLS certificate issuance workflow | Flexible using DNS based or HTTP based domain validations | DNS based domain validation | DNS based domain validation |
Quotas | Up to 10K tenants, and can be increased | Unlimited | Up to 200 tenants per AWS Account, and can be increased |
Routing to back-ends | Native using parameters | Using edge functions when needed | Static |
Cache | Per tenant | Per tenant or common to tenants | Per tenant |
Cache Invalidation | Per tenant | Depends on the cache key definition | Per tenant |
Logging | Centralized, but with visibility per tenant using the tenant id field | Centralized, but with visibility per tenant using the host header field | Per tenant |
Dashboards | Per tenant and global | Global | Per tenant |
Customization | Some customization per tenant, such as WebACLs. | No customization per tenant | Full customization per tenant |
Connection pool management | Managed using connection groups | Same connection pool for all tenants | Each tenant has a dedicated connection pool |
Incremental cost | None for the first 10 tenants, $20/month for up to 200 tenants, then $0.10/month per tenant | None | None |
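To make the incremental cost of CloudFront SaaS Manager concrete, here is a small sketch of the tiering described in this post: free for the first 10 tenants, a flat $20 per month up to 200 tenants, then $0.10 per distribution tenant per month. The function name is hypothetical, and it assumes the $0.10 rate applies only to tenants beyond the 200th, on top of the flat fee. Check the current CloudFront pricing page before relying on these numbers.

```python
def saas_manager_monthly_cost_cents(tenants: int) -> int:
    """Estimate the monthly SaaS Manager fee in US cents for a tenant count.

    Assumption: tenants beyond 200 are billed at 10 cents each, in addition
    to the flat 2000-cent fee that covers tenants 11 to 200.
    """
    if tenants < 0:
        raise ValueError("tenants must be non-negative")
    if tenants <= 10:
        return 0          # free tier
    if tenants <= 200:
        return 2000       # flat $20 per month
    return 2000 + (tenants - 200) * 10


if __name__ == "__main__":
    for count in (10, 200, 1000):
        cents = saas_manager_monthly_cost_cents(count)
        print(f"{count} tenants: ${cents / 100:.2f} per month")
```

Working in integer cents avoids floating-point rounding when the estimate is summed across accounts or months.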
Criterion | Using a WebACL per tenant | Using the same WebACL for multiple tenants |
---|---|---|
Price | WebACL/Rules cost scales linearly with tenant number | WebACL/Rules cost is independent of tenant number |
False positives | A rule update might only cause false positives with a single tenant | A rule update might cause false positives with many tenants |
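The price row above can be illustrated with a rough calculation. This sketch compares the fixed monthly WebACL and rule fees of the two approaches. The function names are hypothetical, and the default fees (5 USD per WebACL, 1 USD per rule) are placeholder assumptions to be replaced with the values from the current AWS WAF pricing page. Request charges are left out because they do not depend on how WebACLs are shared.

```python
def per_tenant_web_acl_cost(
    tenants: int,
    rules_per_acl: int,
    acl_fee_usd: int = 5,   # assumption: placeholder monthly fee per WebACL
    rule_fee_usd: int = 1,  # assumption: placeholder monthly fee per rule
) -> int:
    """Fixed monthly cost when every tenant gets its own WebACL."""
    return tenants * (acl_fee_usd + rules_per_acl * rule_fee_usd)


def shared_web_acl_cost(
    rules_per_acl: int,
    acl_fee_usd: int = 5,
    rule_fee_usd: int = 1,
) -> int:
    """Fixed monthly cost when one WebACL is shared by all tenants."""
    return acl_fee_usd + rules_per_acl * rule_fee_usd


if __name__ == "__main__":
    print(per_tenant_web_acl_cost(tenants=100, rules_per_acl=10))
    print(shared_web_acl_cost(rules_per_acl=10))
```

The first function grows linearly with the number of tenants while the second is constant, which is the trade-off to weigh against the smaller false-positive blast radius of per-tenant WebACLs.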
- CloudFront is configured with wildcard CNAME (*.saas.com) and similarly a wildcard TLS certificate.
- Route 53 is configured with wildcard Alias record (*.saas.com) pointing to the created CloudFront distribution.
- Static content (e.g. the assets of the SPA) is most likely the same for all tenants, and can be served from a single S3 bucket using a specific cache behavior with caching enabled. Check this simple example that showcases this implementation.
- Dynamic content can be served from a single load balancer using another cache behavior with an appropriate cache policy. If you want to allow some caching per tenant, add the host header to the cache key. Also add the host header to the origin request policy, so that it is forwarded to the back-end, which can then identify the tenant the request is intended for.
- On-boarding new tenants is a simple operation on the back-end (e.g. provision the tenant) that does not require any change on CloudFront.
- If you do not need to have the full logs of this tier's traffic (e.g. there is no billing requirement, and only top talkers need to be monitored), you can enable real time logs on the relevant cache behavior, and enable sampling to reduce the cost of visibility.
- Additional tenants do not incur incremental costs, besides the cost of their additional traffic.
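With this wildcard setup, the back-end has to derive the tenant from the forwarded host header. Here is a minimal sketch of that lookup, assuming tenants live one label below a base domain such as saas.com, which matches what a `*.saas.com` wildcard certificate covers. The function name and the default base domain are illustrative, not part of any AWS API.

```python
from typing import Optional


def tenant_from_host(host_header: str, base_domain: str = "saas.com") -> Optional[str]:
    """Return the tenant label for hosts like tenant1.saas.com, else None."""
    host = host_header.strip().lower()
    host = host.split(":", 1)[0]   # drop an optional port
    host = host.rstrip(".")        # drop a trailing dot from a fully qualified name
    suffix = "." + base_domain.lower()
    if not host.endswith(suffix):
        return None                # the apex itself or an unrelated domain
    tenant = host[: -len(suffix)]
    if not tenant or "." in tenant:
        return None                # only a single label below the base domain
    return tenant


if __name__ == "__main__":
    print(tenant_from_host("tenant1.saas.com"))
    print(tenant_from_host("Tenant2.SaaS.com:443"))
    print(tenant_from_host("saas.com"))
```

Matching on the full `.saas.com` suffix rather than a substring avoids treating look-alike hosts such as `evil-saas.com` as tenants.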
- Tenants that need to on-board apex domains (e.g. example.com) require their distribution to be attached to a connection group associated with Anycast static IPs, a paid feature of CloudFront. If you are in control of the DNS, you can instead use alias records in Route 53.
- If your tenants share the same static content, you can implement dedicated domains (e.g. assets1.saas.com), using standalone CloudFront distributions to increase the cache hit ratio across tenants.
- In addition to the native per-tenant personalization capabilities of CloudFront SaaS Manager, such as parameter-based routing to the back-end, you can also personalize using edge functions. The event object of CloudFront Functions provides the tenant distribution ID, allowing you to customize the logic based on the tenant.
- For a paid tier, there is most likely an expectation of baseline security using an AWS WAF WebACL with the most appropriate rules for your application. Consider adding a rate limiting rule per tenant with a custom aggregation key based on the host header to protect your tenants from a noisy neighbor. For tenants that require specific security controls, you can use a per tenant AWS WAF WebACL.
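The per-tenant rate limiting rule mentioned above could look like the following sketch, which builds the rule as a Python dictionary and serializes it to JSON. The field names are intended to follow the AWS WAF (WAFv2) rule JSON format and should be verified against the current API reference. The rule name, metric name, limit and evaluation window are illustrative assumptions.

```python
import json
from typing import Any, Dict


def per_tenant_rate_limit_rule(limit: int = 1000, priority: int = 0) -> Dict[str, Any]:
    """Build a rate-based rule that counts requests separately per Host header."""
    return {
        "Name": "rate-limit-per-tenant-host",      # illustrative name
        "Priority": priority,
        "Statement": {
            "RateBasedStatement": {
                "Limit": limit,                     # requests per evaluation window
                "EvaluationWindowSec": 300,         # assumption: 5-minute window
                "AggregateKeyType": "CUSTOM_KEYS",
                "CustomKeys": [
                    {
                        "Header": {
                            "Name": "host",
                            # Normalize case so Tenant1.saas.com and
                            # tenant1.saas.com share one counter.
                            "TextTransformations": [
                                {"Priority": 0, "Type": "LOWERCASE"}
                            ],
                        }
                    }
                ],
            }
        },
        "Action": {"Block": {}},
        "VisibilityConfig": {
            "SampledRequestsEnabled": True,
            "CloudWatchMetricsEnabled": True,
            "MetricName": "RateLimitPerTenantHost",  # illustrative name
        },
    }


if __name__ == "__main__":
    print(json.dumps(per_tenant_rate_limit_rule(limit=500), indent=2))
```

Because the aggregation key is the host header rather than the client IP, one tenant exceeding the limit is throttled without affecting the request budget of the other tenants behind the same WebACL.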
Any opinions in this post are those of the individual author and may not reflect the opinions of AWS.