Designing a Multi-Tenant SFTP Server with AWS Transfer Family
By Hiroto Sakuraya, Partner Solutions Architect – AWS
Data security is a particularly important topic for multi-tenant software-as-a-service (SaaS) applications that handle customers’ sensitive data.
How to securely segregate tenant data and how to provide data access to customers will vary depending on the SaaS solution’s architecture and its requirements. Creating secure mechanisms to achieve tenant isolation on your own is possible, but it usually requires significant effort and consumes valuable time.
This post explores how SaaS vendors can build secure, scalable, and cost-effective data exchange mechanisms using SFTP (SSH File Transfer Protocol) with AWS managed services.
AWS Transfer Family securely scales your recurring business-to-business file transfers to Amazon Simple Storage Service (Amazon S3) and Amazon Elastic File System (Amazon EFS) using the SFTP, FTPS, and FTP protocols. You don’t need to manage servers or handle scaling. AWS Transfer Family lets you quickly provide your customers with a managed, scalable SFTP endpoint and securely exchange data to and from Amazon S3 or Amazon EFS.
There are several ways SaaS vendors can make data accessible to customers. Today, SaaS providers typically offer a web-based user interface (UI) that displays data in a human-readable format and lets users modify data through forms. In this case, HTTP(S) is the protocol, but some providers may need to exchange data over other protocols such as SFTP because of customer requirements or technical limitations.
SFTP, FTPS, and FTP are common standard protocols for transferring files between a server and a client. Because FTP (File Transfer Protocol) does not encrypt content, you should secure the connection with SSL/TLS (FTPS) or replace FTP with SFTP to protect customers’ sensitive data.
These protocols are sometimes required even if your application follows a web-based SaaS delivery model. For example, if you offer a website hosting solution, you may want to let customers upload content via SFTP so that developers can easily automate their release pipelines with these protocols. Or you may want to synchronize data between your SaaS application and third-party external services such as a CRM, but not all of them support HTTP. In such cases, providing an SFTP endpoint for customers is a viable alternative.
So, what is the best way to build an SFTP server in a multi-tenant way? In a multi-tenant SaaS environment where all tenants share the same underlying infrastructure, you need a robust, highly available, performant, and cost-effective way to serve your tenants’ requirements.
In the following sections, we’ll look at design patterns you can choose to create a multi-tenant SFTP function, taking into account requirements such as tenant isolation, performance, cost, and compliance.
Tenant Isolation Model
First, let’s review the basic SaaS architecture models. They can be divided into two types: silo and pool. The silo model provisions dedicated infrastructure for each tenant. Because nothing is shared, you can draw clear boundaries between tenants.
The pool model, on the other hand, refers to a model where all or part of the infrastructure is shared across all tenants. You can also combine silo and pool on a per-layer basis (such as compute, database, or network) or per microservice. This is called the bridge model.
Figure 1 – Overview of SaaS architecture model.
Both silo and pool models have their own pros and cons, and you must understand their trade-offs. Let’s look at how these models can be applied to the architectural design of AWS Transfer Family.
The silo model is more likely to be adopted by customers with strict compliance requirements because tenant environments are isolated from each other. Another big advantage of this design is that each tenant can have a different configuration.
To start working with AWS Transfer Family, we first create a resource called a “server,” which represents the SFTP server itself. We choose the endpoint type (public or VPC hosted), the identity provider, which protocols are enabled, and so on. Each server has its own unique endpoint, such as “s-a7afxxxxxxxx.server.transfer.us-west-2.amazonaws.com.” Users connecting to the endpoint authenticate with a password or SSH key and, once authenticated, get authorized access to the data stored in Amazon S3 or Amazon EFS.
Figure 2 – Silo model architecture.
If you provision a dedicated server for each tenant, you can choose a different configuration for each tenant, such as the identity provider or a custom hostname.
AWS Transfer Family supports three types of user management. One tenant can use service-managed users, which don’t require the customer to manage their own user identities, while another integrates with an existing Active Directory running in the customer’s on-premises data center. Custom authentication using AWS Lambda is also available, so you can introduce more flexible authentication mechanisms, such as restricting access by source IP address for specific tenants.
The main challenge in this approach is cost. AWS Transfer Family charges an hourly rate for each protocol enabled on each server, plus a charge based on the amount of data transferred. As the number of tenants grows, the cost increases linearly with the number of servers, even when data access is infrequent. If the per-tenant cost cannot be justified, consider a multi-tenant (pooled) design instead.
The same goes for management complexity. The more tenants you onboard, the more environments you have to manage. This approach may work for up to a few dozen tenants (it really depends on the SaaS solution and its architecture), but hundreds or thousands of tenants will likely present a significant challenge. That scalability issue can be a major blocker for business growth.
Another challenge is the service quota. By default, you can create up to 50 servers per AWS account (this quota can be increased). Carefully consider whether this model fits, taking into account not only the number of tenants you support today but also future business growth.
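The following minimal sketch shows how per-tenant configuration might look in practice. It builds the keyword arguments for boto3's transfer create_server call from a tenant configuration. The tenant config dictionary and its keys (identity, lambda_arn, directory_id, endpoint_type) are hypothetical. Only the returned parameter names follow the CreateServer API.

```python
def build_server_params(tenant_id: str, config: dict) -> dict:
    """Return keyword arguments for boto3 transfer.create_server() for one tenant.

    The shape of `config` is a hypothetical convention for this sketch.
    """
    identity = config.get("identity", "SERVICE_MANAGED")
    params = {
        "Domain": "S3",
        "Protocols": ["SFTP"],
        "EndpointType": config.get("endpoint_type", "PUBLIC"),
        "IdentityProviderType": identity,
        # Tag each dedicated server so ownership and cost can be traced per tenant.
        "Tags": [{"Key": "tenant", "Value": tenant_id}],
    }
    if identity == "AWS_LAMBDA":
        # Custom identity provider: Transfer Family invokes this function directly.
        params["IdentityProviderDetails"] = {"Function": config["lambda_arn"]}
    elif identity == "AWS_DIRECTORY_SERVICE":
        # Uses a directory registered in AWS Directory Service.
        params["IdentityProviderDetails"] = {"DirectoryId": config["directory_id"]}
    elif identity != "SERVICE_MANAGED":
        raise ValueError(f"unsupported identity provider: {identity}")
    return params
```

Each dedicated server could then be created with boto3.client("transfer").create_server(**params). Tagging servers per tenant is optional, but it makes per-tenant cost allocation easier in the silo model.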
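A custom Lambda identity provider can enforce per-tenant rules that the built-in options do not offer. One example is a source IP allow-list. The sketch below uses only Python's standard ipaddress module. It assumes each tenant's allowed networks are stored as a list of CIDR strings. Treating an empty list as "no restriction" is an assumption; a stricter design could deny instead.

```python
import ipaddress


def is_source_ip_allowed(source_ip: str, allowed_cidrs: list[str]) -> bool:
    """Return True if source_ip falls inside any of the tenant's allowed CIDRs.

    An empty allow-list means "no restriction" in this sketch (an assumption).
    """
    if not allowed_cidrs:
        return True
    addr = ipaddress.ip_address(source_ip)
    # Membership is False (not an error) when IPv4 and IPv6 are mixed.
    return any(addr in ipaddress.ip_network(cidr, strict=False) for cidr in allowed_cidrs)
```

The Lambda function would run this check against the sourceIp field of the incoming request and return an empty response (deny) when it fails.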
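The linear growth is easy to see with a back-of-the-envelope model: endpoint hours (servers × enabled protocols × hourly rate) plus data transferred. In this sketch the rates are parameters rather than hard-coded values, so current regional pricing can be plugged in. The 730-hour month is an approximation.

```python
def monthly_cost(servers: int, protocols_per_server: int, hourly_rate: float,
                 gb_transferred: float, per_gb_rate: float, hours: float = 730) -> float:
    """Rough monthly Transfer Family estimate: endpoint hours plus data transfer.

    Rates are inputs; look up current pricing for your Region.
    730 hours is an approximate month.
    """
    endpoint_cost = servers * protocols_per_server * hourly_rate * hours
    data_cost = gb_transferred * per_gb_rate
    return endpoint_cost + data_cost
```

Comparing monthly_cost(n_tenants, 1, rate, gb, per_gb) for a silo design with monthly_cost(1, 1, rate, gb, per_gb) for a pool design shows the point. The data term is the same in both cases, and the silo design pays the endpoint term (n_tenants - 1) more times, whether or not those tenants transfer any data.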
The pool model brings benefits such as simple manageability through centralized shared infrastructure and cost optimization through efficient capacity utilization. In this model, you must introduce a strict tenant isolation scheme. It’s also necessary to take measures against noisy neighbor problems.
In the pool model, you need to provision only one server. All tenants share a single endpoint and authenticate against a common identity provider. While the silo model is stronger in terms of configuration flexibility and compliance requirements, this shared infrastructure model can provide significant cost savings and reduce operational complexity.
In this model, the hostname is shared by multiple tenants, but you can also let each tenant access the same server with a unique name by registering a CNAME record in Amazon Route 53.
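The CNAME record can be expressed as a ChangeBatch for Route 53's change_resource_record_sets API. The sketch below is minimal, and the hostnames are hypothetical placeholders. UPSERT creates the record if it does not exist and updates it otherwise.

```python
def cname_change_batch(tenant_host: str, server_endpoint: str, ttl: int = 300) -> dict:
    """Build a ChangeBatch mapping a tenant hostname to the shared server endpoint."""
    return {
        "Comment": f"SFTP alias for {tenant_host}",
        "Changes": [{
            "Action": "UPSERT",  # create or update the record
            "ResourceRecordSet": {
                "Name": tenant_host,
                "Type": "CNAME",
                "TTL": ttl,
                "ResourceRecords": [{"Value": server_endpoint}],
            },
        }],
    }
```

Usage would look like boto3.client("route53").change_resource_record_sets(HostedZoneId=zone_id, ChangeBatch=cname_change_batch("tenant1.sftp.example.com", "s-a7afxxxxxxxx.server.transfer.us-west-2.amazonaws.com")). The per-tenant name is cosmetic. SSH does not tell the server which hostname the client dialed, so tenant identification and isolation still rely on authentication and authorization.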
Figure 3 – Pool model architecture.
Just as we considered the service quota for the silo model, here we should evaluate whether a single server can properly handle requests from all tenants. When resources are shared by multiple tenants, we should always pay attention to availability and throughput, because each tenant’s activity varies and is unpredictable. For details, read this AWS blog post: Performance Efficiency in AWS Multi-Tenant SaaS Environments.
For AWS Transfer Family, a server scales automatically based on load up to the maximum number of concurrent sessions per server (10,000 at the time of writing). If traffic exceeds this level, you should consider introducing throttling or balancing load across multiple servers.
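If traffic outgrows a single server, one hypothetical way to spread tenants across several pooled servers is to assign each tenant to a server deterministically. The sketch below hashes the tenant ID with SHA-256 and takes the result modulo the number of servers. The server IDs are placeholders.

```python
import hashlib


def assign_server(tenant_id: str, server_ids: list[str]) -> str:
    """Deterministically map a tenant to one of several pooled servers."""
    if not server_ids:
        raise ValueError("at least one server is required")
    digest = hashlib.sha256(tenant_id.encode("utf-8")).digest()
    # Use the first 8 bytes of the digest as an unsigned integer bucket key.
    index = int.from_bytes(digest[:8], "big") % len(server_ids)
    return server_ids[index]
```

Simple modulo hashing remaps most tenants whenever a server is added. In practice, you might record each tenant's assigned server in tenant metadata at onboarding time instead, and give tenants that server's endpoint.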
Authentication and Authorization
Now that we have a basic understanding of the architecture design for each isolation model, let’s look at authentication and authorization, digging deeper into the multi-tenant design.
In this example, we use both SSH key-based authentication and username/password-based authentication. We manage users with a custom identity provider. This authentication type allows SaaS vendors to integrate any external identity provider or perform arbitrary additional checks, such as restrictions by source IP address.
Figure 4 – Authentication flow.
The figure above shows the high-level authentication flow. When a user tries to connect to an endpoint, AWS Transfer Family forwards the request to a Lambda function that is responsible for authentication. If authentication succeeds, the function must construct a response in the documented format and return it to the Transfer Family server. For details, see the documentation.
If you have built this architecture before, you may wonder why there is no Amazon API Gateway. Using a custom identity provider used to require integration with API Gateway, but an update now lets Transfer Family call the Lambda function directly. Still, proxying requests through API Gateway is useful for customers who want to integrate AWS WAF. For details, see this AWS blog post: Securing AWS Transfer Family with AWS Web Application Firewall and Amazon API Gateway.
In this sample, we manage each tenant’s users with AWS Secrets Manager. Please see this blog post for implementation details. In a nutshell, we store the user credentials (password, public key) and other accompanying information (IAM role, home directory, source IP address) in a single secret keyed by the tenant identifier (which is also the username used when connecting).
For example, secret “mysolution/tenant1” looks like:
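Here is a sketch of what such a secret might contain. The key names below are assumptions that the authentication Lambda function would need to agree on. The account ID and IP range are documentation placeholders.

```python
import json

# Hypothetical contents of the secret "mysolution/tenant1". Key names are
# assumptions; they only need to match what the authentication Lambda reads.
tenant1_secret = {
    "Password": "example-password",              # checked for password logins
    "PublicKey": "ssh-ed25519 AAAA... tenant1",  # returned for key-based logins
    "Role": "arn:aws:iam::111122223333:role/mysolution-tenant1-sftp",
    "HomeDirectory": "/mysolution-shared-bucket/tenant1",
    "AcceptedIpNetwork": "203.0.113.0/24",       # optional source IP restriction
}

# Secrets Manager stores the value as a string, so serialize it as JSON.
secret_string = json.dumps(tenant1_secret)
```

The secret could then be registered with boto3.client("secretsmanager").create_secret(Name="mysolution/tenant1", SecretString=secret_string).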
If you need more than one user per tenant, you can extend the key, for example to “mysolution/tenant1/user1”, for better manageability. AWS Secrets Manager secret names can contain slashes, so a hierarchical naming structure works naturally.
The backend Lambda function is a shared service. Because this function does not access AWS resources associated with a specific tenant, such as a database, there’s no need to handle tenant isolation here. Requests forwarded by Transfer Family include the serverId, username, password, protocol, and source IP address. A sample request is shown below:
Using the username as the key, the function retrieves the secret from AWS Secrets Manager and performs authentication. If the password field is not empty, it checks whether the password matches. If the field is empty, the user is attempting key-based authentication, so the function simply returns the public key associated with that user. If authentication succeeds, the response must include an AWS Identity and Access Management (IAM) role.
How you design the policy depends on your data partitioning strategy. You can create an IAM role for each tenant in advance, scoped to the resources that tenant can access, and save its Amazon Resource Name (ARN) in the secret. You can then use the ARN without an additional lookup, since you have already retrieved the secret during authentication. Alternatively, you can use attribute-based access control (ABAC). Note that some components, such as IAM policies, must be refactored for ABAC to work.
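A hypothetical request with placeholder values might look like the following Python dictionary. The fields follow the list above. For key-based logins the password is empty, and the small helper treats a missing field the same way as an empty one.

```python
# Hypothetical sample of the event Transfer Family passes to the identity
# provider Lambda function (all values are placeholders).
sample_event = {
    "serverId": "s-a7afxxxxxxxx",
    "username": "tenant1",
    "password": "example-password",
    "protocol": "SFTP",
    "sourceIp": "203.0.113.10",
}


def is_password_login(event: dict) -> bool:
    """Password logins carry a non-empty password; key-based logins do not."""
    return bool(event.get("password"))
```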
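The decision logic can be sketched as a pure function plus a thin Lambda handler. This sketch assumes the secret layout shown earlier (Password, PublicKey, Role, HomeDirectory) and a hypothetical "mysolution/" secret name prefix. It returns an empty dictionary to deny access. On success it includes the IAM role and a logical home directory.

```python
import hmac
import json

SECRET_PREFIX = "mysolution/"  # hypothetical naming convention for tenant secrets


def build_auth_response(event: dict, secret: dict) -> dict:
    """Return a Transfer Family custom identity provider response, or {} to deny."""
    password = event.get("password", "")
    if password:
        # Password login: compare in constant time and deny on mismatch.
        expected = secret.get("Password", "")
        if not expected or not hmac.compare_digest(password.encode(), expected.encode()):
            return {}
        response: dict = {}
    else:
        # Key-based login: return the stored public key; Transfer Family itself
        # verifies that the client holds the matching private key.
        if not secret.get("PublicKey"):
            return {}
        response = {"PublicKeys": [secret["PublicKey"]]}

    # The IAM role scopes what this tenant can reach in Amazon S3.
    response["Role"] = secret["Role"]
    # Logical home directory: the tenant prefix is presented as "/" (chroot).
    response["HomeDirectoryType"] = "LOGICAL"
    response["HomeDirectoryDetails"] = json.dumps(
        [{"Entry": "/", "Target": secret["HomeDirectory"]}]
    )
    return response


def lambda_handler(event, context):
    """Entry point: look up the tenant secret by username, then authenticate."""
    import boto3  # imported here so build_auth_response can be tested offline

    client = boto3.client("secretsmanager")
    try:
        raw = client.get_secret_value(SecretId=SECRET_PREFIX + event["username"])
    except client.exceptions.ResourceNotFoundException:
        return {}  # unknown user: deny
    return build_auth_response(event, json.loads(raw["SecretString"]))
```

Keeping the decision logic in build_auth_response separates it from the Secrets Manager lookup, so it can be unit-tested without AWS access. A source IP check would fit naturally just before the role is added to the response.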
Including a home directory setting in the response also plays a useful role for tenant isolation.
Data Partitioning and Logical Directory Mapping
For our example, we chose Amazon S3 as the storage service behind AWS Transfer Family. There are several patterns that can be used when designing multi-tenant partitioning. Please see this repository or read the SaaS Storage Strategies whitepaper for details.
Here, we adopt a “prefix per tenant” strategy, where each tenant has a dedicated prefix such as “/tenant1” in a single shared S3 bucket. A tenant can access only objects under its assigned prefix. Prefix-level tenant isolation is simple and easy to understand, and you can use the Resource element of an IAM policy to prevent cross-tenant access.
IAM policy for tenant1 looks like:
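The sketch below builds a policy of that kind as a Python dictionary. It assumes the bucket and prefix names shown in the usage note. Listing is a bucket-level action, so it is narrowed with the s3:prefix condition key, and object-level actions are scoped by the Resource ARN. The action set is deliberately minimal; check the Transfer Family documentation for any additional actions your clients need.

```python
def tenant_policy(bucket: str, tenant_prefix: str) -> dict:
    """Build an IAM policy limiting access to s3://bucket/tenant_prefix/ only."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Listing targets the bucket ARN, so restrict it by requested prefix.
                "Sid": "ListTenantPrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{tenant_prefix}/*", tenant_prefix]}},
            },
            {
                # Object actions are scoped by the object ARN under the tenant prefix.
                "Sid": "ReadWriteTenantObjects",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{tenant_prefix}/*",
            },
        ],
    }
```

For example, tenant_policy("mysolution-shared-bucket", "tenant1") could be attached to the tenant's role with the IAM put_role_policy call, passing the document through json.dumps.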
The goal is that, after authentication, the user lands in the prefix assigned to their tenant, which appears as the root directory. This can be achieved by specifying the home directory.
Specifically, we will use the Logical Directory feature of AWS Transfer Family. This enables you to construct an arbitrary virtual directory structure by mapping the path displayed to the user to the actual S3 path. You can use logical directories to set the user’s root directory to a desired location within your storage hierarchy, by performing what is known as a chroot operation.
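Logical directory mappings are passed as a JSON list of Entry/Target pairs in the HomeDirectoryDetails response field. The hypothetical helper below serializes a mapping of visible paths to S3 targets. A layout spanning multiple buckets can be expressed by pointing entries at different buckets. The bucket names are placeholders.

```python
import json


def logical_directory_details(mappings: dict[str, str]) -> str:
    """Serialize {visible_path: "/bucket/prefix"} into HomeDirectoryDetails JSON.

    Both the visible path (Entry) and the target must be absolute paths.
    """
    entries = []
    for entry, target in mappings.items():
        if not entry.startswith("/") or not target.startswith("/"):
            raise ValueError(f"paths must be absolute: {entry!r} -> {target!r}")
        entries.append({"Entry": entry, "Target": target})
    return json.dumps(entries)
```

For example, logical_directory_details({"/uploads": "/ingest-bucket/tenant1", "/reports": "/reports-bucket/tenant1"}) would present two folders backed by two different buckets, each limited to the tenant's prefix.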
Figure 5 – Logical directory composed of multi buckets.
This feature is a powerful way to restrict access to data. In this mode, users cannot navigate outside the home or root directory you’ve configured for them. Without it, some clients could traverse up the folder hierarchy and reach locations they should not have access to.
In the data structure we are assuming, folders assigned to different tenants sit next to each other at the same level of the hierarchy, which can lead to unexpected cross-tenant access. To prevent this, be sure to introduce this chroot scheme in addition to scoping access with the IAM role.
If you use the same data partitioning model, with an Amazon S3 bucket shared across tenants, the same idea also applies to the silo model. Alternatively, integration with S3 access points is also available.
There is no one-size-fits-all architecture. You can choose silo or pool component by component according to your requirements and combine them in order to optimize architecture.
Onboarding and Automation
Finally, let’s look at onboarding. In this context, onboarding refers to the entire setup process required for new tenants to sign up for a service and start using it. Examples include registering tenant information in the database, creating administrative users, generating a temporary password, and, in the case of the silo model, provisioning dedicated infrastructure for that tenant.
The onboarding process should be automated as much as possible for agility and better operational experience.
Figure 6 – Sample onboarding flow.
You need to set up the necessary resources beforehand so tenants can start communicating via SFTP immediately. The figure above shows an example in which these steps consist of creating an IAM role and corresponding policy, generating an SSH key, registering a secret in AWS Secrets Manager, and sending a notification with a URL for downloading the temporary secret key. You should incorporate this part into your automated onboarding process.
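The following is one possible sketch of the AWS-side steps as a dry-run plan of (service, operation, arguments) tuples. Producing a plan makes the sequence easy to review and test before anything is called. The role and secret naming conventions are hypothetical. The tenant's IAM policy is passed in rather than built here. SSH key generation and the download-URL notification are assumed to happen elsewhere.

```python
import json


def onboarding_plan(tenant_id: str, bucket: str, public_key: str,
                    account_id: str, policy: dict) -> list[tuple[str, str, dict]]:
    """Return ordered (service, boto3 operation, kwargs) steps for one tenant.

    Naming conventions here are hypothetical.
    """
    role_name = f"mysolution-{tenant_id}-sftp"
    role_arn = f"arn:aws:iam::{account_id}:role/{role_name}"
    # Trust policy that lets AWS Transfer Family assume the tenant role.
    trust_policy = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"Service": "transfer.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }],
    }
    secret = {
        "PublicKey": public_key,
        "Role": role_arn,
        "HomeDirectory": f"/{bucket}/{tenant_id}",
    }
    return [
        ("iam", "create_role", {
            "RoleName": role_name,
            "AssumeRolePolicyDocument": json.dumps(trust_policy),
        }),
        ("iam", "put_role_policy", {
            "RoleName": role_name,
            "PolicyName": "tenant-s3-access",
            "PolicyDocument": json.dumps(policy),
        }),
        ("secretsmanager", "create_secret", {
            "Name": f"mysolution/{tenant_id}",
            "SecretString": json.dumps(secret),
        }),
    ]
```

The plan could be executed with a loop such as: for service, op, kwargs in plan: getattr(boto3.client(service), op)(**kwargs). A production workflow would also need idempotency, rollback on partial failure, and retries, because IAM changes are eventually consistent and a new role may not be usable immediately.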
Now you can build a secure and scalable multi-tenant SFTP server using AWS managed services at lower cost and with less operational complexity. By using the methods described in this post, you can make data accessible to customers via SFTP without risking cross-tenant access.
The Logical Directory feature of AWS Transfer Family and the authorization mechanism implemented by AWS Identity and Access Management (IAM) help you reduce undifferentiated heavy lifting and ensure secure tenant isolation. You can use the time saved by relying on managed services to accelerate innovation.
If shared resources are still unacceptable because of compliance or noisy-neighbor concerns, consider architecting with the silo model we’ve introduced. Keep in mind the service quota limits, costs, and operational management challenges that come with it.
Finally, frictionless onboarding is also important because it lets users get more value from your SaaS solution quickly. Remember to focus on the user experience and build the entire solution, including onboarding, to accelerate time to value.