Supporting big data and analytics engagements with AWS Transfer for SFTP
Leap Beyond is a multinational boutique consultancy that specialises in big data. Our goal is to help medium to large enterprises use their data to realise high-impact changes in their businesses and their bottom line. In my role as one of the managing partners, I have special responsibility and interest in how our engineers work with clients to securely and efficiently move and transform data.
A reasonably common scenario for a data-focused consultancy like Leap Beyond is that a client wants to ship sensitive data from their on-premises or cloud environment to the AWS Cloud. There are a number of reasons for this. It may be difficult for the client to work with the data onsite because the tools and compute power they need are not available inside their environment, or because services outside their environment have no access to their data stores. In many cases, they may want to provide an extract of the data rather than the raw sources. In all of these scenarios, the simplest goal is to be able to land sensitive data in an Amazon Simple Storage Service (Amazon S3) bucket under your (the consultancy firm’s) control, so you can use a multitude of services for processing, analysis, and ML, while spinning up compute resources as and when you need them. With AWS Transfer for SFTP, your customers now have an additional option for securely uploading data to your S3 bucket, and in the rest of this post I will talk about how you can take advantage of it.
AWS Transfer for SFTP with Amazon S3
As a securable data store, Amazon S3 is very hard to beat. The cost of storage at rest, and of data transfer in and out, is low and trending toward zero. There are several convenient ways of providing transparent encryption at rest on the server side, and reasonably convenient ways of doing client-side encryption. Access control and auditing can be fine-grained, and unless you do something silly, access can be locked down hard. The bad old days of inadvertently leaving buckets open to the internet are long past. It now takes active effort to open an S3 bucket for public access. To learn more, see the documentation on using Amazon S3 Block Public Access and the short AWS blog post “Learn how to use two important Amazon S3 security features: Block Public Access and S3 Object Lock”.
On its own, Amazon S3 requires your clients to install the AWS Command Line Interface (CLI) tooling or some third-party S3 client software, or to log in to your account using the AWS Management Console to access the bucket via a web browser. In any of these cases, you will need to provide your client with access credentials for your account and securely manage the exchange of those credentials. AWS Transfer for SFTP provides you with another secure option through the use of SSH keys for authentication, while offering the rock-solid and proven security of SFTP.
On top of the great security provided by the combination of SFTP and S3, there are two significant advantages for Leap Beyond and our clients. First, clients who cannot, or would rather not, install and learn to use the AWS Command Line Interface (CLI) to send files can now use more familiar tooling to send us data. Second, the use of SSH key pairs gives the client considerable control over the security of data in transit. The use of key pairs, rather than passwords or other credentials, makes it easy to have dedicated credentials for the file transfer that can be revoked or updated periodically. Data is transferred through an encrypted tunnel, and key material can be exchanged safely between our clients and us, because only the public half of each key pair ever needs to be shared.
Creating an AWS Transfer for SFTP server
The user guide provides the steps to get started, but does not make it entirely clear that there are four steps: create a bucket, create an IAM role and policy, create the SFTP server, and add a user. In principle, all of these can be automated using AWS CloudFormation or Terraform (depending on what you use in your organization).
Create a bucket
The first step is to create a bucket, with all public access turned off. Ideally you want to enable server-side encryption at rest (using either Amazon S3-managed keys or AWS Key Management Service keys), and apply whatever S3 server access logging, versioning, and lifecycle management you want. Remember, this is a standard S3 bucket with nothing different about it, so you can easily adapt your standard bucket policies and infrastructure-as-code tooling. As an aside, I would recommend that the bucket used as a target for SFTP is not where the data is retained. It is straightforward to set up a data pipeline that moves files from the “dropbox” bucket to a “working” bucket when they land. This reduces the risk of the data being publicly exposed. Think of the dropbox bucket as residing in your DMZ, as opposed to buckets used purely for internal access.
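As a rough illustration of the bucket settings described above, the following sketch builds the two configuration documents involved: one that blocks all public access and one that turns on default encryption at rest. The function name and the KMS key alias are hypothetical. The dictionaries follow the shapes accepted by the boto3 S3 client calls `put_public_access_block` and `put_bucket_encryption`, but the sketch itself makes no AWS calls.

```python
import json


def dropbox_bucket_settings(kms_key_id=None):
    """Build public-access and default-encryption settings for a bucket.

    kms_key_id is optional; when omitted, S3-managed keys (AES256) are used.
    """
    public_access_block = {
        "BlockPublicAcls": True,
        "IgnorePublicAcls": True,
        "BlockPublicPolicy": True,
        "RestrictPublicBuckets": True,
    }
    if kms_key_id:
        default_encryption = {
            "SSEAlgorithm": "aws:kms",
            "KMSMasterKeyID": kms_key_id,
        }
    else:
        default_encryption = {"SSEAlgorithm": "AES256"}
    return {
        "PublicAccessBlockConfiguration": public_access_block,
        "ServerSideEncryptionConfiguration": {
            "Rules": [{"ApplyServerSideEncryptionByDefault": default_encryption}]
        },
    }


settings = dropbox_bucket_settings()
kms_settings = dropbox_bucket_settings("alias/example-dropbox-key")  # hypothetical alias
print(json.dumps(settings, indent=2))

# With boto3 installed and credentials configured, these could be applied as:
#   s3 = boto3.client("s3")
#   s3.put_public_access_block(Bucket=name,
#       PublicAccessBlockConfiguration=settings["PublicAccessBlockConfiguration"])
#   s3.put_bucket_encryption(Bucket=name,
#       ServerSideEncryptionConfiguration=settings["ServerSideEncryptionConfiguration"])
```

Keeping the settings as plain data makes them easy to review and to port into CloudFormation or Terraform if that is what your organization uses.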
Create an IAM role and policy
Next, an IAM role and policy are needed that allow the SFTP service to read from and write to the bucket. The user guide gives a clear tutorial of the requirements and process here. I would suggest having a distinct role and policy for each SFTP service and bucket pair. Roles and policies are essentially free, and this will make auditing much simpler. I also strongly recommend that you “chroot” your user via the logical directory functionality released in September 2019, as described in this recent AWS blog post, so it’s secure for you and simpler for them to navigate.
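To make the role and policy concrete, here is a minimal sketch of the two JSON documents involved: a trust policy that lets the Transfer service assume the role, and an access policy scoped to a single bucket. The bucket name and function names are placeholders, and the exact list of S3 actions is an assumption that you should check against the user guide for your use case.

```python
import json


def transfer_trust_policy():
    """Trust policy allowing AWS Transfer for SFTP to assume the role."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Principal": {"Service": "transfer.amazonaws.com"},
                "Action": "sts:AssumeRole",
            }
        ],
    }


def bucket_access_policy(bucket_name):
    """Read/write access limited to one bucket (one policy per bucket)."""
    bucket_arn = f"arn:aws:s3:::{bucket_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Sid": "AllowListingOfBucket",
                "Effect": "Allow",
                "Action": ["s3:ListBucket", "s3:GetBucketLocation"],
                "Resource": bucket_arn,
            },
            {
                "Sid": "AllowObjectAccess",
                "Effect": "Allow",
                "Action": ["s3:PutObject", "s3:GetObject", "s3:DeleteObject"],
                "Resource": f"{bucket_arn}/*",
            },
        ],
    }


trust = transfer_trust_policy()
access = bucket_access_policy("example-dropbox-bucket")  # placeholder name
print(json.dumps(access, indent=2))

# With boto3, these documents would be passed as JSON strings, for example:
#   iam.create_role(RoleName=..., AssumeRolePolicyDocument=json.dumps(trust))
#   iam.put_role_policy(RoleName=..., PolicyName=..., PolicyDocument=json.dumps(access))
```

Generating one policy per bucket from a function like this keeps each role narrowly scoped, which is what makes the per-engagement auditing straightforward.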
Create an SFTP server
The second-to-last step is to create an SFTP server. Again, the user guide is good here, and the creation wizard is clear. The endpoint can be public (which is what you want to expose to a client), or tied to a VPC (which is useful when you have a VPC with a private connection into your on-premises network). In our case, we use “service managed” identities, since we use SSH key pairs, and we tag our servers and users for identification and classification purposes. Optionally, you can assign a host key to your server, preferably one that’s already been registered in your clients’ environments. This will prevent the alarming “Man-In-The-Middle attack” warning from firing on their end when you use different servers across engagements with the same client.
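The choices described above map onto a handful of request parameters. The sketch below assembles them as a plain dictionary in the shape accepted by the boto3 Transfer client's `create_server` call. The function name and tag values are hypothetical, and no AWS call is made.

```python
def sftp_server_request(tags, host_key_pem=None):
    """Build the parameters for a public, service-managed SFTP server.

    tags: dict of tag names to values, used for identification.
    host_key_pem: optional private host key to reuse across engagements.
    """
    request = {
        "EndpointType": "PUBLIC",
        "IdentityProviderType": "SERVICE_MANAGED",
        "Tags": [{"Key": k, "Value": v} for k, v in sorted(tags.items())],
    }
    if host_key_pem is not None:
        # Reusing a host key the client already trusts avoids the
        # host-key-changed warning in their SFTP tooling.
        request["HostKey"] = host_key_pem
    return request


request = sftp_server_request({"client": "example-client", "engagement": "2019-q4"})
print(request)

# With boto3 installed and credentials configured:
#   transfer = boto3.client("transfer")
#   server_id = transfer.create_server(**request)["ServerId"]
```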
Add a user to the service
The final step is to add a user to the service. It is at this point that an association is formed between the SFTP endpoint and the bucket you created. The subtlety of the security model is that the SFTP service itself does not know about the bucket. Instead, the IAM role you created earlier, and supply during user creation, is what governs your users’ access to a particular bucket. A side effect is that you could use different buckets for different users. In addition, the user configuration allows you to specify a home directory in the bucket, which gives different users different virtual roots in the file system. Remember, for additional security, you can “chroot” users to a specific location in your S3 bucket, as described in this AWS blog post.
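As an illustration of how the user ties the server, role, and bucket together, the sketch below builds the parameters for one user whose visible root `/` is mapped to a prefix in the bucket, which is the “chroot” arrangement mentioned above. The parameter names follow the boto3 Transfer client's `create_user` call. The server ID, role ARN, bucket, prefix, and key shown are all placeholders.

```python
def sftp_user_request(server_id, user_name, role_arn, bucket, prefix, ssh_public_key):
    """Build parameters for a user confined to one prefix of one bucket."""
    parts = [bucket] + [p for p in prefix.split("/") if p]
    target = "/" + "/".join(parts)
    return {
        "ServerId": server_id,
        "UserName": user_name,
        "Role": role_arn,  # the role, not the server, grants bucket access
        "HomeDirectoryType": "LOGICAL",
        # The user sees "/" while the service reads and writes under target.
        "HomeDirectoryMappings": [{"Entry": "/", "Target": target}],
        "SshPublicKeyBody": ssh_public_key,
    }


user = sftp_user_request(
    server_id="s-0123456789abcdef0",  # placeholder
    user_name="client-a",
    role_arn="arn:aws:iam::123456789012:role/example-sftp-role",  # placeholder
    bucket="example-dropbox-bucket",
    prefix="client-a/",
    ssh_public_key="ssh-rsa AAAA... client-a@example",  # placeholder
)
print(user["HomeDirectoryMappings"])

# With boto3 installed and credentials configured:
#   boto3.client("transfer").create_user(**user)
```

Because the bucket only appears in the role and in the mapping target, pointing a second user at a different bucket is just a matter of supplying a different role and target.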
At this point, your clients can start sending you data, either at specific intervals or ad hoc. You can use S3 events to automate post-upload processing in your data pipelines for extraction, transformation, and analysis.
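One way to hook into those S3 events is a small handler that reads each notification record and works out which object to move into the working bucket. The sketch below only parses the event and produces a list of planned moves. The bucket names, function names, and sample event are illustrative assumptions, and the actual copy and delete calls are left as comments.

```python
from urllib.parse import unquote_plus


def uploaded_objects(event):
    """Yield (bucket, key) for each object-created record in an S3 event."""
    for record in event.get("Records", []):
        if not record.get("eventName", "").startswith("ObjectCreated:"):
            continue
        s3_info = record["s3"]
        # Object keys arrive URL-encoded in S3 notifications.
        yield s3_info["bucket"]["name"], unquote_plus(s3_info["object"]["key"])


def plan_moves(event, working_bucket):
    """Describe the dropbox-to-working-bucket moves an event implies."""
    return [
        {"source_bucket": bucket, "key": key, "destination_bucket": working_bucket}
        for bucket, key in uploaded_objects(event)
    ]


sample_event = {
    "Records": [
        {
            "eventName": "ObjectCreated:Put",
            "s3": {
                "bucket": {"name": "example-dropbox-bucket"},
                "object": {"key": "client-a/daily+extract.csv"},
            },
        }
    ]
}
moves = plan_moves(sample_event, "example-working-bucket")
print(moves)

# In a real handler, each planned move could be carried out with boto3:
#   s3.copy_object(Bucket=dest, Key=key, CopySource={"Bucket": src, "Key": key})
#   s3.delete_object(Bucket=src, Key=key)
```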
As with any AWS service, it’s important to be aware of the costs involved to ensure you get the best value for money. With AWS Transfer for SFTP, the cost of storage and data transfer is low. The base provisioning cost of the service is $0.30/hour, which adds up to about $216 over a 30-day month. You get rock-solid security, reliability, and availability, married to blisteringly fast uploads and effectively unlimited storage, with negligible management and maintenance costs on your side. If you are providing an always-on service to your clients, this is good value for money indeed.
You should still evaluate the costs of the service rationally. If you are using it just for development and testing, or if client uploads are not a regular occurrence, you may consider keeping the service provisioned only while it is in use. Given that the bucket, roles, and policies do not need to be torn down, you only need to compare two options: does it cost more to tear down and rebuild the AWS Transfer for SFTP server when needed than it does to leave it running? Leaving it running costs about $7.20 a day, or roughly two cups of coffee.
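The comparison can be worked through with a few lines of arithmetic. The sketch below uses the $0.30/hour provisioning rate quoted above. The usage hours, number of rebuilds, and per-rebuild overhead (for example, engineer time) are made-up inputs for illustration, and storage and data transfer charges are ignored.

```python
HOURLY_RATE_USD = 0.30  # base provisioning cost quoted in this post


def provisioned_cost(hours):
    """Cost in USD of keeping the SFTP server provisioned for `hours`."""
    return round(HOURLY_RATE_USD * hours, 2)


def on_demand_cost(hours_in_use, rebuilds, overhead_per_rebuild_usd):
    """Cost of provisioning only while in use, plus rebuild overhead."""
    return round(
        HOURLY_RATE_USD * hours_in_use + rebuilds * overhead_per_rebuild_usd, 2
    )


daily = provisioned_cost(24)
always_on_monthly = provisioned_cost(24 * 30)

# Hypothetical month: four upload windows of 10 hours each, with an
# assumed $10 of effort per tear-down-and-rebuild cycle.
on_demand_monthly = on_demand_cost(hours_in_use=40, rebuilds=4, overhead_per_rebuild_usd=10)

print(f"Always on:  ${always_on_monthly:.2f}/month (${daily:.2f}/day)")
print(f"On demand:  ${on_demand_monthly:.2f}/month")
```

With these assumed inputs the on-demand option is cheaper, but the balance shifts quickly as usage hours or rebuild overhead grow, which is why the always-on option is often the pragmatic choice for a regular client-facing service.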