AWS Partner Network (APN) Blog
How to Use AWS Transfer Family to Replace and Scale SFTP Servers
By Roger Simon, AWS Offering Solution Architect at DXC Technology
By Pierre Merle, Partner Solutions Architect at AWS
At DXC Technology, we see many clients moving on-premises workloads to Amazon Web Services (AWS). However, some critical workloads may need to stay on-premises, leading to a hybrid architecture.
In the financial services domain, a common architecture pattern is a shared services file server that acts as an SFTP or FTP server.
Because these financial applications are not always API driven, data exchange using flat files remains the standard way to share information between applications, even when some of them have been migrated to AWS. While these shared services are not part of the customer’s core applications, they must be migrated with the same level of service.
The core application in this use case generates orders that need to be processed by various applications. If the data exchange process fails, this can lead to major issues for the business, such as delaying a launch or go-live date. It may also affect a customer’s overall migration success.
In this post, we will discuss how DXC addressed migrating this type of server using AWS services like AWS Transfer Family, Amazon Simple Storage Service (Amazon S3), and Amazon Elastic File System (Amazon EFS). We’ll also provide a step-by-step explanation of the proposed solution.
DXC Technology is an AWS Premier Tier Services Partner and AWS Managed Cloud Service Provider (MSP) that helps clients harness the power of innovation. DXC has AWS Competencies in Migration, SAP, and Internet of Things (IoT), and is a member of the AWS Well-Architected Partner Program.
Challenges
To perform a migration of this type, the usual approach is to “lift and shift” the existing on-premises solution, using a commodity server such as an Amazon Elastic Compute Cloud (Amazon EC2) general purpose T2 or T3 instance running a virtualized SFTP server.
This is what initially happened in the customer’s case, and discussing expected requirements with them uncovered a few additional challenges.
The overall solution the customer wanted needed to address:
- Support critical business operations: The solution needs to be highly available and scalable, as file exchange with other applications or third parties can be critical to the overall availability of the services. A standalone T2/T3 instance inside a single AWS Availability Zone (AZ) would not be able to address that.
- Support for new services: Current exchange protocols are SFTP/FTP, but clients want to offer services over newer protocols based on HTTP(S) to increase security. On a T2/T3 instance, this would require installing additional software that must be patched and maintained to comply with security best practices.
- Automation of tasks: Clients are aware that automation can be performed easily in the cloud, particularly on AWS, and they would like automation to run when a new file arrives. This could include copying files to an Amazon EFS file share for older applications that can’t process files directly from S3.
Solution Overview
The solution DXC deployed primarily uses AWS Transfer Family, a fully managed AWS service you can use to transfer files into and out of storage or file systems over the following protocols:
- Secure Shell (SSH) File Transfer Protocol (SFTP) – (AWS Transfer for SFTP)
- File Transfer Protocol Secure (FTPS) – (AWS Transfer for FTPS)
- File Transfer Protocol (FTP) – (AWS Transfer for FTP)
With AWS Transfer Family, you don’t need to install, patch, and maintain file transfer software and operating systems, as AWS takes care of those activities.
To address the challenges outlined above, DXC built the following architecture:
Figure 1 – General architecture of the solution.
First, DXC built a virtual private cloud (VPC) with two subnets in two Availability Zones. It has no internet access and connects to AWS services through VPC endpoints.
In this VPC, the AWS Transfer for SFTP server is accessible through a VPC endpoint spanning both subnets. The AWS Transfer server is backed by an S3 bucket, and an Amazon Route 53 private hosted zone provides a friendly name for the SFTP server.
The solution is based on the following AWS building blocks:
- Amazon S3 provides highly durable and scalable storage.
- AWS Transfer Family is a managed service that provides SFTP endpoints presenting S3 buckets to clients and performs authentication. We’ll use private endpoints that allow only private connections to the SFTP server. Note that FTP access is also possible in private mode, but because this protocol is not encrypted, it should be avoided for security reasons.
- Amazon Route 53 is a managed DNS service that provides private hosted zones and resolvers inside the VPC, allowing the use of user-friendly names.
Figure 2 – Amazon S3 to Amazon EFS Replication Mechanism.
To deliver automation when a new file is received, DXC used:
- Amazon Simple Notification Service (SNS), a managed service used to publish and subscribe to messages between different components. It decouples components and allows them to scale independently. Amazon S3 publishes a notification to an SNS topic whenever a new object arrives.
- An AWS Lambda function that subscribes to this topic and copies each newly created file from S3 to an EFS mount.
All of the AWS resources are provisioned with an AWS CloudFormation template. This ensures the deployment is repeatable across different accounts and regions, that various versions of the deployment can be archived for audit, and that users are able to clean everything up easily if this part needs to be removed later.
Prerequisites
To deploy the solution described here, you should have the following prerequisites:
- An AWS account.
- Administrative rights on that account.
Deploying the Solution
The full solution is provided as an AWS CloudFormation template you can deploy. To get started, download the CloudFormation template locally on your workstation.
Go to the AWS Management Console and open the AWS CloudFormation service. Then, select the region where you want to deploy and choose Create stack with new resources.
Select the file you downloaded as the template file, choose Next, and then name the stack.
Finally, select two different AZs in your chosen region from the values proposed in the parameter drop-down lists. You can leave the rest as is, or change the values according to your needs.
Explanation of Content in the CloudFormation Template
In this next section, we’ll describe step-by-step the solution deployed using the CloudFormation template above.
Step 1: Create an Amazon VPC and Private Hosted Zone
The CloudFormation template will create an Amazon VPC in two AZs with no access to the internet. It will also create an Amazon Route 53 private hosted zone with the myexample.com DNS name.
Figure 3 – Amazon VPC and subnets.
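A minimal sketch of this step, assuming illustrative CIDR ranges and logical names (Vpc, PrivateSubnetA, PrivateSubnetB, PrivateHostedZone); these and the later snippets would sit under the template’s Resources section:

```yaml
Vpc:
  Type: AWS::EC2::VPC
  Properties:
    CidrBlock: 10.0.0.0/16          # example range; no internet gateway is attached
    EnableDnsSupport: true
    EnableDnsHostnames: true

PrivateSubnetA:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref Vpc
    CidrBlock: 10.0.0.0/24
    AvailabilityZone: !Select [0, !GetAZs '']

PrivateSubnetB:
  Type: AWS::EC2::Subnet
  Properties:
    VpcId: !Ref Vpc
    CidrBlock: 10.0.1.0/24
    AvailabilityZone: !Select [1, !GetAZs '']

PrivateHostedZone:
  Type: AWS::Route53::HostedZone
  Properties:
    Name: myexample.com
    VPCs:                            # associating the zone with the VPC keeps it private
      - VPCId: !Ref Vpc
        VPCRegion: !Ref AWS::Region
```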
Step 2: Create Amazon S3 Bucket
Next, we will create the Amazon S3 bucket that will serve as the back end of the SFTP server and host its files.
Ensure this bucket is not publicly accessible and is encrypted with a customer managed encryption key using AWS Key Management Service (AWS KMS). This service makes it easy to create and manage cryptographic keys and control their use across a wide range of AWS services.
Finally, set a notification configuration so that every time a new file is put in the root folder, a message is published to an SNS topic. This is described later in the automation section of this post.
Figure 4 – Amazon S3 notification configuration.
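A sketch of such a bucket, assuming an illustrative customer managed key (SftpKmsKey) and the SNS topic (NewFileTopic) defined later in this post:

```yaml
SftpKmsKey:
  Type: AWS::KMS::Key
  Properties:
    Description: Customer managed key for the SFTP bucket
    EnableKeyRotation: true

SftpBucket:
  Type: AWS::S3::Bucket
  Properties:
    PublicAccessBlockConfiguration:   # block all public access
      BlockPublicAcls: true
      BlockPublicPolicy: true
      IgnorePublicAcls: true
      RestrictPublicBuckets: true
    BucketEncryption:                 # encrypt at rest with the customer managed key
      ServerSideEncryptionConfiguration:
        - ServerSideEncryptionByDefault:
            SSEAlgorithm: aws:kms
            KMSMasterKeyID: !Ref SftpKmsKey
    NotificationConfiguration:        # publish every new object to the SNS topic
      TopicConfigurations:
        - Event: s3:ObjectCreated:*
          Topic: !Ref NewFileTopic
```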
Step 3: Create AWS Transfer and Related Resources
We want to keep the SFTP server fully private, so we set the endpoint type to VPC_ENDPOINT and reference the Amazon VPC endpoint ID.
We’ll use the default identity provider type (SERVICE_MANAGED), so users will be managed by AWS Transfer Family and authenticate with SSH keys. You can also use an external identity provider like Microsoft Active Directory. To learn more, see the documentation.
We want to log all user activity (connections, downloads, uploads), so we specify a logging role that allows AWS Transfer Family to write to Amazon CloudWatch Logs.
Figure 5 – SFTP server configuration.
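One possible shape for the server resource; the VPC endpoint (SftpVpcEndpoint) and logging role (SftpLoggingRole) it references are sketched in the next snippets:

```yaml
SftpServer:
  Type: AWS::Transfer::Server
  Properties:
    Protocols:
      - SFTP
    IdentityProviderType: SERVICE_MANAGED     # users and SSH keys managed by AWS Transfer Family
    EndpointType: VPC_ENDPOINT                # keep the server fully private
    EndpointDetails:
      VpcEndpointId: !Ref SftpVpcEndpoint
    LoggingRole: !GetAtt SftpLoggingRole.Arn  # allows writing to Amazon CloudWatch Logs
```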
Next, we’ll create the SFTP VPC endpoint that users will connect to.
Figure 6 – VPC endpoint configuration.
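An illustrative endpoint definition; the security group (SftpSecurityGroup) is assumed to exist and to allow inbound TCP port 22 from your SFTP clients:

```yaml
SftpVpcEndpoint:
  Type: AWS::EC2::VPCEndpoint
  Properties:
    VpcId: !Ref Vpc
    ServiceName: !Sub com.amazonaws.${AWS::Region}.transfer.server
    VpcEndpointType: Interface
    SubnetIds:                        # one network interface per Availability Zone
      - !Ref PrivateSubnetA
      - !Ref PrivateSubnetB
    SecurityGroupIds:
      - !Ref SftpSecurityGroup
    PrivateDnsEnabled: false
```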
As we want to have a user-friendly name, we declare a record in Amazon Route 53:
Figure 7 – Amazon Route 53 record for SFTP server.
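The record could be declared along these lines; sftp.myexample.com is an example name inside the private hosted zone created earlier:

```yaml
SftpDnsRecord:
  Type: AWS::Route53::RecordSet
  Properties:
    HostedZoneId: !Ref PrivateHostedZone
    Name: sftp.myexample.com
    Type: CNAME
    TTL: '300'
    ResourceRecords:
      # DnsEntries returns "hostedzoneid:dnsname" pairs; keep only the DNS name
      - !Select [1, !Split [':', !Select [0, !GetAtt SftpVpcEndpoint.DnsEntries]]]
```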
Finally, we create the Amazon CloudWatch Log Group where AWS Transfer Family will push the connection logs, along with the AWS Identity and Access Management (IAM) role and policy that allow AWS Transfer Family to do so.
Figure 8 – Amazon CloudWatch Log Group to store the connection logs.
The corresponding template code creates the SFTP log access role and policy.
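A sketch of what that log group, role, and policy could look like (SftpLogGroup and SftpLoggingRole are illustrative names):

```yaml
SftpLogGroup:
  Type: AWS::Logs::LogGroup
  Properties:
    LogGroupName: !Sub /aws/transfer/${SftpServer.ServerId}   # where AWS Transfer Family writes
    RetentionInDays: 30

SftpLoggingRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service: transfer.amazonaws.com
          Action: sts:AssumeRole
    Policies:
      - PolicyName: SftpLogAccessPolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Effect: Allow
              Action:
                - logs:CreateLogGroup
                - logs:CreateLogStream
                - logs:DescribeLogStreams
                - logs:PutLogEvents
              Resource: '*'
```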
We also need an SFTP user to connect, so we set its home directory. Be careful with the home directory declaration style.
Next, set the IAM role (named SftpAccessRole here) that will be assumed by AWS Transfer Family when the user connects.
Figure 9 – sftpuser user.
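An illustrative user definition; the home directory uses the path style /bucket-name/prefix, and the SSH public key is a placeholder to replace with the user’s real key:

```yaml
SftpUser:
  Type: AWS::Transfer::User
  Properties:
    ServerId: !GetAtt SftpServer.ServerId
    UserName: sftpuser
    Role: !GetAtt SftpAccessRole.Arn                  # assumed by AWS Transfer Family at connection time
    HomeDirectory: !Sub /${SftpBucket}/home/sftpuser  # path-style home directory
    SshPublicKeys:
      - ssh-rsa AAAAB3Nza... sftpuser@example         # placeholder public key
```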
Next, we need to define the SftpAccessRole that AWS Transfer Family will assume for the user.
Figure 10 – SftpAccessRole.
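The role itself can be little more than a trust policy for the transfer.amazonaws.com service principal, for example:

```yaml
SftpAccessRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service: transfer.amazonaws.com           # only AWS Transfer Family can assume the role
          Action: sts:AssumeRole
```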
This IAM role has the SftpAccessPolicy attached, which gives the required rights to put, get, and delete files in the root folder of the bucket.
As our bucket is encrypted, we also need to allow encryption and decryption with the KMS key.
The corresponding template code defines SftpAccessPolicy.
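One way such a policy might be written; the bucket and key references assume the SftpBucket and SftpKmsKey resources sketched earlier:

```yaml
SftpAccessPolicy:
  Type: AWS::IAM::Policy
  Properties:
    PolicyName: SftpAccessPolicy
    Roles:
      - !Ref SftpAccessRole
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Sid: AllowListing
          Effect: Allow
          Action:
            - s3:ListBucket
            - s3:GetBucketLocation
          Resource: !GetAtt SftpBucket.Arn
        - Sid: AllowObjectReadWriteDelete
          Effect: Allow
          Action:
            - s3:PutObject
            - s3:GetObject
            - s3:GetObjectVersion
            - s3:DeleteObject
          Resource: !Sub '${SftpBucket.Arn}/*'
        - Sid: AllowKmsForEncryptedBucket          # required because the bucket uses SSE-KMS
          Effect: Allow
          Action:
            - kms:Encrypt
            - kms:Decrypt
            - kms:GenerateDataKey
          Resource: !GetAtt SftpKmsKey.Arn
```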
Step 4: Automation When a File is Written in S3
For older applications, manipulating files in Amazon S3 may not be easy. A good way to share files between servers is to use an Amazon EFS file system, because it provides standard NFS mount points to access its content. Let’s copy each file to EFS as soon as it arrives in S3.
The best candidate to do that is a Lambda function. Lambda is serverless and highly available by design, so we don’t have to provision an Amazon EC2 instance to perform this activity.
The Lambda function needs to have access to EFS and the Amazon VPC in which it’s hosted.
First, let’s create the Lambda function. As we are going to connect to an EFS file system, we need to specify the subnets in which the function executes and the security group it uses.
We also need to specify the mount point the function will use to access the EFS file system. This has to be under /mnt and can be different from the one you are using on your application servers; here, we use /mnt/sftp.
The Python code below retrieves the bucket and object name from the SNS message. It then recreates the directory structure and copies the file directly from S3 to EFS with the S3 download_file API call.
The corresponding template code defines the Lambda function.
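A sketch of such a function with the handler embedded inline. The security group (LambdaSecurityGroup) and role (CopyToEfsLambdaRole) are illustrative names, and because the VPC has no internet access, an S3 gateway endpoint is also needed so the function can reach S3:

```yaml
CopyToEfsFunction:
  Type: AWS::Lambda::Function
  Properties:
    Runtime: python3.12
    Handler: index.handler
    Timeout: 300
    Role: !GetAtt CopyToEfsLambdaRole.Arn
    VpcConfig:                                  # run inside the private VPC
      SubnetIds:
        - !Ref PrivateSubnetA
        - !Ref PrivateSubnetB
      SecurityGroupIds:
        - !Ref LambdaSecurityGroup
    FileSystemConfigs:                          # mount the EFS access point under /mnt/sftp
      - Arn: !GetAtt EfsAccessPoint.Arn
        LocalMountPath: /mnt/sftp
    Code:
      ZipFile: |
        import json
        import os
        import urllib.parse

        import boto3

        s3 = boto3.client("s3")
        MOUNT_PATH = "/mnt/sftp"

        def handler(event, context):
            # Each SNS record wraps the original S3 event notification as a JSON string.
            for sns_record in event["Records"]:
                s3_event = json.loads(sns_record["Sns"]["Message"])
                for s3_record in s3_event.get("Records", []):
                    bucket = s3_record["s3"]["bucket"]["name"]
                    key = urllib.parse.unquote_plus(s3_record["s3"]["object"]["key"])
                    target = os.path.join(MOUNT_PATH, key)
                    # Recreate the S3 prefix structure on the EFS mount...
                    os.makedirs(os.path.dirname(target), exist_ok=True)
                    # ...then copy the object with the S3 download_file call.
                    s3.download_file(bucket, key, target)
```

In a real template, the function usually also needs a DependsOn on the EFS mount targets so the file system is reachable before the first invocation.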
Next, we need to create the SNS topic to which the S3 bucket publishes. This topic will send a message when a new file has arrived.
We need to define an SNS topic policy that allows S3 to publish events and the Lambda function to subscribe to the topic.
The corresponding template code defines the SNS topic and its policy.
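Roughly as follows; the subscribe statement references the CopyToEfsLambdaRole resource shown further down:

```yaml
NewFileTopic:
  Type: AWS::SNS::Topic

NewFileTopicPolicy:
  Type: AWS::SNS::TopicPolicy
  Properties:
    Topics:
      - !Ref NewFileTopic
    PolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Sid: AllowS3ToPublish
          Effect: Allow
          Principal:
            Service: s3.amazonaws.com
          Action: sns:Publish
          Resource: !Ref NewFileTopic
          Condition:
            StringEquals:
              'aws:SourceAccount': !Ref AWS::AccountId   # only buckets in this account
        - Sid: AllowLambdaRoleToSubscribe
          Effect: Allow
          Principal:
            AWS: !GetAtt CopyToEfsLambdaRole.Arn
          Action: sns:Subscribe
          Resource: !Ref NewFileTopic
```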
Now, we will create the EFS access point. Be careful with the UID (user ID) and GID (group ID), as they need to match your EFS configuration.
Figure 11 – Amazon EFS access point.
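An illustrative file system and access point; the UID/GID of 1000 is only an example, and a mount target in each private subnet (allowing NFS on TCP 2049 from the Lambda security group) is also required:

```yaml
EfsFileSystem:
  Type: AWS::EFS::FileSystem
  Properties:
    Encrypted: true

EfsAccessPoint:
  Type: AWS::EFS::AccessPoint
  Properties:
    FileSystemId: !Ref EfsFileSystem
    PosixUser:
      Uid: '1000'                    # must match what your application servers expect
      Gid: '1000'
    RootDirectory:
      Path: /sftp
      CreationInfo:                  # created automatically if the directory doesn't exist
        OwnerUid: '1000'
        OwnerGid: '1000'
        Permissions: '0755'
```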
The Lambda function requires the following abilities:
- Mount the EFS file system and write files to it.
- Create and delete network interfaces in the VPC.
- Read files from the S3 bucket (we don’t need write rights here).
- Subscribe to the SNS topic.
- Create logs in Amazon CloudWatch.
- Decrypt files in the S3 bucket (we don’t need to encrypt, as we don’t write files).
The corresponding template code defines the Lambda role.
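A sketch covering those permissions; the AWS managed policy AWSLambdaVPCAccessExecutionRole supplies the network interface and CloudWatch Logs rights:

```yaml
CopyToEfsLambdaRole:
  Type: AWS::IAM::Role
  Properties:
    AssumeRolePolicyDocument:
      Version: '2012-10-17'
      Statement:
        - Effect: Allow
          Principal:
            Service: lambda.amazonaws.com
          Action: sts:AssumeRole
    ManagedPolicyArns:
      - arn:aws:iam::aws:policy/service-role/AWSLambdaVPCAccessExecutionRole
    Policies:
      - PolicyName: CopyToEfsPolicy
        PolicyDocument:
          Version: '2012-10-17'
          Statement:
            - Sid: ReadSftpBucket                     # read-only access to the bucket
              Effect: Allow
              Action:
                - s3:GetObject
              Resource: !Sub '${SftpBucket.Arn}/*'
            - Sid: DecryptObjects                     # decrypt only, no write path
              Effect: Allow
              Action:
                - kms:Decrypt
              Resource: !GetAtt SftpKmsKey.Arn
            - Sid: MountAndWriteEfs
              Effect: Allow
              Action:
                - elasticfilesystem:ClientMount
                - elasticfilesystem:ClientWrite
              Resource: !GetAtt EfsFileSystem.Arn
            - Sid: SubscribeToTopic
              Effect: Allow
              Action:
                - sns:Subscribe
              Resource: !Ref NewFileTopic
```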
The corresponding template code grants SNS permission to invoke the Lambda function (lambda:InvokeFunction).
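This can be expressed with an AWS::Lambda::Permission resource, for instance:

```yaml
SnsInvokePermission:
  Type: AWS::Lambda::Permission
  Properties:
    FunctionName: !Ref CopyToEfsFunction
    Action: lambda:InvokeFunction
    Principal: sns.amazonaws.com
    SourceArn: !Ref NewFileTopic          # only this topic may invoke the function
```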
Finally, we will create the subscription to the SNS topic for the Lambda function.
Figure 12 – Amazon SNS subscription.
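An illustrative subscription tying the topic to the function:

```yaml
NewFileSubscription:
  Type: AWS::SNS::Subscription
  Properties:
    TopicArn: !Ref NewFileTopic
    Protocol: lambda
    Endpoint: !GetAtt CopyToEfsFunction.Arn
```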
Cleanup
If you want to delete all of the resources created in this post, just delete the AWS CloudFormation stack. Before doing that, delete any files you may have in your S3 bucket; otherwise, CloudFormation will refuse to delete the bucket.
Next Steps
Files are stored in S3, which provides event notifications, as we have seen. Destinations such as SNS, Amazon Simple Queue Service (SQS), and Lambda are supported.
Further automation could include:
- Setting up a malware scan of the stored files with a ClamAV-based Lambda function. Refer to this post to see an example.
- S3 replication to another region for disaster recovery.
- Implementing post-transfer ETL jobs with AWS Glue to transform, normalize, and load the data into other destinations if you want to search the transferred files’ content later.
Conclusion
With this architecture developed for a financial services customer, DXC Technology was able to build a highly available, durable, and scalable solution without having to patch and administer servers. DXC provided the same level of service in a hybrid environment where some of the applications have been migrated to AWS.
We also demonstrated the capability of building automated post-transfer activities using AWS Lambda and Amazon EFS. We took care of security, protecting customer data by using AWS KMS to encrypt data at rest on Amazon EFS and Amazon S3 storage.
DXC Technology – AWS Partner Spotlight
DXC Technology is an AWS Premier Tier Services Partner and MSP that understands the complexities of migrating workloads to AWS in large-scale environments, and the skills needed for success.
Contact DXC Technology | Partner Overview
*Already worked with DXC Technology? Rate the Partner
*To review an AWS Partner, you must be a customer that has worked with them directly on a project.