AWS Storage Blog
How Regeneron built a secure and scalable file transfer service using AWS Transfer Family
Secure and fast transfer of mission critical data is a top priority for today’s digital businesses. Fueled by the expectation of “anywhere, anytime information”, any type of delay impacts operational efficiency and effectiveness, making a scalable and secure data transfer solution a priority. Healthcare and life sciences organizations need a secure, compliant and scalable File Transfer Service.
Regeneron, a leading biotechnology company, was using a legacy on-premises solution that could not scale to support critical business operations. Regeneron faced operational challenges related to scalability, software maintenance, and security risks due to the lack of advanced data encryption and malware detection capabilities. In addition, there were other concerns such as licensing costs, increasing storage costs, and the need for user authentication methods that work seamlessly for both internal and external users.
The Technology Engineering Group within Regeneron used AWS Transfer Family to build a file transfer service, Regeneron Transfer Service (RTS), to meet their ever-increasing file transfer needs. RTS is used by internal users and thousands of external partners, including Clinical Research Organizations (CROs) and hospitals, to exchange and store terabytes of data per day.
In this blog post, we elaborate on how Regeneron built a secure and scalable file transfer solution using AWS Transfer Family. The goal was to migrate from a legacy system to a more robust and scalable system that provides tools to automate and deliver an end-to-end file transfer capability while complying with GxP procedures.
Solution overview
RTS was built using AWS Transfer Family, a managed service that enables the transfer of files using the Secure File Transfer Protocol (SFTP), also known as Secure Shell (SSH) File Transfer Protocol. SFTP remains one of the most stable methods to transfer data across enterprises. While the basic protocol hasn’t changed in the last two decades, a few key enhancements were made to keep SFTP secure and reliable.
RTS is highly secure and provides three types of authentication (SSH key, password, and Active Directory (AD) authentication). The following architecture diagram highlights the end-to-end solution.
Regeneron uses this solution to provide a secure and scalable file transfer service, which allows seamless exchange of data files internally or with third-party vendors and partners. AWS Transfer Family provides a fully managed, highly available file transfer service with auto-scaling capabilities, eliminating the need to manage file transfer related infrastructure. End user workflows remain unchanged, and data uploaded or downloaded over SFTP is stored in an Amazon S3 bucket, which uses AWS Key Management Service (KMS) to encrypt the data objects.
S3 Lifecycle policies are used to maintain the required retention period for different types of files. Files are first moved to low-cost cold storage (Amazon S3 Glacier) and later purged per the organization’s record retention policy.
RTS uses Regeneron’s existing custom hostname, which is associated with the AWS Transfer Family server endpoint. The solution supports any standard out-of-the-box SFTP client. Some commonly used clients are listed below:
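To make the archive-then-purge pattern concrete, here is a minimal sketch that builds a lifecycle configuration in the shape the S3 PutBucketLifecycleConfiguration API accepts. The prefix, the 30-day transition, and the 2555-day expiration are hypothetical values chosen for illustration, not Regeneron's actual retention settings.

```python
def build_lifecycle_configuration(prefix, glacier_after_days, expire_after_days):
    """Build an S3 lifecycle configuration: move to Glacier, then expire.

    All argument values are illustrative assumptions; real retention periods
    come from the organization's record retention policy.
    """
    if expire_after_days <= glacier_after_days:
        raise ValueError("expiration must come after the Glacier transition")
    return {
        "Rules": [
            {
                "ID": f"archive-then-purge-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                # Step 1: move objects to low-cost cold storage.
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"}
                ],
                # Step 2: purge objects once the retention period has passed.
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


config = build_lifecycle_configuration("clinical/", 30, 2555)
```

With boto3, a dictionary like this could be passed as the LifecycleConfiguration argument of put_bucket_lifecycle_configuration. Different file types would typically get separate rules with their own prefixes and retention periods.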
i. OpenSSH – A Macintosh and Linux command line utility
ii. WinSCP – A Windows-only graphical client
iii. Cyberduck – A Linux, Macintosh, and Microsoft Windows graphical client
iv. FileZilla – A Linux, Macintosh, and Windows graphical client
Authentication and authorization
RTS allows internal or external users to use SSH keys or a user ID and password combination to validate their identity. Once a user is authenticated, the service assumes the role associated with that user to provide the right level of Amazon S3 folder access. User authentication and authorization are managed by integrating external identity providers with AWS Transfer Family using a custom RESTful API provisioned with Amazon API Gateway. This API is configured to take input parameters from AWS Transfer Family as per defined standards and then invokes an AWS Lambda function using a parameter mapping template.
During onboarding, user information is captured in Amazon DynamoDB. The AWS Lambda function queries Amazon DynamoDB by user ID and retrieves the user attributes, which indicate the type of authentication required for each user. Depending on the type of user and authentication, the Lambda function performs one of the three authentication methods below:
- If SSH key authentication is requested, the corresponding public key is returned to the SFTP service in the API response.
- If Active Directory (AD) based authentication is required, the Ping Identity OAuth 2.0 API is invoked to validate the user. If the user is authenticated successfully, the user authorization data is retrieved from Amazon DynamoDB and sent back in the API response.
- If external user authentication is required, the Amazon Cognito API is invoked to validate the user. If the user is successfully authenticated by Amazon Cognito, the user authorization data is retrieved from Amazon DynamoDB and sent back in the API response.
Finally, a standard response is sent back to AWS Transfer Family that allows users to access the appropriate Amazon S3 bucket.
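The three branches above can be sketched as a single Lambda handler. This is a simplified illustration rather than Regeneron's implementation: the in-memory USER_TABLE stands in for the DynamoDB table, validate_with_ping and validate_with_cognito are hypothetical stubs for the Ping Identity and Amazon Cognito calls, and the PublicKeys attribute and all sample values are assumptions. The response keys (Role, Policy, HomeDirectory, PublicKeys) follow the custom identity provider response format used by AWS Transfer Family, where an empty response signals a failed authentication.

```python
import json

# Hypothetical stand-in for the DynamoDB user table (values are invented).
EMPTY_POLICY = json.dumps({"Version": "2012-10-17", "Statement": []})
ROLE = "arn:aws:iam::123456789012:role/rts-user-role"
USER_TABLE = {
    "alice": {"UserType": "AD", "UserEmail": "alice@example.com", "Role": ROLE,
              "HomeBucket": "rts-landing-bucket", "BucketPolicy": EMPTY_POLICY},
    "bob": {"UserType": "EXT", "UserEmail": "", "Role": ROLE,
            "HomeBucket": "rts-landing-bucket", "BucketPolicy": EMPTY_POLICY},
    "carol": {"UserType": "INT", "UserEmail": "", "Role": ROLE,
              "HomeBucket": "rts-landing-bucket", "BucketPolicy": EMPTY_POLICY,
              "PublicKeys": ["ssh-rsa AAAAexamplekey carol"]},
}


def validate_with_ping(email, password):
    """Hypothetical stub for the Ping Identity OAuth 2.0 check."""
    return password == "correct-ad-password"


def validate_with_cognito(username, password):
    """Hypothetical stub for the Amazon Cognito check."""
    return password == "correct-ext-password"


def lambda_handler(event, context=None):
    username = event.get("username", "")
    password = event.get("password", "")
    user = USER_TABLE.get(username)
    if user is None:
        return {}  # unknown user: empty response means authentication failed

    response = {
        "Role": user["Role"],
        "Policy": user["BucketPolicy"],
        "HomeDirectory": f"/{user['HomeBucket']}/{username}",
    }

    if not password:
        # Key-based login: return the stored public keys and let the
        # SFTP service compare them with the key the client presented.
        keys = user.get("PublicKeys", [])
        if not keys:
            return {}
        response["PublicKeys"] = keys
        return response

    if user["UserType"] == "AD":
        authenticated = validate_with_ping(user["UserEmail"], password)
    elif user["UserType"] == "EXT":
        authenticated = validate_with_cognito(username, password)
    else:
        authenticated = False
    return response if authenticated else {}
```

Keeping the per-user authentication type in the user record means a new user category only needs a new branch in the handler, while the response format returned to the SFTP service stays the same.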
This solution uses policy-based user authorization: every user has access to the Amazon S3 bucket or folder defined by a pre-defined policy. After successful authentication, this policy is retrieved from Amazon DynamoDB and returned to the SFTP service as part of the API response. The SFTP service then applies this policy to control the user's access to the Amazon S3 bucket or folder.
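As an illustration of what such a per-user policy might contain, the sketch below builds an IAM policy document that limits a user to one folder of one bucket. The function name, bucket name, folder, and the exact set of S3 actions are assumptions for this example; the policies Regeneron stores in DynamoDB are not shown in this post.

```python
import json


def build_session_policy(bucket, folder):
    """Return a JSON policy string restricting access to bucket/folder.

    Hypothetical helper: the statement layout follows the common pattern of
    one statement for listing and one for object-level access.
    """
    folder = folder.strip("/")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing only within the user's folder.
                "Sid": "AllowListingOfUserFolder",
                "Effect": "Allow",
                "Action": ["s3:ListBucket"],
                "Resource": [f"arn:aws:s3:::{bucket}"],
                "Condition": {
                    "StringLike": {"s3:prefix": [f"{folder}/*", folder]}
                },
            },
            {
                # Allow reading and writing objects under the folder.
                "Sid": "HomeDirObjectAccess",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": [f"arn:aws:s3:::{bucket}/{folder}/*"],
            },
        ],
    }
    return json.dumps(policy)


policy_json = build_session_policy("rts-landing-bucket", "/cro-a/inbound/")
```

The policy is serialized to a string because the identity provider response carries it as a single JSON-encoded field.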
User on-boarding
Users are provisioned using a web application built on a serverless architecture using AWS Amplify, as shown below. AWS AppSync is used to connect the web app securely to Amazon DynamoDB via GraphQL APIs, and Amazon CloudFront enables the fast and secure delivery of web content.
This app stores user metadata along with the authorization policy in Amazon DynamoDB, and only internal admin users have access to onboard new users. The following attributes are captured during onboarding:
Attribute | Description |
SFTPUser | SFTP User ID/Name |
BucketPolicy | Access policy on buckets |
HomeBucket | Landing bucket name |
HomeDirectoryMappings | Scope down policy to limit the directory access |
Role | Main role having bucket level access |
UserEmail | User email for Active Directory authentication |
UserType | Identifies user type (INT, EXT, AD) |
BU | Business Unit |
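For readers who prefer to see the record as a data structure, here is a minimal sketch of an onboarding record using the attribute names from the table. The field types, the validation rules, and every sample value are assumptions made for illustration; the actual DynamoDB item format is not described in this post.

```python
from dataclasses import dataclass, asdict

VALID_USER_TYPES = {"INT", "EXT", "AD"}


@dataclass
class RtsUser:
    """Hypothetical onboarding record; all fields are modeled as strings."""
    SFTPUser: str
    BucketPolicy: str
    HomeBucket: str
    HomeDirectoryMappings: str
    Role: str
    UserEmail: str
    UserType: str
    BU: str

    def __post_init__(self):
        # Assumed validation: reject unknown user types, and require an
        # email for AD users because AD authentication relies on it.
        if self.UserType not in VALID_USER_TYPES:
            raise ValueError(f"unknown UserType: {self.UserType}")
        if self.UserType == "AD" and not self.UserEmail:
            raise ValueError("AD users need a UserEmail")


user = RtsUser(
    SFTPUser="cro-a-user",
    BucketPolicy='{"Version": "2012-10-17", "Statement": []}',
    HomeBucket="rts-landing-bucket",
    HomeDirectoryMappings="/cro-a/inbound",
    Role="arn:aws:iam::123456789012:role/rts-user-role",
    UserEmail="",
    UserType="EXT",
    BU="Clinical",
)
item = asdict(user)  # plain dict, ready to be written to a table
```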
RTS Operations UI
This solution also incorporates a web UI (Filestash) that external or internal users can use to transfer files through AWS Transfer Family. This gives users flexibility: they can either use the out-of-the-box SFTP clients mentioned above or access RTS from their web browser. The following architecture diagram highlights the different components of the web UI, which is built using Filestash installed on Docker containers.
In this setup, users connect to the web application through an Application Load Balancer over HTTPS, and the web application then opens an SFTP connection to AWS Transfer Family. Authentication and authorization are executed by AWS Transfer Family as explained in the earlier “Authentication and authorization” section.
Post-upload scanning process
RTS also uses Trend Micro Deep Security (installed on auto-scaled Amazon EC2 nodes) to detect malware. Files uploaded to Amazon S3 are locked immediately for all users until scanning is completed.
For each uploaded file, a message is placed on an Amazon Simple Queue Service (SQS) queue, which is polled by a Lambda function. Amazon EventBridge triggers the Lambda function periodically to distribute the SQS messages to the Trend Micro nodes, which scan the files for malware. Messages are distributed to nodes using a hashing algorithm and a first in, first out (FIFO) method. Once scanning completes successfully, the file is unlocked and ready for consumption.
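The post does not say which hashing algorithm is used, so the following is only one plausible way to combine hashing with FIFO ordering: hash each object key to choose a scan node, and keep a per-node queue so that files are scanned in arrival order. The function names, the message shape, and the choice of SHA-256 are all assumptions.

```python
import hashlib
from collections import deque


def node_for_key(object_key, node_count):
    """Map an S3 object key to a node index using a stable hash."""
    digest = hashlib.sha256(object_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(messages, node_count):
    """Assign messages (oldest first) to per-node FIFO queues."""
    queues = [deque() for _ in range(node_count)]
    for message in messages:
        # Appending in arrival order preserves FIFO within each node.
        queues[node_for_key(message["key"], node_count)].append(message)
    return queues


messages = [{"key": f"uploads/file-{i}.csv", "seq": i} for i in range(20)]
queues = distribute(messages, node_count=3)
```

A stable hash sends the same object key to the same node on every run as long as the node count is unchanged. With an auto-scaled fleet the node count does change, and a scheme such as consistent hashing would reduce reshuffling, although that goes beyond what the post describes.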
Conclusion
Regeneron transitioned from an on-premises legacy transfer service to AWS Transfer Family to address risks associated with security, scalability, high availability, and the cost of ownership. With the right planning, tools, and deep knowledge, Regeneron successfully migrated from the legacy system to the new Regeneron Transfer Service on AWS (using AWS Transfer Family and Amazon S3).
Their on-premises legacy system had challenges related to technical debt, availability, scaling, and storage due to data growth and the archiving needed to meet compliance requirements. The previous solution also had high maintenance, licensing, and storage costs. Migrating to AWS Transfer Family, backed by Amazon S3, optimized storage costs by up to 80% and reduced overall cost by 90%. Moving to managed AWS services also reduced the labor cost of maintaining the legacy system. The overall migration took about 4 months to complete and was easy to deploy. The new service also uses automated Trend Micro scanning as part of the intake process to mitigate potential malware attacks.
Running RTS on AWS allows Regeneron to scale operations per business needs and to support thousands of users, large volumes of stored data (terabytes), and millions of file transfers in a timely fashion. Furthermore, the new RTS simplifies the user experience with improved UI access and provides three types of authentication mechanisms to support internal users, CROs, and external vendors.
Using AWS Transfer Family, RTS can serve users, CROs, and vendors distributed globally across different regions, and can improve the file transfer experience for users who are geographically distant from the server endpoints. For more information, refer to the blog “Minimize network latency with your AWS Transfer for SFTP servers.”