AWS Storage Blog

How Regeneron built a secure and scalable file transfer service using AWS Transfer Family

Secure and fast transfer of mission-critical data is a top priority for today’s digital businesses. With users expecting “anywhere, anytime information”, any delay reduces operational efficiency and effectiveness, which makes a scalable and secure data transfer solution essential. Healthcare and life sciences organizations in particular need a secure, compliant, and scalable file transfer service.

Regeneron, a leading biotechnology company, was using a legacy on-premises solution that could not scale to support critical business operations. Regeneron faced operational challenges related to scalability and software maintenance, as well as security risks due to the lack of advanced data encryption and malware detection capabilities. Other concerns included licensing costs, increasing storage costs, and the need for user authentication methods that work seamlessly for both internal and external users.

The Technology Engineering Group within Regeneron used AWS Transfer Family to build a file transfer service, Regeneron Transfer Service (RTS), to meet their ever-increasing file transfer needs. RTS is used by internal users and thousands of external partners, including Clinical Research Organizations (CROs) and hospitals, to exchange and store terabytes of data per day.

In this blog post, we describe how Regeneron built a secure and scalable file transfer solution using AWS Transfer Family. The goal was to migrate from a legacy system to a more robust and scalable system that provides tools to automate and deliver an end-to-end file transfer capability while complying with GxP procedures.

Solution overview

RTS was built using AWS Transfer Family, a managed service that enables the transfer of files using the Secure File Transfer Protocol (SFTP), also known as Secure Shell (SSH) File Transfer Protocol. SFTP remains one of the most stable methods to transfer data across enterprises. While the basic protocol hasn’t changed in the last two decades, a few key enhancements were made to keep SFTP secure and reliable.

RTS is highly secure and provides three types of authentication: SSH key, password, and Active Directory (AD). The following architecture diagram highlights the end-to-end solution.

Architecture diagram depicting internal and external users moving data through a sftp client to an Amazon S3 bucket

Regeneron uses this solution to provide a secure and scalable file transfer service, which allows seamless exchange of data files internally or with third-party vendors and partners. AWS Transfer Family provides a fully managed, highly available file transfer service with auto-scaling capabilities, eliminating the need to manage file transfer infrastructure. End-user workflows remain unchanged, and data uploaded or downloaded over SFTP is stored in an Amazon S3 bucket, which uses AWS Key Management Service (AWS KMS) to encrypt the data objects.

S3 Lifecycle policies are used to maintain the required retention period for different types of files. Files are first moved to low-cost cold storage (Amazon S3 Glacier) and later purged per the organization’s record retention policy.
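
As a rough illustration of the archive-then-purge pattern described above, the following sketch builds a lifecycle configuration in the shape accepted by the S3 `put_bucket_lifecycle_configuration` API. The prefix and the day counts are hypothetical placeholders, not Regeneron's actual retention settings.

```python
import json


def build_lifecycle_configuration(prefix, glacier_after_days, expire_after_days):
    """Return a lifecycle configuration dict for one key prefix.

    Objects under `prefix` move to S3 Glacier after `glacier_after_days`
    and are deleted after `expire_after_days`. All values are illustrative.
    """
    if expire_after_days <= glacier_after_days:
        raise ValueError("objects must transition to Glacier before they expire")
    return {
        "Rules": [
            {
                "ID": f"archive-then-purge-{prefix.strip('/') or 'all'}",
                "Filter": {"Prefix": prefix},
                "Status": "Enabled",
                "Transitions": [
                    {"Days": glacier_after_days, "StorageClass": "GLACIER"}
                ],
                "Expiration": {"Days": expire_after_days},
            }
        ]
    }


if __name__ == "__main__":
    # Hypothetical prefix and retention periods, for illustration only.
    print(json.dumps(build_lifecycle_configuration("clinical/", 30, 2555), indent=2))
```

The resulting dictionary could be passed as the `LifecycleConfiguration` argument of a boto3 S3 client call; a separate rule per prefix is one way to give different file types different retention periods.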

RTS uses Regeneron’s existing custom hostname, which is associated with the AWS Transfer Family server endpoint. The solution supports any standard SFTP client. Some commonly used clients are listed below:

i. OpenSSH – A Macintosh and Linux command line utility

ii. WinSCP – A Windows-only graphical client

iii. Cyberduck – A Linux, Macintosh, and Microsoft Windows graphical client

iv. FileZilla – A Linux, Macintosh, and Windows graphical client

Authentication and authorization

RTS allows internal and external users to validate their identity with either SSH keys or a user ID and password combination. Once a user is authenticated, the service assumes the role associated with that user to provide the right level of Amazon S3 folder access. User authentication and authorization are managed by integrating external identity providers with AWS Transfer Family through a custom RESTful API provisioned with Amazon API Gateway. This API is configured to take input parameters from AWS Transfer Family in the defined format and then invokes an AWS Lambda function using a parameter mapping template.

During onboarding, user information is captured in Amazon DynamoDB. The AWS Lambda function queries Amazon DynamoDB by user ID and retrieves the user attributes, which indicate the type of authentication required for that user. Depending on the type of user and authentication, the Lambda function performs one of the three authentication methods below:

  • If an SSH key is provided in the authentication request, the corresponding public key is returned to the SFTP service in the API response.
  • If Active Directory (AD) based authentication is required, the Ping Identity OAuth 2.0 API is invoked to validate the user. If the user is authenticated successfully, the user authorization data is retrieved from Amazon DynamoDB and sent back in the API response.
  • If external user authentication is required, the Amazon Cognito API is invoked to validate the user. If Amazon Cognito authenticates the user successfully, the user authorization data is retrieved from Amazon DynamoDB and sent back in the API response.

Finally, a standard response is sent back to AWS Transfer Family that allows users to access the appropriate Amazon S3 bucket.
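
The dispatch logic described above might look roughly like the following sketch. It is not Regeneron's implementation: the in-memory `EXAMPLE_USERS` table stands in for Amazon DynamoDB, the `validators` callables stand in for the Ping Identity and Amazon Cognito calls, and the mapping of `UserType` values to validators is an assumption. The response keys (`Role`, `Policy`, `HomeDirectory`, `PublicKeys`) follow the AWS Transfer Family custom identity provider response format, where an empty response signals a failed authentication.

```python
import json


def _example_user(name, user_type, public_keys):
    """Build a hypothetical user record; every value here is a placeholder."""
    return {
        "SFTPUser": name,
        "UserType": user_type,
        "Role": "arn:aws:iam::111122223333:role/example-rts-access",
        "BucketPolicy": json.dumps({
            "Version": "2012-10-17",
            "Statement": [{
                "Effect": "Allow",
                "Action": "s3:GetObject",
                "Resource": f"arn:aws:s3:::example-rts-bucket/{name}/*",
            }],
        }),
        "HomeBucket": "example-rts-bucket",
        "PublicKeys": public_keys,
    }


# Stand-in for the DynamoDB user table.
EXAMPLE_USERS = {
    "cro-partner": _example_user("cro-partner", "EXT", []),
    "key-user": _example_user("key-user", "INT", ["ssh-ed25519 AAAAC3Example key-user"]),
}


def build_response(user, public_keys=None):
    """Shape the user's authorization data as a Transfer Family response."""
    response = {
        "Role": user["Role"],
        "Policy": user["BucketPolicy"],
        "HomeDirectory": f"/{user['HomeBucket']}/{user['SFTPUser']}",
    }
    if public_keys:
        response["PublicKeys"] = public_keys
    return response


def authenticate(event, users, validators):
    """Return authorization data on success, or {} to reject the login.

    `validators` maps a UserType (for example "AD" or "EXT") to a callable
    (user, password) -> bool that wraps the external identity provider.
    """
    user = users.get(event.get("username", ""))
    if user is None:
        return {}
    password = event.get("password", "")
    if not password:
        # Key-based login: hand the stored public keys back to the service.
        keys = user.get("PublicKeys", [])
        return build_response(user, keys) if keys else {}
    validator = validators.get(user["UserType"])
    if validator is None or not validator(user, password):
        return {}
    return build_response(user)
```

Passing the validators in as callables keeps the dispatch testable without network access; in a deployed function they would wrap the actual identity provider calls.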

This solution uses policy-based user authorization: each user has a predefined policy that governs their access to the Amazon S3 bucket or folder. After successful authentication, this policy is retrieved from Amazon DynamoDB and returned to the SFTP service as part of the API response. The SFTP service then applies the policy to control the user's access to the Amazon S3 bucket or folder.
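
To make the idea of a per-user policy concrete, here is a minimal sketch of a policy that confines a user to one folder of a bucket. The bucket name, prefix, and chosen S3 actions are assumptions for illustration; the policies stored for RTS users are not shown in this post.

```python
import json


def build_scope_down_policy(bucket, home_prefix):
    """Return a JSON policy string limiting access to one prefix of a bucket."""
    prefix = home_prefix.strip("/")
    policy = {
        "Version": "2012-10-17",
        "Statement": [
            {
                # Allow listing only within the user's own folder.
                "Sid": "ListHomePrefix",
                "Effect": "Allow",
                "Action": "s3:ListBucket",
                "Resource": f"arn:aws:s3:::{bucket}",
                "Condition": {"StringLike": {"s3:prefix": [f"{prefix}/*", prefix]}},
            },
            {
                # Allow object reads, writes, and deletes under that folder.
                "Sid": "ReadWriteHomePrefix",
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:PutObject", "s3:DeleteObject"],
                "Resource": f"arn:aws:s3:::{bucket}/{prefix}/*",
            },
        ],
    }
    return json.dumps(policy)


if __name__ == "__main__":
    # Hypothetical bucket and folder names.
    print(build_scope_down_policy("example-rts-bucket", "/cro-partner/"))
```

The policy is serialized to a string because the identity provider response carries it as JSON text rather than as a nested object.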

User on-boarding

Users are provisioned using a web application built on a serverless architecture using AWS Amplify, as shown below. AWS AppSync is used to connect the web app securely to Amazon DynamoDB via GraphQL APIs, and Amazon CloudFront enables the fast and secure delivery of web content.

Architecture showing a web app to Amazon CloudFront

This app stores user metadata along with the authorization policy in Amazon DynamoDB, and only internal admin users have access to onboard new users. The following attributes are captured during onboarding:

| Attribute | Description |
|---|---|
| SFTPUser | SFTP user ID/name |
| BucketPolicy | Access policy on buckets |
| HomeBucket | Landing bucket name |
| HomeDirectoryMappings | Scope-down policy to limit the directory access |
| Role | Main role having bucket-level access |
| UserEmail | User email for Active Directory authentication |
| UserType | Identifies user type (INT, EXT, AD) |
| BU | Business unit |
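
The attributes above could be modeled as a small record type before being written to Amazon DynamoDB. This is only a sketch: the field names come from the attribute list, but the field types (all strings) and the validation rules are assumptions.

```python
from dataclasses import asdict, dataclass

# User types listed in the attribute descriptions above.
VALID_USER_TYPES = {"INT", "EXT", "AD"}


@dataclass
class RtsUser:
    SFTPUser: str
    BucketPolicy: str
    HomeBucket: str
    HomeDirectoryMappings: str
    Role: str
    UserEmail: str
    UserType: str
    BU: str

    def __post_init__(self):
        # Reject records that would be unusable at login time.
        if not self.SFTPUser:
            raise ValueError("SFTPUser is required")
        if self.UserType not in VALID_USER_TYPES:
            raise ValueError(f"unknown UserType: {self.UserType!r}")

    def to_item(self):
        """Return a plain dict suitable for storing as a table item."""
        return asdict(self)
```

Validating the record at onboarding time means the authentication function can rely on `UserType` always holding one of the known values.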

RTS Operations UI

This solution also incorporates a web UI (Filestash) that external or internal users can use to transfer files through AWS Transfer Family. This gives users flexibility: they can use a standard SFTP client, as mentioned above, or access RTS from their web browser. The following architecture diagram highlights the components of the web UI, which is built using Filestash installed on Docker containers.

External user to a firewall with AWS Direct Connect to Availability Zones

In this setup, users connect to the web application via an Application Load Balancer using HTTPS. The web application then initiates the SFTP connection with AWS Transfer Family over AWS Direct Connect, which provides a dedicated and reliable network connection between Regeneron’s on-premises network and AWS. Authentication and authorization are executed by AWS Transfer Family as explained in the preceding “Authentication and authorization” section.

Post-upload scanning process

RTS also uses Trend Micro Deep Security, installed on auto-scaled Amazon EC2 nodes, to detect malware. Files uploaded to Amazon S3 are locked immediately for all users until scanning is completed.

AWS transfer family to Amazon S3 to lock newly uploaded files and unlock for consumption

For each uploaded file, a message is placed on an Amazon Simple Queue Service (Amazon SQS) queue, which is polled by a Lambda function. The Lambda function is triggered periodically by Amazon EventBridge and distributes the SQS messages to the Trend Micro nodes, which scan the files for malware. Messages are distributed to nodes using a hashing algorithm and a first in, first out (FIFO) method. Once scanning completes successfully, the file is unlocked and ready for consumption.
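
The post does not specify which hashing algorithm is used, so the following is only one possible sketch of hash-based distribution with FIFO ordering. It assumes each message carries the S3 object key under a hypothetical `key` field, uses SHA-256 as the hash, and models each scanning node's backlog as an in-memory queue.

```python
import hashlib
from collections import deque


def pick_node(object_key, node_count):
    """Map an object key to a node index with a stable hash."""
    digest = hashlib.sha256(object_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % node_count


def distribute(messages, node_count):
    """Assign messages to per-node FIFO queues, preserving arrival order."""
    queues = [deque() for _ in range(node_count)]
    for message in messages:
        queues[pick_node(message["key"], node_count)].append(message)
    return queues


if __name__ == "__main__":
    # Hypothetical upload notifications in arrival order.
    batch = [{"key": f"incoming/file-{i}.csv"} for i in range(6)]
    for index, queue in enumerate(distribute(batch, node_count=3)):
        print(index, [m["key"] for m in queue])
```

A stable hash sends repeated uploads of the same key to the same node, and consuming each queue from the left with `popleft()` keeps processing in first in, first out order.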

Conclusion

Regeneron transitioned from an on-premises legacy transfer service to AWS Transfer Family to address risks associated with security, scalability, high availability, and the cost of ownership. With the right planning, tools, and deep knowledge, Regeneron successfully migrated from the legacy system to the new Regeneron Transfer Service, built on AWS Transfer Family and Amazon S3.

Their on-premises legacy system had challenges related to technical debt, availability, scaling, and storage, driven by data growth and the archiving needed to meet compliance requirements. The previous solution also carried high maintenance, license, and storage costs. Migrating to AWS Transfer Family, backed by Amazon S3, optimized storage costs by up to 80% and reduced overall cost by 90%. Moving to managed AWS services reduced the labor cost needed to maintain the legacy system. The overall migration took about 4 months to complete and was easy to deploy. The new service also uses automated Trend Micro scanning as part of the intake process to mitigate potential malware attacks.

RTS running on AWS allows Regeneron to scale operations per business needs. It supports thousands of users, terabytes of stored data, and millions of file transfers in a timely fashion. Furthermore, the new RTS simplifies the user experience with improved UI access and provides three types of authentication mechanisms to support internal users, CROs, and external vendors.

RTS can serve users, CROs, and vendors distributed across different regions globally. AWS Transfer Family also helps improve the file transfer experience for users who are geographically distant from the server endpoints. For more information, refer to the blog post “Minimize network latency with your AWS Transfer for SFTP servers.”

Abdul Shaik

Abdul is a Senior Director, Head of Data & Analytics Architecture & Insights at Regeneron Pharmaceuticals, representing the Digital Technology and Engineering team. Abdul is a thought leader in cloud, data platform, AI, and ML technologies, with over 16 years of experience in the data space. He has deep architecture experience in modern technology platforms and has incubated multiple technology teams throughout his career, including cloud, data platform, artificial intelligence, and machine learning teams at Regeneron.

Eliyas Mahammad

Eliyas is an Associate Director and Technology Leader in Data & Analytics at Regeneron Pharmaceuticals, representing the Digital Technology and Engineering team. Eliyas delivers innovative, integrated programming solutions and is responsible for the engineering, design, and build of connected data eco-systems, data lake and Modern Data technologies. He is also responsible for the Data Mesh and Data Product development practices behind Regeneron’s Data as a Product strategy, focusing on Data Domains, Data Virtualization, and knowledge models.

Rakesh Singh

Rakesh is a Senior Director, Head of Compute, Database & DevOps at Regeneron Pharmaceuticals, representing IT Enterprise Services. Rakesh is an accomplished information technology leader with over 20 years of experience focused on cloud integration, digital transformation, application development and delivery, DevOps, and the architecture and design of business-critical strategic solutions. He is a team builder who believes in creating close-knit, cohesive teams and who mentors and develops technical talent.

Sachin Jain

Sachin is a Senior Solutions Architect at Amazon Web Services (AWS) with a focus on helping healthcare and life sciences customers in their cloud journey. He has over 20 years of experience in the technology, healthcare, and engineering space.

Sanjoy Thanneer

Sanjoy is a Sr. Technical Account Manager with AWS based out of New York. He has over 20 years of experience working in the database and analytics domains. He is passionate about helping enterprise customers build scalable, resilient, and cost-efficient applications.

Sri Anmalsetty

Sri is an Associate Director of Cloud Operations at Regeneron Pharmaceuticals. He has 20+ years of experience in IT engineering and delivery and is passionate about providing seamless solutions to users that meet and exceed their expectations and TCO/ROI considerations. He is very interested in developing automated solutions for complex problems.