AWS Storage Blog

How Discover Financial secures file transfers with AWS Transfer Family

Discover Financial Services (NYSE: DFS) is a digital banking and payment services company with one of the most recognizable brands in US financial services. Since its inception in 1986, Discover has become one of the largest card issuers in the United States.

We are proud members of the platform team at Discover, where we are responsible for managing the company’s cloud data platforms. Over the last few years of Discover’s cloud journey, we migrated most of our big data and analytical workloads from on premises to the AWS Cloud. We have moved petabytes of data from our system-of-record (SOR) databases, and we moved external vendor data, previously received on on-premises SFTP systems, to Amazon S3. Today, we continue to move data to Amazon S3 incrementally.

Part of our responsibility on the platform team is to enable infrastructure and platform services for business users, data operations teams, and data scientists to perform ETL (Extract, Transform, and Load) data preparation for machine learning and reporting. This requires that output data generated by various applications must move from the AWS Cloud to on premises in near-real time.

In this post, we discuss how we used AWS Transfer for SFTP, part of AWS Transfer Family, together with other AWS services in a solution architecture designed to meet key business requirements. These requirements set the parameters for our event-based transfers of many gigabytes of files between Discover’s on-premises data centers and the AWS Cloud:

  1. File transfers must be secured end-to-end.
  2. Data transfers must not traverse the public internet.
  3. Stringent SLAs: near-real-time transfer of files and critical reports from Amazon S3 to on premises.
  4. Seamless integration with the on-premises SFTP server, which acts as the gateway for any data transfer in and out of the Discover network.
  5. Better performance and scalability than the on-premises solution.
  6. GUI-based monitoring tools to visualize transfer stages and status.

Challenges with our original solution and why we chose AWS Transfer for SFTP

Here are some of the problems we faced with our original homegrown solution, which was initially in place to transfer files from on premises to the AWS Cloud:

  1. Frequent failures, particularly with larger files.
  2. High latency for file transfers, leading to missed SLAs.
  3. Scalability issues when transferring thousands of files, with processing delays and occasional failures.
  4. Complex operations and maintenance, with an inconsistent alerting mechanism.
  5. No near-real-time transfer.

The impetus behind choosing AWS Transfer for SFTP

  1. Fully managed service that is straightforward to set up and use.
  2. VPC endpoint enablement to establish internal access from the on-premises SFTP server to AWS Transfer for SFTP over AWS Direct Connect.
  3. Secure key-based authentication with our on-premises file transfer system.
  4. AWS options to secure AWS Transfer for SFTP to Amazon S3 connections: IAM role-based bucket policy allow listing and AWS Key Management Service customer master keys (CMKs) for encrypting files.
  5. Thorough performance testing showed excellent scaling; in production, AWS Transfer for SFTP enabled us to meet our SLAs.
  6. AWS services like Amazon SQS, Amazon SNS, and Amazon RDS let us quickly build a solution for the outbound transfer path (Amazon S3 to on premises).
  7. A custom user interface integrates seamlessly for monitoring progress.

Solution overview

In this section, we provide an overview of our solution, which we represent with this architecture diagram:

Discover Financial architecture using AWS Transfer for SFTP, Amazon S3, Amazon SQS, and more to transfer files to the cloud.

As per Discover standards, the existing secure FTP framework handles any file transfers in and out of the Discover network. The framework performs the required legal validations for corporate data release. We also integrated the FTP framework with other on-premises systems; however, it does not support file transfers to the cloud.

As depicted in the preceding diagram, we integrated the on-premises SFTP server with AWS Transfer for SFTP using AWS Direct Connect, Amazon Route 53, and VPC endpoints. We use two Amazon S3 buckets for receiving and sending file transfers. We push files from the on-premises SFTP server to the Amazon S3 input bucket. We also enabled event notifications on these buckets, which we process using Amazon SNS and an AWS Lambda function, storing the metadata in Amazon RDS. We use the metadata as input for the SFTP UI, hosted on OpenShift Container Platform, and we publish bucket-specific event notifications to Amazon SQS. We subscribed Spring Boot microservices running on EC2 instances to SQS to process the event notifications. When SQS notifies the microservices of an event, they initiate the file transfer from the Amazon S3 output bucket to the on-premises system using AWS Transfer for SFTP.
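To illustrate the notification-processing step, here is a minimal sketch of a Lambda-style handler that extracts file metadata from an S3 event notification delivered through SNS. The field names follow the standard S3/SNS event structure; the `RECEIVED` status value and the handler name are hypothetical, and the actual write to Amazon RDS is left as a stub.

```python
import json

def extract_transfer_metadata(sns_event):
    """Pull file-transfer metadata out of an S3 event delivered via SNS.

    Returns a list of rows suitable for inserting into a status table.
    """
    rows = []
    for record in sns_event.get("Records", []):
        # SNS wraps the original S3 event notification in the Message field.
        s3_event = json.loads(record["Sns"]["Message"])
        for s3_record in s3_event.get("Records", []):
            rows.append({
                "bucket": s3_record["s3"]["bucket"]["name"],
                "key": s3_record["s3"]["object"]["key"],
                "size_bytes": s3_record["s3"]["object"]["size"],
                "event_time": s3_record["eventTime"],
                "status": "RECEIVED",  # hypothetical status value
            })
    return rows

def handler(event, context=None):
    rows = extract_transfer_metadata(event)
    # In the real solution these rows are written to Amazon RDS
    # (connection details omitted in this sketch).
    return {"processed": len(rows)}
```

In production, a handler like this would open a database connection outside the handler body and insert the rows in a batch.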

AWS IAM roles and policies for AWS Transfer for SFTP

This section describes the required AWS IAM roles and policies for AWS Transfer for SFTP.

Required roles and policies:

  1. We created an IAM role for AWS Transfer for SFTP with the following IAM policy. This role enables AWS Transfer for SFTP to put or get files from your Amazon S3 bucket depending on your use case.
{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Action": [
                "s3:ListBucket",
                "s3:GetBucketLocation"
            ],
            "Resource": [
                "arn:aws:s3:::bucket_name"
            ],
            "Effect": "Allow",
            "Sid": "AllowListingOfUserFolder"
        },
        {
            "Action": [
                "s3:Get*",
                "s3:Put*"
            ],
            "Resource": "arn:aws:s3:::bucket_name/*",
            "Effect": "Allow",
            "Sid": "HomeDirObjectAccess"
        }
    ]
}

We then updated the role’s trust policy so the Transfer service can assume it:

{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Principal": {
        "Service": "transfer.amazonaws.com"
      },
      "Action": "sts:AssumeRole"
    }
  ]
}
  2. In our use case, S3 buckets have restricted access: they only allow requests coming from VPC gateway endpoints. We added another condition to allow the IAM role we created for AWS Transfer for SFTP.

From the AWS CLI, we ran the following command to get the role’s unique ID, which appears as the RoleId field in the output
(example: aws iam get-role --role-name "Test-Role").

We included the following condition in the Amazon S3 bucket policy to allow list that unique ID, which grants AWS Transfer for SFTP write access to the Amazon S3 bucket.

{
    "Sid": "DenyUnlessSFTPUser",
    "Effect": "Deny",
    "Principal": "*",
    "Action": [
        "s3:PutObject",
        "s3:GetObject",
        "s3:DeleteObject",
        "s3:GetObjectVersion",
        "s3:DeleteObjectVersion"
    ],
    "Resource": "bucket_arn/*",
    "Condition": {
        "StringNotLike": {
            "aws:userid": "user_id:*"
        }
    }
}
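To avoid hand-editing the policy for each bucket, the deny statement above can be generated from the role’s unique ID. A small sketch follows; the bucket ARN and role ID shown are placeholders, not values from our environment.

```python
import json

def build_deny_statement(bucket_arn, role_unique_id):
    """Deny object access unless the request comes from the SFTP role.

    role_unique_id is the RoleId value returned by `aws iam get-role`
    (for example "AROAEXAMPLEID"). IAM appends the role session name
    after a colon, hence the ":*" wildcard in the condition.
    """
    return {
        "Sid": "DenyUnlessSFTPUser",
        "Effect": "Deny",
        "Principal": "*",
        "Action": [
            "s3:PutObject",
            "s3:GetObject",
            "s3:DeleteObject",
            "s3:GetObjectVersion",
            "s3:DeleteObjectVersion",
        ],
        "Resource": f"{bucket_arn}/*",
        "Condition": {
            "StringNotLike": {"aws:userid": f"{role_unique_id}:*"}
        },
    }

statement = build_deny_statement("arn:aws:s3:::example-bucket", "AROAEXAMPLEID")
print(json.dumps(statement, indent=2))
```

The generated statement can then be merged into the bucket policy alongside the existing VPC endpoint condition.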

Our AWS Transfer for SFTP server

After creating the roles and policies in the preceding section, we followed the steps included in the AWS documentation to create an AWS Transfer for SFTP server.

We chose the following options during the AWS Transfer for SFTP setup in alignment with our use case:

  • Choose protocols: SFTP.
  • Identity provider: Service managed.
  • Endpoint type: VPC hosted, with access set to Internal.
  • Under VPC, we selected a VPC ID and checked the Availability Zones boxes (more than one recommended). At this step we also selected Subnet IDs and enabled FIPS.
  • Amazon CloudWatch logging: we chose the existing role we created in the prerequisites by selecting Choose an existing role.
  • We added tags.
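The console choices above map onto the Transfer Family CreateServer API. The following is a hedged sketch of the equivalent boto3 parameters; every ID, the logging role ARN, the tag, and the FIPS security policy name are placeholders to adapt to your environment.

```python
# Sketch of the CreateServer parameters matching our console choices:
# SFTP protocol, service-managed users, an internal VPC endpoint spanning
# more than one Availability Zone, a FIPS security policy, and CloudWatch
# logging. All values below are placeholders.
create_server_kwargs = {
    "Protocols": ["SFTP"],
    "IdentityProviderType": "SERVICE_MANAGED",
    "EndpointType": "VPC",
    "EndpointDetails": {
        "VpcId": "vpc-0123456789abcdef0",                      # placeholder
        "SubnetIds": ["subnet-aaaa1111", "subnet-bbbb2222"],   # >1 AZ
        "SecurityGroupIds": ["sg-0123456789abcdef0"],          # placeholder
    },
    # Example FIPS-enabled security policy name; check the current list
    # in the Transfer Family documentation.
    "SecurityPolicyName": "TransferSecurityPolicy-FIPS-2020-06",
    "LoggingRole": "arn:aws:iam::111122223333:role/transfer-logging-role",
    "Tags": [{"Key": "team", "Value": "data-platform"}],
}

# With boto3 (left commented so the sketch stays self-contained):
# import boto3
# transfer = boto3.client("transfer")
# response = transfer.create_server(**create_server_kwargs)
# print(response["ServerId"])
```

Creating the server through the API makes the endpoint configuration repeatable across environments.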

Once the server was ready, we created a user, assigned the IAM role (refer to the previous section), and included the public key in the user properties.

We followed this AWS documentation to complete the preceding step.

Our solution in practice

The following diagrams depict the data flow sequences for our AWS Transfer for SFTP solution with VPC endpoints and service-managed users.

Inbound pattern

This pattern is for data moving into the cloud (Amazon S3). We created a user under AWS Transfer for SFTP called “xyz-ingress”; the private key stays on the on-premises SFTP server, and the corresponding public key is registered with the user.

When an internal or external customer wants to transfer files to Amazon S3, an on-premises SFTP server connects to AWS Transfer for SFTP using key-based authentication. After connecting, the IAM role grants access to put the file in S3.

Data Flow

Data flow for inbound data to Amazon S3 from on-premises SFTP server

We have added an additional condition to the bucket policy that ensures traffic only comes from our VPC or the IAM role we use for SFTP. Once files land in Amazon S3, our solution triggers an event to Amazon SNS, which invokes AWS Lambda to update the status in Amazon RDS. You can view this detail in the dashboard described later in this post.

Outbound pattern

This pattern is for data that is moving out of the cloud (Amazon S3) in near-real time. It involves an integrated solution using AWS Transfer for SFTP with other AWS services like Amazon S3, Amazon SNS, Amazon SQS, and Amazon EC2 to transfer files to on-premises servers.

Data Flow

Data flow for outbound data to on-premises SFTP server from Amazon S3

We have subscribed our microservices running in EC2 instances to Amazon SQS for file transfer initiation notifications. These microservices leverage AWS Transfer for SFTP to pull from Amazon S3 and transfer the files to an on-premises server.
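As a rough illustration of the microservice’s decision step, the following sketch turns an SQS-delivered S3 event notification into transfer instructions. The prefix-to-directory mapping, path layout, and function name are hypothetical; the actual SFTP push is out of scope here.

```python
import json
import posixpath

# Hypothetical mapping of S3 key prefixes to on-premises landing directories.
DESTINATIONS = {
    "reports/": "/landing/reports",
    "ml-output/": "/landing/ml",
}

def plan_transfer(sqs_message_body):
    """Turn an S3 event notification (SQS message body) into transfer jobs.

    Each job names the source object and the on-premises destination path
    the microservice should push the file to over SFTP.
    """
    event = json.loads(sqs_message_body)
    jobs = []
    for record in event.get("Records", []):
        key = record["s3"]["object"]["key"]
        for prefix, dest_dir in DESTINATIONS.items():
            if key.startswith(prefix):
                jobs.append({
                    "bucket": record["s3"]["bucket"]["name"],
                    "key": key,
                    "destination": posixpath.join(dest_dir, posixpath.basename(key)),
                })
                break  # first matching prefix wins
    return jobs
```

Keeping the routing logic as a pure function like this makes it easy to unit test separately from the SQS polling loop and the SFTP client.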

Dashboard

Per our business requirements, we must have a user interface to view file transmission status and generate reports. This takes the form of a dashboard, which enables the support team to monitor transfers and provide production support.

To implement the dashboard, we used AWS services including Amazon S3, Amazon SNS, AWS Lambda, and Amazon RDS, along with Spring Boot microservices running on EC2 instances.


The following screenshot depicts the SFTP dashboard. Our users use this dashboard to check the status of their file transfers; it also provides a summary of total files transferred and project- or department-specific transfer status, including any transmission errors.

Discover Financials SFTP dashboard built on AWS

Conclusion

Discover’s internal business and data operations users on AWS can now leverage AWS Transfer for SFTP to move large volumes of data. We can complete these data transfers quickly and cost-effectively across various AWS environments and on premises. Our transfer solution incorporates security best practices from the AWS Well-Architected Framework, ensuring end-to-end secure file transfer. AWS Transfer for SFTP is also highly reliable and resilient, and it consistently meets our file transfer SLAs. Thousands of large file transfers complete successfully every day, meeting the expected transfer rate without glitches. The fully managed AWS Transfer for SFTP service has certainly enhanced our customer experience.

The collaboration with the AWS Transfer for SFTP service team and solutions architects helped us immensely in outlining a secure design. Security is paramount for Discover’s AWS Cloud ecosystem and this solution design went through all internal security reviews and approvals successfully.

Thanks for reading this blog and learning more about how Discover Financial Services is using AWS Transfer for SFTP to secure our file transfers. I hope this blog post provided some helpful guidance on how you can leverage AWS Transfer for SFTP to improve file transfer security at your organization. If you have any comments or questions, do not hesitate to leave them in the comments section.

Kiran Chennuri


Kiran Chennuri is Director of Cloud & Data Platforms Engineering at Discover Financial Services. Kiran has held various leadership roles in finance and healthcare organizations focusing on cloud strategy, enablement, and application modernization. He has a passion for transforming organizations to serve the new digital age.

Nadeem Khan


Nadeem is a Principal Cloud Engineer for Discover Financial Services. He is responsible for design and implementation of resilient cloud data platforms, including designing and building secure and reliable applications. He is passionate about AWS Networking & Serverless technologies. He has AWS Solutions Architect Professional, Advanced Networking, and Security Specialty certifications.

Senthil Kumar Thiagarajan


Senthil is a Sr. Principal Cloud Architect for Discover Financial Services. He is responsible for designing the AWS Cloud data platform at Discover. His responsibilities include defining reference architectures and patterns, such as machine learning engineering design, site reliability, and chaos engineering. He is passionate about leading innovation. His previous experience spans architecture as well as engineering leadership roles.