AWS Storage Blog

Secure and process raw data transfers at scale with AWS Transfer Family

Ibexlabs is an AWS Advanced Consulting Partner with 100+ AWS certifications covering four competencies and seven service delivery programs. As an AWS Partner with competencies in AWS Security and AWS Level 1 Managed Security Services, and as an AWS Well-Architected Partner, Ibexlabs applies AWS best practices to design and build cloud solutions that meet the highest standards of governance, security, and compliance.

At Ibexlabs, we work with customers whose complex, multi-step workflows produce data that must be processed at each step. This becomes a challenge when multiple vendors and partners are working on multiple projects, each at a different phase of implementation. Each phase produces large amounts of data that must be processed and transformed into a structured format for queries and reports. To migrate this data across the different phases of operation, our customers need a secure, highly available data transfer solution that lets them exchange data with external providers at regular intervals without interruption.

In this post, we explain how Ibexlabs migrated data from a provider to a client while processing it to track opportunities at different stages, enabling sales personnel to receive their benefits faster. We cover how we used AWS Transfer Family for SFTP to ingest external data into Amazon S3, AWS Lambda functions to process the data, Amazon Simple Queue Service (Amazon SQS) to send and receive messages between components, and Amazon Relational Database Service (Amazon RDS) to store the data.

Solution overview

The following diagram shows the key components we used to deliver and process data: AWS Transfer Family combined with multiple AWS Lambda functions in a serverless workflow.

Secure and process raw data transfers at scale with AWS Transfer Family

Figure 1: Secure and process raw data transfers at scale with AWS Transfer Family

AWS Transfer Family is a fully managed FTP, FTPS, SFTP, and AS2 service backed by either Amazon S3 or Amazon Elastic File System (Amazon EFS). This AWS managed service relieves our customers of managing additional infrastructure, supports thousands of concurrent users transferring files quickly, and scales in line with business needs, making it a highly available solution. It supports data encryption and allows endpoints to be hosted in customer VPCs to meet security requirements.

AWS Lambda is a serverless, event-driven compute service that lets us run code without provisioning or managing servers. We can run code at the capacity we need, when we need it, by associating it with other AWS resources for a stable, scalable experience. It also allows us, or the client, to pay only for the compute time used, billed by the millisecond.

When a file is transferred through AWS Transfer Family into Amazon S3, an Amazon S3 PUT event triggers a receiver AWS Lambda function. This function inserts all of the received records into a logging table in the Amazon RDS for MySQL database and sorts the rows by their project key ID. It then aggregates the successfully inserted rows into batches of 10 and sends them to the processor Lambda function through the processor Amazon SQS queue.
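The following is a minimal sketch of what the receiver function might look like. The bucket layout, table name, column names, queue URL, and environment variables are hypothetical and only illustrate the pattern described above: log each incoming row, then forward the successful rows to SQS in batches of 10.

import csv
import json
import os

import boto3
import pymysql  # assumes the MySQL driver is packaged with the function or in a layer

s3 = boto3.client("s3")
sqs = boto3.client("sqs")

def handler(event, context):
    # The S3 PUT event carries the bucket and key of the uploaded file.
    s3_info = event["Records"][0]["s3"]
    bucket = s3_info["bucket"]["name"]
    key = s3_info["object"]["key"]

    body = s3.get_object(Bucket=bucket, Key=key)["Body"].read().decode("utf-8")
    # Sort the incoming rows by their project key ID.
    rows = sorted(csv.DictReader(body.splitlines()), key=lambda r: r["project_key_id"])

    conn = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
    )
    inserted = []
    with conn.cursor() as cur:
        for row in rows:
            # Log every received row; rows that fail to insert are skipped here
            # and can be handled by the reprocessing flow later.
            try:
                cur.execute(
                    "INSERT INTO transfer_log (project_key_id, payload) VALUES (%s, %s)",
                    (row["project_key_id"], json.dumps(row)),
                )
                inserted.append(row)
            except pymysql.MySQLError:
                continue
        conn.commit()
    conn.close()

    # Forward the successfully logged rows to the processor queue in batches of 10.
    for i in range(0, len(inserted), 10):
        sqs.send_message(
            QueueUrl=os.environ["PROCESSOR_QUEUE_URL"],
            MessageBody=json.dumps(inserted[i : i + 10]),
        )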

Using the sorted data, the processor AWS Lambda function queries the database, maps the data, and inserts it into the required tables according to the business logic. This data is used in the application UI for various business requirements. Data with missing attributes is added to a failure table for reprocessing. A user can access this data from the UI and fill in the missing attributes to make the data viable. When a user does so, an API call is made to the application server, which publishes to the reprocessor Amazon SQS queue, which in turn triggers the reprocessor AWS Lambda function. Using the dealer ID, the failed records are filtered, the missing data is added, and the records are sent to the processor AWS Lambda function in batches for insertion into the database.
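A minimal sketch of the processor function follows, under the same assumptions as the receiver sketch above (the table names, column names, and required attributes are hypothetical). It consumes the SQS batches, applies the business mapping, and routes incomplete records to a failure table for later reprocessing.

import json
import os

import pymysql

# Hypothetical set of attributes the business logic requires before a row can be mapped.
REQUIRED_ATTRIBUTES = ("project_key_id", "dealer_id", "opportunity_stage")

def handler(event, context):
    conn = pymysql.connect(
        host=os.environ["DB_HOST"],
        user=os.environ["DB_USER"],
        password=os.environ["DB_PASSWORD"],
        database=os.environ["DB_NAME"],
    )
    with conn.cursor() as cur:
        # Each SQS record carries a batch of up to 10 rows from the receiver function.
        for sqs_record in event["Records"]:
            for row in json.loads(sqs_record["body"]):
                if all(row.get(attr) for attr in REQUIRED_ATTRIBUTES):
                    # Map the complete row into the application tables.
                    cur.execute(
                        "INSERT INTO opportunities (project_key_id, dealer_id, stage) "
                        "VALUES (%s, %s, %s)",
                        (row["project_key_id"], row["dealer_id"], row["opportunity_stage"]),
                    )
                else:
                    # Incomplete rows go to the failure table so a user can fill in the
                    # missing attributes from the UI and trigger reprocessing.
                    cur.execute(
                        "INSERT INTO failed_records (dealer_id, payload) VALUES (%s, %s)",
                        (row.get("dealer_id"), json.dumps(row)),
                    )
        conn.commit()
    conn.close()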

Solution walkthrough

We demonstrate the preceding approach with an example of an external vendor transferring files through AWS Transfer Family to a dedicated Amazon S3 bucket, with multiple AWS Lambda functions consuming that data and processing it into an Amazon RDS database according to the business requirements.

The walkthrough consists of the following steps:

  1. Transfer Family configuration with Amazon S3
  2. Provisioning an Amazon RDS for MySQL database instance
  3. AWS Serverless Application Model (AWS SAM) template for serverless workflow

AWS Transfer Family configuration with Amazon S3

AWS Transfer Family makes the most sense for secure file transfer because it aligns with our business requirements. Here we configure our AWS Transfer Family endpoint with Amazon S3.

Before provisioning the AWS Transfer Family server, create the following IAM role, with the policy and trust relationship shown below, to allow Transfer Family to put objects into a particular folder of the required S3 bucket.

Note: Replace <receiver-bucket> with the name of your S3 bucket.

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "VisualEditor0",
            "Effect": "Allow",
            "Action": [
                "s3:PutObject",
                "s3:GetObject",
                "s3:ListBucket"
            ],
            "Resource": [
                "arn:aws:s3:::<receiver-bucket>",
                "arn:aws:s3:::<receiver-bucket>/*"
            ]
        },
        {
            "Sid": "VisualEditor1",
            "Effect": "Allow",
            "Action": "s3:ListAllMyBuckets",
            "Resource": "*"
        }
    ]
}

Code block 1: AWS Transfer Family user S3 upload IAM policy

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "",
            "Effect": "Allow",
            "Principal": {
                "Service": "transfer.amazonaws.com"
            },
            "Action": "sts:AssumeRole"
        }
    ]
}

Code block 2: AWS Transfer Family IAM trust relationship
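If you prefer to create this role programmatically rather than in the console, the following is a minimal boto3 sketch. The role and policy names are hypothetical; the two JSON documents are the ones shown in code blocks 1 and 2, and <receiver-bucket> should again be replaced with your bucket name.

import json

import boto3

iam = boto3.client("iam")

# Trust relationship from code block 2, allowing Transfer Family to assume the role.
trust_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Principal": {"Service": "transfer.amazonaws.com"},
            "Action": "sts:AssumeRole",
        }
    ],
}

# S3 upload policy from code block 1.
s3_policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:PutObject", "s3:GetObject", "s3:ListBucket"],
            "Resource": [
                "arn:aws:s3:::<receiver-bucket>",
                "arn:aws:s3:::<receiver-bucket>/*",
            ],
        },
        {"Effect": "Allow", "Action": "s3:ListAllMyBuckets", "Resource": "*"},
    ],
}

role = iam.create_role(
    RoleName="transfer-family-s3-upload-role",
    AssumeRolePolicyDocument=json.dumps(trust_policy),
)
iam.put_role_policy(
    RoleName="transfer-family-s3-upload-role",
    PolicyName="transfer-family-s3-upload-policy",
    PolicyDocument=json.dumps(s3_policy),
)
print(role["Role"]["Arn"])  # Use this ARN when creating the Transfer Family user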

Now, open the AWS Transfer Family console and choose Create server.

  1. Choose protocols: SFTP (SSH File Transfer Protocol) – file transfer over Secure Shell.

AWS Transfer Family Protocol Type

Figure 2: AWS Transfer Family Protocol Type

2. Identity provider type: Service managed.

AWS Transfer Family Identity Provider

Figure 3: AWS Transfer Family Identity Provider

3. Endpoint type: VPC hosted, with access set to Internet facing. We chose a custom VPC with protected subnets, assigned Elastic IPs, and attached a custom security group to manage access control.

AWS Transfer Family Endpoint, Network and Security Settings

Figure 4: AWS Transfer Family Endpoint, Network and Security Settings

4. Domain: Amazon S3, to store and access files as Amazon S3 objects over the selected protocol.

AWS Transfer Family Domain Types - Amazon S3 and Amazon EFS

Figure 5: AWS Transfer Family Domain Types – Amazon S3 and Amazon EFS

5. CloudWatch logging: We chose to create a new role for logging all incoming file transfers to CloudWatch.

AWS Transfer Family Logging Settings

Figure 6: AWS Transfer Family Logging Settings

After setting up the AWS Transfer Family endpoint, complete the following steps:

  1. Create a user with the previously created custom IAM role to access the S3 bucket. This includes adding a public key under the SSH public keys section for secure access.

AWS Transfer Family User Settings

Figure 7: AWS Transfer Family User Settings

2. Restrict the user to a particular folder in the bucket using the restricted option, which isolates the user to that folder path. Once the user is created, the vendor can upload files over SFTP, as sketched below.
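To verify the setup end to end, the external vendor (or you, while testing) can upload a file over SFTP. The following is a minimal sketch using the paramiko library; the server endpoint, user name, key file, and file names are hypothetical placeholders for your own values.

import paramiko

# Hypothetical values: use your Transfer Family server endpoint, the user created
# above, and the private key matching the SSH public key you registered.
HOST = "s-0123456789abcdef.server.transfer.us-east-1.amazonaws.com"
USERNAME = "vendor-user"
KEY_FILE = "vendor-user-private-key.pem"

key = paramiko.RSAKey.from_private_key_file(KEY_FILE)
transport = paramiko.Transport((HOST, 22))
transport.connect(username=USERNAME, pkey=key)

sftp = paramiko.SFTPClient.from_transport(transport)
# With the restricted option, the remote path is relative to the user's
# isolated folder in the S3 bucket.
sftp.put("opportunities_export.csv", "opportunities_export.csv")

sftp.close()
transport.close()

Once the upload completes, the object lands in the configured S3 folder and the receiver Lambda function described earlier is triggered by the S3 PUT event.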

Provisioning an Amazon RDS for MySQL database instance

We use Amazon RDS for MySQL to store the structured data in multiple tables after the AWS Lambda functions process the incoming data delivered by the external vendor through AWS Transfer Family. This Amazon RDS DB instance is also connected to the application server so that end users can view the details in a human-readable format. We will now show you how we provisioned an Amazon RDS for MySQL database instance.

  1. In Subnet groups, create a custom DB subnet group, providing the VPC and subnet details for the MySQL DB instance.

Figure 8: Subnet groups for Amazon RDS

2. In the dashboard, under the Create database section, choose Create database.

3. Choose MySQL as the Engine type and a compatible engine version. Select either Production or Dev/Test based on your workload.

Amazon RDS Engine Type and Version

Figure 9: Amazon RDS Engine Type and Version

We can now configure the DB instance details. The following are the settings that you can configure.
Settings:

  • Availability and durability: We chose a single Availability Zone (AZ) deployment, which does not create a standby instance. A Multi-AZ DB instance automatically provisions and maintains a synchronous standby replica in a different Availability Zone, which helps if the primary Availability Zone goes down.

Amazon RDS Availability and Durability

Figure 10: Amazon RDS Availability and Durability

  • DB instance identifier: A name for the DB instance that is unique within your account in the selected Region.
  • Master username: The username used to log in to the DB instance.
  • Master password: A password of 8 to 41 printable ASCII characters (excluding /, ", and @) for the master user.
  • Confirm password: Retype the same password.

Amazon RDS DB Identifier and Admin Credentials

Figure 11: Amazon RDS DB Identifier and Admin Credentials

Database Instance specifications

  • DB Instance class: Select the instance class to be used by Amazon RDS.
  • Storage type: Select the storage type to be used in the DB Instance.
  • Allocated storage: Select the default of 20 to allocate 20 GB of storage for the database. It can be scaled up to a maximum of 64 TB with Amazon RDS for MySQL.
  • Enable storage autoscaling: If your workload is cyclical or unpredictable, you can activate storage autoscaling to enable Amazon RDS to automatically scale up your storage when needed.
  • Next, we need to provide the connectivity details for the DB instance, covered in the following section.

Amazon RDS Storage Requirements

Figure 12: Amazon RDS Storage Requirements

Connectivity

  • Virtual private cloud (VPC): Select the custom VPC that is in use or newly created.
  • Subnet group: Select the previously created custom DB subnet group.
  • Public accessibility: Choose No.
  • VPC security groups: Either select an existing security group or choose Create new to create a new security group for the DB Instance.
  • Availability zone: Choose No preference.
  • Database port: Provide a custom port or leave the default value as 3306.

Amazon RDS Connectivity and Network Settings

Figure 13: Amazon RDS Connectivity and Network Settings

In the Additional configurations section:

  • Database options
    • Database name: Type a database name that is 1 to 64 alphanumeric characters. If you do not provide a name, Amazon RDS will not automatically create a database on the DB Instance you are creating.
    • DB parameter group: Leave the default value.
    • Option group: Leave the default value.
  • Encryption: Choose default encryption.
  • Backup
    • Backup retention period: The number of days to retain backups. It can vary from 1 to 35 days, with seven days as the default retention period.
    • Backup window: Choose a suitable backup window with a duration of the backup. The time is in UTC.

Amazon RDS Additional Settings - Database Options, Backups and Encryption

Figure 14: Amazon RDS Additional Settings – Database Options, Backups and Encryption

  • Monitoring
    • Enhanced monitoring: Enable enhanced monitoring to get real-time metrics for the operating system (OS) that your DB instance runs on.

Maintenance

  • Auto minor version upgrade: Select Enable auto minor version upgrade to receive automatic updates when they become available.
  • Maintenance window: Choose a suitable maintenance window with a day of the week. The time is in UTC.

Deletion protection: Select Enable deletion protection to avoid accidental deletion.

Amazon RDS Additional Settings - Monitoring, Logs and Maintenance

Figure 15: Amazon RDS Additional Settings – Monitoring, Logs and Maintenance

  • Select Create Database.

The database should now start provisioning as per the provided configuration.
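Once the DB instance is available, you can connect to it and create the logging and failure tables that the Lambda functions write to. The following is a minimal sketch; the table and column names are hypothetical and match the earlier Lambda sketches.

import os

import pymysql

conn = pymysql.connect(
    host=os.environ["DB_HOST"],  # the RDS endpoint shown in the console
    user=os.environ["DB_USER"],
    password=os.environ["DB_PASSWORD"],
    database=os.environ["DB_NAME"],
)

with conn.cursor() as cur:
    # Logging table populated by the receiver Lambda function.
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS transfer_log (
            id BIGINT AUTO_INCREMENT PRIMARY KEY,
            project_key_id VARCHAR(64) NOT NULL,
            payload JSON NOT NULL,
            received_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
        )
        """
    )
    # Failure table used by the processor Lambda function for records with
    # missing attributes that need reprocessing.
    cur.execute(
        """
        CREATE TABLE IF NOT EXISTS failed_records (
            id BIGINT AUTO_INCREMENT PRIMARY KEY,
            dealer_id VARCHAR(64),
            payload JSON NOT NULL
        )
        """
    )

conn.commit()
conn.close()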

AWS SAM template for serverless workflow

To automate the deployment of the serverless infrastructure, including the multiple Lambda functions and Amazon SQS queues, along with the deployable code, we use an AWS SAM template.

The AWS SAM template that provisions the serverless resources does the following:

  • Deploys the Lambda functions into a custom VPC within the provided private subnets, with a custom security group attached using its security group ID.
  • Creates the required execution roles with custom policies so that the relevant Lambda functions can connect to the S3 bucket and the Amazon SQS queues.
  • Creates standard Amazon SQS queues with a custom visibility timeout and adds them as triggers to the required Lambda functions.

To deploy the preceding template, the AWS SAM CLI must be installed as a prerequisite. Use the following commands to deploy the serverless infrastructure:

  • First, change into the project directory, where the template.yml file is located along with your code, and run the following command:
sam build
  • To deploy the application to the AWS Cloud, run the following command and follow the prompts:
sam deploy --guided

Cleaning up

If you are done using the resources that are part of this blog, do not forget to clean up, and check for any permissions or IAM roles that are no longer required, to avoid recurring charges. Keep in mind that this includes the AWS Transfer Family endpoint, the Amazon RDS DB instance, and the resources created by the SAM template that you deployed.

AWS Transfer Family

  • Open the AWS Transfer Family console.
  • Select the server that you created.
  • Select the Actions dropdown, and choose Delete.
  • Enter “delete” in the pop-up to confirm the deletion of the server.

Amazon RDS

  • Open the Amazon RDS console.
  • In the navigation pane, choose the MySQL Database that you created.
  • Select Actions, and choose Delete.
  • For Create the final snapshot, choose No, and select the acknowledgement.
  • Choose Delete.

Amazon S3

  • In order to terminate the serverless resources, you must empty the S3 bucket.
  • Open the Amazon S3 console.
  • Search and select the created S3 bucket, and choose Empty.
  • On the Empty bucket page, confirm that you want to empty the bucket by entering “permanently delete” and choosing Empty.

Serverless resources provisioned via SAM Template

  • To delete the resources, run the sam delete command and follow the prompts:
sam delete --stack-name <stack-name>
  • This deletes the SAM application by deleting the AWS CloudFormation stack and the artifacts that were packaged.

Conclusion

In this post, we covered how Ibexlabs helps customers with complex, multi-step workflows requiring data transfers and processing at each step to build a solution using AWS Transfer Family. With this solution, you can seamlessly transfer bulk data at scale into AWS and process it in a cost-effective manner.

With AWS Lambda and Amazon SQS, you pay only for execution time rather than for long-running VMs, without compromising security. Using AWS managed and serverless resources like AWS Lambda, Amazon SQS, and AWS Transfer Family instead of Amazon EC2 servers helped reduce overall usage and maintenance expenses. This has enabled coherent data transfer and transformation at scale with minimal manual effort.

Ibexlabs is a Launch Partner for the AWS Transfer Family Service Delivery Program. Learn how we leverage AI and ML to support customer needs.

Santosh Peddada

Santosh Peddada is a Solution Architect with Ibexlabs. He has been in the IT industry for around 7 years, holding positions from DevOps Engineer to Solution Architect. For the past two years, he has been an integral part of the design and development of AWS architecture for clients. He has served as the product owner for the Ibex Catalog, and provided solutions for a number of different industries.