Scheduling a secure data transfer using AWS DataSync
Customers looking to transfer data have traditionally developed their own in-house solutions or used open-source tools. They have also had to account for the security and integrity of their data transfer jobs. AWS DataSync reduces this overhead by providing a complete data transfer solution out of the box.
DataSync is an online data transfer service that enables customers to copy data between on-premises data stores and AWS storage services. DataSync is frequently used to copy data from NFS and SMB file systems to Amazon Elastic File System (Amazon EFS), Amazon FSx for Windows File Server, and all Amazon Simple Storage Service (Amazon S3) storage classes. DataSync simplifies data transfer by eliminating or automating many tasks traditionally associated with data migration, such as scripting copy jobs, scheduling and monitoring transfers, validating data, and optimizing network utilization.
With DataSync, the data being transferred between the on-premises storage and AWS is encrypted via Transport Layer Security (TLS). Once the data is transferred to AWS, AWS Key Management Service (AWS KMS) can be used to encrypt data at rest. DataSync also complies with global and industry security standards. You can find compliance details on the DataSync FAQs page. Additionally, DataSync enables customers to reduce their operational burden further by easily scheduling their data transfer tasks.
In this blog post, I show you how you can use the scheduling feature with new and existing DataSync tasks. I also include examples of creating custom schedules with DataSync.
Scheduling your DataSync tasks
DataSync provides a scheduling feature that enables customers to automatically execute transfer tasks to detect and copy changes from the source storage to the destination storage at specified intervals. Scheduling is handy if you want to execute recurring tasks to migrate data between on-premises and AWS Cloud storage. It enables you to transfer data to the cloud for running analytics and deriving insights, and to replicate data to AWS for disaster recovery or running hybrid applications. With DataSync scheduling, you can do all this without needing to write and run scripts to track repeated transfers.
With DataSync’s scheduling feature, you can configure a data transfer task to run periodically at fixed times, dates, or intervals. A scheduled task runs automatically at the frequency you set.
Configure data transfer schedule
In this section, I demonstrate adding a schedule to new DataSync tasks, adding a schedule to existing tasks, and creating a custom schedule using the AWS Management Console.
Steps to add scheduling to new tasks
- Create a data transfer task from the DataSync menu and configure the source and destination locations. For more information about creating a task, see the DataSync documentation.
- In Step 3 Configure settings, enter the settings to control how a task execution behaves. Scroll down and choose Schedule.
- In Schedule, choose the desired Frequency. Choose Custom if you want to use a custom cron expression to run your task, and enter your expression in the Cron expression box.
- Choose Next, review settings and choose Create Task.
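The console steps above correspond to the DataSync CreateTask API. The following Python sketch, assuming boto3 is installed and AWS credentials are configured, builds a CreateTask request whose Schedule wraps the six cron fields in cron(...). The location ARNs and the task name are hypothetical placeholders, and the actual AWS call is left commented out.

```python
"""Sketch: create a scheduled DataSync task with boto3.

Assumes boto3 is installed and AWS credentials are configured.
All ARNs and names below are hypothetical placeholders.
"""


def build_create_task_request(source_arn: str, destination_arn: str,
                              cron_fields: str, name: str) -> dict:
    """Return keyword arguments for DataSync CreateTask with a schedule.

    cron_fields holds the six space-separated fields you would type into the
    console's Cron expression box, for example "0 6 ? * MON-FRI *".
    The API expects the expression wrapped as cron(...).
    """
    return {
        "SourceLocationArn": source_arn,
        "DestinationLocationArn": destination_arn,
        "Name": name,
        "Schedule": {"ScheduleExpression": f"cron({cron_fields})"},
    }


def create_scheduled_task(request: dict) -> str:
    """Send the request to DataSync and return the new task's ARN."""
    import boto3  # imported lazily so the request builder works without boto3

    client = boto3.client("datasync")
    response = client.create_task(**request)
    return response["TaskArn"]


if __name__ == "__main__":
    req = build_create_task_request(
        # Hypothetical location ARNs; substitute the ones from your account.
        source_arn="arn:aws:datasync:us-east-1:111122223333:location/loc-source-example",
        destination_arn="arn:aws:datasync:us-east-1:111122223333:location/loc-dest-example",
        cron_fields="0 6 ? * MON-FRI *",  # 6:00 AM UTC, Monday through Friday
        name="weekday-nfs-to-s3",  # hypothetical task name
    )
    print(req["Schedule"])
    # print(create_scheduled_task(req))  # uncomment to actually call AWS
```

Keeping request construction separate from the API call makes it easy to inspect the schedule before creating anything in your account.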
Steps to add scheduling to existing data transfer tasks
- Go to the DataSync service from the AWS Management Console.
- Choose Tasks from the DataSync menu.
- Choose the task for which you want to schedule the transfer, and from Actions choose Edit.
- In Schedule, choose the desired Frequency. Choose Custom if you want to use a custom cron expression to run your task, enter your expression in the Cron expression box, and choose Save changes.
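Editing the schedule of an existing task corresponds to the DataSync UpdateTask API. This Python sketch again assumes boto3 and configured credentials; the task ARN is a hypothetical placeholder. It checks that six cron fields were supplied before building the request.

```python
"""Sketch: add or change the schedule of an existing DataSync task via boto3.

Assumes boto3 is installed and AWS credentials are configured; the task ARN
used below is a hypothetical placeholder.
"""


def build_update_schedule_request(task_arn: str, cron_fields: str) -> dict:
    """Return keyword arguments for DataSync UpdateTask that set a cron schedule."""
    if len(cron_fields.split()) != 6:
        raise ValueError(f"expected 6 cron fields, got: {cron_fields!r}")
    return {
        "TaskArn": task_arn,
        "Schedule": {"ScheduleExpression": f"cron({cron_fields})"},
    }


def apply_schedule(request: dict) -> None:
    """Send the UpdateTask request to DataSync."""
    import boto3  # lazy import: the builder above runs without boto3

    boto3.client("datasync").update_task(**request)


if __name__ == "__main__":
    req = build_update_schedule_request(
        "arn:aws:datasync:us-east-1:111122223333:task/task-0123456789abcdef0",  # hypothetical
        "0 20 10 * ? *",  # 8:00 PM UTC on the 10th day of every month
    )
    print(req)
    # apply_schedule(req)  # uncomment to call AWS
```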
Create a custom schedule
If you want to use a custom cron expression to run your task, you must set Frequency to Custom and enter a Cron expression.
Cron expressions have six required fields, separated by white space. The six fields represent Minutes, Hours, Day-of-month, Month, Day-of-week, and Year, respectively. The allowed values and wildcards for each field are listed in the AWS documentation on cron expressions.
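To make the field order concrete, here is a small Python sketch that splits an expression into its six named fields. It accepts either the bare fields you type into the console or the cron(...) form; the function and field names are my own, not part of any AWS API.

```python
# Field order for the six-field cron expressions described above.
FIELD_NAMES = ("minutes", "hours", "day_of_month", "month", "day_of_week", "year")


def split_cron(expression: str) -> dict:
    """Map each of the six cron fields to its name (hypothetical helper)."""
    expr = expression.strip()
    # Strip an optional cron(...) wrapper so both forms are accepted.
    if expr.startswith("cron(") and expr.endswith(")"):
        expr = expr[len("cron("):-1]
    fields = expr.split()
    if len(fields) != len(FIELD_NAMES):
        raise ValueError(f"expected 6 fields, got {len(fields)}: {expression!r}")
    return dict(zip(FIELD_NAMES, fields))


if __name__ == "__main__":
    print(split_cron("0 20 10 * ? *"))
    # {'minutes': '0', 'hours': '20', 'day_of_month': '10', 'month': '*', 'day_of_week': '?', 'year': '*'}
```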
Here are a few examples:
- Run at 6:00 AM (UTC) every Monday through Friday: 0 6 ? * MON-FRI *
- Run at 8:00 PM (UTC) on the 10th day of every month: 0 20 10 * ? *
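To check when these two examples fire, the following Python sketch tests whether a given UTC time matches a six-field expression. It is a deliberately restricted matcher that I wrote for illustration: it handles only *, ?, single values, comma lists, ranges, and the day names SUN through SAT (using AWS numbering, where 1 = SUN). It does not implement other wildcards such as /, L, W, or #.

```python
from datetime import datetime

# AWS day-of-week numbering: 1 = SUN ... 7 = SAT.
DAY_NAMES = {"SUN": 1, "MON": 2, "TUE": 3, "WED": 4, "THU": 5, "FRI": 6, "SAT": 7}


def _to_int(token: str) -> int:
    token = token.upper()
    return DAY_NAMES[token] if token in DAY_NAMES else int(token)


def _field_matches(field: str, value: int) -> bool:
    """Match one field; supports *, ?, N, N-M, and comma-separated lists only."""
    if field in ("*", "?"):
        return True
    for part in field.split(","):
        if "-" in part:
            lo, hi = (_to_int(p) for p in part.split("-", 1))
            if lo <= value <= hi:
                return True
        elif _to_int(part) == value:
            return True
    return False


def matches(cron_fields: str, when: datetime) -> bool:
    """Return True if `when` (a naive datetime treated as UTC) matches the expression."""
    minute, hour, dom, month, dow, year = cron_fields.split()
    # Python: Monday = 0 ... Sunday = 6; convert to AWS 1 = SUN ... 7 = SAT.
    aws_dow = (when.weekday() + 1) % 7 + 1
    return (
        _field_matches(minute, when.minute)
        and _field_matches(hour, when.hour)
        and _field_matches(dom, when.day)
        and _field_matches(month, when.month)
        and _field_matches(dow, aws_dow)
        and _field_matches(year, when.year)
    )


if __name__ == "__main__":
    weekdays_6am = "0 6 ? * MON-FRI *"
    print(matches(weekdays_6am, datetime(2024, 1, 8, 6, 0)))   # Monday -> True
    print(matches(weekdays_6am, datetime(2024, 1, 13, 6, 0)))  # Saturday -> False
    print(matches("0 20 10 * ? *", datetime(2024, 3, 10, 20, 0)))  # 10th at 20:00 -> True
```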
Clarification on symbols:
The “*” (asterisk) wildcard includes all values in the field. In the Hours field, “*” would include every hour. You cannot use “*” in both the Day-of-month and Day-of-week fields. If you use it in one, you must use “?” in the other.
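The “?” (question mark) wildcard means “no specific value,” and you use it in either the Day-of-month or the Day-of-week field when the other field holds a value. For example, if you enter “7” in the Day-of-month field and don’t care which day of the week the 7th falls on, you enter “?” in the Day-of-week field.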
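The Day-of-month and Day-of-week rule is easy to get wrong, so here is a small Python sketch of a pre-check you could run before saving an expression. It is a hypothetical helper, not an AWS API, and it enforces the stricter reading of the rule: exactly one of the two day fields must be “?”.

```python
def check_day_fields(cron_fields: str) -> None:
    """Raise ValueError unless exactly one of Day-of-month / Day-of-week is '?'.

    Hypothetical pre-check; DataSync performs its own validation when you save.
    """
    fields = cron_fields.split()
    if len(fields) != 6:
        raise ValueError(f"expected 6 fields, got {len(fields)}: {cron_fields!r}")
    dom, dow = fields[2], fields[4]  # Day-of-month is field 3, Day-of-week is field 5
    if (dom == "?") == (dow == "?"):
        raise ValueError(
            f"Day-of-month ({dom!r}) and Day-of-week ({dow!r}): "
            "exactly one of them must be '?'"
        )


if __name__ == "__main__":
    check_day_fields("0 6 ? * MON-FRI *")  # passes: '?' in Day-of-month
    check_day_fields("0 20 10 * ? *")      # passes: '?' in Day-of-week
    try:
        check_day_fields("0 6 * * MON-FRI *")  # '*' and a value: rejected
    except ValueError as err:
        print(err)
```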
If you followed the steps in this post for testing purposes, make sure you remove the schedule by editing the DataSync task: choose Schedule, set Frequency to Not scheduled, and choose Save changes. Your DataSync task will then no longer run periodically.
In this blog post, I demonstrated how to add a schedule to new and existing DataSync tasks. I also showed examples of creating a custom schedule using a cron expression.
Task scheduling runs transfers automatically at the intervals you set, removing much of the manual work traditionally associated with data migration. Customers looking to move data are no longer burdened with scripting copy jobs and monitoring transfers, and they can avoid costly commercial transfer tools. DataSync simplifies data transfers by providing a single tool to manage, monitor, and secure your data transfers for analysis and processing. It also enables you to archive data to free up on-premises storage capacity, or replicate data to AWS for business continuity.
Thanks for reading this post. Looking forward to your feedback and questions in the comments section below!