How can I copy data to and from Amazon EFS in parallel to maximize performance on my EC2 instance?

Last updated: 2019-05-31

I have a large number of files to copy or delete. How can I run these jobs in parallel on an Amazon Elastic File System (Amazon EFS) file system on my Amazon Elastic Compute Cloud (Amazon EC2) instance?

Short Description

Use one of the following tools to run jobs in parallel on an Amazon EFS file system: GNU parallel, msrsync, or fpsync.

Resolution

GNU parallel

1.    Install GNU parallel.

For Amazon Linux and RHEL 6:

$ sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
$ sudo yum install parallel nload -y

For RHEL 7 and Amazon Linux 2:

$ sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
$ sudo yum install parallel nload -y

For Ubuntu:

$ sudo apt-get install -y parallel nload

2.    Use rsync to copy the files to Amazon EFS.

$ time find -L /src -type f | parallel rsync -avR {} /dst

3.    Use the nload console application to monitor network traffic and bandwidth.

$ nload -u M
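The same find-and-fan-out pattern can also cover the delete case. The following is a minimal sketch, not a definitive procedure: it builds a hypothetical scratch directory in place of a directory on the EFS mount, assumes a starting value of 4 concurrent jobs, and falls back to xargs -P when GNU parallel is not installed (the fallback is an addition of this sketch).

```shell
#!/usr/bin/env bash
# Sketch: delete many files in parallel, handling unusual file names safely.

# Hypothetical scratch directory standing in for a directory on the EFS mount.
target="$(mktemp -d)"
mkdir -p "$target/logs" "$target/tmp data"
touch "$target/logs/a.log" "$target/logs/b c.log" "$target/tmp data/d.bin"

jobs=4   # concurrent rm processes; an assumed starting value, tune per workload

# -print0 and -0 pass NUL-delimited names, so spaces and newlines are safe.
if parallel --version 2>/dev/null | grep -q 'GNU parallel'; then
    # -m packs as many file names as fit into each rm invocation.
    find "$target" -type f -print0 | parallel -0 -j "$jobs" -m rm -f {}
else
    # Fallback for hosts without GNU parallel: xargs -P runs rm concurrently.
    find "$target" -type f -print0 | xargs -0 -P "$jobs" -n 100 rm -f
fi

# Count what is left; tr strips the padding that some wc builds emit.
remaining="$(find "$target" -type f | wc -l | tr -d ' ')"
echo "files remaining: $remaining"

# Clean up the scratch directory.
rm -rf "$target"
```

The -print0 and -0 pairing can be applied to the rsync pipeline in step 2 in the same way, for example: find -L /src -type f -print0 | parallel -0 rsync -avR {} /dst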

Msrsync

Msrsync is a Python wrapper for rsync that runs multiple rsync processes in parallel.

Note: Msrsync is compatible only with Python 2. You must run the msrsync script using a Python 2 interpreter, version 2.7.14 or later.

1.    Install msrsync.

$ sudo curl -s https://raw.githubusercontent.com/jbd/msrsync/master/msrsync -o /usr/local/bin/msrsync && sudo chmod +x /usr/local/bin/msrsync

2.    Add the installation directory to the current PATH variable.

$ export PATH=$PATH:/usr/local/bin

3.    Use the -p option to specify the number of rsync processes to run in parallel. Replace X with the number of rsync processes. The -P option shows the progress of each job.

$ time python /usr/local/bin/msrsync -P -p X --stats --rsync "-artuv" /source/ /mnt/efs/test/
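The steps above leave two choices open: which interpreter to use and what value to give X. The following is a minimal sketch under stated assumptions. The candidate interpreter names (python2.7, python2, python) are guesses at common names, and starting with one rsync process per CPU is an assumption of this sketch, not AWS guidance. The script only prints the resulting command and does not run it.

```shell
#!/usr/bin/env bash
# Sketch: pick a Python 2 interpreter and a starting process count for msrsync.

# Look for an interpreter whose major version is 2 (candidate names are assumptions).
py=""
for candidate in python2.7 python2 python; do
    if command -v "$candidate" >/dev/null 2>&1 &&
       "$candidate" -c 'import sys; sys.exit(0 if sys.version_info[0] == 2 else 1)' 2>/dev/null; then
        py="$candidate"
        break
    fi
done

# Starting point for -p: one rsync process per CPU (an assumption, not AWS guidance).
procs="$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN 2>/dev/null || echo 1)"

if [ -n "$py" ]; then
    # Print the command rather than run it, so the paths can be reviewed first.
    printf 'run: time %s /usr/local/bin/msrsync -P -p %s --stats --rsync "-artuv" /source/ /mnt/efs/test/\n' "$py" "$procs"
else
    echo "no Python 2 interpreter found; install one before running msrsync"
fi
```

Treat the printed value as a starting point. Raise or lower it while watching throughput, because the best setting depends on file sizes and instance type.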

Fpsync

The fpsync tool synchronizes directories in parallel using fpart and rsync. It can execute several rsync processes locally or launch rsync transfers on several nodes (workers) through SSH.

1.    Enable the EPEL repository, and then install the fpart package.

For Amazon Linux and RHEL 6:

$ sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-6.noarch.rpm
$ sudo yum install fpart -y

For RHEL 7 and Amazon Linux 2:

$ sudo yum install -y https://dl.fedoraproject.org/pub/epel/epel-release-latest-7.noarch.rpm
$ sudo yum install fpart -y

For Ubuntu:

$ sudo apt-get install fpart

Note: In Ubuntu, fpsync is part of the fpart package.

2.    Use fpsync to synchronize the /src directory to the /dst directory. Replace X with the number of rsync processes to run in parallel.

$ fpsync -n X /src /dst
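After a parallel sync finishes, it can be worth confirming that the destination holds the same set of files as the source. The following is a minimal sketch, not part of fpsync itself: it uses hypothetical scratch directories in place of /src and /dst, uses cp as a stand-in for the sync step, and compares file names only, not sizes or contents.

```shell
#!/usr/bin/env bash
# Sketch: confirm that a destination tree lists the same files as the source.

# Hypothetical scratch directories standing in for /src and /dst.
src="$(mktemp -d)"
dst="$(mktemp -d)"
mkdir -p "$src/a/b"
printf 'one\n' > "$src/a/file1"
printf 'two\n' > "$src/a/b/file 2"

# Stand-in for the parallel sync step (for example, fpsync -n X "$src" "$dst").
cp -R "$src/." "$dst/"

# List files relative to a tree root, sorted so the two lists are comparable.
list_files() {
    (cd "$1" && find . -type f | LC_ALL=C sort)
}

if [ "$(list_files "$src")" = "$(list_files "$dst")" ]; then
    result="match"
else
    result="mismatch"
fi
echo "file lists: $result"

# Clean up the scratch directories.
rm -rf "$src" "$dst"
```

A name-only comparison is cheap on large trees. For a stricter check, an rsync dry run with checksums (rsync -ainc /src/ /dst/) reports files that differ, at the cost of reading every file.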
