AWS Storage Blog

Simplify data migrations using an AWS DataSync agent on Linux KVM Hypervisor

Migrating data to the cloud often requires customers to dedicate time, budget, and employee resources that could otherwise go to other initiatives. Reconfiguring existing infrastructure or investing in new virtualization platforms is time-intensive and expensive. AWS DataSync supports deploying an on-premises agent on VMware ESXi, Microsoft Hyper-V, and Linux Kernel-based Virtual Machine (KVM) hypervisor hosts for data migration. Customers that predominantly run Linux operating systems on premises can deploy a DataSync agent on a KVM hypervisor host without changing their existing infrastructure or investing in other virtualization platforms. This makes KVM a convenient, cost-effective option for customers who do not have a VMware ESXi or Microsoft Hyper-V platform, and it simplifies migrating large datasets from on-premises storage systems to AWS storage services. Today, customers use AWS DataSync to migrate data to AWS, replicate data to AWS for business continuity, archive data to AWS for cost efficiency, and upload data for timely in-cloud processing by other AWS services.

KVM is an open-source virtualization technology built into the Linux kernel that turns a Linux host into a hypervisor capable of running one or more virtual machines. KVM has been part of the mainline Linux kernel since version 2.6.20. DataSync supports KVM hosts running CentOS/RHEL 7.8, Ubuntu 16.04 LTS, and Ubuntu 18.04 LTS. virsh, the libvirt command-line tool, lets you manage KVM virtual machines, although you can also manage them through a graphical user interface such as virt-manager. Linux administrators commonly use virsh when automating virtual machine management. In this blog post, I show you how to deploy a DataSync agent on a KVM hypervisor running Ubuntu 18.04 LTS using virsh. The same approach applies to the other supported Linux distributions, although package names and install commands differ.

Walkthrough

The following steps deploy a DataSync agent on a Linux KVM hypervisor, so you can use this cost-effective hypervisor option to migrate active datasets for timely processing in the cloud. The steps discussed in this blog post are:

  1. Deploy AWS DataSync on KVM hypervisor host.
  2. Install KVM packages and their dependencies.
  3. Start and enable the KVM service.
  4. Extract the KVM image in qcow2 format.
  5. Create a virtual machine.
  6. Retrieve the activation key from the virtual machine.
  7. Activate a DataSync agent.

Deploy AWS DataSync on KVM Hypervisor Host

To start, log in to the AWS Management Console, search for AWS DataSync, and open the DataSync console. Then choose Get started.

Figure 1: AWS DataSync console login

Under Hypervisor, choose the Kernel-based Virtual Machine (KVM) option from the drop-down list.

Figure 2: AWS DataSync console: Create agent

Below the drop-down list, right-click Download the image and choose Copy link to copy the URL of the KVM image. Then, on your KVM hypervisor host, change to the /mnt directory and run the wget command with the copied URL to download the KVM image. The command should look similar to the following:

[Screenshot: wget command downloading the DataSync KVM image]
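If you prefer to script the download, the following sketch wraps the same wget call in a function. It is a minimal example that assumes you pass in the link you copied from the console; the function name download_datasync_image is hypothetical, and the URL is intentionally not hard-coded.

```shell
# Hypothetical helper: download the DataSync KVM image into a target directory.
# Pass the link copied from the DataSync console as the first argument.
download_datasync_image() {
  local url="$1" dest_dir="${2:-/mnt}"   # the walkthrough uses /mnt
  if [ -z "$url" ]; then
    echo "usage: download_datasync_image <image-url> [dest-dir]" >&2
    return 1
  fi
  # -P (--directory-prefix) saves the file under dest_dir without changing directory
  sudo wget -P "$dest_dir" "$url"
}
```

For example: download_datasync_image "<copied-image-link>" /mnt. Using wget -P means you don't have to cd into /mnt first.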

Install KVM and its required packages

From the Linux KVM hypervisor host, run the following commands to install the required KVM packages and their dependencies.

sudo apt update
sudo apt install qemu qemu-kvm libvirt-bin virt-manager

The command should then appear as follows:

[Screenshot: commands to install the required KVM packages]
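KVM requires hardware virtualization support (Intel VT-x or AMD-V) on the host CPU. The following sketch, a hypothetical helper named check_kvm_support, counts the CPU flag lines in /proc/cpuinfo that include vmx or svm. On Ubuntu, the kvm-ok command from the cpu-checker package performs a similar check.

```shell
# Sketch: confirm the host CPU exposes hardware virtualization, which KVM needs.
# vmx = Intel VT-x, svm = AMD-V. The cpuinfo path is a parameter so it can be tested.
check_kvm_support() {
  local cpuinfo="${1:-/proc/cpuinfo}" count
  # -c counts matching lines (x86 /proc/cpuinfo has one "flags" line per logical CPU);
  # -w matches whole words only
  count=$(grep -cwE 'vmx|svm' "$cpuinfo" 2>/dev/null)
  count="${count:-0}"   # treat an unreadable file as "no support found"
  if [ "$count" -gt 0 ]; then
    echo "hardware virtualization supported on $count logical CPUs"
  else
    echo "no vmx/svm flag found; enable VT-x or AMD-V in the firmware" >&2
    return 1
  fi
}
```

Run check_kvm_support with no arguments to inspect the local host; the optional path argument exists only to make the function easy to test.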

Start and enable the KVM service

Once the required KVM packages are installed, start the libvirtd service (libvirtd is the server-side daemon that manages the KVM virtualization platform). Then enable the libvirtd service so that it starts automatically after a system reboot.

sudo service libvirtd start
sudo update-rc.d libvirtd enable
sudo service libvirtd status

The output should appear as follows:

[Screenshot: enabling the libvirtd service and checking its status]
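Ubuntu 18.04 uses systemd, so the start and enable steps can also be combined into a single systemctl call. The following is a sketch with a hypothetical function name, ensure_libvirtd, which assumes the service unit is named libvirtd as in the commands above.

```shell
# Sketch: systemd equivalent of the service/update-rc.d commands above.
# "enable --now" enables libvirtd at boot and starts it immediately.
ensure_libvirtd() {
  sudo systemctl enable --now libvirtd || return 1
  # is-active --quiet prints nothing; its exit status is 0 when the unit is active
  if systemctl is-active --quiet libvirtd; then
    echo "libvirtd is active"
  else
    echo "libvirtd is not active; check 'systemctl status libvirtd'" >&2
    return 1
  fi
}
```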

From the /mnt directory, unzip the downloaded file to extract the DataSync KVM image in qcow2 format.

[Screenshot: extracted DataSync KVM image in qcow2 format]
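The extraction can be scripted as well. This sketch (the function name extract_datasync_image is hypothetical) assumes the unzip utility is installed (sudo apt install unzip if it is not) and that the archive contains a single .qcow2 image. It prints that image's path, which is the value the virt-install --disk path= option needs, and runs qemu-img info as a quick check of the image format and virtual size.

```shell
# Hypothetical helper: unzip the downloaded archive and print the path of the
# extracted qcow2 image. The zip name is whatever wget saved in the download step.
extract_datasync_image() {
  local zip="$1" dest_dir="${2:-/mnt}" f image=""
  # -o overwrites existing files without prompting; -d sets the output directory
  sudo unzip -o "$zip" -d "$dest_dir" || return 1
  # Pick the first .qcow2 file; an unmatched glob stays literal, so test -e
  for f in "$dest_dir"/*.qcow2; do
    if [ -e "$f" ]; then image="$f"; break; fi
  done
  if [ -z "$image" ]; then
    echo "no .qcow2 image found in $dest_dir" >&2
    return 1
  fi
  # Image details go to stderr so stdout carries only the path
  qemu-img info "$image" >&2
  echo "$image"
}
```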

Create a virtual machine using the virt-install command

The following command creates the DataSync agent VM from the extracted image with explicitly defined parameters. Adjust settings such as the memory (--ram, in MiB) and vCPU (--vcpus) allocation to suit your environment.

sudo virt-install \
--name "Datasync-Agent" \
--description "AWS DataSync agent" \
--os-type=generic \
--ram=65536 \
--vcpus=4 \
--disk path=/mnt/aws-datasync-2.0.1641250071.1-x86_64.xfs.gpt.qcow2,bus=virtio,size=80 \
--network default,model=virtio \
--graphics spice \
--import

When deploying a DataSync agent VM on Ubuntu 18.04 LTS, set the --graphics option to spice. If the option is set to none, you won't be able to access the VM console. However, the none option does work on CentOS/RHEL 7.8.

Set path= in the --disk option to the full path of the extracted DataSync qcow2 image; in this example, I use /mnt. Because the command uses --import, virt-install builds the VM around this existing image, which becomes the VM's root disk, instead of running an OS installation. The example allocates 4 vCPUs and 64 GiB of RAM (--ram is specified in MiB); for most deployments, I recommend running with 4 vCPUs and 32 GB of RAM (--ram=32768). See the AWS DataSync User Guide for more information about agent requirements.
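Before creating the VM, you may want to confirm that the host can actually provide the requested resources. The following sketch (check_host_capacity is a hypothetical helper) compares nproc and MemTotal from /proc/meminfo against a requested vCPU count and memory size in MiB. It only checks totals, so leave headroom for the host operating system itself.

```shell
# Sketch: compare host resources with what the VM will request.
# Defaults mirror the 4 vCPU / 32 GB recommendation; arguments override them.
check_host_capacity() {
  local want_vcpus="${1:-4}" want_ram_mib="${2:-32768}" meminfo="${3:-/proc/meminfo}"
  local host_cpus host_ram_mib
  host_cpus=$(nproc)
  # MemTotal is reported in kB; integer-divide by 1024 to get MiB
  host_ram_mib=$(awk '/^MemTotal:/ { print int($2 / 1024) }' "$meminfo")
  if [ "$host_cpus" -lt "$want_vcpus" ] || [ "$host_ram_mib" -lt "$want_ram_mib" ]; then
    echo "insufficient: host has $host_cpus CPUs and $host_ram_mib MiB; VM wants $want_vcpus vCPUs and $want_ram_mib MiB" >&2
    return 1
  fi
  echo "ok: host has $host_cpus CPUs and $host_ram_mib MiB RAM"
}
```

For example, check_host_capacity 4 65536 checks against the values in the virt-install command above.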

After waiting a few minutes for the installation to complete, check the operating state of the VM using the following CLI command:

sudo virsh list --all

Once the VM is running, I log in to the VM to fetch the DataSync agent activation key by passing the VM name (Datasync-Agent in this example) to the following command:

sudo virsh console Datasync-Agent

The output I see shows the state of my VM and its login console as seen in the following image:

[Screenshot: virsh output showing the VM state and its login console]
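Instead of re-running virsh list until the VM comes up, you can poll its state. This sketch (wait_for_vm is a hypothetical helper) calls virsh domstate every 10 seconds and returns once the VM reports running; the VM name defaults to Datasync-Agent from the virt-install example.

```shell
# Sketch: poll libvirt until the VM reports "running".
wait_for_vm() {
  local vm="${1:-Datasync-Agent}" tries="${2:-30}" i=0 state
  while [ "$i" -lt "$tries" ]; do
    # domstate prints e.g. "running" or "shut off"; $(...) drops the trailing blank line
    state=$(sudo virsh domstate "$vm" 2>/dev/null)
    if [ "$state" = "running" ]; then
      echo "$vm is running"
      return 0
    fi
    i=$((i + 1))
    sleep 10
  done
  echo "$vm is not running after $tries checks" >&2
  return 1
}
```

For example: wait_for_vm Datasync-Agent && sudo virsh console Datasync-Agent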

Next, log in to the local VM console. For details on accessing the local VM console, see the AWS DataSync User Guide.

Figure 3: Accessing DataSync agent VM console

To leave the DataSync VM console, press CTRL and “]” on your keyboard. This action drops you back to the Linux shell.

Activate a DataSync agent

Now that the VM is deployed, activate the agent. From the agent's local console, retrieve an activation key as follows:

  1. Type 0.
  2. Enter the AWS Region in which you want to activate the agent.
  3. Enter the endpoint type.

Figure 4: Fetching the activation key

DataSync agents can be activated through public endpoints, FIPS endpoints, or VPC endpoints. In this post, I activate the agent through a public endpoint. For more information on choosing a DataSync endpoint type, see the AWS DataSync User Guide.
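As an alternative to the console menu, the DataSync User Guide also describes retrieving an activation key with an HTTP request to the agent on port 80. The sketch below assumes the VM is attached to libvirt's default NAT network (so virsh domifaddr can report its DHCP address), that the host can reach the agent on port 80, and that you use a public endpoint; the function names agent_ip and get_activation_key are hypothetical. Check the exact query parameters in the User Guide before relying on them, especially for FIPS or VPC endpoints.

```shell
# Sketch: look up the agent VM's IPv4 address from libvirt's DHCP leases.
agent_ip() {
  local vm="${1:-Datasync-Agent}"
  # domifaddr columns: interface name, MAC, protocol, address/prefix
  sudo virsh domifaddr "$vm" | awk '$3 == "ipv4" { split($4, a, "/"); print a[1]; exit }'
}

# Sketch: request an activation key over HTTP for a public endpoint.
# Query parameters are assumed from the DataSync User Guide; verify them there.
get_activation_key() {
  local ip="$1" region="$2"
  if [ -z "$ip" ] || [ -z "$region" ]; then
    echo "usage: get_activation_key <agent-ip> <aws-region>" >&2
    return 1
  fi
  # -f fails on HTTP errors, -sS hides the progress bar but still shows errors
  curl -fsS "http://${ip}/?gatewayType=SYNC&activationRegion=${region}&no_redirect"
}
```

For example: get_activation_key "$(agent_ip Datasync-Agent)" us-east-1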

Next, copy the activation key generated by the VM and navigate to the AWS DataSync console. Choose Manually enter your agent's activation key, and paste the activation key in the field provided. Optionally, enter a name for your DataSync agent. Then choose Create agent to activate the DataSync agent.

Figure 5: Applying the activation key to activate the DataSync agent
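You can also activate the agent with the AWS CLI instead of the console. This sketch assumes the AWS CLI is installed and configured with credentials that can call DataSync; create_datasync_agent and its default agent name are hypothetical. Use the same Region that you entered when you generated the activation key.

```shell
# Sketch: activate the agent with the AWS CLI instead of the console.
create_datasync_agent() {
  local key="$1" region="$2" name="${3:-kvm-datasync-agent}"   # placeholder name
  if [ -z "$key" ] || [ -z "$region" ]; then
    echo "usage: create_datasync_agent <activation-key> <aws-region> [agent-name]" >&2
    return 1
  fi
  # --query/--output print only the new agent's ARN
  aws datasync create-agent \
    --activation-key "$key" \
    --agent-name "$name" \
    --region "$region" \
    --query AgentArn --output text
}
```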

After the agent is activated, you should see the agent listed and showing as online in the AWS DataSync Management Console as shown in figure 6.

Figure 6: Confirmation that the DataSync agent is online
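To check the agent from the command line, aws datasync list-agents lists your agents, and describe-agent returns the status of a single agent. The following polling sketch (wait_for_agent_online is a hypothetical helper) assumes the AWS CLI's default Region is the one where you activated the agent.

```shell
# Sketch: poll the agent status until DataSync reports ONLINE.
wait_for_agent_online() {
  local agent_arn="$1" tries="${2:-20}" i=0 status
  while [ "$i" -lt "$tries" ]; do
    status=$(aws datasync describe-agent --agent-arn "$agent_arn" --query Status --output text)
    if [ "$status" = "ONLINE" ]; then
      echo "agent is ONLINE"
      return 0
    fi
    i=$((i + 1))
    sleep 15
  done
  echo "agent status is still ${status:-unknown}" >&2
  return 1
}
```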

Now that the agent is activated, the next step is to create the source and destination locations, create a task that uses them, and then run the task. To complete this step, follow the AWS DataSync User Guide instructions for configuring source and destination locations.
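For reference, the same workflow can be scripted with the AWS CLI. The following sketch assumes an NFS export as the source and an Amazon S3 bucket as the destination, which this post does not prescribe; the function name, task name, and all argument values are hypothetical, and the IAM role must be one that DataSync can assume to access the bucket. See the User Guide for the location options that match your storage.

```shell
# Sketch: NFS source -> S3 destination, then start the task.
# Arguments (all supplied by you): agent ARN, NFS server, export path,
# bucket ARN, and an IAM role DataSync can assume to write to the bucket.
create_and_start_task() {
  local agent_arn="$1" nfs_host="$2" nfs_export="$3" bucket_arn="$4" role_arn="$5"
  local src dst task
  src=$(aws datasync create-location-nfs \
    --server-hostname "$nfs_host" --subdirectory "$nfs_export" \
    --on-prem-config "AgentArns=$agent_arn" \
    --query LocationArn --output text) || return 1
  dst=$(aws datasync create-location-s3 \
    --s3-bucket-arn "$bucket_arn" \
    --s3-config "BucketAccessRoleArn=$role_arn" \
    --query LocationArn --output text) || return 1
  # "kvm-agent-nfs-to-s3" is a placeholder task name
  task=$(aws datasync create-task \
    --source-location-arn "$src" --destination-location-arn "$dst" \
    --name "kvm-agent-nfs-to-s3" \
    --query TaskArn --output text) || return 1
  # Prints the task execution ARN, which you can pass to describe-task-execution
  aws datasync start-task-execution --task-arn "$task" \
    --query TaskExecutionArn --output text
}
```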

Conclusion

AWS DataSync support for Linux KVM hypervisors makes it convenient to deploy a DataSync agent in your environment without altering your existing infrastructure or investing in additional virtualization software licenses. In this post, I demonstrated how to deploy an AWS DataSync agent on a Linux KVM hypervisor host. This flexibility lets you move active datasets to the cloud for processing and reduce on-premises primary storage costs. To learn more about this solution and DataSync, see the following resources:

  1. AWS DataSync
  2. AWS DataSync User Guides

Thanks for reading this blog post. I encourage you to try this configuration today in your own Linux environment. If you have any questions or comments, please leave a note in the comments section.

Simeon Adeniyi Oladokun

Simeon is a Cloud Engineer with AWS. He has many years of experience working with Unix Systems, storage and virtualisation platforms. Outside of work, he enjoys spending time with his family, and playing music.