AWS Storage Blog

One way to migrate data from Azure Blob Storage to Amazon S3

Many businesses face situations where they must migrate their digital content, like images, text files, or data (from a database), from one place to another. More specifically, they may face mandates requiring a hybrid architecture or mixed-cloud solution. Some cloud customers face the problem of not knowing how to move data from Azure Blob Storage to Amazon Simple Storage Service (Amazon S3), how to choose between their data migration options, or where to start.

Most customers understand the value of moving data from on-premises storage to the cloud: no overprovisioning/pay-as-you-go, eliminating hardware refresh cycles, better durability, fully managed services, and the list goes on. However, we’ve encountered a number of customers who want to move data stored on another cloud platform to Amazon S3. Perhaps they’re going through an organizational change, or they want to diversify their cloud storage options. Or maybe they want to migrate their data from Azure NoSQL databases, like Azure Table Storage or Azure Cosmos DB, to similar destinations in the AWS Cloud. One popular path for these conversions is to migrate the data to Azure Blob Storage, then to Amazon S3, where it can be transferred to final destinations like Amazon DynamoDB.

In this blog post, we cover one approach to migrating content from Microsoft Azure Blob Storage into Amazon S3 using AWS Elastic Beanstalk. The migration of the content from Azure Blob Storage to Amazon S3 is handled by an open source Node.js package named “azure-blob-to-s3.” One major advantage of using this Node.js package is that it tracks all files that are copied from Azure Blob Storage to Amazon S3. It also has a “resume” feature, which is useful if you experience connectivity failures. While this approach is just one of many, it is fairly simple and involves fewer tools than most alternatives.
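At its core, the package exposes a single function call that takes your AWS and Azure connection details and copies every blob across. As a preview of what we build later in this post, a minimal server.js using the package looks like the following (the Region, bucket, connection string, and container values are placeholders):

var toS3 = require('azure-blob-to-s3')

// Copy every blob in the Azure container to the S3 bucket
toS3({
  aws: {
    region: 'us-east-1',
    bucket: 'my-destination-bucket'
  },
  azure: {
    connection: '<your Azure storage connection string>',
    container: 'my-container'
  }
})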

Prerequisites

If you are following along with this blog post, we assume that you are familiar with the following:

    • Amazon S3
    • AWS Elastic Beanstalk
    • Node.js and npm
    • Azure Blob Storage

Additionally, it is assumed that you have valid accounts with both AWS Cloud and Azure Cloud.

Trying out this solution will incur charges to both your AWS and Azure accounts. We have included a cleanup section at the end of this post to help you avoid unnecessary charges.

Source data to be migrated

For the purposes of this demonstration, we are going to migrate the IMDb movies dataset to Amazon S3. The IMDb movies dataset (moviedata.json) must already be available in Azure Blob Storage; in this case, it is a single JSON file, but as mentioned previously, this approach works with other kinds of files too. We detail how this JSON file is migrated to Amazon S3 in the following sections. The following screenshot shows the previously mentioned JSON file in Azure Blob Storage:

IMDb movies dataset (moviedata.json) in Azure Blob Storage
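Each entry in this dataset is a JSON object describing a movie. To give a sense of the data being migrated, a representative record looks roughly like the following (the field values here are illustrative, not taken from the actual file):

{
  "year": 2013,
  "title": "Rush",
  "info": {
    "directors": ["Ron Howard"],
    "release_date": "2013-09-02T00:00:00Z",
    "rating": 8.3,
    "genres": ["Action", "Biography", "Drama"]
  }
}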

Solution

The main AWS service that drives our solution is Elastic Beanstalk. We ultimately host the “azure-blob-to-s3” Node.js package in an Elastic Beanstalk environment. The proposed solution is depicted in the following diagram:

Elastic Beanstalk-based solution with node package hosted in Beanstalk environment

Setting up the application

Let us dive into the details of implementing the solution step by step.

There are two main steps involved in implementing this solution:

  1. Create and bundle the Node.js package, along with the required Amazon S3 and Azure Blob Storage connection details.
  2. Deploy the bundled Node.js package in an Elastic Beanstalk worker environment.

Creating a Node.js package

  1. Open up a terminal window and create an empty directory.
  2. Navigate to the created directory, and create a package.json file by issuing an npm init command.
  3. You are prompted to provide the following details:
    • Name: <<a name for your Node.js package>>.
    • Version: <<the initial package version number; we recommend following semantic versioning guidelines, such as starting with 1.0.0>>.
    • Main: <<the file that runs when the Node.js package is executed; enter server.js here>>.
  4. Download and install the azure-blob-to-s3, inquirer, and beautify node modules (fs is built into Node.js and does not need to be installed).
    • To install the packages, run the command npm install --save azure-blob-to-s3 inquirer beautify at the terminal, and confirm that it completes without errors.
  5. Create a file named index.js and copy the following code into it.
#!/usr/bin/env node
const inquirer = require("inquirer");
const fs = require("fs");
const beautify = require("beautify");

// Prompt for the AWS and Azure connection details needed by azure-blob-to-s3.
const enterValues = () => {
  const input_values = [
    {
      type: "input",
      name: "aws_region",
      message: "Enter AWS Region:"
    },
    {
      type: "input",
      name: "bucket_name",
      message: "Enter S3 Bucket Name:"
    },
    {
      type: "input",
      name: "azure_connection",
      message: "Enter Azure Connection String:"
    },
    {
      type: "input",
      name: "azure_container",
      message: "Enter Azure Container:"
    }
  ];
  return inquirer.prompt(input_values);
};

// Write a server.js file that calls azure-blob-to-s3 with the supplied values.
const createFile = (aws_region, bucket_name, azure_connection, azure_container) => {
  const code = `var toS3 = require('azure-blob-to-s3')` +
                `\n` +
                `\n` +
                `toS3({` +
                  `aws: {` +
                    `region: "${aws_region}",` +
                    `bucket: "${bucket_name}"` +
                  `},` +
                  `azure: {` +
                    `connection: "${azure_connection}",` +
                    `container: "${azure_container}"` +
                  `}` +
                `})`;
  // Pretty-print the generated code before writing it to disk.
  const data = beautify(code, { format: "js" });
  fs.writeFile("server.js", data, function (err) {
    if (err) throw err;
    console.log("File is created successfully.");
  });
};

const run = async () => {
  const values = await enterValues();
  const { aws_region, bucket_name, azure_connection, azure_container } = values;
  // Trim stray whitespace from the entered values before generating server.js.
  createFile(aws_region.trim(), bucket_name.trim(), azure_connection.trim(), azure_container.trim());
};

run();
  6. Open a terminal window, run the command node index.js, and enter values for the AWS Region, S3 bucket name, Azure connection string, and Azure container. On successful execution, you should see a server.js file created in the folder (see the example session after this list).
  7. Compress the server.js, package.json, and package-lock.json files into a zip file. We upload this zip file while creating the Elastic Beanstalk worker environment, as detailed in the following steps.
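To illustrate step 6, a sample terminal session might look like the following (the values shown are placeholders, and the connection string is truncated):

node index.js
? Enter AWS Region: us-west-2
? Enter S3 Bucket Name: my-migration-bucket
? Enter Azure Connection String: DefaultEndpointsProtocol=https;AccountName=mystorageaccount;AccountKey=...
? Enter Azure Container: movies
File is created successfully.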

Deploying the Node.js package in AWS Elastic Beanstalk

  1. Navigate to the Elastic Beanstalk console, and choose Create New Application. Provide an application name and an appropriate description for your application, then choose the Create button:

Create a new application with a description and a name

  2. Choose the Create one now link, shown in the following screenshot:

Choose the Create one now link

  3. Select the Worker environment tier on the next screen:

Select the Worker environment tier

  4. Set up the worker environment (depicted in the following screenshot):

a) Provide a name for Environment name.
b) Provide a description in the Description field (this is optional).
c) In the Platform section, choose the Preconfigured platform option, and set platform to Node.js.
d) In the Application code section, select the Upload your code option and upload the Node.js zip file that was created earlier.
e) Choose Create environment.

Set up the Worker environment
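As an alternative to the console, you can create the same worker environment with the AWS CLI. The following is a sketch under assumed names (an application named azure-to-s3-migration, an environment named azure-to-s3-worker, and a deployment bucket my-deploy-bucket holding your bundle as app.zip); run aws elasticbeanstalk list-available-solution-stacks to find a current Node.js solution stack name:

# Create the Elastic Beanstalk application
aws elasticbeanstalk create-application --application-name azure-to-s3-migration

# Upload the zip bundle and register it as an application version
aws s3 cp app.zip s3://my-deploy-bucket/app.zip
aws elasticbeanstalk create-application-version \
    --application-name azure-to-s3-migration \
    --version-label v1 \
    --source-bundle S3Bucket=my-deploy-bucket,S3Key=app.zip

# Create a Node.js worker environment running that version
aws elasticbeanstalk create-environment \
    --application-name azure-to-s3-migration \
    --environment-name azure-to-s3-worker \
    --tier Name=Worker,Type=SQS/HTTP \
    --solution-stack-name "<a Node.js solution stack name>" \
    --version-label v1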

Once the Node.js package completes the moviedata.json file migration, your destination Amazon S3 bucket should look similar to the following screenshot:

Once the Node.js package completes the moviedata.json file migration, your destination Amazon S3 bucket should look similar
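You can also verify the migration from the command line by listing the destination bucket with the AWS CLI (substitute the bucket name you entered earlier); moviedata.json should appear in the output:

aws s3 ls s3://my-migration-bucket/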

Cleaning up

To ensure you do not incur any accidental or unnecessary charges, it is important to clean up the resources used during this tutorial. If you no longer need it, we recommend deleting the content you have stored in Azure Blob Storage and Amazon S3. Lastly, we recommend deleting the Elastic Beanstalk worker environment at the conclusion of the migration exercise.
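The following is a sketch of that cleanup using the AWS and Azure CLIs, assuming the placeholder resource names used in the examples above:

# Remove the migrated objects from the destination S3 bucket
aws s3 rm s3://my-migration-bucket --recursive

# Terminate the Elastic Beanstalk worker environment
aws elasticbeanstalk terminate-environment --environment-name azure-to-s3-worker

# Delete the source blobs from the Azure container
az storage blob delete-batch --source movies --account-name mystorageaccount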

Conclusion

This blog post details one solution among many for migrating data from Microsoft Azure Blob Storage to Amazon S3. You can use this solution to migrate data from Azure Cosmos DB, Azure Table Storage, Azure SQL, and more, to Amazon Aurora, Amazon DynamoDB, Amazon RDS for SQL Server, and so on. Using AWS Elastic Beanstalk to move data through Azure Blob Storage to Amazon S3 opens up a trove of database, analytics, and query services to help you optimize your data throughout its lifetime. We hope this approach comes in handy when needed. We would like to thank Ben Drucker for his contributions to the Node.js package.

Thank you for reading, please leave a comment if you have any questions.