AWS Database Blog

Automating table mappings creation in AWS DMS

AWS Database Migration Service (AWS DMS) helps you migrate on-premises databases to AWS quickly and securely. It supports both homogeneous and heterogeneous migrations. The source database remains operational while the migration is running or being tested. Migration takes place using a DMS replication instance, source and target endpoints, and migration tasks.

If you are migrating a large number of databases with DMS and are looking for a way to automate creating the table-mapping JSON files while selecting only a subset of tables, read on. This post describes a tool that automates the JSON file creation for DMS tasks.

The need for automation

DMS takes over many of the difficult or tedious tasks involved in a migration project. You specify which schemas and tables to migrate, along with any special processing such as logging or error handling.

A migration task includes:

  • Name
  • Description
  • Source endpoint
  • Target endpoint
  • Table mappings

This post focuses on the table-mapping section. Table mapping uses several types of rules to specify the data source, the schema and tables to select, and any transformations that should occur during the task.

There are two different ways to specify the table mappings: guided and JSON.

The guided method lets you enter either an individual table name or a wildcard pattern (such as % or ABC%). If you have a large number of tables and need to include or exclude specific tables for the migration, the guided method is time-consuming.

JSON is an automatable option that lets you specify the same information in a detailed, scriptable form.

This post focuses on automating the JSON file creation using a Python tool. You can create the JSON file manually, but doing so becomes cumbersome and error-prone as the number of rules grows.

Explanation of the automated solution

This post presents a Python-based tool that takes CSV files as input and generates a single JSON file containing the required include and exclude selection rules (an example rule follows this paragraph). There can be multiple input files in a particular folder; the only input to the tool is the folder location.
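For reference, a table-mapping document with a single selection rule looks like the following; the HR schema and EMPLOYEES table are illustrative. Every rule that the tool generates has this same shape, with rule-action set to either include or exclude.

{
  "rules": [
    {
      "rule-type": "selection",
      "rule-id": "1",
      "rule-name": "1",
      "object-locator": {
        "schema-name": "HR",
        "table-name": "EMPLOYEES"
      },
      "rule-action": "include"
    }
  ]
}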

All the files in this folder should have names starting with include or exclude, which indicates whether the tables listed in that file should be included in or excluded from the migration. Each line of a file contains a schema name and a table name separated by a comma, as shown in the example after this paragraph. The steps below include the queries that you can run on the source database to generate the include* or exclude* files.
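For example, an include file (such as include.csv) generated for the HR schema might contain lines like these; the table names are illustrative:

HR,EMPLOYEES
HR,DEPARTMENTS
HR,JOBS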

Prerequisites

Below are the prerequisites for this solution.

  • Access to an Amazon Web Services (AWS) account: You need an AWS account to set up AWS Database Migration Service. Sign up for AWS.
  • IT Skill level: You require basic knowledge of Oracle and SQL to successfully complete this project.
  • An instance with Python 2.7 installed.

Architecture diagram

The following architecture diagram shows the flow of events, as detailed in the following Steps section.

Solution steps

To implement the solution, follow these steps.

1) Based on your existing on-premises environment, connect to your source Oracle database. To list all available tables, run the following query, substituting the required schema name in place of HR. Copy the script to a .sql file, for example “allhrtables.sql“.

set heading off
set pages 0
set feedback off
set echo off
col text format a500
set trimspool on
set trimout on
spool alltable.csv
select owner||','||object_name from dba_objects
where owner='HR' and object_type='TABLE';
spool off;
exit;

Now, if you run this command, you should see the list of available tables.

sqlplus -s <UserName>/<Password>@<DB_Alias> @<SQL_FileName>

For example:

sqlplus -s oraadmin/oraadmin123@orclauth @allhrtables.sql

The following is a screenshot preview of what the output will look like.

2) From the list of all available tables, generate a CSV file of the tables that you want to include in or exclude from the migration to AWS. To migrate only a few tables, first determine which tables (or patterns of table names) you want to include or exclude, and then run one of the following scripts. The two examples below cover the include case and the exclude case. The two files are mutually exclusive ways of describing the same selection: one lists the tables to include, the other lists the tables to exclude.

The include scenario

For this example, include all the tables except those with names ending in TEMP, BACKUP, LOG, ARCHIVE, and HISTORY. Create a file called “include.sql” and copy the following lines into it.

set heading off
set pages 0
set feedback off
set echo off
col text format a500
set trimspool on
set trimout on
spool include.csv
select owner||','||object_name from dba_objects
where owner='HR' and object_type='TABLE'
and object_name not like '%TEMP'
and object_name not like '%BACKUP'
and object_name not like '%LOG'
and object_name not like '%ARCHIVE'
and object_name not like '%HISTORY';
spool off;
exit;

When the file is ready, execute it, and it prepares the include.csv, as follows.

sqlplus -s <UserName>/<Password>@<DB_Alias> @include.sql

For example:

sqlplus -s oraadmin/oraadmin123@orclauth @include.sql

The following is a screenshot preview of what the output will look like.

The exclude scenario

This is the same scenario as the previous section: include all the tables except those with names ending in TEMP, BACKUP, LOG, ARCHIVE, and HISTORY. However, instead of listing which tables to include, this CSV file lists which tables to exclude. Create a file called “exclude.sql” and copy the following lines into it.

set heading off
set pages 0
set feedback off
set echo off
col text format a500
set trimspool on
set trimout on
spool exclude.csv
select owner||','||object_name from dba_objects
where owner='HR' and object_type='TABLE'
and (object_name like '%TEMP'
or object_name like '%BACKUP'
or object_name like '%LOG'
or object_name like '%ARCHIVE'
or object_name like '%HISTORY');
spool off;
exit;

When the file is ready, execute it, and it prepares the exclude.csv file, as follows.

sqlplus -s <UserName>/<Password>@<DB_Alias> @exclude.sql

For example:

sqlplus -s oraadmin/oraadmin123@orclauth @exclude.sql

The following is a screenshot preview of what the output will look like.

Further steps

After setting up the include and exclude scenarios, follow these steps.

3) When the CSV files are ready, copy them into the folder on the same machine on which you are running the Python program.

4) Download the Python program from GitHub, or copy the following code to a suitable location on your workstation.

import csv
import json
import os

line_count = 0
data = {}
data['rules'] = []


def writeJSON():
    # Write the accumulated rules to automated_json.json in the current folder.
    with open('automated_json.json', 'w') as outfile:
        json.dump(data, outfile)


def createJSON(csvfile, action):
    # Read a schema-name,table-name CSV file and append one selection rule
    # per line with the given rule action ("include" or "exclude").
    global line_count
    with open(csvfile) as f:
        csv_reader = csv.reader(f, delimiter=',')

        for row in csv_reader:
            # Skip blank or malformed lines that the spooled output may contain.
            if len(row) < 2:
                continue
            counter = str(line_count + 1)
            data['rules'].append({
                "rule-type": "selection",
                "rule-id": counter,
                "rule-name": counter,
                "object-locator": {
                    "schema-name": row[0].strip(),
                    "table-name": row[1].strip()
                },
                "rule-action": action
            })
            line_count += 1


if __name__ == "__main__":
    print("This program expects a folder location from the user. ")
    print("The folder can have two different types of files in csv format.")
    print("The file types are include table list and exclude table list.")
    print(" ")
    print("The file name should start with include or exclude to indicate "
          "whether the content of a particular file has to be included or excluded.")
    print(" ")
    print("Both include and exclude files should contain the schema name and the table name "
          "to be included or excluded, separated by a comma.")
    print("It is not necessary to have both include and exclude files.")
    print(" ")

    # raw_input is used because this tool targets Python 2.7.
    File_Location = raw_input("Enter the Folder location: ")
    listOfFiles = os.listdir(File_Location)
    for entry in listOfFiles:
        # os.path.join handles the platform-specific path separator.
        if entry.startswith("include"):
            createJSON(os.path.join(File_Location, entry), "include")
        elif entry.startswith("exclude"):
            createJSON(os.path.join(File_Location, entry), "exclude")

    writeJSON()

5) Run the Python program and provide the location of the folder in which the include*.csv or exclude*.csv files are present. The program writes its result to a file named automated_json.json in the directory from which you run it. The following is a screenshot preview of what the output will look like.
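For reference, a terminal session looks roughly like the following. The script name and folder path are illustrative, and the introductory text (abridged here) comes from the program's print statements.

$ python automate_table_mappings.py
This program expects a folder location from the user.
...
Enter the Folder location: /home/ec2-user/dms-csv-files
$ ls
automated_json.json  exclude.csv  include.csv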

Open the JSON file in your favorite text editor, copy the content, and validate it using JSONLint. This validation also gives you a chance to pretty-print the JSON.

To view the JSON file in a pretty format, you can run the following command from the folder that contains the generated automated_json.json file.

python -m json.tool automated_json.json

6) As shown in the following diagram, the “Table mappings” section of the DMS task offers two editing options: Guided UI and JSON editor. Select JSON editor and paste the content of the JSON file generated by the program in the preceding step.
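If you prefer the AWS CLI to the console for this step, you can pass the generated file directly when you create the replication task. The following command is a sketch; the task identifier and ARNs are placeholders for your own values, and the migration type should match your scenario.

aws dms create-replication-task \
    --replication-task-identifier hr-selected-tables \
    --source-endpoint-arn <source-endpoint-arn> \
    --target-endpoint-arn <target-endpoint-arn> \
    --replication-instance-arn <replication-instance-arn> \
    --migration-type full-load \
    --table-mappings file://automated_json.json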

Here are the benefits of using this automated approach:

  • It’s a faster way to generate a JSON file from the list of include and exclude tables.
  • It helps avoid manual intervention and thus reduces the chances of errors.
  • If you want to include only a few tables rather than all the tables from the given schemas, this program helps by listing the selected tables explicitly instead of relying on wildcards.

Conclusion

In this post, I detailed the steps to create a JSON file using a Python program. The JSON file includes or excludes selected tables for an AWS DMS task, and you can use it in the table mappings section of DMS to migrate the selected tables from your on-premises environment to AWS.

Try this approach in your environment and see the benefits listed above. I hope this post helps you with faster migration. Please feel free to reach out with questions or feature requests via comments.

About the Authors


Amishi Shah is a DevOps Consultant with the Professional Services team at Amazon Web Services. She works with customers to build scalable, highly available, and secure solutions in the AWS Cloud. Her focus area is leading enterprise customers through large technical, organizational, and SDLC transformations.

Bhavesh Rathod is an Oracle Database Cloud Architect with the Professional Services team at Amazon Web Services. He works as a database migration specialist to help internal Amazon customers move their on-premises database environments to AWS Cloud database solutions.