AWS Database Blog

Simplify Data Warehouse Migration to Amazon Redshift Using New AWS Schema Conversion Tool Features

The AWS Schema Conversion Tool (AWS SCT) makes heterogeneous database migrations more predictable by automatically converting the source database schema and most of the database code objects to a format compatible with the target database. To help with migration of on-premises data warehouses to Amazon Redshift, the new version of AWS SCT adds the following important capabilities:

  • Migrating large objects to Amazon Redshift and Amazon S3
  • Virtual partitioning of large tables when migrating to Amazon Redshift

This post describes how you can take advantage of new AWS SCT features to automate the migration of large data objects and parallelize the migration of large data tables.

Migrating large objects to Amazon Redshift and Amazon S3
The AWS SCT data extraction agents have been enhanced to support extraction of large objects (LOBs) to Amazon S3 during the migration of an on-premises data warehouse to Amazon Redshift. During data extraction, the AWS SCT data extraction agent generates unique file names for the LOB values and extracts the LOB values into corresponding Amazon S3 files. The extraction agent then generates links to the Amazon S3 files and writes the links into the specified fields of the extract data file for import into Amazon Redshift.
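To make the mechanics concrete, here is a minimal Python sketch of that general pattern: write each LOB value to a uniquely named Amazon S3 object and record the resulting link in the extract data file. The bucket name, key prefix, and row layout are hypothetical; this illustrates the idea and is not the agent's actual implementation.

import csv
import uuid
import boto3

s3 = boto3.client("s3")
BUCKET = "my-sct-extracts"   # hypothetical bucket configured in AWS SCT
PREFIX = "lobs/"             # hypothetical key prefix for LOB files

def offload_lob(lob_bytes):
    """Upload one LOB value to S3 and return the link that replaces it."""
    key = PREFIX + uuid.uuid4().hex   # unique file name per LOB value
    s3.put_object(Bucket=BUCKET, Key=key, Body=lob_bytes)
    return f"s3://{BUCKET}/{key}"

def write_extract_rows(rows, lob_indexes, path):
    """Write an extract data file, swapping LOB columns for their S3 links."""
    with open(path, "w", newline="") as f:
        writer = csv.writer(f)
        for row in rows:
            for i in lob_indexes:
                row[i] = offload_lob(row[i])
            writer.writerow(row)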

To configure the location where LOBs are stored in an S3 bucket, use the Project Settings/Data Migration menu in AWS SCT.

Or, if you want more granular control, you can set the location during the data migration task creation process.

Virtual partitioning of large tables when migrating to Amazon Redshift
It can take a long time to migrate data from very large non-partitioned data warehouse tables to Amazon Redshift. The virtual partitioning feature lets you create sub-tasks for these large tables to parallelize data migration. Each sub-task creates a virtual partition using filtering rules. For extremely large tables, the number of sub-tasks/virtual partitions can reach hundreds or even thousands. AWS SCT ensures that the virtual partitions that you create are non-overlapping so that double loading by two different agents is not possible.
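To see why the non-overlap guarantee matters, consider the following minimal Python sketch. Here run_extraction is a hypothetical stand-in for one extraction sub-task; in AWS SCT, this orchestration is handled for you by the data extraction agents.

from concurrent.futures import ThreadPoolExecutor

def run_extraction(where_clause):
    # Hypothetical stand-in for one extraction sub-task.
    print(f"extracting rows WHERE {where_clause} ...")

# Because the filters are non-overlapping, each row matches exactly one
# predicate, so no two agents can ever load the same data twice.
partitions = [
    "ID <= 100000",
    "ID > 100000 AND ID <= 200000",
    "ID > 200000",
]

with ThreadPoolExecutor(max_workers=len(partitions)) as pool:
    list(pool.map(run_extraction, partitions))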

The AWS SCT user interface has been extended to support virtual partitioning of large tables to achieve massive parallelism during migration to AWS.

The following partitioning types are supported:

  • RANGE partitioning with Numeric, Date, and Time data types
  • LIST with Numeric, Character, Date, and Time data types
  • DATE AUTO SPLIT with Date and Time data types

RANGE with Numeric, Date, and Time data types
Range partitioning assigns rows to partitions based on column values that fall within a given range. You do this by specifying a sorted list of boundary values for the partitioning column in the Values field. Range partitioning information can also be loaded from .csv files.

The following are examples of WHERE conditions for RANGE partitions on a column (a short generation sketch follows the examples):

  • With Numeric data types:
    Partition1: WHERE ID <= 123
    Partition2: WHERE ID > 123 AND ID <= 345
    Partition3: WHERE ID > 345 AND ID <= 456
    …
    PartitionN+1: WHERE ID > USER_VALUE_N
  • With Date data types:
    • Oracle and Teradata
      Partition1: WHERE LO_DATE <= date '1954-10-24'
      Partition2: WHERE LO_DATE > date '1954-10-24' AND LO_DATE <= date '1954-11-06'
      Partition3: WHERE LO_DATE > date '1954-11-06' AND LO_DATE <= date '1954-11-20'
      …
      PartitionN+1: WHERE LO_DATE > date 'USER_VALUE_N'
    • IBM Netezza, Greenplum Database, Vertica, and Microsoft SQL Server
      Partition1: WHERE LO_DATE <= '1954-10-24'
      Partition2: WHERE LO_DATE > '1954-10-24' AND LO_DATE <= '1954-11-06'
      Partition3: WHERE LO_DATE > '1954-11-06' AND LO_DATE <= '1954-11-20'
      …
      PartitionN+1: WHERE LO_DATE > 'USER_VALUE_N'
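As a rough illustration (not part of AWS SCT), the following Python sketch shows how such non-overlapping RANGE predicates can be derived from a sorted list of boundary values like the one you enter in the Values field:

def range_predicates(column, boundaries):
    # boundaries: sorted values from the Values field, e.g. [123, 345, 456]
    preds = [f"{column} <= {boundaries[0]}"]
    for lo, hi in zip(boundaries, boundaries[1:]):
        preds.append(f"{column} > {lo} AND {column} <= {hi}")
    preds.append(f"{column} > {boundaries[-1]}")   # open-ended last partition
    return preds

# Mirrors the numeric example above.
for p in range_predicates("ID", [123, 345, 456]):
    print("WHERE", p)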

LIST with Numeric, Character, Date, and Time data types
List partitioning assigns rows to partitions based on exact column values. You configure this capability by specifying the list of values for the partitioning column in the Values field. This information can also be loaded from .csv files.

The following are examples of WHERE conditions for LIST partitions (a short generation sketch follows the examples):

Partition1: WHERE C_VARCHAR2 = '1'
Partition2: WHERE C_VARCHAR2 = '2'
Partition3: WHERE C_VARCHAR2 = '3'
…
PartitionN: WHERE C_VARCHAR2 = 'USER_VALUE_N'
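A LIST definition maps one value to one partition, so generating the predicates is straightforward. The following sketch (again illustrative, not AWS SCT code) quotes the values because the partitioning column here is a character type:

def list_predicates(column, values):
    # One WHERE condition per value in the LIST definition.
    return [f"{column} = '{v}'" for v in values]

for p in list_predicates("C_VARCHAR2", ["1", "2", "3"]):
    print("WHERE", p)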

DATE AUTO SPLIT
Date Auto Split partitioning assigns rows to partitions based on column values that fall within a given interval between the start date and end date. You enable this feature by specifying the date range and the interval (DAY, WEEK, MONTH, or YEAR) for the partitioning column in the Start Date, End Date, and Interval boxes.

The following are examples of WHERE conditions for DATE AUTO SPLIT partitions with a two-day interval (a short generation sketch follows the examples):

Partition1: WHERE C_DATE >= '2017-09-28' AND C_DATE < '2017-09-30'
Partition2: WHERE C_DATE >= '2017-09-30' AND C_DATE < '2017-10-02'
Partition3: WHERE C_DATE >= '2017-10-02' AND C_DATE < '2017-10-04'
…
PartitionN: WHERE C_DATE >= 'USER_VALUE_N' AND C_DATE <= '2017-11-14'
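The following Python sketch (illustrative only; the two-day interval is an assumption matching the dates above) shows how such half-open date ranges can be generated from a start date, end date, and interval:

from datetime import date, timedelta

def date_auto_split(column, start, end, step):
    # Half-open ranges [lo, hi); the final partition includes the end date.
    preds, lo = [], start
    while lo < end:
        hi = min(lo + step, end)
        op = "<=" if hi == end else "<"
        preds.append(f"{column} >= '{lo}' AND {column} {op} '{hi}'")
        lo = hi
    return preds

for p in date_auto_split("C_DATE", date(2017, 9, 28), date(2017, 11, 14), timedelta(days=2)):
    print("WHERE", p)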

Summary
The AWS Schema Conversion Tool can help you migrate your data warehouse to Amazon Redshift. In this post, we described two new features in AWS SCT to automate the migration of large data objects (LOBs) and parallelize the migration of very large data tables.

These features were the result of direct feedback from our customers. If there are specific features you would like to see included in these tools, please drop us a note! For more information about AWS Database Migration Service (AWS DMS) and AWS SCT, visit AWS Database Migration Service.


About the Authors

Venu Reddy is a software development manager for Database Migration Service at Amazon Web Services. He works with our customers to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.