AWS Big Data Blog
Accelerate your data warehouse migration to Amazon Redshift – Part 2
This is the second post in a multi-part series. We’re excited to share dozens of new features to automate your schema conversion; preserve your investment in existing scripts, reports, and applications; accelerate query performance; and potentially reduce your overall cost to migrate to Amazon Redshift.
Amazon Redshift is the leading cloud data warehouse. No other data warehouse makes it as easy to gain new insights from your data. With Amazon Redshift, you can query exabytes of data across your data warehouse, operational data stores, and data lake using standard SQL. You can also integrate other services like Amazon EMR, Amazon Athena, and Amazon SageMaker to use all the analytic capabilities in the AWS Cloud.
Many customers have asked for help migrating their self-managed data warehouse engines to Amazon Redshift. In these cases, you may have terabytes (or petabytes) of historical data, a heavy reliance on proprietary features, and thousands of extract, transform, and load (ETL) processes and reports built over years (or decades) of use.
Until now, migrating a data warehouse to AWS was complex and involved a significant amount of manual effort.
Today, we’re happy to share additional enhancements to the AWS Schema Conversion Tool (AWS SCT) to automate your migrations to Amazon Redshift. These enhancements reduce the recoding needed for your data tables, and more importantly, the manual work needed for views, stored procedures, scripts, and other application code that use those tables.
In this post, we introduce automation for INTERVAL and PERIOD data types, automatic type casting, binary data support, and some other enhancements that have been requested by customers. We show you how to use AWS SCT to convert objects from a Teradata data warehouse and provide links to relevant documentation so you can continue exploring these new capabilities.
INTERVAL data types
An INTERVAL is an unanchored duration of time, like “1 year” or “2 hours,” that doesn’t have a specific start or end time. In Teradata, INTERVAL data is implemented as 13 distinct data types depending on the granularity of time being represented. The following table summarizes these types.
Granularity | INTERVAL data types |
Year intervals | INTERVAL YEAR, INTERVAL YEAR TO MONTH |
Month intervals | INTERVAL MONTH |
Day intervals | INTERVAL DAY, INTERVAL DAY TO HOUR, INTERVAL DAY TO MINUTE, INTERVAL DAY TO SECOND |
Hour intervals | INTERVAL HOUR, INTERVAL HOUR TO MINUTE, INTERVAL HOUR TO SECOND |
Minute intervals | INTERVAL MINUTE, INTERVAL MINUTE TO SECOND |
Second intervals | INTERVAL SECOND |
Amazon Redshift doesn’t support INTERVAL data types natively. Previously, if you used INTERVAL types in your data warehouse, you had to develop custom code as part of the database conversion process.
Now, AWS SCT automatically converts INTERVAL data types for you. AWS SCT converts an INTERVAL column into a CHARACTER VARYING column in Amazon Redshift. Then AWS SCT converts your application code that uses the column to emulate the INTERVAL semantics.
For example, consider the following Teradata table, which has a MONTH interval column. This table stores different types of leaves of absence and the allowable duration for each.
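A minimal sketch of such a table follows; the loa_types table name and all columns except loa_duration are illustrative.

```sql
-- Teradata (sketch): leave-of-absence types with an INTERVAL MONTH duration
CREATE TABLE loa_types (
  loa_type_id  INTEGER NOT NULL,
  loa_name     VARCHAR(100),
  loa_duration INTERVAL MONTH  -- allowable duration, for example 3 months
)
PRIMARY INDEX (loa_type_id);
```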
AWS SCT converts the table to Amazon Redshift as follows. Because Amazon Redshift doesn’t have a native INTERVAL data type, AWS SCT replaces it with a VARCHAR data type.
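A hand-written approximation of the converted table (not the literal AWS SCT output; the VARCHAR length is illustrative) might look like this:

```sql
-- Amazon Redshift (sketch): the INTERVAL MONTH column becomes a VARCHAR
CREATE TABLE loa_types (
  loa_type_id  INTEGER NOT NULL,
  loa_name     VARCHAR(100),
  loa_duration VARCHAR(64)  -- holds the month interval as a string, for example '3'
);
```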
Now, let’s suppose your application code uses the loa_duration column, like the following Teradata view. Here, the INTERVAL MONTH field is added to the current date to compute when a leave of absence ends if it starts today.
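A sketch of such a view (the view name is illustrative; only the loa_duration column comes from the example):

```sql
-- Teradata (sketch): projected end date if the leave of absence starts today
CREATE VIEW v_loa_end_date AS
SELECT loa_type_id,
       loa_name,
       CURRENT_DATE + loa_duration AS loa_end_date
FROM loa_types;
```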
Because the data is stored as CHARACTER VARYING, AWS SCT injects the proper type CAST into the Amazon Redshift code to interpret the string values as a MONTH interval. It then converts the arithmetic using Amazon Redshift date functions.
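One way to express the converted logic in Amazon Redshift (an approximation, not the exact code AWS SCT emits) is:

```sql
-- Amazon Redshift (sketch): cast the stored string and use a date function instead
CREATE VIEW v_loa_end_date AS
SELECT loa_type_id,
       loa_name,
       DATEADD(month, CAST(loa_duration AS INTEGER), CURRENT_DATE)::DATE AS loa_end_date
FROM loa_types;
```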
Also, as a bonus, AWS SCT automatically converts any literal INTERVAL values that you might be using in your code.
For example, consider the following Teradata table. The table contains a DATE column, which records the last date when an employee was promoted.
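A minimal sketch of such a table (the employees name and columns other than last_promo_date are illustrative):

```sql
-- Teradata (sketch): employee table with the date of the last promotion
CREATE TABLE employees (
  id              INTEGER NOT NULL,
  emp_name        VARCHAR(100),
  last_promo_date DATE
)
PRIMARY INDEX (id);
```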
Now, suppose the database contains a view that computes the next date an employee is eligible for a promotion. We implement a business rule that employees who have never been promoted are eligible for promotion in 1.5 years. All other employees become eligible 2.5 years after their last promotion. See the following code:
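A sketch of that view, using Teradata INTERVAL literals for the 1.5-year and 2.5-year rules (the view name is illustrative):

```sql
-- Teradata (sketch): next eligibility date using INTERVAL literals
CREATE VIEW v_next_promo AS
SELECT id,
       CASE
         WHEN last_promo_date IS NULL
           THEN CURRENT_DATE + INTERVAL '18' MONTH   -- 1.5 years from today
         ELSE last_promo_date + INTERVAL '30' MONTH  -- 2.5 years after the last promotion
       END AS next_promo_date
FROM employees;
```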
AWS SCT converts the INTERVAL values used in the CASE statement and translates the date expressions accordingly:
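An approximation of the converted view (not the literal AWS SCT output):

```sql
-- Amazon Redshift (sketch): INTERVAL literals rewritten with date functions
CREATE VIEW v_next_promo AS
SELECT id,
       CASE
         WHEN last_promo_date IS NULL
           THEN DATEADD(month, 18, CURRENT_DATE)::DATE
         ELSE DATEADD(month, 30, last_promo_date)::DATE
       END AS next_promo_date
FROM employees;
```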
We’re excited about INTERVAL automation in AWS SCT and encourage you to give it a try. For more information about getting started with AWS SCT, see Installing, verifying, and updating AWS SCT.
PERIOD data type
A PERIOD data value represents a duration of time with a specified start and end. For example, the Teradata literal “(2021-01-01 to 2021-01-31)” is a period with a duration of 31 days that starts and ends on the first and last day of January 2021, respectively. PERIOD data types can have three different granularities: DATE, TIME, or TIMESTAMP. The following table provides some examples.
Period Type | Example |
PERIOD(DATE) | “(2021-01-01 to 2021-01-31)” |
PERIOD(TIME) | “(12:00:00 to 13:00:00)” |
PERIOD(TIMESTAMP) | “(2021-01-31 00:00:00 to 2021-01-31 23:59:59)” |
As with INTERVAL, the PERIOD data type isn’t natively supported by Amazon Redshift. Previously, if you used these data types in your tables, you had to write custom code as part of the database conversion process.
Now, AWS SCT automatically converts PERIOD data types for you. AWS SCT converts a PERIOD column into two DATE (or TIME or TIMESTAMP) columns as appropriate on Amazon Redshift. Then AWS SCT converts your application code that uses the column to emulate the source engine semantics.
For example, consider the following Teradata table:
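A minimal sketch of such a table (the assignments name and the columns other than the PERIOD(TIMESTAMP) column are illustrative):

```sql
-- Teradata (sketch): a table with a PERIOD(TIMESTAMP) column
CREATE TABLE assignments (
  assignment_id INTEGER NOT NULL,
  emp_id        INTEGER,
  validity      PERIOD(TIMESTAMP)
)
PRIMARY INDEX (assignment_id);
```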
AWS SCT converts the PERIOD(TIMESTAMP) column into two TIMESTAMP columns in Amazon Redshift:
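An approximation of the converted table (the begin/end column names AWS SCT generates may differ from this sketch):

```sql
-- Amazon Redshift (sketch): the PERIOD column split into begin/end timestamps
CREATE TABLE assignments (
  assignment_id  INTEGER NOT NULL,
  emp_id         INTEGER,
  validity_begin TIMESTAMP,
  validity_end   TIMESTAMP
);
```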
Now, let’s look at a simple example of how you can use AWS SCT to convert your application code. A common operation in Teradata is to extract the starting (or ending) timestamps in a PERIOD value using the BEGIN and END built-in functions:
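A sketch of such a view on the Teradata side (illustrative names, reusing the assignments table above):

```sql
-- Teradata (sketch): extract the start and end of the PERIOD value
CREATE VIEW v_assignment_bounds AS
SELECT assignment_id,
       BEGIN(validity) AS validity_start,
       END(validity)   AS validity_stop
FROM assignments;
```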
AWS SCT converts the view to reference the transformed table columns:
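An approximation of the converted view on Amazon Redshift:

```sql
-- Amazon Redshift (sketch): BEGIN and END become direct column references
CREATE VIEW v_assignment_bounds AS
SELECT assignment_id,
       validity_begin AS validity_start,
       validity_end   AS validity_stop
FROM assignments;
```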
We’ll continue to build automation for PERIOD data conversion, so stay tuned for more improvements. In the meantime, you can try out the PERIOD data type conversion features in AWS SCT now. For more information, see Installing, verifying, and updating AWS SCT.
Type casting
Some data warehouse engines, like Teradata, provide an extensive set of rules to cast data values in expressions. These rules permit implicit casts, where the target data type is inferred from the expression, and explicit casts, which typically use a function to perform the type conversion.
Previously, you had to manually convert implicit cast operations in your SQL code. Now, we’re happy to share that AWS SCT automatically converts implicit casts as needed. This feature is available now for the following set of high-impact Teradata data types.
Category | Source data type | Target data types |
Numeric | CHAR | BIGINT, NUMBER, TIMESTAMP |
Numeric | VARCHAR | NUMBER, NUMERIC, DEC, CHAR, GEOMETRY |
Numeric | INTEGER | DATE, DEC |
Numeric | BIGINT | DATE |
Numeric | NUMBER | CHARACTER, VARCHAR, DEC |
Numeric | DECIMAL | DATE, TIMESTAMP, SMALLINT, DOUBLE PRECISION |
Numeric | FLOAT | DEC |
Time | DATE | BIGINT, INTEGER, DECIMAL, FLOAT, NUMBER, CHARACTER, TIMESTAMP |
Time | INTERVAL | NUMBER, BIGINT, INTEGER |
Other | GEOMETRY | DECIMAL |
Let’s look at how to cast numbers to DATE. Many Teradata applications treat numbers and DATE as equivalent values. Internally, Teradata stores DATE values as INTEGER, encoded as (year - 1900) * 10,000 + month * 100 + day. The rules to convert between an INTEGER and a DATE are well known, and developers have commonly exploited this encoding to perform date calculations using INTEGER arithmetic.
For example, consider the following Teradata table:
We insert a single row of data into the table:
We use a macro to update the last_promo_date field for id = 112. The macro accepts a BIGINT parameter to populate the DATE field.
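A sketch of such a macro (the macro name is illustrative):

```sql
-- Teradata (sketch): macro that assigns a BIGINT parameter to the DATE column
CREATE MACRO update_promo_date (promo_date BIGINT) AS (
  UPDATE promotions
  SET last_promo_date = :promo_date  -- relies on the implicit BIGINT-to-DATE cast
  WHERE id = 112;
);
```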
Now, we run the macro and check the value of the last_promo_date attribute:
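A sketch of that run (the input 1410330 is the integer-encoded form of the date we expect to see):

```sql
-- Teradata (sketch): run the macro with the integer-encoded date, then inspect the row
EXEC update_promo_date(1410330);

SELECT last_promo_date
FROM promotions
WHERE id = 112;
-- last_promo_date: 2041-03-30
```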
You can see the last_promo_date attribute is set to the date March 30, 2041.
Now, let’s use AWS SCT to convert the table and macro to Amazon Redshift. As we saw in Part 1 of this series, AWS SCT converts the Teradata macro into an Amazon Redshift stored procedure:
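An approximation of the converted procedure (not the literal AWS SCT output; the parameter and procedure names are illustrative):

```sql
-- Amazon Redshift (sketch): stored procedure emulating the Teradata macro
CREATE OR REPLACE PROCEDURE update_promo_date(par_promo_date BIGINT)
AS $$
BEGIN
  UPDATE promotions
  SET last_promo_date =
        TO_DATE(CAST(par_promo_date + 19000000 AS VARCHAR), 'YYYYMMDD')
  WHERE id = 112;
END;
$$ LANGUAGE plpgsql;
```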
Note that 20410330 = 1410330 + 19000000; so adding 19,000,000 to the input returns the correct date value 2041-03-30.
Now, when we run the stored procedure, it updates the last_promo_date as expected:
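A sketch of that call and check (names carried over from the earlier sketches):

```sql
-- Amazon Redshift (sketch): call the procedure, then verify the result
CALL update_promo_date(1410330);

SELECT last_promo_date
FROM promotions
WHERE id = 112;
-- last_promo_date: 2041-03-30
```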
Automatic data type casting is available in AWS SCT now. You can download the latest version and try it out.
BLOB data
Amazon Redshift doesn’t have native support for BLOB columns, which you use to store large binary objects like text or images.
Previously, if you were migrating a table with a BLOB column, you had to manually move the BLOB values to file storage, like Amazon Simple Storage Service (Amazon S3), then add a reference to the S3 file in the table. Using Amazon S3 as the storage target for binary objects is a best practice because these objects are large and typically have low analytic value.
We’re happy to share that AWS SCT now automates this process for you. AWS SCT replaces the BLOB column with a CHARACTER VARYING column on the target table. Then, when you use the AWS SCT data extractors to migrate your data, the extractors upload the BLOB value to Amazon S3 and insert a reference to the BLOB into the target table.
For example, let’s create a table in Teradata and populate it with some data:
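A minimal sketch (the documents table, its columns, and the sample byte values are all illustrative):

```sql
-- Teradata (sketch): a table with a BLOB column and two sample rows
CREATE TABLE documents (
  doc_id   INTEGER NOT NULL,
  doc_body BLOB
)
PRIMARY INDEX (doc_id);

INSERT INTO documents VALUES (1, '48656C6C6F'XB);  -- bytes for 'Hello'
INSERT INTO documents VALUES (2, '576F726C64'XB);  -- bytes for 'World'
```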
Now, we convert the table with AWS SCT and build it on Amazon Redshift:
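An approximation of the converted table (the VARCHAR length is illustrative, not the exact size AWS SCT chooses):

```sql
-- Amazon Redshift (sketch): the BLOB column becomes a VARCHAR that holds an S3 reference
CREATE TABLE documents (
  doc_id   INTEGER NOT NULL,
  doc_body VARCHAR(1300)  -- stores the S3 path of the extracted object
);
```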
Then we use the AWS SCT data extractors to migrate the table data from Teradata to Amazon Redshift.
When we look at the table in Amazon Redshift, we can see the paths to the S3 files that contain the BLOB values:
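A quick check might look like the following; the exact S3 bucket and key layout depend on your extractor settings, so the path shown is purely illustrative:

```sql
-- Amazon Redshift (sketch): the migrated rows carry S3 paths instead of raw bytes
SELECT doc_id, doc_body
FROM documents
ORDER BY doc_id;
-- doc_body would contain values such as
-- 's3://<your-extractor-bucket>/documents/doc_body/1.dat'  (illustrative path)
```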
And on Amazon S3, you can see the actual data files. There are two, one for each BLOB value.
BLOB support is available now in AWS SCT and the AWS SCT data extractors. Download the latest version of the application and try it out today.
Multi-byte CHARACTER conversion
Teradata supports multibyte characters in CHARACTER data columns, which are fixed-length fields. Amazon Redshift supports multibyte characters in CHARACTER VARYING fields, but not in fixed-length CHARACTER columns.
Previously, if you had fixed-length CHARACTER columns, you had to determine if they contained multibyte character data, and increase the target column size as appropriate.
AWS SCT now bridges this gap for you. If your Teradata tables contain CHARACTER columns with multibyte characters, AWS SCT automatically converts these columns to Amazon Redshift CHARACTER VARYING fields and sets the column sizes accordingly. Consider the following example, which contains four columns: a LATIN column that contains only single-byte characters, and UNICODE, GRAPHIC, and KANJISJIS columns that can contain multibyte characters:
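A sketch of such a table (the char_sets name and column names are illustrative):

```sql
-- Teradata (sketch): fixed-length CHAR columns with different character sets
CREATE TABLE char_sets (
  col_latin     CHAR(10) CHARACTER SET LATIN,
  col_unicode   CHAR(10) CHARACTER SET UNICODE,
  col_graphic   CHAR(10) CHARACTER SET GRAPHIC,
  col_kanjisjis CHAR(10) CHARACTER SET KANJISJIS
);
```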
AWS SCT translates the LATIN column to a fixed-length CHARACTER column. The multibyte columns are upsized and converted to CHARACTER VARYING:
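An approximation of the converted table (the exact VARCHAR sizes AWS SCT chooses may differ from this sketch):

```sql
-- Amazon Redshift (sketch): LATIN stays fixed-length; multibyte columns become wider VARCHARs
CREATE TABLE char_sets (
  col_latin     CHAR(10),
  col_unicode   VARCHAR(40),  -- upsized to allow multibyte UTF-8 characters
  col_graphic   VARCHAR(40),
  col_kanjisjis VARCHAR(40)
);
```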
Automatic conversion for multibyte CHARACTER columns is available in AWS SCT now.
GEOMETRY data type size
Amazon Redshift has long supported geospatial data with a GEOMETRY data type and associated spatial functions.
Previously, Amazon Redshift restricted the maximum size of a GEOMETRY column to 64 KB, which constrained some customers with large objects. Now, we’re happy to share that the maximum size of GEOMETRY objects has been increased to just under 1 MB (specifically, 1,048,447 bytes).
For example, consider the following Teradata table:
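A minimal sketch of such a table (the sites name and all columns are illustrative):

```sql
-- Teradata (sketch): a table with ST_GEOMETRY columns
CREATE TABLE sites (
  site_id   INTEGER NOT NULL,
  footprint ST_GEOMETRY,
  center    ST_GEOMETRY
)
PRIMARY INDEX (site_id);
```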
You can use AWS SCT to convert it to Amazon Redshift. The converted table definition is as follows. A size specification isn’t needed on the converted columns because Amazon Redshift implicitly sets the column size.
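An approximation of the converted table:

```sql
-- Amazon Redshift (sketch): GEOMETRY columns need no explicit size specification
CREATE TABLE sites (
  site_id   INTEGER NOT NULL,
  footprint GEOMETRY,
  center    GEOMETRY
);
```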
Large GEOMETRY columns are available in Amazon Redshift now. For more information, see Querying spatial data in Amazon Redshift.
Conclusion
We’re happy to share these new features with you. If you’re contemplating a migration to Amazon Redshift, these capabilities can help automate your schema conversion and preserve your investment in existing reports, applications, and ETL, as well as accelerate your query performance.
This post described a few of the dozens of new features we have recently introduced to automate your data warehouse migrations to Amazon Redshift. We will share more in upcoming posts. You’ll hear about additional SQL automation, a purpose-built scripting language for Amazon Redshift with BTEQ compatibility, and automated support for proprietary SQL features.
Check back soon for more information. Until then, you can learn more about Amazon Redshift and the AWS Schema Conversion Tool on the AWS website. Happy migrating!
About the Author
Michael Soo is a database engineer with the AWS DMS and AWS SCT team at Amazon Web Services.