AWS Database Blog

Handle IDENTITY columns in AWS DMS: Part 2

In Part 1 of this series, we discussed how the IDENTITY column is used in different relational database management systems. In this post, we focus on how AWS Database Migration Service (AWS DMS) handles tables with IDENTITY columns. On the source database, AWS DMS captures the IDENTITY column as a regular column. On the target database, AWS DMS takes the column value from the source and applies it to the IDENTITY column.

When setting up an AWS DMS replication task, one option is to let AWS DMS recreate the table on the target by choosing Drop tables on target.

In this case, the IDENTITY column is created as a regular column, and the IDENTITY property isn’t migrated to the target database column. This behavior is consistent across database engines, even without AWS DMS. For example, CREATE TABLE A AS SELECT * FROM B creates table A without inheriting the IDENTITY property of the columns in table B. Regular columns can be converted into IDENTITY columns by running an ALTER TABLE command on the target database after the data migration.
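
The exact syntax varies by engine. As a minimal sketch, the following PostgreSQL statement promotes a plain column to an IDENTITY column (the table and column names are hypothetical; note that SQL Server doesn’t allow adding the IDENTITY property to an existing column in place):

-- PostgreSQL sketch: promote an existing column to an IDENTITY column
-- (the column must already be NOT NULL)
ALTER TABLE my_table
ALTER COLUMN id ADD GENERATED BY DEFAULT AS IDENTITY;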

The following code shows the extracted Data Definition Language (DDL) on the target after AWS DMS recreates the table (in SQL Server):

CREATE TABLE [dbo].[Name](
[ID] [int] NOT NULL,
[FirstName] [varchar](40) NOT NULL,
CONSTRAINT [PK__Name__3214EC2771ED397B] PRIMARY KEY CLUSTERED
(
[ID] ASC
)
);

However, in many cases, tables are created manually on the AWS DMS target endpoint with DDL from a schema export or the AWS Schema Conversion Tool (AWS SCT), or are restored from a backup; in these cases, the IDENTITY property is kept. When setting up an AWS DMS task on the AWS DMS console, we choose Do nothing (for CDC only) or Truncate (for full load and CDC) for Target table preparation mode.

In the following sections, we look at how SQL Server, MySQL, PostgreSQL, and Oracle handle things differently.

SQL Server

We apply the same DDL from the source on the target endpoint:

CREATE TABLE Name(
ID INT IDENTITY NOT NULL PRIMARY KEY,
FirstName VARCHAR(40) NOT NULL
);

When AWS DMS replicates data from the source to the target, it sets IDENTITY_INSERT to ON when it applies changes to the target table, and sets it back to OFF after the change is complete. We can validate this behavior by turning on SQL Server Audit. As we discussed in Part 1, SQL Server can accept explicit values for the IDENTITY column when IDENTITY_INSERT is set to ON.

Therefore, AWS DMS handles a SQL Server target with an IDENTITY column automatically by using the following SQL Server features:

  • IDENTITY_INSERT
  • Automatic reseeding of the next IDENTITY value after an explicit value is provided by an INSERT statement

There is no manual work involved; you can stop the AWS DMS task and cut over the application to the target database when the data is in sync.
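
For illustration, the following T-SQL sketch approximates what AWS DMS does when it applies a source row to the target. It reuses the Name table from earlier; the values are hypothetical:

SET IDENTITY_INSERT dbo.Name ON;
-- insert the explicit ID value captured from the source
INSERT INTO dbo.Name (ID, FirstName) VALUES (101, 'Jane');
SET IDENTITY_INSERT dbo.Name OFF;
-- SQL Server reseeds automatically, so the next generated ID would be 102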

MySQL

In this example, we use the auto-converted DDL from the AWS SCT:

CREATE TABLE `name-created` (
`Id` int(11) NOT NULL AUTO_INCREMENT,
`Firstname` varchar(30) DEFAULT NULL,
PRIMARY KEY (`Id`)
) ENGINE=InnoDB AUTO_INCREMENT=152 DEFAULT CHARSET=latin1;

AWS DMS migrates the data for a table with an IDENTITY column without any issue. In MySQL, you can specify an explicit value for the IDENTITY column as long as it doesn’t violate uniqueness (in MySQL, the AUTO_INCREMENT column must be defined as a primary key or unique index).
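
For example, an explicit value can be supplied for the Id column of the table created earlier, as long as it’s unique (the value here is hypothetical):

INSERT INTO `name-created` (`Id`, `Firstname`) VALUES (200, 'Jane');
-- if 200 exceeds the current counter, MySQL advances AUTO_INCREMENT past it automatically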

When we want to cut over, we can stop the AWS DMS task and use the following SQL to set the AUTO_INCREMENT value on the target to match that of the source. This ensures data consistency and prevents failures when the application starts inserting data into the new target MySQL database.

On the source database, find the current value of the AUTO_INCREMENT column. For example, on MySQL, use the following SQL:

SELECT `AUTO_INCREMENT`
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = '<SCHEMA_NAME>'
AND TABLE_NAME = '<TABLE_NAME>';

Now that we have the current AUTO_INCREMENT value from the source database, use the following SQL to set it on the target MySQL database:

ALTER TABLE <TABLE_NAME>
AUTO_INCREMENT = <VALUE_FROM_PREVIOUS_STEP>;
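
For example, assuming the `name-created` table lives in a hypothetical dms_sample schema and the source counter is at 152, the two steps would look like this:

SELECT `AUTO_INCREMENT`
FROM INFORMATION_SCHEMA.TABLES
WHERE TABLE_SCHEMA = 'dms_sample'
AND TABLE_NAME = 'name-created';
-- suppose this returns 152

ALTER TABLE `name-created` AUTO_INCREMENT = 152;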

Note the following:

  • The DELETE command doesn’t decrease or reset the AUTO_INCREMENT value. Therefore, deleting all rows of a table with AUTO_INCREMENT doesn’t reset its counter.
  • TRUNCATE TABLE always resets AUTO_INCREMENT to 1.
  • If you expect the IDENTITY column to exceed the signed INT limit of 2^31 - 1 (2,147,483,647), it’s better to define the column as BIGINT, as shown in the sketch after this list.
  • You can also define the IDENTITY column as UNSIGNED if your application doesn’t need to store negative numbers. This doubles the upper limit of the INT data type to 2^32 - 1 (4,294,967,295).
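
The following sketch shows both options against the name-created table from earlier (a minimal illustration; on large tables, changing a column type rebuilds the table and can take a long time):

-- widen to BIGINT to raise the ceiling to 2^63 - 1
ALTER TABLE `name-created` MODIFY `Id` BIGINT NOT NULL AUTO_INCREMENT;

-- or keep INT but make it UNSIGNED if negative values aren't needed
ALTER TABLE `name-created` MODIFY `Id` INT UNSIGNED NOT NULL AUTO_INCREMENT;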

PostgreSQL

AWS DMS handles the IDENTITY column differently for full load compared to change data capture (CDC).

For a full load-only task, you can create the table on the target database beforehand, either manually or by using scripts from the AWS SCT. The following code is the auto-converted DDL from the AWS SCT:

CREATE TABLE dms_sample.name(
id BIGINT GENERATED ALWAYS AS IDENTITY( MAXVALUE 9223372036854775807 MINVALUE 1 NO CYCLE CACHE 20),
firstname CHARACTER VARYING(40) NOT NULL)
WITH (OIDS=FALSE);

AWS DMS successfully migrates the data for a table with an IDENTITY column. Note that an AWS DMS full load-only task succeeds for IDENTITY columns defined with either GENERATED ALWAYS or GENERATED BY DEFAULT.

When using an AWS DMS full load and CDC or CDC-only task, note the following: if the source tables have IDENTITY columns created with GENERATED ALWAYS and the target tables are also created with GENERATED ALWAYS, the AWS DMS task throws an error while migrating the data, because it attempts to insert the values already generated on the source into columns that PostgreSQL only allows to be system-generated.

To overcome this error, create the target tables with the IDENTITY columns defined as GENERATED BY DEFAULT. Then migrate the data, and after a successful migration, at cutover, modify the IDENTITY columns back to GENERATED ALWAYS.
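
To see why, a direct insert of an explicit value into a GENERATED ALWAYS column fails. The following sketch uses the dms_sample.name table from the earlier full-load example (error text abbreviated):

INSERT INTO dms_sample.name (id, firstname) VALUES (1, 'Jane');
-- ERROR: cannot insert a non-DEFAULT value into column "id"
-- DETAIL: Column "id" is an identity column defined as GENERATED ALWAYS.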

Let’s say the Name table has an IDENTITY column created with GENERATED ALWAYS in the source. If you use the AWS SCT, the converted table DDL will have the IDENTITY column as GENERATED ALWAYS in PostgreSQL as well, which will produce the preceding error during CDC.

First, create the table manually, changing the IDENTITY column to GENERATED BY DEFAULT. You can also run ALTER TABLE to change the property of an existing IDENTITY column. See the following code:

CREATE TABLE dms_sample.name(
id BIGINT GENERATED BY DEFAULT AS IDENTITY( MAXVALUE 9223372036854775807 MINVALUE 1 NO CYCLE CACHE 20),
firstname CHARACTER VARYING(40) NOT NULL
);

After the AWS DMS tasks complete successfully, at the time of final cutover, modify the IDENTITY column back to GENERATED ALWAYS:

ALTER TABLE dms_sample.name ALTER COLUMN id SET GENERATED ALWAYS;

Post-migration

At the final cutover, run the following code to set the current value of the IDENTITY column:

select setval(pg_get_serial_sequence('dms_sample.name', 'id'), max(id))
from dms_sample.name;

We provide a code snippet to automate these two steps:

  1. Convert a table’s IDENTITY column from GENERATED ALWAYS to GENERATED BY DEFAULT.
  2. Reset the current value of the IDENTITY column after cutover.

For example, let’s say that we are migrating data from a commercial system (Oracle or SQL Server) to PostgreSQL. The source has two tables, NAME and NAME2, with IDENTITY columns. Complete the following steps:

  1. Create a table in the source database with the schema name, table name, column name, and generation type of the IDENTITY column:
    CREATE TABLE IDENTITY_col_tables
    (
    tableschema varchar(30)
    , tablename varchar(100)
    , column_name varchar(50)
    , generation_type varchar(50)
    );
  2. Insert the required data into the IDENTITY_col_tables table:
    • For Oracle, use the following code:
      INSERT INTO IDENTITY_col_tables
      SELECT owner, table_name, column_name, generation_type
      FROM all_tab_IDENTITY_cols;
    • For SQL Server, use the following code:
      INSERT INTO IDENTITY_col_tables
      SELECT OBJECT_SCHEMA_NAME(TAB.object_id, db_id()),
             TAB.name AS TableName,
             COL.name AS ColumnName,
             'ALWAYS'
      FROM sys.tables TAB
      INNER JOIN sys.columns COL ON TAB.object_id = COL.object_id
      WHERE COL.column_id = 1
      AND COL.is_identity = 1
      AND TAB.is_ms_shipped = 0;
  3. Create the same table in the target database (PostgreSQL):
    CREATE TABLE IDENTITY_col_tables
    (
    tableschema varchar(30)
    , tablename varchar(100)
    , column_name varchar(50)
    , generation_type varchar(50)
    );
  4. Migrate the data from source to target only for the IDENTITY_col_tables table. You can use AWS DMS for this step.
  5. Run the DDL that is generated by the AWS SCT or equivalent. This creates the same IDENTITY columns as in the source database.
  6. Run the following in PostgreSQL to change the GENERATED ALWAYS IDENTITY columns to GENERATED BY DEFAULT:
    DO $$
    DECLARE
      rec RECORD;
    BEGIN
      FOR rec IN (SELECT lower(tableschema) AS tabschema, lower(tablename) AS tabname, lower(column_name) AS colname
                  FROM identity_col_tables
                  WHERE trim(generation_type) = 'ALWAYS')
      LOOP
        EXECUTE format('ALTER TABLE %s.%s ALTER COLUMN %s SET GENERATED BY DEFAULT', rec.tabschema, rec.tabname, rec.colname);
      END LOOP;
    END;
    $$;
  7. Run the full load and CDC.
  8. Run the following in PostgreSQL to change the IDENTITY columns back to GENERATED ALWAYS:
    DO $$
    DECLARE
      rec RECORD;
    BEGIN
      FOR rec IN (SELECT lower(tableschema) AS tabschema, lower(tablename) AS tabname, lower(column_name) AS colname
                  FROM identity_col_tables
                  WHERE trim(generation_type) = 'ALWAYS')
      LOOP
        EXECUTE format('ALTER TABLE %s.%s ALTER COLUMN %s SET GENERATED ALWAYS', rec.tabschema, rec.tabname, rec.colname);
      END LOOP;
    END;
    $$;
  9. Run the following on the target to reset all the IDENTITY column values at cutover:
    DO $$
    DECLARE
      v_stmt text;
      rec record;
    BEGIN
      FOR rec IN (SELECT lower(tableschema) AS tableschema, lower(tablename) AS tablename, lower(column_name) AS column_name
                  FROM identity_col_tables)
      LOOP
        v_stmt := 'select setval(pg_get_serial_sequence(''' || rec.tableschema || '.' || rec.tablename || ''', ''' || rec.column_name || '''), max(' || rec.column_name || ')) from ' || rec.tableschema || '.' || rec.tablename;
        RAISE NOTICE '%', v_stmt;
        -- run the generated statement; without this, the block only prints the statements
        EXECUTE v_stmt;
      END LOOP;
    END;
    $$;

Oracle

Given the flexibility provided by Oracle for IDENTITY columns, the following approach would be the most appropriate to migrate data into the IDENTITY columns in target Oracle databases. Complete the following steps:

    1. Define the IDENTITY column in the target Oracle table with a GENERATED BY DEFAULT ON NULL clause before or during data migration. This ensures that existing values from the source tables are inserted into the target tables without any challenges. It also ensures that if the source table has NULL values, Oracle inserts a NOT NULL integer value for the ID column on the target.
    2. Determine the maximum value of the IDENTITY column on the source table and supply it in the START WITH clause when you modify the target Oracle table (see the sketch after these steps). This ensures data uniqueness and that no duplicate values are inserted into the ID column, which could cause insertion errors due to the IDENTITY definition of the column. Provide a higher value for the START WITH clause to allow for additional data that is inserted into the source table during the migration.
    3. Perform post-data migration and validation using AWS DMS. You may alter the table to change the IDENTITY column to GENERATED ALWAYS or GENERATED BY DEFAULT based on your application’s requirements. You can use the Oracle-provided dictionary views to identify the IDENTITY columns and their characteristics: the DBA_TAB_IDENTITY_COLS, ALL_TAB_IDENTITY_COLS, and USER_TAB_IDENTITY_COLS views list the IDENTITY columns, their data generation type, the internal sequence associated with each column, and the options for each of these internal sequences.
    4. Use the ALL_TAB_IDENTITY_COLS view on the source Oracle database to script out the SQL required to create or alter these columns on the target Oracle database.
    5. Use the following SQL to list all the IDENTITY column details on the source database. Spool the output to a text or CSV file for your records.
      select owner, table_name, column_name, generation_type, sequence_name, IDENTITY_options from all_tab_IDENTITY_cols;
      or
      select * from all_tab_IDENTITY_cols;
    6. The following SQL generates a script to modify the IDENTITY column generation type to GENERATED BY DEFAULT ON NULL. Before migrating the data using AWS DMS, run this script against the source database to generate the output script. Then run the resulting output script against the target database to change the IDENTITY column properties first. This makes sure that all the IDENTITY columns on the target Oracle database are set to the GENERATED BY DEFAULT ON NULL type.
      spool IDENTITY_2_gen-def.sql
      set lines 400 pages 10000
      select 'spool IDENTITY_2_gen-def.out' from dual;
      select
      'alter table ' || owner || '.' || table_name ||
      ' modify (' || column_name ||
      ' GENERATED BY DEFAULT ON NULL AS IDENTITY);'
      from all_tab_IDENTITY_cols;
      select 'spool off;' from dual;
      spool off
    7. After you migrate the data using AWS DMS, you must reset the data generation type for these IDENTITY columns back to their original settings (matching the source Oracle database). The following SQL generates a script to modify the IDENTITY column generation type back to its original settings. Run this script against the source database to capture the current generation type and generate the output script. Then run the resulting output script against the target database to change the IDENTITY column properties to the same as the source.
      spool IDENTITY_2_default.sql
      set lines 400 pages 10000
      select 'spool IDENTITY_2_default.out' from dual;
      select
      'alter table ' || owner || '.' || table_name ||
      ' modify (' || column_name ||
      ' GENERATED ' || generation_type || ' AS IDENTITY);'
      from all_tab_IDENTITY_cols;
      select 'spool off;' from dual;
      spool off
    8. As mentioned earlier, Oracle internally maintains sequences to comply with the IDENTITY options defined for IDENTITY columns. These sequences are generated internally and have their own naming convention. Even though the user didn’t create these sequences manually, they can be identified by querying the *_TAB_IDENTITY_COLS dictionary views:
      select owner, table_name, column_name, generation_type, sequence_name, IDENTITY_options from all_tab_IDENTITY_cols;

      or

      select * from all_tab_IDENTITY_cols;
    9. Reset the values of the sequences associated with each IDENTITY column after the data migration with AWS DMS, and before applications start writing to the tables, to maintain data uniqueness for these IDENTITY columns.
    10. Alternatively, you can modify the IDENTITY column itself by resetting its start value with the ALTER TABLE command. Oracle provides the START WITH LIMIT VALUE clause in its ALTER TABLE statement. Let’s look at what this clause does in more detail. It can only be specified in an ALTER TABLE statement, and therefore only against an existing IDENTITY column. When this clause is specified, Oracle locks the table, scans it for the highest value of the IDENTITY column, and resets the underlying sequence for this IDENTITY column so that the next value it generates is that high-water mark plus the INCREMENT BY value. See the following example SQL:
      ALTER TABLE sample_tab MODIFY id_col GENERATED BY DEFAULT
      AS IDENTITY (START WITH LIMIT VALUE);

      Oracle locks the sample_tab table and scans for the highest value for the id_col column. The underlying sequence generator used internally by this IDENTITY column is reset with the highest value found + the INCREMENT BY value.

      This option saves time in identifying high values on each table manually and updating its corresponding sequence generator value individually.

      Now we repeat Step 7 to reset the IDENTITY column generation type to include this START WITH LIMIT VALUE clause. This way, when setting the type and option of the IDENTITY columns on the target database, we also reset the values of its sequence generators automatically.

    11. Revise the SQL to run after the data migration as follows. Run it against the source database to generate a dynamic SQL output script, then run that output script against the target:
      spool IDENTITY_2_final.sql
      set lines 400 pages 10000
      select 'spool IDENTITY_2_final.out' from dual;
      select
      'alter table ' || owner || '.' || table_name ||
      ' modify (' || column_name ||
      ' GENERATED ' || generation_type ||
      ' AS IDENTITY (START WITH LIMIT VALUE));'
      from all_tab_IDENTITY_cols;
      select 'spool off;' from dual;
      spool off

The IDENTITY columns are altered to the same generation type as the source, and their next values are automatically adjusted based on the highest value present in each table.
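
As a minimal sketch of steps 1 and 2, assume a hypothetical sample_tab table whose id_col column peaked at 1000 on the source:

-- Step 1: define the target IDENTITY column as GENERATED BY DEFAULT ON NULL
CREATE TABLE sample_tab (
id_col NUMBER GENERATED BY DEFAULT ON NULL AS IDENTITY,
name VARCHAR2(40)
);

-- Step 2: on the source, find the high-water mark ...
SELECT MAX(id_col) FROM sample_tab;
-- ... and on the target, restart the IDENTITY above it (1001 is illustrative)
ALTER TABLE sample_tab MODIFY id_col GENERATED BY DEFAULT ON NULL AS IDENTITY (START WITH 1001);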

Summary

In this blog series, we covered best practices for migrating IDENTITY columns with AWS DMS. In Part 1, we demonstrated how different relational database management systems handle IDENTITY columns. In this post (Part 2), we demonstrated how AWS DMS handles IDENTITY columns and outlined a cutover strategy for each target engine. This will help you migrate data from on-premises databases to AWS.

If you have any questions or comments, post your thoughts in the comments section.


About the authors

Feng Cai is a Database Consultant at AWS. He has a wide background in providing database services to different clients. Feng is currently a member of the AWS Global Competency Center, providing homogeneous and heterogeneous database migration support to customers.

Sindhusha Paturu is a Database Consultant with the AWS Professional Services team. She has been supporting and enabling customers to migrate their databases from on-premises data centers to the AWS Cloud, and to migrate from commercial database engines to open-source databases in AWS.

Bala K is a Senior Database Consultant with the AWS Global Competency Center and brings vast experience in database technologies. Starting with mainframes, Bala has worked on modern database engines such as Oracle, SQL Server, Db2, Sybase, PostgreSQL, and Amazon Redshift. At AWS, he helps empower customers with their digital transformation and accelerates the migration of their database workloads to the AWS Cloud. He has successfully migrated a number of complex databases from on premises to the AWS Cloud.

Mohammad Waseem is a Database Consultant at AWS. He works as a database migration specialist, helping and enabling customers to build highly available, cost-effective database solutions and migrate their database workloads to the AWS Cloud. He works with AWS customers and partners to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.