AWS Database Blog

Handle IDENTITY columns in AWS DMS: Part 1

In relational database management systems, an IDENTITY column is a column in a table that is made up of values generated automatically by the database at the time of data insertion. Although different systems handle the implementation of IDENTITY columns differently, they share some common characteristics. In most cases, the value of the IDENTITY column is populated during insert and no modifications are allowed afterwards. Each new value is generated based on the current seed and increment. For an example, identity(1, 1) means the value starts at 1 and will increase by 1 for each insert.

There are some common misconceptions that IDENTITY columns enforce uniqueness. Although the value is unique in general, it can produce duplicates if the starting value gets reset manually. To enforce uniqueness on the column, you must include the appropriate constraint, like a primary key or unique key. Another misconception is that IDENTITY columns cannot have holes (gaps). Holes can happen when there is situation like transaction rollback, bulk delete (table truncation), or server restart.

You can use AWS Database Migration Service (AWS DMS) to migrate your databases to AWS. When tables with IDENTITY column are replicated, special care is required:

  • In AWS DMS, the IDENTITY column value from the source database (generated automatically) is replicated to the target, which may cause database errors because the target database wants to generate the value on the same IDENTITY column
  • When the application points to the target database after migration (AWS DMS is stopped during the cutover window), the database needs to generate new values for these IDENTITY columns with a new starting point (reseeding)

This two-part series can help you handle IDENTITY columns with AWS DMS during a database migration. In Part 1 (this post), we introduce how the IDENTITY column is implemented in four major relational database engines: SQL Server, MySQL, PostgreSQL, and Oracle. In Part 2, we discuss how to set up tables with the IDENTITY column as the AWS DMS target, and provide instructions to handle reseeding after cutover.

Let’s look at examples of how IDENTITY columns are implemented in different database management systems.

SQL Server

IDENTITY columns have the following characteristics in SQL Server:

  • Each new value is generated based on the current seed and increment
  • Each new value for a particular transaction is different from other concurrent transactions on the table
  • Only one IDENTITY column can be created per table

The following is an example of how IDENTITY columns are used:

CREATE TABLE Name(
   ID INT IDENTITY NOT NULL PRIMARY KEY,
   FirstName VARCHAR(40) NOT NULL
);
-- Inserting values into table.
INSERT INTO Name(FirstName) VALUES ('John');
INSERT INTO Name(FirstName) VALUES ('Mary');
INSERT INTO Name(FirstName) VALUES ('Peter');
SELECT * FROM dbo.Name;
ID FirstName
1 John
2 Mary
3 Peter

SET IDENTITY_INSERT

SET IDENTITY_INSERT allows explicit values to be inserted into the IDENTITY column of a table.

To run a test to manually set the value for the IDENTITY column ID without IDENTITY_INSERT turned on (row got rejected), use the following code:

SET IDENTITY_INSERT dbo.Name OFF;
INSERT INTO dbo.Name(ID,FirstName) VALUES (100,'Jane');
Msg 544, Level 16, State 1, Line 18
Cannot insert explicit value for IDENTITY column in table 'Name' when IDENTITY_INSERT is set to OFF.

Run the same test with IDENTITY_INSERT turned on (row inserted successfully) with the following code:

SET IDENTITY_INSERT dbo.Name ON;
INSERT INTO dbo.Name(ID,FirstName) VALUES (100,'Jane');
(1 row affected)
Completion time: 2022-12-07T15:15:27.2674217+00:00

IDENTITY column reseeding

According to the Microsoft documentation, if the value inserted is larger than the current IDENTITY value for the table, SQL Server automatically uses the new inserted value as the current IDENTITY value. See the following code:

SET IDENTITY_INSERT dbo.Name OFF;
GO
INSERT INTO dbo.Name(FirstName) VALUES('Jack');
GO
(1 row affected)
Completion time: 2022-12-07T15:30:25.4959694+00:00

You can see the value of ID is automatically set to 101 by SQL Server:

SELECT * FROM dbo.Name;
ID FirstName
1 John
2 Mary
3 Peter
100 Jane
101 Jack

MySQL

IDENTITY columns have the following characteristics in MySQL:

  • The IDENTITY column equivalent is AUTO_INCREMENT.
  • The column declared as AUTO_INCREMENT must be declared as a primary or unique key
  • MySQL allows exactly one column per table as AUTO_INCREMENT.
  • The data type of the column declared as AUTO_INCREMENT must be an integer. DECIMAL and NUMERIC are not allowed.
  • Explicit insert of the AUTO_INCREMENT column value is allowed.
  • The AUTO_INCREMENT column value increment is always 1 when using implicit value generation from the database.

The following is an example of how IDENTITY columns are used:

CREATE TABLE name (
Id int NOT NULL AUTO_INCREMENT,
Firstname varchar(30) DEFAULT NULL,
PRIMARY KEY (Id)) ;
Query OK, 0 rows affected
-- Inserting values into table.
INSERT INTO Name(FirstName) VALUES ('John');
INSERT INTO Name(FirstName) VALUES ('Mary');
INSERT INTO Name(FirstName) VALUES ('Peter');
select * from name;
ID FirstName
1 John
2 Mary
3 Peter

AUTO_INCREMENT

If we mention a value for the ID AUTO_INCREMENT column, MySQL will increment the next value from the manually inserted value:

Insert into name(id,Firstname) values(100,'Jane');
Query OK, 1 row affected
select * from name;
ID FirstName
1 John
2 Mary
3 Peter
100 Jane
Insert into name(Firstname) values('Jack');
Query OK, 1 row affected
select * from name;
ID FirstName
1 John
2 Mary
3 Peter
100 Jane
101 Jack

MySQL resumes from last inserted value.

PostgreSQL

IDENTITY columns have the following characteristics in PostgreSQL:

  • The data type can be of type SMALLINT, INT or BIGINT.
  • It has GENERATED ALWAYS and GENERATED BY DEFAULT options:
    • GENERATED ALWAYS instructs PostgreSQL to always generate a value for the IDENTITY column. If you attempt to insert (or update) values into the GENERATED ALWAYS AS IDENTITY column, PostgreSQL will issue an error.
    • GENERATED BY DEFAULT also instructs PostgreSQL to generate a value for the IDENTITY column. However, if a value is supplied for insert or update, PostgreSQL will use that value to insert into the IDENTITY column instead of using the system-generated value.

Let’s look at some examples of these options.

GENERATED ALWAYS

The following code is an example of GENERATED ALWAYS:

CREATE TABLE name (
ID INT GENERATED ALWAYS AS IDENTITY,
Firstname VARCHAR(40) NOT NULL,
CONSTRAINT name_pk PRIMARY KEY(ID)
);
INSERT INTO name(Firstname) VALUES ('John');
INSERT INTO name(Firstname) VALUES ('Mary');
INSERT INTO name(Firstname) VALUES ('Peter');
SELECT * FROM name;
Id Firstname
1 John
2 Mary
3 Peter
INSERT INTO name(id, Firstname) VALUES (100, 'Jane');
SQL Error [428C9]: ERROR: cannot insert into column "id"
Detail: Column "id" is an IDENTITY column defined as GENERATED ALWAYS.

To fix the preceding error, use the OVERRIDING SYSTEM VALUE clause as follows:

INSERT INTO name(id, Firstname) OVERRIDING SYSTEM VALUE
VALUES (100, 'Jane');
SELECT * FROM name;
Id Firstname
1 John
2 Mary
3 Peter
100 Jane

GENERATED BY DEFAULT AS IDENTITY

Alternatively, you can use GENERATED BY DEFAULT AS IDENTITY:

ALTER TABLE name
ALTER COLUMN id SET GENERATED BY DEFAULT;
INSERT INTO name(id, Firstname) VALUES (100, 'Jane');

Unlike the previous example that uses the GENERATED ALWAYS AS IDENTITY constraint, the preceding SELECT statement works perfectly fine:

SELECT * FROM name;
Id Firstname
1 John
2 Mary
3 Peter
100 Jane

IDENTITY column reseeding

In PostgreSQL, you need reset the IDENTITY value if values are set manually:

ALTER TABLE name
ALTER COLUMN id SET GENERATED BY DEFAULT RESTART WITH 101;
INSERT INTO name(Firstname) VALUES ('jack');
SELECT * FROM name;
Id Firstname
1 John
2 Mary
3 Peter
100 Jane
101 Jack

Oracle

Oracle 12c Release 1 (12.1) introduced the IDENTITY column to match the IDENTITY functionality of other database engines. A combination of triggers and sequences were used to achieve this identity functionality in earlier versions of Oracle, which we will not cover in this post.

IDENTITY columns in Oracle (version 12c and after) have the following characteristics:

  • Only one IDENTITY column per table
  • Must be of numeric data type
  • Cannot have a DEFAULT clause
  • It has GENERATED ALWAYS and GENERATED BY DEFAULT options:
    • With GENERATED ALWAYS, values are always generated by Oracle for this column, and any attempt to insert a value to this column will throw an error
    • With GENERATED BY DEFAULT, the value will only be generated by Oracle if the user doesn’t provide a value or provide a NULL value to this column

Let’s look at some examples of these options.

GENERATED ALWAYS

The following code is an example of GENERATED ALWAYS:

CREATE TABLE name (
ID INT GENERATED ALWAYS AS IDENTITY,
Firstname VARCHAR(40) NOT NULL,
PRIMARY KEY (ID)
);
INSERT INTO name(Firstname) VALUES ('John');
INSERT INTO name(Firstname) VALUES ('Mary');
INSERT INTO name(Firstname) VALUES ('Peter');
SELECT * FROM name;
Id Firstname
1 John
2 Mary
3 Peter
INSERT INTO name(id, Firstname) VALUES (100, 'Jane');
SQL Error: ORA-32795: cannot insert into a generated always IDENTITY column

GENERATED BY DEFAULT

The following code is an example of GENERATED BY DEFAULT:

ALTER TABLE name
ALTER COLUMN id SET GENERATED BY DEFAULT;
INSERT INTO name(id, Firstname) VALUES (100, 'Jane');

Unlike the previous example that uses the GENERATED ALWAYS AS IDENTITY constraint, the preceding SELECT statement works perfectly fine:

SELECT * FROM name;
Id Firstname
1 John
2 Mary
3 Peter
100 Jane

Insert another record with a NULL value for the ID column:

INSERT INTO name(Id, Firstname) VALUES (NULL, 'Peter');
SQL Error: ORA-01400: cannot insert NULL into ("TEST"."NAME"."ID")

The preceding example demonstrates that Oracle will allow a user-supplied integer value for the IDENTITY column but won’t accept a NULL value using this GENERATED BY DEFAULT clause.

Unlike other DB engines, in Oracle the Identity column can be NULL and thus Oracle provides this additional Syntax for DEFAULT on NULL.

GENERATED BY DEFAULT ON NULL

The following code shows examples of GENERATED BY DEFAULT ON NULL:

Create a table:

CREATE TABLE name (
ID INT GENERATED BY DEFAULT ON NULL AS IDENTITY,
Firstname VARCHAR(40) NOT NULL
);

Insert a record:

INSERT INTO name(Firstname) VALUES ('John');

Select a record:

SELECT * FROM name;
Id Firstname
1 John

Insert another record with a NULL value for for ID column:

INSERT INTO name(Id, Firstname) VALUES (NULL,'Mary');

Select a record:

SELECT * FROM name;
Id Firstname
1 John
2 Mary

In this example, Oracle generated a value for the ID column, ignoring the NULL value supplied by the user.

IDENTITY column reseeding

In Oracle, you need reset the IDENTITY value if values are set manually:

ALTER TABLE name MODIFY id GENERATED BY DEFAULT AS IDENTITY
(START WITH 101);
INSERT INTO name(Firstname) VALUES ('Jack');
SELECT * FROM name;
Id Firstname
1 John
2 Mary
101 Jack

Summary

In this post, we demonstrated how different relational database management systems handle IDENTITY columns.
In Part 2 of this series, we’ll show how AWS DMS uses these different characteristics for target tables in their databases.


About the authors

Feng Cai is a Database Consultant at AWS. He has a wide background in providing database services to different clients. Feng is currently a member of the AWS Global Competency Center, providing homogeneous and heterogenous database migration support to customers.

Sindhusha Paturu is a Database Consultant with the AWS Professional Services Team. She has been supporting and enabling customers to migrate their database from on-premises data centers to the AWS Cloud and also to migrate from commercial database engines to open-source databases in Amazon.

Bala K is a Senior Database Consultant with the AWS and brings a vast experience of database technologies. Starting with mainframes, Bala has worked on all modern database engines like Oracle, SQL Server, DB2, Sybase, PostgreSQL, and Amazon Redshift. At AWS, he helps empower customers with their digital transformation and accelerate migration of their database workload to the AWS Cloud. He has successfully migrated a number of complex databases from on premises to the AWS Cloud.

Mohammad Waseem is a Database Consultant at AWS. He works as a database migration specialist, helping and enabling customers to build highly available, cost-effective database solutions and migrate their database workloads to the AWS Cloud. He works with AWS customers and partners to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.