AWS Database Blog
Handle IDENTITY columns in AWS DMS: Part 1
In relational database management systems, an IDENTITY column is a column in a table that is made up of values generated automatically by the database at the time of data insertion. Although different systems handle the implementation of IDENTITY columns differently, they share some common characteristics. In most cases, the value of the IDENTITY column is populated during insert and no modifications are allowed afterwards. Each new value is generated based on the current seed and increment. For an example, identity(1, 1) means the value starts at 1 and will increase by 1 for each insert.
There are some common misconceptions that IDENTITY columns enforce uniqueness. Although the value is unique in general, it can produce duplicates if the starting value gets reset manually. To enforce uniqueness on the column, you must include the appropriate constraint, like a primary key or unique key. Another misconception is that IDENTITY columns cannot have holes (gaps). Holes can happen when there is situation like transaction rollback, bulk delete (table truncation), or server restart.
You can use AWS Database Migration Service (AWS DMS) to migrate your databases to AWS. When tables with IDENTITY column are replicated, special care is required:
- In AWS DMS, the IDENTITY column value from the source database (generated automatically) is replicated to the target, which may cause database errors because the target database wants to generate the value on the same IDENTITY column
- When the application points to the target database after migration (AWS DMS is stopped during the cutover window), the database needs to generate new values for these IDENTITY columns with a new starting point (reseeding)
This two-part series can help you handle IDENTITY columns with AWS DMS during a database migration. In Part 1 (this post), we introduce how the IDENTITY column is implemented in four major relational database engines: SQL Server, MySQL, PostgreSQL, and Oracle. In Part 2, we discuss how to set up tables with the IDENTITY column as the AWS DMS target, and provide instructions to handle reseeding after cutover.
Let’s look at examples of how IDENTITY columns are implemented in different database management systems.
SQL Server
IDENTITY columns have the following characteristics in SQL Server:
- Each new value is generated based on the current seed and increment
- Each new value for a particular transaction is different from other concurrent transactions on the table
- Only one IDENTITY column can be created per table
The following is an example of how IDENTITY columns are used:
ID | FirstName |
1 | John |
2 | Mary |
3 | Peter |
SET IDENTITY_INSERT
SET IDENTITY_INSERT allows explicit values to be inserted into the IDENTITY column of a table.
To run a test to manually set the value for the IDENTITY column ID without IDENTITY_INSERT turned on (row got rejected), use the following code:
Run the same test with IDENTITY_INSERT turned on (row inserted successfully) with the following code:
IDENTITY column reseeding
According to the Microsoft documentation, if the value inserted is larger than the current IDENTITY value for the table, SQL Server automatically uses the new inserted value as the current IDENTITY value. See the following code:
You can see the value of ID is automatically set to 101 by SQL Server:
ID | FirstName |
1 | John |
2 | Mary |
3 | Peter |
100 | Jane |
101 | Jack |
MySQL
IDENTITY columns have the following characteristics in MySQL:
- The IDENTITY column equivalent is
AUTO_INCREMENT
. - The column declared as
AUTO_INCREMENT
must be declared as a primary or unique key - MySQL allows exactly one column per table as
AUTO_INCREMENT
. - The data type of the column declared as
AUTO_INCREMENT
must be an integer. DECIMAL and NUMERIC are not allowed. - Explicit insert of the
AUTO_INCREMENT
column value is allowed. - The
AUTO_INCREMENT
column value increment is always 1 when using implicit value generation from the database.
The following is an example of how IDENTITY columns are used:
ID | FirstName |
1 | John |
2 | Mary |
3 | Peter |
AUTO_INCREMENT
If we mention a value for the ID AUTO_INCREMENT
column, MySQL will increment the next value from the manually inserted value:
ID | FirstName |
1 | John |
2 | Mary |
3 | Peter |
100 | Jane |
ID | FirstName |
1 | John |
2 | Mary |
3 | Peter |
100 | Jane |
101 | Jack |
MySQL resumes from last inserted value.
PostgreSQL
IDENTITY columns have the following characteristics in PostgreSQL:
- The data type can be of type SMALLINT, INT or BIGINT.
- It has GENERATED ALWAYS and GENERATED BY DEFAULT options:
- GENERATED ALWAYS instructs PostgreSQL to always generate a value for the IDENTITY column. If you attempt to insert (or update) values into the GENERATED ALWAYS AS IDENTITY column, PostgreSQL will issue an error.
- GENERATED BY DEFAULT also instructs PostgreSQL to generate a value for the IDENTITY column. However, if a value is supplied for insert or update, PostgreSQL will use that value to insert into the IDENTITY column instead of using the system-generated value.
Let’s look at some examples of these options.
GENERATED ALWAYS
The following code is an example of GENERATED ALWAYS:
Id | Firstname |
1 | John |
2 | Mary |
3 | Peter |
To fix the preceding error, use the OVERRIDING SYSTEM VALUE clause as follows:
Id | Firstname |
1 | John |
2 | Mary |
3 | Peter |
100 | Jane |
GENERATED BY DEFAULT AS IDENTITY
Alternatively, you can use GENERATED BY DEFAULT AS IDENTITY:
Unlike the previous example that uses the GENERATED ALWAYS AS IDENTITY constraint, the preceding SELECT statement works perfectly fine:
Id | Firstname |
1 | John |
2 | Mary |
3 | Peter |
100 | Jane |
IDENTITY column reseeding
In PostgreSQL, you need reset the IDENTITY value if values are set manually:
Id | Firstname |
1 | John |
2 | Mary |
3 | Peter |
100 | Jane |
101 | Jack |
Oracle
Oracle 12c Release 1 (12.1) introduced the IDENTITY column to match the IDENTITY functionality of other database engines. A combination of triggers and sequences were used to achieve this identity functionality in earlier versions of Oracle, which we will not cover in this post.
IDENTITY columns in Oracle (version 12c and after) have the following characteristics:
- Only one IDENTITY column per table
- Must be of numeric data type
- Cannot have a DEFAULT clause
- It has GENERATED ALWAYS and GENERATED BY DEFAULT options:
- With GENERATED ALWAYS, values are always generated by Oracle for this column, and any attempt to insert a value to this column will throw an error
- With GENERATED BY DEFAULT, the value will only be generated by Oracle if the user doesn’t provide a value or provide a NULL value to this column
Let’s look at some examples of these options.
GENERATED ALWAYS
The following code is an example of GENERATED ALWAYS:
Id | Firstname |
1 | John |
2 | Mary |
3 | Peter |
GENERATED BY DEFAULT
The following code is an example of GENERATED BY DEFAULT:
Unlike the previous example that uses the GENERATED ALWAYS AS IDENTITY constraint, the preceding SELECT statement works perfectly fine:
Id | Firstname |
1 | John |
2 | Mary |
3 | Peter |
100 | Jane |
Insert another record with a NULL value for the ID column:
The preceding example demonstrates that Oracle will allow a user-supplied integer value for the IDENTITY column but won’t accept a NULL value using this GENERATED BY DEFAULT clause.
Unlike other DB engines, in Oracle the Identity column can be NULL and thus Oracle provides this additional Syntax for DEFAULT on NULL.
GENERATED BY DEFAULT ON NULL
The following code shows examples of GENERATED BY DEFAULT ON NULL:
Create a table:
Insert a record:
Select a record:
Id | Firstname |
1 | John |
Insert another record with a NULL value for for ID column:
Select a record:
Id | Firstname |
1 | John |
2 | Mary |
In this example, Oracle generated a value for the ID column, ignoring the NULL value supplied by the user.
IDENTITY column reseeding
In Oracle, you need reset the IDENTITY value if values are set manually:
Id | Firstname |
1 | John |
2 | Mary |
101 | Jack |
Summary
In this post, we demonstrated how different relational database management systems handle IDENTITY columns.
In Part 2 of this series, we’ll show how AWS DMS uses these different characteristics for target tables in their databases.
About the authors
Feng Cai is a Database Consultant at AWS. He has a wide background in providing database services to different clients. Feng is currently a member of the AWS Global Competency Center, providing homogeneous and heterogenous database migration support to customers.
Sindhusha Paturu is a Database Consultant with the AWS Professional Services Team. She has been supporting and enabling customers to migrate their database from on-premises data centers to the AWS Cloud and also to migrate from commercial database engines to open-source databases in Amazon.
Bala K is a Senior Database Consultant with the AWS and brings a vast experience of database technologies. Starting with mainframes, Bala has worked on all modern database engines like Oracle, SQL Server, DB2, Sybase, PostgreSQL, and Amazon Redshift. At AWS, he helps empower customers with their digital transformation and accelerate migration of their database workload to the AWS Cloud. He has successfully migrated a number of complex databases from on premises to the AWS Cloud.
Mohammad Waseem is a Database Consultant at AWS. He works as a database migration specialist, helping and enabling customers to build highly available, cost-effective database solutions and migrate their database workloads to the AWS Cloud. He works with AWS customers and partners to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.