AWS Database Blog

Migrate generated columns to PostgreSQL using AWS Database Migration Service

AWS launched Amazon Database Migration Accelerator (Amazon DMA) to accelerate your journey to AWS Databases and Analytics services and achieve cloud adoption benefits such as cost savings and performance improvements. In this post, we share Amazon DMA’s approach to migrate generated columns to PostgreSQL implementations, such as Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition using AWS Database Migration Service (AWS DMS).

What is a generated column?

A generated column is a column in a relational database table where its value is calculated on the fly by using either the values from other columns in the table or from a deterministic expression. Generated columns simplify queries, reduce errors, and offer a centralized, consistent mechanism to compute column values and keep them in sync. There are two types of generated columns:

  • Stored generated column – The value in a stored generated column is computed when the data is written (inserted or updated) and it occupies storage as if it were a normal column. A stored generated column is like a materialized view, except that it’s always updated automatically.
  • Virtual generated column – The value in a virtual generated column is computed when it’s read and it doesn’t occupy any storage. A virtual generated column is like a view.

Create generated columns

Most relational databases support generated columns. Microsoft SQL Server and MySQL support both stored and virtual generated columns, Oracle supports only virtual generated columns, and PostgreSQL (version 12 or higher) supports only stored generated columns. The following code snippets show how to create generated columns in these relational databases:

# Microsoft SQL Server
CREATE TABLE Parts 
(
Part_Id INT,
List_Price INT NOT NULL,
Cost INT NOT NULL,
Profit AS (List_Price - Cost) PERSISTED
);

# MySQL
CREATE TABLE Parts 
(
Part_Id INT,
List_Price INT NOT NULL,
Cost INT NOT NULL,
Profit INT AS (List_Price - Cost) STORED
);

# ORACLE
CREATE TABLE Parts 
(
Part_Id INT,
List_Price INT NOT NULL,
Cost INT NOT NULL,
Profit INT AS (List_Price - Cost)
);

# PostgreSQL (version 12 or higher)
CREATE TABLE Parts 
(
Part_Id INT,
List_Price INT NOT NULL,
Cost INT NOT NULL,
Profit INT GENERATED ALWAYS AS (List_Price - Cost) STORED
);

You can achieve the generated column functionality in PostgreSQL versions lower than 12 by using triggers or views with implicit or explicit functions to compute the generated columns. With triggers, you can define the generated column as a real column and use the trigger to update the column. See the following code:

# PostgreSQL (version < 12)
CREATE TABLE Parts 
(
Part_Id INT,
List_Price INT NOT NULL,
Cost INT NOT NULL,
Profit INT
);

CREATE FUNCTION calc_profit() RETURNS TRIGGER AS $$
BEGIN
	NEW.Profit = new.List_Price - new.Cost;
	RETURN NEW;
END;
$$
language plpgsql;

CREATE TRIGGER calc_profit_trigger 
BEFORE INSERT OR UPDATE ON Parts 
FOR EACH ROW EXECUTE PROCEDURE calc_profit();

Alternatively, you can create a view on the table, which provides generated columns in place and populates them with implicit or explicit function values:

# Using Implicit Function
CREATE VIEW part_View AS SELECT Part_Id, List_Price, Cost, (List_Price – Cost) as Profit FROM Parts;

# Using Explicit Function
CREATE VIEW part_View AS SELECT Part_Id, List_Price, cost, f(List_Price, Cost) as Profit FROM Parts;

Update generated columns

The values in the generated column are auto-generated and they can’t be manually inserted or updated. For example, if you try to insert or update a value in the generated column, you get an error:

>> INSERT INTO parts (Part_Id, List_Price, Cost, Profit) VALUES (12, 20, 0, 18);
SQL Error [42601]: ERROR: cannot insert into column "profit"
Detail: Column "Profit" is a generated column.

Instead, you can insert or update the values of all other columns, and the value of the generated column will get updated automatically, as shown in the following code snippet:

>> INSERT INTO parts (Part_Id, List_Price, Cost) VALUES (12, 20, 2);
>> SELECT * FROM Parts
     Part_Id	List_Price	Cost	Profit
     9	        20	        10	    10
    10          20          5	    15
    11          20          5	    15
    12          20          2	    18

In addition, a generated column can be added or dropped from a table. For example, in PostgreSQL (version 12 or higher) you can run the following commands:

>> SELECT * FROM Parts
     Part_Id	List_Price   Cost	Profit
     9          20           10		10
    10          20           5		15
    11          20           5		15
    12          20           2		18
>> ALTER TABLE parts DROP Profit;
>> SELECT * FROM Parts
     Part_Id	List_Price	Cost	
     9          20			10	
    10          20          5	
    11          20          5	
    12          20          2
>> ALTER TABLE Parts ADD Gross_Profit INT GENERATED ALWAYS AS (List_Price - Cost) stored;
>> SELECT * FROM Parts
     Part_Id	List_Price	Cost	Gross_Profit
     9          20          10		10
    10          20          5		15
    11          20          5		15
    12          20          2		18

Migrate generated columns to PostgreSQL using AWS DMS

You can use AWS DMS to migrate relational databases such as Oracle or Microsoft SQL Server to PostgreSQL implementations, such as Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition. AWS DMS supports both full load (the entire table) and CDC (change data capture, which are incremental changes to the target table) migrations, as shown in the following diagram.

However, if the tables in the target database contain generated columns, AWS DMS will give the following error while migrating a table with the generated column:

ERROR: extra data after last expected column

You can mitigate this challenge by using one of the following options.

Full load or Change Data Capture (CDC)

For PostgreSQL versions lower than 12, you can use full load or CDC with the following actions:

  • Either use Views or create Triggers and add the generated column as a real column to the target table.
  • Use AWS DMS to migrate the data from the source table to the target table

Full load

For PostgreSQL versions 12 or higher, you can use full load for the following actions:

  • Modify the target table – Complete the following steps:
    • Add all columns from the source table to the target table, except the generated columns.
    • Migrate the data from the source to the target table using AWS DMS.
    • Add the generated columns in the target table.
  • Disable the copy command with AWS DMS – Prior to starting the AWS DMS migrations, disable the copy command option by setting LoadUsingCSV=false. In this case, AWS DMS uses the SQL command to insert the source records into the target table. This process is generally slower than the other options because the rows are inserted with one row at a time.
  • Use the copy command and skip AWS DMS – Instead of using AWS DMS to migrate the source table containing generated columns, you can export the source table to a CSV file and use PostgreSQL’s copy command directly to load the data to the target table:
>> oracle=# SELECT * FROM Parts
     Part_Id	List_Price	Cost	Profit
     9          20           10		10
    10          20           5		15
    11          20           5		15
    12          20           2		18
# Copy the data from the above source table to data.csv file
>> postgres=# \COPY Auto_Parts FROM data.csv;
>> postgres=# SELECT * FROM Auto_Parts
     Part_Id	List_Price	Cost	Profit
     9           20         10		10
    10           20         5		15
    11           20         5		15
    12           20         2		18

Change data capture

For PostgreSQL version 12 or higher, you can configure AWS DMS in CDC mode to ignore the generated columns on the target. Then you can migrate the data from the source to the target table.

Conclusion

In this post, we shared Amazon DMA’s approach to migrate generated columns to PostgreSQL implementations, such as Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition using AWS DMS.

Amazon DMA offers complementary advisory services to create migration strategy and implementation plans, and enables your in-house migration team (or Amazon Professional Services or APN Partner, if involved) to conduct migration implementation. If you are planning to migrate your workloads to AWS Databases and Analytics Services, email DMA-sales@amazon.com to engage the Amazon DMA team.


About the Authors

Ezat Karimi is a Sr. Solution Architect at Amazon Database Migration Accelerator.

Sharath Gopalappa is a Sr. Product Manager Technical at Amazon Database Migration Accelerator.