AWS Database Blog

Migrate Oracle global unique indexes in partitioned tables to Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL

When you migrate from Oracle to Amazon Aurora PostgreSQL-Compatible Edition or Amazon Relational Database Service (Amazon RDS) for PostgreSQL, you may need to deal with partitioned tables and ensure global uniqueness across partitions.

Oracle Database supports global indexes. A global index contains keys from multiple table partitions in a single index partition. Each table partition is a separate sub-table that stores a subset of data based on a defined partition key, and each index partition stores a subset of indexed data based on the partition key, which may or may not be the same as the table partition key. These global indexes are commonly used for online transaction processing (OLTP) environments because they offer efficient access to individual records.

Partitioned tables in native PostgreSQL, Amazon RDS for PostgreSQL, or Amazon Aurora offer several benefits, such as improved query performance (particularly when most of the heavily accessed rows of the table are in a single partition or a small number of partitions) and the ability to add or remove partitions to accomplish bulk loads and deletes. However, they don’t provide a built-in mechanism for enforcing global uniqueness across all partitions when the primary key and partition key differ.

Due to the nature of the data or business requirements, the columns that are best suited for partitioning (the partition key) may not be the same as the columns that should be used for enforcing primary key uniqueness. For example, we might partition a table of sales data by date, but the primary key may be based on a combination of product ID and transaction ID. When the partition key and primary key columns are different, it’s possible for the same combination of values in the primary key columns to exist in multiple partitions. In such cases, the primary key constraint is enforced separately within each partition. This means that duplicate primary key values could exist in different partitions without violating the primary key constraint.

The PostgreSQL community’s recommendation is to create a unique key constraint on a partitioned table by including all of the partition key columns. This way, the partition structure itself guarantees that there are no duplicates in different partitions.

In this post, we explore the problem of achieving global uniqueness and demonstrate possible solutions to consider while migrating from Oracle partitioned tables with global unique indexes to PostgreSQL, Amazon RDS for PostgreSQL, or Amazon Aurora.

Solution overview

Our use case for this post involves an Amazon RDS for Oracle database as the source and an Amazon Aurora PostgreSQL-Compatible database as the target. In this use case, we define Oracle’s partitioned table with global unique indexes and explore the step-by-step approach to achieve global uniqueness in the PostgreSQL environment.

At a high level, the solution steps are as follows:

  1. Deploy the table structures into a source RDS for Oracle DB instance.
  2. Convert the table structures into the Amazon Aurora equivalent using AWS Schema Conversion Tool (AWS SCT).
  3. Deploy the converted table structures into the target Amazon Aurora database.
  4. Choose one of the possible solutions discussed in this post and follow the step-by-step approach to enforce global uniqueness:
    1. Enforce uniqueness across partitions by defining partition keys using unique index composite columns.
    2. Enforce uniqueness across partitions by restructuring the base partitioned table.
    3. Enforce uniqueness by creating a flat table instead of a partitioned table.

The following diagram illustrates the migration architecture.


The architecture consists of the following components:

  • A source RDS for Oracle DB instance
  • AWS SCT for converting Oracle functions to the PostgreSQL equivalent
  • A target Amazon Aurora PostgreSQL-Compatible database

Prerequisites

To implement this solution, you must have an RDS for Oracle DB instance and an Amazon Aurora PostgreSQL-Compatible database running version 11 or later inside a VPC. (Approach 2 requires PostgreSQL 12 or later, because it creates foreign keys that reference partitioned tables.) For this demonstration, we used Oracle Database 19c (19.0.0.0) on Amazon RDS and Aurora PostgreSQL-Compatible 14.8.

Schema table structure in Amazon RDS for Oracle

For our business use case, a customer operates in the sports and entertainment industry. They manage a substantial number of sporting event tickets and collect detailed data on tickets for analytical and business purposes. Their ticket_purchase_hist table has grown significantly, accumulating a large volume of historical data. The table is partitioned by range based on the transaction_date column, with partitions created for specific time periods. A global unique index is created on the ticket_purchase_hist table; it provides efficient access to individual records across partitions, while the partitioned structure optimizes queries involving date ranges.

The schema table structure in Amazon RDS for Oracle is as follows:

  • TICKET_PURCHASE_HIST:
    • Range partitioned on the transaction_date field with three partitions
    • Constraints:
      • Primary key: ticket_purchase_hist_pk
      • Global unique index: ticket_purchase_hist_uk
      • Foreign keys:
        • ticket_purchase_hist_fk1
        • ticket_purchase_hist_fk2
CREATE TABLE ticket_purchase_hist ( 
    sporting_event_ticket_id NUMBER NOT NULL,
    purchased_by_id NUMBER NOT NULL,
    transferred_from_id NUMBER NOT NULL,
    location_id NUMBER NOT NULL,
    purchase_price NUMBER NOT NULL,
    transaction_date DATE NOT NULL,
    CONSTRAINT ticket_purchase_hist_pk PRIMARY KEY (sporting_event_ticket_id),
    CONSTRAINT ticket_purchase_hist_uk UNIQUE (purchased_by_id, transferred_from_id),
    CONSTRAINT ticket_purchase_hist_fk1 FOREIGN KEY (location_id) REFERENCES location(location_id), 
    CONSTRAINT ticket_purchase_hist_fk2 FOREIGN KEY (transferred_from_id) REFERENCES person(id) 
) PARTITION BY RANGE(transaction_date) (
    PARTITION tph_01_2020 VALUES LESS THAN (TO_DATE('01-FEB-2020','DD-MON-YYYY')),
    PARTITION tph_02_2020 VALUES LESS THAN (TO_DATE('01-MAR-2020','DD-MON-YYYY')),
    PARTITION tph_03_2020 VALUES LESS THAN (TO_DATE('01-APR-2020','DD-MON-YYYY'))
);
  • LOCATION:
    • Referenced by the ticket_purchase_hist table
CREATE TABLE location ( 
    location_id NUMBER NOT NULL,
    location_name VARCHAR2(100) NOT NULL,
    CONSTRAINT location_pk PRIMARY KEY (location_id)
);
  • PERSON:
    • Referenced by the ticket_purchase_hist table
CREATE TABLE person ( 
    id NUMBER NOT NULL,
    name VARCHAR2(100) NOT NULL,
    CONSTRAINT person_pk PRIMARY KEY (id)
);
  • EVENT_HIST:
    • References the ticket_purchase_hist primary key column (sporting_event_ticket_id)
CREATE TABLE event_hist ( 
    event_id NUMBER NOT NULL,
    event_ticket_id NUMBER NOT NULL,
    CONSTRAINT event_hist_pk PRIMARY KEY (event_id),
    CONSTRAINT event_hist_fk FOREIGN KEY (event_ticket_id) REFERENCES ticket_purchase_hist (sporting_event_ticket_id)
);
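
You can verify on the source database that the unique constraint is backed by a global (non-partitioned) index by querying the Oracle data dictionary. The following is a minimal sketch; run it as the schema owner:

-- A global non-partitioned index reports PARTITIONED = 'NO'
SELECT index_name, uniqueness, partitioned
FROM user_indexes
WHERE table_name = 'TICKET_PURCHASE_HIST';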

When we convert these tables using AWS SCT and deploy them to Amazon Aurora, we get the following error because PostgreSQL expects the unique constraint columns to include the partitioning columns:

ERROR:  unique constraint on partitioned table must include all partitioning columns
DETAIL:  PRIMARY KEY constraint on table "ticket_purchase_hist" lacks column "transaction_date" which is part of the partition key.
SQL state: 0A000

In PostgreSQL, to create a unique or primary key constraint on a partitioned table, the constraint’s columns must include all the partition key columns. This limitation exists because the individual indexes making up the constraint can only directly enforce uniqueness within their own partitions. Therefore, the partition structure itself must guarantee that there are no duplicates in different partitions.
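
For example, the following minimal sketch (using a hypothetical table t) shows the form PostgreSQL accepts, where the unique constraint includes the partition key column:

CREATE TABLE t ( 
    id BIGINT NOT NULL,
    txn_date DATE NOT NULL,
    CONSTRAINT t_pk PRIMARY KEY (id, txn_date)
) PARTITION BY RANGE (txn_date);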

To achieve migration of these table structures from Oracle to PostgreSQL, we discuss a few approaches that enforce uniqueness and overcome this PostgreSQL limitation.

Approach 1: Enforce uniqueness across partitions by defining partition keys using unique index composite columns

This approach is suitable when you want to maintain the benefits of partitioning for data management and query performance but need to enforce uniqueness across partitions.

Consider using this approach when your partitioned table has a partitioning key that doesn’t align with the primary key constraint columns. You can create a unique composite index on the required columns and partition the table based on these columns.

A benefit of this approach is that it allows you to maintain the advantages of partitioning while ensuring global uniqueness.

This approach is based on changing the partition key and partition strategy of the table. The following are the key points of this approach:

  • As suggested in the PostgreSQL documentation, to enforce uniqueness across partitions, a composite partition key can be created using global unique index columns (such as purchased_by_id and transferred_from_id in our example).
  • Range partitioning is generally used with date columns. Because the modified partition key is based on non-date (ID) columns, a better partition strategy is hash partitioning, which enables easy partitioning of data that doesn’t lend itself to range or list partitioning.
  • Primary keys or any other unique constraints (if present on the table) have to be created on the individual partitions, and each enforces uniqueness only within its own partition. Furthermore, it’s not possible to define foreign key references to the primary key column (sporting_event_ticket_id) of the table in PostgreSQL.

Schema tables structure in Aurora PostgreSQL

The schema tables have the following structure:

  • TICKET_PURCHASE_HIST:
    • Hash-partitioned table on columns purchased_by_id and transferred_from_id with three partitions to accommodate all possible hash values
    • Constraints:
      • Unique index: ticket_purchase_hist_uk
      • Foreign key constraints:
        • ticket_purchase_hist_fk1
        • ticket_purchase_hist_fk2
      • Primary keys (one per partition):
        • ticket_purchase_hist_p1_pk
        • ticket_purchase_hist_p2_pk
        • ticket_purchase_hist_p3_pk
CREATE TABLE ticket_purchase_hist (     
    sporting_event_ticket_id BIGINT NOT NULL,
    purchased_by_id BIGINT NOT NULL,
    transferred_from_id BIGINT NOT NULL,
    location_id BIGINT NOT NULL,
    purchase_price BIGINT NOT NULL,
    transaction_date DATE NOT NULL,
    CONSTRAINT ticket_purchase_hist_uk UNIQUE (purchased_by_id, transferred_from_id),
    CONSTRAINT ticket_purchase_hist_fk1 FOREIGN KEY (location_id) REFERENCES location(location_id), 
    CONSTRAINT ticket_purchase_hist_fk2 FOREIGN KEY (transferred_from_id) REFERENCES person(id) 
) PARTITION BY HASH (purchased_by_id, transferred_from_id);

CREATE TABLE ticket_purchase_hist_p1 PARTITION OF ticket_purchase_hist
 (CONSTRAINT ticket_purchase_hist_p1_pk PRIMARY KEY (sporting_event_ticket_id))
 FOR VALUES WITH (modulus 3, remainder 0);

CREATE TABLE ticket_purchase_hist_p2 PARTITION OF ticket_purchase_hist
 (CONSTRAINT ticket_purchase_hist_p2_pk PRIMARY KEY (sporting_event_ticket_id))
 FOR VALUES WITH (modulus 3, remainder 1);

CREATE TABLE ticket_purchase_hist_p3 PARTITION OF ticket_purchase_hist
 (CONSTRAINT ticket_purchase_hist_p3_pk PRIMARY KEY (sporting_event_ticket_id))
 FOR VALUES WITH (modulus 3, remainder 2);
  • LOCATION:
    • Referenced by the ticket_purchase_hist table
CREATE TABLE location ( 
    location_id BIGINT NOT NULL,
    location_name CHARACTER VARYING(100) NOT NULL,
    CONSTRAINT location_pk PRIMARY KEY (location_id)
);
  • PERSON:
    • Referenced by the ticket_purchase_hist table
CREATE TABLE person ( 
    id BIGINT NOT NULL,
    name CHARACTER VARYING(100) NOT NULL,
    CONSTRAINT person_pk PRIMARY KEY (id)
);
  • EVENT_HIST:
    • A foreign key reference to the TICKET_PURCHASE_HIST table primary key column (sporting_event_ticket_id) isn’t possible because the primary key is moved to the individual partitions. Trying to create such a foreign key reference results in the following SQL error: “SQL Error [42830]: ERROR: there is no unique constraint matching given keys for referenced table "ticket_purchase_hist"”
CREATE TABLE event_hist ( 
    event_id BIGINT NOT NULL,
    event_ticket_id BIGINT NOT NULL,
    CONSTRAINT event_hist_pk PRIMARY KEY (event_id)
);

Sample data and uniqueness validation

Because the purchased_by_id and transferred_from_id columns are part of the partition key, PostgreSQL ensures uniqueness of these columns across partitions. To demonstrate this, let’s insert some data into the table:

INSERT INTO person 
SELECT i, 'name'||i FROM generate_series(1,10) i; 

INSERT INTO location 
SELECT i, 'location'||i FROM generate_series(1,10) i;
 
INSERT INTO ticket_purchase_hist
SELECT i,i,i,i,i,'2020-01-05'::DATE + i*30 FROM generate_series(1,5) i; 
 
SELECT tableoid::regclass, * FROM ticket_purchase_hist;

Inserting a record with the same purchased_by_id and transferred_from_id values again should result in a SQL error:

INSERT INTO ticket_purchase_hist VALUES (6,1,1,6,6,'2020-07-05'::DATE);

We get the following output:

SQL Error [23505]: ERROR: duplicate key value violates unique constraint "ticket_purchase_hist_p1_purchased_by_id_transferred_from_id_key"
   Detail: Key (purchased_by_id, transferred_from_id)=(1, 1) already exists.

Therefore, restructuring the table to define partitions using Oracle global unique index columns ensures uniqueness across partitions in Amazon Aurora.

Partition pruning and index considerations

Note the following regarding partition pruning and indexes (see the EXPLAIN sketch after this list):

  • In a hash-partitioned table created with composite columns in its partition key, partition pruning is possible only when all columns of the partition key are used in a query. This is because the hash of each column value that is part of the partition key is individually calculated and then combined to get a single 64-bit hash value. The modulus operation is performed on this hash value and the remainder is used to determine the partition for the inserted row.
  • Only equality operators support partition pruning because the < or > operators scan all the partitions due to the manner of tuple distribution in a hash-partitioned table.
  • There is no special handling for NULL values. A hash value is generated and combined as explained earlier to find the partition for the row to be inserted. However, all unique key columns have a NOT NULL check, so this case should not arise.
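
The following sketch illustrates both behaviors with EXPLAIN; the exact plan output may vary by PostgreSQL version and configuration:

-- Equality predicates on both partition key columns allow pruning
EXPLAIN (COSTS OFF)
SELECT * FROM ticket_purchase_hist
WHERE purchased_by_id = 1 AND transferred_from_id = 1;
-- Expect a scan of a single partition (ticket_purchase_hist_p1 for this key,
-- per the earlier error output)

-- A predicate on only one of the partition key columns can't prune
EXPLAIN (COSTS OFF)
SELECT * FROM ticket_purchase_hist
WHERE purchased_by_id = 1;
-- Expect an Append node scanning all three partitions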

Caveats

Note the following limitations:

  • Primary keys can maintain uniqueness only within each partition, not across all partitions.
  • Primary keys can no longer be part of the base partitioned table, and foreign key references can’t be made to the primary key column of the base partitioned table. Although foreign keys can now reference the primary key that was moved to each partition, inserts into the referencing table fail when the referenced row resides in a different partition, because a foreign key on one partition can’t see data present in the other partitions, as the following sketch illustrates.
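
The following hypothetical sketch shows this limitation by pointing event_hist at a single partition’s primary key:

-- A foreign key can reference a partition directly, because each
-- partition is a regular table with its own primary key
ALTER TABLE event_hist
    ADD CONSTRAINT event_hist_fk FOREIGN KEY (event_ticket_id)
    REFERENCES ticket_purchase_hist_p1 (sporting_event_ticket_id);

-- This insert succeeds only if ticket 2 resides in partition p1; if the
-- row lives in p2 or p3, it fails with a foreign key violation even
-- though the ticket exists in the parent table
INSERT INTO event_hist VALUES (1, 2);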

Approach 2: Enforce uniqueness across partitions by restructuring the base partitioned table

Another way to solve the global unique index problem is to restructure the table data definition language (DDL) while moving to PostgreSQL.

This approach is suitable when you have more flexibility in redefining the table structure and partitioning strategy and can align the primary key columns with the partitioning key.

Consider restructuring the table when the existing partitioning and primary key design don’t align well. You must modify the table schema and partitioning strategy to ensure that the primary key enforces uniqueness across partitions.

This approach provides a cleaner and more intuitive solution by aligning the partitioning key with the primary key columns. It simplifies data management and maintenance.

The following are the key steps of this approach:

  • From the base table (ticket_purchase_hist), drop the unique constraint columns (purchased_by_id and transferred_from_id) and partition it by hash or range using the primary key (sporting_event_ticket_id) column.
  • Create an additional table that contains the unique constraint columns (purchased_by_id and transferred_from_id), the existing primary key column (sporting_event_ticket_id), and the partition key column (transaction_date). Partition this table by hash using purchased_by_id and transferred_from_id as the partition key, and move the primary key on sporting_event_ticket_id to the individual partitions.
  • Create a foreign key relation between these two using sporting_event_ticket_id.

Schema table structure in Aurora PostgreSQL

The following figure illustrates the schema table structure.

The schema table structure details are as follows:

  • TICKET_PURCHASE_HIST:
    • Restructured as two tables (for our demonstration, we use TICKET_PURCHASE_HIST_PKT and TICKET_PURCHASE_HIST_UKT)
    • TICKET_PURCHASE_HIST_PKT has the primary key column and other data (except for the unique constraint columns) and is partitioned by hash on the primary key column
    • TICKET_PURCHASE_HIST_UKT has the unique constraint columns and primary key column (which will be moved to each partition table) and is partitioned by hash on unique constraint columns
    • Data integrity is maintained between these tables by defining a foreign key constraint from TICKET_PURCHASE_HIST_UKT on the column sporting_event_ticket_id to the TICKET_PURCHASE_HIST_PKT table.
CREATE TABLE ticket_purchase_hist_pkt ( 
    sporting_event_ticket_id BIGINT NOT NULL,
    location_id BIGINT NOT NULL,
    purchase_price BIGINT NOT NULL,
    transaction_date DATE NOT NULL, 
    CONSTRAINT ticket_purchase_hist_pkt_pk PRIMARY KEY (sporting_event_ticket_id),
    CONSTRAINT ticket_purchase_hist_pkt_fk FOREIGN KEY (location_id) REFERENCES location(location_id)
) PARTITION BY HASH (sporting_event_ticket_id);

CREATE TABLE ticket_purchase_hist_pkt_p1 PARTITION OF ticket_purchase_hist_pkt FOR VALUES WITH (modulus 3, remainder 0);
CREATE TABLE ticket_purchase_hist_pkt_p2 PARTITION OF ticket_purchase_hist_pkt FOR VALUES WITH (modulus 3, remainder 1);
CREATE TABLE ticket_purchase_hist_pkt_p3 PARTITION OF ticket_purchase_hist_pkt FOR VALUES WITH (modulus 3, remainder 2);

 
CREATE TABLE ticket_purchase_hist_ukt ( 
    sporting_event_ticket_id BIGINT NOT NULL,
    purchased_by_id BIGINT NOT NULL,
    transferred_from_id BIGINT NOT NULL,
    transaction_date DATE NOT NULL,
    CONSTRAINT ticket_purchase_hist_ukt_uk UNIQUE (purchased_by_id, transferred_from_id),
    CONSTRAINT ticket_purchase_hist_ukt_fk FOREIGN KEY (transferred_from_id) REFERENCES person(id),
    CONSTRAINT ticket_purchase_hist_ukt_pkt_link FOREIGN KEY (sporting_event_ticket_id) REFERENCES ticket_purchase_hist_pkt (sporting_event_ticket_id) 
) PARTITION BY HASH (purchased_by_id, transferred_from_id);

CREATE TABLE ticket_purchase_hist_ukt_p1 PARTITION OF ticket_purchase_hist_ukt
(CONSTRAINT ticket_purchase_hist_ukt_p1_pk PRIMARY KEY (sporting_event_ticket_id))
FOR VALUES WITH (modulus 3, remainder 0);
CREATE TABLE ticket_purchase_hist_ukt_p2 PARTITION OF ticket_purchase_hist_ukt
(CONSTRAINT ticket_purchase_hist_ukt_p2_pk PRIMARY KEY (sporting_event_ticket_id))
FOR VALUES WITH (modulus 3, remainder 1);
CREATE TABLE ticket_purchase_hist_ukt_p3 PARTITION OF ticket_purchase_hist_ukt
(CONSTRAINT ticket_purchase_hist_ukt_p3_pk PRIMARY KEY (sporting_event_ticket_id))
FOR VALUES WITH (modulus 3, remainder 2);
  • LOCATION:
    • Referenced by the ticket_purchase_hist_pkt table
CREATE TABLE location ( 
    location_id BIGINT NOT NULL,
    location_name CHARACTER VARYING(100) NOT NULL,
    CONSTRAINT location_pk PRIMARY KEY (location_id)
);
  • PERSON:
    • Referenced by the ticket_purchase_hist_ukt table
CREATE TABLE person ( 
    id BIGINT NOT NULL,
    name CHARACTER VARYING(100) NOT NULL,
    CONSTRAINT person_pk PRIMARY KEY (id)
);
  • EVENT_HIST:
    • References the TICKET_PURCHASE_HIST_PKT primary key column (sporting_event_ticket_id)
CREATE TABLE event_hist ( 
    event_id BIGINT NOT NULL,
    event_ticket_id BIGINT NOT NULL,
    CONSTRAINT event_hist_pk PRIMARY KEY (event_id),
    CONSTRAINT event_hist_fk FOREIGN KEY (event_ticket_id) REFERENCES ticket_purchase_hist_pkt(sporting_event_ticket_id)
);

Sample data and uniqueness validation

Because the purchased_by_id and transferred_from_id columns are part of the partition key in the TICKET_PURCHASE_HIST_UKT table, PostgreSQL ensures uniqueness of these columns across partitions and validates the sporting_event_ticket_id column using referential integrity between the TICKET_PURCHASE_HIST_PKT and TICKET_PURCHASE_HIST_UKT tables. To demonstrate this, let’s insert some data into the table:

INSERT INTO person 
SELECT i, 'name'||i FROM generate_series(1,10) i; 

INSERT INTO location 
SELECT i, 'location'||i FROM generate_series(1,10) i;
 
INSERT INTO ticket_purchase_hist_pkt
SELECT i,i,i,'2020-01-05'::DATE + i*30 FROM generate_series(1,5) i; 

INSERT INTO ticket_purchase_hist_ukt
SELECT i,i,i,'2020-01-05'::DATE + i*30 FROM generate_series(1,5) i; 
 
SELECT tableoid::regclass, * FROM ticket_purchase_hist_ukt;

Inserting a record with the same purchased_by_id and transferred_from_id values again should result in a SQL error:

INSERT INTO ticket_purchase_hist_ukt VALUES (6,1,1,'2020-07-05'::DATE);

We get the following output:

SQL Error [23505]: ERROR: duplicate key value violates unique constraint "ticket_purchase_hist_ukt_p1_purchased_by_id_transferred_fro_key"
  Detail: Key (purchased_by_id, transferred_from_id)=(1, 1) already exists.

Therefore, restructuring the table to define partitions using Oracle global unique index columns ensures uniqueness across partitions in PostgreSQL.

We recommend this approach if restructuring the base table is an option after evaluating all other dependencies in the database schema and the application-level code changes required.

Caveats

You can resolve the following issues from the previous approach using this restructuring approach:

  • Primary key uniqueness can be maintained across partitions by defining referential integrity, and the primary key can be validated at the TICKET_PURCHASE_HIST_PKT table before data is inserted into the TICKET_PURCHASE_HIST_UKT table.
  • Because the primary key is still part of the base partitioned table, all foreign keys from referencing tables can still reference it. Foreign keys that refer to the unique constraint columns (purchased_by_id and transferred_from_id) can refer to those columns in the TICKET_PURCHASE_HIST_UKT table.

Partition pruning and index considerations

Pruning scenarios work the same as in the previous approach. To achieve the same functionality as before the table split, add an appropriate join to your queries to select data based on your requirements.
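
For example, the following minimal sketch reassembles the original row shape by joining the two tables on the shared key:

SELECT p.sporting_event_ticket_id,
       u.purchased_by_id,
       u.transferred_from_id,
       p.location_id,
       p.purchase_price,
       p.transaction_date
FROM ticket_purchase_hist_pkt p
JOIN ticket_purchase_hist_ukt u USING (sporting_event_ticket_id);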

Table restructuring at source

If you restructure tables in the target PostgreSQL database, all business logic and code conversions must be performed manually. Another possibility is to redesign the tables the same way in the source Oracle database, done by a user who already has business knowledge of the tables and related code, so that AWS SCT can convert the tables and their related code into the PostgreSQL equivalent without much manual conversion effort.

Approach 3: Enforce uniqueness by creating a flat table instead of a partitioned table

In the previous approach, we saw that restructuring the base table resulted in a solution nearly equivalent to Oracle behavior, but it further entailed manual code conversion effort with changes to the underlying business logic at the database schema and application level.

In a case where table data volumes are of manageable size and archival is not the only reason for partitioning, creating a flat (non-partitioned) table might be considered as one possible approach.

This approach is suitable when partitioning is not a strict requirement for your data management and you can work with a single flat table.

Consider using a flat table when you don’t have specific partitioning needs and the primary key columns should enforce uniqueness across all records.

This approach simplifies the data model by avoiding the complexities of partitioning. If partitioning isn’t providing significant benefits for your use case, a flat table can be more straightforward to work with.

Schema table structure in Aurora PostgreSQL

The following figure illustrates the schema table structure.

The schema table structure details are as follows:

  • TICKET_PURCHASE_HIST:
    • A flat heap table (no partitions)
    • Contains constraints (primary key and unique indexes)
    • Contains a foreign key constraint to other tables
CREATE TABLE ticket_purchase_hist ( 
    sporting_event_ticket_id BIGINT NOT NULL,
    purchased_by_id BIGINT NOT NULL,
    transferred_from_id BIGINT NOT NULL,
    location_id BIGINT NOT NULL,
    purchase_price BIGINT NOT NULL,
    transaction_date DATE NOT NULL,
    CONSTRAINT ticket_purchase_hist_pk PRIMARY KEY (sporting_event_ticket_id),
    CONSTRAINT ticket_purchase_hist_uk UNIQUE (purchased_by_id, transferred_from_id),
    CONSTRAINT ticket_purchase_hist_fk1 FOREIGN KEY (location_id) REFERENCES location(location_id), 
    CONSTRAINT ticket_purchase_hist_fk2 FOREIGN KEY (transferred_from_id) REFERENCES person(id) 
);
  • LOCATION:
    • Referenced by the TICKET_PURCHASE_HIST table
CREATE TABLE location ( 
    location_id BIGINT NOT NULL,
    location_name CHARACTER VARYING(100) NOT NULL,
    CONSTRAINT location_pk PRIMARY KEY (location_id)
);
  • PERSON:
    • Referenced by the TICKET_PURCHASE_HIST table
CREATE TABLE person ( 
    id BIGINT NOT NULL,
    name CHARACTER VARYING(100) NOT NULL,
    CONSTRAINT person_pk PRIMARY KEY (id)
);
  • EVENT_HIST:
    • References the TICKET_PURCHASE_HIST table primary key column (sporting_event_ticket_id)
CREATE TABLE event_hist ( 
    event_id BIGINT NOT NULL,
    event_ticket_id BIGINT NOT NULL,
    CONSTRAINT event_hist_pk PRIMARY KEY (event_id),
    CONSTRAINT event_hist_fk FOREIGN KEY (event_ticket_id) REFERENCES ticket_purchase_hist (sporting_event_ticket_id)
);

Caveats

In this section, we discuss key considerations for this approach.

Table size

As of this writing, the maximum table size allowed in PostgreSQL is 32 TB, which is based on the block size; the default block size is 8,192 bytes. The PostgreSQL block size is configurable at build time and can be increased to 32,768 bytes per block, which would give a maximum table size of 128 TB. Note that changing the block size requires an unload and reload of the data.
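
As a quick check, you can track how close a table is to these limits with the built-in size functions (a minimal sketch):

-- Size including indexes and TOAST data
SELECT pg_size_pretty(pg_total_relation_size('ticket_purchase_hist'));

-- Size of the table alone
SELECT pg_size_pretty(pg_relation_size('ticket_purchase_hist'));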

Indexes

The maintenance_work_mem parameter is used for building indexes; setting it to an appropriate value is important for index build performance. Another PostgreSQL feature you can use here is parallel builds of a single index. You can also create multiple indexes on the same table at the same time (using a different database connection to invoke each one) to get some degree of informal parallelization.
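
The following minimal sketch shows both levers for a single build; the parameter values and the index name (ticket_purchase_hist_td_idx) are assumptions to adjust for your instance:

-- Session-level settings for a faster index build (example values)
SET maintenance_work_mem = '1GB';
SET max_parallel_maintenance_workers = 4;
CREATE INDEX ticket_purchase_hist_td_idx
    ON ticket_purchase_hist (transaction_date);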

VACUUM

PostgreSQL uses a VACUUM process to free the space used by old, no-longer-visible tuple versions and to reclaim storage. Rows that are updated or deleted in a table are marked as dead tuples to be cleaned up later by the VACUUM process. Instead of waiting to run a manual VACUUM or VACUUM FULL on a huge table, you can rely on autovacuum, a periodic background daemon that automates the removal of dead tuples. This is important for avoiding table bloat. Two important parameters are autovacuum_vacuum_threshold (default value 50) and autovacuum_vacuum_scale_factor (default value 0.2).

For a small-to-medium-sized table, the defaults work fine. For example, for a 10,000-row table, autovacuum starts after ((10,000 x 0.2) + 50) = 2,050 dead rows, which is reasonable. However, for a table with 1 million rows, autovacuum starts after ((1,000,000 x 0.2) + 50) = 200,050 dead rows, so vacuuming starts much later and takes much longer to run. Therefore, setting these parameters to appropriate values for a large single table helps with better VACUUM management.
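
For example, the following sketch tightens autovacuum for a single large table; the values are assumptions to tune for your workload:

ALTER TABLE ticket_purchase_hist SET (
    autovacuum_vacuum_scale_factor = 0.01, -- vacuum after ~1% dead rows
    autovacuum_vacuum_threshold = 1000
);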

Maintenance

With huge data being stored in databases, performance and scalability are two main factors that can be affected. As the table size increases with data load, more data scanning, swapping pages to memory, and other table operation costs increase. Partitioning is a good solution because it can help divide a large table into smaller tables, thereby reducing table scans and memory swap problems, which ultimately increases performance.

In this approach, to achieve global uniqueness, we moved away from a partitioned table and created a flat table. Therefore, it’s recommended to have a proper approach to handle archiving so the table size doesn’t keep growing with data load. Other options can also be considered; for example, you can set up logical replication on the table to send each modified record to an archival table, or you can schedule a database job to archive the data to another table.
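
The following hypothetical sketch schedules a nightly archival job with the pg_cron extension (available on Amazon RDS for PostgreSQL and Aurora PostgreSQL); the archive table name and retention period are assumptions:

-- Runs daily at 02:00 UTC; requires the pg_cron extension
SELECT cron.schedule('archive_ticket_purchase_hist', '0 2 * * *', $$
    INSERT INTO ticket_purchase_hist_archive
    SELECT * FROM ticket_purchase_hist
    WHERE transaction_date < now() - interval '2 years';
    DELETE FROM ticket_purchase_hist
    WHERE transaction_date < now() - interval '2 years';
$$);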

Conclusion

In this post, we shared options to implement global uniqueness across partitions while migrating Oracle partitioned tables to Amazon RDS for PostgreSQL or Amazon Aurora PostgreSQL-Compatible Edition. Each approach has its advantages, and the recommendation varies based on factors such as the existing schema, data volume, query patterns, and the importance of maintaining partitioning for performance and data management. Based on your requirements and the system’s ability to accommodate a few caveats, you can adopt the approach that best fits your migration needs.

Leave any thoughts or questions in the comments section.


About the Authors

Bhanu Ganesh Gudivada is a Database Consultant on the AWS Professional Services team at AWS who specializes in database migrations. He helps customers build highly available, cost-effective database solutions and migrate their commercial engines to the AWS Cloud. He is curious to learn and implement new technologies around databases and to orchestrate migrations through automation.

Rajkumar Raghuwanshi is a Database Consultant with AWS Professional Services based out of Pune, India. With strong knowledge of relational databases and hands-on experience in homogeneous and heterogeneous database migrations, he helps customers migrate to the AWS Cloud and optimize their workloads.

Rajeshkumar Sabankar is a Database Specialty Architect with Amazon Web Services. He works with internal Amazon customers to build secure, scalable, and resilient architectures in the AWS Cloud and helps them migrate from on-premises databases to Amazon RDS and Amazon Aurora databases.