AWS Database Blog
PostgreSQL logical replication: How to replicate only the data that you need
Many organizations using PostgreSQL need to replicate data between distinct PostgreSQL logical databases, and want to control exactly what gets copied. A healthcare platform might need to send appointment data to an analytics database without exposing Protected Health Information (PHI). A multi-tenant software as a service (SaaS) application might need separate read replicas containing only a single customer’s data.
Starting with PostgreSQL 15, native logical replication supports both column-level and row-level filtering directly within publications. You can control exactly which columns and rows flow from a PostgreSQL source database to a PostgreSQL target database, without middleware or custom code. Note that this is a PostgreSQL-to-PostgreSQL replication solution, so both the source and target must be PostgreSQL instances.
In this post, we show how logical replication with fine-grained filtering works in PostgreSQL, when to use it, and how to implement it using a realistic healthcare compliance scenario. Whether you’re running Amazon Relational Database Service (Amazon RDS) for PostgreSQL, Amazon Aurora PostgreSQL-Compatible Edition, or a self-managed PostgreSQL database on an Amazon Elastic Compute Cloud (Amazon EC2) instance, the approach is the same.
How logical replication works in PostgreSQL
PostgreSQL logical replication operates on a publish-subscribe model. The source database creates a publication that defines what data to share, and the target database creates a subscription that connects to that publication and receives changes.
The following steps describe how this process works:
- The source database writes all changes to the PostgreSQL Write-Ahead Log (WAL) at the logical level, capturing row-level detail.
- A publication acts as a filter on top of the WAL. It defines which tables, which columns within those tables, and which rows (using WHERE clauses) should be broadcast. When a publication is created, PostgreSQL doesn’t create a separate buffer or copy of the data. Instead, the source’s WAL sender process reads the WAL, applies the publication’s column and row filters in memory, and only then sends the filtered result to the subscriber.
- The target database’s subscription connects to the source, reads the filtered WAL stream, and applies the changes locally.
The key insight is that filtering happens at the source before data ever leaves the instance. The source’s WAL sender process decodes the WAL, applies the publication filters, and sends only the matching data to the subscriber. Sensitive columns or out-of-scope rows don’t travel over the network and never land on the target.
Two filtering modes
- Column-level filtering – you can specify exactly which columns to include in a publication. For example, you might replicate employee_id, name, and department while excluding salary and ssn. You define this with a column list in the CREATE PUBLICATION statement.
- Row-level filtering – you can specify a WHERE clause that determines which rows to replicate. For example, you might replicate only orders from a specific region or only active products.
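As a sketch of both modes (the table, column, and publication names here are illustrative only, not part of the walkthrough later in this post):

```sql
-- Column-level filtering: publish only the listed columns of employees,
-- leaving salary and ssn on the source
CREATE PUBLICATION hr_pub
    FOR TABLE employees (employee_id, name, department);

-- Row-level filtering: publish only rows that match the WHERE clause
CREATE PUBLICATION catalog_pub
    FOR TABLE products WHERE (status = 'active');
```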
You can also combine both in a single publication for fine-grained control. The PostgreSQL docs on publication column lists cover the full syntax.
Enabling logical replication across environments
The SQL syntax for publications and subscriptions is the same across Amazon RDS for PostgreSQL, Amazon Aurora, and self-managed PostgreSQL. The only difference is how you enable logical replication. For Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition, set rds.logical_replication = 1 in your custom parameter group (instance-level for RDS, cluster-level for Aurora) and reboot. For self-managed PostgreSQL, set wal_level = logical in postgresql.conf and restart the server. The step-by-step deployment section later in this post covers each environment in detail.
Use cases
Multi-tenant SaaS data isolation
Row-level filtering enables clean per-tenant data isolation on a shared database. You create a publication per tenant with a WHERE (tenant_id = 'customer_xyz') clause, and each customer’s replica receives only their rows. Adding a new tenant means creating one new publication and one new subscription: a few SQL statements, not an architecture change.
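For instance, onboarding one tenant might look like the following (the table, publication, subscription, and connection details are hypothetical):

```sql
-- On the shared source database: one publication per tenant,
-- filtered to that tenant's rows
CREATE PUBLICATION tenant_xyz_pub
    FOR TABLE orders WHERE (tenant_id = 'customer_xyz');

-- On the tenant's dedicated replica: one subscription to that publication
CREATE SUBSCRIPTION tenant_xyz_sub
    CONNECTION 'host=source.example.com port=5432 dbname=app user=repl_user password=<password>'
    PUBLICATION tenant_xyz_pub;
```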
Keep scalability in mind. Each subscription uses a replication slot on the source, which spawns a dedicated WAL sender process (roughly 4 MB of memory each). PostgreSQL defaults to 10 replication slots (max_replication_slots) and 10 WAL sender processes (max_wal_senders), both configurable at server start. Each slot also retains WAL segments until its subscriber consumes them, so WAL disk usage can grow if a subscriber falls behind. For very large tenant counts (hundreds or more), consider grouping smaller tenants into shared publications to keep resource usage manageable.
Regional data distribution for ecommerce
Row-level filtering lets regional fulfillment centers receive only the order data for their geography. For example, a North America center receives only US, Canadian, and Mexican orders, while Europe and Asia-Pacific each receive their respective regions. Each region typically holds 15-30% of total orders, reducing replication volume and storage costs by 60-80%.
Financial services PCI DSS compliance
Payment processors need to analyze transaction patterns (amounts, timestamps, merchant IDs) without storing full Primary Account Numbers (PANs) or CVV codes. Column-level filtering replicates transaction metadata while excluding restricted cardholder data columns entirely, reducing PCI DSS scope on the target system.
Development and testing environments
Column-level filtering provides a live, continuously updated feed of production data minus sensitive columns. Developers get real data volumes, relationships, and distributions without seeing customer PII, and because it’s live replication rather than a nightly batch job, the data is always current.
Retail active inventory replication
Retail chains with large catalogs (500,000+ SKUs) can use row-level filtering with a WHERE (status = 'active' AND (qty_on_hand > 0 OR qty_on_order > 0)) clause so each store receives only the 50,000-80,000 active SKUs it needs – an 80-90% reduction in replication data.
Healthcare data compliance (HIPAA)
Healthcare organizations handle highly sensitive data: Social Security numbers, diagnosis codes, medication histories, and insurance details. With column-level filtering, you can replicate operational data (appointment dates, doctor IDs, billing amounts, procedure codes) while keeping PHI columns (patient names, SSNs, diagnosis codes, insurance IDs) on the source. The analytics database doesn’t see a single SSN, and because the sensitive data doesn’t leave the source, audit trail requirements are streamlined.
The following walkthrough implements this healthcare use case.
Prerequisites
Before you start, make sure that you have the following in place:
- PostgreSQL 15 or later on both source and target. Column-level filtering and row-level WHERE clause filtering in publications were introduced in PostgreSQL 15.0 and are available in all subsequent minor versions. The source and target don’t need to run the same major version, but both must be PostgreSQL 15 or later to support these features. Check the Amazon RDS for PostgreSQL 15 release notes for the full feature list.
- Network connectivity between source and target instances. The target must be able to reach the source on port 5432 (or your configured PostgreSQL port). For RDS/Aurora, ensure that your security groups allow inbound traffic from the target’s IP or security group. See RDS security group configuration.
- A PostgreSQL user with replication privileges on the source. The user needs the rds_replication role (for RDS/Aurora) or the REPLICATION attribute (for self-managed). This user also needs SELECT privileges on the tables being published.
- Logical replication enabled on the source instance:
  - RDS/Aurora: Set rds.logical_replication = 1 in your custom parameter group and reboot.
  - Self-managed: Set wal_level = logical in postgresql.conf and restart.
- Sufficient WAL retention. Logical replication slots retain WAL segments until all subscribers consume them. Monitor disk usage, especially if a subscriber goes offline for extended periods. For RDS, the max_slot_wal_keep_size parameter can limit WAL retention. See managing replication slots on RDS.
- A PostgreSQL client such as psql to connect to both source and target instances. You can install and run psql from an Amazon EC2 instance in the same VPC as your databases. Alternatively, you can launch AWS CloudShell in VPC mode within the same VPC and install psql there.
Step-by-step deployment: Healthcare compliance example
Let’s implement the HIPAA compliance use case. We will set up a source database with patient appointment data and replicate only the non-PHI columns to an analytics target.
Step 1: Enable logical replication on the source
For Amazon RDS for PostgreSQL:
- Modify your custom DB parameter group to enable logical replication:
- Reboot the DB instance to apply the change:
- Connect to the source instance and verify that logical replication is active:
You should see logical in the output.
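Assuming a custom parameter group named my-pg-params and an instance named source-db (both hypothetical names), the three steps might look like this:

```shell
# 1. Enable logical replication in the custom DB parameter group
#    (rds.logical_replication is static, so it applies at reboot)
aws rds modify-db-parameter-group \
    --db-parameter-group-name my-pg-params \
    --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot"

# 2. Reboot the DB instance to apply the change
aws rds reboot-db-instance --db-instance-identifier source-db

# 3. Connect and verify; this should return "logical"
psql -h <source-endpoint> -U postgres -d <dbname> -c "SHOW wal_level;"
```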
For Amazon Aurora PostgreSQL-Compatible Edition:
- Modify your custom DB cluster parameter group:
- Reboot the writer instance of the cluster:
- Connect to the writer instance and verify:
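Assuming a cluster parameter group named my-aurora-pg-params and a writer instance named aurora-writer (both hypothetical names), the steps might look like this:

```shell
# 1. Enable logical replication in the custom DB cluster parameter group
aws rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name my-aurora-pg-params \
    --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot"

# 2. Reboot the cluster's writer instance to apply the change
aws rds reboot-db-instance --db-instance-identifier aurora-writer

# 3. Connect to the writer endpoint and verify; this should return "logical"
psql -h <cluster-writer-endpoint> -U postgres -d <dbname> -c "SHOW wal_level;"
```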
For self-managed PostgreSQL:
- Edit postgresql.conf and set:
- Restart the PostgreSQL service:
- Connect and verify:
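On a systemd-based Linux host (adjust the service name and restart command for your platform), the steps might look like this:

```shell
# 1. In postgresql.conf, set the WAL level (a static parameter):
#      wal_level = logical

# 2. Restart the PostgreSQL service to apply the change
sudo systemctl restart postgresql

# 3. Connect and verify; this should return "logical"
psql -U postgres -c "SHOW wal_level;"
```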
Step 2: Create the source tables
Connect to your source database and create the patient appointments table. This table contains both operational data and PHI:
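A schema along the following lines fits this walkthrough. The exact column names and types are our assumption: eight operational columns plus four PHI columns, twelve in total.

```sql
CREATE TABLE patient_appointments (
    -- Operational (non-PHI) columns: these will be replicated
    appointment_id   SERIAL         PRIMARY KEY,
    appointment_date DATE           NOT NULL,
    appointment_time TIME           NOT NULL,
    doctor_id        INTEGER        NOT NULL,
    department       VARCHAR(50)    NOT NULL,
    procedure_code   VARCHAR(10)    NOT NULL,
    billing_amount   NUMERIC(10,2)  NOT NULL,
    status           VARCHAR(20)    NOT NULL DEFAULT 'scheduled',
    -- PHI columns: these stay on the source
    patient_name     VARCHAR(100)   NOT NULL,
    ssn              CHAR(11),
    date_of_birth    DATE,
    diagnosis_code   VARCHAR(10)
);
```

Note that the primary key (appointment_id) is among the operational columns; it must be included in the publication's column list for UPDATE and DELETE replication to work, as discussed in the limitations section.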
A note on data types: the target table’s column data types must match or be compatible with the source. Logical replication transmits data in text format, so the target applies standard PostgreSQL type input functions to parse incoming values. Mismatched types (for example, VARCHAR on the source and INTEGER on the target) will cause replication errors. If you’re replicating between different PostgreSQL major versions, verify that the data type behavior hasn’t changed. The PostgreSQL logical replication documentation on row filtering covers column list and type requirements in detail.
Step 3: Insert sample data
Populate the table with test records spanning multiple departments and doctors:
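For example, eight illustrative rows (all names, SSNs, and codes here are fabricated test data; the schema matches the sketch in Step 2):

```sql
INSERT INTO patient_appointments
    (appointment_date, appointment_time, doctor_id, department,
     procedure_code, billing_amount, status,
     patient_name, ssn, date_of_birth, diagnosis_code)
VALUES
    ('2025-01-06', '09:00', 101, 'Cardiology',    '93000', 250.00, 'completed', 'Test Patient 1', '000-00-0001', '1970-01-01', 'I10'),
    ('2025-01-06', '10:30', 102, 'Orthopedics',   '73560', 410.00, 'completed', 'Test Patient 2', '000-00-0002', '1982-03-15', 'M17.11'),
    ('2025-01-07', '08:45', 101, 'Cardiology',    '93306', 780.00, 'completed', 'Test Patient 3', '000-00-0003', '1958-11-02', 'I25.10'),
    ('2025-01-07', '11:00', 103, 'Dermatology',   '11102', 180.00, 'completed', 'Test Patient 4', '000-00-0004', '1995-06-30', 'L70.0'),
    ('2025-01-08', '09:15', 104, 'Endocrinology', '99213', 150.00, 'completed', 'Test Patient 5', '000-00-0005', '1964-09-12', 'E11.9'),
    ('2025-01-08', '13:30', 102, 'Orthopedics',   '20610', 320.00, 'scheduled', 'Test Patient 6', '000-00-0006', '1978-02-21', 'M25.561'),
    ('2025-01-09', '10:00', 105, 'Pulmonology',   '94010', 220.00, 'scheduled', 'Test Patient 7', '000-00-0007', '1989-12-05', 'J45.909'),
    ('2025-01-09', '15:45', 101, 'Cardiology',    '93000', 250.00, 'scheduled', 'Test Patient 8', '000-00-0008', '1971-04-18', 'I48.91');
```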
Step 4: Create the publication with column filtering
Now create a publication that includes only the non-PHI columns. The patient_name, ssn, date_of_birth, and diagnosis_code columns are excluded by listing only the columns that we want to replicate:
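Assuming the schema sketched in Step 2 and a publication named analytics_pub (our choice of name), the column list names only the eight operational columns:

```sql
-- Only the listed columns are replicated; patient_name, ssn,
-- date_of_birth, and diagnosis_code never leave the source
CREATE PUBLICATION analytics_pub FOR TABLE patient_appointments (
    appointment_id, appointment_date, appointment_time, doctor_id,
    department, procedure_code, billing_amount, status
);
```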
Verify that the publication was created correctly:
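Assuming the publication name analytics_pub from the previous step, you can inspect it like this:

```sql
-- List all publications on the source
SELECT pubname, puballtables FROM pg_publication;

-- Show the published table, its column list, and any row filter
SELECT pubname, schemaname, tablename, attnames, rowfilter
FROM pg_publication_tables
WHERE pubname = 'analytics_pub';
```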
The pg_publication and pg_publication_tables system catalog views are your go-to for inspecting what’s being published.
At this point, the publication is metadata only. It defines what to replicate but doesn’t start moving data. No replication slot exists yet, and no WAL decoding is happening. Data will only begin flowing after a subscriber creates a subscription to this publication in Step 6.
Step 5: Create the target table
Connect to your PostgreSQL target (analytics) database and create a matching table. The table must include at least the columns defined in the publication:
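A matching target table, under the same assumed schema, carries only the eight operational columns. Note that appointment_id is plain INTEGER rather than SERIAL: the ID values arrive from the source, so the target doesn't need its own sequence.

```sql
CREATE TABLE patient_appointments (
    appointment_id   INTEGER        PRIMARY KEY,
    appointment_date DATE           NOT NULL,
    appointment_time TIME           NOT NULL,
    doctor_id        INTEGER        NOT NULL,
    department       VARCHAR(50)    NOT NULL,
    procedure_code   VARCHAR(10)    NOT NULL,
    billing_amount   NUMERIC(10,2)  NOT NULL,
    status           VARCHAR(20)    NOT NULL
);
```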
Notice that this table only has the eight non-PHI columns. You can also create the full 12-column table. The unreplicated columns would remain NULL, but defining only the columns that you need makes the intent clear and prevents accidental data population later.
Step 6: Create the subscription on the target
Replace <source-endpoint>, <dbname>, <replication_user>, and <password> with your actual connection details.
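Using those placeholders, and assuming the publication name analytics_pub and a subscription name of analytics_sub (our choices), the statement looks like this:

```sql
CREATE SUBSCRIPTION analytics_sub
    CONNECTION 'host=<source-endpoint> port=5432 dbname=<dbname> user=<replication_user> password=<password>'
    PUBLICATION analytics_pub;
```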
The subscription will immediately begin an initial table synchronization, copying all existing rows (filtered by the publication’s column list) to the target. After the initial sync, it switches to streaming mode and applies changes in near real-time.
You can monitor subscription status using the pg_stat_subscription view:
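For example, on the target:

```sql
-- One row per subscription worker; received_lsn vs. latest_end_lsn
-- indicates how far the apply process has caught up
SELECT subname, received_lsn, latest_end_lsn, latest_end_time
FROM pg_stat_subscription;
```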
Step 7: Verify the replication
On the target, query the replicated data:
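Assuming the target schema sketched in Step 5:

```sql
SELECT appointment_id, appointment_date, appointment_time, doctor_id,
       department, procedure_code, billing_amount, status
FROM patient_appointments
ORDER BY appointment_id;
```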
All eight rows should be present with the eight operational columns. No patient names, SSNs, dates of birth, or diagnosis codes appear on the target. The analytics team can now run aggregate queries against the operational columns.
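For example, a revenue-by-department rollup (hypothetical, matching the schema sketched earlier):

```sql
SELECT department,
       COUNT(*)            AS appointment_count,
       SUM(billing_amount) AS total_billed
FROM patient_appointments
WHERE status = 'completed'
GROUP BY department
ORDER BY total_billed DESC;
```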
Step 8: Test ongoing replication
Back on the source, insert a new record:
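For example, using the same assumed schema and fabricated test data:

```sql
INSERT INTO patient_appointments
    (appointment_date, appointment_time, doctor_id, department,
     procedure_code, billing_amount, status,
     patient_name, ssn, date_of_birth, diagnosis_code)
VALUES
    ('2025-01-10', '14:00', 103, 'Dermatology', '11102', 180.00, 'scheduled',
     'Test Patient 9', '000-00-0009', '1990-07-22', 'L70.0');
```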
Within seconds, query the target:
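For example, filtering on the date used in the insert above:

```sql
SELECT * FROM patient_appointments
WHERE appointment_date = '2025-01-10';
```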
You will see the new row with only the eight operational columns. The PHI fields (patient_name, ssn, date_of_birth, and diagnosis_code) don’t make it to the target. Ongoing changes (INSERTs, UPDATEs, DELETEs) on the source are continuously replicated with the same column filtering applied.
A note on limitations
Logical replication with fine-grained filtering is powerful, but it’s worth knowing a few constraints before you go all-in:
- Replica identity: For UPDATE and DELETE replication to work correctly, the published table needs a replica identity. The default is the primary key, which works for most cases. If you’re filtering columns and your primary key column isn’t included in the publication, UPDATE and DELETE operations will fail with an error because PostgreSQL can’t identify which row to modify on the target.
- DDL changes aren’t replicated: Schema changes (such as ALTER TABLE) don’t flow through logical replication automatically. You must apply them manually on the target before applying them on the source, or replication will break. See the logical replication restrictions docs for the full list.
- Sequences aren’t replicated: PostgreSQL sequences are auto-incrementing counters commonly used to generate unique IDs for primary key columns (for example, with SERIAL or IDENTITY columns). Logical replication copies the generated ID values as data but doesn’t synchronize the sequence state itself, so the target database’s sequence counters might be out of sync with the actual data. If you promote the target database to become the new primary (for example, during a disaster recovery scenario), you must manually advance the sequences on the target to values higher than the maximum replicated IDs, or new inserts will fail with duplicate key errors.
- Performance overhead: The initial table sync can take significant time for large tables and puts read load on the source. Consider scheduling it during off-peak hours and monitor sync progress using pg_subscription_rel. Beyond the initial sync, ongoing replication also consumes resources on the source: each active subscription runs a dedicated WAL sender process that continuously decodes WAL entries and applies publication filters. This adds CPU overhead proportional to the write volume on published tables. For high-throughput workloads, monitor CPU utilization on the source and consider scaling up if WAL sender processes consume a significant share of available CPU.
Cleanup
When you’re done testing, remove the replication resources to avoid unnecessary costs and WAL accumulation.
On the target, drop the subscription first:
On the source, drop the publication:
Drop the test tables on both source and target:
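Assuming the hypothetical object names used in our sketches (analytics_sub, analytics_pub, and patient_appointments):

```sql
-- On the target: dropping the subscription also removes the
-- replication slot it created on the source
DROP SUBSCRIPTION analytics_sub;

-- On the source:
DROP PUBLICATION analytics_pub;

-- On both source and target:
DROP TABLE patient_appointments;
```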
If you created dedicated RDS or Aurora instances for this walkthrough, delete them through the AWS Management Console or CLI to stop incurring charges. If you modified an existing parameter group, consider reverting rds.logical_replication to 0 if you no longer need logical replication, and reboot the instance.
Conclusion
With PostgreSQL’s native logical replication, you can control exactly what data flows between databases down to specific columns and rows without any external tools or middleware. Column-level filtering keeps sensitive data like PHI and payment card numbers off target systems entirely, while row-level filtering enables clean data isolation for multi-tenant architectures, regional distribution, and time-based partitioning.
The key takeaway: logical replication with fine-grained control shifts data filtering from the application layer into the database engine itself. This means fewer moving parts, less code to maintain, and stronger assurance that restricted data doesn’t leave the source. Whether you’re working toward HIPAA compliance, PCI DSS certification, or trying to give developers production-like data without the security headaches, publications with column lists and WHERE clauses are a powerful tool already built into PostgreSQL 15 and later.