AWS Database Blog
PostgreSQL logical replication: How to replicate only the data that you need
Many organizations using PostgreSQL need to replicate data between distinct PostgreSQL logical databases, and want to control exactly what gets copied. A healthcare platform might need to send appointment data to an analytics database without exposing Protected Health Information (PHI). A multi-tenant software as a service (SaaS) application might need separate read replicas containing only a single customer’s data.
Starting with PostgreSQL 15, native logical replication supports both column-level and row-level filtering directly within publications. You can control exactly which columns and rows flow from a PostgreSQL source database to a PostgreSQL target database, without middleware or custom code. Note that this is a PostgreSQL-to-PostgreSQL replication solution, so both the source and target must be PostgreSQL instances.
In this post, we show how logical replication with fine-grained filtering works in PostgreSQL, when to use it, and how to implement it using a realistic healthcare compliance scenario. Whether you’re running Amazon Relational Database Service (Amazon RDS) for PostgreSQL, Amazon Aurora PostgreSQL-Compatible Edition, or a self-managed PostgreSQL database on an Amazon Elastic Compute Cloud (Amazon EC2) instance, the approach is the same.
How logical replication works in PostgreSQL
PostgreSQL logical replication operates on a publish-subscribe model. The source database creates a publication that defines what data to share, and the target database creates a subscription that connects to that publication and receives changes.
The following steps describe how this process works:
- The source database writes all changes to the PostgreSQL Write-Ahead Log (WAL) at the logical level, capturing row-level detail.
- A publication acts as a filter on top of the WAL. It defines which tables, which columns within those tables, and which rows (using WHERE clauses) should be broadcast. When a publication is created, PostgreSQL doesn’t create a separate buffer or copy of the data. Instead, the source’s WAL sender process reads the WAL, applies the publication’s column and row filters in memory, and only then sends the filtered result to the subscriber.
- The target database’s subscription connects to the source, reads the filtered WAL stream, and applies the changes locally.
The key insight is that filtering happens at the source before data ever leaves the instance. The source’s WAL sender process decodes the WAL, applies the publication filters, and sends only the matching data to the subscriber. Sensitive columns or out-of-scope rows don’t travel over the network and never land on the target.
Two filtering modes
- Column-level filtering – you can specify exactly which columns to include in a publication. For example, you might replicate employee_id, name, and department while excluding salary and ssn. You define this with a column list in the CREATE PUBLICATION statement.
- Row-level filtering – you can specify a WHERE clause that determines which rows to replicate. For example, you might replicate only orders from a specific region or only active products.
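As a sketch of both modes (the table, column, and publication names here are illustrative only, not part of the walkthrough later in this post):

```sql
-- Column-level filtering: publish only the listed columns of employees,
-- leaving salary and ssn on the source
CREATE PUBLICATION hr_pub
    FOR TABLE employees (employee_id, name, department);

-- Row-level filtering: publish only rows that match the WHERE clause
CREATE PUBLICATION catalog_pub
    FOR TABLE products WHERE (status = 'active');
```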
You can also combine both in a single publication for fine-grained control. The PostgreSQL docs on publication column lists cover the full syntax.
Enabling logical replication across environments
The SQL syntax for publications and subscriptions is the same across Amazon RDS for PostgreSQL, Amazon Aurora, and self-managed PostgreSQL. The only difference is how you enable logical replication. For Amazon RDS for PostgreSQL and Amazon Aurora PostgreSQL-Compatible Edition, set rds.logical_replication = 1 in your custom parameter group (instance-level for RDS, cluster-level for Aurora) and reboot. For self-managed PostgreSQL, set wal_level = logical in postgresql.conf and restart the server. The step-by-step deployment section later in this post covers each environment in detail.
Use cases
Multi-tenant SaaS data isolation
Row-level filtering enables clean per-tenant data isolation on a shared database. You create a publication per tenant with a WHERE (tenant_id = 'customer_xyz') clause, and each customer’s replica receives only their rows. Adding a new tenant means creating one new publication and one new subscription: a few SQL statements, not an architecture change.
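For instance, onboarding one tenant might look like the following (the table, publication, subscription, and connection details are hypothetical):

```sql
-- On the shared source database: one publication per tenant,
-- filtered to that tenant's rows
CREATE PUBLICATION tenant_xyz_pub
    FOR TABLE orders WHERE (tenant_id = 'customer_xyz');

-- On the tenant's dedicated replica: one subscription to that publication
CREATE SUBSCRIPTION tenant_xyz_sub
    CONNECTION 'host=source.example.com port=5432 dbname=app user=repl_user password=<password>'
    PUBLICATION tenant_xyz_pub;
```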
Keep scalability in mind. Each subscription uses a replication slot on the source, which spawns a dedicated WAL sender process (roughly 4 MB of memory each). PostgreSQL defaults to 10 replication slots (max_replication_slots) and 10 WAL sender processes (max_wal_senders), both configurable at server start. Each slot also retains WAL segments until its subscriber consumes them, so WAL disk usage can grow if a subscriber falls behind. For very large tenant counts (hundreds or more), consider grouping smaller tenants into shared publications to keep resource usage manageable.
Regional data distribution for ecommerce
Row-level filtering lets regional fulfillment centers receive only the order data for their geography. For example, a North America center receives only US, Canadian, and Mexican orders, while Europe and Asia-Pacific each receive their respective regions. Each region typically holds 15-30% of total orders, reducing replication volume and storage costs by 60-80%.
Financial services PCI DSS compliance
Payment processors need to analyze transaction patterns (amounts, timestamps, merchant IDs) without storing full Primary Account Numbers (PANs) or CVV codes. Column-level filtering replicates transaction metadata while excluding restricted cardholder data columns entirely, reducing PCI DSS scope on the target system.
Development and testing environments
Column-level filtering provides a live, continuously updated feed of production data minus sensitive columns. Developers get real data volumes, relationships, and distributions without seeing customer PII, and because it’s live replication rather than a nightly batch job, the data is always current.
Retail active inventory replication
Retail chains with large catalogs (500,000+ SKUs) can use row-level filtering with a WHERE (status = 'active' AND (qty_on_hand > 0 OR qty_on_order > 0)) clause so each store receives only the 50,000-80,000 active SKUs it needs – an 80-90% reduction in replication data.
Healthcare data compliance (HIPAA)
Healthcare organizations handle highly sensitive data: Social Security numbers, diagnosis codes, medication histories, and insurance details. With column-level filtering, you can replicate operational data (appointment dates, doctor IDs, billing amounts, procedure codes) while keeping PHI columns (patient names, SSNs, diagnosis codes, insurance IDs) on the source. The analytics database doesn’t see a single SSN, and because the sensitive data doesn’t leave the source, audit trail requirements are streamlined.
The following walkthrough implements this healthcare use case.
Prerequisites
Before you start, make sure that you have the following in place:
- PostgreSQL 15 or later on both source and target. Column-level filtering and row-level WHERE clause filtering in publications were introduced in PostgreSQL 15.0 and are available in all subsequent minor versions. The source and target don’t need to run the same major version, but both must be PostgreSQL 15 or later to support these features. Check the Amazon RDS for PostgreSQL 15 release notes for the full feature list.
- Network connectivity between source and target instances. The target must be able to reach the source on port 5432 (or your configured PostgreSQL port). For RDS/Aurora, ensure that your security groups allow inbound traffic from the target’s IP or security group. See RDS security group configuration.
- A PostgreSQL user with replication privileges on the source. The user needs the rds_replication role (for RDS/Aurora) or the REPLICATION attribute (for self-managed). This user also needs SELECT privileges on the tables being published.
- Logical replication enabled on the source instance:
  - RDS/Aurora: Set rds.logical_replication = 1 in your custom parameter group and reboot.
  - Self-managed: Set wal_level = logical in postgresql.conf and restart.
- Sufficient WAL retention. Logical replication slots retain WAL segments until all subscribers consume them. Monitor disk usage, especially if a subscriber goes offline for extended periods. For RDS, the max_slot_wal_keep_size parameter can limit WAL retention. See managing replication slots on RDS.
- A PostgreSQL client such as psql to connect to both source and target instances. You can install and run psql from an Amazon EC2 instance in the same VPC as your databases. Alternatively, you can launch AWS CloudShell in VPC mode within the same VPC and install psql there.
Step-by-step deployment: Healthcare compliance example
Let’s implement the HIPAA compliance use case. We will set up a source database with patient appointment data and replicate only the non-PHI columns to an analytics target.
Step 1: Enable logical replication on the source
For Amazon RDS for PostgreSQL:
- Modify your custom DB parameter group to enable logical replication:
- Reboot the DB instance to apply the change:
- Connect to the source instance and verify that logical replication is active:
You should see logical in the output.
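Assuming a custom parameter group named my-pg-params and an instance named source-db (both hypothetical names), the three steps might look like this:

```shell
# 1. Enable logical replication in the custom DB parameter group
#    (rds.logical_replication is static, so it applies at reboot)
aws rds modify-db-parameter-group \
    --db-parameter-group-name my-pg-params \
    --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot"

# 2. Reboot the DB instance to apply the change
aws rds reboot-db-instance --db-instance-identifier source-db

# 3. Connect and verify; this should return "logical"
psql -h <source-endpoint> -U postgres -d <dbname> -c "SHOW wal_level;"
```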
For Amazon Aurora PostgreSQL-Compatible Edition:
- Modify your custom DB cluster parameter group:
- Reboot the writer instance of the cluster:
- Connect to the writer instance and verify:
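Assuming a cluster parameter group named my-aurora-pg-params and a writer instance named aurora-writer (both hypothetical names), the steps might look like this:

```shell
# 1. Enable logical replication in the custom DB cluster parameter group
aws rds modify-db-cluster-parameter-group \
    --db-cluster-parameter-group-name my-aurora-pg-params \
    --parameters "ParameterName=rds.logical_replication,ParameterValue=1,ApplyMethod=pending-reboot"

# 2. Reboot the cluster's writer instance to apply the change
aws rds reboot-db-instance --db-instance-identifier aurora-writer

# 3. Connect to the writer endpoint and verify; this should return "logical"
psql -h <cluster-writer-endpoint> -U postgres -d <dbname> -c "SHOW wal_level;"
```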
For self-managed PostgreSQL:
- Edit postgresql.conf and set:
- Restart the PostgreSQL service:
- Connect and verify:
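On a systemd-based Linux host (adjust the service name and restart command for your platform), the steps might look like this:

```shell
# 1. In postgresql.conf, set the WAL level (a static parameter):
#      wal_level = logical

# 2. Restart the PostgreSQL service to apply the change
sudo systemctl restart postgresql

# 3. Connect and verify; this should return "logical"
psql -U postgres -c "SHOW wal_level;"
```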
Step 2: Create the source tables
Connect to your source database and create the patient appointments table. This table contains both operational data and PHI:
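A schema along the following lines fits this walkthrough. The exact column names and types are our assumption: eight operational columns plus four PHI columns, twelve in total.

```sql
CREATE TABLE patient_appointments (
    -- Operational (non-PHI) columns: these will be replicated
    appointment_id   SERIAL         PRIMARY KEY,
    appointment_date DATE           NOT NULL,
    appointment_time TIME           NOT NULL,
    doctor_id        INTEGER        NOT NULL,
    department       VARCHAR(50)    NOT NULL,
    procedure_code   VARCHAR(10)    NOT NULL,
    billing_amount   NUMERIC(10,2)  NOT NULL,
    status           VARCHAR(20)    NOT NULL DEFAULT 'scheduled',
    -- PHI columns: these stay on the source
    patient_name     VARCHAR(100)   NOT NULL,
    ssn              CHAR(11),
    date_of_birth    DATE,
    diagnosis_code   VARCHAR(10)
);
```

Note that the primary key (appointment_id) is among the operational columns; it must be included in the publication's column list for UPDATE and DELETE replication to work, as discussed in the limitations section.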
A note on data types: the target table’s column data types must match or be compatible with the source. Logical replication transmits data in text format, so the target applies standard PostgreSQL type input functions to parse incoming values. Mismatched types (for example, VARCHAR on the source and INTEGER on the target) will cause replication errors. If you’re replicating between different PostgreSQL major versions, verify that the data type behavior hasn’t changed. The PostgreSQL logical replication documentation on row filtering covers column list and type requirements in detail.
Step 3: Insert sample data
Populate the table with test records spanning multiple departments and doctors:
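For example, eight illustrative rows (all names, SSNs, and codes here are fabricated test data; the schema matches the sketch in Step 2):

```sql
INSERT INTO patient_appointments
    (appointment_date, appointment_time, doctor_id, department,
     procedure_code, billing_amount, status,
     patient_name, ssn, date_of_birth, diagnosis_code)
VALUES
    ('2025-01-06', '09:00', 101, 'Cardiology',    '93000', 250.00, 'completed', 'Test Patient 1', '000-00-0001', '1970-01-01', 'I10'),
    ('2025-01-06', '10:30', 102, 'Orthopedics',   '73560', 410.00, 'completed', 'Test Patient 2', '000-00-0002', '1982-03-15', 'M17.11'),
    ('2025-01-07', '08:45', 101, 'Cardiology',    '93306', 780.00, 'completed', 'Test Patient 3', '000-00-0003', '1958-11-02', 'I25.10'),
    ('2025-01-07', '11:00', 103, 'Dermatology',   '11102', 180.00, 'completed', 'Test Patient 4', '000-00-0004', '1995-06-30', 'L70.0'),
    ('2025-01-08', '09:15', 104, 'Endocrinology', '99213', 150.00, 'completed', 'Test Patient 5', '000-00-0005', '1964-09-12', 'E11.9'),
    ('2025-01-08', '13:30', 102, 'Orthopedics',   '20610', 320.00, 'scheduled', 'Test Patient 6', '000-00-0006', '1978-02-21', 'M25.561'),
    ('2025-01-09', '10:00', 105, 'Pulmonology',   '94010', 220.00, 'scheduled', 'Test Patient 7', '000-00-0007', '1989-12-05', 'J45.909'),
    ('2025-01-09', '15:45', 101, 'Cardiology',    '93000', 250.00, 'scheduled', 'Test Patient 8', '000-00-0008', '1971-04-18', 'I48.91');
```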
Step 4: Create the publication with column filtering
Now create a publication that includes only the non-PHI columns. The patient_name, ssn, date_of_birth, and diagnosis_code columns are excluded by listing only the columns that we want to replicate:
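Assuming the schema sketched in Step 2 and a publication named analytics_pub (our choice of name), the column list names only the eight operational columns:

```sql
-- Only the listed columns are replicated; patient_name, ssn,
-- date_of_birth, and diagnosis_code never leave the source
CREATE PUBLICATION analytics_pub FOR TABLE patient_appointments (
    appointment_id, appointment_date, appointment_time, doctor_id,
    department, procedure_code, billing_amount, status
);
```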
Verify that the publication was created correctly:
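Assuming the publication name analytics_pub from the previous step, you can inspect it like this:

```sql
-- List all publications on the source
SELECT pubname, puballtables FROM pg_publication;

-- Show the published table, its column list, and any row filter
SELECT pubname, schemaname, tablename, attnames, rowfilter
FROM pg_publication_tables
WHERE pubname = 'analytics_pub';
```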
The pg_publication and pg_publication_tables system catalog views are your go-to for inspecting what’s being published.
At this point, the publication is metadata only. It defines what to replicate but doesn’t start moving data. No replication slot exists yet, and no WAL decoding is happening. Data will only begin flowing after a subscriber creates a subscription to this publication in Step 6.
Step 5: Create the target table
Connect to your PostgreSQL target (analytics) database and create a matching table. The table must include at least the columns defined in the publication:
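A matching target table, under the same assumed schema, carries only the eight operational columns. Note that appointment_id is plain INTEGER rather than SERIAL: the ID values arrive from the source, so the target doesn't need its own sequence.

```sql
CREATE TABLE patient_appointments (
    appointment_id   INTEGER        PRIMARY KEY,
    appointment_date DATE           NOT NULL,
    appointment_time TIME           NOT NULL,
    doctor_id        INTEGER        NOT NULL,
    department       VARCHAR(50)    NOT NULL,
    procedure_code   VARCHAR(10)    NOT NULL,
    billing_amount   NUMERIC(10,2)  NOT NULL,
    status           VARCHAR(20)    NOT NULL
);
```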
Notice that this table only has the eight non-PHI columns. You can also create the full 12-column table. The unreplicated columns would remain NULL, but defining only the columns that you need makes the intent clear and prevents accidental data population later.
Step 6: Create the subscription on the target
Replace <source-endpoint>, <dbname>, <replication_user>, and <password> with your actual connection details.
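Using those placeholders, and assuming the publication name analytics_pub and a subscription name of analytics_sub (our choices), the statement looks like this:

```sql
CREATE SUBSCRIPTION analytics_sub
    CONNECTION 'host=<source-endpoint> port=5432 dbname=<dbname> user=<replication_user> password=<password>'
    PUBLICATION analytics_pub;
```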
The subscription will immediately begin an initial table synchronization, copying all existing rows (filtered by the publication’s column list) to the target. After the initial sync, it switches to streaming mode and applies changes in near real-time.
You can monitor subscription status using the pg_stat_subscription view:
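For example, on the target:

```sql
-- One row per subscription worker; received_lsn vs. latest_end_lsn
-- indicates how far the apply process has caught up
SELECT subname, received_lsn, latest_end_lsn, latest_end_time
FROM pg_stat_subscription;
```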
Step 7: Verify the replication
On the target, query the replicated data:
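Assuming the target schema sketched in Step 5:

```sql
SELECT appointment_id, appointment_date, appointment_time, doctor_id,
       department, procedure_code, billing_amount, status
FROM patient_appointments
ORDER BY appointment_id;
```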
All eight rows should be present with the eight operational columns. No patient names, SSNs, dates of birth, or diagnosis codes appear on the target. The analytics team can now run aggregate queries against the operational columns.
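For example, a revenue-by-department rollup (hypothetical, matching the schema sketched earlier):

```sql
SELECT department,
       COUNT(*)            AS appointment_count,
       SUM(billing_amount) AS total_billed
FROM patient_appointments
WHERE status = 'completed'
GROUP BY department
ORDER BY total_billed DESC;
```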
Step 8: Test ongoing replication
Back on the source, insert a new record:
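For example, using the same assumed schema and fabricated test data:

```sql
INSERT INTO patient_appointments
    (appointment_date, appointment_time, doctor_id, department,
     procedure_code, billing_amount, status,
     patient_name, ssn, date_of_birth, diagnosis_code)
VALUES
    ('2025-01-10', '14:00', 103, 'Dermatology', '11102', 180.00, 'scheduled',
     'Test Patient 9', '000-00-0009', '1990-07-22', 'L70.0');
```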
Within seconds, query the target:
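For example, filtering on the date used in the insert above:

```sql
SELECT * FROM patient_appointments
WHERE appointment_date = '2025-01-10';
```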
You will see the new row with only the eight operational columns. The PHI fields (patient_name, ssn, date_of_birth, and diagnosis_code) don’t make it to the target. Ongoing changes (INSERTs, UPDATEs, DELETEs) on the source are continuously replicated with the same column filtering applied.
A note on limitations
Logical replication with fine-grained filtering is powerful, but it’s worth knowing a few constraints before you go all-in:
- Replica identity: For UPDATE and DELETE replication to work correctly, the published table needs a replica identity. The default is the primary key, which works for most cases. If you’re filtering columns and your primary key column isn’t included in the publication, UPDATE and DELETE operations will fail with an error because PostgreSQL can’t identify which row to modify on the target.
- DDL changes aren’t replicated: Schema changes (such as ALTER TABLE) don’t flow through logical replication automatically. You must apply them manually on the target before applying them on the source, or replication will break. See the logical replication restrictions docs for the full list.
- Sequences aren’t replicated: PostgreSQL sequences are auto-incrementing counters commonly used to generate unique IDs for primary key columns (for example, with SERIAL or IDENTITY columns). Logical replication copies the generated ID values as data but doesn’t synchronize the sequence state itself, so the target database’s sequence counters might be out of sync with the actual data. If you promote the target database to become the new primary (for example, during a disaster recovery scenario), you must manually advance the sequences on the target to values higher than the maximum replicated IDs, or new inserts will fail with duplicate key errors.
- Performance overhead: The initial table sync can take significant time for large tables and puts read load on the source. Consider scheduling it during off-peak hours and monitor sync progress using pg_subscription_rel. Beyond the initial sync, ongoing replication also consumes resources on the source: each active subscription runs a dedicated WAL sender process that continuously decodes WAL entries and applies publication filters. This adds CPU overhead proportional to the write volume on published tables. For high-throughput workloads, monitor CPU utilization on the source and consider scaling up if WAL sender processes consume a significant share of available CPU.
Cleanup
When you’re done testing, remove the replication resources to avoid unnecessary costs and WAL accumulation.
On the target, drop the subscription first:
On the source, drop the publication:
Drop the test tables on both source and target:
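Assuming the hypothetical object names used in our sketches (analytics_sub, analytics_pub, and patient_appointments):

```sql
-- On the target: dropping the subscription also removes the
-- replication slot it created on the source
DROP SUBSCRIPTION analytics_sub;

-- On the source:
DROP PUBLICATION analytics_pub;

-- On both source and target:
DROP TABLE patient_appointments;
```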
If you created dedicated RDS or Aurora instances for this walkthrough, delete them through the AWS Management Console or CLI to stop incurring charges. If you modified an existing parameter group, consider reverting rds.logical_replication to 0 if you no longer need logical replication, and reboot the instance.
Conclusion
With PostgreSQL’s native logical replication, you can control exactly what data flows between databases down to specific columns and rows without any external tools or middleware. Column-level filtering keeps sensitive data like PHI and payment card numbers off target systems entirely, while row-level filtering enables clean data isolation for multi-tenant architectures, regional distribution, and time-based partitioning.
The key takeaway: logical replication with fine-grained control shifts data filtering from the application layer into the database engine itself. This means fewer moving parts, less code to maintain, and stronger assurance that restricted data doesn’t leave the source. Whether you’re working toward HIPAA compliance, PCI DSS certification, or trying to give developers production-like data without the security headaches, publications with column lists and WHERE clauses are a powerful tool already built into PostgreSQL 15 and later.