AWS Database Blog

Introducing multi-source replication on Amazon RDS for MySQL

When Amazon Relational Database Service (Amazon RDS) launched in 2009, MySQL was the first engine supported on the platform. Since that time, customer use cases have grown and evolved. Amazon RDS for MySQL has supported replication to multiple targets for some time—now it also supports multi-source replication.

In this post, we discuss multi-source replication on Amazon RDS for MySQL, its use cases, configuration best practices, and a walkthrough of an example use case.

Introduction to multi-source replication

MySQL multi-source replication is a replication topology that allows a MySQL DB instance to receive and apply binary log events from multiple source DB instances. In this topology, the replica utilizes a distinct channel per source DB instance to receive the binary log events and apply them.

Each channel uses its own I/O thread to receive binary log events from the respective source and then records these events in the channel’s relay logs. The channel then uses its own SQL threads to apply the changes in the channel’s relay logs to the target database. The channel name is appended to the relay log’s file names to uniquely separate them from other channel logs. In addition, we can apply replication filters at the global and channel level (version 8.0 or greater) to replicate a subset of changes based on the application requirements.

The following image illustrates the multi-source replication topology.

You can configure multi-source replication between DB instances in same AWS Region as well as different Regions. The DB instances can be in the same AWS account or in different AWS accounts.

Use cases for multi-source replication

There are multiple use cases for multi-source replication. We will cover some of the more common use cases in the following section.

Merging shards

The primary use case for multi-source replication is data merging. There are multiple scenarios where data merging may be called for. For example, exceptionally large or high-traffic databases might require their data to be sharded across multiple DB instances. Occasionally, if the volume of data or traffic shrinks, it might be desirable to reduce the number of shards in this configuration. In this scenario, you can use multi-source replication to merge multiple shards’ data into one.

Consolidating SaaS tenants

With software as a service (SaaS) applications, tenant management is a primary concern. Larger tenants will often have dedicated DB instances, whereas smaller tenants will share resources on a single DB instance. As business changes over time and these tenants’ data grows and shrinks, the ability to move back and forth between these two models becomes increasingly important. In the case of workloads that previously ran on their own dedicated DB instances, you can use multi-source replication to move those workloads to a consolidated DB instance.

Running data analytics

Businesses often require reports comprised of data from various sources. With multi-source replication, you can create a reporting database that receives live updates from multiple sources, thereby reducing the complexity of running analytics on data stored across multiple DB instances.

Selective backups

Certain application policies require long-term data retention of specific data and maintain the data for customer reporting (based on demand). Retaining a complete snapshot is not always cost-effective, especially when data is spread across multiple DB instances. In cases where even the overall DB instances sizes are huge, the actual data to be retained is limited to certain tables. In these cases, you can use multi-source replication and apply replication filters to replicate the required tables from multiple sources to a single database and maintain the retention on the single database with the required dataset for long term based on your requirements.

Best practices and considerations

When it comes to multi-source replication, there are multiple best practices and considerations regarding configuration and operations. The next section will dive deep into some of those best practices.

Replica engine version

The engine version of your multi-source replica DB instance has to be same or higher than the highest engine version across all source DB instances in the replication topology. For example, if source-1 is running engine version 8.0.34 and source-2 is running on engine version 8.0.35, then it’s recommended to use engine version 8.0.35 or higher for the multi-source replica DB instance. Similarly, source DB instances running 5.7 major version can replicate to an RDS for MySQL DB instance running 8.0 major version.

Replica database instance configuration

When an RDS for MySQL DB instance is configured as a multi-source replica, resource provisioning (CPU, memory, network) is important to ensure the replication is working seamlessly and able to handle the workload from multiple source DB instances.

It’s important to consider the regular workload and potential variations in workload patterns on individual source DB instances. Allocating sufficient CPU and memory to the replica DB instance is crucial to apply the changes from multiple source DB instances and maintain the acceptable replication lag across all replication channels. The amount of data being received by a multi-source replica DB instance varies based on the workload and configurations (binlog_format set to ROW generates more data) of the source DB instances. Therefore, you should ensure that the multi-source replica DB instance class provides the required network throughput, because the replica DB instance will receive data (binary logs) from multiple source DB instances.

It’s recommended to start with the highest DB instance class in use among all of the configured source DB instances as your initial configuration. Furthermore, you should constantly monitor the replication lag, CPU utilization, memory usage, and network receive and transmit throughput to identify any resource contention and modify the DB instance class accordingly. All of these metrics can be monitored using Amazon CloudWatch instance-level metrics for Amazon RDS.

Replica storage configuration

The DB instance configured as a multi-source replica must be allocated with ample storage to accommodate data from multiple source DB instances and handle certain space-intensive operations. You must calculate the combined size of the data from all source DB instances and consider the data growth projections in order to ensure sufficient storage capacity allocated to the DB instance. For example, if you have 2 source DB instances, each with 500 GB of data, you will need more than 1 TB of storage initially. If you anticipate these DB instances to grow to 750 GB each in the next 12 months, you may want to allocate 1.5 TB of storage. Furthermore, additional storage must be added by accounting for temporary storage needs during operations like ALTER TABLE, index creation, and on-disk temporary table usage.

For efficient performance of your multi-source replication DB instance, consider using Provisioned IOPS SSD (io1) storage for replica DB instances. This provides low latency and allows you to change the number of IOPS based on the workload requirements. Initially, consider allocating IOPS equal to the combined IOPS usage on all source DB instances. Closely monitor the replication lag, IOPS utilization, and disk queue depth to identify any resource contention, and modify the allocated IOPS for the DB instance accordingly.

Furthermore, whenever any storage allocation changes are implemented on any of the source DB instances, make sure sufficient storage is allocated to the multi-source replica DB instance as well.

When replication filters are configured, consider the storage space based on the amount of data being replicated to the replica DB instance. The storage allocation factors depend on the amount of initial data being copied and the growth rate of the data. The amount of workload associated with the data needs to be handled appropriately with the respective storage configuration (because IOPS depends on the storage type).

It’s recommended to enable storage auto scaling at least 50% higher than the currently allocated storage value on the replica DB instance to dynamically adjust capacity as needed. This can help avoid storage full issues and prevent replication failures due to insufficient space.

Enable read-only on the replica

It is strongly recommended to enable the read-only parameter to prevent any write operations on the replica. This helps prevent accidental writes on the replica that may lead to replication failures or data inconsistency on the replica DB instance.

Configure replication filters

In circumstances where only a subset of data must be replicated, replication filters provide the necessary mechanism. With multi-source replication, you can configure replication filters either at the global level or the channel level.

Replication filters must be configured appropriately to avoid conflicts in the replication while replicating from multiple sources. In the case where different sources use the same schema naming convention, consider using the replicate_rewrite_db parameter to replicate the data into separate databases accordingly.

Whenever replication filters are modified in a DB parameter group, the replica SQL threads for all affected channels are restarted to apply the changes dynamically. In the case where a global filter is modified, all replication channels are restarted.

Configurations related to global filters are applied first, and then channel-specific filters are applied on the DB instance. Channel-level filters override global filters.

For example, when replicate_ignore_db is configured to a value of db1,`channel_22`:db2, it implies that database db1 will be ignored at the global level and database db2 will be ignored at the channel level (channel_22). In this case, replication will ignore db1 for all channels except channel_22, and only db2 will be ignored for channel_22. When we specify any channel-level replication filters, they override global-level replication filters. The following screenshot illustrates this in more detail.

Enable multi-threaded replication

For workloads involving multiple updates to multiple tables, multi-threaded replication may provide a performance improvement. To enable multi-threaded replication, modify the replica_parallel_workers (or slave_parallel_workers) parameter to a value greater than 1. The multi-threaded replication is at global level and configuration applies to all replication channels. If we configure it to value N, then every channel will be allocated with N number of threads in addition to a coordinator thread to manage them for every channel on the replica DB instance.

Because the replica_parallel_workers (or slave_parallel_workers) parameter is applicable to every channel, it’s important to understand that there’s no single value for this parameter that will be helpful for all replication channels, because the optimal number of workers depends on your specific workload, resources, number of channels configured, and binary log configurations on the respective source DB instances. Continuous monitoring and adjustments are key to maintaining optimal performance in multi-source replication.

When you’re configuring the parameter to work for most of the channels, you must consider several factors.

Firstly, it’s recommended to configure the number of active worker threads to less than or equal to two times the number of CPU cores. For example, let’s assume your multi-source replica is using r5.8xlarge, which has 32 vCPUs, and the replica is replicating data from eight different channels. In this case, configuring replica_parallel_workers (or slave_parallel_workers) to a maximum of 4 can help effectively use the CPU resources on the DB instance when every thread is performing the work on the replica. If the same replica is used to run any read operations, reduce the replica threads to a lower value to provide some resource allocation to the read operations. This way, you can utilize CPU resources effectively without resource contention.

Secondly, the DB instance needs sufficient memory to keep most of the working dataset in memory for quick access. Even with parallel replication enabled, replication works with a limited number of threads compared to the source DB instances. If the replica is not used for read-only operations that require additional connection memory for joins, sorts, and so on, then you can increase the size of the innodb_buffer_pool_size parameter from the default to allocate more memory to the buffer pool to retain a large amount of data in memory to improve performance. Furthermore, you should monitor the memory and swap utilization on the DB instance continuously to avoid resource contention. To achieve better performance, always choose the right DB instance class and configurations to achieve very minimal swap utilization for better performance.

Finally, identify the potential work being run by the respective threads across multiple channels and reconfigure replica_parallel_workers (or slave_parallel_workers) accordingly. In the following example, the DB instance is configured as a multi-source replica and is replicating data from three different source DB instances. It’s configured with replica_parallel_workers set to 8. However, when we look into the number of transactions run by individual worker threads, we observe that on the default channel (”), 99.99% of transactions are run by one worker thread, and on channel_99 and channel_22, most of the transactions are run by two worker threads. In this case, configuring the number of parallel workers to 2 would be helpful to most of the channels and considered an ideal value.

select
a.channel_name,
rw1.thread_id as replica_worer_thread_id,
ts1.count_star as number_of_trx_executed,
round((ts1.count_star / sum_count_star) * 100, 2) as percent_of_trx_executed_by_worker
from
(
select
rw.channel_name,
sum(ts.count_star) as sum_count_star
from
performance_schema.events_transactions_summary_by_thread_by_event_name as ts
join performance_schema.replication_applier_status_by_worker as rw on ts.thread_id = rw.thread_id
group by
rw.channel_name
) as a
join performance_schema.replication_applier_status_by_worker as rw1 on a.channel_name = rw1.channel_name
join performance_schema.events_transactions_summary_by_thread_by_event_name as ts1 on rw1.thread_id = ts1.thread_id;

Configure replica_parallel_workers (or slave_parallel_workers) to the default value (4) and change the values based on resource utilization and number of workers threads that are sufficient to run most of the workload on the maximum number of channels accordingly.

Optimize write throughput

The replica DB instance replays writes coming in from multiple sources, so it’s important to optimize write throughput on the replica. To accomplish this, you can use the RDS Optimized Writes feature. You can improve the performance of write transactions on the replica DB instance by enabling RDS Optimized Writes. When your RDS for MySQL database uses RDS Optimized Writes, it can achieve up to two times higher write transaction throughput, which will be critical for minimizing replica lag. Furthermore, it’s recommended to use the latest 8.0 minor versions (8.0.35 and higher), which can deliver up to three times higher write throughput.

Optimize read throughput

One of the most common use cases for multi-source replication is reporting and analytics. Given the read-heavy nature of these workloads, it is important to take advantage of the RDS Optimized Reads feature which can speed up query processing by up to 2x. In order to take advantage of this feature, simply make sure that you are using an instance type with an instance store, such as db.m5d or db.m6gd. Optimized Reads will be enabled automatically.

Monitor replication

To make sure that the replica is in sync with its multiple sources, you can run the SHOW SLAVE STATUS command to review the replication status for all of the channels at the same time. Alternatively, you can run the SHOW SLAVE STATUS FOR CHANNEL ‘channel_name’ to get the replication status for a single channel.

The replication lag for each channel is also logged to Amazon CloudWatch. You can locate the ReplicationChannelLag metrics by using the instance identifier and the replication channel name. Note that ReplicationChannelLag is specific to DBInstanceIdentifier (multi-source replica DB instance) and ReplicationChannelName. You can view the ReplicationChannelLag metrics for multi-source replication channels in CloudWatch only.

It’s a best practice to regularly monitor replication lag alongside other resources to ensure optimal configuration and performance of the replica DB instance. To find additional best practices around Amazon RDS for MySQL, refer to Best practices for configuring parameters for Amazon RDS for MySQL, part 1: Parameters related to performance and Part 2: Parameters related to replication.

Performance considerations

Consider enabling the replica_compressed_protocol (or slave_compressed_protocol) parameter on the replica to enable compression of the source connection for all replication channels. This is helpful if most of the source DB instances use ROW-based binary logging and transmit a large number of events to the replica DB instance. Note that enabling this parameter will increase CPU utilization, which must be considered while configuring the replica DB instance to avoid resource contention.

Review Best practices for configuring parameters for Amazon RDS for MySQL, part 2: Parameters related to replication for additional parameter configurations to improve replica performance.

Set up multi-source replication in Amazon RDS for MySQL

In this section, we walk through the setup for a multi-source replica. Note that the source database may be either on premises or running in AWS as either an Amazon Elastic Compute Cloud (Amazon EC2) instance or RDS for MySQL DB instance.

To configure binary log replication, complete the following steps:

  1. Configure every source RDS for MySQL DB instance (in this case, we consider two source DB instances) as read-only by attaching a custom parameter group to the DB instance and change the parameter read_only to true.
  2. On the Amazon EC2 console, add the IP address of the source RDS for MySQL DB instance to the VPC security group of the multi-source replica (Amazon RDS for MySQL) DB instance.
    To identify the IP address of the RDS for MySQL DB instance, run dig <RDS Endpoint> and add the output to the security group. If you’re running this command from an EC2 instance that exists in same the VPC as the source RDS for MySQL DB instance, then you’ll receive a private IP address; otherwise, you’ll receive a public IP address. Therefore, run this command from the EC2 instance in the same VPC as the target RDS for MySQL replica (multi-source replica) DB instance. Make sure you RDS security group allows access from your source database IP address.
  3. Using the client of your choice, connect to the source DB (RDS for MySQL) instance and create a user to use for replication and grant the required privileges for replication configurations to the same database user:
    CREATE USER 'repl_user'@'mydomain.com' IDENTIFIED BY 'password';
    GRANT REPLICATION CLIENT, REPLICATION SLAVE ON *.* TO 'repl_user'@'mydomain.com';
  4. Run the SHOW MASTER STATUS command on the source RDS for MySQL DB instance to determine the binary log file and position.
    You will receive output similar to the following:

    File Position
    -----------------------------------
    mysql-bin-changelog.000031      107
    -----------------------------------
  5. Copy the data from the source RDS for MySQL DB instance to the multi-source replica DB instance (running on Amazon RDS for MySQL) using mysqldump:
    mysqldump --databases database_name \
    --single-transaction \
    --compress \
    --order-by-primary \
    -u RDS_user_name \
    -pRDS_password \
    --host=<RDS Endpoint> | mysql \
    --host=<RDS End point> \
    --port=3306 \
    -u RDS_user_name \
    -pRDS_password
  6. Make the source RDS for MySQL DB instances writable again by modifying the custom DB parameter group attached to respective instances and update the read_only DB parameter to OFF.
  7. After you complete these steps for every source DB instance, connect to the RDS for MySQL DB instance being configured as a multi-source replica using the primary user credentials.
  8. Configure the replication from the source RDS for MySQL DB instance by using the mysql.rds_set_external_source_for_channel procedure. Use the binary log file name and position that was determined in step 2. The following example configures replication on channel-1 and channel-2 from two different source DB instances:
    CALL mysql.rds_set_external_source_for_channel ('<RDS for MySQL End Point of source-1>', 3306, 'repl_user', 'password', 'mysql-bin-changelog.000031', 107, 0, ‘channel-1');
    
    CALL mysql.rds_set_external_source_for_channel ('<RDS for MySQL End Point of source-2>, 3306, 'repl_user', 'password', 'mysql-bin-changelog.000031', 107, 0, ‘channel-2’);
  9. On the RDS for MySQL DB instance configured as the replica, run the mysql.rds_start_replication_for_channel procedure to start replication:
    CALL mysql.rds_start_replication_for_channel('channel-1');
    CALL mysql.rds_start_replication_for_channel('channel-2');

Example of replicate_rewrite_db in multi-source replication

Let’s consider a scenario where two source DB instances (source-1 and source-2) have the database employees. When you’re configuring multi-source replication from both source DB instances, it’s important to ensure that there are no conflicts to operations on the database employees. Because the same database exists in both sources, without any replication filters, the changes from both sources will be applied on the same schema on the replica DB instance, which leads to data inconsistencies and replication failures.

To avoid such issues, on the multi-source replica DB instance, you must write the employees data from either of the source DB instances or both of the source DB instances to separate databases. The following example shows that you can use the replica_rewrite_db parameter and apply the channel-level replication filter to replicate changes from source-1 to the employees_22 database and source-2 to the employess_99 database on the replica DB instance.

The following AWS Command Line Interface (AWS CLI) command can help configure the replicate_rewrite_db parameter to rewrite changes from the employees database to the employees_22 database on channel_22 and the employees database to the employees_99 database on channel_99 accordingly:

aws rds modify-db-parameter-group 
--db-parameter-group-name <custom_db_parameter_group_name> 
--parameters "ParameterName=replicate-rewrite-db,ParameterValue='\`channel_22\`:employees->employees_22,\`channel_99\`:employees->employees_99',ApplyMethod=immediate"

Now you can query the replication filter configurations from the performance_schema.replication_applier_filters table:

MySQL [(none)]> show databases;
+--------------------+
| Database |
+--------------------+
| employees_22 | -> Changes from Source-1 employees database
| employees_99 | -> Changes from Source-2 employees database
| information_schema |
| mysql |
| performance schema |
| sbtest |
| sbtest22 |
| sbtest99 |
| sys |
+--------------------+
9 rows in set (0.00 sec)

MySQL [(none)]> select channel_name, filter_name, filter_rule, counter from performance_schema.replication_applier_filters where filter_name='replicate_rewrite_db';
+--------------+----------------------+--------------------------+---------+
| channel_name | filter_name | filter_rule | counter |
+--------------+----------------------+--------------------------+---------+
| channel 22 | REPLICATE_REWRITE_DB | (employees,employees_22) | 0 |
| channel_99 | REPLICATE_REWRITE_DB | (employees,employees_99) | 0 |
+--------------+----------------------+--------------------------+---------+
2 rows in set (0.00 sec)

When you now perform any delete operation on SOURCE-1, the changes are replicated to the employes_22 database only. Also, you can see the number of times the respective filter has been used by monitoring the counter column in the performance_schema.replication_applier_filters table.

The following steps can help you validate the replication configurations specified above.

  1. Delete data from source-1:
    mysql> set @tmp_val:=10001;
    Query OK, 0 row affected (0.00 sec)
    
    mysql> delete from employees where emp_no=@tmp_val;
    Query OK, 1 row affected (0.38 sec)
    
    mysql> select count(1) from employees where emp_no=@tmp_val;
    +----------+
    | count(1) |
    +----------+
    | 0 |
    +----------+
    1 row in set (0.16 sec)
  2. Review the same dataset on the replica DB instance. Because the employees database from source-1 is replicated to the employees_22 database on the replica, you need to review the same database. At the same time, when you check the values of performance_schema.replication_applier_filters.counter, you can see that the value increased to 1 for channel_22:
    MySQL [(none)]> use employees_22
    Database changed
    
    MySQL [employees_22]> set @tmp_val:=10001;
    Query OK, 0 row affected (0.00 sec)
    
    MySQL [employees_22]> select count(1) from employees where emp_no=@tmp_val;
    +----------+
    | count(1) |
    +----------+
    | 0 |
    +----------+
    1 row in set (0.13 sec)
    
    MySQL [employees_22]> select channel_name, filter_name, filter_rule, counter from performance_schema.replication_applier_filters where filter_name='replicate_rewrite_db';
    +--------------+----------------------+--------------------------+---------+
    | channel_name | filter_name | filter_rule | counter |
    +--------------+----------------------+--------------------------+---------+
    | channel 22 | REPLICATE_REWRITE_DB | (employees,employees_22) | 1 |
    | channel_99 | REPLICATE_REWRITE_DB | (employees,employees_99) | 0 |
    +--------------+----------------------+--------------------------+---------+
    2 rows in set (0.00 sec)
  3. Run some operations on the source-2 instance:
    mysql> set @tmp_val:=10001;
    Query OK, 0 row affected (0.00 sec)
    
    mysql> use employees
    Database changed
    
    mysql> select count(1) from salaries where emp_no=@tmp_val;
    +----------+
    | count(1) |
    +----------+
    | 17 |
    +----------+
    1 row in set (1.78 sec)
    
    mysql> delete from salaries where emp_no=@tmp_val;
    Query OK, 17 rows affected (2.83 sec)
    
    mysql> select count(1) from salaries where emp_no=@tmp_val;
    +----------+
    | count(1) |
    +----------+
    | 0 |
    +----------+
    1 row in set (1.74 sec)
  4. Review the same dataset on the replica DB instance. Because the employees database from source-2 is replicated to the employees_99 database on the replica, you need to review the same database. At the same time, when you check the values of performance_schema.replication_applier_filters.counter, you can see that the value increased to 1 for channel_99:
    MySQL [(none)]> use employees_99
    Database changed
    
    MySQL [employees_99]> set @tmp_val:=10001;
    Query OK, 0 row affected (0.00 sec)
    
    MySQL [employees_99]> select count(1) from salaries where emp_no=@tmp_val;
    +----------+
    | count(1) |
    +----------+
    | 0 |
    +----------+
    1 row in set (1.19 sec)
    
    MySQL [employees_99]> select channel_name, filter_name, filter_rule, counter from performance_schema.replication_applier_filters where filter_name='replicate_rewrite_db';
    +--------------+----------------------+--------------------------+---------+
    | channel_name | filter_name | filter_rule | counter |
    +--------------+----------------------+--------------------------+---------+
    | channel 22 | REPLICATE_REWRITE_DB | (employees,employees_22) | 1 |
    | channel_99 | REPLICATE_REWRITE_DB | (employees,employees_99) | 1 |
    +--------------+----------------------+--------------------------+---------+
    2 rows in set (0.00 sec)
  5. When you perform a few more operations on the source-2 instance, the counter related to channel_99 increases accordingly:
    MySQL [(none)]> select channel_name, filter_name, filter_rule, counter from performance_schema.replication_applier_filters where filter_name='replicate_rewrite_db';
    +--------------+----------------------+--------------------------+---------+
    | channel_name | filter_name | filter_rule | counter |
    +--------------+----------------------+--------------------------+---------+
    | channel 22 | REPLICATE_REWRITE_DB | (employees,employees_22) | 13 |
    | channel_99 | REPLICATE_REWRITE_DB | (employees,employees_99) | 1 |
    +--------------+----------------------+--------------------------+---------+
    2 rows in set (0.00 sec)

With this approach, you can configure multi-source replication from multiple source DB instances with the same schema naming convention.

Keep in mind the following:

  • This example is valid for Amazon RDS for MySQL 8.0 version only. Replication filters at the channel level are not supported in Amazon RDS for MySQL 5.7 version.
  • On multi-source replica DB instances, replication filters should be tested thoroughly and applied appropriately to avoid any data conflicts.

Conclusion

In this post, we discussed multi-source replication and how to configure it on Amazon RDS for MySQL. We also discussed different use cases and best practices to follow while using multi-source replication. Multi-source replication is available on Amazon RDS for MySQL 8.0.35 and higher minor versions, and 5.7.44 and higher minor versions. We encourage you to try this new feature to perform additional complex operations with minimal configurations changes to achieve your application requirements.

For more details about this feature, refer to Configuring multi-source-replication for RDS for MySQL.


About the Authors

Chelluru Vidyadhar is a Database Engineer with the Amazon RDS team at Amazon Web Services.

Steve Abraham is a Principal Solutions Architect for Amazon Web Services. He works with our customers to provide guidance and technical assistance on database projects, helping them improve the value of their solutions when using AWS.