How to set up a Binlog Server for Amazon RDS for MySQL and MariaDB using MariaDB MaxScale
One of the key features of Amazon RDS for MySQL and Amazon RDS for MariaDB is the ability to create Read Replicas. You can easily create up to five replicas for a single master database instance via the AWS Management Console or the AWS CLI. Amazon RDS then handles all the work of making a backup of the master, restoring the backup as a replica, and then establishing a replication channel to the master and replica. This fully automated handling of replication in Amazon RDS is called managed replication.
In situations where you want non-standard replication topologies, you can use a Binlog Server—for example, if you want more than five replicas, or you want to forward replication log records to downstream applications. Unlike a replica, a Binlog Server does not apply the log records from the master; rather it provides a caching layer between the master and one or more subscribers. In this post, I discuss the advantages (and a few limitations) of using this approach. I also walk through the process of using MariaDB MaxScale and non-managed replication to set up a Binlog Server for Amazon RDS for MySQL and MariaDB.
Managed replication in Amazon RDS for MySQL and MariaDB
Managed replication in Amazon RDS ensures that replication from the master to the replica starts with the next transaction that is applied to the master after the backup was taken, so that there is no data loss in the setup. Amazon RDS continuously monitors replication and makes adjustments. For example, it properly resumes replication after a master or replica crashes and sends alerts if replication breaks because of a bad transaction (that is, a poison pill on the master).
Non-managed replication in Amazon RDS for MySQL and MariaDB
While managed replication is provided through the console and the Amazon RDS web services API, non-managed replication is provided by a set of stored procedures. These include the following:
- rds_set_external_master – Establishes a custom replication configuration from an Amazon RDS for MySQL or MariaDB instance to a master. This master can be an Amazon RDS instance or possibly an external MySQL-compatible instance (for example, a MySQL instance running on Amazon EC2). You can think of this procedure as a wrapper around the native CHANGE MASTER TO command in MySQL.
- rds_set_external_master_gtid – An Amazon RDS for MariaDB-only procedure that establishes custom replication using global transaction identifier (GTID)-based coordinates instead of file-based coordinates.
- rds_reset_external_master – Removes a custom replication configuration that had previously been set up via mysql.rds_set_external_master. You can think of this procedure as a wrapper around the native RESET SLAVE command in MySQL.
- rds_start_replication – Starts replication after having established a custom replication configured or after having stopped replication. You can think of this procedure as a wrapper around the native START SLAVE command in MySQL.
- rds_stop_replication – Stops replication—either managed or non-managed. You can think of this procedure as a wrapper around the native STOP SLAVE command in MySQL.
Binlog Server overview
Some MySQL and MariaDB users use a proxy server like MariaDB MaxScale to take advantage of features like automatic read/write splitting, schema-based sharding, and change data capture. Another unique feature it offers is the ability to create a Binlog Server that acts as an intermediate binary log caching layer between a master and replica. The master sends its binary logs to the proxy that is running the Binlog Server, and in turn, replicas read their binary logs from the proxy.
This option offers a number of advantages:
- The load in serving binary logs to replicas is lessened on the master because it only needs to communicate with the proxy. The proxy in turn services the replicas.
- The proxy has the exact copy of the binary logs as they were originally on the master. You can save and analyze these logs without affecting the master or the replicas.
- You can easily automate the failover of masters and move replicas to another master because the Binlog Server proxy abstracts the dependency between master and replica.
Jean-François Gagné of Booking.com has written a number of blog posts on the advantages of a Binlog Server and how to set one up in an on-premises environment using MaxScale.
Setting up a Binlog Server in Amazon RDS
You can set up a Binlog Server in Amazon RDS using MariaDB MaxScale and Amazon RDS non-managed replication. Given the certain restrictions in Amazon RDS, there are limitations in the use of a Binlog Server compared to an on-premises environment. Still, it is possible to set up a basic configuration in both Amazon RDS for MySQL and Amazon RDS for MariaDB.
To start, you need to have an RDS instance to use as a master. The setup here assumes that there are no replicas yet and the master is being newly created. First, create an Amazon RDS for MySQL or MariaDB instance using a version that supports non-managed replication. This will be an Amazon RDS for MySQL instance with a major version of 5.6 or later, or any version of Amazon RDS for MariaDB. If possible, don’t create the instance initially with backup retention enabled. The reason is that later, you will want to ensure that the first binary log that is generated by the instance is not automatically purged by Amazon RDS automation.
In my test setup, I created an RDS for MySQL 5.7.21 instance in the US West (Oregon) Region named jaime-binlog-server-master, with the default port of 3306. Also, in case I need to make parameter changes, I created a custom parameter group named jaime-binlog-server-pg for my instance.
Configure binlog retention
After creating an RDS instance, enable extended retention of binary logs on the host by executing the following command using a SQL client:
This command ensures that binary logs are retained for 24 hours on the host after rotation. Otherwise, Amazon RDS automation would purge the log from the host about five minutes after rotation. After you run the command, it’s okay to enable backup retention on the RDS instance by using the AWS CLI or the console. Enabling backup retention also enables binary logging. Because you have the extended retention, the generated binary logs will reside for extra time on the host. You need this to allow the Binlog Server to later download logs starting from the first one generated.
Create a Binlog Server
Create a Binlog Server by first provisioning an Amazon EC2 instance. The instance must be able to access the Amazon RDS master instance that you created earlier. This might mean that you create the EC2 instance in the same virtual private cloud (VPC) as the RDS instance. Or you could authorize a CIDR IP range, which includes the IP address of the EC2 instance, to the security group that is associated with the RDS instance.
Which Amazon Machine Image (AMI) to use for the EC2 instance depends on your preference. However, the installation of the MaxScale software depends on the Linux distribution that is used. In my test setup, I chose Red Hat Enterprise Linux (RHEL) 7.4 using ami-223f945a.
Creating an EC2 instance based on RHEL 7 means that you can easily install MaxScale, along with MariaDB client tools, using the MariaDB Package Repository.
curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash
yum install maxscale
yum install MariaDB-server
Note that the current version of MaxScale as of this writing is version 2.2.5.
Create a MaxScale user
MariaDB MaxScale needs a special user that it can connect to the MySQL instance so that it can execute commands to discover details about other users and artifacts on the instance. This can be the normal Amazon RDS master or a special user just for MaxScale.
In my test setup, I created a maxscale user, associated it with the hostname of my EC2 user, and ran a series of commands to grant the appropriate privileges needed by MaxScale.
Additionally, you need a user with REPLICATION SLAVE privilege because MaxScale will connect to the master as this user and download binary logs. In general, the Amazon RDS master user has this privilege. But just in case, you can run the following command to explicitly grant the privilege to the master user named admin.
MaxScale normally runs off of a configuration file named
/etc/maxscale.cnf. The file defines the kinds of services to run on MaxScale as well as service-specific parameters. A Binlog Server template configuration file is included in the setup. To configure MaxScale to run as a Binlog Server, copy the template file as
Then edit the file to make the following changes:
passwd=F7XfBfJl(or whatever is chosen for the
version_string=5.7.21-log. The version string will depend on the version of Amazon RDS for MySQL or MariaDB. In my case, my RDS instance is based on MySQL 5.7.21, so I replaced
router_optionsstring must include a server ID, master user name, master user password,
- For the server ID, use the same server ID as the RDS instance by querying
SELECT @@server_idand noting the result.
mariadb10-compatibility=0is needed because this is a MySQL instance that doesn’t use the same GTID format as in MariaDB 10.0 or later.
filestem=mysql-bin-changeloglets MaxScale know that binary logs are prefixed as
mysql-bin-changelog, which is a standard naming convention in Amazon RDS. For example, with a server ID of
1961731445and a master password of
router_optionsline looks like the following:
- For the server ID, use the same server ID as the RDS instance by querying
address=master.example.comto the endpoint of your RDS instance. In my case, I have
- In the
[Binlog Listener]section, change
port=3306or whatever port was configured for your RDS for MySQL instance.
Overall, this is what my
/etc/maxscale.cnf file looks like:
By default, binary logs are downloaded to the
/var/lib/maxscale directory on the EC2 host running MaxScale. Another master-specific configuration file needs to be in this directory to tell MaxScale what
CHANGE MASTER TO command it needs to run to establish replication with the RDS instance. This file is named
master.ini, and at a minimum, it contains properties for
master_password. These need to be set to the Amazon RDS endpoint, port, master user name, and master user password respectively.
For example, this is what the
master.ini file looks like in my test setup.
Alternatively, MaxScale can be started without a
master.ini file and the file auto-generated by connecting to the proxy as the maxscale user and issuing a
CHANGE MASTER TO command. This is useful when needing to connect an existing master that has been running for a long enough time that the initial binary log file is no longer on the host. Otherwise a configuration based on an explicit
master.ini assumes it starts from the initial binary log (e.g.,
At this point, it should be safe to start up MaxScale. First, verify that MaxScale can connect to the Amazon RDS instance as the maxscale user via the Amazon RDS endpoint and port. Because the MariaDB client tools should be installed on the EC2 host, run the mysql client to verify connectivity.
For example, this is the command I executed to verify connectivity:
If you can connect, it’s safe to start MaxScale using the following command:
As MaxScale is running, you can also check status with this command:
You can also run the maxadmin utility with the show services command to receive a summary of all services such as the Binlog_Service.
If the Binlog Server is running properly, you will see binary logs accumulate in the
/var/lib/maxscale directory, starting with
mysql-bin-changelog.000001 (assuming an explicit
master.ini file). The first few lines of the service summary from maxadmin will look something like this:
If there was no explicit
master.ini file, then connect to the proxy as the maxscale user and issue a
CHANGE MASTER TO command. For example, suppose that you want to start downloading binary logs from position 4 of the binary log file
mysql-bin-changelog.007076. On the EC2 host running MaxScale, you would connect using the mysql client to host address 127.0.0.1 as the maxscale user that you configured earlier.
Then you would issue this
CHANGE MASTER TO command:
You can add replicas to the Binlog Server by issuing a call to mysql.rds_set_external_master with the appropriate parameters. You can do the initial setup of the replica in a couple of ways:
- Create an Amazon RDS managed replica, wait for replication to do a full sync, and stop replication by calling rds_stop_replication. Note the
Exec_Master_Log_Posvalues from SHOW SLAVE STATUS output. Promote the Read Replica to convert it from a managed Amazon RDS replica to non-managed.
- Create a standard RDS instance, and populate the instance from the master by using a tool like mysqldump or AWS Database Migration Service (AWS DMS). If using mysqldump, you need to note the binary log position on the master when taking a consistent dump of data. Normally, you can do this by specifying
--master-data=2as part of the mysqldump However, this doesn’t work in Amazon RDS for MySQL because of internal restrictions with FLUSH TABLES.
Whatever the case, when you finally have a replica to add, call mysql.rds_set_external_master. Pass in the host address of the EC2 host that is running MaxScale, along with the Amazon RDS master user credentials and the binary log file coordinates from where you want to start replication from.
In my test setup, I started from the first binary log and issued the following command:
Amazon RDS Binlog Server limitations
Although MaxScale supports Amazon RDS, be aware of several limitations compared to when running non-Amazon RDS for MySQL instances, such as the following:
- You can modify the log_slave_updates parameter in Amazon RDS for MySQL and Amazon RDS for MariaDB. The parameter is always enabled in Amazon RDS for MySQL and MariaDB.
- There is no Amazon RDS automation around MaxScale, such as Multi-AZ or purging of binary logs on the Binlog Server. You must handle all of this logic.
- There is currently no support for GTID-based replication. This is not a problem in Amazon RDS for MySQL, which does not support GTID-based replication. For Amazon RDS for MariaDB, the rds_set_external_master stored procedure must be used instead of the mysql.rds_set_external_master_gtid stored procedure.
A Binlog Server can be useful as an intermediate binary log caching layer between a master and one or more replicas. This post discussed advantages and limitations when using this approach and walked you through the process of setting up a Binlog Server using MariaDB MaxScale. It is a great option for gaining more control over the replication process in Amazon RDS for MySQL and MariaDB.
About the Authors
Jaime Lichauco is a database engineer on the RDS team at Amazon Web Services.