AWS Database Blog

How to set up a Binlog Server for Amazon RDS for MySQL and MariaDB using MariaDB MaxScale

One of the key features of Amazon RDS for MySQL and Amazon RDS for MariaDB is the ability to create Read Replicas. You can easily create up to five replicas for a single master database instance via the AWS Management Console or the AWS CLI. Amazon RDS then handles all the work of making a backup of the master, restoring the backup as a replica, and then establishing a replication channel to the master and replica. This fully automated handling of replication in Amazon RDS is called managed replication.

In situations where you want non-standard replication topologies, you can use a Binlog Server—for example, if you want more than five replicas, or you want to forward replication log records to downstream applications. Unlike a replica, a Binlog Server does not apply the log records from the master; rather it provides a caching layer between the master and one or more subscribers. In this post, I discuss the advantages (and a few limitations) of using this approach. I also walk through the process of using MariaDB MaxScale and non-managed replication to set up a Binlog Server for Amazon RDS for MySQL and MariaDB.

Managed replication in Amazon RDS for MySQL and MariaDB

Managed replication in Amazon RDS ensures that replication from the master to the replica starts with the next transaction that is applied to the master after the backup was taken, so that there is no data loss in the setup. Amazon RDS continuously monitors replication and makes adjustments. For example, it properly resumes replication after a master or replica crashes and sends alerts if replication breaks because of a bad transaction (that is, a poison pill on the master).

Non-managed replication in Amazon RDS for MySQL and MariaDB

While managed replication is provided through the console and the Amazon RDS web services API, non-managed replication is provided by a set of stored procedures. These include the following:

  • rds_set_external_master – Establishes a custom replication configuration from an Amazon RDS for MySQL or MariaDB instance to a master. This master can be an Amazon RDS instance or possibly an external MySQL-compatible instance (for example, a MySQL instance running on Amazon EC2). You can think of this procedure as a wrapper around the native CHANGE MASTER TO command in MySQL.
  • rds_set_external_master_gtid – An Amazon RDS for MariaDB-only procedure that establishes custom replication using global transaction identifier (GTID)-based coordinates instead of file-based coordinates.
  • rds_reset_external_master – Removes a custom replication configuration that had previously been set up via mysql.rds_set_external_master. You can think of this procedure as a wrapper around the native RESET SLAVE command in MySQL.
  • rds_start_replication – Starts replication after having established a custom replication configured or after having stopped replication. You can think of this procedure as a wrapper around the native START SLAVE command in MySQL.
  • rds_stop_replication – Stops replication—either managed or non-managed. You can think of this procedure as a wrapper around the native STOP SLAVE command in MySQL.

Binlog Server overview

Some MySQL and MariaDB users use a proxy server like MariaDB MaxScale to take advantage of features like automatic read/write splitting, schema-based sharding, and change data capture. Another unique feature it offers is the ability to create a Binlog Server that acts as an intermediate binary log caching layer between a master and replica. The master sends its binary logs to the proxy that is running the Binlog Server, and in turn, replicas read their binary logs from the proxy.

This option offers a number of advantages:

  • The load in serving binary logs to replicas is lessened on the master because it only needs to communicate with the proxy. The proxy in turn services the replicas.
  • The proxy has the exact copy of the binary logs as they were originally on the master. You can save and analyze these logs without affecting the master or the replicas.
  • You can easily automate the failover of masters and move replicas to another master because the Binlog Server proxy abstracts the dependency between master and replica.

Jean-François Gagné of Booking.com has written a number of blog posts on the advantages of a Binlog Server and how to set one up in an on-premises environment using MaxScale.

Setting up a Binlog Server in Amazon RDS

You can set up a Binlog Server in Amazon RDS using MariaDB MaxScale and Amazon RDS non-managed replication. Given the certain restrictions in Amazon RDS, there are limitations in the use of a Binlog Server compared to an on-premises environment. Still, it is possible to set up a basic configuration in both Amazon RDS for MySQL and Amazon RDS for MariaDB.

To start, you need to have an RDS instance to use as a master. The setup here assumes that there are no replicas yet and the master is being newly created. First, create an Amazon RDS for MySQL or MariaDB instance using a version that supports non-managed replication. This will be an Amazon RDS for MySQL instance with a major version of 5.6 or later, or any version of Amazon RDS for MariaDB. If possible, don’t create the instance initially with backup retention enabled. The reason is that later, you will want to ensure that the first binary log that is generated by the instance is not automatically purged by Amazon RDS automation.

In my test setup, I created an RDS for MySQL 5.7.21 instance in the US West (Oregon) Region named jaime-binlog-server-master, with the default port of 3306. Also, in case I need to make parameter changes, I created a custom parameter group named jaime-binlog-server-pg for my instance.

Configure binlog retention

After creating an RDS instance, enable extended retention of binary logs on the host by executing the following command using a SQL client:

call mysql.rds_set_configuration('binlog retention hours', 24);

This command ensures that binary logs are retained for 24 hours on the host after rotation. Otherwise, Amazon RDS automation would purge the log from the host about five minutes after rotation. After you run the command, it’s okay to enable backup retention on the RDS instance by using the AWS CLI or the console. Enabling backup retention also enables binary logging. Because you have the extended retention, the generated binary logs will reside for extra time on the host. You need this to allow the Binlog Server to later download logs starting from the first one generated.

Create a Binlog Server

Create a Binlog Server by first provisioning an Amazon EC2 instance. The instance must be able to access the Amazon RDS master instance that you created earlier. This might mean that you create the EC2 instance in the same virtual private cloud (VPC) as the RDS instance. Or you could authorize a CIDR IP range, which includes the IP address of the EC2 instance, to the security group that is associated with the RDS instance.

Which Amazon Machine Image (AMI) to use for the EC2 instance depends on your preference. However, the installation of the MaxScale software depends on the Linux distribution that is used. In my test setup, I chose Red Hat Enterprise Linux (RHEL) 7.4 using ami-223f945a.Screenshot to Choose an Amazon Machine Image (AMI)

Creating an EC2 instance based on RHEL 7 means that you can easily install MaxScale, along with MariaDB client tools, using the MariaDB Package Repository.

  • curl -sS https://downloads.mariadb.com/MariaDB/mariadb_repo_setup | sudo bash
  • yum install maxscale
  • yum install MariaDB-server

Note that the current version of MaxScale as of this writing is version 2.2.5.

Create a MaxScale user

MariaDB MaxScale needs a special user that it can connect to the MySQL instance so that it can execute commands to discover details about other users and artifacts on the instance. This can be the normal Amazon RDS master or a special user just for MaxScale.

In my test setup, I created a maxscale user, associated it with the hostname of my EC2 user, and ran a series of commands to grant the appropriate privileges needed by MaxScale.

CREATE USER 'maxscale'@'ec2-23-131-67-19.us-west-2.compute.amazonaws.com' IDENTIFIED BY 'F7XfBfJl';
GRANT SELECT ON mysql.user TO 'maxscale'@'ec2-23-131-67-19.us-west-2.compute.amazonaws.com';
GRANT SELECT ON mysql.db TO 'maxscale'@'ec2-23-131-67-19.us-west-2.compute.amazonaws.com';
GRANT SELECT ON mysql.tables_priv TO 'maxscale'@'ec2-23-131-67-19.us-west-2.compute.amazonaws.com';
GRANT SHOW DATABASES ON *.* TO 'maxscale'@'ec2-23-131-67-19.us-west-2.compute.amazonaws.com';

Additionally, you need a user with REPLICATION SLAVE privilege because MaxScale will connect to the master as this user and download binary logs. In general, the Amazon RDS master user has this privilege. But just in case, you can run the following command to explicitly grant the privilege to the master user named admin.

GRANT REPLICATION SLAVE ON *.* TO 'admin'@'%';

Configure maxscale.cnf

MaxScale normally runs off of a configuration file named /etc/maxscale.cnf. The file defines the kinds of services to run on MaxScale as well as service-specific parameters. A Binlog Server template configuration file is included in the setup. To configure MaxScale to run as a Binlog Server, copy the template file as /etc/maxscale.cnf.

sudo cp /usr/share/maxscale/maxscale_binlogserver_template.cnf /etc/maxscale.cnf

Then edit the file to make the following changes:

  1. Change user=repl to user=maxscale and passwd=slavepass to passwd=F7XfBfJl (or whatever is chosen for the maxscale user password).
  2. Change version_string=5.6.15-log to version_string=5.7.21-log. The version string will depend on the version of Amazon RDS for MySQL or MariaDB. In my case, my RDS instance is based on MySQL 5.7.21, so I replaced 6.15 with 5.7.21.
  3. The router_options string must include a server ID, master user name, master user password, mariadb10-compatibility=0, and filestem=mysql-bin-changelog.
    • For the server ID, use the same server ID as the RDS instance by querying SELECT @@server_id and noting the result.
    • The mariadb10-compatibility=0 is needed because this is a MySQL instance that doesn’t use the same GTID format as in MariaDB 10.0 or later.
    • The filestem=mysql-bin-changelog lets MaxScale know that binary logs are prefixed as mysql-bin-changelog, which is a standard naming convention in Amazon RDS. For example, with a server ID of 1961731445 and a master password of qqnalh9u, my router_options line looks like the following:
      router_options=server-id=1961731445,user=admin,password=qqnalh9u,mariadb10-compatibility=0,filestem=mysql-bin-changelog
  4. Change address=master.example.com to the endpoint of your RDS instance. In my case, I have address=jaime-binlog-server-master.wzyowmbtiken.us-west-2.rds.amazonaws.com
  5. In the [Binlog Listener] section, change port=5306 to port=3306 or whatever port was configured for your RDS for MySQL instance.

Overall, this is what my /etc/maxscale.cnf file looks like:

#
# Example maxscale.cnf for the Binlog Server.
#
#

#######################################################################
# MaxScale Global configuration.
#
# Valid options are:
#       threads=<number of threads>
[maxscale]
threads=6

# Other parameters.
#log_messages=1
#log_trace=1
#log_debug=1
#non_blocking_polls
#poll_sleep


#######################################################################
# A series of service definitions
#
# Valid options are:
#       type=service
#       router=<name of router module>
#       servers=<server name>,<server name>,...
#       user=<User to fetch password information with>
#       passwd=<Password of the user, plain text currently>
#       version_string=<specific string for server handshake,
#               default is the MariaDB embedded library version>
#       router_options=<option[=value]>,<option[=value]>,...
#

#######################################################################
# The MaxScale Binlog Server Service.
#
# The name of this service will be used as the directory name
#   in the cache directory where the binlogs will be saved.
# If this name is changed, it must be changed in the listener
#   configuration below.
[Binlog_Service]

# type must be service
# router must be binlogrouter
type=service
router=binlogrouter

# servers should include a single name corresponding to the master
#    where the Binlog Server will download its binlogs from.
servers=master

# user, password and version: see generic definition.
# Note: user should have the following grants:
#       SELECT ON mysql.user
#       SELECT ON mysql.db
#       SHOW DATABASES ON *.*
user=maxscale
passwd=F7XfBfJl
version_string=5.7.21-log

# The router_options set parameters to the binlogrouter:
#    server-id=
#       The server-id that MaxScale uses when it connects
#       to the real master server. Again it will report
#       the master's server-id to the slaves that connect
#       to it.
#    user=
#       The user that MaxScale uses to log in to the real master.
#       Note: user should have "REPLICATION SLAVE" grant.
#    password=
#       The password that MaxScale uses to log in to the real master.
#    filestem=
#       The prefix of the binlogs downloaded from master.
router_options=server-id=1961731445,user=admin,password=qqnalh9u,mariadb10-compatibility=0,filestem=mysql-bin-changelog

######################################################################
# Configuration of the master from which binlogs are downloaded.
#
[master]
type=server
address=jaime-binlog-server-master.wzyowmbtiken.us-west-2.rds.amazonaws.com
port=3306
protocol=MySQLBackend


######################################################################
# Configuration of the listening service of the Binlog Server.
#
[Binlog Listener]
type=listener
service=Binlog_Service
protocol=MySQLClient
port=3306


######################################################################
# Debug Service and Listener.
#
[Debug Service]
type=service
router=debugcli

[Debug Listener]
type=listener
service=Debug Service
protocol=telnetd
address=localhost
port=4442


######################################################################
# CLI Service and Listener.
#
[CLI Service]
type=service
router=cli

[CLI Listener]
type=listener
service=CLI Service
protocol=maxscaled
address=localhost
socket=default


# EOF.

Configure master.ini

By default, binary logs are downloaded to the /var/lib/maxscale directory on the EC2 host running MaxScale. Another master-specific configuration file needs to be in this directory to tell MaxScale what CHANGE MASTER TO command it needs to run to establish replication with the RDS instance. This file is named master.ini, and at a minimum, it contains properties for master_host, master_port, master_user, and master_password. These need to be set to the Amazon RDS endpoint, port, master user name, and master user password respectively.

For example, this is what the master.ini file looks like in my test setup.

[binlog_configuration]
master_host=jaime-binlog-server-master.wzyowmbtiken.us-west-2.rds.amazonaws.com
master_port=3306
master_user=admin
master_password=qqnalh9u

Alternatively, MaxScale can be started without a master.ini file and the file auto-generated by connecting to the proxy as the maxscale user and issuing a CHANGE MASTER TO command. This is useful when needing to connect an existing master that has been running for a long enough time that the initial binary log file is no longer on the host. Otherwise a configuration based on an explicit master.ini assumes it starts from the initial binary log (e.g., mysql-bin-changelog.000001).

Start MaxScale

At this point, it should be safe to start up MaxScale. First, verify that MaxScale can connect to the Amazon RDS instance as the maxscale user via the Amazon RDS endpoint and port. Because the MariaDB client tools should be installed on the EC2 host, run the mysql client to verify connectivity.

For example, this is the command I executed to verify connectivity:

mysql -hjaime-binlog-server-replica. wzyowmbtiken.us-west-2.rds.amazonaws.com -umaxscale -p F7XfBfJl --port=3306

If you can connect, it’s safe to start MaxScale using the following command:

sudo service maxscale start

As MaxScale is running, you can also check status with this command:

sudo systemctl status maxscale.service

You can also run the maxadmin utility with the show services command to receive a summary of all services such as the Binlog_Service.

If the Binlog Server is running properly, you will see binary logs accumulate in the /var/lib/maxscale directory, starting with mysql-bin-changelog.000001 (assuming an explicit master.ini file). The first few lines of the service summary from maxadmin will look something like this:

        Service:                             Binlog_Service
        Router:                              binlogrouter
        State:                               Started
        Master connection DCB:               0x18f3850
        Master connection state:             Binlog Dump
        Binlog directory:                    /var/lib/maxscale

If there was no explicit master.ini file, then connect to the proxy as the maxscale user and issue a CHANGE MASTER TO command. For example, suppose that you want to start downloading binary logs from position 4 of the binary log file mysql-bin-changelog.007076. On the EC2 host running MaxScale, you would connect using the mysql client to host address 127.0.0.1 as the maxscale user that you configured earlier.

mysql -h127.0.0.1 -umaxscale -pF7XfBfJl

Then you would issue this CHANGE MASTER TO command:

CHANGE MASTER TO MASTER_HOST = 'jaime-binlog-server-master.wzyowmbtiken.us-west-2.rds.amazonaws.com', MASTER_PORT = 3306, MASTER_USER = 'admin', MASTER_PASSWORD = 'qqnalh9u', MASTER_LOG_FILE = 'mysql-bin-changelog.007076', MASTER_LOG_POS = 4;

Adding replicas

You can add replicas to the Binlog Server by issuing a call to mysql.rds_set_external_master with the appropriate parameters. You can do the initial setup of the replica in a couple of ways:

  1. Create an Amazon RDS managed replica, wait for replication to do a full sync, and stop replication by calling rds_stop_replication. Note the Relay_Master_Log_File and Exec_Master_Log_Pos values from SHOW SLAVE STATUS output. Promote the Read Replica to convert it from a managed Amazon RDS replica to non-managed.
  2. Create a standard RDS instance, and populate the instance from the master by using a tool like mysqldump or AWS Database Migration Service (AWS DMS). If using mysqldump, you need to note the binary log position on the master when taking a consistent dump of data. Normally, you can do this by specifying --master-data=2 as part of the mysqldump However, this doesn’t work in Amazon RDS for MySQL because of internal restrictions with FLUSH TABLES.

Whatever the case, when you finally have a replica to add, call mysql.rds_set_external_master. Pass in the host address of the EC2 host that is running MaxScale, along with the Amazon RDS master user credentials and the binary log file coordinates from where you want to start replication from.

In my test setup, I started from the first binary log and issued the following command:

CALL mysql.rds_set_external_master ('ec2-23-131-67-19.us-west-2.compute.amazonaws.com', 3306, 'admin', 'qqnalh9u', 'mysql-bin-changelog.000001', 4, 0);

Amazon RDS Binlog Server limitations

Although MaxScale supports Amazon RDS, be aware of several limitations compared to when running non-Amazon RDS for MySQL instances, such as the following:

  • You can modify the log_slave_updates parameter in Amazon RDS for MySQL and Amazon RDS for MariaDB. The parameter is always enabled in Amazon RDS for MySQL and MariaDB.
  • There is no Amazon RDS automation around MaxScale, such as Multi-AZ or purging of binary logs on the Binlog Server. You must handle all of this logic.
  • There is currently no support for GTID-based replication. This is not a problem in Amazon RDS for MySQL, which does not support GTID-based replication. For Amazon RDS for MariaDB, the rds_set_external_master stored procedure must be used instead of the mysql.rds_set_external_master_gtid stored procedure.

Summary

A Binlog Server can be useful as an intermediate binary log caching layer between a master and one or more replicas. This post discussed advantages and limitations when using this approach and walked you through the process of setting up a Binlog Server using MariaDB MaxScale. It is a great option for gaining more control over the replication process in Amazon RDS for MySQL and MariaDB.


About the Authors

Jaime Lichauco is a database engineer on the RDS team at Amazon Web Services.