AWS Database Blog

Reduce network transfer time with connection compression in Amazon RDS for MySQL and Amazon RDS for MariaDB

Our customers have seen many cases where network bandwidth between the MySQL client and database becomes the source of bottleneck. This leads to increase in the query latencies causing performance impact. This is especially seen for workloads which have high network bandwidth requirement. MySQL and MariaDB solve this problem by allowing you to compress traffic between the client and the database server.

In this post, we discuss how to use connection compression with Amazon Relational Database Service (Amazon RDS) for MySQL and Amazon RDS for MariaDB, and explore their benefits with a test case.

MySQL Client connection compression options

In version 8.0.18, MySQL introduced support for the protocol_compression_algorithms system variable, which lists the compression algorithms that the database server permits for incoming connections. With Amazon RDS for MySQL, this parameter is set to zlib,zstd,uncompressed, which means that by default, we allow three types of connections: zlib compressed, zstd compressed, and uncompressed.

While using the MySQL client, the --compression-algorithm command line option allows the client to list the desired compression algorithms while connecting to the database instance. This value defaults to uncompressed.

The --compress command line option that was available with earlier versions of MySQL client tools has been deprecated as of MySQL 8.0.18. With MySQL client versions prior to 8.0.18, the client supports only the zlib compression algorithm, in addition to uncompressed connections. You cannot specify a compression algorithm or the compression level with this option, so we recommend upgrading to the latest client driver to take advantage of compression.

To set up the connection, both the client and the database server must agree on a mutually permitted compression algorithm. Since Amazon RDS for MySQL supports all 3 compression options, the client can choose any algorithm. Therefore, if you specify multiple options with --compression-algorithms, the first compression algorithm supported by the database server gets picked. For example, if --compression-algorithms=zlib,zstd, then the zlib compression algorithm is picked. If the database server doesn’t support the zlib compression algorithm (which isn’t the case with Amazon RDS for MySQL), then the zstd compression algorithm is picked.

Zstd has proven to be more efficient algorithm as compared to ZLib in terms of compression speed, decompression speed, and compression ratio. You may evaluate both compression algorithms to see which one is optimal for your workload.

If you’re using the zstd compression algorithm to establish connections, you also have the flexibility to choose the compression level. The --zstd-compression-level command line option for the MySQL client allows you to configure the zstd compression level for the connection. This defaults to 3. The permitted levels are 1–22.

For additional reading refer to the presentation on ZStandard in ZFS which compares ZSTD with other compression algorithms and also compares the ZSTD compression levels

Although higher values for --zstd-compression-level provide a greater degree of compression, this also leads to increased utilization of the compute resources required to compress and decompress the result set packets, thereby reducing query performance. We must need to carefully choose the compression level to strike a balance between compressing the network traffic and utilizing compute resources.

Configure your compression algorithm and compression level

To get the most out of this feature, it’s important to choose the right compression algorithm and compression level for your workload. One way to achieve this is to benchmark the performance of your workloads using different compression algorithms(zstd, zlib, and uncompressed) and in the case of zstd, different compression levels which can range from 1-22.

In this section, we present a few examples using the MySQL client to establish and monitor client connections.

Connect to the RDS instance using the MySQL client and compressed connections with the following command:

# Connects to the RDS instance using the MySQL client using the zstd compression algorithm with a compression level of 7.

$ mysql -uuser1 -p -hmysql8031.xxxxxxxxxxxx.us-west-2.rds.amazonaws.com --zstd-compression-level=7 --compression-algorithms=zstd
Enter password:

There are several methods to check if your connection to the MySQL instance is compressed.

First, you can use the status(\s) command on the MySQL command line. Look for the Protocol field, which shows the value of Compressed:

mysql> \s
--------------
mysql  Ver 8.0.30 for Linux on x86_64 (MySQL Community Server - GPL)

Connection id:    819
Current database: 
Current user:   user1@10.1.3.147
SSL:      Cipher in use is ECDHE-RSA-AES128-GCM-SHA256
Current pager:    stdout
Using outfile:    ''
Using delimiter:  ;
Server version:   8.0.31 Source distribution
Protocol version: 10
Connection:   mysql8031.xxxxxxxxxxxx.us-west-2.rds.amazonaws.com via TCP/IP
Server characterset:  utf8mb4
Db     characterset:  utf8mb4
Client characterset:  utf8mb4
Conn.  characterset:  utf8mb4
TCP port:   3306
Protocol: Compressed
Binary data as:   Hexadecimal
Uptime:     2 days 22 hours 29 min 59 sec

Threads: 4  Questions: 1204176  Slow queries: 0  Opens: 215  Flush tables: 3  Open tables: 134  Queries per second avg: 4.744
--------------

Note: The protocol version field provides information on the version of TLS used and should not be confused with the ZSTD compression level.

You can also check status variables like Compression, Compression_algorithm, and Compression_level, to check the compression status of your current connection.

mysql> show status like '%compress%';
+-----------------------+-------+
| Variable_name         | Value |
+-----------------------+-------+
| Compression           | ON    |
| Compression_algorithm | zstd  |
| Compression_level     | 7     |
+-----------------------+-------+
3 rows in set (0.00 sec)

If performance schema is enabled, you can also check if compression is turned on for other connections from the application using the following query. You can use this query to get the processlist output and determine whether compression is turned on or off.

mysql> select b.PROCESSLIST_ID as Connection_id, VARIABLE_VALUE as 'Compression', PROCESSLIST_USER as user,PROCESSLIST_HOST as 'Host', b.PROCESSLIST_DB as 'Database' , b.PROCESSLIST_COMMAND as 'Command',b.PROCESSLIST_TIME as 'Time' , b.PROCESSLIST_STATE as 'State',   b.PROCESSLIST_INFO as 'Info'  from performance_schema.status_by_thread a join performance_schema.threads b on a.THREAD_ID = b.THREAD_ID where a.VARIABLE_NAME = 'Compression'  and b.PROCESSLIST_ID  <> connection_id();

+---------------+-------------+----------+-------------+----------+---------+------+-----------+-------------------------+
| Connection_id | Compression | user     | Host        | Database | Command | Time | State     | Info                    |
+---------------+-------------+----------+-------------+----------+---------+------+-----------+-------------------------+
|          1109 | OFF         | rdsadmin | localhost   | NULL     | Sleep   |    4 | NULL      | NULL                    |
|          1110 | OFF         | rdsadmin | localhost   | NULL     | Sleep   |    0 | NULL      | NULL                    |
|          1542 | ON          | user1    | 10.1.28.197 | NULL     | Query   |    3 | executing | select * from testdb.t1 |
+---------------+-------------+----------+-------------+----------+---------+------+-----------+-------------------------+
3 rows in set (0.00 sec)

Connection Compression with Amazon RDS for MariaDB

The MariaDB Client also supports connection compression. The --compress parameter passed to MariaDB client forces the use of compression in server/client protocol. You cannot specify the compression algorithm or compression levels with the --compress argument.

You can use the status(\s) command on the command line and look for the Protocol field, which shows the value of Compressed:

MariaDB [(none)]> \s
--------------
mysql  Ver 15.1 Distrib 10.5.10-MariaDB, for Linux (x86_64) using  EditLine wrapper

Connection id:		57079
Current database:	
Current user:		reinvent@172.31.49.222
SSL:			Not in use
Current pager:		stdout
Using outfile:		''
Using delimiter:	;
Server:			MariaDB
Server version:		10.6.10-MariaDB-log managed by https://aws.amazon.com/rds/
Protocol version:	10
Connection:		maria1067.xxxxxxxxxxxx.us-east-1.rds.amazonaws.com via TCP/IP
Server characterset:	latin1
Db     characterset:	latin1
Client characterset:	utf8mb3
Conn.  characterset:	utf8mb3
TCP port:		3306
Protocol:		Compressed
Uptime:			28 days 18 hours 7 min 1 sec

Threads: 4  Questions: 4947224  Slow queries: 0  Opens: 424  Open tables: 13  Queries per second avg: 1.991
--------------

You can also check the status variable Compression, to get the compression status of your current connection.

MariaDB [(none)]> show status like 'Compression';
+---------------+-------+
| Variable_name | Value |
+---------------+-------+
| Compression   | ON    |
+---------------+-------+
1 row in set (0.002 sec)

Performance benchmarking

Let’s understand the utility of connection compression with an example.

We create an Amazon Elastic Compute Cloud (Amazon EC2) instance and install the MySQL Client. We also create an Amazon RDS for MySQL database instance with the following configuration.

  • Client – Amazon EC2 t4g.2xlarge instance, 8 VCPUs, 32 GB RAM
  • Database server – Amazon RDS for MySQL 8.0.31 with a db.r6g.2xlarge instance, 8 VCPUs, 64 GB RAM

Note that using burstable instance classes like t2/t3/t4g as the client, can starve on CPU when the CPU credits are exhausted.

We use a single select query with a full table scan for the simplicity of demonstration.

Create a test table using sysbench.

mysql> create database testdb;
Query OK, 1 row affected (0.00 sec)

Now use sysbench to insert 20,000,000 rows into the table:

$ sysbench --test=/usr/share/sysbench/tests/include/oltp_legacy/insert.lua --mysql-port=3306 --db-driver=mysql --mysql-table-engine=innodb --oltp-tables-count=1 --oltp-table-size=20000000 --threads=10 --mysql-user=user1 --mysql-db=testdb --mysql-password=<password> --mysql-host=mysql8031.xxxxxxxxxxxx.us-west-2.rds.amazonaws.com prepare
WARNING: the --test option is deprecated. You can pass a script name or path on the command line without any options.
sysbench 1.0.20 (using bundled LuaJIT 2.1.0-beta2)

Creating table 'sbtest1'...
Inserting 20000000 records into 'sbtest1'
Creating secondary indexes on 'sbtest1'...

Now that we have populated the table, we start running a query with a full table scan, which scans all the rows and returns all the rows in its result set back to the client.

After connecting using the MySQL client, you can filter the result set from being displayed on the terminal by using the pager command at the MySQL client:

mysql> pager grep "rows in"
PAGER set to 'grep "rows in"'

We run a select on a table with full table scan.

mysql> select * from testdb.sbtest1;
20000000 rows in set (3 min 2.50 sec)

We measure the Network Transfer Speed using Enhanced Monitoring feature available on RDS. In the Enhanced monitoring OS metrics, we add the network RX and TX to get the total data transferred. You can also use the nload tool to measure network bandwidth on the EC2 instance where MySQL Client is installed. Our results are as follows:

ZSTD Compression Level Max Network bandwidth Utilization(tx+rx) (MBps) Time to Execute Query and Return Result Set(seconds)
0 231 17.25
3 28 72.11
7 16 126.55
10 11 182.45

In this experiment, we see no performance gains by using compressed connections. The network bandwidth utilization reduces when the compression level is increased, but we also see an increase in the time to run the query and return the result set. The query completion time increased along with the increase in the zstd compression levels. This increase in execution time is due to the additional overheads required to compress the result set. Now let’s use a table with larger row size:

mysql> create database testdb;
Query OK, 1 row affected (0.07 sec)

mysql> use testdb;
Database changed

We populated random data in to the table, as shown in the following queries, instead of using sysbench.

mysql> create table t1(col1 int auto_increment primary key, col2 TEXT,col3 TEXT,col4 TEXT, col5 TEXT,col6 TEXT,col7 TEXT,col8 TEXT,col9 TEXT, col10 TEXT);
Query OK, 0 rows affected (0.08 sec)

mysql> insert into testdb.t1 values(null, repeat(md5(rand()*1000),2000), repeat(md5(rand()*1000),2000), repeat(md5(rand()*1000), 2000), repeat(md5(rand()*1000), 2000), repeat(md5(rand()*1000), 2000), repeat(md5(rand()*1000), 2000), repeat(md5(rand()*1000), 2000), repeat(md5(rand()*1000), 2000), repeat(md5(rand()*1000), 2000)); 
Query OK, 1 row affected (0.12 sec)
$ for i in `seq 0 16`;do mysql  -uuser1 -p<password>  -hmysql8031.xxxxxxxxxxxx.us-west-2.rds.amazonaws.com  -e"use testdb;select repeat(md5(rand()),2000) into @tmp; insert into t1 (col2) select @tmp from t1;";done

We run a select on the table with a full table scan.

mysql> select * from t1;
32768 rows in set (1 min 16.64 sec)

Our results are as follows:

ZSTD Compression Level Max Network bandwidth Utilization(tx+rx) (MBps) Time to Execute Query and Return Result Set(seconds)
0 594 32.43
3 1.7 11.01
7 0.26 69.92
10 0.23 72.64

In this experiment, we used the repeat function to load data into the table. Due to the redundant nature of the table data, the compression algorithm was able to efficiently compress the data. We see that the network bandwidth reduced considerably with compressed connections. It reduced from 594 MBps to 1.7 MBps with a zstd compression level of 3. This also reduced the query completion time. However, when the compression level was further increased to 7, the query took more time to complete.

Conclusion

In this post, we discussed how connection compression is helpful for certain kinds of workloads on Amazon RDS for MySQL and Amazon RDS for MariaDB where network bandwidth is a constraint and result sets are large. We also showed you how the degree of compression depends on the nature of the data set.

You can see the benefit in cases where network transfer time is larger compared to the cost of compute resources required for compression and decompression operations of the connection payload.

In order to check if connection compression is the right fit for your workload, we suggest you perform benchmark tests.

More details on Connection Compression with MySQL are available in the MySQL Reference Manual. Add a comment to share with us how useful is connection compression for your workload.


About the Author

Mershad Irani is a Database Engineer (DBE) at Amazon Web Services.