AWS Database Blog

Performance testing MySQL migration environments using query playback and traffic mirroring – Part 3

This is the third post in a series where we dive deep into performance testing of MySQL environments being migrated from on premises. In Part 1, we compared the query playback and traffic mirroring approaches at a high level. In Part 2, we showed how to set up and configure query playback. In this post, we show you how to set up and configure traffic mirroring.

Solution overview

The core of our solution uses ProxySQL for mirroring SQL query traffic from an existing on-premises database to Amazon Aurora MySQL-Compatible Edition or other MySQL-compatible databases such as Amazon Relational Database Service (Amazon RDS) for MySQL or MySQL running on Amazon Elastic Compute Cloud (Amazon EC2). This method facilitates real-time testing in a migration environment under actual workload conditions.

The following diagram illustrates an example architecture for traffic mirroring with ProxySQL.

Database mirroring architecture with proxysql

In this architecture, we use AWS Direct Connect to establish a dedicated network connection from the on-premises environment to AWS. This provides optimal bandwidth and network performance, reducing potential bottlenecks or variable latencies that could skew the testing results. It also provides a private connection between on-premises and AWS for the continually mirrored traffic.

Prerequisites

Complete the following prerequisites:

  1. Create a backup of your on-premises database using Percona XtraBackup. Restore the backup locally to create a local test environment.
  2. Copy the backup to your Amazon Simple Storage Service private (Amazon S3) bucket.
  3. Restore the backup to create your Aurora MySQL-Compatible database.
    1. Make sure the RDS primary user that is created as part of the restore process doesn’t conflict with any users that are currently stored in the database.
    2. In the case of testing a newer Aurora MySQL version, restore the backup to create your Aurora MySQL database and perform an in-place upgrade from Aurora MySQL 2 (5.7 compatible) to Aurora MySQL 3 (8.0 compatible).
    3. When the restores are complete, take an RDS snapshot of the newly created environments for easier recreation.

Implement traffic mirroring

To implement traffic mirroring with ProxySQL, complete the following steps:

  1. Install and configure ProxySQL in your on-premises environment. For a production setup, make sure ProxySQL is configured to be highly available in order to avoid a single point of failure.
  2. After you add your primary instances to ProxySQL, you need to create additional routing rules to add the mirrored resource to your list of MySQL servers:
    INSERT INTO mysql_servers(hostgroup_id, hostname, port, comment) 
    VALUES (100, 'mirroring.cluster-c5h2nrbsz4ea.eu-west-2.rds.amazonaws.com', 3306, "mirror_cluster");
  3. Define which SQL queries should be mirrored to the AWS environment by inserting appropriate rules into the mysql_query_rules This could be all queries or a subset based on specific criteria such as query type or originating schema. The following example shows how you can send all non-locking SELECT queries to the mirror (in this case, SELECT FOR UPDATE is only sent the original production instance). However, it is also possible to send other types of traffic such as INSERT, UPDATE, or DELETE.
    # Basic rule that sends all traffic to host group 1
    INSERT INTO mysql_query_rules (
    rule_id,
    active,
    match_pattern,
    destination_hostgroup,
    apply)
    VALUES (1, 1, '^SELECT', 2, 1);
    
    INSERT INTO mysql_query_rules (
    rule_id,
    active,
    match_pattern,
    destination_hostgroup,
    apply)
    VALUES (1,1,'^SELECT.*FOR UPDATE',1,1),(2,1,'^SELECT',1,1);
    
    # Update rules to mirror select traffic
    
    UPDATE mysql_query_rules SET mirror_hostgroup = 100 where rule_id=2;
  4. Apply the new configurations by loading them into the runtime and saving them to disk:
    LOAD MYSQL SERVERS TO RUNTIME;
    SAVE MYSQL SERVERS TO DISK;
    LOAD MYSQL QUERY RULES TO RUNTIME;
    SAVE MYSQL QUERY RULES TO DISK;

    Now you have a workload running against a compatible source and target, either synthetic or your actual production database.

  5. You can use the built-in stats from ProxySQL to review query performance across both databases:
    ProxySQL Admin> select hostgroup, digest, digest_text, count_star, first_seen, last_seen, sum_time, min_time, max_time, sum_rows_sent from stats_mysql_query_digest_reset order by count_star desc limit 10;
    +-----------+--------------------+-----------------------------------+------------+------------+------------+----------+----------+----------+---------------+
    | hostgroup | digest             | digest_text                       | count_star | first_seen | last_seen  | sum_time | min_time | max_time | sum_rows_sent |
    +-----------+--------------------+-----------------------------------+------------+------------+------------+----------+----------+----------+---------------+
    | 100       | 0xd84e4e04982951c1 | SELECT c FROM sbtest9 WHERE id=?  | 4490       | 1721385319 | 1721385327 | 4170962  | 612      | 13533    | 4490          |
    | 1         | 0xd84e4e04982951c1 | SELECT c FROM sbtest9 WHERE id=?  | 4490       | 1721385319 | 1721385327 | 1840826  | 140      | 11668    | 4490          |
    | 1         | 0x9b090963f41ad781 | SELECT c FROM sbtest10 WHERE id=? | 4470       | 1721385319 | 1721385327 | 1803901  | 146      | 10639    | 4470          |
    | 100       | 0x9b090963f41ad781 | SELECT c FROM sbtest10 WHERE id=? | 4470       | 1721385319 | 1721385327 | 4104662  | 618      | 10639    | 4470          |
    | 100       | 0x1e7b7ac5611f30c2 | SELECT c FROM sbtest6 WHERE id=?  | 4450       | 1721385319 | 1721385327 | 4113451  | 618      | 15026    | 4450          |
    | 1         | 0x1e7b7ac5611f30c2 | SELECT c FROM sbtest6 WHERE id=?  | 4449       | 1721385319 | 1721385327 | 1817809  | 143      | 11785    | 4449          |
    | 100       | 0x99396ec34e1f41d4 | SELECT c FROM sbtest8 WHERE id=?  | 4390       | 1721385319 | 1721385327 | 4105750  | 615      | 16475    | 4390          |
    | 1         | 0x99396ec34e1f41d4 | SELECT c FROM sbtest8 WHERE id=?  | 4390       | 1721385319 | 1721385327 | 1744455  | 155      | 10209    | 4390          |
    | 1         | 0x6dd78c71ff7350ae | SELECT c FROM sbtest7 WHERE id=?  | 4381       | 1721385319 | 1721385327 | 1789731  | 147      | 8038     | 4381          |
    | 100       | 0x6dd78c71ff7350ae | SELECT c FROM sbtest7 WHERE id=?  | 4380       | 1721385319 | 1721385327 | 4110784  | 615      | 8908     | 4380          |
    +-----------+--------------------+-----------------------------------+------------+------------+------------+----------+----------+----------+---------------+
    10 rows in set (0.01 sec)

You can write custom queries against the stats table to provide more information on just how much better or worse the performance is, as well as rewrite queries on the proxy that are being sent to the mirrored host group in order to test for performance improvements. Detailed information on these steps are outside of the scope of this post, but are covered in the ProxySQL documentation.

Considerations

ProxySQL mirroring does not support prepared statements.

Additionally, it’s crucial to monitor the latency introduced by ProxySQL mirroring to make sure it doesn’t negatively impact your production workload. Although ProxySQL is designed to be lightweight, the added network hop and processing could introduce minimal latency. Additionally, if you’re already using ProxySQL in production, enabling the mirroring feature will increase resource utilization.

Let’s examine the performance differences when connecting directly to MySQL running on Amazon EC2 from an EC2 jump host vs. ProxySQL.

The following query statistics illustrates connecting directly from the jump host to MySQL on Amazon EC2:

SQL statistics:
    queries performed:
        read:                            7573034
        write:                           0
        other:                           1081862
        total:                           8654896
    transactions:                        540931 (901.52 per sec.)
    queries:                             8654896 (14424.33 per sec.)
    ignored errors:                      0      (0.00 per sec.)
    reconnects:                          0      (0.00 per sec.)

General statistics:
    total time:                          600.0189s
    total number of events:              540931

Latency (ms):
         min:                                    6.21
         avg:                                   35.49
         max:                                  161.45
         95th percentile:                       46.63
         sum:                             19196339.31

Threads fairness:
    events (avg/stddev):           16904.0938/580.69
    execution time (avg/stddev):   599.8856/0.01

The following query statistics illustrates connecting using ProxySQL, also hosted on Amazon EC2 (with mirroring disabled):

SQL statistics:
queries performed:
read:                            6923070
write:                           0
other:                           989010
total:                           7912080
transactions:                        494505 (824.10 per sec.)
queries:                             7912080 (13185.58 per sec.)
ignored errors:                      0      (0.00 per sec.)
reconnects:                          0      (0.00 per sec.)

General statistics:
total time:                          600.0527s
total number of events:              494505

Latency (ms):
min:                                   10.29
avg:                                   38.82
max:                                  220.05
95th percentile:                       55.82
sum:                             19197185.20

Threads fairness:
events (avg/stddev):           15453.2812/50.61
execution time (avg/stddev):   599.9120/0.02

The results show that after multiple runs, there was minimal performance difference between having mirroring enabled or disabled. However, there was a difference in CPU utilization. Therefore, you need to make sure the ProxySQL instance is sized appropriately so you don’t introduce performance degradation. For brevity, the most recent run is shown in the following examples.

In order to demonstrate the overhead of mirroring being enabled, we are going to show the sysbench performance numbers with mirroring enabled and with mirroring disabled as well as showing the EC2 instance CPU utilization for both configurations.

The following statistics illustrates how the sysbench connection ran from a jump host connected through ProxySQL with mirroring disabled:

SQL statistics:
queries performed:
read:                            6923070
write:                           0
other:                           989010
total:                           7912080
transactions:                        494505 (824.10 per sec.)
queries:                             7912080 (13185.58 per sec.)
ignored errors:                      0      (0.00 per sec.)
reconnects:                          0      (0.00 per sec.)

General statistics:
total time:                          600.0527s
total number of events:              494505

Latency (ms):
min:                                   10.29
avg:                                   38.82
max:                                  220.05
95th percentile:                       55.82
sum:                             19197185.20

Threads fairness:
events (avg/stddev):           15453.2812/50.61
execution time (avg/stddev):   599.9120/0.02

The following statistics illustrates how the sysbench connection ran from the same jump host but connected through ProxySQL with mirroring enabled:

SQL statistics:
queries performed:
read:                            6938372
write:                           0
other:                           991196
total:                           7929568
transactions:                        495598 (825.96 per sec.)
queries:                             7929568 (13215.32 per sec.)
ignored errors:                      0      (0.00 per sec.)
reconnects:                          0      (0.00 per sec.)

General statistics:
total time:                          600.0264s
total number of events:              495598

Latency (ms):
min:                                   11.58
avg:                                   38.74
max:                                  199.12
95th percentile:                       54.83
sum:                             19197444.19

Threads fairness:
events (avg/stddev):           15487.4375/14.97
execution time (avg/stddev):   599.9201/0.01

The following screenshot shows the CPU utilization metric with sysbench running and mirroring disabled (measured on the ProxySQL instance).

Sysbench with mirroring disabled

The following screenshot shows the CPU utilization metric with sysbench running and mirroring enabled (measured on the ProxySQL instance).

Sysbench with mirroring enabled

To avoid issues during testing, keep the following in mind:

  • Data consistency – Before you begin traffic mirroring, verify that the data in your on-premises database and the AWS migration environment are synchronized to achieve accurate testing results.
  • Concurrent testing – ProxySQL’s capability to mirror traffic to multiple targets enables testing different migration strategies or configurations simultaneously, offering valuable insights for decision-making.
  • Traffic dependency – Because this approach relies on live production traffic, the quality and relevance of the testing are directly tied to the nature and volume of the incoming queries. Plan for testing during peak and off-peak periods to get a comprehensive understanding of performance under various conditions.
  • High availability – The configuration shown in this post demonstrates how to set up ProxySQL on Amazon EC2 for the purposes of mirroring traffic. However, in a real production environment, you shouldn’t configure ProxySQL in this manner because you need to make sure it’s set up in such a way as to maintain high availability as well as implementing additional features. The preceding example is the bare minimum setup for mirroring.
    • ProxySQL clustering and other high availability options are outside of the scope of this post. Refer to Decentralized Cluster Nodes for more information.
    • Additionally, if you’re going to host ProxySQL co-located with your application, it will affect the end-to-end performance of queries against the remote machine due to the reduction in network hops. Although this doesn’t pose an issue, it’s something that needs to be carefully considered when benchmarking.
  • Latency – As you will have one database instance hosted on-premises close to the ProxySQL instance there will be a latency impact on the target due to having additional network hops. Whilst this doesn’t affect the running of the tests it is something to keep in mind.

In this blog post we have primarily focused on migrations from on-premesis to RDS/Amazon Aurora MySQL, however it is also possible to use ProxySQL mirroring in order to test a migration from RDS MySQL to Amazon Aurora MySQL or to test performance of different database engine versions such as upgrading from Aurora MySQL 2 to Aurora MySQL 3.

Conclusion

In this post, we showed you how to set up and configure ProxySQL mirroring to mirror your workload from your production database to another instance in order to carry out performance testing.

Traffic mirroring with ProxySQL is a powerful technique for testing database migrations to AWS, providing real-world insights into how applications will perform in a cloud environment. By carefully planning and monitoring the mirroring process, businesses can achieve a smooth transition to AWS, minimizing risks and optimizing performance.

In this series, we showed you two different tools to test database performance across different versions of MySQL on different platforms, and compared the performance of MySQL vs. Aurora MySQL-Compatible. Both tools mentioned have different strengths and weaknesses, and fit different use cases, but allow you to perform performance and regression testing before performing an upgrade or a migration in order to determine whether the new version or platforms is able to meet your application’s needs.

To dive into more details around migrating databases, look to our step by step guides and prescriptive guidance.


About the Authors

Arnab Ghosh is a Sr. Solutions Architect for AWS in North America, helping enterprise customers build resilient and cost-efficient architectures. He has over 15 years of experience in architecting, designing, and developing enterprise applications solving complex business problems.

Patrick Gryczka is a Sr. Solutions Architect specializing in serverless technologies and the sports industry. Beyond the cloud, Patrick fills his time and space with cats, science fiction, Python, and Rust.

Simon Stewart is a Database Specialist Solutions Architect at AWS, focusing on MySQL and Amazon Aurora MySQL. Simon helps AWS customers with their architecture design and provides efficient solutions to run their database workloads on AWS. When not helping customers with their databases on AWS, he enjoys tinkering with his homelab.