Amazon Aurora 既具有高端商用数据库的性能和可用性,又具有开源数据库的简单性和成本效益。它提供了比标准 MySQL 高五倍的吞吐量,并且具有更高的可扩展性、持久性和安全性。 Amazon Aurora使用了计算和存储分离的架构,数据库集群包含一个或多个数据库计算实例以及一个跨多可用区的数据存储层。
图片:Amazon Aurora 集群架构图
Amazon Aurora Parallel Query(并行查询)是 Aurora数据库的一项功能,适用于兼容 MySQL 的 Amazon Aurora。Aurora 最新的 MySQL 5.6 和 MySQL 5.7 兼容版本均支持并行查询。 并行查询充分利用了 Aurora 的架构,将处理向下推送到 Aurora 存储层,将计算分布到数千个节点上。通过将分析查询处理卸载到 Aurora 存储层,并行查询减少了与事务工作负载对网络、CPU 和缓冲池的争用,可以将查询速度提高多达两个数量级,同时保持核心事务工作负载的高吞吐量。
并行查询非常适合 Aurora MySQL 数据库集群里具有包含数百万行的表以及需要数分钟或数小时才能完成的分析查询。本文通过测试了各场景下的查询耗时,并行查询的启用对于OLTP事务性查询影响甚微,而对于OLAP分析性查询则能显著提高速度。例如在实验场景下:在千万行至亿行的数据集里,使用db.r5.2xlarge机型,运行一个多表联合分析查询,禁用并行查询时,耗时为22 分1.33 秒,开启并行查询后,耗时仅为39.74秒。
以下为实验内容,详细说明了试验步骤并展现了并行查询所带来的优化效果。
试验
· 环境准备
试验需要预置Aurora MySQL数据库集群以及MySQL客户端实例。
并行查询现已在由光环新网运营的 AWS 中国(北京)区域以及由西云数据运营的 AWS 中国(宁夏)区域正式推出。本实验使用了AWS中国宁夏区域。
- Aurora MySQL数据库集群
要创建具有并行查询的 Aurora MySQL 集群,可以使用与其他 Aurora MySQL 集群相同的 AWS 管理控制台和 AWS CLI 方法。您可以创建新的集群以使用并行查询,也可以通过从 MySQL 兼容的 Aurora 数据库集群的快照还原,创建一个数据库集群以使用并行查询。
在选择 Aurora MySQL 引擎版本时,建议您选择与 MySQL 5.7 兼容的最新引擎Aurora MySQL 2.09 或更高版本,以及与 MySQL 5.6 兼容的Aurora MySQL 1.23 或更高版本。使用这些版本,使用并行查询的限制最少,这些版本还具有最大的灵活性,可以随时打开或关闭并行查询。
本实验采用了Aurora MySQL 2.09版本。
图片:选择支持并行查询功能的版本
集群中的数据库实例必须使用 db.r* 实例类才可支持并行查询。
本实验针对大小两种数据集,选用了大小两种机型。针对10G级数据集,选用了db.r5.xlarge实例类型。针对100G级数据集,选用了db.r5.2xlarge实例类型。
图片:选择支持并行查询功能的实例类型
更多详情请参考Aurora用户指南中的创建使用并行查询的数据库集群。
- MySQL客户端实例
根据Aurora MySQL集群连接配置的选项,在可访问数据库集群的网络环境内创建一台EC2实例。由于需要生成预加载测量数据,注意保证实例配置的存储空间充足。
更多详情请参考启用实例以及连接到 Amazon Aurora 数据库集群。
· 数据准备
本试验使用了TPC-H 数据集。您可以从 TPC-H 网站获取生成示例数据的表定义、查询以及 dbgen 程序。请参考以下步骤将数据集导入到Aurora MySQL集群。
1. 下载TPC-H工具
打开TPC Download Current页面,找到TPC-H项,当前最新版本为2.18.0,点击下载。
图片:下载TPC-H工具
将下载的工具压缩包拷贝到MySQL客户端实例,解压。
2. 生成测试数据
进入子目录dbgen,编辑makefile文件:
[ec2-user@ip-172-31-42-211 dbgen]$ cp makefile.suite makefile
[ec2-user@ip-172-31-42-211 dbgen]$ vi makefile
更新makefile文件中的编译器cc为gcc,数据库引擎DATABASE为MYSQL,操作系统MACHINE为LINUX,负载WORKLOAD为TPCH:
################
## CHANGE NAME OF ANSI COMPILER HERE
################
CC = gcc
# Current values for DATABASE are: INFORMIX, DB2, TDAT (Teradata)
# SQLSERVER, SYBASE, ORACLE, VECTORWISE
# Current values for MACHINE are: ATT, DOS, HP, IBM, ICL, MVS,
# SGI, SUN, U2200, VMS, LINUX, WIN32
# Current values for WORKLOAD are: TPCH
DATABASE=MYSQL
MACHINE = LINUX
WORKLOAD = TPCH
#
由于makefile中的默认数据库选项没有MYSQL选项,需要在tpch.h文件中手动添加依赖。
打开tpcd.h,在文件中添加代码片段:
#ifdef MYSQL
#define GEN_QUERY_PLAN ""
#define START_TRAN "START TRANSACTION"
#define END_TRAN "COMMIT"
#define SET_OUTPUT ""
#define SET_ROWCOUNT "limit %d;\n"
#define SET_DBASE "use %s;\n"
#endif
执行编译命令make,生成dbgen数据生成工具:
[ec2-user@ip-172-31-42-211 dbgen]$ make
使用dbgen工具生成数据,在参数中可指定生成数据的大小。运行结束后,产生了8个tbl文件,对应到8个表。
本实验测试了大小两种数据集,分别分成了10G的数据文件以及100G的数据文件。
小数据集指定了10的参数,生成数据文件如下:
[ec2-user@ip-172-31-42-211 dbgen]$ ./dbgen -s 100
[ec2-user@ip-172-31-42-211 dbgen]$ ls *.tbl -lh
-rw-rw-r-- 1 ec2-user ec2-user 234M customer.tbl
-rw-rw-r-- 1 ec2-user ec2-user 7.3G lineitem.tbl
-rw-rw-r-- 1 ec2-user ec2-user 2.2K nation.tbl
-rw-rw-r-- 1 ec2-user ec2-user 1.7G orders.tbl
-rw-rw-r-- 1 ec2-user ec2-user 233M part.tbl
-rw-rw-r-- 1 ec2-user ec2-user 1.2G partsupp.tbl
-rw-rw-r-- 1 ec2-user ec2-user 389 region.tbl
-rw-rw-r-- 1 ec2-user ec2-user 14M supplier.tbl
大数据集指定了100的参数,生成数据文件如下:
[ec2-user@ip-172-31-42-211 dbgen]$ ./dbgen -s 100
[ec2-user@ip-172-31-42-211 dbgen]$ ls *.tbl -lh
-rw-rw-r-- 1 ec2-user ec2-user 2.3G customer.tbl
-rw-rw-r-- 1 ec2-user ec2-user 75G lineitem.tbl
-rw-rw-r-- 1 ec2-user ec2-user 2.2K nation.tbl
-rw-rw-r-- 1 ec2-user ec2-user 17G orders.tbl
-rw-rw-r-- 1 ec2-user ec2-user 2.3G part.tbl
-rw-rw-r-- 1 ec2-user ec2-user 12G partsupp.tbl
-rw-rw-r-- 1 ec2-user ec2-user 389 region.tbl
-rw-rw-r-- 1 ec2-user ec2-user 137M supplier.tbl
3. 导入测试数据
tpch工具包里提供了两个脚本:dss.ddl 数据库及表的初始化脚本,dss.ri 数据表的主键索引及外键脚本。我们将先执行dss.ddl脚本创建数据库及表,再执行dss.ri创建对应表的主键索引及外键关联,最后导入数据。
由于脚本不适用于MySQL,需要做相应调整。
更新dss.ddl头部:
create database tpch;
use tpch;
连接MySQL:
mysql --host=database-test-instance-1.xxx.rds.cn-northwest-1.amazonaws.com.cn --user=admin --password=xxx
运行dss.ddl脚本,注意替换对应dss.ri文件路径:
MySQL [(none)]> \. /home/ec2-user/2.18.0_rc2/dbgen/dss.ddl
查看已建成功的tpch数据库:
MySQL [tpch]> show databases;
+--------------------+
| Database |
+--------------------+
| information_schema |
| mysql |
| performance_schema |
| sys |
| tpch |
+--------------------+
查看已建成功的数据库表:
MySQL [tpch]> use tpch;
Database changed
MySQL [tpch]> show tables;
+----------------+
| Tables_in_tpch |
+----------------+
| CUSTOMER |
| LINEITEM |
| NATION |
| ORDERS |
| PART |
| PARTSUPP |
| REGION |
| SUPPLIER |
+----------------+
更新dss.ri,包括修改连接方式,将CONNECT TO TPCD更新为use tpch;替换数据库名TPCD.为tpch.;增加外键关联列;将大写表名改为小写表名,以适配查询语句;更新完的dss.ri完整内容为:
use tpch;
-- For table REGION
ALTER TABLE tpch.REGION
ADD PRIMARY KEY (R_REGIONKEY);
-- For table NATION
ALTER TABLE tpch.NATION
ADD PRIMARY KEY (N_NATIONKEY);
ALTER TABLE tpch.NATION
ADD FOREIGN KEY NATION_FK1 (N_REGIONKEY) references tpch.REGION(R_REGIONKEY);
COMMIT WORK;
-- For table PART
ALTER TABLE tpch.PART
ADD PRIMARY KEY (P_PARTKEY);
COMMIT WORK;
-- For table SUPPLIER
ALTER TABLE tpch.SUPPLIER
ADD PRIMARY KEY (S_SUPPKEY);
ALTER TABLE tpch.SUPPLIER
ADD FOREIGN KEY SUPPLIER_FK1 (S_NATIONKEY) references tpch.NATION(N_NATIONKEY);
COMMIT WORK;
-- For table PARTSUPP
ALTER TABLE tpch.PARTSUPP
ADD PRIMARY KEY (PS_PARTKEY,PS_SUPPKEY);
COMMIT WORK;
-- For table CUSTOMER
ALTER TABLE tpch.CUSTOMER
ADD PRIMARY KEY (C_CUSTKEY);
ALTER TABLE tpch.CUSTOMER
ADD FOREIGN KEY CUSTOMER_FK1 (C_NATIONKEY) references tpch.NATION(N_NATIONKEY);
COMMIT WORK;
-- For table LINEITEM
ALTER TABLE tpch.LINEITEM
ADD PRIMARY KEY (L_ORDERKEY,L_LINENUMBER);
COMMIT WORK;
-- For table ORDERS
ALTER TABLE tpch.ORDERS
ADD PRIMARY KEY (O_ORDERKEY);
COMMIT WORK;
-- For table PARTSUPP
ALTER TABLE tpch.PARTSUPP
ADD FOREIGN KEY PARTSUPP_FK1 (PS_SUPPKEY) references tpch.SUPPLIER(S_SUPPKEY);
COMMIT WORK;
ALTER TABLE tpch.PARTSUPP
ADD FOREIGN KEY PARTSUPP_FK2 (PS_PARTKEY) references tpch.PART(P_PARTKEY);
COMMIT WORK;
-- For table ORDERS
ALTER TABLE tpch.ORDERS
ADD FOREIGN KEY ORDERS_FK1 (O_CUSTKEY) references tpch.CUSTOMER(C_CUSTKEY);
COMMIT WORK;
-- For table LINEITEM
ALTER TABLE tpch.LINEITEM
ADD FOREIGN KEY LINEITEM_FK1 (L_ORDERKEY) references tpch.ORDERS(O_ORDERKEY);
COMMIT WORK;
ALTER TABLE tpch.LINEITEM
ADD FOREIGN KEY LINEITEM_FK2 (L_PARTKEY,L_SUPPKEY) references
tpch.PARTSUPP(PS_PARTKEY,PS_SUPPKEY);
COMMIT WORK;
alter table CUSTOMER rename to customer ;
alter table LINEITEM rename to lineitem ;
alter table NATION rename to nation ;
alter table ORDERS rename to orders ;
alter table PART rename to part ;
alter table PARTSUPP rename to partsupp ;
alter table REGION rename to region ;
alter table SUPPLIER rename to supplier ;
运行dss.ri脚本建立数据库表主键及外键,注意替换对应dss.ri文件路径:
MySQL [(none)]> \. /home/ec2-user/2.18.0_rc2/dbgen/dss.ri
按顺序执行脚本将tbl文件导入数据库相应表,注意替换对应tbl文件路径:
load data local infile '/home/ec2-user/2.18.0_rc2/dbgen/region.tbl' into table region fields terminated by '|' lines terminated by '|\n';
load data local infile '/home/ec2-user/2.18.0_rc2/dbgen/nation.tbl' into table nation fields terminated by '|' lines terminated by '|\n';
load data local infile '/home/ec2-user/2.18.0_rc2/dbgen/part.tbl' into table part fields terminated by '|' lines terminated by '|\n';
load data local infile '/home/ec2-user/2.18.0_rc2/dbgen/supplier.tbl' into table supplier fields terminated by '|' lines terminated by '|\n';
load data local infile '/home/ec2-user/2.18.0_rc2/dbgen/partsupp.tbl' into table partsupp fields terminated by '|' lines terminated by '|\n';
load data local infile '/home/ec2-user/2.18.0_rc2/dbgen/customer.tbl' into table customer fields terminated by '|' lines terminated by '|\n';
load data local infile '/home/ec2-user/2.18.0_rc2/dbgen/orders.tbl' into table orders fields terminated by '|' lines terminated by '|\n';
load data local infile '/home/ec2-user/2.18.0_rc2/dbgen/lineitem.tbl' into table lineitem fields terminated by '|' lines terminated by '|\n';
导入完成后,可查看已导入的数据。测试查询中使用的表包括:customer, lineitem, orders,请观察各表的导入状况。
以下为10G级的数据表导入状况,数据表有百万行至千万行级数据:
MySQL [tpch]> show table status;
+----------+--------+---------+------------+----------+----------------+-------------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length |
+----------+--------+---------+------------+----------+----------------+-------------+
| customer | InnoDB | 10 | Dynamic | 1480453 | 194 | 288112640 |
| lineitem | InnoDB | 10 | Dynamic | 55000836 | 148 | 8158969856 |
| nation | InnoDB | 10 | Dynamic | 25 | 655 | 16384 |
| orders | InnoDB | 10 | Dynamic | 14213703 | 131 | 1873805312 |
| part | InnoDB | 10 | Dynamic | 1912304 | 162 | 310149120 |
| partsupp | InnoDB | 10 | Dynamic | 7467820 | 268 | 2005925888 |
| region | InnoDB | 10 | Dynamic | 5 | 3276 | 16384 |
| supplier | InnoDB | 10 | Dynamic | 98503 | 186 | 18366464 |
+----------+--------+---------+------------+----------+----------------+-------------+
以下为100G级的数据表导入状况,数据表有千万行至亿行级数据:
+----------+--------+---------+------------+-----------+----------------+-------------+
| Name | Engine | Version | Row_format | Rows | Avg_row_length | Data_length |
+----------+--------+---------+------------+-----------+----------------+-------------+
| customer | InnoDB | 10 | Dynamic | 13528638 | 194 | 2631925760 |
| lineitem | InnoDB | 10 | Dynamic | 106592268 | 137 | 14687404032 |
| nation | InnoDB | 10 | Dynamic | 25 | 655 | 16384 |
| orders | InnoDB | 10 | Dynamic | 141958614 | 131 | 18708692992 |
| part | InnoDB | 10 | Dynamic | 18240054 | 162 | 2970615808 |
| partsupp | InnoDB | 10 | Dynamic | 79991226 | 252 | 20198719488 |
| region | InnoDB | 10 | Dynamic | 5 | 3276 | 16384 |
| supplier | InnoDB | 10 | Dynamic | 988185 | 178 | 176865280 |
+----------+--------+---------+------------+-----------+----------------+-------------+
· 查询测试
在 Aurora MySQL 1.23 或 2.09 及更高版本中,默认情况下,并行查询和哈希联接设置都处于关闭状态。并行查询的启用或禁用可通过修改数据库的参数aurora_parallel_query。默认参数组为只读,修改参数组请参考使用数据库参数组和数据库集群参数组。
检查集群并行查询启动状态可用以下命令:
mysql> select @@aurora_parallel_query;
+-------------------------+
| @@aurora_parallel_query |
+-------------------------+
| 1 |
+-------------------------+
默认情况下,即使启用了并行查询,Aurora查询优化器会根据查询自动决定是否使用并行查询,可在会话级别使用命令强制开启,覆盖查询优化器的自动选择:
mysql> set aurora_pq_force = 1;
本实验分别针对单表事务查询,单表分析查询,以及多表分析查询,使用了不同查询脚本进行了测试。
查询脚本里涉及到的表包括:orders, customer, lineitem.
orders表结构如下:
MySQL [tpch]> describe orders;
+-----------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+-------+
| O_ORDERKEY | int(11) | NO | PRI | NULL | |
| O_CUSTKEY | int(11) | NO | MUL | NULL | |
| O_ORDERSTATUS | char(1) | NO | | NULL | |
| O_TOTALPRICE | decimal(15,2) | NO | | NULL | |
| O_ORDERDATE | date | NO | | NULL | |
| O_ORDERPRIORITY | char(15) | NO | | NULL | |
| O_CLERK | char(15) | NO | | NULL | |
| O_SHIPPRIORITY | int(11) | NO | | NULL | |
| O_COMMENT | varchar(79) | NO | | NULL | |
+-----------------+---------------+------+-----+---------+-------+
customer表结构如下:
MySQL [tpch]> describe customer;
+--------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+--------------+---------------+------+-----+---------+-------+
| C_CUSTKEY | int(11) | NO | PRI | NULL | |
| C_NAME | varchar(25) | NO | | NULL | |
| C_ADDRESS | varchar(40) | NO | | NULL | |
| C_NATIONKEY | int(11) | NO | MUL | NULL | |
| C_PHONE | char(15) | NO | | NULL | |
| C_ACCTBAL | decimal(15,2) | NO | | NULL | |
| C_MKTSEGMENT | char(10) | NO | | NULL | |
| C_COMMENT | varchar(117) | NO | | NULL | |
+--------------+---------------+------+-----+---------+-------+
lineitem表结构如下:
MySQL [tpch]> describe lineitem;
+-----------------+---------------+------+-----+---------+-------+
| Field | Type | Null | Key | Default | Extra |
+-----------------+---------------+------+-----+---------+-------+
| L_ORDERKEY | int(11) | NO | PRI | NULL | |
| L_PARTKEY | int(11) | NO | MUL | NULL | |
| L_SUPPKEY | int(11) | NO | | NULL | |
| L_LINENUMBER | int(11) | NO | PRI | NULL | |
| L_QUANTITY | decimal(15,2) | NO | | NULL | |
| L_EXTENDEDPRICE | decimal(15,2) | NO | | NULL | |
| L_DISCOUNT | decimal(15,2) | NO | | NULL | |
| L_TAX | decimal(15,2) | NO | | NULL | |
| L_RETURNFLAG | char(1) | NO | | NULL | |
| L_LINESTATUS | char(1) | NO | | NULL | |
| L_SHIPDATE | date | NO | | NULL | |
| L_COMMITDATE | date | NO | | NULL | |
| L_RECEIPTDATE | date | NO | | NULL | |
| L_SHIPINSTRUCT | char(25) | NO | | NULL | |
| L_SHIPMODE | char(10) | NO | | NULL | |
| L_COMMENT | varchar(44) | NO | | NULL | |
+-----------------+---------------+------+-----+---------+-------+
单表事务查询脚本如下,针对orders表的不同列的指定了条件进行查询:
SELECT *
FROM orders
WHERE o_custkey = 3689999
AND o_orderdate > date '1995-03-14'
AND o_orderstatus = 'O'
LIMIT 15;
单表分析查询脚本如下,针对orders表的不同列的指定了条件,并使用统计函数avg进行分析查询:
SELECT avg(o_totalprice)
FROM orders
WHERE o_orderdate > date '1995-03-14'
AND o_orderstatus = 'O'
AND o_orderpriority not in ('1-URGENT', '2-HIGH');
多表分析查询脚本如下,关联了多表customer, orders, lineitem,并指定条件进行分析查询:
SELECT
l_orderkey,
SUM(l_extendedprice * (1-l_discount)) AS revenue,
o_orderdate,
o_shippriority
FROM customer, orders, lineitem
WHERE
c_mktsegment='AUTOMOBILE'
AND c_custkey = o_custkey
AND l_orderkey = o_orderkey
AND o_orderdate < date '1995-03-14'
AND l_shipdate > date '1995-03-14'
GROUP BY
l_orderkey,
o_orderdate,
o_shippriority
ORDER BY
revenue DESC,
o_orderdate LIMIT 15;
在多表联合查询场景下,Aurora MySQL在需要使用 equijoin 联接大量数据时,哈希联接可以提高查询性能。为并行查询集群启用哈希联接,可设置集群配置参数 aurora_disable_hash_join=OFF,启用与并行查询结合使用的哈希联接优化。
检查集群哈希联接优化启动状态可用以下命令:
mysql> select @@aurora_disable_hash_join;
+----------------------------+
| @@aurora_disable_hash_join |
+----------------------------+
| 0 |
+----------------------------+
Aurora查询优化器会根据查询自动决定是否使用哈希联接,可在会话级别使用命令强制开启,覆盖查询优化器的自动选择:
mysql> SET optimizer_switch='hash_join=on';
mysql> SET optimizer_switch='hash_join_cost_based=off';
本实验分别在导入了大小两种数据集的两个数据库集群上(分别为:导入了10G测试数据集的db.r5.xlarge数据库集群,导入了100G测试数据集的db.r5.2xlarge的数据库集群),参照以下步骤进行了测试:
- 禁用并行查询,检查确认并行查询的禁用状态
- 运行单表事务查询脚本
- 分析单表事务查询执行计划
- 运行单表分析查询脚本
- 分析单表分析查询执行计划
- 运行多表分析查询脚本
- 分析多表分析查询执行计划
- 启用并行查询,启用哈希连接,检查确认并行查询及哈希连接的启用状态
- 运行单表事务查询脚本
- 分析单表事务查询执行计划
- 运行单表分析查询脚本
- 分析单表分析查询执行计划
- 运行多表分析查询脚本
- 分析多表分析查询执行计划
通过分析查询计划,可看到传统查询计划与并行查询计划之间的区别。在启用并行查询后,查询中的步骤可以使用并行查询优化,如 EXPLAIN 输出中的 Extra 列所示。这些步骤的 I/O 密集型和 CPU 密集型处理将向下推送到存储层。
以下为100G测试数据集的查询结果及执行计划分析:
单表事务查询(禁用并行查询)的结果:
MySQL [tpch]> SELECT *
-> FROM orders
-> WHERE o_custkey = 3689999
-> AND o_orderdate > date '1995-03-14'
-> AND o_orderstatus = 'O'
-> LIMIT 15;
+------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+-----------------------------------------------------------------+
| O_ORDERKEY | O_CUSTKEY | O_ORDERSTATUS | O_TOTALPRICE | O_ORDERDATE | O_ORDERPRIORITY | O_CLERK | O_SHIPPRIORITY | O_COMMENT |
+------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+-----------------------------------------------------------------+
| 1 | 3689999 | O | 224560.83 | 1996-01-02 | 5-LOW | Clerk#000095055 | 0 | nstructions sleep furiously among |
| 37007300 | 3689999 | O | 189889.17 | 1998-05-12 | 5-LOW | Clerk#000097117 | 0 | ully. carefully busy accoun |
| 110398694 | 3689999 | O | 40190.10 | 1998-06-08 | 3-MEDIUM | Clerk#000071343 | 0 | ideas? quickly thin accounts wake slyly. blithely |
| 166279651 | 3689999 | O | 8270.88 | 1997-10-23 | 1-URGENT | Clerk#000039384 | 0 | ic, final accounts sleep. blithely pending requests nag slyly u |
| 276450979 | 3689999 | O | 43595.81 | 1997-04-26 | 3-MEDIUM | Clerk#000010520 | 0 | quickly alongside of the furiously expr |
| 404928295 | 3689999 | O | 20719.85 | 1996-11-05 | 2-HIGH | Clerk#000027012 | 0 | y regular platelets |
+------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+-----------------------------------------------------------------+
6 rows in set (0.00 sec)
单表事务查询(禁用并行查询)的执行计划分析:
+----+-------------+--------+...+-------------+
| id | select_type | table |...| Extra |
+----+-------------+--------+...+-------------+
| 1 | SIMPLE | orders |...| Using where |
+----+-------------+--------+...+-------------+
单表事务查询(启用并行查询)的结果:
MySQL [tpch]> SELECT *
-> FROM orders
-> WHERE o_custkey = 3689999
-> AND o_orderdate > date '1995-03-14'
-> AND o_orderstatus = 'O'
-> LIMIT 15;
+------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+-----------------------------------------------------------------+
| O_ORDERKEY | O_CUSTKEY | O_ORDERSTATUS | O_TOTALPRICE | O_ORDERDATE | O_ORDERPRIORITY | O_CLERK | O_SHIPPRIORITY | O_COMMENT |
+------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+-----------------------------------------------------------------+
| 1 | 3689999 | O | 224560.83 | 1996-01-02 | 5-LOW | Clerk#000095055 | 0 | nstructions sleep furiously among |
| 37007300 | 3689999 | O | 189889.17 | 1998-05-12 | 5-LOW | Clerk#000097117 | 0 | ully. carefully busy accoun |
| 110398694 | 3689999 | O | 40190.10 | 1998-06-08 | 3-MEDIUM | Clerk#000071343 | 0 | ideas? quickly thin accounts wake slyly. blithely |
| 166279651 | 3689999 | O | 8270.88 | 1997-10-23 | 1-URGENT | Clerk#000039384 | 0 | ic, final accounts sleep. blithely pending requests nag slyly u |
| 276450979 | 3689999 | O | 43595.81 | 1997-04-26 | 3-MEDIUM | Clerk#000010520 | 0 | quickly alongside of the furiously expr |
| 404928295 | 3689999 | O | 20719.85 | 1996-11-05 | 2-HIGH | Clerk#000027012 | 0 | y regular platelets |
+------------+-----------+---------------+--------------+-------------+-----------------+-----------------+----------------+-----------------------------------------------------------------+
6 rows in set (0.00 sec)
单表事务查询(启用并行查询)的执行计划分析:
+----+-------------+--------+...+-------------+
| id | select_type | table |...| Extra |
+----+-------------+--------+...+-------------+
| 1 | SIMPLE | orders |...| Using where |
+----+-------------+--------+...+-------------+
单表分析查询(禁用并行查询)的结果:
MySQL [tpch]> SELECT avg(o_totalprice)
-> FROM orders
-> WHERE o_orderdate > date '1995-03-14'
-> AND o_orderstatus = 'O'
-> AND o_orderpriority not in ('1-URGENT', '2-HIGH');
+-------------------+
| avg(o_totalprice) |
+-------------------+
| 150271.119856 |
+-------------------+
1 row in set (5 min 39.08 sec)
单表分析查询(禁用并行查询)的执行计划分析:
+----+-------------+--------+...+-------------+
| id | select_type | table |...| Extra |
+----+-------------+--------+...+-------------+
| 1 | SIMPLE | orders |...| Using where |
+----+-------------+--------+...+-------------+
单表分析查询(启用并行查询)的结果:
MySQL [tpch]> SELECT avg(o_totalprice)
-> FROM orders
-> WHERE o_orderdate > date '1995-03-14'
-> AND o_orderstatus = 'O'
-> AND o_orderpriority not in ('1-URGENT', '2-HIGH');
+-------------------+
| avg(o_totalprice) |
+-------------------+
| 150271.119856 |
+-------------------+
1 row in set (20.24 sec)
单表分析查询(启用并行查询)的执行计划分析:
+----+-------------+--------+...+----------------------------------------------------------------------------+
| id | select_type | table |...| Extra |
+----+-------------+--------+...+----------------------------------------------------------------------------+
| 1 | SIMPLE | orders |...| Using where; Using parallel query (5 columns, 2 filters, 1 exprs; 0 extra) |
+----+-------------+--------+...+----------------------------------------------------------------------------+
多表分析查询(禁用并行查询)的结果:
MySQL [tpch]> SELECT
-> l_orderkey,
-> SUM(l_extendedprice * (1-l_discount)) AS revenue,
-> o_orderdate,
-> o_shippriority
-> FROM customer, orders, lineitem
-> WHERE
-> c_mktsegment='AUTOMOBILE'
-> AND c_custkey = o_custkey
-> AND l_orderkey = o_orderkey
-> AND o_orderdate < date '1995-03-14'
-> AND l_shipdate > date '1995-03-14'
-> GROUP BY
-> l_orderkey,
-> o_orderdate,
-> o_shippriority
-> ORDER BY
-> revenue DESC,
-> o_orderdate LIMIT 15;
+------------+-------------+-------------+----------------+
| l_orderkey | revenue | o_orderdate | o_shippriority |
+------------+-------------+-------------+----------------+
| 81011334 | 455300.0146 | 1995-03-07 | 0 |
| 28840519 | 454748.2485 | 1995-03-08 | 0 |
| 16384100 | 450935.1906 | 1995-03-02 | 0 |
| 72587110 | 443895.1245 | 1995-03-01 | 0 |
| 11982337 | 433364.5961 | 1995-02-15 | 0 |
| 34736612 | 428316.3377 | 1995-02-19 | 0 |
| 62597284 | 425985.1162 | 1995-03-04 | 0 |
| 59481859 | 421696.5251 | 1995-03-12 | 0 |
| 76740996 | 421355.8745 | 1995-02-25 | 0 |
| 20601378 | 419369.0300 | 1995-03-13 | 0 |
| 23482308 | 418992.5933 | 1995-02-14 | 0 |
| 3400066 | 418830.9286 | 1995-03-06 | 0 |
| 53367108 | 413322.3462 | 1995-03-06 | 0 |
| 44846022 | 412002.8474 | 1995-03-06 | 0 |
| 41160167 | 409386.8393 | 1995-03-09 | 0 |
+------------+-------------+-------------+----------------+
15 rows in set (22 min 1.33 sec)
多表分析查询(禁用并行查询)的执行计划分析:
+----+-------------+----------+...+----------------------------------------------+
| id | select_type | table |...| Extra |
+----+-------------+----------+...+----------------------------------------------+
| 1 | SIMPLE | customer |...| Using where; Using temporary; Using filesort |
| 1 | SIMPLE | orders |...| Using where |
| 1 | SIMPLE | lineitem |...| Using where |
+----+-------------+----------+...+----------------------------------------------+
多表分析查询(启用并行查询)的结果:
MySQL [tpch]> SELECT
-> l_orderkey,
-> SUM(l_extendedprice * (1-l_discount)) AS revenue,
-> o_orderdate,
-> o_shippriority
-> FROM customer, orders, lineitem
-> WHERE
-> c_mktsegment='AUTOMOBILE'
-> AND c_custkey = o_custkey
-> AND l_orderkey = o_orderkey
-> AND o_orderdate < date '1995-03-14'
-> AND l_shipdate > date '1995-03-14'
-> GROUP BY
-> l_orderkey,
-> o_orderdate,
-> o_shippriority
-> ORDER BY
-> revenue DESC,
-> o_orderdate LIMIT 15;
+------------+-------------+-------------+----------------+
| l_orderkey | revenue | o_orderdate | o_shippriority |
+------------+-------------+-------------+----------------+
| 81011334 | 455300.0146 | 1995-03-07 | 0 |
| 28840519 | 454748.2485 | 1995-03-08 | 0 |
| 16384100 | 450935.1906 | 1995-03-02 | 0 |
| 72587110 | 443895.1245 | 1995-03-01 | 0 |
| 11982337 | 433364.5961 | 1995-02-15 | 0 |
| 34736612 | 428316.3377 | 1995-02-19 | 0 |
| 62597284 | 425985.1162 | 1995-03-04 | 0 |
| 59481859 | 421696.5251 | 1995-03-12 | 0 |
| 76740996 | 421355.8745 | 1995-02-25 | 0 |
| 20601378 | 419369.0300 | 1995-03-13 | 0 |
| 23482308 | 418992.5933 | 1995-02-14 | 0 |
| 3400066 | 418830.9286 | 1995-03-06 | 0 |
| 53367108 | 413322.3462 | 1995-03-06 | 0 |
| 44846022 | 412002.8474 | 1995-03-06 | 0 |
| 41160167 | 409386.8393 | 1995-03-09 | 0 |
+------------+-------------+-------------+----------------+
15 rows in set (39.74 sec)
多表分析查询(启用并行查询)的执行计划分析:
+----+-------------+----------+...+--------------------------------------------------------------------------------------------------------------------------------+
| id | select_type | table |...| Extra |
+----+-------------+----------+...+--------------------------------------------------------------------------------------------------------------------------------+
| 1 | SIMPLE | customer |...| Using where; Using temporary; Using filesort; Using parallel query (2 columns, 1 filters, 0 exprs; 0 extra) |
| 1 | SIMPLE | orders |...| Using where; Using join buffer (Hash Join Outer table orders); Using parallel query (4 columns, 1 filters, 1 exprs; 0 extra) |
| 1 | SIMPLE | lineitem |...| Using where; Using join buffer (Hash Join Outer table lineitem); Using parallel query (5 columns, 1 filters, 1 exprs; 0 extra) |
+----+-------------+----------+...+--------------------------------------------------------------------------------------------------------------------------------+
除了监控 Amazon Aurora 数据库集群指标中所述的 Amazon CloudWatch 指标以外,Aurora 还提供了其他全局状态变量。可以使用这些全局状态变量来帮助监视并行查询执行情况。它们可以让您深入了解为什么优化程序在给定情况下可能使用或不使用并行查询。完整变量列表,请参见监控并行查询。
运行以下命令来查看并行查询相关的状态:
MySQL [tpch]> SHOW GLOBAL STATUS LIKE 'Aurora_pq%';
+--------------------------------------------------------------+-------+
| Variable_name | Value |
+--------------------------------------------------------------+-------+
| Aurora_pq_max_concurrent_requests | 4 |
| Aurora_pq_request_attempted | 1 |
| Aurora_pq_request_attempted_grouping_aggr | 0 |
| Aurora_pq_request_attempted_partition_table | 0 |
| Aurora_pq_request_by_force_config | 1 |
| Aurora_pq_request_by_global_config | 0 |
| Aurora_pq_request_by_hint | 0 |
| Aurora_pq_request_by_session_config | 0 |
| Aurora_pq_request_executed | 1 |
总结
本文通过测试工作负载,展示了各场景下的查询耗时。并行查询的启用对于OLTP事务性查询影响甚微,而对于OLAP分析性查询则显著提高了查询速度。
数据库集群 |
百万行至千万行数据集 |
千万行至亿行数据集 |
查询语句 |
禁用并行查询 |
开启并行查询 |
禁用并行查询 |
开启并行查询 |
单表事务查询 |
0.0003 sec |
0.0003 sec |
0.0005 sec |
0.0005 sec |
单表分析查询 |
4.71 sec |
2.44 sec |
5 min 39.08 sec |
20.24 sec |
多表分析查询 |
18.87 sec |
7.35 sec |
22 min 1.33 sec |
39.74 sec |
说明:单表事务查询为OLTP类查询语句,使用了外键,查询速度很快;单表分析查询和多表分析查询为OLAP类查询语句,需要做全表扫描及关联查询,查询耗时较长。
使用并行查询,您可以对 Aurora MySQL 表运行数据密集型分析查询。在很多情况下,与传统的查询处理分工相比,性能提高了一个数量级,同时还能保持核心事务工作负载的高吞吐量。
只需在 Amazon RDS 管理控制台中单击几次或者下载最新的 AWS 开发工具包或 CLI 即可在新的或现有的 Aurora 集群中启用并行查询,请阅读 Aurora 文档了解更多信息。
您可以将并行查询与全球数据库等其他 Aurora MySQL 5.7 功能结合使用。此外,此功能对 MySQL 5.7 和 MySQL 5.6 数据库的适用性已扩展到包括中国北京和宁夏的20几个AWS 区域。有关提供并行查询的区域完整列表,请参阅 Aurora 定价。
本篇作者