一个leftjoinSQL简单优化分析
有个关联查询的sql,需要2秒多,于是进行查看一番:
创新互联专注为客户提供全方位的互联网综合服务,包含不限于成都做网站、网站设计、香洲网络推广、成都微信小程序、香洲网络营销、香洲企业策划、香洲品牌公关、搜索引擎seo、人物专访、企业宣传片、企业代运营等,从售前售中售后,我们都将竭诚为您服务,您的肯定,是我们最大的嘉奖;创新互联为所有大学生创业者提供香洲建站搭建服务,24小时服务热线:18982081108,官方网址:www.cdcxhl.com
SELECT a.id, a.brand_id, a.series_id, a.product_id, a.material_id, a.custom_category_id, a.price, a.product_url, a.organ_id, ..... FROM pm_brand_xxxx a LEFT JOIN pm_brand_yyyyy d ON a.series_id = d.id WHERE a.is_delete = 0 AND d.is_delete = 0 AND a.organ_id = 'Cxxx' AND a.brand_id = 6491603 AND d.brand_id = 6491603 AND a.model_flag = 14;
MySQL> show profile for query 4; +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000072 | | checking permissions | 0.000002 | | checking permissions | 0.000002 | | Opening tables | 0.000011 | | init | 0.000026 | | System lock | 0.000007 | | optimizing | 0.000016 | | statistics | 0.000142 | | preparing | 0.000018 | | executing | 0.000002 | | Sending data | 2.281192 |<<<<<<<执行的主要时间消耗 | end | 0.000007 | | query end | 0.000011 | | closing tables | 0.000011 | | freeing items | 0.000030 | | logging slow query | 0.000003 | | logging slow query | 0.000102 | | cleaning up | 0.000022 | +----------------------+----------+
+----+-------------+-------+------------+-------------+---------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------+---------+-------+-------+----------+------------------------------------------------------------------------------------------------------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------------+---------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------+---------+-------+-------+----------+------------------------------------------------------------------------------------------------------------------------------------------------+ | 1 | SIMPLE | d | NULL | ref | PRIMARY,idx_pm_yyyy_bid | idx_pm_yyyyy_bid | 9 | const | 1 | 10.00 | Using where | | 1 | SIMPLE | a | NULL | index_merge | idx_pm_xxxx_sid,idx_pm_xxx_bid,idx_pm_brand_xxxx_organ | idx_pm_xxx_organ,idx_pm_brand_xxxx_bid | 99,9 | NULL | 11314 | 0.04 | Using intersect(idx_pm_xxxxx_organ,idx_pm_xxxx_bid); Using where; Using join buffer (Block Nested Loop) | +----+-------------+-------+------------+-------------+---------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------+---------+-------+-------+----------+------------------------------------------------------------------------------------------------------------------------------------------------+ 2 rows in set, 1 warning (0.00 sec)
从执行计划来看,d表是做了驱动表,a做了被驱动表
d表 type = ref ,使用非唯一性索引或者唯一索引的前缀扫描,返回匹配某个单独值的记录行,这里使用了索引idx_pm_yyyyy_bid,该索引正是brand_id上的索引,
即是说,在和a表的关联中d先通过brand_id来查找记录行,再通过相应记录的id去和a表的series_id做匹配。
我查看相应的记录数,发现a表145万的大表,d表是4075的小表。
a表
mysql> select count(*) from pm_xxxxxx;
+----------+
| count(*) |
+----------+
| 1459777 |
+----------+
1 row in set (0.27 sec)
d表:
mysql> select count(*) from pm_yyyyyy;
+----------+
| count(*) |
+----------+
| 4075 |
+----------+
1 row in set (0.00 sec)
而 a表是type=index_merge 索引合并,这里走了idx_pm_xxx_organ(organ_id),idx_pm_brand_xxxx_bid(brand_id) ,extra 是
Using intersect(idx_pm_xxxxx_organ,idx_pm_xxxx_bid); Using where; Using join buffer (Block Nested Loop)
Using intersect正说明了这里使用了(idx_pm_xxxxx_organ,idx_pm_xxxx_bid)的交集
Using where 是用model_flag等这些其他条件的过滤
Using join buffer (Block Nested Loop) 说明使用BNL的算法进行匹配
BNL 算法是将外层循环的行/结果集(驱动表)存入join buffer, 内层循环的每一行与整个buffer中的记录做比较,从而减少内层循环的次数.
举例来说,外层循环的结果集是100行,使用NLJ 算法需要扫描内部表100次,如果使用BNL算法,先把对Outer Loop表(外部表)每次读取的10行记录放到join buffer,然后在InnerLoop表(内部表)中直接匹配这10行数据,内存循环就可以一次与这10行进行比较, 这样只需要比较10次,对内部表的扫描减少了9/10。所以BNL算法就能够显著减少内层循环表扫描的次数.
在这里就是d表中取得结果集分批放入buffer中与a表进行匹配。
而这个语句无论如何都要2秒中,也在我们的认识中小表驱动大表并没错,我的猜想应该就是在进行BNL时消耗了时间,表现到过程中就是 Sending data 的时间消耗增多。
吐槽的是mysql中貌似没有什么办法来多方面看查询消耗了。
我想到的是如果该表现有sql关联的顺序是否性能能改善,在该sql中,我发现了两个条件:
AND a.brand_id = 6491603
AND d.brand_id = 6491603
在业务逻辑上这两个表的字段应该是一致的,如果我将d表的d.brand_id = 6491603去掉,以上的执行计划应该会改变,于是去掉之后执行,执行时间非常小。
mysql> show profile for query 1; +----------------------+----------+ | Status | Duration | +----------------------+----------+ | starting | 0.000080 | | checking permissions | 0.000002 | | checking permissions | 0.000002 | | Opening tables | 0.000012 | | init | 0.000030 | | System lock | 0.000006 | | optimizing | 0.000014 | | statistics | 0.000130 | | preparing | 0.000016 | | executing | 0.000001 | | Sending data | 0.027325 | | end | 0.000003 | | query end | 0.000015 | | closing tables | 0.000005 | | freeing items | 0.000014 | | cleaning up | 0.000009 | +----------------------+----------+ 16 rows in set, 1 warning (0.00 sec)
看其执行计划: +----+-------------+-------+------------+-------------+---------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------+---------+-------------------------+-------+----------+---------------------------------------------------------------------------------------------------------+ | id | select_type | table | partitions | type | possible_keys | key | key_len | ref | rows | filtered | Extra | +----+-------------+-------+------------+-------------+---------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------+---------+-------------------------+-------+----------+---------------------------------------------------------------------------------------------------------+ | 1 | SIMPLE | a | NULL | index_merge | idx_pm_xxxxx_sid,idx_pm_xxxxx_bid,idx_pm_xxxx_organ | idx_pm_xxxxx_organ,idx_pm_xxxx_bid | 99,9 | NULL | 11315 | 1.00 | Using intersect(idx_pm_xxxxx_organ,idx_pm_xxxx_bid); Using where | | 1 | SIMPLE | d | NULL | eq_ref | PRIMARY | PRIMARY | 8 | xxxx.a.series_id | 1 | 10.00 | Using where | +----+-------------+-------+------------+-------------+---------------------------------------------------------------------------------------------------------------+---------------------------------------------------------------------------+---------+-------------------------+-------+----------+---------------------------------------------------------------------------------------------------------+ 2 rows in set, 1 warning (0.00 sec)
发现变成了a表做驱动表,d表做被驱动表,从extra列看
a表是Using intersect(idx_pm_xxxxx_organ,idx_pm_xxxx_bid); Using where 依然是使用索引合并,where条件来取结果,使用了idx_pm_xxxxx_organ,idx_pm_xxxx_bid 连个索引。
d表走PRIMARY 主键索引,从ref列来看是通过a表的series_id 来关联,这样效率表提升了。
需要说的一点是,小结果集并不代表就是小表,大表也可以有小结果集,当大表用来被匹配并被扫描多次,自然效率并不高.
本文名称:一个leftjoinSQL简单优化分析
本文路径:http://pcwzsj.com/article/posipe.html