中文字幕国产精品二区,免费无码AV一区二区波多野结衣 ,国产AV影片麻豆精品传媒

公告

上次我們分析了在附加屬性表這樣表結(jié)構(gòu)設(shè)計(jì)下的PostgreSQL查詢效率。由于PostgreSQL眾所周知的所謂“性能”問(wèn)題，所以有必要再用使用MyISAM引擎的MySQL再來(lái)實(shí)驗(yàn)一遍。在我們?cè)敿?xì)分析了兩種常見(jiàn)的開(kāi)源數(shù)據(jù)庫(kù)之后，話題將會(huì)進(jìn)一步引申到按行存儲(chǔ)的數(shù)據(jù)庫(kù)結(jié)構(gòu)以及索引對(duì)于查詢效率的影響。以下實(shí)驗(yàn)中的MySQL為MariaDB發(fā)行版本。還是從建表開(kāi)始：

MariaDB [veil]> show create table cc2;
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                         |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------+
| cc2   | CREATE TABLE `cc2` (
  `contact_id` int(11) NOT NULL,
  `cad_id` int(11) NOT NULL,
  `value` int(11) NOT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+-------+------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

MariaDB [veil]> select count(*) from cc2;
+----------+
| count(*) |
+----------+
|  5904385 |
+----------+
1 row in set (0.00 sec)

MySQL在表上緩存了count(*)的結(jié)果，所以查詢是不需要花費(fèi)時(shí)間的。再來(lái)創(chuàng)建必要的索引：

MariaDB [veil]> create index contact_id_idx on cc2(contact_id);
Query OK, 5904385 rows affected (3.11 sec)             
Records: 5904385  Duplicates: 0  Warnings: 0

MariaDB [veil]> create index cad_id_value_idx on cc2(cad_id, value);
Query OK, 5904385 rows affected (8.17 sec)             
Records: 5904385  Duplicates: 0  Warnings: 0

MariaDB [veil]> select count(*) from cc2 where cad_id = 101 and value = 5;
+----------+
| count(*) |
+----------+
|   998839 |
+----------+
1 row in set (0.18 sec)

我們可以看到，這速度那是剛剛的。比較PostgreSQL中的相同的查詢，速度要快上4倍。但是：

MariaDB [veil]> select count(contact_id) from cc2 where cad_id = 101 and value = 5;
+-------------------+
| count(contact_id) |
+-------------------+
|            998839 |
+-------------------+
1 row in set (0.41 sec)

這個(gè)查詢比前一個(gè)慢，是因?yàn)閏ount(*)是數(shù)返回行的rowid，而count(contact_id)是數(shù)真正的contact_id列，而這個(gè)列的值是不包含在cad_id_value_idx中的，如果創(chuàng)建更多的索引的話，速度就要更快一些了：

MariaDB [veil]> create index cad_id_value_contact_id_idx on cc2(cad_id, value, contact_id);
Query OK, 5904385 rows affected (13.37 sec)            
Records: 5904385  Duplicates: 0  Warnings: 0

MariaDB [veil]> select count(contact_id) from cc2 where cad_id = 101 and value = 5;
+-------------------+
| count(contact_id) |
+-------------------+
|            998839 |
+-------------------+
1 row in set (0.21 sec)

這個(gè)在MySQL中被稱作covering index。大概PostgreSQL 9.2還沒(méi)發(fā)布的index only query也是這個(gè)意思吧，我猜的。
現(xiàn)在，讓我們來(lái)看看MySQL是否能夠在兩個(gè)條件的情況下表現(xiàn)得比PostgreSQL強(qiáng)。首先嘗試INTERSECT吧：

MariaDB [veil]> select count(*) from (select contact_id from cc as a1 where a1.cad_id = 101 and a1.value = 5 intersect select contact_id from cc as a2 where a2.cad_id = 102 and a2.value = 7) as temp;
ERROR 1064 (42000): You have an error in your SQL syntax; check the manual that corresponds to your MySQL server version for the right syntax to use near 'intersect select contact_id from cc as a2 where a2.cad_id = 102 and a2.value = 7' at line 1

我靠，MySQL居然不支持INTERSECT。由于我們知道IN + SUBQUERY肯定是更慢的，所以就只剩INNER JOIN這一種寫(xiě)法了。

MariaDB [veil]> select count(*) from cc as a1, cc as a2 where a1.contact_id = a2.contact_id and a1.cad_id = 101 and a1.value = 5 and a2.cad_id = 102 and a2.value = 7;
+----------+
| count(*) |
+----------+
|   164788 |
+----------+
1 row in set (6.56 sec)

MariaDB [veil]> select count(a1.contact_id) from cc as a1, cc as a2 where a1.contact_id = a2.contact_id and a1.cad_id = 101 and a1.value = 5 and a2.cad_id = 102 and a2.value = 7;
+----------------------+
| count(a1.contact_id) |
+----------------------+
|               164788 |
+----------------------+
1 row in set (6.67 sec)

你沒(méi)看錯(cuò)，這結(jié)果就是這么慘不忍睹。它甚至比PostgreSQL用IN + SUBQUERY實(shí)現(xiàn)得還要慢。看看到底是啥狀況吧：

MariaDB [veil]> explain extended select count(a1.contact_id) from cc as a1, cc as a2 where a1.contact_id = a2.contact_id and a1.cad_id = 101 and a1.value = 5 and a2.cad_id = 102 and a2.value = 7;
+----+-------------+-------+------+------------------------------------+-------------------------+---------+--------------------------------+--------+----------+--------------------------+
| id | select_type | table | type | possible_keys                      | key                     | key_len | ref                            | rows   | filtered | Extra                    |
+----+-------------+-------+------+------------------------------------+-------------------------+---------+--------------------------------+--------+----------+--------------------------+
|  1 | SIMPLE      | a1    | ref  | contact_id,cad_id_value_contact_id | cad_id_value_contact_id | 10      | const,const                    | 808542 |   100.00 | Using where; Using index |
|  1 | SIMPLE      | a2    | ref  | contact_id,cad_id_value_contact_id | cad_id_value_contact_id | 15      | const,const,veil.a1.contact_id |      1 |   100.00 | Using index              |
+----+-------------+-------+------+------------------------------------+-------------------------+---------+--------------------------------+--------+----------+--------------------------+
2 rows in set, 1 warning (0.00 sec)

雖然不像PostgreSQL的分析結(jié)果那般詳細(xì)。但是至少還是可以看出，索引確實(shí)是被利用上了。所以我感覺(jué)，MySQL真的不擅長(zhǎng)復(fù)雜的join。這就是問(wèn)題了，join處理不好，多條件的情況就沒(méi)法支持好了。

對(duì)于我們的第一次嘗試，可以總結(jié)出以下幾點(diǎn)：

基于磁盤(pán)的方案行不通，必須放在內(nèi)存中
無(wú)條件：PostgreSQL慢，MySQL慢
一個(gè)條件：PostgreSQL快，MySQL更快（拜covering index所賜）
兩個(gè)條件：PostgreSQL慢，MySQL更慢。INNER JOIN稍慢于INTERSECT。
三個(gè)條件：PostgreSQL快于MySQL，而且INNER JOIN要快于INTERSECT。

總體來(lái)說(shuō)，要把請(qǐng)求響應(yīng)時(shí)間控制在一秒之內(nèi)那難度是相當(dāng)?shù)拇蟀　Ｋ越Ｋ悸繁仨氉兓?。那我們?cè)賮?lái)看看把行變列會(huì)不會(huì)好一些吧：

MariaDB [veil]> show create table spike;
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                         |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| spike | CREATE TABLE `spike` (
  `contact_id` int(11) NOT NULL,
  `a1` int(11) DEFAULT NULL,
  `a2` int(11) DEFAULT NULL,
  `a3` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

MariaDB [veil]> select * from spike limit 10;
+------------+------+------+------+
| contact_id | a1   | a2   | a3   |
+------------+------+------+------+
|     800001 |    4 | NULL | NULL |
|     800003 |    5 |    7 |    1 |
|     800004 |    3 | NULL | NULL |
|     800005 |    3 | NULL | NULL |
|     800006 |    5 | NULL | NULL |
|     800007 | NULL |    6 | NULL |
|     800008 |    5 | NULL | NULL |
|     800009 |    4 |    8 | NULL |
|     800011 | NULL |    7 | NULL |
|     800012 |    3 |    6 |    1 |
+------------+------+------+------+
10 rows in set (0.01 sec)

列a1來(lái)自于cad_id=101的值，a2來(lái)自于cad_id=102的值，a3來(lái)自于cad_id=6的值。

首先試試一個(gè)條件的性能吧：

MariaDB [veil]> select count(*) from spike where a1 = 5;
+----------+
| count(*) |
+----------+
|   998839 |
+----------+
1 row in set (0.26 sec)

速度不錯(cuò)，即便我們還沒(méi)給a1加索引呢。這是因?yàn)镻ostgreSQL和MySQL這樣的按行存儲(chǔ)的數(shù)據(jù)庫(kù)，在需要做sequential scan的時(shí)候，性能很大程度上決定于行的大小。如果我們有100行，這性能就肯定不會(huì)這么好了。不信的話可以試試：

MariaDB [veil]> alter table spike add column dummy1 TEXT;
Query OK, 4074980 rows affected (1.28 sec)             
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> select count(*) from spike where a1 = 5;
+----------+
| count(*) |
+----------+
|   998839 |
+----------+
1 row in set (0.64 sec)

不出意料，僅僅只是添加了一個(gè)TEXT字段，查詢就慢了這么多。為了克服按行存儲(chǔ)的查詢速度隨行的列數(shù)增加和越來(lái)越慢的問(wèn)題，索引變得特別必要：

MariaDB [veil]> create index a1_idx on spike(a1);        
Query OK, 4074980 rows affected (4.97 sec)             
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> select count(*) from spike where a1 = 5;
+----------+
| count(*) |
+----------+
|   998839 |
+----------+
1 row in set (0.22 sec)

我們可以看到，查詢時(shí)間又變得正常了。因?yàn)閍1_idx是一個(gè)covering index，所以count無(wú)需真的讀取行，只需要查詢索引就可以知道有多少行了。而且MYISAM又不像PostgreSQL那樣，因?yàn)镸VCC還需要對(duì)原始行更新hint bit，所以整個(gè)查詢就和行的大小一點(diǎn)關(guān)系都沒(méi)有了。

MariaDB [veil]> alter table spike add column dummy2 TEXT;
Query OK, 4074980 rows affected (5.00 sec)             
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> alter table spike add column dummy3 TEXT;
Query OK, 4074980 rows affected (5.30 sec)             
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> select count(*) from spike where a1 = 5;
+----------+
| count(*) |
+----------+
|   998839 |
+----------+
1 row in set (0.23 sec)

對(duì)吧，沒(méi)騙你吧。
那么兩個(gè)條件如何？先把表結(jié)構(gòu)回到正常的狀態(tài)，把索引都去掉：

MariaDB [veil]> alter table spike drop column dummy1;
Query OK, 4074980 rows affected (5.19 sec)             
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> alter table spike drop column dummy2;
Query OK, 4074980 rows affected (4.98 sec)             
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> alter table spike drop column dummy3;
Query OK, 4074980 rows affected (3.41 sec)             
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> alter table spike drop index a1_idx;
Query OK, 4074980 rows affected (0.46 sec)             
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> show create table spike;             
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| Table | Create Table                                                                                                                                                                         |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
| spike | CREATE TABLE `spike` (
  `contact_id` int(11) NOT NULL,
  `a1` int(11) DEFAULT NULL,
  `a2` int(11) DEFAULT NULL,
  `a3` int(11) DEFAULT NULL
) ENGINE=MyISAM DEFAULT CHARSET=latin1 |
+-------+--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------+
1 row in set (0.00 sec)

接下來(lái)用兩個(gè)條件做查詢：

MariaDB [veil]> select count(*) from spike where a1 = 5 and a2 = 7;
+----------+
| count(*) |
+----------+
|   164788 |
+----------+
1 row in set (0.30 sec)

速度不錯(cuò)嘛！給a1創(chuàng)建一個(gè)索引，應(yīng)該速度會(huì)更快吧：

MariaDB [veil]> create index a1_idx on spike(a1);                  
Query OK, 4074980 rows affected (3.06 sec)             
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> select count(*) from spike where a1 = 5 and a2 = 7;
+----------+
| count(*) |
+----------+
|   164788 |
+----------+
1 row in set (0.45 sec)

尼瑪居然更慢了……原因不是索引沒(méi)有被使用，而是因?yàn)橛昧怂饕愿?/p>

MariaDB [veil]> explain select count(*) from spike where a1 = 5 and a2 = 7;
+----+-------------+-------+------+---------------+--------+---------+-------+---------+-------------+
| id | select_type | table | type | possible_keys | key    | key_len | ref   | rows    | Extra       |
+----+-------------+-------+------+---------------+--------+---------+-------+---------+-------------+
|  1 | SIMPLE      | spike | ref  | a1_idx        | a1_idx | 5       | const | 1420338 | Using where |
+----+-------------+-------+------+---------------+--------+---------+-------+---------+-------------+
1 row in set (0.01 sec)

a1索引確實(shí)被使用了，但是a2并沒(méi)有在同一個(gè)索引之中。所以這不是一個(gè)covering index，查詢就不能僅僅在索引中查詢了，還要回到原始的行存儲(chǔ)中去檢查a2=7。

MariaDB [veil]> create index a2_idx on spike(a2);
Query OK, 4074980 rows affected (5.71 sec)             
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> select count(*) from spike where a1 = 5 and a2 = 7;
+----------+
| count(*) |
+----------+
|   164788 |
+----------+
1 row in set (0.40 sec)

給a2_idx創(chuàng)建了一個(gè)單獨(dú)的索引也于事無(wú)補(bǔ)，因?yàn)樵趻呙鑑1_idx的時(shí)候，還是沒(méi)有a2。查詢?nèi)匀恍枰氐皆嫉男写鎯?chǔ)中去獲得a2。

MariaDB [veil]> create index a1_a2_idx on spike(a1, a2);
Query OK, 4074980 rows affected (9.47 sec)             
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> select count(*) from spike where a1 = 5 and a2 = 7;
+----------+
| count(*) |
+----------+
|   164788 |
+----------+
1 row in set (0.06 sec)

現(xiàn)在，我們才看到什么叫做速度！它是完全基于索引的。

MariaDB [veil]> alter table spike add column dummy1 TEXT;
Query OK, 4074980 rows affected (12.33 sec)            
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> alter table spike add column dummy2 TEXT;
Query OK, 4074980 rows affected (12.87 sec)            
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> alter table spike add column dummy3 TEXT;
Query OK, 4074980 rows affected (13.15 sec)            
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> select count(*) from spike where a1 = 5 and a2 = 7;
+----------+
| count(*) |
+----------+
|   164788 |
+----------+
1 row in set (0.06 sec)

把行的大小增大也沒(méi)有關(guān)系，因?yàn)閴焊筒粫?huì)去讀真正的行。但是一旦把covering index移除，情況就回到原來(lái)那個(gè)德行了：

MariaDB [veil]> alter table spike drop index a1_a2_idx;
Query OK, 4074980 rows affected (8.74 sec)             
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> select count(*) from spike where a1 = 5 and a2 = 7;
+----------+
| count(*) |
+----------+
|   164788 |
+----------+
1 row in set (0.62 sec)

MariaDB [veil]> alter table spike drop index a1_idx;
Query OK, 4074980 rows affected (5.31 sec)             
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> alter table spike drop index a2_idx;
Query OK, 4074980 rows affected (1.93 sec)             
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> select count(*) from spike where a1 = 5 and a2 = 7;
+----------+
| count(*) |
+----------+
|   164788 |
+----------+
1 row in set (0.72 sec)

所以嘛，wide table的表設(shè)計(jì)確實(shí)會(huì)是一個(gè)問(wèn)題。一旦索引罩不住了，情況就糟糕了。

如果covering index這么棒的話，那為何不給所有的查詢都建立covering index呢？這樣問(wèn)題可不就解決了么。

真是牛逼的想法，試試吧：

MariaDB [veil]> create index a1_a2_a3_idx on spike(a1, a2, a3);
Query OK, 4074980 rows affected (7.03 sec)             
Records: 4074980  Duplicates: 0  Warnings: 0

MariaDB [veil]> select count(*) from spike where a1 = 5 and a2 = 7 and a3 = 1;
+----------+
| count(*) |
+----------+
|     6748 |
+----------+
1 row in set (0.01 sec)

真的耶！covering index又立功了。似乎一旦建立了這樣的一個(gè)索引，所有的查詢問(wèn)題都解決了。

MariaDB [veil]> select count(*) from spike where a1 = 5;                      
+----------+
| count(*) |
+----------+
|   998839 |
+----------+
1 row in set (0.28 sec)

看見(jiàn)沒(méi)，它確實(shí)管用！

MariaDB [veil]> select count(*) from spike where a3 = 1;
+----------+
| count(*) |
+----------+
|   207228 |
+----------+
1 row in set (1.06 sec)

我靠，高興太早了。栽了。原因是索引的順序是a1，a2，a3，對(duì)于a1的查詢因?yàn)楹退饕⒌捻樞蛳喾?，所以用的上。類似的a1，a2的查詢也能用得上。但是a3和建立順序不符，就無(wú)法利用上。如果要真正的達(dá)到覆蓋所有的情況，那么將是一個(gè)排列組合的結(jié)果。對(duì)于十幾列的表來(lái)說(shuō)，這個(gè)數(shù)字會(huì)大得讓系統(tǒng)無(wú)法承受。

總結(jié)起來(lái)就是，無(wú)論是磁盤(pán)還是內(nèi)存，無(wú)論是附加的屬性表，還是把屬性建模成列，無(wú)論是PostgreSQL還是MySQL都沒(méi)有辦法提供Responsive Query。傳統(tǒng)的按行存儲(chǔ)的數(shù)據(jù)庫(kù)無(wú)法滿足多維數(shù)據(jù)的高速查詢需求。

posted on 2012-02-26 16:43 taowen 閱讀(1895) 評(píng)論(1) 收藏舉報(bào)

刷新頁(yè)面返回頂部