四虎影视一区二区精品,免费黄色人成视频在线,日日碰狠狠添天天爽五月婷

公告

有時(shí)需求需要我們把系統(tǒng)做成靈活的。最常見(jiàn)的形式是，屬性不能是固定的，要用戶(hù)可以自定義。這樣的需求往往會(huì)在數(shù)據(jù)庫(kù)中建模成一個(gè)一對(duì)多的關(guān)系。

create table person {

...　　

}

create table person_attribute {

person_id ...

attribute_name ...

attribute_value ...

...

}

這樣的建模在沒(méi)有查詢(xún)需要的時(shí)候，還是蠻不錯(cuò)的。但是一旦需要對(duì)擴(kuò)充的屬性值進(jìn)行查詢(xún)，速度往往慘不忍睹。曾經(jīng)在新加坡做過(guò)一個(gè)電信的遺留系統(tǒng)的前端，其數(shù)據(jù)庫(kù)的建模就是這樣的。對(duì)于中間的屬性表，一個(gè)簡(jiǎn)單的查詢(xún)都需要join好幾次，速度非常慢。好在那次只是做ETL，并不是直接把這樣的數(shù)據(jù)庫(kù)做后端，要不然肯定死的很慘。當(dāng)時(shí)的做法是把所有的數(shù)據(jù)讀入到內(nèi)存中，針對(duì)屬性的查詢(xún)用內(nèi)存集合遍歷來(lái)實(shí)現(xiàn)。這樣做的前提是集合的元素?cái)?shù)量非常少（幾百而已），總數(shù)據(jù)量也非常少。但是如果我們需要處理的數(shù)據(jù)量非常大，那么我們就必須在數(shù)據(jù)庫(kù)中能夠?qū)Χ嗑S數(shù)據(jù)進(jìn)行高效查詢(xún)。

為了搞清楚這個(gè)問(wèn)題，我們需要做一系列實(shí)驗(yàn)。先來(lái)介紹一下我們實(shí)驗(yàn)的對(duì)象。

假定我們有一張contacts表，然后對(duì)每個(gè)聯(lián)系人有一個(gè)contact_categories的表，簡(jiǎn)稱(chēng)cc

create table cc(
  contact_id integer,
  cad_id integer,
  value integer);

cad_id代表字段的id，value是cateogry的值。數(shù)據(jù)量是500萬(wàn)。實(shí)驗(yàn)用的數(shù)據(jù)庫(kù)分別是PostgreSQL（原始數(shù)據(jù)大小250M），和使用MYISAM引擎的MySQL（原始數(shù)據(jù)大小73M）。使用的磁盤(pán)是普通的筆記本硬盤(pán)，沒(méi)有raid，普通的ext4分區(qū)，峰值傳輸率大概是70M/s。假設(shè)沒(méi)有資源的爭(zhēng)搶?zhuān)覕?shù)據(jù)庫(kù)總是以最快的順序讀的方式從磁盤(pán)中加載數(shù)據(jù)，那么PostgreSQL得用4s，而MySQL也需要1s才能把所有的磁盤(pán)內(nèi)容讀到內(nèi)存中。

所以如果我們使用基于磁盤(pán)的解決方案的話(huà)，無(wú)論如何也無(wú)法把查詢(xún)壓縮到1s以?xún)?nèi)。因?yàn)槲覀冃枰峁┮粋€(gè)Reponsive的界面前端，所以數(shù)據(jù)必須能夠在內(nèi)存中被查詢(xún)，可能不是所有的數(shù)據(jù)都能放入內(nèi)存，但是最起碼被查詢(xún)到的數(shù)據(jù)得一直在內(nèi)存中。最簡(jiǎn)單的辦法把數(shù)據(jù)庫(kù)移到內(nèi)存中的方式不是改數(shù)據(jù)庫(kù)的設(shè)置，而是直接把內(nèi)存映射成文件夾：

sudo mount -t ramfs -o size=200000m ramfs /mnt/memory

然后把數(shù)據(jù)庫(kù)的數(shù)據(jù)目錄移動(dòng)到/mnt/memory之中。但是數(shù)據(jù)庫(kù)的緩存設(shè)置也是必須修改的，比如PostgreSQL的work_mem如果設(shè)置過(guò)小的話(huà)，在做對(duì)一個(gè)很大的表做count(distinct xxx)時(shí)就會(huì)導(dǎo)致中間結(jié)果被寫(xiě)入到臨時(shí)表之中。所以我們還是把所有的緩存搞大一些吧。

work_mem = 1000MB
shared_buffer = 1000MB
temp_buffer = 1000MB
effective_cache_size = 1000MB
wal_buffers = 1000MB
auto_vacuum = off

讓我們來(lái)看看最基本的一個(gè)查詢(xún)能有多快吧

taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from cc;'
  count  
---------
 5904385
(1 row)


real    0m0.448s
user    0m0.024s
sys    0m0.000s

count(*)和count(contact_id)是一樣的么，讓我們來(lái)試驗(yàn)一下

taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc;'
  count  
---------
 5904385
(1 row)


real    0m0.537s
user    0m0.016s
sys    0m0.008s

有意思！居然比count(*)還要慢。可能是因?yàn)槲覀儧](méi)有給contact_id字段加索引的原因。好吧，加上索引。

taowen@dmright-perf:~$ time psql postgres -c 'create index contact_id_idx on cc(contact_id);'
CREATE INDEX

real    0m4.848s
user    0m0.016s
sys    0m0.008s
taowen@dmright-perf:~$ time psql postgres -c 'analyze cc;'
ANALYZE

real    0m0.197s
user    0m0.024s
sys    0m0.004s

再試試看

taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc;'
  count  
---------
 5904385
(1 row)


real    0m0.534s
user    0m0.020s
sys    0m0.008s

taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from cc;'
  count  
---------
 5904385
(1 row)


real    0m0.447s
user    0m0.028s
sys    0m0.008s

基本上沒(méi)有變化……count(*)貌似就是比count(contact_id)要快。不管啦，既然小于500ms，也算是夠快了。讓我們給查詢(xún)加上個(gè)條件吧。

taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from cc where cad_id = 101 and value = 5;'
 count  
--------
 998839
(1 row)


real    0m0.686s
user    0m0.024s
sys    0m0.000s

taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc where cad_id = 101 and value = 5;'
 count  
--------
 998839
(1 row)


real    0m0.660s
user    0m0.024s
sys    0m0.000s

我們可以看到，在count(*)和count(contact_id)之間沒(méi)有特別大的區(qū)別。但是我們還沒(méi)有給cad_id和value加索引，所以讓我們加上看看如何

taowen@dmright-perf:~$ time psql postgres -c 'create index cad_id_value_idx on cc(cad_id, value);'
CREATE INDEX

real    0m10.069s
user    0m0.020s
sys    0m0.008s

taowen@dmright-perf:~$ time psql postgres -c 'analyze cc;'
ANALYZE

real    0m0.199s
user    0m0.016s
sys    0m0.012s

and try again.

taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc where cad_id = 101 and value = 5;'
 count  
--------
 998839
(1 row)


real    0m0.283s
user    0m0.020s
sys    0m0.012s

真是很快很快耶！讓我們來(lái)回顧一下，我們現(xiàn)在對(duì)于contact_id和(cad_id, value)都建立了索引。
接下來(lái)我們把查詢(xún)弄得更復(fù)雜一些，加上AND條件。有三種可能的方式：
1、INTERSECT
2、INNER JOIN
3、IN + SUB QUERY
我們每種做法都試試

taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7) as temp'
 count  
--------
 164788
(1 row)


real    0m1.159s
user    0m0.028s
sys    0m0.008s

taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7) as temp' count  
--------
 164788
(1 row)


real    0m1.148s
user    0m0.032s
sys    0m0.000s

顯而易見(jiàn)，這么做很慢。那么INNER JOIN是不是更快一些呢？

taowen@dmright-perf:~$ time psql postgres -c 'select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id and a2.cad_id = 102 and a2.value = 7 where a1.cad_id = 101 and a1.value = 5;'
 count  
--------
 164788
(1 row)


real    0m1.162s
user    0m0.036s
sys    0m0.000s

這大概要慢上個(gè)200ms了。那么IN + SUB QUERY呢？

taowen@dmright-perf:~$ time psql postgres -c 'select count(a1.contact_id) from cc as a1 where a1.cad_id = 101 and a1.value = 5 and a1.contact_id in (select contact_id from cc as a2 where a2.cad_id = 102 and a2.value = 7)'
 count  
--------
 164788
(1 row)


real    0m2.645s
user    0m0.024s
sys    0m0.004s

靠，居然更慢了。總結(jié)就是，在有兩個(gè)條件的情況下，INTERSECT似乎是最快的。但是即便如此，它也超過(guò)了一秒鐘了。為什么會(huì)這樣呢？時(shí)間都花哪里去了？

taowen@dmright-perf:~$ time psql postgres -c 'explain analyze select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7) as temp'
                                                                              QUERY PLAN                                                                               
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=107853.91..107853.92 rows=1 width=4) (actual time=1471.907..1471.907 rows=1 loops=1)
   ->  Subquery Scan on temp  (cost=7467.63..107375.65 rows=191301 width=4) (actual time=1366.131..1459.419 rows=164788 loops=1)
         ->  HashSetOp Intersect  (cost=7467.63..105462.64 rows=191301 width=4) (actual time=1366.129..1439.781 rows=164788 loops=1)
               ->  Append  (cost=7467.63..103326.69 rows=854380 width=4) (actual time=69.436..797.478 rows=1829408 loops=1)
                     ->  Subquery Scan on "*SELECT* 2"  (cost=7467.63..48180.53 rows=351876 width=4) (actual time=69.435..350.031 rows=830569 loops=1)
                           ->  Bitmap Heap Scan on cc  (cost=7467.63..44661.77 rows=351876 width=4) (actual time=69.434..264.538 rows=830569 loops=1)
                                 Recheck Cond: ((cad_id = 102) AND (value = 7))
                                 ->  Bitmap Index Scan on cad_id_value_idx  (cost=0.00..7379.66 rows=351876 width=0) (actual time=64.162..64.162 rows=830569 loops=1)
                                       Index Cond: ((cad_id = 102) AND (value = 7))
                     ->  Subquery Scan on "*SELECT* 1"  (cost=10667.56..55146.16 rows=502504 width=4) (actual time=69.846..331.544 rows=998839 loops=1)
                           ->  Bitmap Heap Scan on cc  (cost=10667.56..50121.12 rows=502504 width=4) (actual time=69.845..233.794 rows=998839 loops=1)
                                 Recheck Cond: ((cad_id = 101) AND (value = 5))
                                 ->  Bitmap Index Scan on cad_id_value_idx  (cost=0.00..10541.94 rows=502504 width=0) (actual time=64.501..64.501 rows=998839 loops=1)
                                       Index Cond: ((cad_id = 101) AND (value = 5))
 Total runtime: 1477.484 ms
(15 rows)

首先，INTERSECT不是并行執(zhí)行的。兩個(gè)子查詢(xún)分別花費(fèi)了300ms以上的時(shí)間，加起來(lái)有800ms是用在搜集contact_id上了。其次，集合之間的并集操作花費(fèi)了600多ms。其余的時(shí)間都花在了數(shù)集合的成員個(gè)數(shù)上了。有一點(diǎn)值得注意的是，根據(jù) http://postgresql.1045698.n5.nabble.com/ANTI-JOIN-needs-table-index-scan-not-possible-td3425340.html index scan和heap scan實(shí)際上都用上了索引。然而，SELECT還是在其之上額外耗費(fèi)了100ms，我猜測(cè)它可能是回到原始的表結(jié)構(gòu)中把行取出來(lái)，以獲得contact_id的值。這在磁盤(pán)上的話(huà)速度會(huì)更慢，因?yàn)闀?huì)是random seek操作。
看完了INTERSECT，讓我們?cè)賮?lái)分析分析Join：

taowen@dmright-perf:~$ time psql postgres -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id and a2.cad_id = 102 and a2.value = 7 where a1.cad_id = 101 and a1.value = 5;'
                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=180366.31..180366.32 rows=1 width=4) (actual time=1280.964..1280.964 rows=1 loops=1)
   ->  Merge Join  (cost=174783.00..180104.02 rows=104912 width=4) (actual time=1043.879..1270.197 rows=164788 loops=1)
         Merge Cond: (a1.contact_id = a2.contact_id)
         ->  Sort  (cost=97705.18..98961.44 rows=502504 width=4) (actual time=586.735..626.292 rows=998839 loops=1)
               Sort Key: a1.contact_id
               Sort Method:  quicksort  Memory: 71397kB
               ->  Bitmap Heap Scan on cc a1  (cost=10667.56..50121.12 rows=502504 width=4) (actual time=75.377..257.403 rows=998839 loops=1)
                     Recheck Cond: ((cad_id = 101) AND (value = 5))
                     ->  Bitmap Index Scan on cad_id_value_idx  (cost=0.00..10541.94 rows=502504 width=0) (actual time=69.565..69.565 rows=998839 loops=1)
                           Index Cond: ((cad_id = 101) AND (value = 5))
         ->  Sort  (cost=77077.83..77957.52 rows=351876 width=4) (actual time=457.131..492.428 rows=830569 loops=1)
               Sort Key: a2.contact_id
               Sort Method:  quicksort  Memory: 63509kB
               ->  Bitmap Heap Scan on cc a2  (cost=7467.63..44661.77 rows=351876 width=4) (actual time=60.039..186.422 rows=830569 loops=1)
                     Recheck Cond: ((cad_id = 102) AND (value = 7))
                     ->  Bitmap Index Scan on cad_id_value_idx  (cost=0.00..7379.66 rows=351876 width=0) (actual time=54.929..54.929 rows=830569 loops=1)
                           Index Cond: ((cad_id = 102) AND (value = 7))
 Total runtime: 1286.846 ms
(18 rows)

貌似大部分時(shí)間都花在了排序上。而且它也不是并行執(zhí)行的。如果后臺(tái)同時(shí)執(zhí)行top命令的話(huà)，就會(huì)發(fā)現(xiàn)只有一個(gè)核是被實(shí)際占用著的。
要是再創(chuàng)建更多的索引呢？會(huì)不會(huì)有幫助？

taowen@dmright-perf:~$ time psql postgres -c 'create index cad_id_value_contact_id on cc(cad_id, value, contact_id);'CREATE INDEX

real    0m10.683s
user    0m0.020s
sys    0m0.004s
taowen@dmright-perf:~$ time psql postgres -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id and a2.cad_id = 102 and a2.value = 7 where a1.cad_id = 101 and a1.value = 5;'
                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=180366.31..180366.32 rows=1 width=4) (actual time=1338.478..1338.478 rows=1 loops=1)
   ->  Merge Join  (cost=174783.00..180104.02 rows=104912 width=4) (actual time=1090.136..1327.312 rows=164788 loops=1)
         Merge Cond: (a1.contact_id = a2.contact_id)
         ->  Sort  (cost=97705.18..98961.44 rows=502504 width=4) (actual time=615.843..657.438 rows=998839 loops=1)
               Sort Key: a1.contact_id
               Sort Method:  quicksort  Memory: 71397kB
               ->  Bitmap Heap Scan on cc a1  (cost=10667.56..50121.12 rows=502504 width=4) (actual time=80.926..275.431 rows=998839 loops=1)
                     Recheck Cond: ((cad_id = 101) AND (value = 5))
                     ->  Bitmap Index Scan on cad_id_value_idx  (cost=0.00..10541.94 rows=502504 width=0) (actual time=75.816..75.816 rows=998839 loops=1)
                           Index Cond: ((cad_id = 101) AND (value = 5))
         ->  Sort  (cost=77077.83..77957.52 rows=351876 width=4) (actual time=474.279..510.866 rows=830569 loops=1)
               Sort Key: a2.contact_id
               Sort Method:  quicksort  Memory: 63509kB
               ->  Bitmap Heap Scan on cc a2  (cost=7467.63..44661.77 rows=351876 width=4) (actual time=65.335..198.655 rows=830569 loops=1)
                     Recheck Cond: ((cad_id = 102) AND (value = 7))
                     ->  Bitmap Index Scan on cad_id_value_idx  (cost=0.00..7379.66 rows=351876 width=0) (actual time=60.314..60.314 rows=830569 loops=1)
                           Index Cond: ((cad_id = 102) AND (value = 7))
 Total runtime: 1346.587 ms
(18 rows)

還是不行！Merge Join慢的話(huà)，升級(jí)到9.1然后強(qiáng)制使用hash join會(huì)不會(huì)好一點(diǎn)？

postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id where a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5';
                                                                        QUERY PLAN                                                                        
----------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=902356.85..902356.86 rows=1 width=4) (actual time=1693.137..1693.137 rows=1 loops=1)
   ->  Hash Join  (cost=59599.78..902135.88 rows=88389 width=4) (actual time=461.788..1682.718 rows=164788 loops=1)
         Hash Cond: (a1.contact_id = a2.contact_id)
         ->  Bitmap Heap Scan on cc a1  (cost=10833.24..50406.32 rows=510472 width=4) (actual time=76.337..224.571 rows=998839 loops=1)
               Recheck Cond: ((cad_id = 101) AND (value = 5))
               ->  Bitmap Index Scan on cad_id_value_idx  (cost=0.00..10705.62 rows=510472 width=0) (actual time=71.028..71.028 rows=998839 loops=1)
                     Index Cond: ((cad_id = 101) AND (value = 5))
         ->  Hash  (cost=44444.15..44444.15 rows=345792 width=4) (actual time=385.256..385.256 rows=830569 loops=1)
               Buckets: 65536  Batches: 1  Memory Usage: 29200kB
               ->  Bitmap Heap Scan on cc a2  (cost=7341.27..44444.15 rows=345792 width=4) (actual time=64.778..258.059 rows=830569 loops=1)
                     Recheck Cond: ((cad_id = 102) AND (value = 7))
                     ->  Bitmap Index Scan on cad_id_value_idx  (cost=0.00..7254.82 rows=345792 width=0) (actual time=59.675..59.675 rows=830569 loops=1)
                           Index Cond: ((cad_id = 102) AND (value = 7))
 Total runtime: 1698.207 ms
(14 rows)

有兩個(gè)原因造成這樣的情況：
1、兩個(gè)條件意味著掃描兩次，而且不是并行掃描
2、join自身很耗費(fèi)時(shí)間，無(wú)論是sort merge join還是hash join

兩個(gè)條件都這熊樣了，三個(gè)條件呢？看看吧，先上INTERSECT：

postgres@dmright-perf:/mnt/memory/pg$ time psql postgres -c 'select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7 intersect select contact_id from cc where cad_id = 6 and value = 1) as temp'
 count 
-------
  6748
(1 row)


real    0m1.350s
user    0m0.020s
sys    0m0.008s

然后是INNER JOIN

postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id inner join cc as a3 on a1.contact_id = a3.contact_id where a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5 and a3.cad_id = 6 and a3.value = 1';
 count 
-------
  6748
(1 row)


real    0m0.756s
user    0m0.028s
sys    0m0.000s

然后是IN + SUBQUERY：

postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'select count(a1.contact_id) from cc as a1 where a1.cad_id = 101 and a1.value = 5 and a1.contact_id in (select contact_id from cc as a2 where a2.cad_id = 102 and a2.value = 7 and a2.contact_id in (select contact_id from cc as a3 where a3.cad_id = 6 and a3.value = 1))';
 count 
-------
  6748
(1 row)


real    0m7.320s
user    0m0.024s
sys    0m0.004s

我們可以看到這回INNER JOIN是最快的了。為什么？

postgres@dmright-perf:/mnt/memory/pg$ time psql postgres -c 'explain analyze select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7 intersect select contact_id from cc where cad_id = 6 and value = 1) as temp'
                                                                                       QUERY PLAN                                                                                        
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=132357.21..132357.22 rows=1 width=4) (actual time=1659.666..1659.666 rows=1 loops=1)
   ->  Subquery Scan on temp  (cost=306.85..132335.71 rows=8602 width=4) (actual time=1648.818..1659.243 rows=6748 loops=1)
         ->  HashSetOp Intersect  (cost=306.85..132249.69 rows=8602 width=4) (actual time=1648.817..1658.543 rows=6748 loops=1)
               ->  Append  (cost=306.85..131691.89 rows=223118 width=4) (actual time=33.797..1547.811 rows=372016 loops=1)
                     ->  Subquery Scan on "*SELECT* 3"  (cost=306.85..26138.13 rows=14239 width=4) (actual time=33.797..153.230 rows=207228 loops=1)
                           ->  Bitmap Heap Scan on cc  (cost=306.85..25995.74 rows=14239 width=4) (actual time=33.795..134.292 rows=207228 loops=1)
                                 Recheck Cond: ((cad_id = 6) AND (value = 1))
                                 ->  Bitmap Index Scan on cad_id_value_idx  (cost=0.00..303.29 rows=14239 width=0) (actual time=28.558..28.558 rows=207228 loops=1)
                                       Index Cond: ((cad_id = 6) AND (value = 1))
                     ->  Result  (cost=7341.27..105553.76 rows=208879 width=4) (actual time=1282.426..1371.255 rows=164788 loops=1)
                           ->  HashSetOp Intersect  (cost=7341.27..105553.76 rows=208879 width=4) (actual time=1282.418..1353.422 rows=164788 loops=1)
                                 ->  Append  (cost=7341.27..103413.10 rows=856264 width=4) (actual time=57.550..734.282 rows=1829408 loops=1)
                                       ->  Subquery Scan on "*SELECT* 2"  (cost=7341.27..47902.07 rows=345792 width=4) (actual time=57.550..292.595 rows=830569 loops=1)
                                             ->  Bitmap Heap Scan on cc  (cost=7341.27..44444.15 rows=345792 width=4) (actual time=57.549..218.162 rows=830569 loops=1)
                                                   Recheck Cond: ((cad_id = 102) AND (value = 7))
                                                   ->  Bitmap Index Scan on cad_id_value_idx  (cost=0.00..7254.82 rows=345792 width=0) (actual time=52.606..52.606 rows=830569 loops=1)
                                                         Index Cond: ((cad_id = 102) AND (value = 7))
                                       ->  Subquery Scan on "*SELECT* 1"  (cost=10833.24..55511.04 rows=510472 width=4) (actual time=69.129..330.737 rows=998839 loops=1)
                                             ->  Bitmap Heap Scan on cc  (cost=10833.24..50406.32 rows=510472 width=4) (actual time=69.128..242.416 rows=998839 loops=1)
                                                   Recheck Cond: ((cad_id = 101) AND (value = 5))
                                                   ->  Bitmap Index Scan on cad_id_value_idx  (cost=0.00..10705.62 rows=510472 width=0) (actual time=64.161..64.161 rows=998839 loops=1)
                                                         Index Cond: ((cad_id = 101) AND (value = 5))
 Total runtime: 1665.691 ms
(23 rows)

對(duì)于INTERSECT來(lái)說(shuō)，過(guò)程和兩個(gè)條件是差不多的，只是集合更大一些罷了。

postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id inner join cc as a3 on a1.contact_id = a3.contact_id where a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5 and a3.cad_id = 6 and a3.value = 1';
                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=118979.47..118979.48 rows=1 width=4) (actual time=771.393..771.393 rows=1 loops=1)
   ->  Nested Loop  (cost=49073.39..118977.89 rows=630 width=4) (actual time=399.835..770.587 rows=6748 loops=1)
         ->  Hash Join  (cost=49073.39..97159.97 rows=2466 width=8) (actual time=399.814..691.519 rows=34219 loops=1)
               Hash Cond: (a3.contact_id = a2.contact_id)
               ->  Bitmap Heap Scan on cc a3  (cost=306.85..25995.74 rows=14239 width=4) (actual time=25.655..83.526 rows=207228 loops=1)
                     Recheck Cond: ((cad_id = 6) AND (value = 1))
                     ->  Bitmap Index Scan on cad_id_value_idx  (cost=0.00..303.29 rows=14239 width=0) (actual time=20.570..20.570 rows=207228 loops=1)
                           Index Cond: ((cad_id = 6) AND (value = 1))
               ->  Hash  (cost=44444.15..44444.15 rows=345792 width=4) (actual time=373.969..373.969 rows=830569 loops=1)
                     Buckets: 65536  Batches: 1  Memory Usage: 29200kB
                     ->  Bitmap Heap Scan on cc a2  (cost=7341.27..44444.15 rows=345792 width=4) (actual time=59.271..250.932 rows=830569 loops=1)
                           Recheck Cond: ((cad_id = 102) AND (value = 7))
                           ->  Bitmap Index Scan on cad_id_value_idx  (cost=0.00..7254.82 rows=345792 width=0) (actual time=54.030..54.030 rows=830569 loops=1)
                                 Index Cond: ((cad_id = 102) AND (value = 7))
         ->  Index Scan using cad_id_value_contact_id_idx on cc a1  (cost=0.00..8.83 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=34219)
               Index Cond: ((cad_id = 101) AND (value = 5) AND (contact_id = a2.contact_id))
 Total runtime: 774.588 ms
(17 rows)

對(duì)于INNER JOIN，索引都被利用上了。它不需要取得contact_id然后再來(lái)做集合操作。結(jié)論是對(duì)rowid做hash操作比集合操作更快。
不顯示用INNER JOIN，讓Planner決定Join順序也是一樣的：

postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'explain analyze select count(a1.contact_id) from cc as a1, cc as a2, cc as a3 where a1.contact_id = a2.contact_id and a2.contact_id = a3.contact_id and a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5 and a3.cad_id = 6 and a3.value = 1';
                                                                           QUERY PLAN                                                                           
----------------------------------------------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=118979.47..118979.48 rows=1 width=4) (actual time=762.969..762.970 rows=1 loops=1)
   ->  Nested Loop  (cost=49073.39..118977.89 rows=630 width=4) (actual time=398.554..762.206 rows=6748 loops=1)
         ->  Hash Join  (cost=49073.39..97159.97 rows=2466 width=8) (actual time=398.531..684.425 rows=34219 loops=1)
               Hash Cond: (a3.contact_id = a2.contact_id)
               ->  Bitmap Heap Scan on cc a3  (cost=306.85..25995.74 rows=14239 width=4) (actual time=34.802..91.672 rows=207228 loops=1)
                     Recheck Cond: ((cad_id = 6) AND (value = 1))
                     ->  Bitmap Index Scan on cad_id_value_idx  (cost=0.00..303.29 rows=14239 width=0) (actual time=29.524..29.524 rows=207228 loops=1)
                           Index Cond: ((cad_id = 6) AND (value = 1))
               ->  Hash  (cost=44444.15..44444.15 rows=345792 width=4) (actual time=363.537..363.537 rows=830569 loops=1)
                     Buckets: 65536  Batches: 1  Memory Usage: 29200kB
                     ->  Bitmap Heap Scan on cc a2  (cost=7341.27..44444.15 rows=345792 width=4) (actual time=57.799..245.467 rows=830569 loops=1)
                           Recheck Cond: ((cad_id = 102) AND (value = 7))
                           ->  Bitmap Index Scan on cad_id_value_idx  (cost=0.00..7254.82 rows=345792 width=0) (actual time=52.349..52.349 rows=830569 loops=1)
                                 Index Cond: ((cad_id = 102) AND (value = 7))
         ->  Index Scan using cad_id_value_contact_id_idx on cc a1  (cost=0.00..8.83 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=34219)
               Index Cond: ((cad_id = 101) AND (value = 5) AND (contact_id = a2.contact_id))
 Total runtime: 766.107 ms
(17 rows)

結(jié)果是一樣的。

基本上對(duì)PostgreSQL的實(shí)驗(yàn)就到這里了，差不多也就這樣了，提高空間不大。Google之后發(fā)現(xiàn)，PostgreSQL使用的MVCC機(jī)制導(dǎo)致其甚至在做SELECT COUNT這樣的操作的時(shí)候也會(huì)去更新hint bit。也許這就是其慢的重要原因。MySQL的MYISAM引擎以完全不負(fù)責(zé)事務(wù)和著稱(chēng)，應(yīng)該會(huì)比PostgreSQL有更大的提升，下一篇中，我們將對(duì)MySQL重復(fù)同樣的實(shí)驗(yàn)。

posted on 2012-02-26 08:33 taowen 閱讀(1401) 評(píng)論(0) 收藏舉報(bào)

刷新頁(yè)面返回頂部