<output id="qn6qe"></output>

    1. <output id="qn6qe"><tt id="qn6qe"></tt></output>
    2. <strike id="qn6qe"></strike>

      亚洲 日本 欧洲 欧美 视频,日韩中文字幕有码av,一本一道av中文字幕无码,国产线播放免费人成视频播放,人妻少妇偷人无码视频,日夜啪啪一区二区三区,国产尤物精品自在拍视频首页,久热这里只有精品12

      有時(shí)需求需要我們把系統(tǒng)做成靈活的。最常見(jiàn)的形式是,屬性不能是固定的,要用戶(hù)可以自定義。這樣的需求往往會(huì)在數(shù)據(jù)庫(kù)中建模成一個(gè)一對(duì)多的關(guān)系。

      create table person {

      ...  

      }
      create table person_attribute {

      person_id ...

      attribute_name ...

      attribute_value ...

      ...

      }

      這樣的建模在沒(méi)有查詢(xún)需要的時(shí)候,還是蠻不錯(cuò)的。但是一旦需要對(duì)擴(kuò)充的屬性值進(jìn)行查詢(xún),速度往往慘不忍睹。曾經(jīng)在新加坡做過(guò)一個(gè)電信的遺留系統(tǒng)的前端,其數(shù)據(jù)庫(kù)的建模就是這樣的。對(duì)于中間的屬性表,一個(gè)簡(jiǎn)單的查詢(xún)都需要join好幾次,速度非常慢。好在那次只是做ETL,并不是直接把這樣的數(shù)據(jù)庫(kù)做后端,要不然肯定死的很慘。當(dāng)時(shí)的做法是把所有的數(shù)據(jù)讀入到內(nèi)存中,針對(duì)屬性的查詢(xún)用內(nèi)存集合遍歷來(lái)實(shí)現(xiàn)。這樣做的前提是集合的元素?cái)?shù)量非常少(幾百而已),總數(shù)據(jù)量也非常少。但是如果我們需要處理的數(shù)據(jù)量非常大,那么我們就必須在數(shù)據(jù)庫(kù)中能夠?qū)Χ嗑S數(shù)據(jù)進(jìn)行高效查詢(xún)。

      為了搞清楚這個(gè)問(wèn)題,我們需要做一系列實(shí)驗(yàn)。先來(lái)介紹一下我們實(shí)驗(yàn)的對(duì)象。

      假定我們有一張contacts表,然后對(duì)每個(gè)聯(lián)系人有一個(gè)contact_categories的表,簡(jiǎn)稱(chēng)cc

      create table cc(
      contact_id integer,
      cad_id integer,
      value integer);

      cad_id代表字段的id,value是cateogry的值。數(shù)據(jù)量是500萬(wàn)。實(shí)驗(yàn)用的數(shù)據(jù)庫(kù)分別是PostgreSQL(原始數(shù)據(jù)大小250M),和使用MYISAM引擎的MySQL(原始數(shù)據(jù)大小73M)。使用的磁盤(pán)是普通的筆記本硬盤(pán),沒(méi)有raid,普通的ext4分區(qū),峰值傳輸率大概是70M/s。假設(shè)沒(méi)有資源的爭(zhēng)搶?zhuān)覕?shù)據(jù)庫(kù)總是以最快的順序讀的方式從磁盤(pán)中加載數(shù)據(jù),那么PostgreSQL得用4s,而MySQL也需要1s才能把所有的磁盤(pán)內(nèi)容讀到內(nèi)存中。

      所以如果我們使用基于磁盤(pán)的解決方案的話(huà),無(wú)論如何也無(wú)法把查詢(xún)壓縮到1s以?xún)?nèi)。因?yàn)槲覀冃枰峁┮粋€(gè)Reponsive的界面前端,所以數(shù)據(jù)必須能夠在內(nèi)存中被查詢(xún),可能不是所有的數(shù)據(jù)都能放入內(nèi)存,但是最起碼被查詢(xún)到的數(shù)據(jù)得一直在內(nèi)存中。最簡(jiǎn)單的辦法把數(shù)據(jù)庫(kù)移到內(nèi)存中的方式不是改數(shù)據(jù)庫(kù)的設(shè)置,而是直接把內(nèi)存映射成文件夾:

      sudo mount -t ramfs -o size=200000m ramfs /mnt/memory

      然后把數(shù)據(jù)庫(kù)的數(shù)據(jù)目錄移動(dòng)到/mnt/memory之中。但是數(shù)據(jù)庫(kù)的緩存設(shè)置也是必須修改的,比如PostgreSQL的work_mem如果設(shè)置過(guò)小的話(huà),在做對(duì)一個(gè)很大的表做count(distinct xxx)時(shí)就會(huì)導(dǎo)致中間結(jié)果被寫(xiě)入到臨時(shí)表之中。所以我們還是把所有的緩存搞大一些吧。

      work_mem = 1000MB
      shared_buffer = 1000MB
      temp_buffer = 1000MB
      effective_cache_size = 1000MB
      wal_buffers = 1000MB
      auto_vacuum = off

      讓我們來(lái)看看最基本的一個(gè)查詢(xún)能有多快吧

      taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from cc;'
      count
      ---------
      5904385
      (1 row)


      real 0m0.448s
      user 0m0.024s
      sys 0m0.000s

      count(*)和count(contact_id)是一樣的么,讓我們來(lái)試驗(yàn)一下

      taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc;'
      count
      ---------
      5904385
      (1 row)


      real 0m0.537s
      user 0m0.016s
      sys 0m0.008s

      有意思!居然比count(*)還要慢。可能是因?yàn)槲覀儧](méi)有給contact_id字段加索引的原因。好吧,加上索引。

      taowen@dmright-perf:~$ time psql postgres -c 'create index contact_id_idx on cc(contact_id);'
      CREATE INDEX

      real 0m4.848s
      user 0m0.016s
      sys 0m0.008s
      taowen@dmright-perf:~$ time psql postgres -c 'analyze cc;'
      ANALYZE

      real 0m0.197s
      user 0m0.024s
      sys 0m0.004s

      再試試看

      taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc;'
      count
      ---------
      5904385
      (1 row)


      real 0m0.534s
      user 0m0.020s
      sys 0m0.008s

      taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from cc;'
      count
      ---------
      5904385
      (1 row)


      real 0m0.447s
      user 0m0.028s
      sys 0m0.008s

      基本上沒(méi)有變化……count(*)貌似就是比count(contact_id)要快。不管啦,既然小于500ms,也算是夠快了。讓我們給查詢(xún)加上個(gè)條件吧。

      taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from cc where cad_id = 101 and value = 5;'
      count
      --------
      998839
      (1 row)


      real 0m0.686s
      user 0m0.024s
      sys 0m0.000s

      taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc where cad_id = 101 and value = 5;'
      count
      --------
      998839
      (1 row)


      real 0m0.660s
      user 0m0.024s
      sys 0m0.000s

      我們可以看到,在count(*)和count(contact_id)之間沒(méi)有特別大的區(qū)別。但是我們還沒(méi)有給cad_id和value加索引,所以讓我們加上看看如何

      taowen@dmright-perf:~$ time psql postgres -c 'create index cad_id_value_idx on cc(cad_id, value);'
      CREATE INDEX

      real 0m10.069s
      user 0m0.020s
      sys 0m0.008s

      taowen@dmright-perf:~$ time psql postgres -c 'analyze cc;'
      ANALYZE

      real 0m0.199s
      user 0m0.016s
      sys 0m0.012s

      and try again.

      taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from cc where cad_id = 101 and value = 5;'
      count
      --------
      998839
      (1 row)


      real 0m0.283s
      user 0m0.020s
      sys 0m0.012s

      真是很快很快耶!讓我們來(lái)回顧一下,我們現(xiàn)在對(duì)于contact_id和(cad_id, value)都建立了索引。
      接下來(lái)我們把查詢(xún)弄得更復(fù)雜一些,加上AND條件。有三種可能的方式:
      1、INTERSECT
      2、INNER JOIN
      3、IN + SUB QUERY
      我們每種做法都試試

      taowen@dmright-perf:~$ time psql postgres -c 'select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7) as temp'
      count
      --------
      164788
      (1 row)


      real 0m1.159s
      user 0m0.028s
      sys 0m0.008s

      taowen@dmright-perf:~$ time psql postgres -c 'select count(*) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7) as temp' count
      --------
      164788
      (1 row)


      real 0m1.148s
      user 0m0.032s
      sys 0m0.000s

      顯而易見(jiàn),這么做很慢。那么INNER JOIN是不是更快一些呢?

      taowen@dmright-perf:~$ time psql postgres -c 'select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id and a2.cad_id = 102 and a2.value = 7 where a1.cad_id = 101 and a1.value = 5;'
      count
      --------
      164788
      (1 row)


      real 0m1.162s
      user 0m0.036s
      sys 0m0.000s

      這大概要慢上個(gè)200ms了。那么IN + SUB QUERY呢?

      taowen@dmright-perf:~$ time psql postgres -c 'select count(a1.contact_id) from cc as a1 where a1.cad_id = 101 and a1.value = 5 and a1.contact_id in (select contact_id from cc as a2 where a2.cad_id = 102 and a2.value = 7)'
      count
      --------
      164788
      (1 row)


      real 0m2.645s
      user 0m0.024s
      sys 0m0.004s

      靠,居然更慢了。總結(jié)就是,在有兩個(gè)條件的情況下,INTERSECT似乎是最快的。但是即便如此,它也超過(guò)了一秒鐘了。為什么會(huì)這樣呢?時(shí)間都花哪里去了?

      taowen@dmright-perf:~$ time psql postgres -c 'explain analyze select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7) as temp'
      QUERY PLAN
      -----------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Aggregate (cost=107853.91..107853.92 rows=1 width=4) (actual time=1471.907..1471.907 rows=1 loops=1)
      -> Subquery Scan on temp (cost=7467.63..107375.65 rows=191301 width=4) (actual time=1366.131..1459.419 rows=164788 loops=1)
      -> HashSetOp Intersect (cost=7467.63..105462.64 rows=191301 width=4) (actual time=1366.129..1439.781 rows=164788 loops=1)
      -> Append (cost=7467.63..103326.69 rows=854380 width=4) (actual time=69.436..797.478 rows=1829408 loops=1)
      -> Subquery Scan on "*SELECT* 2" (cost=7467.63..48180.53 rows=351876 width=4) (actual time=69.435..350.031 rows=830569 loops=1)
      -> Bitmap Heap Scan on cc (cost=7467.63..44661.77 rows=351876 width=4) (actual time=69.434..264.538 rows=830569 loops=1)
      Recheck Cond: ((cad_id = 102) AND (value = 7))
      -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7379.66 rows=351876 width=0) (actual time=64.162..64.162 rows=830569 loops=1)
      Index Cond: ((cad_id = 102) AND (value = 7))
      -> Subquery Scan on "*SELECT* 1" (cost=10667.56..55146.16 rows=502504 width=4) (actual time=69.846..331.544 rows=998839 loops=1)
      -> Bitmap Heap Scan on cc (cost=10667.56..50121.12 rows=502504 width=4) (actual time=69.845..233.794 rows=998839 loops=1)
      Recheck Cond: ((cad_id = 101) AND (value = 5))
      -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10541.94 rows=502504 width=0) (actual time=64.501..64.501 rows=998839 loops=1)
      Index Cond: ((cad_id = 101) AND (value = 5))
      Total runtime: 1477.484 ms
      (15 rows)

      首先,INTERSECT不是并行執(zhí)行的。兩個(gè)子查詢(xún)分別花費(fèi)了300ms以上的時(shí)間,加起來(lái)有800ms是用在搜集contact_id上了。其次,集合之間的并集操作花費(fèi)了600多ms。其余的時(shí)間都花在了數(shù)集合的成員個(gè)數(shù)上了。有一點(diǎn)值得注意的是,根據(jù) http://postgresql.1045698.n5.nabble.com/ANTI-JOIN-needs-table-index-scan-not-possible-td3425340.html index scan和heap scan實(shí)際上都用上了索引。然而,SELECT還是在其之上額外耗費(fèi)了100ms,我猜測(cè)它可能是回到原始的表結(jié)構(gòu)中把行取出來(lái),以獲得contact_id的值。這在磁盤(pán)上的話(huà)速度會(huì)更慢,因?yàn)闀?huì)是random seek操作。
      看完了INTERSECT,讓我們?cè)賮?lái)分析分析Join:

      taowen@dmright-perf:~$ time psql postgres -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id and a2.cad_id = 102 and a2.value = 7 where a1.cad_id = 101 and a1.value = 5;'
      QUERY PLAN
      -----------------------------------------------------------------------------------------------------------------------------------------------------------
      Aggregate (cost=180366.31..180366.32 rows=1 width=4) (actual time=1280.964..1280.964 rows=1 loops=1)
      -> Merge Join (cost=174783.00..180104.02 rows=104912 width=4) (actual time=1043.879..1270.197 rows=164788 loops=1)
      Merge Cond: (a1.contact_id = a2.contact_id)
      -> Sort (cost=97705.18..98961.44 rows=502504 width=4) (actual time=586.735..626.292 rows=998839 loops=1)
      Sort Key: a1.contact_id
      Sort Method: quicksort Memory: 71397kB
      -> Bitmap Heap Scan on cc a1 (cost=10667.56..50121.12 rows=502504 width=4) (actual time=75.377..257.403 rows=998839 loops=1)
      Recheck Cond: ((cad_id = 101) AND (value = 5))
      -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10541.94 rows=502504 width=0) (actual time=69.565..69.565 rows=998839 loops=1)
      Index Cond: ((cad_id = 101) AND (value = 5))
      -> Sort (cost=77077.83..77957.52 rows=351876 width=4) (actual time=457.131..492.428 rows=830569 loops=1)
      Sort Key: a2.contact_id
      Sort Method: quicksort Memory: 63509kB
      -> Bitmap Heap Scan on cc a2 (cost=7467.63..44661.77 rows=351876 width=4) (actual time=60.039..186.422 rows=830569 loops=1)
      Recheck Cond: ((cad_id = 102) AND (value = 7))
      -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7379.66 rows=351876 width=0) (actual time=54.929..54.929 rows=830569 loops=1)
      Index Cond: ((cad_id = 102) AND (value = 7))
      Total runtime: 1286.846 ms
      (18 rows)

      貌似大部分時(shí)間都花在了排序上。而且它也不是并行執(zhí)行的。如果后臺(tái)同時(shí)執(zhí)行top命令的話(huà),就會(huì)發(fā)現(xiàn)只有一個(gè)核是被實(shí)際占用著的。
      要是再創(chuàng)建更多的索引呢?會(huì)不會(huì)有幫助?

      taowen@dmright-perf:~$ time psql postgres -c 'create index cad_id_value_contact_id on cc(cad_id, value, contact_id);'CREATE INDEX

      real 0m10.683s
      user 0m0.020s
      sys 0m0.004s
      taowen@dmright-perf:~$ time psql postgres -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id and a2.cad_id = 102 and a2.value = 7 where a1.cad_id = 101 and a1.value = 5;'
      QUERY PLAN
      -----------------------------------------------------------------------------------------------------------------------------------------------------------
      Aggregate (cost=180366.31..180366.32 rows=1 width=4) (actual time=1338.478..1338.478 rows=1 loops=1)
      -> Merge Join (cost=174783.00..180104.02 rows=104912 width=4) (actual time=1090.136..1327.312 rows=164788 loops=1)
      Merge Cond: (a1.contact_id = a2.contact_id)
      -> Sort (cost=97705.18..98961.44 rows=502504 width=4) (actual time=615.843..657.438 rows=998839 loops=1)
      Sort Key: a1.contact_id
      Sort Method: quicksort Memory: 71397kB
      -> Bitmap Heap Scan on cc a1 (cost=10667.56..50121.12 rows=502504 width=4) (actual time=80.926..275.431 rows=998839 loops=1)
      Recheck Cond: ((cad_id = 101) AND (value = 5))
      -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10541.94 rows=502504 width=0) (actual time=75.816..75.816 rows=998839 loops=1)
      Index Cond: ((cad_id = 101) AND (value = 5))
      -> Sort (cost=77077.83..77957.52 rows=351876 width=4) (actual time=474.279..510.866 rows=830569 loops=1)
      Sort Key: a2.contact_id
      Sort Method: quicksort Memory: 63509kB
      -> Bitmap Heap Scan on cc a2 (cost=7467.63..44661.77 rows=351876 width=4) (actual time=65.335..198.655 rows=830569 loops=1)
      Recheck Cond: ((cad_id = 102) AND (value = 7))
      -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7379.66 rows=351876 width=0) (actual time=60.314..60.314 rows=830569 loops=1)
      Index Cond: ((cad_id = 102) AND (value = 7))
      Total runtime: 1346.587 ms
      (18 rows)

      還是不行!Merge Join慢的話(huà),升級(jí)到9.1然后強(qiáng)制使用hash join會(huì)不會(huì)好一點(diǎn)?

      postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id where a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5';
      QUERY PLAN
      ----------------------------------------------------------------------------------------------------------------------------------------------------------
      Aggregate (cost=902356.85..902356.86 rows=1 width=4) (actual time=1693.137..1693.137 rows=1 loops=1)
      -> Hash Join (cost=59599.78..902135.88 rows=88389 width=4) (actual time=461.788..1682.718 rows=164788 loops=1)
      Hash Cond: (a1.contact_id = a2.contact_id)
      -> Bitmap Heap Scan on cc a1 (cost=10833.24..50406.32 rows=510472 width=4) (actual time=76.337..224.571 rows=998839 loops=1)
      Recheck Cond: ((cad_id = 101) AND (value = 5))
      -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10705.62 rows=510472 width=0) (actual time=71.028..71.028 rows=998839 loops=1)
      Index Cond: ((cad_id = 101) AND (value = 5))
      -> Hash (cost=44444.15..44444.15 rows=345792 width=4) (actual time=385.256..385.256 rows=830569 loops=1)
      Buckets: 65536 Batches: 1 Memory Usage: 29200kB
      -> Bitmap Heap Scan on cc a2 (cost=7341.27..44444.15 rows=345792 width=4) (actual time=64.778..258.059 rows=830569 loops=1)
      Recheck Cond: ((cad_id = 102) AND (value = 7))
      -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7254.82 rows=345792 width=0) (actual time=59.675..59.675 rows=830569 loops=1)
      Index Cond: ((cad_id = 102) AND (value = 7))
      Total runtime: 1698.207 ms
      (14 rows)

      有兩個(gè)原因造成這樣的情況:
      1、兩個(gè)條件意味著掃描兩次,而且不是并行掃描
      2、join自身很耗費(fèi)時(shí)間,無(wú)論是sort merge join還是hash join

      兩個(gè)條件都這熊樣了,三個(gè)條件呢?看看吧,先上INTERSECT:

      postgres@dmright-perf:/mnt/memory/pg$ time psql postgres -c 'select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7 intersect select contact_id from cc where cad_id = 6 and value = 1) as temp'
      count
      -------
      6748
      (1 row)


      real 0m1.350s
      user 0m0.020s
      sys 0m0.008s

      然后是INNER JOIN

      postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id inner join cc as a3 on a1.contact_id = a3.contact_id where a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5 and a3.cad_id = 6 and a3.value = 1';
      count
      -------
      6748
      (1 row)


      real 0m0.756s
      user 0m0.028s
      sys 0m0.000s

      然后是IN + SUBQUERY:

      postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'select count(a1.contact_id) from cc as a1 where a1.cad_id = 101 and a1.value = 5 and a1.contact_id in (select contact_id from cc as a2 where a2.cad_id = 102 and a2.value = 7 and a2.contact_id in (select contact_id from cc as a3 where a3.cad_id = 6 and a3.value = 1))';
      count
      -------
      6748
      (1 row)


      real 0m7.320s
      user 0m0.024s
      sys 0m0.004s

      我們可以看到這回INNER JOIN是最快的了。為什么?

      postgres@dmright-perf:/mnt/memory/pg$ time psql postgres -c 'explain analyze select count(contact_id) from (select contact_id from cc where cad_id = 101 and value = 5 intersect select contact_id from cc where cad_id = 102 and value = 7 intersect select contact_id from cc where cad_id = 6 and value = 1) as temp'
      QUERY PLAN
      -----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
      Aggregate (cost=132357.21..132357.22 rows=1 width=4) (actual time=1659.666..1659.666 rows=1 loops=1)
      -> Subquery Scan on temp (cost=306.85..132335.71 rows=8602 width=4) (actual time=1648.818..1659.243 rows=6748 loops=1)
      -> HashSetOp Intersect (cost=306.85..132249.69 rows=8602 width=4) (actual time=1648.817..1658.543 rows=6748 loops=1)
      -> Append (cost=306.85..131691.89 rows=223118 width=4) (actual time=33.797..1547.811 rows=372016 loops=1)
      -> Subquery Scan on "*SELECT* 3" (cost=306.85..26138.13 rows=14239 width=4) (actual time=33.797..153.230 rows=207228 loops=1)
      -> Bitmap Heap Scan on cc (cost=306.85..25995.74 rows=14239 width=4) (actual time=33.795..134.292 rows=207228 loops=1)
      Recheck Cond: ((cad_id = 6) AND (value = 1))
      -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..303.29 rows=14239 width=0) (actual time=28.558..28.558 rows=207228 loops=1)
      Index Cond: ((cad_id = 6) AND (value = 1))
      -> Result (cost=7341.27..105553.76 rows=208879 width=4) (actual time=1282.426..1371.255 rows=164788 loops=1)
      -> HashSetOp Intersect (cost=7341.27..105553.76 rows=208879 width=4) (actual time=1282.418..1353.422 rows=164788 loops=1)
      -> Append (cost=7341.27..103413.10 rows=856264 width=4) (actual time=57.550..734.282 rows=1829408 loops=1)
      -> Subquery Scan on "*SELECT* 2" (cost=7341.27..47902.07 rows=345792 width=4) (actual time=57.550..292.595 rows=830569 loops=1)
      -> Bitmap Heap Scan on cc (cost=7341.27..44444.15 rows=345792 width=4) (actual time=57.549..218.162 rows=830569 loops=1)
      Recheck Cond: ((cad_id = 102) AND (value = 7))
      -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7254.82 rows=345792 width=0) (actual time=52.606..52.606 rows=830569 loops=1)
      Index Cond: ((cad_id = 102) AND (value = 7))
      -> Subquery Scan on "*SELECT* 1" (cost=10833.24..55511.04 rows=510472 width=4) (actual time=69.129..330.737 rows=998839 loops=1)
      -> Bitmap Heap Scan on cc (cost=10833.24..50406.32 rows=510472 width=4) (actual time=69.128..242.416 rows=998839 loops=1)
      Recheck Cond: ((cad_id = 101) AND (value = 5))
      -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..10705.62 rows=510472 width=0) (actual time=64.161..64.161 rows=998839 loops=1)
      Index Cond: ((cad_id = 101) AND (value = 5))
      Total runtime: 1665.691 ms
      (23 rows)

      對(duì)于INTERSECT來(lái)說(shuō),過(guò)程和兩個(gè)條件是差不多的,只是集合更大一些罷了。

      postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'explain analyze select count(a1.contact_id) from cc as a1 inner join cc as a2 on a1.contact_id = a2.contact_id inner join cc as a3 on a1.contact_id = a3.contact_id where a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5 and a3.cad_id = 6 and a3.value = 1';
      QUERY PLAN
      ----------------------------------------------------------------------------------------------------------------------------------------------------------------
      Aggregate (cost=118979.47..118979.48 rows=1 width=4) (actual time=771.393..771.393 rows=1 loops=1)
      -> Nested Loop (cost=49073.39..118977.89 rows=630 width=4) (actual time=399.835..770.587 rows=6748 loops=1)
      -> Hash Join (cost=49073.39..97159.97 rows=2466 width=8) (actual time=399.814..691.519 rows=34219 loops=1)
      Hash Cond: (a3.contact_id = a2.contact_id)
      -> Bitmap Heap Scan on cc a3 (cost=306.85..25995.74 rows=14239 width=4) (actual time=25.655..83.526 rows=207228 loops=1)
      Recheck Cond: ((cad_id = 6) AND (value = 1))
      -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..303.29 rows=14239 width=0) (actual time=20.570..20.570 rows=207228 loops=1)
      Index Cond: ((cad_id = 6) AND (value = 1))
      -> Hash (cost=44444.15..44444.15 rows=345792 width=4) (actual time=373.969..373.969 rows=830569 loops=1)
      Buckets: 65536 Batches: 1 Memory Usage: 29200kB
      -> Bitmap Heap Scan on cc a2 (cost=7341.27..44444.15 rows=345792 width=4) (actual time=59.271..250.932 rows=830569 loops=1)
      Recheck Cond: ((cad_id = 102) AND (value = 7))
      -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7254.82 rows=345792 width=0) (actual time=54.030..54.030 rows=830569 loops=1)
      Index Cond: ((cad_id = 102) AND (value = 7))
      -> Index Scan using cad_id_value_contact_id_idx on cc a1 (cost=0.00..8.83 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=34219)
      Index Cond: ((cad_id = 101) AND (value = 5) AND (contact_id = a2.contact_id))
      Total runtime: 774.588 ms
      (17 rows)

      對(duì)于INNER JOIN,索引都被利用上了。它不需要取得contact_id然后再來(lái)做集合操作。結(jié)論是對(duì)rowid做hash操作比集合操作更快。
      不顯示用INNER JOIN,讓Planner決定Join順序也是一樣的:

      postgres@dmright-perf:/mnt/memory/pg$ time psql -c 'explain analyze select count(a1.contact_id) from cc as a1, cc as a2, cc as a3 where a1.contact_id = a2.contact_id and a2.contact_id = a3.contact_id and a2.cad_id = 102 and a2.value = 7 and a1.cad_id = 101 and a1.value = 5 and a3.cad_id = 6 and a3.value = 1';
      QUERY PLAN
      ----------------------------------------------------------------------------------------------------------------------------------------------------------------
      Aggregate (cost=118979.47..118979.48 rows=1 width=4) (actual time=762.969..762.970 rows=1 loops=1)
      -> Nested Loop (cost=49073.39..118977.89 rows=630 width=4) (actual time=398.554..762.206 rows=6748 loops=1)
      -> Hash Join (cost=49073.39..97159.97 rows=2466 width=8) (actual time=398.531..684.425 rows=34219 loops=1)
      Hash Cond: (a3.contact_id = a2.contact_id)
      -> Bitmap Heap Scan on cc a3 (cost=306.85..25995.74 rows=14239 width=4) (actual time=34.802..91.672 rows=207228 loops=1)
      Recheck Cond: ((cad_id = 6) AND (value = 1))
      -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..303.29 rows=14239 width=0) (actual time=29.524..29.524 rows=207228 loops=1)
      Index Cond: ((cad_id = 6) AND (value = 1))
      -> Hash (cost=44444.15..44444.15 rows=345792 width=4) (actual time=363.537..363.537 rows=830569 loops=1)
      Buckets: 65536 Batches: 1 Memory Usage: 29200kB
      -> Bitmap Heap Scan on cc a2 (cost=7341.27..44444.15 rows=345792 width=4) (actual time=57.799..245.467 rows=830569 loops=1)
      Recheck Cond: ((cad_id = 102) AND (value = 7))
      -> Bitmap Index Scan on cad_id_value_idx (cost=0.00..7254.82 rows=345792 width=0) (actual time=52.349..52.349 rows=830569 loops=1)
      Index Cond: ((cad_id = 102) AND (value = 7))
      -> Index Scan using cad_id_value_contact_id_idx on cc a1 (cost=0.00..8.83 rows=1 width=4) (actual time=0.002..0.002 rows=0 loops=34219)
      Index Cond: ((cad_id = 101) AND (value = 5) AND (contact_id = a2.contact_id))
      Total runtime: 766.107 ms
      (17 rows)

      結(jié)果是一樣的。

      基本上對(duì)PostgreSQL的實(shí)驗(yàn)就到這里了,差不多也就這樣了,提高空間不大。Google之后發(fā)現(xiàn),PostgreSQL使用的MVCC機(jī)制導(dǎo)致其甚至在做SELECT COUNT這樣的操作的時(shí)候也會(huì)去更新hint bit。也許這就是其慢的重要原因。MySQL的MYISAM引擎以完全不負(fù)責(zé)事務(wù)和著稱(chēng),應(yīng)該會(huì)比PostgreSQL有更大的提升,下一篇中,我們將對(duì)MySQL重復(fù)同樣的實(shí)驗(yàn)。


























       



      posted on 2012-02-26 08:33  taowen  閱讀(1401)  評(píng)論(0)    收藏  舉報(bào)
      主站蜘蛛池模板: 日韩av毛片福利国产福利| 欧美激情在线播放| 狠狠色狠狠色综合| 免费无码影视在线观看mov| 亚洲精品一区二区毛豆| 国产在线精品一区二区三区直播| 三级4级全黄60分钟| 亚洲啪啪精品一区二区的| 美女胸18下看禁止免费视频| 中文字幕日韩精品有码| 精品偷拍一区二区三区在| 成人无码www在线看免费 | 永登县| 亚洲www永久成人网站| 人妻少妇精品视频三区二区 | 日本a在线播放| 成人精品一区二区三区四| 亚洲产在线精品亚洲第一站一| 欧美乱码卡一卡二卡四卡免费| 精品无码久久久久久尤物| 国产福利萌白酱在线观看视频| 亚洲熟妇自偷自拍另亚洲| 国产高清自产拍av在线| 日日碰狠狠躁久久躁综合小说 | 任我爽精品视频在线播放| 国内精品一区二区不卡| 日本黄漫动漫在线观看视频| 国产又爽又黄又刺激的视频| 亚洲人成网站免费播放| 许昌市| 亚洲无码精品视频| 亚洲成av人片无码不卡播放器| 26uuu另类亚洲欧美日本| 国产精品线在线精品| 亚洲不卡av不卡一区二区| 漂亮人妻中文字幕丝袜| 久久精品国产亚洲av熟女| 国产手机在线αⅴ片无码观看| a级亚洲片精品久久久久久久| 日韩高清砖码一二区在线| 五月天天天综合精品无码|