Environment
OS: openEuler 22.03 LTS SP3
Database: PostgreSQL 16.3
Two servers with PostgreSQL already installed:
10.11.110.46 node1
10.11.110.47 node2
Configure passwordless SSH between node1 and node2, and allow the postgres user to start/stop the database service via systemd without a password:
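The SSH key exchange can be sketched as follows (run as the postgres user; the ed25519 key type and default key path are assumptions, adjust to your policy):

```shell
# On node1, as postgres (repeat on node2, swapping the target host):
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519   # skip if a key already exists
ssh-copy-id postgres@node2
# Verify: this must succeed without any password prompt.
ssh -o BatchMode=yes postgres@node2 true && echo "passwordless SSH OK"
```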
echo "postgres ALL = (root) NOPASSWD:/usr/bin/systemctl start postgresql.service,/usr/bin/systemctl restart postgresql.service,/usr/bin/systemctl stop postgresql.service,/usr/bin/systemctl reload postgresql.service,/usr/bin/systemctl status postgresql.service" | sudo tee /etc/sudoers.d/postgres
chmod 0440 /etc/sudoers.d/postgres
visudo -c
Note: because the sudoers rule lists exact command paths, you must also use the full path (/usr/bin/systemctl) when running these commands manually.
Map IPs to hostnames in /etc/hosts:
10.11.110.46 node1
10.11.110.47 node2
Download the repmgr source tarball:
https://www.repmgr.org/download/repmgr-5.4.1.tar.gz
Install repmgr
su - postgres
tar zxvf repmgr-5.4.1.tar.gz
cd repmgr-5.4.1
export PG_CONFIG=/usr/local/postgresql/bin/pg_config
./configure
make
make install
If the build fails with linker errors like the following, development packages are missing:
/usr/bin/ld: cannot find -lcurl
/usr/bin/ld: cannot find -ljson-c
Install the missing dependencies:
dnf install libcurl-devel json-c-devel -y
Configure repmgr
PostgreSQL configuration
Use node1 as the primary node. Edit /usr/local/postgresql/data/postgresql.conf:
max_wal_senders = 10
max_replication_slots = 10
wal_level = replica
hot_standby = on
archive_mode = on
wal_log_hints = on
archive_command = '/bin/true'
shared_preload_libraries = 'repmgr'
Edit /usr/local/postgresql/data/pg_hba.conf so the repmgr user has the needed access and can connect in replication mode:
local replication repmgr trust
host replication repmgr 127.0.0.1/32 trust
host replication repmgr 10.11.110.46/32 trust
host replication repmgr 10.11.110.47/32 trust
local repmgr repmgr trust
host repmgr repmgr 127.0.0.1/32 trust
host repmgr repmgr 10.11.110.46/32 trust
host repmgr repmgr 10.11.110.47/32 trust
After editing, restart the database: sudo /usr/bin/systemctl restart postgresql.service
On node1, create the repmgr superuser and the repmgr database:
su - postgres
createuser -s repmgr
createdb repmgr -O repmgr
psql
postgres=# ALTER USER repmgr ENCRYPTED PASSWORD 'repmgr';
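The pg_hba.conf entries above use trust, so repmgr needs no password. If you later tighten those entries to scram-sha-256, give the postgres OS user a ~/.pgpass on both nodes (chmod 0600) so repmgr and the walreceiver can still connect non-interactively; a sketch using the password set above:

```text
# ~/.pgpass -- host:port:database:user:password
node1:5432:replication:repmgr:repmgr
node2:5432:replication:repmgr:repmgr
node1:5432:repmgr:repmgr:repmgr
node2:5432:repmgr:repmgr:repmgr
```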
Standby preparation
On node2, stop PostgreSQL and delete its data directory (if the database was never initialized there, nothing needs to be done).
From the standby, test that the primary can be reached:
psql 'host=node1 user=repmgr dbname=repmgr connect_timeout=2'
Create the repmgr configuration file
On node1 and node2, create /usr/local/postgresql/repmgr/repmgr.conf (this is the path every repmgr command below references with -f).
repmgr.conf on node1:
node_id=1
node_name=node1
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/usr/local/postgresql/data'
log_file='/usr/local/postgresql/log/repmgr.log'
pg_bindir='/usr/local/postgresql/bin'
ssh_options='-q -o ConnectTimeout=10'
failover='automatic'
promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /usr/local/postgresql/repmgr/repmgr.conf --log-to-file'
follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /usr/local/postgresql/repmgr/repmgr.conf --log-to-file --upstream-node-id=%n'
service_start_command='sudo systemctl start postgresql.service'
service_stop_command='sudo systemctl stop postgresql.service'
service_restart_command='sudo systemctl restart postgresql.service'
service_reload_command='sudo systemctl reload postgresql.service'
repmgr.conf on node2:
node_id=2
node_name=node2
conninfo='host=node2 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/usr/local/postgresql/data'
log_file='/usr/local/postgresql/log/repmgr.log'
pg_bindir='/usr/local/postgresql/bin'
ssh_options='-q -o ConnectTimeout=10'
failover='automatic'
promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /usr/local/postgresql/repmgr/repmgr.conf --log-to-file'
follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /usr/local/postgresql/repmgr/repmgr.conf --log-to-file --upstream-node-id=%n'
service_start_command='sudo systemctl start postgresql.service'
service_stop_command='sudo systemctl stop postgresql.service'
service_restart_command='sudo systemctl restart postgresql.service'
service_reload_command='sudo systemctl reload postgresql.service'
For other options, see the sample configuration: https://raw.githubusercontent.com/EnterpriseDB/repmgr/master/repmgr.conf.sample
Register the primary node
On node1, register the primary node with repmgr. This installs the repmgr extension and metadata objects, and adds a metadata record for the primary server:
repmgr -f /usr/local/postgresql/repmgr/repmgr.conf primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
Check the cluster status:
repmgr -f /usr/local/postgresql/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=node1 user=repmgr dbname=repmgr connect_timeout=2
The repmgr metadata looks like this:
[postgres@node1 data]$ psql -Urepmgr
repmgr=# \x
Expanded display is on.
repmgr=# select * from repmgr.nodes;
-[ RECORD 1 ]----+-------------------------------------------------------
node_id | 1
upstream_node_id |
active | t
node_name | node1
type | primary
location | default
priority | 100
conninfo | host=node1 user=repmgr dbname=repmgr connect_timeout=2
repluser | repmgr
slot_name |
config_file | /usr/local/postgresql/repmgr/repmgr.conf
Every server in the replication cluster has its own record here. When repmgrd is in use, the upstream_node_id, active and type fields are updated whenever a node's state or role changes.
Clone the standby
On node2, do a dry run (--dry-run) to check that the primary can be cloned:
[postgres@node2 data]$ repmgr -h node1 -U repmgr -d repmgr -f /usr/local/postgresql/repmgr/repmgr.conf standby clone --dry-run
NOTICE: destination directory "/usr/local/postgresql/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=node1 user=repmgr dbname=repmgr
DETAIL: current installation size is 29 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: replication slot usage not requested; no replication slot will be set up for this standby
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: would execute:
/usr/local/postgresql/bin/pg_basebackup -l "repmgr base backup" -D /usr/local/postgresql/data -h node1 -p 5432 -U repmgr -X stream
INFO: all prerequisites for "standby clone" are met
If there are no errors, run the actual clone:
[postgres@node2 postgresql]$ repmgr -h node1 -U repmgr -d repmgr -f /usr/local/postgresql/repmgr/repmgr.conf standby clone
NOTICE: destination directory "/usr/local/postgresql/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=node1 user=repmgr dbname=repmgr
DETAIL: current installation size is 30 MB
INFO: replication slot usage not requested; no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/usr/local/postgresql/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
/usr/local/postgresql/bin/pg_basebackup -l "repmgr base backup" -D /usr/local/postgresql/data -h node1 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: sudo systemctl start pgserver
HINT: after starting the server, you need to register this standby with "repmgr standby register"
Under the hood this clones the primary's data directory with pg_basebackup. The primary's configuration files (postgresql.conf, postgresql.auto.conf, pg_hba.conf and pg_ident.conf) are copied into the standby's data directory as well. If no standby-specific changes to them are needed, the standby can be started right away:
sudo /usr/bin/systemctl start postgresql.service
Verify replication
On the primary (node1), check the replication status:
[postgres@node1 repmgr]$ psql -Urepmgr
psql (16.3)
Type "help" for help.
repmgr=# \x
Expanded display is on.
repmgr=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 7665
usesysid | 16388
usename | repmgr
application_name | node2
client_addr | 10.11.110.47
client_hostname |
client_port | 40102
backend_start | 2024-05-28 17:39:27.557961+08
backend_xmin |
state | streaming
sent_lsn | 0/7000328
write_lsn | 0/7000328
flush_lsn | 0/7000328
replay_lsn | 0/7000328
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
reply_time | 2024-05-28 21:48:53.074483+08
On the standby (node2), check the WAL receiver status:
[postgres@node2 repmgr]$ psql -Urepmgr
psql (16.3)
Type "help" for help.
repmgr=# \x
Expanded display is on.
repmgr=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid | 6255
status | streaming
receive_start_lsn | 0/7000000
receive_start_tli | 1
written_lsn | 0/7000328
flushed_lsn | 0/7000328
received_tli | 1
last_msg_send_time | 2024-05-28 21:48:33.053217+08
last_msg_receipt_time | 2024-05-28 21:48:33.054117+08
latest_end_lsn | 0/7000328
latest_end_time | 2024-05-28 17:42:57.692486+08
slot_name |
sender_host | node1
sender_port | 5432
conninfo | user=repmgr passfile=/home/postgres/.pgpass channel_binding=disable connect_timeout=2 dbname=replication host=node1 port=5432 application_name=node2 fallback_application_name=walreceiver sslmode=disable sslcompression=0 sslcertmode=disable sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable
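Byte-level replication lag can be computed from the LSNs above (sent_lsn vs replay_lsn). An LSN such as 0/7000328 is two hexadecimal halves; the absolute byte position is high * 2^32 + low. A small bash helper (a sketch; the sample values are taken from the output above):

```shell
#!/bin/bash
# Convert a PostgreSQL LSN ("hi/lo", both hex) to an absolute byte position.
lsn_to_bytes() {
  local hi=${1%%/*} lo=${1##*/}
  echo $(( 16#$hi * 4294967296 + 16#$lo ))
}

sent=$(lsn_to_bytes 0/7000328)     # sent_lsn from pg_stat_replication
replay=$(lsn_to_bytes 0/7000328)   # replay_lsn from pg_stat_replication
echo "lag: $(( sent - replay )) bytes"
```

On the server itself the same calculation is available as select pg_wal_lsn_diff(sent_lsn, replay_lsn) from pg_stat_replication;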
Register the standby
On node2, register it as a standby with repmgr:
[postgres@node2 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf standby register
INFO: connecting to local node "node2" (ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "node2" (ID: 2) successfully registered
Now check the cluster status again:
[postgres@node2 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=node1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | standby | running | node1 | default | 100 | 1 | host=node2 user=repmgr dbname=repmgr connect_timeout=2
Manual switchover test
On standby node2, run a switchover to manually promote node2 to primary:
[postgres@node2 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf standby switchover
NOTICE: executing switchover on node "node2" (ID: 2)
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "node1" (ID: 1)
NOTICE: issuing CHECKPOINT on node "node1" (ID: 1)
DETAIL: executing server command "sudo systemctl stop postgresql.service"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/9000028
NOTICE: waiting up to 30 seconds (parameter "wal_receive_check_timeout") for received WAL to flush to disk
INFO: sleeping 1 of maximum 30 seconds waiting for standby to flush received WAL to disk
INFO: sleeping 2 of maximum 30 seconds waiting for standby to flush received WAL to disk
INFO: sleeping 3 of maximum 30 seconds waiting for standby to flush received WAL to disk
[...]
INFO: sleeping 30 of maximum 30 seconds waiting for standby to flush received WAL to disk
WARNING: local node "node2" is behind shutdown primary "node1"
DETAIL: local node last receive LSN is 0/82A0000, primary shutdown checkpoint LSN is 0/9000028
NOTICE: aborting switchover
HINT: use --always-promote to force promotion of standby
[postgres@node2 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+---------------+----------+----------+----------+----------+--------------------------------------------------------
1 | node1 | primary | ? unreachable | ? | default | 100 | | host=node1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | standby | running | ? node1 | default | 100 | 1 | host=node2 user=repmgr dbname=repmgr connect_timeout=2
The first switchover may fail like this: the primary was stopped, but repmgr aborts with WARNING: local node "node2" is behind shutdown primary "node1". A commonly cited workaround is to change archive_command in postgresql.conf as follows, which delays archiving long enough for the final WAL to reach the standby:
archive_command = '{ sleep 5;true; }'
Reload the configuration with sudo /usr/bin/systemctl reload postgresql.service, then retry the switchover:
[postgres@node2 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf standby switchover
NOTICE: executing switchover on node "node2" (ID: 2)
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "node1" (ID: 1)
NOTICE: issuing CHECKPOINT on node "node1" (ID: 1)
DETAIL: executing server command "sudo systemctl stop postgresql.service"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/A000028
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary
NOTICE: node "node2" (ID: 2) promoted to primary, node "node1" (ID: 1) demoted to standby
NOTICE: switchover was successful
DETAIL: node "node2" is now primary and node "node1" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
Check the cluster status again:
[postgres@node2 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
1 | node1 | standby | running | node2 | default | 100 | 1 | host=node1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | primary | * running | | default | 100 | 2 | host=node2 user=repmgr dbname=repmgr connect_timeout=2
Automatic failover with repmgrd
Start repmgrd
On node1 and node2, create the systemd unit file /lib/systemd/system/repmgrd.service:
[Unit]
Description=A replication manager, and failover management tool for PostgreSQL
After=syslog.target
After=network.target
After=postgresql.service
[Service]
Type=forking
User=postgres
Group=postgres
# PID file
PIDFile=/usr/local/postgresql/data/repmgrd.pid
# Location of repmgr conf file:
Environment=REPMGRDCONF=/usr/local/postgresql/repmgr/repmgr.conf
Environment=PIDFILE=/usr/local/postgresql/data/repmgrd.pid
# Where to send early-startup messages from the server
# This is normally controlled by the global default set by systemd
# StandardOutput=syslog
ExecStart=/usr/local/postgresql/bin/repmgrd -f ${REPMGRDCONF} -p ${PIDFILE} -d --verbose
ExecStop=/usr/bin/kill -TERM $MAINPID
ExecReload=/usr/bin/kill -HUP $MAINPID
# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec=300
[Install]
WantedBy=multi-user.target
Enable and start the repmgrd service on node1 and node2:
systemctl enable repmgrd --now
Automatic failover test
Current cluster status:
[postgres@node2 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
1 | node1 | standby | running | node2 | default | 100 | 2 | host=node1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | primary | * running | | default | 100 | 2 | host=node2 user=repmgr dbname=repmgr connect_timeout=2
The primary is currently node2; simulate a primary failure:
[postgres@node2 ~]$ sudo systemctl stop postgresql.service
After a short wait, check the cluster status: node1 has been promoted to primary.
[postgres@node1 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 3 | host=node1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | primary | - failed | ? | default | 100 | | host=node2 user=repmgr dbname=repmgr connect_timeout=2
If PostgreSQL on node2 is now restarted, the cluster ends up in the state below, with two primaries. This is a drawback of this primary/standby HA design:
[postgres@node1 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 3 | host=node1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | primary | ! running | | default | 100 | 2 | host=node2 user=repmgr dbname=repmgr connect_timeout=2
node2's state is wrong. Stop its PostgreSQL again and rejoin it to the cluster as a standby:
[postgres@node2 ~]$ sudo systemctl stop postgresql.service
[postgres@node2 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf node rejoin -d 'host=node1 dbname=repmgr user=repmgr' --force-rewind
NOTICE: rejoin target is node "node1" (ID: 1)
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 1
DETAIL: rejoin target server's timeline 3 forked off current database system timeline 2 before current recovery point 0/F000028
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/usr/local/postgresql/bin/pg_rewind -D '/usr/local/postgresql/data' --source-server='host=node1 user=repmgr dbname=repmgr connect_timeout=2'"
NOTICE: 0 files copied to /usr/local/postgresql/data
NOTICE: setting node 2's upstream to node 1
WARNING: unable to ping "host=node2 user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "sudo systemctl start postgresql.service"
NOTICE: NODE REJOIN successful
DETAIL: node 2 is now attached to node 1
Note: --force-rewind was used here. Normally, run rejoin without that flag first and add it only if the rejoin fails. The host in -d must be the current primary. You can also add --dry-run first to check whether the rejoin would succeed.
So in a production two-node primary/standby cluster, after a switchover or failover, rejoin the old primary directly when bringing it back up:
repmgr -f /usr/local/postgresql/repmgr/repmgr.conf node rejoin -d 'host=node1 dbname=repmgr user=repmgr'
VIP with keepalived
repmgr has no built-in VIP management; keepalived can be used to provide a VIP.
Install keepalived from the distribution repositories (dnf install keepalived) and configure it as follows.
MASTER node /etc/keepalived/keepalived.conf:
vrrp_script chk_myscript {
script "/usr/local/bin/is_postgres_primary.sh"
interval 3 # check every 3 seconds
fall 3 # require 3 failures for KO
rise 3 # require 3 successes for OK
user postgres postgres
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 51
priority 100 # make all keepalived have different priority
nopreempt
advert_int 1
authentication {
auth_type PASS
auth_pass 123465
}
virtual_ipaddress {
10.11.88.44/16 # this is the standard VIP, change as needed
}
track_script {
chk_myscript
}
}
BACKUP node /etc/keepalived/keepalived.conf:
vrrp_script chk_myscript {
script "/usr/local/bin/is_postgres_primary.sh"
interval 3 # check every 3 seconds
fall 3 # require 3 failures for KO
rise 3 # require 3 successes for OK
user postgres postgres
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 51
priority 90 # make all keepalived have different priority
nopreempt
advert_int 1
authentication {
auth_type PASS
auth_pass 123465
}
virtual_ipaddress {
10.11.88.44/16 # this is the standard VIP, change as needed
}
track_script {
chk_myscript
}
}
PostgreSQL check script /usr/local/bin/is_postgres_primary.sh (remember to make it executable with chmod +x):
#!/bin/bash
# keepalived health check: exit 0 only when the local PostgreSQL is up and is the primary.
export PATH=/usr/local/postgresql/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
export PGDATA=/usr/local/postgresql/data
export PGDATABASE=postgres
export PGUSER=postgres
export PGPORT=5432
# pg_is_in_recovery() returns f on a primary; grep's exit status becomes the script's.
psql -q -t -c 'select pg_is_in_recovery()' | grep -q 'f'
Once keepalived is started, the VIP stays on a server only while that server's PostgreSQL is running and is the primary.
Install pgbouncer
Download the source tarball:
https://www.pgbouncer.org/downloads/files/1.22.1/pgbouncer-1.22.1.tar.gz
Install the build dependencies:
dnf install make make-devel libevent libevent-devel pkgconfig pkgconfig-devel openssl openssl-devel -y
Build and install:
tar zxvf pgbouncer-1.22.1.tar.gz && cd pgbouncer-1.22.1
./configure --prefix=/usr/local/pgbouncer --with-systemd
make && make install
After installation, the sample configuration file, service file and other docs are under /usr/local/pgbouncer/share/doc/pgbouncer.
pgbouncer.ini configuration:
[databases]
test = host=10.11.88.44 port=5432 pool_size=20 dbname=test
# Instead of listing each database, a wildcard entry can also be used:
* = host=10.11.88.44 port=5432 pool_size=20
[pgbouncer]
logfile = /usr/local/pgbouncer/log/pgbouncer.log
pidfile = /usr/local/pgbouncer/pgbouncer.pid
listen_addr = *
listen_port = 16432
auth_type = md5
auth_file = /usr/local/pgbouncer/userlist.txt
admin_users = postgres
stats_users = stats, root
pool_mode = session
server_reset_query = DISCARD ALL
server_check_query = select 1
max_client_conn = 1000
default_pool_size = 20
min_pool_size = 10
reserve_pool_size = 10
reserve_pool_timeout = 5
max_db_connections = 100
max_user_connections = 100
The users in auth_file are the users needed to access the real backend PostgreSQL databases.
With auth_type = md5, pgbouncer accepts both md5 and scram-sha-256 password hashes. Since PostgreSQL 14 the default is scram-sha-256; check with: select usename, passwd from pg_shadow order by 1;
Create the log directory in advance: mkdir -p /usr/local/pgbouncer/log
The host configured here is the VIP address.
The contents of userlist.txt can be copied directly from a pg_shadow query, for example:
"bill" "SCRAM-SHA-256$4096:orUtCmrN6jsZDygp3sJuog==$JevG8fFxys35E4Qe+aonmOm52PvXEHhmMgmCumsL9TM=:6E1YwsQc4CMYIXsy2koQFD2sbbphZb1s+dENcotanTM="
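For users whose PostgreSQL password is stored as an md5 hash (rather than SCRAM), a userlist.txt entry can also be generated offline: the md5 format is the literal prefix md5 followed by md5(password || username). A sketch (the bill/secret pair here is a made-up example):

```shell
#!/bin/bash
# Print a pgbouncer userlist.txt line for an md5-hashed password.
userlist_entry() {
  local user=$1 pass=$2
  local hash
  hash=$(printf '%s%s' "$pass" "$user" | md5sum | awk '{print $1}')
  printf '"%s" "md5%s"\n' "$user" "$hash"
}

userlist_entry bill secret   # append the printed line to userlist.txt
```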
Create the systemd unit /lib/systemd/system/pgbouncer.service:
[Unit]
Description=connection pooler for PostgreSQL
Documentation=man:pgbouncer(1)
Documentation=https://www.pgbouncer.org/
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
User=postgres
ExecStart=/usr/local/pgbouncer/bin/pgbouncer /usr/local/pgbouncer/pgbouncer.ini
ExecReload=/bin/kill -HUP $MAINPID
KillSignal=SIGINT
[Install]
WantedBy=multi-user.target
Enable and start pgbouncer, then connect to the database through it:
systemctl enable pgbouncer.service --now
psql -h 10.11.88.44 -p 16432 -U bill -d test
Connect to the pgbouncer admin console locally to inspect connection state and other information:
[postgres@node1 ~]$ psql -p 16432 pgbouncer pgbouncer
psql (16.3, server 1.22.1/bouncer)
WARNING: psql major version 16, server major version 1.22.
Some psql features might not work.
Type "help" for help.
pgbouncer=# show help;
NOTICE: Console usage
DETAIL:
SHOW HELP|CONFIG|DATABASES|POOLS|CLIENTS|SERVERS|USERS|VERSION
SHOW PEERS|PEER_POOLS
SHOW FDS|SOCKETS|ACTIVE_SOCKETS|LISTS|MEM|STATE
SHOW DNS_HOSTS|DNS_ZONES
SHOW STATS|STATS_TOTALS|STATS_AVERAGES|TOTALS
SET key = arg
RELOAD
PAUSE [<db>]
RESUME [<db>]
DISABLE <db>
ENABLE <db>
RECONNECT [<db>]
KILL <db>
SUSPEND
SHUTDOWN
WAIT_CLOSE [<db>]
SHOW
pgbouncer=# show clients;
 type | user      | database  | state  | addr | port  | local_addr | local_port | connect_time            | request_time            | wait | wait_us | close_needed | ptr      | link | remote_pid | tls | application_name | prepared_statements
------+-----------+-----------+--------+------+-------+------------+------------+-------------------------+-------------------------+------+---------+--------------+----------+------+------------+-----+------------------+---------------------
 C    | pgbouncer | pgbouncer | active | unix | 16432 | unix       | 16432      | 2024-05-29 21:27:14 CST | 2024-05-29 21:27:20 CST | 4    | 390026  | 0            | 0xf5a2a0 |      | 84252      |     | psql             | 0
 C    | bill      | test      | active | unix | 16432 | unix       | 16432      | 2024-05-29 21:13:41 CST | 2024-05-29 21:13:41 CST | 0    | 0       | 0            | 0xf59ff0 |      | 82562      |     | psql             | 0
(2 rows)
Summary
With repmgrd providing automatic failover, keepalived ensuring client traffic only reaches the primary, and pgbouncer adding connection pooling, this setup achieves a PostgreSQL HA architecture with a minimal number of servers.
Limitation: after an automatic failover, manual intervention is needed to rejoin the failed node to the cluster.
Edge case 1: even if the old primary's PostgreSQL is brought back up (manually or by accident) after a failover, producing two primaries, clients are unaffected: because keepalived runs in non-preempt mode, the VIP stays on the new primary's server.
Edge case 2: if keepalived on the primary is stopped manually, the standby's check script still sees its local PostgreSQL as a standby, so the VIP is not added on the standby either. The cluster is then left without a VIP and cannot serve database traffic through it.