Environment
OS: openEuler 22.03 LTS SP3
Database: PostgreSQL 16.3
Two servers with PostgreSQL already installed:
10.11.110.46 node1
10.11.110.47 node2
Configure passwordless SSH between node1 and node2, and allow the postgres user to start/stop the database service via systemd without a password:
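The SSH key exchange can be sketched as follows (run as the postgres user; the ed25519 key type and default key path are assumptions, adjust to your policy):

```shell
# On node1, as postgres (repeat on node2, swapping the target host):
ssh-keygen -t ed25519 -N '' -f ~/.ssh/id_ed25519   # skip if a key already exists
ssh-copy-id postgres@node2
# Verify: this must succeed without any password prompt.
ssh -o BatchMode=yes postgres@node2 true && echo "passwordless SSH OK"
```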
echo "postgres ALL = (root) NOPASSWD:/usr/bin/systemctl start postgresql.service,/usr/bin/systemctl restart postgresql.service,/usr/bin/systemctl stop postgresql.service,/usr/bin/systemctl reload postgresql.service,/usr/bin/systemctl status postgresql.service" | sudo tee /etc/sudoers.d/postgres
chmod 0440 /etc/sudoers.d/postgres
visudo -c
Note: because the sudoers rule lists exact command paths, you must also use the full path (/usr/bin/systemctl) when running these commands manually.
Map IPs to hostnames in /etc/hosts:
10.11.110.46 node1
10.11.110.47 node2
Download the repmgr source tarball:
https://www.repmgr.org/download/repmgr-5.4.1.tar.gz
Install repmgr
su - postgres
tar zxvf repmgr-5.4.1.tar.gz
cd repmgr-5.4.1
export PG_CONFIG=/usr/local/postgresql/bin/pg_config
./configure
make
make install
If the build fails with linker errors like the following, development packages are missing:
/usr/bin/ld: cannot find -lcurl
/usr/bin/ld: cannot find -ljson-c
Install the missing dependencies:
dnf install libcurl-devel json-c-devel -y
Configure repmgr
PostgreSQL configuration
Use node1 as the primary node. Edit /usr/local/postgresql/data/postgresql.conf:
max_wal_senders = 10
max_replication_slots = 10
wal_level = replica
hot_standby = on
archive_mode = on
wal_log_hints = on
archive_command = '/bin/true'
shared_preload_libraries = 'repmgr'
Edit /usr/local/postgresql/data/pg_hba.conf so the repmgr user has the needed access and can connect in replication mode:
local replication repmgr trust
host replication repmgr 127.0.0.1/32 trust
host replication repmgr 10.11.110.46/32 trust
host replication repmgr 10.11.110.47/32 trust
local repmgr repmgr trust
host repmgr repmgr 127.0.0.1/32 trust
host repmgr repmgr 10.11.110.46/32 trust
host repmgr repmgr 10.11.110.47/32 trust
After editing, restart the database: sudo /usr/bin/systemctl restart postgresql.service
On node1, create the repmgr superuser and the repmgr database:
su - postgres
createuser -s repmgr
createdb repmgr -O repmgr
psql
postgres=# ALTER USER repmgr ENCRYPTED PASSWORD 'repmgr';
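The pg_hba.conf entries above use trust, so repmgr needs no password. If you later tighten those entries to scram-sha-256, give the postgres OS user a ~/.pgpass on both nodes (chmod 0600) so repmgr and the walreceiver can still connect non-interactively; a sketch using the password set above:

```text
# ~/.pgpass -- host:port:database:user:password
node1:5432:replication:repmgr:repmgr
node2:5432:replication:repmgr:repmgr
node1:5432:repmgr:repmgr:repmgr
node2:5432:repmgr:repmgr:repmgr
```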
Standby preparation
On node2, stop PostgreSQL and delete its data directory (if the database was never initialized there, nothing needs to be done).
From the standby, test that the primary can be reached:
psql 'host=node1 user=repmgr dbname=repmgr connect_timeout=2'
Create the repmgr configuration file
On node1 and node2, create /usr/local/postgresql/repmgr/repmgr.conf (this is the path every repmgr command below references with -f).
repmgr.conf on node1:
node_id=1
node_name=node1
conninfo='host=node1 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/usr/local/postgresql/data'
log_file='/usr/local/postgresql/log/repmgr.log'
pg_bindir='/usr/local/postgresql/bin'
ssh_options='-q -o ConnectTimeout=10'
failover='automatic'
promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /usr/local/postgresql/repmgr/repmgr.conf --log-to-file'
follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /usr/local/postgresql/repmgr/repmgr.conf --log-to-file --upstream-node-id=%n'
service_start_command='sudo systemctl start postgresql.service'
service_stop_command='sudo systemctl stop postgresql.service'
service_restart_command='sudo systemctl restart postgresql.service'
service_reload_command='sudo systemctl reload postgresql.service'
repmgr.conf on node2:
node_id=2
node_name=node2
conninfo='host=node2 user=repmgr dbname=repmgr connect_timeout=2'
data_directory='/usr/local/postgresql/data'
log_file='/usr/local/postgresql/log/repmgr.log'
pg_bindir='/usr/local/postgresql/bin'
ssh_options='-q -o ConnectTimeout=10'
failover='automatic'
promote_command='/usr/local/postgresql/bin/repmgr standby promote -f /usr/local/postgresql/repmgr/repmgr.conf --log-to-file'
follow_command='/usr/local/postgresql/bin/repmgr standby follow -f /usr/local/postgresql/repmgr/repmgr.conf --log-to-file --upstream-node-id=%n'
service_start_command='sudo systemctl start postgresql.service'
service_stop_command='sudo systemctl stop postgresql.service'
service_restart_command='sudo systemctl restart postgresql.service'
service_reload_command='sudo systemctl reload postgresql.service'
For other options, see the sample configuration: https://raw.githubusercontent.com/EnterpriseDB/repmgr/master/repmgr.conf.sample
Register the primary node
On node1, register the primary node with repmgr. This installs the repmgr extension and metadata objects, and adds a metadata record for the primary server:
repmgr -f /usr/local/postgresql/repmgr/repmgr.conf primary register
INFO: connecting to primary database...
NOTICE: attempting to install extension "repmgr"
NOTICE: "repmgr" extension successfully installed
NOTICE: primary node record (ID: 1) registered
Check the cluster status:
repmgr -f /usr/local/postgresql/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=node1 user=repmgr dbname=repmgr connect_timeout=2
The repmgr metadata looks like this:
[postgres@node1 data]$ psql -Urepmgr
repmgr=# \x
Expanded display is on.
repmgr=# select * from repmgr.nodes;
-[ RECORD 1 ]----+-------------------------------------------------------
node_id | 1
upstream_node_id |
active | t
node_name | node1
type | primary
location | default
priority | 100
conninfo | host=node1 user=repmgr dbname=repmgr connect_timeout=2
repluser | repmgr
slot_name |
config_file | /usr/local/postgresql/repmgr/repmgr.conf
Every server in the replication cluster has its own record here. When repmgrd is in use, the upstream_node_id, active and type fields are updated whenever a node's state or role changes.
Clone the standby
On node2, do a dry run (--dry-run) to check that the primary can be cloned:
[postgres@node2 data]$ repmgr -h node1 -U repmgr -d repmgr -f /usr/local/postgresql/repmgr/repmgr.conf standby clone --dry-run
NOTICE: destination directory "/usr/local/postgresql/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=node1 user=repmgr dbname=repmgr
DETAIL: current installation size is 29 MB
INFO: "repmgr" extension is installed in database "repmgr"
INFO: replication slot usage not requested; no replication slot will be set up for this standby
INFO: parameter "max_wal_senders" set to 10
NOTICE: checking for available walsenders on the source node (2 required)
INFO: sufficient walsenders available on the source node
DETAIL: 2 required, 10 available
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: required number of replication connections could be made to the source server
DETAIL: 2 replication connections required
NOTICE: standby will attach to upstream node 1
HINT: consider using the -c/--fast-checkpoint option
INFO: would execute:
/usr/local/postgresql/bin/pg_basebackup -l "repmgr base backup" -D /usr/local/postgresql/data -h node1 -p 5432 -U repmgr -X stream
INFO: all prerequisites for "standby clone" are met
If there are no errors, run the actual clone:
[postgres@node2 postgresql]$ repmgr -h node1 -U repmgr -d repmgr -f /usr/local/postgresql/repmgr/repmgr.conf standby clone
NOTICE: destination directory "/usr/local/postgresql/data" provided
INFO: connecting to source node
DETAIL: connection string is: host=node1 user=repmgr dbname=repmgr
DETAIL: current installation size is 30 MB
INFO: replication slot usage not requested; no replication slot will be set up for this standby
NOTICE: checking for available walsenders on the source node (2 required)
NOTICE: checking replication connections can be made to the source server (2 required)
INFO: checking and correcting permissions on existing directory "/usr/local/postgresql/data"
NOTICE: starting backup (using pg_basebackup)...
HINT: this may take some time; consider using the -c/--fast-checkpoint option
INFO: executing:
/usr/local/postgresql/bin/pg_basebackup -l "repmgr base backup" -D /usr/local/postgresql/data -h node1 -p 5432 -U repmgr -X stream
NOTICE: standby clone (using pg_basebackup) complete
NOTICE: you can now start your PostgreSQL server
HINT: for example: sudo systemctl start pgserver
HINT: after starting the server, you need to register this standby with "repmgr standby register"
Under the hood this clones the primary's data directory with pg_basebackup. The primary's configuration files (postgresql.conf, postgresql.auto.conf, pg_hba.conf and pg_ident.conf) are copied into the standby's data directory as well. If no standby-specific changes to them are needed, the standby can be started right away:
sudo /usr/bin/systemctl start postgresql.service
Verify replication
On the primary (node1), check the replication status:
[postgres@node1 repmgr]$ psql -Urepmgr
psql (16.3)
Type "help" for help.
repmgr=# \x
Expanded display is on.
repmgr=# select * from pg_stat_replication;
-[ RECORD 1 ]----+------------------------------
pid | 7665
usesysid | 16388
usename | repmgr
application_name | node2
client_addr | 10.11.110.47
client_hostname |
client_port | 40102
backend_start | 2024-05-28 17:39:27.557961+08
backend_xmin |
state | streaming
sent_lsn | 0/7000328
write_lsn | 0/7000328
flush_lsn | 0/7000328
replay_lsn | 0/7000328
write_lag |
flush_lag |
replay_lag |
sync_priority | 0
sync_state | async
reply_time | 2024-05-28 21:48:53.074483+08
On the standby (node2), check the WAL receiver status:
[postgres@node2 repmgr]$ psql -Urepmgr
psql (16.3)
Type "help" for help.
repmgr=# \x
Expanded display is on.
repmgr=# select * from pg_stat_wal_receiver;
-[ RECORD 1 ]---------+-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
pid | 6255
status | streaming
receive_start_lsn | 0/7000000
receive_start_tli | 1
written_lsn | 0/7000328
flushed_lsn | 0/7000328
received_tli | 1
last_msg_send_time | 2024-05-28 21:48:33.053217+08
last_msg_receipt_time | 2024-05-28 21:48:33.054117+08
latest_end_lsn | 0/7000328
latest_end_time | 2024-05-28 17:42:57.692486+08
slot_name |
sender_host | node1
sender_port | 5432
conninfo | user=repmgr passfile=/home/postgres/.pgpass channel_binding=disable connect_timeout=2 dbname=replication host=node1 port=5432 application_name=node2 fallback_application_name=walreceiver sslmode=disable sslcompression=0 sslcertmode=disable sslsni=1 ssl_min_protocol_version=TLSv1.2 gssencmode=disable krbsrvname=postgres gssdelegation=0 target_session_attrs=any load_balance_hosts=disable
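Byte-level replication lag can be computed from the LSNs above (sent_lsn vs replay_lsn). An LSN such as 0/7000328 is two hexadecimal halves; the absolute byte position is high * 2^32 + low. A small bash helper (a sketch; the sample values are taken from the output above):

```shell
#!/bin/bash
# Convert a PostgreSQL LSN ("hi/lo", both hex) to an absolute byte position.
lsn_to_bytes() {
  local hi=${1%%/*} lo=${1##*/}
  echo $(( 16#$hi * 4294967296 + 16#$lo ))
}

sent=$(lsn_to_bytes 0/7000328)     # sent_lsn from pg_stat_replication
replay=$(lsn_to_bytes 0/7000328)   # replay_lsn from pg_stat_replication
echo "lag: $(( sent - replay )) bytes"
```

On the server itself the same calculation is available as select pg_wal_lsn_diff(sent_lsn, replay_lsn) from pg_stat_replication;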
Register the standby
On node2, register it as a standby with repmgr:
[postgres@node2 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf standby register
INFO: connecting to local node "node2" (ID: 2)
INFO: connecting to primary database
WARNING: --upstream-node-id not supplied, assuming upstream node is primary (node ID: 1)
INFO: standby registration complete
NOTICE: standby node "node2" (ID: 2) successfully registered
Now check the cluster status again:
[postgres@node2 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 1 | host=node1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | standby | running | node1 | default | 100 | 1 | host=node2 user=repmgr dbname=repmgr connect_timeout=2
Manual switchover test
On standby node2, run a switchover to manually promote node2 to primary:
[postgres@node2 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf standby switchover
NOTICE: executing switchover on node "node2" (ID: 2)
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "node1" (ID: 1)
NOTICE: issuing CHECKPOINT on node "node1" (ID: 1)
DETAIL: executing server command "sudo systemctl stop postgresql.service"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/9000028
NOTICE: waiting up to 30 seconds (parameter "wal_receive_check_timeout") for received WAL to flush to disk
INFO: sleeping 1 of maximum 30 seconds waiting for standby to flush received WAL to disk
INFO: sleeping 2 of maximum 30 seconds waiting for standby to flush received WAL to disk
INFO: sleeping 3 of maximum 30 seconds waiting for standby to flush received WAL to disk
[...]
INFO: sleeping 30 of maximum 30 seconds waiting for standby to flush received WAL to disk
WARNING: local node "node2" is behind shutdown primary "node1"
DETAIL: local node last receive LSN is 0/82A0000, primary shutdown checkpoint LSN is 0/9000028
NOTICE: aborting switchover
HINT: use --always-promote to force promotion of standby
[postgres@node2 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+---------------+----------+----------+----------+----------+--------------------------------------------------------
1 | node1 | primary | ? unreachable | ? | default | 100 | | host=node1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | standby | running | ? node1 | default | 100 | 1 | host=node2 user=repmgr dbname=repmgr connect_timeout=2
The first switchover may fail like this: the primary was stopped, but repmgr aborts with WARNING: local node "node2" is behind shutdown primary "node1". A commonly cited workaround is to change archive_command in postgresql.conf as follows, which delays archiving long enough for the final WAL to reach the standby:
archive_command = '{ sleep 5;true; }'
Reload the configuration with sudo /usr/bin/systemctl reload postgresql.service, then retry the switchover:
[postgres@node2 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf standby switchover
NOTICE: executing switchover on node "node2" (ID: 2)
NOTICE: attempting to pause repmgrd on 2 nodes
NOTICE: local node "node2" (ID: 2) will be promoted to primary; current primary "node1" (ID: 1) will be demoted to standby
NOTICE: stopping current primary node "node1" (ID: 1)
NOTICE: issuing CHECKPOINT on node "node1" (ID: 1)
DETAIL: executing server command "sudo systemctl stop postgresql.service"
INFO: checking for primary shutdown; 1 of 60 attempts ("shutdown_check_timeout")
NOTICE: current primary has been cleanly shut down at location 0/A000028
NOTICE: promoting standby to primary
DETAIL: promoting server "node2" (ID: 2) using pg_promote()
NOTICE: waiting up to 60 seconds (parameter "promote_check_timeout") for promotion to complete
NOTICE: STANDBY PROMOTE successful
DETAIL: server "node2" (ID: 2) was successfully promoted to primary
NOTICE: node "node2" (ID: 2) promoted to primary, node "node1" (ID: 1) demoted to standby
NOTICE: switchover was successful
DETAIL: node "node2" is now primary and node "node1" is attached as standby
NOTICE: STANDBY SWITCHOVER has completed successfully
Check the cluster status again:
[postgres@node2 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
1 | node1 | standby | running | node2 | default | 100 | 1 | host=node1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | primary | * running | | default | 100 | 2 | host=node2 user=repmgr dbname=repmgr connect_timeout=2
Automatic failover with repmgrd
Start repmgrd
On node1 and node2, create the systemd unit file /lib/systemd/system/repmgrd.service:
[Unit]
Description=A replication manager, and failover management tool for PostgreSQL
After=syslog.target
After=network.target
After=postgresql.service
[Service]
Type=forking
User=postgres
Group=postgres
# PID file
PIDFile=/usr/local/postgresql/data/repmgrd.pid
# Location of repmgr conf file:
Environment=REPMGRDCONF=/usr/local/postgresql/repmgr/repmgr.conf
Environment=PIDFILE=/usr/local/postgresql/data/repmgrd.pid
# Where to send early-startup messages from the server
# This is normally controlled by the global default set by systemd
# StandardOutput=syslog
ExecStart=/usr/local/postgresql/bin/repmgrd -f ${REPMGRDCONF} -p ${PIDFILE} -d --verbose
ExecStop=/usr/bin/kill -TERM $MAINPID
ExecReload=/usr/bin/kill -HUP $MAINPID
# Give a reasonable amount of time for the server to start up/shut down
TimeoutSec=300
[Install]
WantedBy=multi-user.target
Enable and start the repmgrd service on node1 and node2:
systemctl enable repmgrd --now
Automatic failover test
Current cluster status:
[postgres@node2 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
1 | node1 | standby | running | node2 | default | 100 | 2 | host=node1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | primary | * running | | default | 100 | 2 | host=node2 user=repmgr dbname=repmgr connect_timeout=2
The primary is currently node2; simulate a primary failure:
[postgres@node2 ~]$ sudo systemctl stop postgresql.service
After a short wait, check the cluster status: node1 has been promoted to primary.
[postgres@node1 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 3 | host=node1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | primary | - failed | ? | default | 100 | | host=node2 user=repmgr dbname=repmgr connect_timeout=2
If PostgreSQL on node2 is now restarted, the cluster ends up in the state below, with two primaries. This is a drawback of this primary/standby HA design:
[postgres@node1 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf cluster show
ID | Name | Role | Status | Upstream | Location | Priority | Timeline | Connection string
----+-------+---------+-----------+----------+----------+----------+----------+--------------------------------------------------------
1 | node1 | primary | * running | | default | 100 | 3 | host=node1 user=repmgr dbname=repmgr connect_timeout=2
2 | node2 | primary | ! running | | default | 100 | 2 | host=node2 user=repmgr dbname=repmgr connect_timeout=2
node2's state is wrong. Stop its PostgreSQL again and rejoin it to the cluster as a standby:
[postgres@node2 ~]$ sudo systemctl stop postgresql.service
[postgres@node2 ~]$ repmgr -f /usr/local/postgresql/repmgr/repmgr.conf node rejoin -d 'host=node1 dbname=repmgr user=repmgr' --force-rewind
NOTICE: rejoin target is node "node1" (ID: 1)
NOTICE: pg_rewind execution required for this node to attach to rejoin target node 1
DETAIL: rejoin target server's timeline 3 forked off current database system timeline 2 before current recovery point 0/F000028
NOTICE: executing pg_rewind
DETAIL: pg_rewind command is "/usr/local/postgresql/bin/pg_rewind -D '/usr/local/postgresql/data' --source-server='host=node1 user=repmgr dbname=repmgr connect_timeout=2'"
NOTICE: 0 files copied to /usr/local/postgresql/data
NOTICE: setting node 2's upstream to node 1
WARNING: unable to ping "host=node2 user=repmgr dbname=repmgr connect_timeout=2"
DETAIL: PQping() returned "PQPING_NO_RESPONSE"
NOTICE: starting server using "sudo systemctl start postgresql.service"
NOTICE: NODE REJOIN successful
DETAIL: node 2 is now attached to node 1
Note: --force-rewind was used here. Normally, run rejoin without that flag first and add it only if the rejoin fails. The host in -d must be the current primary. You can also add --dry-run first to check whether the rejoin would succeed.
So in a production two-node primary/standby cluster, after a switchover or failover, rejoin the old primary directly when bringing it back up:
repmgr -f /usr/local/postgresql/repmgr/repmgr.conf node rejoin -d 'host=node1 dbname=repmgr user=repmgr'
VIP with keepalived
repmgr has no built-in VIP management; keepalived can be used to provide a VIP.
Install keepalived from the distribution repositories (dnf install keepalived) and configure it as follows.
MASTER node /etc/keepalived/keepalived.conf:
vrrp_script chk_myscript {
script "/usr/local/bin/is_postgres_primary.sh"
interval 3 # check every 3 seconds
fall 3 # require 3 failures for KO
rise 3 # require 3 successes for OK
user postgres postgres
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 51
priority 100 # make all keepalived have different priority
nopreempt
advert_int 1
authentication {
auth_type PASS
auth_pass 123465
}
virtual_ipaddress {
10.11.88.44/16 # this is the standard VIP, change as needed
}
track_script {
chk_myscript
}
}
BACKUP node /etc/keepalived/keepalived.conf:
vrrp_script chk_myscript {
script "/usr/local/bin/is_postgres_primary.sh"
interval 3 # check every 3 seconds
fall 3 # require 3 failures for KO
rise 3 # require 3 successes for OK
user postgres postgres
}
vrrp_instance VI_1 {
state BACKUP
interface ens33
virtual_router_id 51
priority 90 # make all keepalived have different priority
nopreempt
advert_int 1
authentication {
auth_type PASS
auth_pass 123465
}
virtual_ipaddress {
10.11.88.44/16 # this is the standard VIP, change as needed
}
track_script {
chk_myscript
}
}
PostgreSQL check script /usr/local/bin/is_postgres_primary.sh (remember to make it executable with chmod +x):
#!/bin/bash
# keepalived health check: exit 0 only when the local PostgreSQL is up and is the primary.
export PATH=/usr/local/postgresql/bin:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin
export PGDATA=/usr/local/postgresql/data
export PGDATABASE=postgres
export PGUSER=postgres
export PGPORT=5432
# pg_is_in_recovery() returns f on a primary; grep's exit status becomes the script's.
psql -q -t -c 'select pg_is_in_recovery()' | grep -q 'f'
Once keepalived is started, the VIP stays on a server only while that server's PostgreSQL is running and is the primary.
Install pgbouncer
Download the source tarball:
https://www.pgbouncer.org/downloads/files/1.22.1/pgbouncer-1.22.1.tar.gz
Install the build dependencies:
dnf install make make-devel libevent libevent-devel pkgconfig pkgconfig-devel openssl openssl-devel -y
Build and install:
tar zxvf pgbouncer-1.22.1.tar.gz && cd pgbouncer-1.22.1
./configure --prefix=/usr/local/pgbouncer --with-systemd
make && make install
After installation, the sample configuration file, service file and other docs are under /usr/local/pgbouncer/share/doc/pgbouncer.
pgbouncer.ini configuration:
[databases]
test = host=10.11.88.44 port=5432 pool_size=20 dbname=test
# Instead of listing each database, a wildcard entry can also be used:
* = host=10.11.88.44 port=5432 pool_size=20
[pgbouncer]
logfile = /usr/local/pgbouncer/log/pgbouncer.log
pidfile = /usr/local/pgbouncer/pgbouncer.pid
listen_addr = *
listen_port = 16432
auth_type = md5
auth_file = /usr/local/pgbouncer/userlist.txt
admin_users = postgres
stats_users = stats, root
pool_mode = session
server_reset_query = DISCARD ALL
server_check_query = select 1
max_client_conn = 1000
default_pool_size = 20
min_pool_size = 10
reserve_pool_size = 10
reserve_pool_timeout = 5
max_db_connections = 100
max_user_connections = 100
The users in auth_file are the users needed to access the real backend PostgreSQL databases.
With auth_type = md5, pgbouncer accepts both md5 and scram-sha-256 password hashes. Since PostgreSQL 14 the default is scram-sha-256; check with: select usename, passwd from pg_shadow order by 1;
Create the log directory in advance: mkdir -p /usr/local/pgbouncer/log
The host configured here is the VIP address.
The contents of userlist.txt can be copied directly from a pg_shadow query, for example:
"bill" "SCRAM-SHA-256$4096:orUtCmrN6jsZDygp3sJuog==$JevG8fFxys35E4Qe+aonmOm52PvXEHhmMgmCumsL9TM=:6E1YwsQc4CMYIXsy2koQFD2sbbphZb1s+dENcotanTM="
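For users whose PostgreSQL password is stored as an md5 hash (rather than SCRAM), a userlist.txt entry can also be generated offline: the md5 format is the literal prefix md5 followed by md5(password || username). A sketch (the bill/secret pair here is a made-up example):

```shell
#!/bin/bash
# Print a pgbouncer userlist.txt line for an md5-hashed password.
userlist_entry() {
  local user=$1 pass=$2
  local hash
  hash=$(printf '%s%s' "$pass" "$user" | md5sum | awk '{print $1}')
  printf '"%s" "md5%s"\n' "$user" "$hash"
}

userlist_entry bill secret   # append the printed line to userlist.txt
```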
Create the systemd unit /lib/systemd/system/pgbouncer.service:
[Unit]
Description=connection pooler for PostgreSQL
Documentation=man:pgbouncer(1)
Documentation=https://www.pgbouncer.org/
After=network-online.target
Wants=network-online.target
[Service]
Type=notify
User=postgres
ExecStart=/usr/local/pgbouncer/bin/pgbouncer /usr/local/pgbouncer/pgbouncer.ini
ExecReload=/bin/kill -HUP $MAINPID
KillSignal=SIGINT
[Install]
WantedBy=multi-user.target
Enable and start pgbouncer, then connect to the database through it:
systemctl enable pgbouncer.service --now
psql -h 10.11.88.44 -p 16432 -U bill -d test
Connect to the pgbouncer admin console locally to inspect connection state and other information:
[postgres@node1 ~]$ psql -p 16432 pgbouncer pgbouncer
psql (16.3, server 1.22.1/bouncer)
WARNING: psql major version 16, server major version 1.22.
Some psql features might not work.
Type "help" for help.
pgbouncer=# show help;
NOTICE: Console usage
DETAIL:
SHOW HELP|CONFIG|DATABASES|POOLS|CLIENTS|SERVERS|USERS|VERSION
SHOW PEERS|PEER_POOLS
SHOW FDS|SOCKETS|ACTIVE_SOCKETS|LISTS|MEM|STATE
SHOW DNS_HOSTS|DNS_ZONES
SHOW STATS|STATS_TOTALS|STATS_AVERAGES|TOTALS
SET key = arg
RELOAD
PAUSE [<db>]
RESUME [<db>]
DISABLE <db>
ENABLE <db>
RECONNECT [<db>]
KILL <db>
SUSPEND
SHUTDOWN
WAIT_CLOSE [<db>]
SHOW
pgbouncer=# show clients;
 type | user      | database  | state  | addr | port  | local_addr | local_port | connect_time            | request_time            | wait | wait_us | close_needed | ptr      | link | remote_pid | tls | application_name | prepared_statements
------+-----------+-----------+--------+------+-------+------------+------------+-------------------------+-------------------------+------+---------+--------------+----------+------+------------+-----+------------------+---------------------
 C    | pgbouncer | pgbouncer | active | unix | 16432 | unix       | 16432      | 2024-05-29 21:27:14 CST | 2024-05-29 21:27:20 CST | 4    | 390026  | 0            | 0xf5a2a0 |      | 84252      |     | psql             | 0
 C    | bill      | test      | active | unix | 16432 | unix       | 16432      | 2024-05-29 21:13:41 CST | 2024-05-29 21:13:41 CST | 0    | 0       | 0            | 0xf59ff0 |      | 82562      |     | psql             | 0
(2 rows)
Summary
With repmgrd providing automatic failover, keepalived ensuring client traffic only reaches the primary, and pgbouncer adding connection pooling, this setup achieves a PostgreSQL HA architecture with a minimal number of servers.
Limitation: after an automatic failover, manual intervention is needed to rejoin the failed node to the cluster.
Edge case 1: even if the old primary's PostgreSQL is brought back up (manually or by accident) after a failover, producing two primaries, clients are unaffected: because keepalived runs in non-preempt mode, the VIP stays on the new primary's server.
Edge case 2: if keepalived on the primary is stopped manually, the standby's check script still sees its local PostgreSQL as a standby, so the VIP is not added on the standby either. The cluster is then left without a VIP and cannot serve database traffic through it.