MySQL命令rebootClusterFromCompleteOutage重啟集群注意事項
2024-10-12 14:31 瀟湘隱者 閱讀(1022) 評論(0) 收藏 舉報rebootClusterFromCompleteOutage()是MySQL Shell中的一個實用命令,用于在 InnoDB 集群遇到完全中斷 (例如,當組復制在所有成員實例上停止時)后重新配置和恢復集群。這個命令允許你連接到集群中的一個 MySQL 實例,并使用該實例的元數據來恢復整個集群。
在MySQL Shell中使用rebootClusterFromCompleteOutage命令啟動集群(MySQL InnoDB Cluster)
MySQL mysqlu01:7306 ssl JS > dba.rebootClusterFromCompleteOutage()
Restoring the Cluster 'Cluster_GSP' from complete outage...
Cluster instances: 'mysqlu01:7306' (OFFLINE), 'mysqlu02:7306' (OFFLINE), 'mysqlu03:7306' (OFFLINE)
Waiting for instances to apply pending received transactions...
Validating instance configuration at mysqlu01:7306...
This instance reports its own address as mysqlu01:7306
Instance configuration is suitable.
* Waiting for seed instance to become ONLINE...
mysqlu01:7306 was restored.
Validating instance configuration at mysqlu02:7306...
This instance reports its own address as mysqlu02:7306
Instance configuration is suitable.
Rejoining instance 'mysqlu02:7306' to cluster 'Cluster_GSP'...
Re-creating recovery account...
NOTE: User 'mysql_innodb_cluster_201'@'%' already existed at instance 'mysqlu01:7306'. It will be deleted and created again with a new password.
* Waiting for the Cluster to synchronize with the PRIMARY Cluster...
** Transactions replicated ############################################################ 100%
The instance 'mysqlu02:7306' was successfully rejoined to the cluster.
Validating instance configuration at mysqlu03:7306...
This instance reports its own address as mysqlu03:7306
Instance configuration is suitable.
Rejoining instance 'mysqlu03:7306' to cluster 'Cluster_GSP'...
Re-creating recovery account...
NOTE: User 'mysql_innodb_cluster_202'@'%' already existed at instance 'mysqlu01:7306'. It will be deleted and created again with a new password.
* Waiting for the Cluster to synchronize with the PRIMARY Cluster...
** Transactions replicated ############################################################ 100%
The instance 'mysqlu03:7306' was successfully rejoined to the cluster.
The Cluster was successfully rebooted.
<Cluster:Cluster_GSP>
MySQL mysqlu01:7306 ssl JS >
那么,關于命令rebootClusterFromCompleteOutage的一些知識點或注意事項,必須弄清楚,避免踩坑或稀里糊涂的。下面是一些簡單總結,當前測試環境為MySQL 8.0.35,隨著版本的變更,可能會有一些特性變化,請以實際情況為準。
1.確保集群的所有成員/節點在運行命令之前都已啟動:在執行dba.rebootClusterFromCompleteOutage()命令之前,需要確保所有集群成員的MySQL實例都已啟動并且可以訪問。 如果無法訪問任何集群成員,該命令將失敗。如下所示:
情況1: 三個節點的MySQL服務都未啟動的情況
MySQL JS > \c icadmin@mysqlu03:7306
Creating a session to 'icadmin@mysqlu03:7306'
MySQL Error 2003 (HY000): Can't connect to MySQL server on 'mysqlu03:7306' (111)
MySQL JS >
三個節點都沒有啟動時,MySQL Shell甚至都無法連接上。
情況2: 三個節點中,有一個或兩個節點未啟動情況
MySQL mysqlu01:7306 ssl JS > dba.rebootClusterFromCompleteOutage()
Restoring the Cluster 'Cluster_GSP' from complete outage...
Cluster instances: 'mysqlu01:7306' (OFFLINE), 'mysqlu02:7306' (UNREACHABLE), 'mysqlu03:7306' (UNREACHABLE)
WARNING: One or more instances of the Cluster could not be reached and cannot be rejoined nor ensured to be OFFLINE: 'mysqlu02:7306', 'mysqlu03:7306'. Cluster may diverge and become inconsistent unless all instances are either reachable or certain to be OFFLINE and not accepting new transactions. You may use the 'force' option to bypass this check and proceed anyway.
ERROR: Could not determine if Cluster is completely OFFLINE
Dba.rebootClusterFromCompleteOutage: Could not determine if Cluster is completely OFFLINE (RuntimeError)
MySQL mysqlu01:7306 ssl JS >
2.這種啟動方式,它會自動找出GTID值最大的成員/節點作為MGR的引導節點嗎?如果不能,是否需要mysql shell連接到集群中GTID最大的成員/節點,然后執行這個命令呢?
如下所示,集群中有三個節點mysqlu01,mysqlu02,mysqlu03
mysql> select * from performance_schema.replication_group_members order by member_host\G;
*************************** 1. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 591280ce-bb5f-11ee-8862-00505697b437
MEMBER_HOST: mysqlu01
MEMBER_PORT: 7306
MEMBER_STATE: ONLINE
MEMBER_ROLE: PRIMARY
MEMBER_VERSION: 8.0.35
MEMBER_COMMUNICATION_STACK: MySQL
*************************** 2. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 6ae7d68b-ba96-11ee-8092-005056971158
MEMBER_HOST: mysqlu02
MEMBER_PORT: 7306
MEMBER_STATE: ONLINE
MEMBER_ROLE: SECONDARY
MEMBER_VERSION: 8.0.35
MEMBER_COMMUNICATION_STACK: MySQL
*************************** 3. row ***************************
CHANNEL_NAME: group_replication_applier
MEMBER_ID: 4fc54bd5-bbf3-11ee-b588-0050569783ac
MEMBER_HOST: mysqlu03
MEMBER_PORT: 7306
MEMBER_STATE: ONLINE
MEMBER_ROLE: SECONDARY
MEMBER_VERSION: 8.0.35
MEMBER_COMMUNICATION_STACK: MySQL
3 rows in set (0.00 sec)
ERROR:
No query specified
mysql>
我們先用命令將節點mysqlu03的MySQL實例關閉
sudo systemctl stop mysqld.service
然后在mysqlu01中進行一些插入操作,人為模擬節點mysqlu01的GTID要大于mysqlu03
insert into test
select 1007, 'k1007' union all
select 1008, 'k1008';
mysql> insert into test
-> select 1007, 'k1007' union all
-> select 1008, 'k1008';
Query OK, 2 rows affected (0.01 sec)
Records: 2 Duplicates: 0 Warnings: 0
然后先后依次將mysqlu02,mysqlu01的MySQL服務關閉,然后啟動這三個節點的MySQL服務,然后使用mysqlsh連接到mysqlu03上來啟動
mysqlsh icadmin@mysqlu03:7306
或
mysqlsh icadmin@mysqlu03:7306 --log-level=DEBUG3
如下所示:
MySQL mysqlu03:7306 ssl JS > dba.rebootClusterFromCompleteOutage()
Restoring the Cluster 'Cluster_GSP' from complete outage...
Cluster instances: 'mysqlu01:7306' (OFFLINE), 'mysqlu02:7306' (OFFLINE), 'mysqlu03:7306' (OFFLINE)
Waiting for instances to apply pending received transactions...
Switching over to instance 'mysqlu01:7306' (which has the highest GTID set), to be used as seed.
NOTE: The instance 'mysqlu01:7306' is running auto-rejoin process, which will be cancelled.
Validating instance configuration at mysqlu01:7306...
This instance reports its own address as mysqlu01:7306
Instance configuration is suitable.
NOTE: Cancelling active GR auto-initialization at mysqlu01:7306
* Waiting for seed instance to become ONLINE...
mysqlu01:7306 was restored.
Validating instance configuration at mysqlu02:7306...
This instance reports its own address as mysqlu02:7306
Instance configuration is suitable.
Rejoining instance 'mysqlu02:7306' to cluster 'Cluster_GSP'...
Re-creating recovery account...
NOTE: User 'mysql_innodb_cluster_201'@'%' already existed at instance 'mysqlu01:7306'. It will be deleted and created again with a new password.
* Waiting for the Cluster to synchronize with the PRIMARY Cluster...
** Transactions replicated ############################################################ 100%
The instance 'mysqlu02:7306' was successfully rejoined to the cluster.
Validating instance configuration at mysqlu03:7306...
This instance reports its own address as mysqlu03:7306
Instance configuration is suitable.
Rejoining instance 'mysqlu03:7306' to cluster 'Cluster_GSP'...
Re-creating recovery account...
NOTE: User 'mysql_innodb_cluster_202'@'%' already existed at instance 'mysqlu01:7306'. It will be deleted and created again with a new password.
* Waiting for the Cluster to synchronize with the PRIMARY Cluster...
** Transactions replicated ############################################################ 100%
The instance 'mysqlu03:7306' was successfully rejoined to the cluster.
The Cluster was successfully rebooted.
<Cluster:Cluster_GSP>
MySQL mysqlu03:7306 ssl JS >

如上所示,從"Switching over to instance 'mysqlu01:7306' (which has the highest GTID set), to be used as seed."這個輸出信息,我們可以清楚的 看到MySQL Shell中使用命令啟動集群的時候,它會自動判斷哪一個節點的GTID值最大,這個測試案例中,mysqlu01節點的GTID值最大,它就將mysqlu01作為 MGR的引導節點。所以,使用命令(dba.rebootClusterFromCompleteOutage())重啟集群時,可以連接到集群的任一節點去啟動集群。
注意:早期的版本,如果重新啟動集群,需要連接到GTID最新的實例,也就是說必須連接到具有GTID超集的成員,即在中斷之前應用了最多事務的實例。如果當前連接的實例的GTID不是最新(最大的值)的,則會報錯,如下所示
Dba.rebootClusterFromCompleteOutage: The active session instance isn't the most updated
in comparison with the ONLINE instances of the Cluster's metadata.
Please use the most up to date instance: '***.***.***.***:3306'. (RuntimeError)
這個僅僅在早期版本中有這個問題。
另外,需要注意的是,如果集群中三個節點的GTID一致,如果集群關閉前主節點為mysqlu01,如果此時MySQL shell連接到mysqlu03去啟動集群,此時mysqlu03會切換為主節點(PRIMARY),而mysqlu01會切換為從節點(SECOND)
3.使用force選項會忽略GTID集的分歧,并將所選成員作為主節點,丟棄未包含在所選成員GTID集中的任何事務。如果此過程失敗,并且集群元數據已嚴重損壞, 你可能需要刪除元數據并從頭開始重新創建集群。這是一個危險的操作,因為它將刪除所有集群元數據,并且不能撤銷。 在實際操作中,應盡量避免使用force選項,因為它可能會引起數據不一致。只有在你完全了解可能產生的后果,并且沒有其他選擇時,才應考慮使用此選項。
如果你知道哪個節點具有最大的GTID值,但無法訪問其他節點,可以使用force選項來強制重啟集群。這將使用剩余可聯系的成員來重啟集群,即使某些成員當前無法訪問
4.這種方式是集群中所有節點都關閉后使用,如果只是一個節點服務器重啟或MySQL服務關閉后重啟,并不能使用這種方式。
浙公網安備 33010602011771號