MariaDB fails to start after wsrep_on = off

維運案例分享

Environment

OS: CentOS 7u9
DB: MariaDB 10.4.15 & Galera 4

架構需求:

Galera Cluster MariaDB -> MariaDB Stand alone

問題描述

從 Galera Cluster 恢復成單機執行時,服務恢復了,但 transaction 異常,且 InnoDB_trx, InnoDB_lock* 多了些奇怪的資訊

Background

  1. XA Transactions
    從官網 [1] xa-transactions overview 先初步了解一下
  2. MariaDB Galera Cluster XA transactions are not supported.
    官網也說了 Galera 不支援 xa transaction

實際維運案例分享

因為 Cluster 的 commit 緩慢問題正在排除,所以決定恢復單機執行,但 galera_new_cluster 問題依舊,故關閉 wsrep_on = off 後再啟動服務.

過程中我們會先得到哪些資訊

mysqld.log ; 因為 cluster 重啟動 & 恢復叢集的操作中,遺留了不少待處理的 Transaction

2022-12-12 15:37:06 0 [Note] InnoDB: Transaction 524271636 was in the XA prepared state.
2022-12-12 15:37:06 0 [Note] InnoDB: Transaction 524271502 was in the XA prepared state.
2022-12-12 15:37:06 0 [Note] InnoDB: 57 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
2022-12-12 15:37:06 0 [Note] InnoDB: Trx id counter is 524289644
2022-12-12 15:37:07 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
==skip==
2022-12-12 15:37:07 0 [Note] InnoDB: Transaction 524272457 in prepared state after recovery
2022-12-12 15:37:07 0 [Note] InnoDB: Transaction contains changes to 1 rows
2022-12-12 15:37:07 0 [Note] InnoDB: 57 transactions in prepared state after recovery
2022-12-12 15:37:07 0 [Note] Found 57 prepared transaction(s) in InnoDB
2022-12-12 15:37:07 0 [Warning] Found 57 prepared XA transactions

slowquery & innodb_trx 得到相關不應該出現緩慢 stmt

*************************** 1. row ***************************
                    trx_id: 524282268
                 trx_state: LOCK WAIT
               trx_started: 2022-12-12 15:05:47
     trx_requested_lock_id: 524282268:6390:3:2
          trx_wait_started: 2022-12-12 15:05:47
                trx_weight: 4
       trx_mysql_thread_id: 2120
                 trx_query: insert ignore into tbl_name (agent_id) values(1)
       trx_operation_state: inserting
         trx_tables_in_use: 1
         trx_tables_locked: 2
          trx_lock_structs: 3
     trx_lock_memory_bytes: 1128
           trx_rows_locked: 1
         trx_rows_modified: 1
   trx_concurrency_tickets: 0
       trx_isolation_level: REPEATABLE READ
         trx_unique_checks: 1
    trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
          trx_is_read_only: 0
trx_autocommit_non_locking: 0
*************************** 2. row ***************************
                    trx_id: 524282226
                 trx_state: LOCK WAIT
               trx_started: 2022-12-12 15:05:34
     trx_requested_lock_id: 524282226:7020:4:4
          trx_wait_started: 2022-12-12 15:05:34
                trx_weight: 3
       trx_mysql_thread_id: 321
                 trx_query: update tbl_name set data ='XXX' where cfid ='YYY'
       trx_operation_state: fetching rows
         trx_tables_in_use: 1
         trx_tables_locked: 1
          trx_lock_structs: 3
     trx_lock_memory_bytes: 1128
           trx_rows_locked: 2
         trx_rows_modified: 0
   trx_concurrency_tickets: 0
       trx_isolation_level: READ COMMITTED
         trx_unique_checks: 1
    trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
          trx_is_read_only: 0
trx_autocommit_non_locking: 0

繼續追查還有對應 trx_mysql_thread_id: 0 的紀錄

*************************** 59. row ***************************
                    trx_id: 524271508
                 trx_state: RUNNING
               trx_started: 2022-12-12 14:50:58
     trx_requested_lock_id: NULL
          trx_wait_started: NULL
                trx_weight: 2
       trx_mysql_thread_id: 0
                 trx_query: NULL
       trx_operation_state: NULL
         trx_tables_in_use: 0
         trx_tables_locked: 1
          trx_lock_structs: 1
     trx_lock_memory_bytes: 1128
           trx_rows_locked: 0
         trx_rows_modified: 1
   trx_concurrency_tickets: 0
       trx_isolation_level: REPEATABLE READ
         trx_unique_checks: 1
    trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
          trx_is_read_only: 0
trx_autocommit_non_locking: 0

從數量判斷 mysqld 處理 crash recover 的時候 XA transaction 也卡住了,接著手動處理 XA transaction (rollback 處理)

Solution

操作官方說明

Steps

可以先試試 –tc-heuristic-recover ROLLBACK 解決問題,如果無解的話

先從 mysql client 接入後 xa recover 呈現現在待處理的 XA transaction,結果從 data 欄位發現 wsrep 遺留的 transaction 包含亂碼,也不好繼續後面指令

XA RECOVER;
+----------+--------------+--------------+------+
| formatID | gtrid_length | bqual_length | data |
+----------+--------------+--------------+------+
|        1 |            4 |            0 | wsrep***** |
+----------+--------------+--------------+------+

轉換可讀語法,官方說法

MariaDB starting with 10.3.3
You can use XA RECOVER FORMAT='SQL' to get the data in a human readable form that can be directly copy-pasted into XA COMMIT or XA ROLLBACK. This is particularly useful for binary xid generated by some transaction coordinators.

所以就會產出

xa recover format='SQL';
+----------+--------------+--------------+-----------------------------------------------+
| formatID | gtrid_length | bqual_length | data                                          |
+----------+--------------+--------------+-----------------------------------------------+
|        1 |           11 |            7 | X'31320d3334093637763738',X'6162630a646566' |
+----------+--------------+--------------+-----------------------------------------------+

接著就可以下 rollback 指定了

xa rollback X'31320d3334093637763738',X'6162630a646566';

建議再重新啟動 MariaDB 觀察一下 mysqld.log 還有沒有其他異常的 Log 需要處理

References

發佈留言

發佈留言必須填寫的電子郵件地址不會公開。 必填欄位標示為 *