維運案例分享
Environment
OS: CentOS 7u9
DB: MariaDB 10.4.15 & Galera 4
架構需求:
Galera Cluster MariaDB -> MariaDB Stand alone
問題描述
從 Galera Cluster 恢復成單機執行時,服務恢復了,但 transaction 異常,且 InnoDB_trx, InnoDB_lock* 多了些奇怪的資訊
Background
- XA Transactions
從官網 [1] xa-transactions overview 先初步了解一下 - MariaDB Galera Cluster XA transactions are not supported.
官網也說了 Galera 不支援 xa transaction
實際維運案例分享
因為 Cluster 的 commit 緩慢問題正在排除,所以決定恢復單機執行,但 galera_new_cluster 問題依舊,故關閉 wsrep_on = off 後再啟動服務.
過程中我們會先得到哪些資訊
mysqld.log ; 因為 cluster 重啟動 & 恢復叢集的操作中,遺留了不少待處理的 Transaction
2022-12-12 15:37:06 0 [Note] InnoDB: Transaction 524271636 was in the XA prepared state.
2022-12-12 15:37:06 0 [Note] InnoDB: Transaction 524271502 was in the XA prepared state.
2022-12-12 15:37:06 0 [Note] InnoDB: 57 transaction(s) which must be rolled back or cleaned up in total 0 row operations to undo
2022-12-12 15:37:06 0 [Note] InnoDB: Trx id counter is 524289644
2022-12-12 15:37:07 0 [Note] InnoDB: 128 out of 128 rollback segments are active.
==skip==
2022-12-12 15:37:07 0 [Note] InnoDB: Transaction 524272457 in prepared state after recovery
2022-12-12 15:37:07 0 [Note] InnoDB: Transaction contains changes to 1 rows
2022-12-12 15:37:07 0 [Note] InnoDB: 57 transactions in prepared state after recovery
2022-12-12 15:37:07 0 [Note] Found 57 prepared transaction(s) in InnoDB
2022-12-12 15:37:07 0 [Warning] Found 57 prepared XA transactions
slowquery & innodb_trx 得到相關不應該出現緩慢 stmt
*************************** 1. row ***************************
trx_id: 524282268
trx_state: LOCK WAIT
trx_started: 2022-12-12 15:05:47
trx_requested_lock_id: 524282268:6390:3:2
trx_wait_started: 2022-12-12 15:05:47
trx_weight: 4
trx_mysql_thread_id: 2120
trx_query: insert ignore into tbl_name (agent_id) values(1)
trx_operation_state: inserting
trx_tables_in_use: 1
trx_tables_locked: 2
trx_lock_structs: 3
trx_lock_memory_bytes: 1128
trx_rows_locked: 1
trx_rows_modified: 1
trx_concurrency_tickets: 0
trx_isolation_level: REPEATABLE READ
trx_unique_checks: 1
trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
trx_is_read_only: 0
trx_autocommit_non_locking: 0
*************************** 2. row ***************************
trx_id: 524282226
trx_state: LOCK WAIT
trx_started: 2022-12-12 15:05:34
trx_requested_lock_id: 524282226:7020:4:4
trx_wait_started: 2022-12-12 15:05:34
trx_weight: 3
trx_mysql_thread_id: 321
trx_query: update tbl_name set data ='XXX' where cfid ='YYY'
trx_operation_state: fetching rows
trx_tables_in_use: 1
trx_tables_locked: 1
trx_lock_structs: 3
trx_lock_memory_bytes: 1128
trx_rows_locked: 2
trx_rows_modified: 0
trx_concurrency_tickets: 0
trx_isolation_level: READ COMMITTED
trx_unique_checks: 1
trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
trx_is_read_only: 0
trx_autocommit_non_locking: 0
繼續追查還有對應 trx_mysql_thread_id: 0 的紀錄
*************************** 59. row ***************************
trx_id: 524271508
trx_state: RUNNING
trx_started: 2022-12-12 14:50:58
trx_requested_lock_id: NULL
trx_wait_started: NULL
trx_weight: 2
trx_mysql_thread_id: 0
trx_query: NULL
trx_operation_state: NULL
trx_tables_in_use: 0
trx_tables_locked: 1
trx_lock_structs: 1
trx_lock_memory_bytes: 1128
trx_rows_locked: 0
trx_rows_modified: 1
trx_concurrency_tickets: 0
trx_isolation_level: REPEATABLE READ
trx_unique_checks: 1
trx_foreign_key_checks: 1
trx_last_foreign_key_error: NULL
trx_is_read_only: 0
trx_autocommit_non_locking: 0
從數量判斷 mysqld 處理 crash recover 的時候 XA transaction 也卡住了,接著手動處理 XA transaction (rollback 處理)
Solution
操作官方說明
Steps
可以先試試 –tc-heuristic-recover ROLLBACK 解決問題,如果無解的話
先從 mysql client 接入後 xa recover 呈現現在待處理的 XA transaction,結果從 data 欄位發現 wsrep 遺留的 transaction 包含亂碼,也不好繼續後面指令
XA RECOVER;
+----------+--------------+--------------+------+
| formatID | gtrid_length | bqual_length | data |
+----------+--------------+--------------+------+
| 1 | 4 | 0 | wsrep***** |
+----------+--------------+--------------+------+
轉換可讀語法,官方說法
MariaDB starting with 10.3.3
You can use XA RECOVER FORMAT='SQL' to get the data in a human readable form that can be directly copy-pasted into XA COMMIT or XA ROLLBACK. This is particularly useful for binary xid generated by some transaction coordinators.
所以就會產出
xa recover format='SQL';
+----------+--------------+--------------+-----------------------------------------------+
| formatID | gtrid_length | bqual_length | data |
+----------+--------------+--------------+-----------------------------------------------+
| 1 | 11 | 7 | X'31320d3334093637763738',X'6162630a646566' |
+----------+--------------+--------------+-----------------------------------------------+
接著就可以下 rollback 指定了
xa rollback X'31320d3334093637763738',X'6162630a646566';
建議再重新啟動 MariaDB 觀察一下 mysqld.log 還有沒有其他異常的 Log 需要處理