維運案例分享
OS: CentOS 7u9
DB: MariaDB 10.4
這世界上太多教學告訴你資料庫怎麼還原,Replication 怎麼建立 EX: here
但從來不會有人知道為什麼 mariabackup 打包後的資料庫還原建完 replication 後,
Slave IO Thread 正常,但 Slave SQL Thread 永遠沒辦法正常消化,
放著等也就 ERROR 1205 (HY000): Lock wait timeout exceeded 了
這次遇到的狀況是如此,程序都正常執行著
- mariabackup -> - move backup -> - mariabackup --prepare -> - mariabackup --copy-back -> - service started -> - change master for config replication -> - start slave -> - replication delay
在說明檢查程序前,我們先了解一下 XA Transactions
在 ERROR 1205 的提醒下,我們可以找到 trx_mysql_thread_id = 0 的 dead lock
MariaDB [information_schema]> select * from INNODB_trx\G *************************** 1. row *************************** trx_id: 154142686 trx_state: RUNNING trx_started: 2022-01-05 13:52:26 trx_requested_lock_id: NULL trx_wait_started: NULL trx_weight: 2 trx_mysql_thread_id: 0 trx_query: NULL trx_operation_state: NULL trx_tables_in_use: 0 trx_tables_locked: 1 trx_lock_structs: 1 trx_lock_memory_bytes: 1128 trx_rows_locked: 0 trx_rows_modified: 1 trx_concurrency_tickets: 0 trx_isolation_level: REPEATABLE READ
接著把資料找出來
- mariabackup -> MariaDB [information_schema]> xa recover format='SQL'; +----------+--------------+--------------+---------------------------------------------------------------------+ | formatID | gtrid_length | bqual_length | data | +----------+--------------+--------------+---------------------------------------------------------------------+ | 1 | 32 | 0 | X'5753524550586965bddf8a8f51ef11eca1507e79643313a4e9db950400000000' | +----------+--------------+--------------+---------------------------------------------------------------------+ 1 row in set (0.000 sec)
Rollback 處理後,再觀察一次 Replication 的消化狀況
- mariabackup -> MariaDB [information_schema]> xa rollback X'5753524550586965bddf8a8f51ef11eca1507e79643313a4e9db950400000000'; Query OK, 0 rows affected (0.058 sec)
總結:
我認為這狀況不常見,也不大合理
因為 mariabackup 原則上在 –prepare 下會把該做的 redo / undo 處理告一段落才是,不應該有還沒處理好的 transaction 遺留
所以也只能推測 mariabackup 的鍋
但我沒有證據
Ref: https://www.cnblogs.com/gjc592/p/11240811.html