The tricks for purge hydra useless data.

Intro

Ory Hydra is a hardened, OpenID Certified OAuth 2.0 Server and OpenID Connect Provider.

使用苦主在這兒也簡介了下 Hydra

問題是什麼

在 Hydra v1.3.* 的時期 Schema 依賴 foreign key 建立關聯來節省些事情

但也因為服務特性,只要被摸就會留紀錄的特性目前無法避免無效資料量的增長,所以就只能手動刪除了

好消息是 v2.0.* 後就完全找不到 foreign key 依賴的 Schema 了

這樣資料作業也會方便些。

Background

DB: MariaDB 10.4.15
Hydra v1.3.*

table space size

# ls -lhS | head
total 318G
-rw-rw---- 1 mysql mysql  63G Dec 21 19:53 hydra_oauth2_authentication_request.ibd
-rw-rw---- 1 mysql mysql  55G Dec 21 19:53 hydra_oauth2_oidc.ibd
-rw-rw---- 1 mysql mysql  50G Dec 21 19:53 hydra_oauth2_access.ibd
-rw-rw---- 1 mysql mysql  49G Dec 21 19:53 hydra_oauth2_code.ibd
-rw-rw---- 1 mysql mysql  42G Dec 21 19:53 hydra_oauth2_consent_request.ibd
-rw-rw---- 1 mysql mysql  35G Dec 21 19:53 hydra_oauth2_logout_request.ibd
-rw-rw---- 1 mysql mysql 9.4G Dec 21 19:53 hydra_oauth2_authentication_session.ibd
-rw-rw---- 1 mysql mysql 7.9G Dec 21 19:53 hydra_oauth2_consent_request_handled.ibd
-rw-rw---- 1 mysql mysql 6.6G Dec 21 19:53 hydra_oauth2_authentication_request_handled.ibd

table hydra_oauth2_authentication_request status

$ mysql -e "show table status like 'hydra_oauth2_authentication_request'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+-------------------------------------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+------------------+-----------+
| Name                                | Engine | Version | Row_format | Rows     | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation       | Checksum | Create_options | Comment | Max_index_length | Temporary |
+-------------------------------------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+------------------+-----------+
| hydra_oauth2_authentication_request | InnoDB |      10 | Dynamic    | 38617002 |           1085 | 41912156160 |               0 |  24018206720 | 250609664 |           NULL | 2022-04-30 23:53:04 | 2022-12-21 19:55:03 | NULL       | utf8_general_ci |     NULL |                |         |                0 | N         |
+-------------------------------------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+------------------+-----------+

問題

產品清資料的語法大概長這樣

mysql> delete from `hydra_oauth2_consent_request` where `client_id` = 'vip' limit 10;

交代一下 Schema 的 FK 關聯多麻煩

$ mysql -e "show create table hydra_oauth2_consent_request\G"
mysql: [Warning] Using a password on the command line interface can be insecure.
*************************** 1. row ***************************
       Table: hydra_oauth2_consent_request
Create Table: CREATE TABLE `hydra_oauth2_consent_request` (
  `challenge` varchar(40) NOT NULL,
  `verifier` varchar(40) NOT NULL,
  `client_id` varchar(255) NOT NULL,
  `subject` varchar(255) NOT NULL,
  `request_url` text NOT NULL,
  `skip` tinyint(1) NOT NULL,
  `requested_scope` text NOT NULL,
  `csrf` varchar(40) NOT NULL,
  `authenticated_at` timestamp NULL DEFAULT NULL,
  `requested_at` timestamp NOT NULL DEFAULT current_timestamp(),
  `oidc_context` text NOT NULL,
  `forced_subject_identifier` varchar(255) DEFAULT '',
  `login_session_id` varchar(40) DEFAULT NULL,
  `login_challenge` varchar(40) DEFAULT NULL,
  `requested_at_audience` text NOT NULL,
  `acr` text NOT NULL,
  `context` text NOT NULL,
  PRIMARY KEY (`challenge`),
  UNIQUE KEY `hydra_oauth2_consent_request_veri_idx` (`verifier`),
  KEY `hydra_oauth2_consent_request_cid_idx` (`client_id`),
  KEY `hydra_oauth2_consent_request_sub_idx` (`subject`),
  KEY `hydra_oauth2_consent_request_idx_01` (`requested_at`),
  KEY `hydra_oauth2_consent_request_login_session_id_idx` (`login_session_id`),
  KEY `hydra_oauth2_consent_request_login_challenge_idx` (`login_challenge`),
  CONSTRAINT `hydra_oauth2_consent_request_client_id_fk` FOREIGN KEY (`client_id`) REFERENCES `hydra_client` (`id`) ON DELETE CASCADE,
  CONSTRAINT `hydra_oauth2_consent_request_login_challenge_fk` FOREIGN KEY (`login_challenge`) REFERENCES `hydra_oauth2_authentication_request` (`challenge`) ON DELETE SET NULL,
  CONSTRAINT `hydra_oauth2_consent_request_login_session_id_fk` FOREIGN KEY (`login_session_id`) REFERENCES `hydra_oauth2_authentication_session` (`id`) ON DELETE SET NULL
) ENGINE=InnoDB DEFAULT CHARSET=utf8

當然這張表也不小

$ mysql -e "show table status like 'hydra_oauth2_consent_request'"
mysql: [Warning] Using a password on the command line interface can be insecure.
+------------------------------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+------------------+-----------+
| Name                         | Engine | Version | Row_format | Rows     | Avg_row_length | Data_length | Max_data_length | Index_length | Data_free | Auto_increment | Create_time         | Update_time         | Check_time | Collation       | Checksum | Create_options | Comment | Max_index_length | Temporary |
+------------------------------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+------------------+-----------+
| hydra_oauth2_consent_request | InnoDB |      10 | Dynamic    | 21986247 |           1362 | 29961666560 |               0 |  14324645888 | 108003328 |           NULL | 2022-04-30 23:53:04 | 2022-12-21 20:01:01 | NULL       | utf8_general_ci |     NULL |                |         |                0 | N         |
+------------------------------+--------+---------+------------+----------+----------------+-------------+-----------------+--------------+-----------+----------------+---------------------+---------------------+------------+-----------------+----------+----------------+---------+------------------+-----------+

先從 explain 來看的話會發生什麼事情

$ mysql -e "explain delete from hydra_oauth2_consent_request where client_id = 'vip' limit 10;"
mysql: [Warning] Using a password on the command line interface can be insecure.
+------+-------------+------------------------------+-------+--------------------------------------+--------------------------------------+---------+------+---------+-------------+
| id   | select_type | table                        | type  | possible_keys                        | key                                  | key_len | ref  | rows    | Extra       |
+------+-------------+------------------------------+-------+--------------------------------------+--------------------------------------+---------+------+---------+-------------+
|    1 | SIMPLE      | hydra_oauth2_consent_request | range | hydra_oauth2_consent_request_cid_idx | hydra_oauth2_consent_request_cid_idx | 767     | NULL | 9292132 | Using where |
+------+-------------+------------------------------+-------+--------------------------------------+--------------------------------------+---------+------+---------+-------------+

接下來作業的時候發生了什麼狀況,很明顯的一根 replication delay

雖然不至於產生 HA 切換,但麻煩的是商務邏輯在讀寫分離的情況下產生的功能異常影響就大條了

區分 delete / select 在 explain 上的差異 ; 光 Using index condition / Using where 的差異就明顯解釋了效能差異

[wnlin@dba-desktop-1 ~]$ mysqlq dbm-m028 m028006 -e "explain select * from hydra_oauth2_consent_request where client_id = 'vip' limit 10;"
mysql: [Warning] Using a password on the command line interface can be insecure.
+------+-------------+------------------------------+------+--------------------------------------+--------------------------------------+---------+-------+---------+-----------------------+
| id   | select_type | table                        | type | possible_keys                        | key                                  | key_len | ref   | rows    | Extra                 |
+------+-------------+------------------------------+------+--------------------------------------+--------------------------------------+---------+-------+---------+-----------------------+
|    1 | SIMPLE      | hydra_oauth2_consent_request | ref  | hydra_oauth2_consent_request_cid_idx | hydra_oauth2_consent_request_cid_idx | 767     | const | 9292105 | Using index condition |
+------+-------------+------------------------------+------+--------------------------------------+--------------------------------------+---------+-------+---------+-----------------------+

這個時候的 limit offset 對解決問題的幫助不大

所以解決問題的思維還是要從 PK 下手

但怎麼找出隨機的 PK 又符合 client_id 條件

order by random() 就不再提了

但如果在同一張表上 JOIN 處理過的 Data Set 就好辦了

STMT 大概會長這樣 ; 有沒有需要 ORDER 見仁見智,畢竟原生需求只要清資料

explain SELECT r1.challenge, r1.client_id
  FROM hydra_oauth2_consent_request AS r1 JOIN
       (SELECT CEIL(RAND() *
                     (SELECT MAX(challenge)
                        FROM hydra_oauth2_consent_request)) as id)
        AS r2
 WHERE r1.challenge >= r2.id
 AND r1.client_id = 'vip'
 -- ORDER BY r1.challenge ASC
 LIMIT 10;

再確認一下 explain 已經可以正常使用 index 了

mysql> explain SELECT r1.challenge, r1.client_id
    ->   FROM hydra_oauth2_consent_request AS r1 JOIN
    ->        (SELECT CEIL(RAND() *
    ->                      (SELECT MAX(challenge)
    ->                         FROM hydra_oauth2_consent_request)) as id)
    ->         AS r2
    ->  WHERE r1.challenge >= r2.id
    ->  AND r1.client_id = 'vip'
    ->  -- ORDER BY r1.challenge ASC
    ->  LIMIT 10;
+------+-------------+------------+--------+----------------------------------------------+--------------------------------------+---------+-------+---------+------------------------------+
| id   | select_type | table      | type   | possible_keys                                | key                                  | key_len | ref   | rows    | Extra                        |
+------+-------------+------------+--------+----------------------------------------------+--------------------------------------+---------+-------+---------+------------------------------+
|    1 | PRIMARY     | <derived2> | system | NULL                                         | NULL                                 | NULL    | NULL  | 1       |                              |
|    1 | PRIMARY     | r1         | ref    | PRIMARY,hydra_oauth2_consent_request_cid_idx | hydra_oauth2_consent_request_cid_idx | 767     | const | 9290919 | Using where; Using index     |
|    2 | DERIVED     | NULL       | NULL   | NULL                                         | NULL                                 | NULL    | NULL  | NULL    | No tables used               |
|    3 | SUBQUERY    | NULL       | NULL   | NULL                                         | NULL                                 | NULL    | NULL  | NULL    | Select tables optimized away |
+------+-------------+------------+--------+----------------------------------------------+--------------------------------------+---------+-------+---------+------------------------------+
4 rows in set, 1 warning (0.00 sec)

再看看這個時候取資料的效率 ; 各 offset limit 反應時間都可以維持在 0.5s 內

10 rows in set, 11 warnings (0.01 sec)
1000 rows in set, 1001 warnings (0.01 sec)
10000 rows in set, 10001 warnings (0.04 sec)
50000 rows in set, 50001 warnings (0.11 sec)

我不建議再把語法改寫成 DELETE INNER JOIN 作業

也因為取得的 dataset 內已含 challenge 這 PK 的話,那就是 loop 把 dataset 一個一個走完,原則上也就不擔心 DELETE STMT 產生的 replication delay 了.

發佈留言

發佈留言必須填寫的電子郵件地址不會公開。 必填欄位標示為 *