Commit c757fb4

BUG#35206392: Adding secondary Node, make existing secondary nodes to start lagging
Background
----------

There is a Ticket Manager, which ensures that transactions are committed on the correct side of a view_change. I.e., transactions certified before the view_change commit before the view_change, and transactions certified after the view_change commit after the view_change. This is achieved as follows:

* The view_change waits for preceding transactions (according to global order) to begin committing, before the view_change begins to commit.
* Transactions following the view_change (according to global order) wait for the view_change to begin committing, before those transactions commit.

On an inbound replication channel, there is a Commit Order Manager, which ensures that transactions are committed in the same order they are received. This is achieved by making each transaction wait for the preceding transaction (according to receive order) to begin committing, before it commits.

Problem
-------

Whenever these two Managers require different orders, it results in a deadlock. For example, suppose the following happens:

* The inbound channel receives T1 before T2.
* T2 is certified first. Then a view_change occurs. Then T1 is certified.

This leads to a deadlock:

* When T1 is about to commit, it will invoke the Ticket Manager, which waits for the view_change.
* When the view_change is about to commit, it will invoke the Ticket Manager, which waits for T2 to commit.
* When T2 is about to commit, it will invoke the Commit Order Manager, which will wait for T1 to commit.

Therefore, this results in a wait cycle, i.e., deadlock.

Solution
--------

We detect the deadlock, enforce the Ticket Manager order and ignore the Commit Order Manager order. This has the following user-visible consequences:

* It may violate replica_preserve_commit_order near View_change_log_events. In other words, replica_preserve_commit_order no longer provides any strict guarantee on an inbound channel on a GR primary.
* replica_preserve_commit_order still ensures that transactions are likely ordered correctly, with exceptions only around View_change_log_events.
* This suffices to avoid GTID gaps almost always, which ensures that maintaining gtid_executed is fast.
* But the user cannot rely on transaction order to be always preserved for inbound channels on GR primaries.

Change-Id: Ibbbae77dadc9db79381decc535713d2962d7ffa9
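The wait cycle and its resolution described in the commit message can be pictured with a small wait-for graph. The following is only an illustrative Python sketch, not the server's C++ implementation: the names `find_cycle`, `break_deadlock`, `commit_sequence` and the edge labels are hypothetical, and the model assumes each waiter is represented by a single edge "X waits for Y".

```python
from collections import defaultdict

# Hypothetical labels for which manager imposes a wait (not server identifiers).
TICKET = "ticket manager"
COMMIT_ORDER = "commit order manager"

# Each edge means "waiter waits for holder", as in the Problem section.
EDGES = [
    ("T1", "view_change", TICKET),        # T1 certified after the view_change
    ("view_change", "T2", TICKET),        # T2 certified before the view_change
    ("T2", "T1", COMMIT_ORDER),           # channel received T1 before T2
]


def find_cycle(edges):
    """Return one cycle of the wait-for graph as a list of nodes, or None."""
    graph = defaultdict(list)
    for waiter, holder, _reason in edges:
        graph[waiter].append(holder)
    state = defaultdict(int)  # 0 = unvisited, 1 = on current path, 2 = finished
    path = []

    def visit(node):
        state[node] = 1
        path.append(node)
        for nxt in graph[node]:
            if state[nxt] == 1:            # back edge: a wait cycle
                return path[path.index(nxt):]
            if state[nxt] == 0:
                found = visit(nxt)
                if found:
                    return found
        path.pop()
        state[node] = 2
        return None

    for node in list(graph):
        if state[node] == 0:
            found = visit(node)
            if found:
                return found
    return None


def break_deadlock(edges):
    """Drop commit-order waits that lie on a cycle; keep every ticket wait."""
    edges = list(edges)
    while True:
        cycle = find_cycle(edges)
        if cycle is None:
            return edges
        on_cycle = set(zip(cycle, cycle[1:] + cycle[:1]))
        droppable = [e for e in edges
                     if e[2] == COMMIT_ORDER and (e[0], e[1]) in on_cycle]
        if not droppable:
            raise RuntimeError("cycle contains no commit-order wait")
        edges.remove(droppable[0])


def commit_sequence(edges):
    """Order nodes so that each one commits after everything it waits for."""
    waits_on = defaultdict(set)
    nodes = []
    for waiter, holder, _reason in edges:
        waits_on[waiter].add(holder)
        for node in (waiter, holder):
            if node not in nodes:
                nodes.append(node)
    done = []
    while len(done) < len(nodes):
        ready = [n for n in nodes if n not in done and waits_on[n] <= set(done)]
        if not ready:
            raise RuntimeError("deadlock: nothing can commit")
        done.extend(ready)
    return done


if __name__ == "__main__":
    print(find_cycle(EDGES))
    print(commit_sequence(break_deadlock(EDGES)))
```

In this toy model the three waits form the cycle T1, view_change, T2. Dropping only the commit-order wait leaves the sequence T2, view_change, T1, which is the Ticket Manager order and commits T2 ahead of T1.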
1 parent 5635d3b commit c757fb4

22 files changed: 2049 additions, 40 deletions

mysql-test/suite/group_replication/r/gr_concurrent_ticket_pop_with_channel_2.result

Lines changed: 420 additions & 0 deletions
Large diffs are not rendered by default.
Lines changed: 68 additions & 0 deletions
@@ -0,0 +1,68 @@
+include/only_with_option.inc [GLOBAL.replica_parallel_workers > 1]
+include/group_replication.inc [rpl_server_count=4]
+Warnings:
+Note #### Sending passwords in plain text without SSL/TLS is extremely insecure.
+Note #### Storing MySQL user name or password information in the connection metadata repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START REPLICA; see the 'START REPLICA Syntax' in the MySQL Manual for more information.
+[connection server1]
+# Bootstrap group with server1 as primary and server2, server3 as secondaries.
+[connection server1]
+include/start_and_bootstrap_group_replication.inc
+# Create inbound channel from server4 to server1
+CHANGE REPLICATION SOURCE TO SOURCE_HOST='127.0.0.1', SOURCE_USER='root', SOURCE_AUTO_POSITION=1, SOURCE_PORT=SERVER_4_PORT FOR CHANNEL 'ch1';
+Warnings:
+Note 1759 Sending passwords in plain text without SSL/TLS is extremely insecure.
+Note 1760 Storing MySQL user name or password information in the connection metadata repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START REPLICA; see the 'START REPLICA Syntax' in the MySQL Manual for more information.
+include/start_slave.inc [FOR CHANNEL 'ch1']
+# Create tables
+[connection server4]
+CREATE TABLE t1 (c1 INT NOT NULL PRIMARY KEY);
+CREATE TABLE t2 (c1 INT NOT NULL PRIMARY KEY);
+include/sync_slave_sql_with_master.inc
+# Make server2 join the group
+[connection server2]
+include/start_group_replication.inc
+# Take a lock on the primary so that T1 will be blocked
+[connection server_1_1]
+LOCK TABLES t1 WRITE;
+# Commit transaction T1 on table t1, then transaction T2 on t2.
+[connection server4]
+INSERT INTO t1 VALUES (1);
+INSERT INTO t2 VALUES (2);
+# Wait until T2 is waiting for T1 to commit.
+[connection server1]
+include/save_error_log_position.inc
+# Join server3 to the group
+[connection server3]
+include/start_group_replication.inc
+# Wait for T2 to commit on server1
+[connection server1]
+include/assert_error_log.inc [server: 1, pattern: The transaction '[a-z0-9\-]*:[0-9]*' will commit out of order with respect to its source to follow the group global order]
+# Verify that t1 is still not committed
+include/assert.inc [There should be one missing GTID]
+[connection server_1_1]
+include/assert.inc [t1 should still be empty]
+# Check that new transactions block as needed, *not* violating replica-preserve-commit-order
+[connection server_4]
+INSERT INTO t2 VALUES (3);
+[connection server_1]
+include/assert.inc [t2 should still have only one element]
+# Unblock T1 and T3 and let them finish
+[connection server_1_1]
+UNLOCK TABLES;
+include/rpl_sync.inc
+# Clean up
+[connection server4]
+DROP TABLE t1;
+DROP TABLE t2;
+include/sync_slave_sql_with_master.inc
+[connection server1]
+include/stop_slave.inc [FOR CHANNEL 'ch1']
+RESET REPLICA ALL FOR CHANNEL 'ch1';
+include/rpl_sync.inc
+[connection server3]
+include/stop_group_replication.inc
+[connection server2]
+include/stop_group_replication.inc
+[connection server1]
+include/stop_group_replication.inc
+include/group_replication_end.inc
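The "one missing GTID" assertion in the result file above corresponds to a gap in the set of executed transaction numbers once T2 commits ahead of T1. The sketch below is a rough Python illustration of that idea, not the test's actual check: the helper names are hypothetical, and it assumes (this is not stated in the result file) that the upstream source numbered the two CREATE TABLE statements 1 and 2, T1 as 3, and T2 as 4.

```python
def missing_numbers(committed):
    """Return transaction numbers below the highest committed one that are absent."""
    if not committed:
        return []
    return [n for n in range(1, max(committed) + 1) if n not in committed]


def as_intervals(committed):
    """Render numbers in the interval style of a GTID set, e.g. '1-2:4'."""
    intervals = []
    for n in sorted(committed):
        if intervals and intervals[-1][1] == n - 1:
            intervals[-1][1] = n          # extend the current interval
        else:
            intervals.append([n, n])      # a gap starts a new interval
    return ":".join(str(a) if a == b else f"{a}-{b}" for a, b in intervals)


if __name__ == "__main__":
    # Assumed numbering: 1, 2 = CREATE TABLE; 3 = T1 (blocked); 4 = T2 (committed ahead).
    executed_on_primary = {1, 2, 4}
    print(as_intervals(executed_on_primary))     # 1-2:4
    print(missing_numbers(executed_on_primary))  # [3]
```

Once T1 is unblocked and commits, the set becomes a single interval again, which is why the commit message expects such gaps to be rare and short-lived.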
Lines changed: 124 additions & 0 deletions
@@ -0,0 +1,124 @@
+include/only_with_option.inc [GLOBAL.replica_parallel_workers > 1]
+include/group_replication.inc [rpl_server_count=4]
+Warnings:
+Note #### Sending passwords in plain text without SSL/TLS is extremely insecure.
+Note #### Storing MySQL user name or password information in the connection metadata repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START REPLICA; see the 'START REPLICA Syntax' in the MySQL Manual for more information.
+[connection server1]
+# Bootstrap group with server1 as primary and server2, server3 as secondaries.
+[connection server1]
+include/start_and_bootstrap_group_replication.inc
+# Create inbound channel from server4 to server1
+CHANGE REPLICATION SOURCE TO SOURCE_HOST='127.0.0.1', SOURCE_USER='root', SOURCE_AUTO_POSITION=1, SOURCE_PORT=SERVER_4_PORT FOR CHANNEL 'ch1';
+Warnings:
+Note 1759 Sending passwords in plain text without SSL/TLS is extremely insecure.
+Note 1760 Storing MySQL user name or password information in the connection metadata repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START REPLICA; see the 'START REPLICA Syntax' in the MySQL Manual for more information.
+include/start_slave.inc [FOR CHANNEL 'ch1']
+# Suppress errors
+include/suppress_messages.inc
+# Connection 1 suppresses message <Replica SQL for channel 'ch1': Worker .* failed executing transaction '.*' at source log .* Could not execute Write_rows event on table test.t1>.
+# Connection 1 suppresses message <Replica SQL for channel 'ch1': ... The replica coordinator and worker threads are stopped, possibly leaving data in inconsistent state.>.
+# Connection 1 suppresses message <Plugin group_replication reported: 'The requested GTID '.*' was already used, the transaction will rollback.*'>.
+# Connection 2 suppresses message <Replica SQL for channel 'ch1': Worker .* failed executing transaction '.*' at source log .* Could not execute Write_rows event on table test.t1>.
+# Connection 2 suppresses message <Replica SQL for channel 'ch1': ... The replica coordinator and worker threads are stopped, possibly leaving data in inconsistent state.>.
+# Connection 2 suppresses message <Plugin group_replication reported: 'The requested GTID '.*' was already used, the transaction will rollback.*'>.
+# Connection 3 suppresses message <Replica SQL for channel 'ch1': Worker .* failed executing transaction '.*' at source log .* Could not execute Write_rows event on table test.t1>.
+# Connection 3 suppresses message <Replica SQL for channel 'ch1': ... The replica coordinator and worker threads are stopped, possibly leaving data in inconsistent state.>.
+# Connection 3 suppresses message <Plugin group_replication reported: 'The requested GTID '.*' was already used, the transaction will rollback.*'>.
+# Connection 4 suppresses message <Replica SQL for channel 'ch1': Worker .* failed executing transaction '.*' at source log .* Could not execute Write_rows event on table test.t1>.
+# Connection 4 suppresses message <Replica SQL for channel 'ch1': ... The replica coordinator and worker threads are stopped, possibly leaving data in inconsistent state.>.
+# Connection 4 suppresses message <Plugin group_replication reported: 'The requested GTID '.*' was already used, the transaction will rollback.*'>.
+# Create table
+[connection server4]
+CREATE TABLE t1 (c1 INT NOT NULL PRIMARY KEY);
+include/sync_slave_sql_with_master.inc
+# Make server2 join the group
+[connection server2]
+include/start_group_replication.inc
+# Begin a transasction on the group primary so that T1 will be blocked
+[connection server_1_1]
+BEGIN;
+INSERT INTO t1 VALUES (1);
+# Begin a transasction on the group primary so that T3 will be blocked
+[connection server_1_2]
+BEGIN;
+INSERT INTO t1 VALUES (3);
+# Commit transactions T1, T2, T3 on the upstream source.
+# This should eventually lead to the following state on the group primary:
+# T1: blocked by local session
+# T2: certified, waiting for preceding transaction to commit
+# T3: blocked by local session
+[connection server4]
+INSERT INTO t1 VALUES (1);
+INSERT INTO t1 VALUES (2);
+INSERT INTO t1 VALUES (3);
+# Wait until T2 is waiting for T1 to commit.
+[connection server1]
+include/save_error_log_position.inc
+# Join server3 to the group
+# The join will produce a view_change, delivered after T2.
+# This forces T2 to bypass replica-preserve-commit-order and commit before T1.
+# This should eventually lead to the following state on the group primary:
+# T1: blocked by local session
+# T2: committed ahead
+# T3: blocked by local session
+[connection server3]
+include/start_group_replication.inc
+# Wait for T2 to commit on server1.
+[connection server1]
+include/assert_error_log.inc [server: 1, pattern: The transaction '[a-z0-9\-]*:[0-9]*' will commit out of order with respect to its source to follow the group global order]
+# Verify that T1, T3 are still not committed
+include/assert.inc [There should be two missing GTIDs]
+# Check that *new* transactions block as needed, *not* violating replica-preserve-commit-order.
+# This should eventually lead to the following state on the group primary:
+# T1: blocked by local session
+# T2: committed ahead
+# T3: blocked by local session
+# T4: waiting for preceding transaction (T3)
+[connection server_4]
+INSERT INTO t1 VALUES (4);
+[connection server_1]
+# Wait until T4 is waiting for preceding transaction to commit.
+include/assert.inc [t1 should still have only one element]
+# Unblock T3.
+# This should eventually lead to the following state on the group primary:
+# T1: blocked by local session
+# T2: committed ahead
+# T3: waiting for preceding transaction (T1)
+# T4: waiting for preceding transaction (T3)
+[connection server_1_2]
+ROLLBACK;
+# Wait until T3 is waiting for preceding transaction to commit.
+# Make T1 fail, by committing the blocking transaction.
+# This should eventually lead to the following state on the group primary:
+# T1: rolled back
+# T2: committed ahead
+# T3: rolled back
+# T4: rolled back
+[connection server_1_1]
+COMMIT;
+include/wait_for_slave_sql_error.inc [errno=1062 FOR CHANNEL 'ch1']
+include/assert.inc [There should be 3 missing GTIDs]
+include/assert.inc [t1 should have two elements (one replicated, one from local session)]
+# Remove the duplicate row on the group primary.
+[connection server1]
+DELETE FROM t1 WHERE c1 = 1;
+# Start the inbound channel again.
+# Now that the duplicate row is gone, it should be able to replicate T1, T3, T4.
+# This should eventually lead to the following state on the group primary:
+# T1: committed
+# T2: committed ahead
+# T3: committed
+# T4: committed
+include/start_slave.inc [FOR CHANNEL 'ch1']
+[connection server4]
+include/sync_slave_sql_with_master.inc
+include/rpl_sync.inc
+# Clean up
+[connection server4]
+DROP TABLE t1;
+include/sync_slave_sql_with_master.inc
+include/rpl_sync.inc
+[connection server1]
+include/stop_slave.inc [FOR CHANNEL 'ch1']
+RESET REPLICA ALL FOR CHANNEL 'ch1';
+include/group_replication_end.inc
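The state tables in the comments of the result file above (T1 to T4, before and after T1 fails) can be reproduced with a very small model. This is a simplified Python sketch under stated assumptions, not the applier's logic: `apply_in_receive_order` is a hypothetical name, and the model assumes that once one transaction fails, every later transaction that has not already committed ahead is rolled back because the channel stops.

```python
def apply_in_receive_order(transactions, committed_ahead, failing):
    """Map each transaction to its final state under preserved commit order.

    transactions:    names in the order the channel received them
    committed_ahead: transactions that already bypassed the commit order
    failing:         transactions whose execution fails (e.g. duplicate key)
    """
    outcome = {}
    channel_stopped = False
    for trx in transactions:
        if trx in committed_ahead:
            outcome[trx] = "committed ahead"   # unaffected by later failures
        elif channel_stopped or trx in failing:
            channel_stopped = True             # everything behind it rolls back
            outcome[trx] = "rolled back"
        else:
            outcome[trx] = "committed"
    return outcome


if __name__ == "__main__":
    order = ["T1", "T2", "T3", "T4"]
    # T1 hits the duplicate row inserted by the local session.
    print(apply_in_receive_order(order, {"T2"}, {"T1"}))
    # After the duplicate row is deleted and the channel restarts.
    print(apply_in_receive_order(order, {"T2"}, set()))
```

The first call yields three rolled-back transactions, which matches the "3 missing GTIDs" assertion; the second yields the final state listed before the channel is restarted.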
Lines changed: 53 additions & 0 deletions
@@ -0,0 +1,53 @@
+include/only_with_option.inc [GLOBAL.replica_parallel_workers > 1]
+include/group_replication.inc [rpl_server_count=4]
+Warnings:
+Note #### Sending passwords in plain text without SSL/TLS is extremely insecure.
+Note #### Storing MySQL user name or password information in the connection metadata repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START REPLICA; see the 'START REPLICA Syntax' in the MySQL Manual for more information.
+[connection server1]
+
+############################################################
+# 1. Bootstrap group on server1. Configure servers.
+# Start an inbound channel that replicates from server4.
+[connection server1]
+include/start_and_bootstrap_group_replication.inc
+SET SESSION sql_log_bin= 0;
+call mtr.add_suppression("The transaction '[a-z0-9\-]*:[0-9]*' will commit out of order with respect to its source to follow the group global order.");
+SET SESSION sql_log_bin= 1;
+# Adding debug point 'simulate_bgct_rpco_deadlock' to @@GLOBAL.debug
+CHANGE REPLICATION SOURCE TO SOURCE_HOST='127.0.0.1', SOURCE_USER='root', SOURCE_AUTO_POSITION=1, SOURCE_PORT=SERVER_4_PORT FOR CHANNEL 'ch1';
+Warnings:
+Note 1759 Sending passwords in plain text without SSL/TLS is extremely insecure.
+Note 1760 Storing MySQL user name or password information in the connection metadata repository is not secure and is therefore not recommended. Please consider using the USER and PASSWORD connection options for START REPLICA; see the 'START REPLICA Syntax' in the MySQL Manual for more information.
+include/start_slave.inc [FOR CHANNEL 'ch1']
+[connection server2]
+include/start_group_replication.inc
+[connection server3]
+include/start_group_replication.inc
+
+############################################################
+# 2. Schedule transactions in inbound replication channel
+[connection server4]
+CREATE TABLE t1 (c1 INT PRIMARY KEY, c2 LONGTEXT);
+DROP TABLE t1;
+
+############################################################
+# 3. There must be 0 applier threads on server1 with the state
+# 'Waiting for Binlog Group Commit ticket'.
+[connection server1]
+include/assert_grep.inc [There were transactions that did commit out of order with respect to its source to follow the group global order]
+
+############################################################
+# 4. Clean up.
+[connection server4]
+include/sync_slave_sql_with_master.inc
+include/rpl_sync.inc
+[connection server3]
+include/stop_group_replication.inc
+[connection server2]
+include/stop_group_replication.inc
+[connection server1]
+# Removing debug point 'simulate_bgct_rpco_deadlock' from @@GLOBAL.debug
+include/stop_slave.inc [FOR CHANNEL 'ch1']
+RESET REPLICA ALL FOR CHANNEL 'ch1';
+include/stop_group_replication.inc
+include/group_replication_end.inc
Lines changed: 17 additions & 0 deletions
@@ -0,0 +1,17 @@
+!include ../my.cnf
+
+[mysqld.1]
+
+[mysqld.2]
+
+[mysqld.3]
+
+[mysqld.4]
+binlog_transaction_dependency_tracking= WRITESET
+
+[ENV]
+SERVER_MYPORT_3= @mysqld.3.port
+SERVER_MYSOCK_3= @mysqld.3.socket
+
+SERVER_MYPORT_4= @mysqld.4.port
+SERVER_MYSOCK_4= @mysqld.4.socket
