
drop old table twice at the same time #737

Closed
luoxiaocai91 opened this issue May 17, 2019 · 9 comments


luoxiaocai91 commented May 17, 2019

gh-ost drops the old table twice at the same time, which leads to slave delay because the slave has MTS (multi-threaded replication) enabled.
The slave processlist:

2019-05-17 00:03:01 | 5619159 | system user |   | payorder05 | Connect | 70 | Waiting for preceding transaction to commit | DROP TABLE IF EXISTS `_xx_del` /* generated by server */ | 70474 | 0
2019-05-17 00:03:01 | 5619169 | system user |   |   | Connect | 78807 | Waiting for an event from Coordinator |   | 78806907 | 0
2019-05-17 00:03:01 | 5619168 | system user |   |   | Connect | 44253 | Waiting for an event from Coordinator |   | 44253003 | 0
2019-05-17 00:03:01 | 5619166 | system user |   |   | Connect | 17994 | Waiting for an event from Coordinator |   | 17993946 | 0
2019-05-17 00:03:01 | 5619157 | system user |   | payorder05 | Connect | 70 | Waiting for table metadata lock | DROP TABLE IF EXISTS `_xx_del` /* generated by server */ | 70475 | 0

The command:

gh-ost_v1.0.40 
--max-load=Threads_running=32             
--critical-load=Threads_running=48             
--chunk-size=1000             
--throttle-control-replicas=\"xxx:xx,xxx:xxx\"             
--max-lag-millis=1000             
--initially-drop-old-table             
--initially-drop-ghost-table             
--initially-drop-socket-file 
--ok-to-drop-table 
--conf=\"xxx\"                 
--host=\"xx\" 
--port=\"xx\" 
--database=\"xx\" 
--table=\"xx\"                 
--verbose 
--alter=\" xxx\"                 
--switch-to-rbr=false                 
--cut-over=default                 
--assume-rbr=true                 
--exact-rowcount=false                 
--default-retries=3                 
--critical-load-interval-millis=3000                 
--panic-flag-file=/tmp/ghost.panic.flag.xx.xx.66506955                 
--postpone-cut-over-flag-file=/tmp/ghost.postpone.flag.xx.xx.66506955                 
--throttle-flag-file=/tmp/ghost.throttle.flag.xx.xx.66506955                 
--execute 
--allow-on-master

MySQL version: 5.7.21-21-log Percona Server (GPL), Release 21, Revision 2a37e4e
The log:

Copy: 7059000/7981273 88.4%; Applied: 1056; Backlog: 0/1000; Time: 18h7m0s(total), 18h7m0s(copy); streamer: mysql-bin.007256:570606709; State: migrating; ETA: 2h22m1s
2019/05/17 00:00:34 binlogsyncer.go:568: [info] rotate to (mysql-bin.007257, 4)
2019/05/17 00:00:34 binlogsyncer.go:568: [info] rotate to (mysql-bin.007257, 4)
2019-05-17 00:00:34 INFO rotate to next log name: mysql-bin.007257
2019-05-17 00:00:34 INFO rotate to next log name: mysql-bin.007257
Copy: 7458000/7981273 93.4%; Applied: 1833; Backlog: 0/1000; Time: 18h7m30s(total), 18h7m30s(copy); streamer: mysql-bin.007257:21960471; State: migrating; ETA: 1h16m18s
Copy: 7853000/7981273 98.4%; Applied: 2568; Backlog: 0/1000; Time: 18h8m0s(total), 18h8m0s(copy); streamer: mysql-bin.007257:517246258; State: migrating; ETA: 17m46s
Copy: 8241000/7981273 103.3%; Applied: 3062; Backlog: 0/1000; Time: 18h8m30s(total), 18h8m30s(copy); streamer: mysql-bin.007257:908197679; State: migrating; ETA: due
2019/05/17 00:01:44 binlogsyncer.go:568: [info] rotate to (mysql-bin.007258, 4)
2019/05/17 00:01:44 binlogsyncer.go:568: [info] rotate to (mysql-bin.007258, 4)
2019-05-17 00:01:44 INFO rotate to next log name: mysql-bin.007258
2019-05-17 00:01:44 INFO rotate to next log name: mysql-bin.007258
2019-05-17 00:01:48 INFO Row copy complete
Copy: 8407300/8407300 100.0%; Applied: 3316; Backlog: 0/1000; Time: 18h8m42s(total), 18h8m42s(copy); streamer: mysql-bin.007258:43809488; State: migrating; ETA: due
2019-05-17 00:01:48 INFO Grabbing voluntary lock: gh-ost.8032128.lock
2019-05-17 00:01:48 INFO Setting LOCK timeout as 6 seconds
2019-05-17 00:01:48 INFO Looking for magic cut-over table
2019-05-17 00:01:48 INFO Creating magic cut-over table `xx`.`_xx_del`
2019-05-17 00:01:48 INFO Magic cut-over table created
2019-05-17 00:01:48 INFO Locking `xx`.`xx`, `xx`.`_xx_del`
2019-05-17 00:01:48 INFO Tables locked
2019-05-17 00:01:48 INFO Session locking original & magic tables is 8032128
2019-05-17 00:01:48 INFO Writing changelog state: AllEventsUpToLockProcessed:1558022508196974214
2019-05-17 00:01:48 INFO Waiting for events up to lock
Copy: 8407300/8407300 100.0%; Applied: 3371; Backlog: 19/1000; Time: 18h8m45s(total), 18h8m42s(copy); streamer: mysql-bin.007258:106436102; State: migrating; ETA: due
2019-05-17 00:01:51 ERROR Timeout while waiting for events up to lock
2019-05-17 00:01:51 ERROR 2019-05-17 00:01:51 ERROR Timeout while waiting for events up to lock
2019-05-17 00:01:51 INFO Looking for magic cut-over table
2019-05-17 00:01:51 INFO Will now proceed to drop magic table and unlock tables
2019-05-17 00:01:51 INFO Dropping magic cut-over table
2019-05-17 00:01:51 INFO Dropping magic cut-over table
2019-05-17 00:01:51 INFO Droppping table `xx`.`_xx_del`
2019-05-17 00:01:51 INFO Table dropped
2019-05-17 00:01:51 INFO Releasing lock from `xx`.`xx`, `xx`.`_xx_del`
2019-05-17 00:01:51 INFO Tables unlocked

We hope the drop of the old table is serialized, or only executed once.
Thank you!

MOON-CLJ (Contributor) commented May 17, 2019

@shlomi-noach After investigating the code: when "Timeout while waiting for events up to lock" occurs, dropping the old table happens at

gh-ost/go/logic/applier.go

Lines 854 to 859 in ffc6c40

log.Infof("Dropping magic cut-over table")
query = fmt.Sprintf(`drop /* gh-ost */ table if exists %s.%s`,
	sql.EscapeName(this.migrationContext.DatabaseName),
	sql.EscapeName(this.migrationContext.GetOldTableName()),
)
if _, err := tx.Exec(query); err != nil {

and also at

gh-ost/go/logic/migrator.go

Lines 611 to 614 in ffc6c40

defer func() {
	okToUnlockTable <- true
	this.applier.DropAtomicCutOverSentryTableIfExists()
}()

There is a race condition between these two drop-old-table code paths: the DROP may be sent to MySQL twice, and because the MySQL slave enables MTS (multi-threaded replication), the two drops lead to a deadlock. See the processlist in the comment above.

MOON-CLJ (Contributor) commented:
@shlomi-noach I am wondering: the sentry table is created in

func (this *Applier) AtomicCutOverMagicLock(sessionIdChan chan int64, tableLocked chan<- error, okToUnlockTable <-chan bool, tableUnlocked chan<- error) error {

so why is it cleaned up in

func (this *Migrator) atomicCutOver() (err error) {

?
Can we move the DropAtomicCutOverSentryTableIfExists call,

gh-ost/go/logic/migrator.go

Lines 611 to 614 in ffc6c40

defer func() {
	okToUnlockTable <- true
	this.applier.DropAtomicCutOverSentryTableIfExists()
}()

into AtomicCutOverMagicLock?

MOON-CLJ (Contributor) commented:
Ping @shlomi-noach, we need some feedback. Thanks in advance.

shlomi-noach (Contributor) commented:
Hi @MOON-CLJ , I'm sorry, it'll take me a few days to look into this. Thank you for all your work!

liu2178855 commented:
When will gh-ost fix this bug? There were many faults in the production environment when I altered tables.

shlomi-noach (Contributor) commented:
I will merge this downstream, at openark#10

vleejl commented Nov 17, 2021

> @shlomi-noach I am wondering: the sentry table is created in
>
> func (this *Applier) AtomicCutOverMagicLock(sessionIdChan chan int64, tableLocked chan<- error, okToUnlockTable <-chan bool, tableUnlocked chan<- error) error {
>
> so why is it cleaned up in
>
> func (this *Migrator) atomicCutOver() (err error) {
>
> ?
> Can we move DropAtomicCutOverSentryTableIfExists (gh-ost/go/logic/migrator.go, lines 611 to 614 in ffc6c40) into AtomicCutOverMagicLock?

Hi, do you know why DropAtomicCutOverSentryTableIfExists cannot be moved into AtomicCutOverMagicLock?

timvaillancourt (Collaborator) commented:
It appears this was resolved with #755. Closing

Please open a new issue if this is not the case 🙏

xiaoxuanfeng commented Apr 21, 2023

After waiting for AllEventsUpToLockProcessed timed out, the database instance experienced a large amount of metadata lock waiting for 60 seconds, because the "UNLOCK TABLES" statement could not be executed until those 60 seconds had elapsed.

MySQL version: 8.0.24; gh-ost version: 1.1.5

The log:
2023-03-17 18:13:55 INFO Grabbing voluntary lock: gh-ost.34100417.lock
2023-03-17 18:13:55 INFO Setting LOCK timeout as 6 seconds
2023-03-17 18:13:55 INFO Looking for magic cut-over table
2023-03-17 18:13:55 INFO Creating magic cut-over table db1._test_20230317181349_del
2023-03-17 18:13:55 INFO Magic cut-over table created
2023-03-17 18:13:55 INFO Locking db1.test, db1._test_20230317181349_del
2023-03-17 18:13:55 INFO Tables locked
2023-03-17 18:13:55 INFO Session locking original & magic tables is 34100417

2023-03-17 18:13:55 INFO Writing changelog state: AllEventsUpToLockProcessed:1679048035119299226
2023-03-17 18:13:55 INFO Waiting for events up to lock
2023-03-17 18:13:58 ERROR Timeout while waiting for events up to lock
2023-03-17 18:13:58 ERROR 2023-03-17 18:13:58 ERROR Timeout while waiting for events up to lock

2023-03-17 18:13:58 INFO Looking for magic cut-over table
2023-03-17 18:13:58 INFO Will now proceed to drop magic table and unlock tables
2023-03-17 18:13:58 INFO Dropping magic cut-over table
2023-03-17 18:13:58 INFO Dropping magic cut-over table

2023-03-17 18:13:58 INFO Dropping table db1._test_20230317181349_del

2023-03-17 18:14:58 ERROR Error 1205: Lock wait timeout exceeded; try restarting transaction
2023-03-17 18:14:58 INFO Releasing lock from db1.test, db1._test_20230317181349_del
2023-03-17 18:14:58 INFO Tables unlocked

After AllEventsUpToLockProcessed timed out, the following code executed:
the 'DROP TABLE _test_20230317181349_del' statement waited for a metadata lock until it gave up after the default 60-second metadata lock wait.


The subsequent code waits until the above execution ends before running "UNLOCK TABLES" to release the table locks.
