
br: PiTR table filter online support #59281

Open · wants to merge 6 commits into master from table-filter-online
Conversation

Contributor

@Tristan1900 Tristan1900 commented Feb 7, 2025

What problem does this PR solve?

Issue Number: close #59280

Problem Summary:

What changed and how does it work?

Make sure that when the user wants to use the PiTR table filter to restore to an online cluster, the --online flag is specified.

  1. Figure out all the tables needed for the entire PiTR at the snapshot restore phase, and check the current cluster to see whether any of them already exist, so we can error out early.
  2. For tables created during the snapshot backup, set tableMode to restore to block any user operations.
  3. Make sure to populate the table names of renamed tables to their final state to avoid table name conflicts.
  4. During log replay at the log restore phase, set all tableInfo to restore mode and also set the table names to their final names.
  5. After the restore finishes, set tableMode back to normal mode, so user operations are unblocked and the latest values written to TiKV by the log restore become visible (see the sketch below).
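
As a rough illustration of steps 1–5, here is a minimal Go sketch assuming hypothetical Cluster, TableExists, and SetTableMode helpers; none of these names are taken from the PR, and the real BR implementation works on its own client and session types.

package sketch

import "fmt"

// Cluster stands in for the target cluster the restore talks to.
type Cluster interface {
	TableExists(db, table string) (bool, error)
	SetTableMode(db, table, mode string) error
}

// restoreWithFilterOnline mirrors the order of operations listed above.
func restoreWithFilterOnline(c Cluster, tables [][2]string, replayLog func() error) error {
	// (1) Error out early if any filtered table already exists in the online cluster.
	for _, t := range tables {
		exists, err := c.TableExists(t[0], t[1])
		if err != nil {
			return err
		}
		if exists {
			return fmt.Errorf("table %s.%s already exists in the target cluster", t[0], t[1])
		}
	}
	// (2) Put the restored tables into restore mode to block user operations.
	for _, t := range tables {
		if err := c.SetTableMode(t[0], t[1], "restore"); err != nil {
			return err
		}
	}
	// (3)(4) Replay the log backup; meta KVs are rewritten to the final table names here.
	if err := replayLog(); err != nil {
		return err
	}
	// (5) Switch the tables back to normal mode to unblock user operations.
	for _, t := range tables {
		if err := c.SetTableMode(t[0], t[1], "normal"); err != nil {
			return err
		}
	}
	return nil
}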

Check List

Tests

  • Unit test
  • Integration test
  • Manual test (add detailed scripts or steps below)
  • No need to test
    • I checked and no code files have been changed.

Side effects

  • Performance regression: Consumes more CPU
  • Performance regression: Consumes more Memory
  • Breaking backward compatibility

Documentation

  • Affects user behaviors
  • Contains syntax changes
  • Contains variable changes
  • Contains experimental features
  • Changes MySQL compatibility

Release note

Please refer to Release Notes Language Style Guide to write a quality release note.

None


ti-chi-bot bot commented Feb 7, 2025

Skipping CI for Draft Pull Request.
If you want CI signal for your change, please convert it to an actual PR.
You can still manually trigger a test run with /test all

@ti-chi-bot added the do-not-merge/needs-tests-checked, release-note-none (doesn't merit a release note), do-not-merge/work-in-progress (work in progress, should not merge), sig/planner (SIG: Planner), and size/XXL (changes 1000+ lines, ignoring generated files) labels on Feb 7, 2025
Copy link

tiprow bot commented Feb 7, 2025

Hi @Tristan1900. Thanks for your PR.

PRs from untrusted users cannot be marked as trusted with /ok-to-test in this repo, meaning untrusted PR authors can never trigger tests themselves. Collaborators can still trigger tests on the PR using /test all.

I understand the commands that are listed here.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository.

@Tristan1900 force-pushed the table-filter-online branch from 7309304 to 0d4d25e on March 3, 2025 23:34
@Tristan1900 Tristan1900 marked this pull request as ready for review March 3, 2025 23:34
@ti-chi-bot removed the do-not-merge/work-in-progress label on Mar 3, 2025

codecov bot commented Mar 3, 2025

Codecov Report

Attention: Patch coverage is 45.69983% with 322 lines in your changes missing coverage. Please review.

Project coverage is 74.3109%. Comparing base (c508e4b) to head (ff865e7).

Additional details and impacted files
@@               Coverage Diff                @@
##             master     #59281        +/-   ##
================================================
+ Coverage   73.1368%   74.3109%   +1.1740%     
================================================
  Files          1707       1754        +47     
  Lines        471602     479881      +8279     
================================================
+ Hits         344915     356604     +11689     
+ Misses       105487     100672      -4815     
- Partials      21200      22605      +1405     
Flag         Coverage Δ
integration  44.5650% <45.6998%> (?)

Flags with carried forward coverage won't be shown.

Component  Coverage Δ
dumpling   52.6910% <ø> (ø)
parser     ∅ <ø> (∅)
br         60.2041% <45.6998%> (+12.9671%) ⬆️

@Tristan1900 force-pushed the table-filter-online branch 4 times, most recently from 05b0129 to ac16ebe on March 9, 2025 22:17
Contributor

@Leavrth Leavrth left a comment

rest LGTM

Comment on lines 79 to 90
// MarkTableDeleted marks a table as deleted
func (info *LogBackupTableHistoryManager) MarkTableDeleted(tableId int64) {
	info.deletedTables[tableId] = struct{}{}
}

// MarkTablesDeleted marks multiple tables as deleted
func (info *LogBackupTableHistoryManager) MarkTablesDeleted(tableIds []int64) {
	for _, tableId := range tableIds {
		info.MarkTableDeleted(tableId)
	}
}

Contributor

Suggested change (drop MarkTableDeleted and inline the map write):

// MarkTablesDeleted marks multiple tables as deleted
func (info *LogBackupTableHistoryManager) MarkTablesDeleted(tableIds []int64) {
	for _, tableId := range tableIds {
		info.deletedTables[tableId] = struct{}{}
	}
}

Contributor

It seems it is only used by unit tests

@@ -129,7 +143,7 @@ func (tm *TableMappingManager) ParseMetaKvAndUpdateIdMapping(e *kv.Entry, cf str
return errors.Trace(err)
}
if value != nil {
return tm.parseTableValueAndUpdateIdMapping(dbID, value)
return tm.parseTableValueAndUpdateIdMapping(dbID, value, collector)
Contributor

Should it return nil here if value == nil?

Contributor Author

Yeah, it falls through to the end of this method and returns nil.

@@ -1413,6 +1413,11 @@ func (rc *LogClient) restoreAndRewriteMetaKvEntries(
} else if newEntry == nil {
continue
}
// sanity check: the key should never be nil, otherwise we would write invalid data to TiKV
if newEntry.Key == nil {
Contributor

we can use if len(newEntry.Key) == 0 to cover nil and []byte{}.
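
For reference, len of a nil byte slice is 0 in Go, so the single length check covers both cases; a quick illustration:

package main

import "fmt"

func main() {
	var nilKey []byte    // nil slice
	emptyKey := []byte{} // non-nil but zero-length slice

	fmt.Println(nilKey == nil, len(nilKey) == 0)     // true true
	fmt.Println(emptyKey == nil, len(emptyKey) == 0) // false true
}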

@@ -112,7 +112,8 @@ func RunRestoreRaw(c context.Context, g glue.Glue, cmdName string, cfg *RestoreR
return errors.Trace(err)
}
reader := metautil.NewMetaReader(backupMeta, s, &cfg.CipherInfo)
if err = client.LoadSchemaIfNeededAndInitClient(c, backupMeta, u, reader, true, cfg.StartKey, cfg.EndKey); err != nil {
if err = client.LoadSchemaIfNeededAndInitClient(c, backupMeta, u, reader, true, cfg.StartKey, cfg.EndKey,
cfg.ExplicitFilter, isFullRestore(cmdName)); err != nil {
Contributor

We use ./br restore raw, so cmdName is always "Raw Restore". Is it OK to use isFullRestore("Raw Restore") here?

Contributor Author

@Tristan1900 Tristan1900 Mar 11, 2025

isFullRestore("Raw Restore") can work and will return false. I made the change cuz original code only set the initFullCLusterRestore under isFullRestore so raw will never enter

	if isFullRestore(cmdName) {
		if client.NeedCheckFreshCluster(cfg.ExplicitFilter, checkpointFirstRun) {
			if err = client.CheckTargetClusterFresh(ctx); err != nil {
				return errors.Trace(err)
			}
		}
		// todo: move this check into InitFullClusterRestore, we should move restore config into a separate package
		// to avoid import cycle problem which we won't do it in this pr, then refactor this
		//
		// if it's point restore and reached here, then cmdName=FullRestoreCmd and len(cfg.FullBackupStorage) > 0
		if cfg.WithSysTable {
			client.InitFullClusterRestore(cfg.ExplicitFilter)
		}
	} else if client.IsFull() && checkpointFirstRun && cfg.CheckRequirements {
		if err := checkTableExistence(ctx, mgr, tables); err != nil {
			canRestoreSchedulers = true
			return errors.Trace(err)
		}
	}
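
As a side note, a minimal sketch of the command-name check being discussed, assuming isFullRestore simply compares cmdName against the full-restore command string; the constant value and names here are illustrative and not copied from the BR source.

package sketch

// fullRestoreCmd is an assumed stand-in for the real command-name constant.
const fullRestoreCmd = "Full Restore"

// isFullRestore reports whether the command is a full restore.
func isFullRestore(cmdName string) bool {
	return cmdName == fullRestoreCmd
}

// isFullRestore("Raw Restore") returns false, so the raw-restore path skips the
// fresh-cluster check and the InitFullClusterRestore branch shown above.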

ctx context.Context,
g glue.Glue,
cfg *SnapshotRestoreConfig,
domain *domain.Domain,
Contributor

unused parameter

//
// if it's point restore and reached here, then cmdName=FullRestoreCmd and len(cfg.FullBackupStorage) > 0
if cfg.WithSysTable {
client.InitFullClusterRestore(cfg.ExplicitFilter)
Contributor

cfg.WithSysTable is lost now. We should also check it in client.InitFullClusterRestore

Contributor Author

nice catch, added!
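
One possible shape of that fix, with the WithSysTable check folded into the client; the field names, parameters, and the exact condition are assumptions for illustration, not the PR's actual code.

package sketch

// restoreClient is a stand-in for the real snapshot restore client.
type restoreClient struct {
	fullClusterRestore bool
	explicitFilter     bool
	withSysTable       bool
}

// InitFullClusterRestore records both flags; under this assumed logic the
// restore only counts as a full-cluster restore when system tables are
// included and no explicit filter narrows the scope.
func (c *restoreClient) InitFullClusterRestore(explicitFilter, withSysTable bool) {
	c.explicitFilter = explicitFilter
	c.withSysTable = withSysTable
	c.fullClusterRestore = withSysTable && !explicitFilter
}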

}
log.Info("checkpoint status in restore", zap.Bool("enabled", cfg.UseCheckpoint), zap.Bool("exists", cpEnabledAndExists))
if err := checkMandatoryClusterRequirements(client, cfg, cpEnabledAndExists, cmdName); err != nil {
return errors.Trace(err)
}
if err = VerifyDBAndTableInBackup(client.GetDatabases(), cfg.RestoreConfig); err != nil {
Contributor

VerifyDBAndTableInBackup is already in the checkMandatoryClusterRequirements. We can remove it.

Contributor Author

oh right, it was due to merge conflicts.


// update table mapping manager with new table ids if pitr
if isPiTR {
cfg.tableMappingManager.UpdateDownstreamIds(tables, client.GetDomain())
Contributor

UpdateDownstreamIds doesn't need to get tableInfo from domain because createdTables already keeps them. See

ct := &CreatedTable{

Contributor

Besides, tableMappingManager.MergeBaseDBReplace(dbReplaces) in pkg/task/stream.go seems duplicated?


ti-chi-bot bot commented Mar 10, 2025

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: Leavrth
Once this PR has been reviewed and has the lgtm label, please assign ailinkid, wjhuang2016 for approval, ensuring that each of them provides their approval before proceeding. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@ti-chi-bot ti-chi-bot bot added the needs-1-more-lgtm Indicates a PR needs 1 more LGTM. label Mar 10, 2025

ti-chi-bot bot commented Mar 10, 2025

[LGTM Timeline notifier]

Timeline:

  • 2025-03-10 14:06:42.478484661 +0000 UTC m=+191958.084013599: ☑️ agreed by Leavrth.

@Tristan1900 force-pushed the table-filter-online branch from d5993d6 to f9f47fe on March 17, 2025 19:13
Signed-off-by: Wenqi Mou <[email protected]>
Signed-off-by: Wenqi Mou <[email protected]>
Signed-off-by: Wenqi Mou <[email protected]>
Signed-off-by: Wenqi Mou <[email protected]>
Signed-off-by: Wenqi Mou <[email protected]>
@Tristan1900 force-pushed the table-filter-online branch from f9f47fe to ff865e7 on March 17, 2025 19:17

ti-chi-bot bot commented Mar 17, 2025

@Tristan1900: The following tests failed, say /retest to rerun all failed tests or /retest-required to rerun all mandatory failed tests:

Test name                        Commit   Required  Rerun command
idc-jenkins-ci-tidb/unit-test    ff865e7  true      /test unit-test
pull-unit-test-ddlv1             ff865e7  true      /test pull-unit-test-ddlv1
idc-jenkins-ci-tidb/check_dev    ff865e7  true      /test check-dev
idc-jenkins-ci-tidb/check_dev_2  ff865e7  true      /test check-dev2
idc-jenkins-ci-tidb/build        ff865e7  true      /test build
pull-lightning-integration-test  ff865e7  true      /test pull-lightning-integration-test
pull-br-integration-test         ff865e7  true      /test pull-br-integration-test

Full PR test history. Your PR dashboard.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes-sigs/prow repository. I understand the commands that are listed here.

Labels
needs-1-more-lgtm (needs 1 more LGTM), release-note-none (doesn't merit a release note), sig/planner (SIG: Planner), size/XXL (changes 1000+ lines, ignoring generated files)
Projects
None yet
Development

Successfully merging this pull request may close these issues.

br: add online support for PiTR table filter
2 participants