Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

fix(compute): Respect skip_pg_catalog_updates in reconfigure() #10696

Open
wants to merge 1 commit into
base: main
Choose a base branch
from

Conversation

ololobus
Copy link
Member

@ololobus ololobus commented Feb 6, 2025

Problem

We respect skip_pg_catalog_updates at the initial start, but ignore at the follow-up /configure. Yet, it's used for storage->cplane->compute notify requests after migrations, shard split, etc. So every time we get them, applying the new config takes much longer than it should because we go through Postgres catalog checks. Cplane sets this flag, when it does serves notify attach call https://github.com/neondatabase/cloud/commit/9068c7d7433f943af2bc350e9fd59772867e622c

Related to inc-403, for example

Summary of changes

Look at skip_pg_catalog_updates in compute.reconfigure()

@ololobus ololobus requested a review from a team as a code owner February 6, 2025 14:49
@ololobus ololobus requested review from MMeent and tristan957 February 6, 2025 14:49
@ololobus ololobus changed the title fix(compute): Respect skip_pg_catalog_updates in /configure fix(compute): Respect skip_pg_catalog_updates in reconfigure Feb 6, 2025
@ololobus ololobus changed the title fix(compute): Respect skip_pg_catalog_updates in reconfigure fix(compute): Respect skip_pg_catalog_updates in reconfigure() Feb 6, 2025
Copy link
Contributor

@hlinnaka hlinnaka left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Makes sense. Does the control plane set skip_pg_catalog_updates correctly when reconfiguring for pageserver changes? And does it correctly not set it, when applying other changes?

@ololobus
Copy link
Member Author

ololobus commented Feb 6, 2025

Makes sense. Does the control plane set skip_pg_catalog_updates correctly when reconfiguring for pageserver changes?

Yeah, see this commit https://github.com/neondatabase/cloud/commit/9068c7d7433f943af2bc350e9fd59772867e622c

And does it correctly not set it, when applying other changes?

From what I see, the logic is 'apply unless the flag is set' and it's not set by default, so I'd say yes

Copy link

github-actions bot commented Feb 6, 2025

7425 tests run: 7041 passed, 32 failed, 352 skipped (full report)


Failures on Postgres 17

Failures on Postgres 16

Failures on Postgres 15

Failures on Postgres 14

# Run all failed tests locally:
scripts/pytest -vv -n $(nproc) -k "test_sharding_smoke[release-pg14] or test_sharding_smoke[release-pg14] or test_sharding_backpressure[release-pg14] or test_sharding_backpressure[release-pg14] or test_pageserver_compaction_circuit_breaker[release-pg15] or test_pageserver_compaction_circuit_breaker[release-pg15] or test_sharding_smoke[release-pg15] or test_sharding_smoke[release-pg15] or test_sharding_backpressure[release-pg15] or test_sharding_backpressure[release-pg15] or test_pageserver_compaction_circuit_breaker[release-pg16] or test_pageserver_compaction_circuit_breaker[release-pg16] or test_sharding_smoke[release-pg16] or test_sharding_smoke[release-pg16] or test_sharding_backpressure[release-pg16] or test_sharding_backpressure[release-pg16] or test_pageserver_compaction_circuit_breaker[release-pg17] or test_pageserver_compaction_circuit_breaker[release-pg17] or test_pageserver_compaction_circuit_breaker[release-pg17] or test_pageserver_compaction_circuit_breaker[release-pg17] or test_sharding_backpressure[release-pg17] or test_sharding_backpressure[release-pg17] or test_sharding_backpressure[debug-pg17] or test_sharding_backpressure[release-pg17] or test_sharding_backpressure[release-pg17] or test_sharding_gc[release-pg17] or test_sharding_gc[release-pg17] or test_sharding_gc[release-pg17] or test_sharding_gc[release-pg17] or test_sharding_smoke[debug-pg17] or test_sharding_smoke[release-pg17] or test_sharding_smoke[release-pg17]"
Flaky tests (3)

Postgres 17

Postgres 14

Test coverage report is not available

The comment gets automatically updated with the latest test results
ccbb85d at 2025-02-06T17:03:28.727Z :recycle:

@ololobus
Copy link
Member Author

ololobus commented Feb 6, 2025

I am not sure why so many sharding tests failed. If the list of pageserver changes, the endpoint should still be reconfigured. It looks like there were some implicit assumptions that the endpoint is reconfigured with catalog updates. I had a brief look, still not clear. Will continue tomorrow

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants