Skip to content

Conversation

@pyramation
Copy link
Contributor

Fix concurrent role creation race condition with advisory locks

Summary

Fixes the intermittent CI error: ERROR: duplicate key value violates unique constraint "pg_authid_rolname_index" that occurs when multiple processes try to create the same PostgreSQL role concurrently.

Root Cause: Under concurrent execution, CREATE ROLE can hit the pg_authid_rolname_index unique constraint before PostgreSQL's duplicate object detection runs, resulting in unique_violation (error code 23505) instead of duplicate_object (error code 42710).

Solution:

  • Added pg_advisory_xact_lock(42, hashtext(rolname)) to serialize role creation per role name
  • Updated exception handlers to catch both duplicate_object OR unique_violation
  • Applied consistently across all 4 role creation code paths:
    • bootstrap-roles.sql (anonymous, authenticated, administrator)
    • bootstrap-test-roles.sql (app_user, app_admin)
    • LaunchQLInit.bootstrapDbRoles() (custom users)
    • DbAdmin.createUserRole() (pgsql-test users)

The advisory lock prevents the race entirely, while catching both exceptions provides defense in depth.

Review & Testing Checklist for Human

  • Test under actual concurrent load: Run multiple parallel processes that try to create the same role simultaneously to verify the fix eliminates the race condition (CI tests run sequentially and may not catch this)
  • Verify no other role creation code paths were missed: Search the codebase for any other CREATE ROLE statements that might need the same fix
  • Check advisory lock namespace: Confirm that namespace 42 doesn't collide with other advisory locks used elsewhere in the system
  • Test existing functionality: Verify that role creation still works correctly in normal (non-concurrent) scenarios and that the advisory locks don't introduce unexpected blocking

Recommended Test Plan

  1. Run the existing CI tests to ensure no regressions
  2. Manually test concurrent role creation by running lql admin-users add --test --yes in multiple parallel shells
  3. Check that the error no longer occurs in CI logs over multiple runs

Notes

  • Advisory locks are transaction-scoped (pg_advisory_xact_lock) and will be automatically released at transaction end
  • The hashtext() function provides good distribution of lock keys based on role names
  • This fix is compatible with PostgreSQL 13.3+ (the version used in CI)
  • Performance impact should be minimal since role creation is infrequent

Link to Devin run: https://app.devin.ai/sessions/2c33cab4511f4a7897e29001895487df
Requested by: Dan Lynch (@pyramation)

- Add advisory locks (pg_advisory_xact_lock) to serialize role creation
- Catch both duplicate_object and unique_violation exceptions
- Fixes error: duplicate key value violates unique constraint pg_authid_rolname_index
- Applied to all role creation code paths:
  - bootstrap-roles.sql (anonymous, authenticated, administrator)
  - bootstrap-test-roles.sql (app_user, app_admin)
  - LaunchQLInit.bootstrapDbRoles()
  - DbAdmin.createUserRole()

Under concurrent execution, CREATE ROLE can hit the pg_authid_rolname_index
unique constraint before the duplicate_object check, resulting in
unique_violation (23505) instead of duplicate_object (42710). Advisory
locks prevent the race entirely, and catching both exceptions provides
defense in depth.

Co-Authored-By: Dan Lynch <[email protected]>
@devin-ai-integration
Copy link
Contributor

🤖 Devin AI Engineer

I'll be helping with this pull request! Here's what you should know:

✅ I will automatically:

  • Address comments on this PR. Add '(aside)' to your comment to have me ignore it.
  • Look at CI failures and help fix them

Note: I can only respond to comments from users who have write access to this repository.

⚙️ Control Options:

  • Disable automatic comment and CI monitoring

- Add IF NOT EXISTS checks before every CREATE ROLE statement
- Add membership pre-checks before every GRANT statement
- Keep advisory locks for serialization (with-locks variant)
- Keep both exception handlers (duplicate_object OR unique_violation)
- Defense-in-depth: pre-check + lock + exceptions

This pedantic approach ensures maximum safety against concurrent
role creation and membership grant races.

Co-Authored-By: Dan Lynch <[email protected]>
devin-ai-integration bot added a commit that referenced this pull request Nov 18, 2025
This test demonstrates the race condition that occurs when multiple
processes attempt to create the same PostgreSQL role simultaneously.

On the main branch, this test FAILS with unique_violation error when
multiple concurrent CREATE ROLE statements hit the pg_authid_rolname_index.

The test includes:
- Concurrent role creation test (reproduces the bug)
- Performance benchmarks at concurrency levels 2, 4, 8
- Concurrent GRANT operations test
- High concurrency stress test with retries

This PR intentionally fails on main to demonstrate the problem.
Compare against PR #325 (with locks) and PR #326 (no locks) to see fixes.

Co-Authored-By: Dan Lynch <[email protected]>
@devin-ai-integration
Copy link
Contributor

Latest Updates

Added comprehensive pre-existence checks to all CREATE ROLE and GRANT statements for defense-in-depth:

  • ✅ Pre-checks using IF NOT EXISTS and membership queries
  • ✅ Advisory locks for serialization
  • ✅ Both exception handlers (duplicate_object OR unique_violation)

Related PRs

Testing

The concurrent test in PR #327 should PASS with this fix applied.


Link to Devin run: https://app.devin.ai/sessions/2c33cab4511f4a7897e29001895487df
Requested by: Dan Lynch ([email protected]) / @pyramation

@pyramation pyramation marked this pull request as draft December 2, 2025 09:33
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants