
Fixes #791 Fixed a bug with non-idempotent item creation retries causing 'already in use' error #793

Open

slavatrofimov wants to merge 4 commits into microsoft:main from slavatrofimov:fix/handle-item-already-exists-on-retry

Conversation

@slavatrofimov

Description

This PR adds recovery logic to handle a race condition that occurs when API throttling delays cause the deployed_items cache to become stale during deployment. When an item creation (POST) request is throttled after the server has already created the item, the retry fails with an "already in use" error. Previously, this caused complete deployment failure.

Problem

When deploying items to a Fabric workspace under heavy API throttling:

  1. The deployed_items cache is populated once at deployment start
  2. API throttling can extend deployment time significantly
  3. If a POST request to create an item succeeds server-side but returns 429 before the client receives confirmation, the client retries
  4. The retry fails with ItemDisplayNameAlreadyInUse (HTTP 400) because the item already exists
  5. This error was unhandled, causing deployment failure (illustrated in the sketch below)
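The sequence above can be illustrated with a minimal sketch. Everything in it (the create_item helper, the FABRIC_API constant, the retry loop) is hypothetical and exists only to show the race; the library's actual retry logic lives in _fabric_endpoint.py:

import time

import requests

FABRIC_API = "https://api.fabric.microsoft.com/v1"  # illustration only, not a library constant


def create_item(token: str, workspace_id: str, payload: dict, max_retries: int = 5) -> dict:
    """Naive POST-with-retry helper (hypothetical).

    If the server creates the item but the client only ever sees a 429, the retry
    re-sends the same POST and the server now answers HTTP 400 with the error code
    ItemDisplayNameAlreadyInUse -- the race this PR recovers from.
    """
    url = f"{FABRIC_API}/workspaces/{workspace_id}/items"
    headers = {"Authorization": f"Bearer {token}"}
    for attempt in range(max_retries):
        response = requests.post(url, headers=headers, json=payload)
        if response.status_code == 429:
            # Throttled: the item may already exist server-side even though the
            # client never saw a success response. Back off and retry the same POST.
            time.sleep(int(response.headers.get("Retry-After", 2**attempt)))
            continue
        response.raise_for_status()  # a retried POST can now fail with HTTP 400
        return response.json()
    raise RuntimeError("Item creation retries exhausted")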

Solution

_fabric_endpoint.py:

  • Added explicit handler for ItemDisplayNameAlreadyInUse error code that raises an exception with a descriptive message (sketched below)
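A condensed sketch of what that handler does, pieced together from the diff fragments quoted later in this conversation; the surrounding request/response plumbing and the exact exception type used in _fabric_endpoint.py are simplifications, not copied from the source:

def handle_item_already_in_use(response):
    """Raise a descriptive error when a retried create collides with an existing item (sketch)."""
    if (
        response.status_code == 400
        and response.headers.get("x-ms-public-api-error-code") == "ItemDisplayNameAlreadyInUse"
    ):
        response_json = response.json() if response.text else {}
        # API message shape: "Requested 'TestNotebook' is already in use."
        item_name = (
            response_json.get("message", "").replace("Requested '", "").split("'")[0]
            if response_json
            else ""
        )
        raise Exception(  # the real code raises the library's own exception type here
            f"Item '{item_name}' already exists in the workspace but was not found during initial scan."
        )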

fabric_workspace.py:

  • Added try/catch block around item creation in _publish_item()
  • On "already in use" error, attempts recovery by (see the sketch after this list):
      • Re-fetching the item's GUID using the existing _lookup_item_attribute() function
      • Updating repository_items with the recovered GUID
      • Updating the deployed_items cache to ensure folder move logic works correctly
      • Setting a synthetic api_response for complete response tracking
      • Proceeding with UPDATE instead of CREATE
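A hedged sketch of that recovery pattern. The create_or_recover function, its parameters, and the flat cache dictionaries are illustrative stand-ins rather than the actual _publish_item() structure; only the error-string check, the _lookup_item_attribute() lookup step, and the warning wording follow the PR:

import logging

logger = logging.getLogger(__name__)


def create_or_recover(item_name, create_fn, lookup_guid_fn, repository_items, deployed_items):
    """Create an item, recovering when a throttled retry hits a name collision (sketch)."""
    try:
        return create_fn()
    except Exception as create_error:
        if "itemdisplaynamealreadyinuse" not in str(create_error).lower():
            raise
        logger.warning(
            "Item '%s' already exists (possible throttling race condition). "
            "Attempting to recover by fetching current state.",
            item_name,
        )
        try:
            item_guid = lookup_guid_fn()  # wraps _lookup_item_attribute(..., "id") in the real code
        except Exception:
            item_guid = None
        if item_guid is None:
            raise  # recovery failed; surface the original "already in use" error
        # Refresh both caches so downstream logic (e.g. folder moves) sees the real item
        repository_items[item_name] = item_guid
        deployed_items[item_name] = item_guid
        logger.warning("Recovered item GUID: %s. Will update instead of create.", item_guid)
        return {"id": item_guid}  # synthetic api_response; caller proceeds with UPDATE instead of CREATE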

Behavior After Fix

When this race condition occurs, deployment will now log:
[warn] Item 'Meter Activator' already exists (possible throttling race condition). Attempting to recover by fetching current state. Recovered item GUID: <guid>. Will update instead of create.

Linked Issue: 791

…in use" error when API throttling occurs during POST request by adding error detection and recovery logic.
Copilot AI review requested due to automatic review settings February 4, 2026 07:03
Contributor

Copilot AI left a comment


Pull request overview

This PR addresses a race condition that occurs when API throttling causes stale cache data during item deployment. When an item creation request is throttled after server-side success but before client confirmation, retries fail with "already in use" errors, causing deployment failures.

Changes:

  • Added exception handling in _fabric_endpoint.py to raise a descriptive error for ItemDisplayNameAlreadyInUse API error code
  • Implemented recovery logic in fabric_workspace.py to catch "already in use" errors, re-fetch the item GUID, update caches, and proceed with UPDATE instead of CREATE
  • Changed control flow from elif to if at line 709 to allow update operations after recovery
  • Added test coverage for the new error handler in _fabric_endpoint.py

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

  • src/fabric_cicd/_common/_fabric_endpoint.py: Adds explicit handler for ItemDisplayNameAlreadyInUse error code that raises an exception with item name and context
  • src/fabric_cicd/fabric_workspace.py: Adds try/catch around item creation with recovery logic to re-fetch GUID and update caches when item already exists, plus control flow fix to allow update after recovery
  • tests/test__fabric_endpoint.py: Adds test to verify exception is raised for ItemDisplayNameAlreadyInUse error code

and response.headers.get("x-ms-public-api-error-code") == "ItemDisplayNameAlreadyInUse"
):
response_json = response.json() if response.text else {}
item_name = response_json.get("message", "").replace("Requested '", "").split("'")[0] if response_json else ""

Copilot AI Feb 4, 2026


The item name extraction logic is fragile and may fail if the message format differs from expected. If response_json.get("message", "") returns an empty string or doesn't contain the expected format (e.g., no quotes), the extraction will produce an empty string or unexpected results. Consider using a safer extraction method with error handling or regex matching to handle variations in the API response format.
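One possible hardening along those lines, shown as an illustrative sketch rather than a change in this PR, is a regex with an explicit fallback:

import re


def extract_item_name(message: str) -> str:
    """Extract the item name from messages like "Requested 'TestNotebook' is already in use."

    Returns an empty string when the message does not match the expected shape.
    """
    match = re.search(r"Requested '([^']+)'", message or "")
    return match.group(1) if match else ""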

@slavatrofimov slavatrofimov marked this pull request as draft February 4, 2026 07:28
@slavatrofimov slavatrofimov changed the title Fixes # 791 - Fixed a bug with non-idempotent item creation retries causing "already in use" error Fixes #791 - Fixed a bug with non-idempotent item creation retries causing 'already in use' error Feb 4, 2026
@slavatrofimov slavatrofimov changed the title Fixes #791 - Fixed a bug with non-idempotent item creation retries causing 'already in use' error Fixes #791 Fixed a bug with non-idempotent item creation retries causing 'already in use' error Feb 4, 2026
@slavatrofimov slavatrofimov marked this pull request as ready for review February 4, 2026 07:49
Contributor

Copilot AI left a comment


Pull request overview

Copilot reviewed 5 out of 5 changed files in this pull request and generated 2 comments.

Comment on lines +677 to +680
try:
    item_guid = self._lookup_item_attribute(self.workspace_id, item_type, item_name, "id")
except InputError:
    item_guid = None

Copilot AI Feb 4, 2026


The exception handling only catches InputError when attempting to recover the item GUID. If _lookup_item_attribute fails due to an API error (not because the item wasn't found), that error will propagate up and replace the original "already in use" error, making it harder to diagnose the root cause.

Consider catching a broader exception type (like Exception) to ensure the original error is re-raised if recovery fails for any reason, or at least log the lookup failure before re-raising the original error.
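One way to act on that, shown as a sketch against the quoted lines (the logger reference is an assumption about what is in scope at that point in _publish_item(), not code from the PR):

try:
    item_guid = self._lookup_item_attribute(self.workspace_id, item_type, item_name, "id")
except InputError:
    item_guid = None  # item genuinely not found; caller re-raises the original error
except Exception as lookup_error:
    # Unexpected API failure during recovery: record it, then fall back to None so the
    # caller re-raises the original "already in use" error and the root cause stays visible.
    logger.warning(f"GUID lookup failed while recovering from 'already in use': {lookup_error}")
    item_guid = None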

"'https://api.fabric.microsoft.com/v1/workspaces/test/items'. "
"Message: Requested 'TestNotebook' is already in use."
)
ALREADY_IN_USE_ERROR_WITH_CODE = "Item 'TestNotebook' already exists (ItemDisplayNameAlreadyInUse)"

Copilot AI Feb 4, 2026


The error message used in this test ("Item 'TestNotebook' already exists (ItemDisplayNameAlreadyInUse)") does not match the actual error message that would be raised by the ItemDisplayNameAlreadyInUse handler in _fabric_endpoint.py line 317-318, which would be "Item 'TestNotebook' already exists in the workspace but was not found during initial scan."

This test is passing because it contains "itemdisplaynamealreadyinuse" in the error string, but it's testing with an error format that doesn't match the actual implementation. The test should use the actual error message format from the handler to ensure it accurately tests the recovery logic.

Suggested change
ALREADY_IN_USE_ERROR_WITH_CODE = "Item 'TestNotebook' already exists (ItemDisplayNameAlreadyInUse)"
ALREADY_IN_USE_ERROR_WITH_CODE = (
"Item 'TestNotebook' already exists in the workspace but was not found during initial scan."
)
