Fixes #791 Fixed a bug with non-idempotent item creation retries causing 'already in use' error#793
Conversation
…in use" error when API throttling occurs during POST request by adding error detection and recovery logic.
There was a problem hiding this comment.
Pull request overview
This PR addresses a race condition that occurs when API throttling causes stale cache data during item deployment. When an item creation request is throttled after server-side success but before client confirmation, retries fail with "already in use" errors, causing deployment failures.
Changes:
- Added exception handling in
_fabric_endpoint.pyto raise a descriptive error forItemDisplayNameAlreadyInUseAPI error code - Implemented recovery logic in
fabric_workspace.pyto catch "already in use" errors, re-fetch the item GUID, update caches, and proceed with UPDATE instead of CREATE - Changed control flow from
eliftoifat line 709 to allow update operations after recovery - Added test coverage for the new error handler in
_fabric_endpoint.py
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| src/fabric_cicd/_common/_fabric_endpoint.py | Adds explicit handler for ItemDisplayNameAlreadyInUse error code that raises an exception with item name and context |
| src/fabric_cicd/fabric_workspace.py | Adds try/catch around item creation with recovery logic to re-fetch GUID and update caches when item already exists, plus control flow fix to allow update after recovery |
| tests/test__fabric_endpoint.py | Adds test to verify exception is raised for ItemDisplayNameAlreadyInUse error code |
| and response.headers.get("x-ms-public-api-error-code") == "ItemDisplayNameAlreadyInUse" | ||
| ): | ||
| response_json = response.json() if response.text else {} | ||
| item_name = response_json.get("message", "").replace("Requested '", "").split("'")[0] if response_json else "" |
There was a problem hiding this comment.
The item name extraction logic is fragile and may fail if the message format differs from expected. If response_json.get("message", "") returns an empty string or doesn't contain the expected format (e.g., no quotes), the extraction will produce an empty string or unexpected results. Consider using a safer extraction method with error handling or regex matching to handle variations in the API response format.
…em creation recovery functionality.
| try: | ||
| item_guid = self._lookup_item_attribute(self.workspace_id, item_type, item_name, "id") | ||
| except InputError: | ||
| item_guid = None |
There was a problem hiding this comment.
The exception handling only catches InputError when attempting to recover the item GUID. If _lookup_item_attribute fails due to an API error (not because the item wasn't found), that error will propagate up and replace the original "already in use" error, making it harder to diagnose the root cause.
Consider catching a broader exception type (like Exception) to ensure the original error is re-raised if recovery fails for any reason, or at least log the lookup failure before re-raising the original error.
| "'https://api.fabric.microsoft.com/v1/workspaces/test/items'. " | ||
| "Message: Requested 'TestNotebook' is already in use." | ||
| ) | ||
| ALREADY_IN_USE_ERROR_WITH_CODE = "Item 'TestNotebook' already exists (ItemDisplayNameAlreadyInUse)" |
There was a problem hiding this comment.
The error message used in this test ("Item 'TestNotebook' already exists (ItemDisplayNameAlreadyInUse)") does not match the actual error message that would be raised by the ItemDisplayNameAlreadyInUse handler in _fabric_endpoint.py line 317-318, which would be "Item 'TestNotebook' already exists in the workspace but was not found during initial scan."
This test is passing because it contains "itemdisplaynamealreadyinuse" in the error string, but it's testing with an error format that doesn't match the actual implementation. The test should use the actual error message format from the handler to ensure it accurately tests the recovery logic.
| ALREADY_IN_USE_ERROR_WITH_CODE = "Item 'TestNotebook' already exists (ItemDisplayNameAlreadyInUse)" | |
| ALREADY_IN_USE_ERROR_WITH_CODE = ( | |
| "Item 'TestNotebook' already exists in the workspace but was not found during initial scan." | |
| ) |
Description
This PR adds recovery logic to handle a race condition that occurs when API throttling delays cause the deployed_items cache to become stale during deployment. When an item creation (POST) request is throttled after the server has already created the item, the retry fails with an "already in use" error. Previously, this caused complete deployment failure.
Problem
When deploying items to a Fabric workspace under heavy API throttling:
Solution
_fabric_endpoint.py:
fabric_workspace.py:
Behavior After Fix
When this race condition occurs, deployment will now log:
[warn] Item 'Meter Activator' already exists (possible throttling race condition). Attempting to recover by fetching current state. Recovered item GUID: <guid>. Will update instead of create.Linked Issue: 791