feat: support retryable errors in custom column generators #464
Description
Custom column generators currently wrap all exceptions in CustomColumnGenerationError, which the async scheduler treats as non-retryable. This means transient failures (503s, rate limits, timeouts) cause rows to be permanently dropped instead of retried in salvage rounds.
Problem
In custom.py, the generate method catches all exceptions and wraps them:
    except Exception as e:
        raise CustomColumnGenerationError(...) from e

The scheduler only retries exceptions in _RETRYABLE_MODEL_ERRORS (ModelInternalServerError, ModelRateLimitError, etc.). The original error is buried as __cause__ and never checked.
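A minimal, self-contained sketch of the failure mode follows. The exception classes here are empty stand-ins that only borrow the names used in this issue, and `generate_wrapping_everything`, `flaky_call`, and `is_retryable` are hypothetical helpers, not the project's actual code. The sketch assumes the scheduler decides retryability from the type of the raised exception alone.

```python
# Stand-ins for the project's real classes: names mirror the issue, bodies are assumed.
class ModelInternalServerError(Exception):
    pass


class ModelRateLimitError(Exception):
    pass


class CustomColumnGenerationError(Exception):
    pass


_RETRYABLE_MODEL_ERRORS = (ModelInternalServerError, ModelRateLimitError)


def generate_wrapping_everything(user_fn):
    """Hypothetical helper mimicking the current behavior: wrap every exception."""
    try:
        return user_fn()
    except Exception as e:
        raise CustomColumnGenerationError("custom generator failed") from e


def flaky_call():
    """Hypothetical user function that hits a transient provider error."""
    raise ModelRateLimitError("429 from provider")


def is_retryable(exc):
    """Assumed scheduler check: inspects only the type of the raised exception."""
    return isinstance(exc, _RETRYABLE_MODEL_ERRORS)


try:
    generate_wrapping_everything(flaky_call)
except Exception as exc:
    caught = exc

wrapped_is_retryable = is_retryable(caught)          # False: the wrapper type is not in the tuple
cause_is_retryable = is_retryable(caught.__cause__)  # True: the original error was retryable
```

The retryable error is still reachable through `__cause__`, but a type check on the outer exception never sees it.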
Proposed fix
If the original exception is already a retryable model error, re-raise it unwrapped:
    except Exception as e:
        if isinstance(e, _RETRYABLE_MODEL_ERRORS):
            raise
        raise CustomColumnGenerationError(...) from e

This gives custom generators that call model APIs (via the models dict) the same salvage/retry behavior as built-in LLM columns, while non-model errors remain non-retryable.
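To illustrate the intended effect, here is a rough sketch of the proposed handler combined with a toy retry loop. Everything in it is an assumption made for illustration: the exception classes are empty stand-ins, and `generate`, `run_with_salvage`, `make_flaky`, and `broken` are hypothetical helpers. The real scheduler's salvage rounds are likely more involved than this loop.

```python
# Stand-ins for the project's real classes: names mirror the issue, bodies are assumed.
class ModelRateLimitError(Exception):
    pass


class CustomColumnGenerationError(Exception):
    pass


_RETRYABLE_MODEL_ERRORS = (ModelRateLimitError,)


def generate(user_fn):
    """Hypothetical helper applying the proposed fix."""
    try:
        return user_fn()
    except Exception as e:
        if isinstance(e, _RETRYABLE_MODEL_ERRORS):
            raise  # re-raise unwrapped so the caller can recognize it
        raise CustomColumnGenerationError("custom generator failed") from e


def run_with_salvage(user_fn, max_rounds=3):
    """Toy stand-in for the scheduler: retry only on retryable model errors."""
    for round_no in range(1, max_rounds + 1):
        try:
            return generate(user_fn), round_no
        except _RETRYABLE_MODEL_ERRORS:
            continue  # transient failure, try again in the next round
    return None, max_rounds  # gave up after the last round


def make_flaky(failures):
    """Build a function that fails `failures` times, then succeeds."""
    state = {"remaining": failures}

    def call():
        if state["remaining"] > 0:
            state["remaining"] -= 1
            raise ModelRateLimitError("429 from provider")
        return "ok"

    return call


def broken():
    """Simulates a bug in user code, which should stay non-retryable."""
    raise ValueError("bug in user code")


# Two transient failures, then success on the third round.
value, rounds_used = run_with_salvage(make_flaky(2))

# A non-model error is still wrapped and propagates without retries.
try:
    run_with_salvage(broken)
except CustomColumnGenerationError as exc:
    wrapped_cause = exc.__cause__
```

In this sketch the bare `raise` preserves the original exception and its traceback, so the retry loop can match it by type, while the `ValueError` path keeps the existing wrapping behavior.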
Impact
- Custom generators using model_aliases would benefit from automatic retries on transient failures
- No change for custom generators that don't interact with models
- Consistent behavior between LLMTextColumnConfig and CustomColumnConfig when both hit the same provider errors