feat: support retryable errors in custom column generators #464

@andreatgretel

Description

Custom column generators currently wrap all exceptions in CustomColumnGenerationError, which the async scheduler treats as non-retryable. This means transient failures (503s, rate limits, timeouts) cause rows to be permanently dropped instead of retried in salvage rounds.

Problem

In custom.py, the generate method catches all exceptions and wraps them:

except Exception as e:
    raise CustomColumnGenerationError(...) from e

The scheduler retries only exceptions listed in _RETRYABLE_MODEL_ERRORS (ModelInternalServerError, ModelRateLimitError, etc.). The original error survives only as __cause__ on the wrapper, which the scheduler never checks.
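To make the failure mode concrete, here is a minimal, self-contained sketch. The exception classes are local stand-ins that reuse the names above, and `is_retryable`, `generate_wrapping_everything`, and `rate_limited` are hypothetical helpers invented for illustration; the real scheduler check may look different.

```python
# Stand-in exception classes; the real ones live in the library and may differ.
class ModelInternalServerError(Exception): ...
class ModelRateLimitError(Exception): ...
class CustomColumnGenerationError(Exception): ...

_RETRYABLE_MODEL_ERRORS = (ModelInternalServerError, ModelRateLimitError)


def is_retryable(exc: BaseException) -> bool:
    """Hypothetical scheduler check: inspects only the exception's own type."""
    return isinstance(exc, _RETRYABLE_MODEL_ERRORS)


def generate_wrapping_everything(user_fn):
    """Mimics the current behavior: every exception gets wrapped."""
    try:
        return user_fn()
    except Exception as e:
        raise CustomColumnGenerationError("custom generator failed") from e


def rate_limited():
    """Hypothetical user generator that hits a provider rate limit."""
    raise ModelRateLimitError("429 from provider")


try:
    generate_wrapping_everything(rate_limited)
except Exception as exc:
    caught = exc

# The wrapper itself is not retryable, even though its cause is.
wrapper_retryable = is_retryable(caught)
cause_retryable = is_retryable(caught.__cause__)
```

Under these assumptions `wrapper_retryable` is false while `cause_retryable` is true, which is why the row is dropped rather than retried.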

Proposed fix

If the original exception is already a retryable model error, re-raise it unwrapped:

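except Exception as e:
    # Let retryable model errors propagate unwrapped so the scheduler can retry them.
    if isinstance(e, _RETRYABLE_MODEL_ERRORS):
        raise
    # Everything else is still wrapped and stays non-retryable.
    raise CustomColumnGenerationError(...) from e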

This gives custom generators that call model APIs (via the models dict) the same salvage/retry behavior as built-in LLM columns, while non-model errors remain non-retryable.
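The intended effect can be sketched end to end with stand-ins. Everything below is illustrative: the exception classes are local placeholders, and `run_with_salvage`, `make_flaky`, and `buggy` are hypothetical names, with `run_with_salvage` only approximating how salvage rounds might retry a row.

```python
# Stand-in exception classes; the real ones live in the library and may differ.
class ModelInternalServerError(Exception): ...
class ModelRateLimitError(Exception): ...
class CustomColumnGenerationError(Exception): ...

_RETRYABLE_MODEL_ERRORS = (ModelInternalServerError, ModelRateLimitError)


def generate(user_fn):
    """Proposed behavior: retryable model errors pass through unwrapped."""
    try:
        return user_fn()
    except Exception as e:
        if isinstance(e, _RETRYABLE_MODEL_ERRORS):
            raise
        raise CustomColumnGenerationError("custom generator failed") from e


def run_with_salvage(user_fn, max_rounds=3):
    """Hypothetical stand-in for salvage rounds: retry only retryable errors."""
    for _ in range(max_rounds):
        try:
            return generate(user_fn)
        except _RETRYABLE_MODEL_ERRORS:
            continue  # transient failure, try again in the next round
    return None  # all rounds failed; the row would be dropped


def make_flaky(failures):
    """Build a generator that fails `failures` times with a 503, then succeeds."""
    state = {"calls": 0}

    def fn():
        state["calls"] += 1
        if state["calls"] <= failures:
            raise ModelInternalServerError("503 from provider")
        return "ok"

    return fn, state


def buggy():
    """Hypothetical user generator with a non-model bug."""
    raise KeyError("missing_column")


flaky_fn, flaky_state = make_flaky(failures=2)
result = run_with_salvage(flaky_fn)  # succeeds on the third call

try:
    run_with_salvage(buggy)
    buggy_error = None
except CustomColumnGenerationError as exc:
    buggy_error = exc  # raised on the first attempt, never retried
```

In this sketch the flaky generator recovers after two transient failures, while the `KeyError` is still wrapped and surfaces immediately, matching the "non-model errors remain non-retryable" claim.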

Impact

  • Custom generators using model_aliases get automatic retries on transient failures
  • No change for custom generators that don't interact with models
  • Consistent behavior between LLMTextColumnConfig and CustomColumnConfig when both hit the same provider errors

Metadata

Labels: enhancement (New feature or request)
