fix: handle discriminated unions in oneOf pruning validator#376
fix: handle discriminated unions in oneOf pruning validator#376andreatgretel merged 4 commits intomainfrom
Conversation
The pruning validator modifies instances in-place during oneOf validation. When trying a wrong variant, it strips properties needed by the correct variant, causing all variants to fail. Add a discriminator-aware oneOf validator that reads the discriminator mapping to select the correct variant directly, skipping the try-all-variants loop that causes the corruption. Fixes #375
Greptile SummaryThis PR fixes a bug where Pydantic discriminated union validation ( Key changes:
The implementation correctly assumes Pydantic-generated discriminator schemas where the mapping always points to declared
|
| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/processing/gsonschema/validators.py | Adds _validate_one_of_with_discriminator to route oneOf validation through the discriminator mapping, preventing in-place pruning corruption. The function correctly implements the discriminator-based routing for Pydantic schemas. One minor issue: the return type annotation uses Any instead of Iterator[ValidationError], which should be fixed for proper type-checker support. |
| packages/data-designer-engine/tests/engine/processing/gsonschema/test_validators.py | Adds four new tests covering: discriminated union pruning with leaked fields in both directions, invalid discriminator value rejection, and non-discriminated oneOf fallback. Coverage is comprehensive for the changed logic. |
Flowchart
%%{init: {'theme': 'neutral'}}%%
flowchart TD
A["_validate_one_of_with_discriminator(validator, one_of, instance, schema)"] --> B{Has discriminator?\ninstance is dict?}
B -- No --> C["Fallback: Draft202012Validator.VALIDATORS['oneOf']\n(tries all variants — pruning bug still possible)"]
B -- Yes --> D{prop_name in instance\nAND mapping non-empty?}
D -- No --> C
D -- Yes --> E["matched_ref = mapping[instance[prop_name]]"]
E --> F{matched_ref is None?}
F -- Yes --> G["yield ValidationError\n'X is not a valid discriminator value'"]
F -- No --> H["matched_schema = {'$ref': matched_ref}"]
H --> I["errs = list(validator.descend(instance, matched_schema))"]
I --> J["prune_additional_properties fires\nin-place on instance\n(only removes non-variant fields)"]
J --> K["yield from errs\n(empty = valid, non-empty = invalid)"]
Last reviewed commit: 7d7a1a8
packages/data-designer-engine/src/data_designer/engine/processing/gsonschema/validators.py
Show resolved
Hide resolved
packages/data-designer-engine/src/data_designer/engine/processing/gsonschema/validators.py
Show resolved
Hide resolved
packages/data-designer-engine/tests/engine/processing/gsonschema/test_validators.py
Show resolved
Hide resolved
packages/data-designer-engine/tests/engine/processing/gsonschema/test_validators.py
Show resolved
Hide resolved
| def _validate_one_of_with_discriminator( | ||
| validator: Any, one_of: list[JSONSchemaT], instance: DataObjectT, schema: JSONSchemaT | ||
| ) -> Any: |
There was a problem hiding this comment.
Return type annotation too broad for a generator
_validate_one_of_with_discriminator is a generator function (uses yield/yield from), so its return type should reflect that rather than Any. This improves type-checker support and documents the generator contract.
| def _validate_one_of_with_discriminator( | |
| validator: Any, one_of: list[JSONSchemaT], instance: DataObjectT, schema: JSONSchemaT | |
| ) -> Any: | |
| def _validate_one_of_with_discriminator( | |
| validator: Any, one_of: list[JSONSchemaT], instance: DataObjectT, schema: JSONSchemaT | |
| ) -> Iterator[lazy.jsonschema.ValidationError]: |
You'll also need to add Iterator to the typing import at the top of the file.
Prompt To Fix With AI
This is a comment left during a code review.
Path: packages/data-designer-engine/src/data_designer/engine/processing/gsonschema/validators.py
Line: 62-64
Comment:
Return type annotation too broad for a generator
`_validate_one_of_with_discriminator` is a generator function (uses `yield`/`yield from`), so its return type should reflect that rather than `Any`. This improves type-checker support and documents the generator contract.
```suggestion
def _validate_one_of_with_discriminator(
validator: Any, one_of: list[JSONSchemaT], instance: DataObjectT, schema: JSONSchemaT
) -> Iterator[lazy.jsonschema.ValidationError]:
```
You'll also need to add `Iterator` to the `typing` import at the top of the file.
How can I resolve this? If you propose a fix, please make it concise.
📋 Summary
Fixes validation failures when using Pydantic discriminated unions (e.g.
AlphaItem | BetaItem) withLLMStructuredColumnConfig. When the LLM leaks a property across variants, the pruning validator corrupts the instance by stripping properties during failed variant checks, causing all variants to fail.Fixes #375
🐛 Fixed
oneOfvalidation. When trying a wrong variant first, it strips properties that belong to the correct variant — by the time the correct variant is checked, its required fields are gone and validation fails with zero matches.🔄 Changes
🔧 Changed
validators.py— Added_validate_one_of_with_discriminator, aoneOfvalidator that reads the discriminator mapping to select the correct variant directly instead of trying all variants. Registered alongsideprune_additional_propertiesinextend_jsonschema_validator_with_pruning. Falls back to standardoneOffor schemas without a discriminator.🧪 Tests
test_validators.py— Added discriminated union schema fixture, parametrized test for leaked properties in both directions (alpha→beta, beta→alpha), and test for invalid discriminator values.🔍 Attention Areas
_validate_one_of_with_discriminator— This assumes discriminator mappings use$refvalues (which Pydantic always generates). Non-discriminatedoneOfschemas fall back to the original jsonschemaoneOfvalidator unchanged.🤖 Generated with AI