fix: support nested field access in schema transform templates by andreatgretel · Pull Request #435 · NVIDIA-NeMo/DataDesigner

andreatgretel · 2026-03-18T12:59:14Z

Summary

Schema transform templates now support nested dot notation on deserialized JSON columns:

template = {
    "score": "{{ result.quality.score }}",
    "label": "{{ result.quality.label }}",
}

Previously, {{ result.quality.score }} failed with 'str object' has no attribute 'quality'.

Why this needed a wrapper (TemplateValue) instead of a simpler fix

Schema transform is unlike other Jinja2 consumers in DataDesigner (prompts, expressions) because its output must be valid JSON. This creates a tension:

1. Nested access needs values to be dicts - Jinja2 resolves {{ x.y }} by looking up y on x
2. JSON output needs values to render as escaped strings - He said "hello" must become He said \"hello\" inside a JSON string

The old code (_json_escape_record) chose option 2: flatten all dicts to escaped strings before Jinja2 sees them. Safe JSON, but no nested access.

We can't just pass plain dicts and escape only the leaves either. When Jinja2 converts a whole dict to string (for {{ result }}), it uses Python repr: {'key': 'val'} with single quotes - invalid JSON. The escaping needs to happen dynamically at render time depending on whether Jinja2 is drilling into a value or interpolating it.
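To make the repr problem concrete, here is a small stdlib-only check (no Jinja2 involved; `record_value` and `is_valid_json` are illustrative names, not from the PR). It shows that plain string conversion of a dict gives single-quoted output that a JSON parser rejects, while `json.dumps` gives valid JSON:

```python
import json

record_value = {"key": "val"}  # illustrative data, not from the PR

as_repr = str(record_value)         # what plain string conversion produces
as_json = json.dumps(record_value)  # what a JSON consumer needs


def is_valid_json(text: str) -> bool:
    """Return True if `text` parses as JSON."""
    try:
        json.loads(text)
    except json.JSONDecodeError:
        return False
    return True


print(as_repr, is_valid_json(as_repr))
print(as_json, is_valid_json(as_json))
```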

TemplateValue solves this by deferring the decision:

- Dot access ({{ result.quality }}) triggers __getattr__, which returns a new TemplateValue wrapping the nested value
- String interpolation ({{ result }}) triggers __str__, which applies a caller-provided escape function
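The two behaviours above can be sketched roughly as follows. This is a hypothetical reconstruction for illustration only: the class name `TemplateValueSketch` is made up, the real TemplateValue is not shown in this thread and may differ, and `json.dumps` stands in for the schema-transform escape function.

```python
import json
from typing import Any, Callable


class TemplateValueSketch:
    """Illustrative wrapper: drill down on attribute access, escape on str()."""

    def __init__(self, value: Any, str_fn: Callable[[Any], str]) -> None:
        self._value = value
        self._str_fn = str_fn

    def __getattr__(self, name: str) -> "TemplateValueSketch":
        # Only reached when normal attribute lookup fails, i.e. for record keys.
        value = self.__dict__.get("_value")
        if isinstance(value, dict) and name in value:
            return TemplateValueSketch(value[name], self._str_fn)
        raise AttributeError(name)

    def __str__(self) -> str:
        # The caller decides how the wrapped value is rendered as text.
        return self._str_fn(self._value)


# json.dumps is a stand-in for the caller-provided escape function.
result = TemplateValueSketch({"quality": {"score": 4, "label": 'say "hi"'}}, json.dumps)
print(str(result.quality.score))  # drill down, then render the leaf
print(str(result.quality))        # render the whole nested dict
```

Because the decision is made per access, the same wrapped record can serve both `{{ result.quality.score }}` and `{{ result }}` without knowing in advance which one the template uses.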

The escape function (_escape_value_for_json) is specific to schema transform. Other Jinja2 consumers (prompt templates, expression columns) don't need any of this - Jinja2 natively handles dot access on plain dicts via getattr-to-getitem fallback, and plain str() is fine for text output.

Changes

- ginja/record.py - new TemplateValue class + wrap_record helper
- ginja/environment.py - prepare_jinja2_template_renderer accepts optional record_str_fn; safe_render accepts skip_record_sanitization (same pattern as existing skip_template_validation)
- processors/schema_transform.py - replaced _json_escape_record with _escape_value_for_json passed as record_str_fn
- test_schema_transform.py - regression tests for nested dot access, mixed nested+flat, and list indexing

Test plan

- Existing schema transform tests pass (special chars, JSON serialized values, preview mode)
- New parametrized tests cover nested dot access, mixed nested+flat, list indexing
- All 150 processing tests + 191 column generator tests pass (backwards compatible)

Enable {{ result.quality.score }} style dot notation in schema transform Jinja2 templates, where result is a deserialized JSON column.

Previously, _json_escape_record flattened all dict values to escaped JSON strings before Jinja2 saw them. This made the rendered output valid JSON but prevented nested access since Jinja2 only saw strings.

The fix introduces TemplateValue, a wrapper that defers the choice between "drill into nested dict" and "render as escaped string" to template evaluation time. Jinja2 resolves dot notation via __getattr__ (returning a new TemplateValue for the nested value), and converts to string via __str__ (delegating to a caller-provided str_fn). This is necessary because plain dicts render as Python repr ({'key': 'val'}) which is invalid JSON - we need to control __str__ to produce properly escaped JSON, and that requires a wrapper object.

Other Jinja2 consumers (prompt templates, expression columns) don't need this - Jinja2 natively supports dot access on plain dicts via getattr-to-getitem fallback, and plain str() is fine for text output. Schema transform is unique because its output must be valid JSON.

greptile-apps · 2026-03-18T13:03:34Z

Greptile Summary

This PR fixes a bug where schema transform templates could not perform nested dot-notation access on deserialized JSON columns (e.g. {{ result.quality.score }} would fail because the old _json_escape_record helper pre-flattened all dicts to escaped strings before Jinja2 saw them, leaving result as a plain str). The solution is clean and well-motivated: instead of pre-processing the record into strings, the raw deserialized dicts are now passed directly to Jinja2, and a finalize hook (_escape_value_for_json) handles JSON escaping at the point of interpolation. A companion prefer_dict_key_access override on getattr ensures that dict keys like "items" or "values" take priority over Python built-in methods (which ImmutableSandboxedEnvironment would otherwise return as unsafe_undefined, silently blocking the fallback to key lookup).

Key changes:

- _json_escape_record (record-level pre-escaping) removed; replaced with _escape_value_for_json (value-level finalize function) in schema_transform.py
- UserTemplateSandboxEnvironment gains an opt-in prefer_dict_key_access mode that prioritises dict key lookup in getattr, fixing the items/values/keys shadowing problem
- prepare_jinja2_template_renderer accepts an optional record_str_fn; when set, both finalize and prefer_dict_key_access are enabled together
- Note: the PR description's "Changes" section lists ginja/record.py as modified (to add a TemplateValue class), but the final implementation replaced that approach with the Jinja2 finalize hook (commit 5f90cbf); the description is slightly out-of-date
- New parametrized regression tests cover nested dot access, mixed nested+flat rendering, and list indexing with a shadowed "items" key
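The items/values/keys shadowing problem can be reproduced outside the project. The sketch below is an assumption-laden illustration, not the PR's code: it subclasses plain `jinja2.Environment` rather than the project's sandboxed environment, and `DictKeyFirstEnvironment` is a made-up name. It overrides `getattr` so that a dict key wins over a same-named dict method:

```python
from typing import Any

from jinja2 import Environment


class DictKeyFirstEnvironment(Environment):
    """Hypothetical environment that resolves `x.y` as `x["y"]` for dicts first."""

    def getattr(self, obj: Any, attribute: str) -> Any:
        # Prefer the record's own key over a same-named Python method.
        if isinstance(obj, dict) and attribute in obj:
            return obj[attribute]
        return super().getattr(obj, attribute)


record = {"result": {"items": ["a", "b"]}}  # "items" collides with dict.items
template = "{{ result.items[0] }}"

# Stock behaviour: `result.items` resolves to the bound method dict.items.
default_out = Environment().from_string(template).render(record)
# Overridden behaviour: `result.items` resolves to the list stored under "items".
patched_out = DictKeyFirstEnvironment().from_string(template).render(record)

print(repr(default_out), repr(patched_out))
```

Keys that do not collide with a dict method are unaffected, since the stock lookup already falls back to item access for them.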

Confidence Score: 4/5

Safe to merge; logic is correct and all previous review concerns have been addressed.
The core fix (deferring JSON escaping to the Jinja2 finalize hook instead of pre-flattening dicts) is sound and well-tested. The prefer_dict_key_access override is safe because sanitize_record guarantees all dict values are basic JSON types before rendering, so bypassing the sandbox's unsafe-attribute check for dict-key access carries no security risk. The two minor deductions are: (1) _escape_value_for_json has no explicit guard for non-finite floats (nan/inf), relying on sanitize_record to catch them upstream; and (2) prefer_dict_key_access and record_str_fn are implicitly coupled in the public API, which may limit future flexibility.
No files require special attention; the environment change is the most architecturally significant but is well-contained.

Important Files Changed

| Filename | Overview |
|---|---|
| packages/data-designer-engine/src/data_designer/engine/processing/ginja/environment.py | Adds `prefer_dict_key_access` flag to `UserTemplateSandboxEnvironment` that overrides `getattr` to prefer dict key lookup over Python attribute/method access; extends `prepare_jinja2_template_renderer` to accept an optional `record_str_fn` which is wired up as Jinja2's `finalize` hook and automatically enables `prefer_dict_key_access`. Logically correct and safe after JSON-sanitization guarantees basic types, but the two concerns are coupled in the public API. |
| packages/data-designer-engine/src/data_designer/engine/processing/processors/schema_transform.py | Replaces `_json_escape_record` (which pre-flattened all dicts to escaped strings, preventing nested access) with `_escape_value_for_json` (a single-value escape function passed as `record_str_fn`). The deserialized record is now passed directly to `render_template`, with escaping deferred to Jinja2's finalize hook. Clean and correct. |
| packages/data-designer-engine/tests/engine/processing/processors/test_schema_transform.py | Adds well-structured parametrized tests covering nested dot access, mixed nested+flat rendering, and list indexing. Uses clear `pytest.param` with `id=` labels. No fragile branching heuristics. The `items` key test case specifically validates the `prefer_dict_key_access` behaviour. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[SchemaTransformProcessor] --> B[prepare renderer\nwith escape finalize fn]
    B --> C[UserTemplateSandbox\nfinalize=escape fn\nprefer dict key access]
    A --> D[Loop over DataFrame rows]
    D --> E[deserialize JSON values\nstrings become Python dicts]
    E --> F[render template]
    F --> G[sanitize record\nJSON round-trip]
    G --> H{Expression type}
    H -->|dot access: result.quality.score| I[getattr override\ndict key takes priority\nover Python method]
    H -->|whole dict: result| J[return full dict]
    I --> K[finalize: escape for JSON\nstr for numbers\njson.dumps for strings\ndouble-encode for dicts]
    J --> K
    K --> L[Rendered JSON string]
    L --> M[json.loads to output DataFrame]

Prompt To Fix All With AI

This is a comment left during a code review.
Path: packages/data-designer-engine/src/data_designer/engine/processing/ginja/environment.py
Line: 449-452

Comment:
**`prefer_dict_key_access` and `finalize` are always coupled together**

`finalize` (JSON escaping) and `prefer_dict_key_access` (nested-dict attribute resolution) are logically independent concerns, but the current implementation always sets them together whenever `record_str_fn is not None`. A future caller who wants one without the other—e.g., nested dot access on a non-JSON output, or a custom finalizer without the dict-key priority—would have to bypass `prepare_jinja2_template_renderer` and construct `UserTemplateSandboxEnvironment` directly.

Both parameters exist independently on `UserTemplateSandboxEnvironment.__init__`, so consider exposing `prefer_dict_key_access` as its own optional argument here to make the two axes of configuration independently reachable:

```python
env_kwargs: dict[str, Any] = {}
if record_str_fn is not None:
    env_kwargs["finalize"] = record_str_fn
if record_str_fn is not None or prefer_dict_key_access:
    env_kwargs["prefer_dict_key_access"] = True
```

Or at minimum, a brief docstring note clarifying that the flag is currently an implicit side-effect of providing `record_str_fn` would help future readers understand the intentional coupling.

How can I resolve this? If you propose a fix, please make it concise.

---

This is a comment left during a code review.
Path: packages/data-designer-engine/src/data_designer/engine/processing/processors/schema_transform.py
Line: 33-44

Comment:
**No explicit guard for `float` special values (`nan` / `inf`)**

For numeric types that are not `bool`, `str`, `dict`, `list`, or `None`, the function falls through to `return str(value)`. This is correct for ordinary integers and floats, but `str(float('nan'))` → `"nan"` and `str(float('inf'))` → `"inf"`, which are not valid JSON tokens. If either value were to reach `_escape_value_for_json` they would produce a string that survives rendering and then fail silently (or raise) only at the `json.loads(rendered)` call in `_transform`, which makes the error harder to trace.

In practice `sanitize_record` (which runs a JSON round-trip via `serialize_data`) should catch or normalize NaN/Inf before rendering. But if `serialize_data` happens to use a lenient serializer (e.g. `allow_nan=True`), those values pass through. Adding an explicit guard makes the contract clear:

```python
# assumes `import math` at module level
if isinstance(value, float) and not math.isfinite(value):  # nan or +/-inf
    return "null"
```

or simply document the assumption that callers guarantee `value` has already passed a strict JSON serialization step.
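As a quick stdlib-only illustration of the concern (independent of `serialize_data`, whose behaviour is not shown here), the snippet below checks what `str()` yields for non-finite floats and how `json.dumps` treats them in its lenient and strict modes:

```python
import json
import math

values = [1.5, float("nan"), float("inf")]

# str() output for non-finite floats is not a JSON token.
non_finite_text = [str(v) for v in values if not math.isfinite(v)]

# json.dumps is lenient by default and emits the non-standard token NaN ...
lenient = json.dumps(float("nan"))

# ... but raises ValueError when asked to be strict.
try:
    json.dumps(float("nan"), allow_nan=False)
    strict_raises = False
except ValueError:
    strict_raises = True

print(non_finite_text, lenient, strict_raises)
```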

How can I resolve this? If you propose a fix, please make it concise.

Last reviewed commit: "docs: add missing pa..."

andreatgretel · 2026-03-18T13:17:59Z

(AR) Suggestion: Multi-template path not extended with record_str_fn support

packages/data-designer-engine/src/data_designer/engine/processing/ginja/environment.py:455

What: prepare_jinja2_multi_template_renderer and render_multi_template do not support record_str_fn, creating API asymmetry.

Why: The two rendering paths are now architecturally divergent. Future callers needing nested access in multi-template scenarios have no path.

Suggestion:

At minimum add a comment noting the gap. Ideally extract wrapping logic into a shared helper.

andreatgretel · 2026-03-18T13:18:02Z

(AR) Suggestion: No test coverage for error paths in nested access

packages/data-designer-engine/tests/engine/processing/processors/test_schema_transform.py:210

What: Tests cover happy paths but not missing keys, None intermediates, or type errors in nested access.

Why: TemplateValue introduces new error surfaces (AttributeError on missing keys) that are untested.

Suggestion:

Add parametrized error cases for missing nested keys and None intermediate values.

andreatgretel · 2026-03-18T13:18:15Z

(AR) This PR introduces TemplateValue, a wrapper enabling Jinja2 dot notation on deserialized JSON columns in schema transform templates. The design is well-motivated and the code is clean, with good separation between the generic wrapping mechanism in record.py and the JSON-specific escaping in schema_transform.py. The review covered all four changed files across processors, ginja environment, record handling, and tests.

The most important finding is a concrete bug: _escape_value_for_json produces Python-style "True"/"False" instead of JSON-conventional "true"/"false" for boolean values, because bool falls through all isinstance checks to the str(value) fallback. This was confirmed via smoke test. Additionally, the new _record_str_fn attribute lacks a class-level annotation (risking AttributeError if render_template is called before preparation), and the skip_record_sanitization parameter on safe_render widens the public security surface unnecessarily.

Verdict: needs-changes — 1 critical, 3 warnings, 5 suggestions.
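The boolean finding can be reproduced with a small hypothetical reconstruction. The function bodies below are assumptions (the actual `_escape_value_for_json` is not shown in this thread); only the fall-through to `str(value)` and the proposed bool check come from the review text:

```python
import json
from typing import Any


def escape_buggy(value: Any) -> str:
    """Assumed shape of the original: no bool branch, so bools hit str()."""
    if isinstance(value, str):
        return json.dumps(value)[1:-1]  # escape, then drop surrounding quotes
    if isinstance(value, (dict, list)):
        return json.dumps(json.dumps(value))[1:-1]  # double-encode containers
    return str(value)  # numbers are fine here; bool becomes "True"/"False"


def escape_fixed(value: Any) -> str:
    """Same logic with an explicit bool branch ahead of the fallback."""
    if isinstance(value, bool):
        return "true" if value else "false"
    return escape_buggy(value)


print(escape_buggy(True), escape_fixed(True))
```

The explicit branch matters because `bool` is a subclass of `int` in Python, so any generic numeric fallback treats `True` as a number and stringifies it Python-style.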

- Fix boolean serialization: add bool check before str in _escape_value_for_json to produce JSON 'true'/'false' instead of Python 'True'/'False'
- Add class-level _record_str_fn annotation to WithJinja2UserTemplateRendering
- Rename skip_record_sanitization to _skip_record_sanitization (underscore prefix) to signal internal-only usage, and document it in safe_render docstring
- Add defensive error handling in TemplateValue.__getitem__ and __iter__
- Promote test input data to parametrize column, removing brittle string scan

- Add __eq__ and __hash__ to TemplateValue so Jinja2 equality conditionals (e.g. {% if result.label == "excellent" %}) work
- Add inline comment explaining deliberate double-encode in _escape_value_for_json for dict/list values
- Default _record_str_fn to None at class level so accessing it before prepare_jinja2_template_renderer doesn't mask the real error
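Why a wrapper needs `__eq__` for equality conditionals can be seen without Jinja2. The classes below are hypothetical stand-ins (`Wrapped`, `WrappedWithEq`), not the PR's TemplateValue; they only show that a wrapper without `__eq__` never compares equal to the raw value it holds:

```python
from typing import Any


class Wrapped:
    """Stand-in wrapper with default (identity-based) equality."""

    def __init__(self, value: Any) -> None:
        self._value = value


class WrappedWithEq(Wrapped):
    """Same wrapper, but comparisons delegate to the wrapped value."""

    def __eq__(self, other: object) -> bool:
        other_value = other._value if isinstance(other, Wrapped) else other
        return self._value == other_value

    def __hash__(self) -> int:
        # Defining __eq__ disables the inherited hash, so restore one.
        # Note: this raises TypeError for unhashable wrapped values (dict, list).
        return hash(self._value)


plain = Wrapped("excellent") == "excellent"
with_eq = WrappedWithEq("excellent") == "excellent"
print(plain, with_eq)
```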

nabinchha · 2026-03-18T15:23:24Z

Small design thought, not a blocker: I wonder if there's a simpler path here using Jinja's finalize hook, so the template context can stay as plain sanitized dict / list data instead of introducing TemplateValue. Something along these lines:

env = UserTemplateSandboxEnvironment(...)
env.finalize = _escape_value_for_json
rendered = env.from_string(template).render(sanitize_record(record))

Since Jinja already supports nested access like {{ result.quality.score }} on normal dict-like data, this might preserve the nested lookup behavior while keeping the rendering model a bit simpler. I may be missing an edge case around the {{ result }} / JSON-string behavior, so totally fine if you already explored this and ruled it out. Just wanted to offer it as food for thought in case a quick spike shows it can cover the same cases with less custom wrapper logic.
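A quick spike of this idea might look like the following. It is a sketch under assumptions: it uses plain `jinja2.Environment` instead of `UserTemplateSandboxEnvironment`, skips `sanitize_record`, and `escape_for_json_string` is a compact hypothetical stand-in for `_escape_value_for_json`:

```python
import json
from typing import Any

from jinja2 import Environment


def escape_for_json_string(value: Any) -> str:
    """Hypothetical escape: make `value` safe inside a quoted JSON string."""
    text = value if isinstance(value, str) else json.dumps(value)
    return json.dumps(text)[1:-1]  # escape, then drop the surrounding quotes


# finalize is applied to the result of each {{ ... }} expression before output.
env = Environment(finalize=escape_for_json_string)
template = env.from_string(
    '{"label": "{{ result.quality.label }}", "raw": "{{ result }}"}'
)

record = {"result": {"quality": {"label": 'He said "hello"'}}}
rendered = template.render(record)
parsed = json.loads(rendered)

print(parsed["label"])
print(json.loads(parsed["raw"]))
```

In this spike the context stays a plain dict, nested access uses Jinja2's normal lookup, and `{{ result }}` comes out as a JSON-encoded string rather than a Python repr.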

Use Jinja2's built-in finalize hook for value-to-string conversion and a getattr override for dict-key-priority lookup, eliminating the custom TemplateValue wrapper class entirely.

andreatgretel · 2026-03-18T17:18:56Z

@nabinchha Great call on the finalize approach — implemented in 5f90cbf. Used env.finalize = record_str_fn combined with a getattr override on the sandbox environment (to prefer dict key lookup over method resolution for keys like items). This eliminated TemplateValue, wrap_record, _skip_record_sanitization, and _record_str_fn entirely (-58 lines net). All tests pass.

andreatgretel requested a review from a team as a code owner March 18, 2026 12:59

andreatgretel changed the title ~~feat: support nested field access in schema transform templates~~ fix: support nested field access in schema transform templates Mar 18, 2026

greptile-apps bot reviewed Mar 18, 2026
