Add TripleDifferenceResults exemplar + lock alias asdict-exclusion (re-audit of #406)#428
Conversation
…ct-exclusion Restored CI reviewer's two P3s on PR #406: 1. The flat-schema exemplar lists in REGISTRY.md and CHANGELOG.md omit `TripleDifferenceResults`, even though `diff_diff/triple_diff.py:43-86` already uses the same native flat schema (`att` / `se` / `t_stat` / `p_value` / `conf_int`). Add it to both lists so the documented exemplar set is complete. 2. The new alias regression suite at `tests/test_result_aliases.py` locks read-through and read-only semantics, but does not assert that aliases stay out of `dataclasses.fields()` / `asdict()` output. The registry note explicitly documents that contract; if a future refactor converted an `@property` alias to a real field, serializers and field-walkers would silently start surfacing duplicate keys with no test catching it. Add a parametrized regression covering one Pattern B class (`CallawaySantAnnaResults`), the double-alias case (`ContinuousDiDResults`), and the `avg_*` mapping (`MultiPeriodDiDResults`) - asserts the five flat-alias names never appear in `fields(res)` or `asdict(res).keys()`. No runtime behavior change. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings. Executive Summary
Methodology No findings. Affected methods: none. The diff changes only documentation and tests, and the registry text at docs/methodology/REGISTRY.md:L5-L5 is consistent with the existing flat-schema Code Quality No findings. Performance No findings. Maintainability No findings beyond the test-gap note below. Tech Debt No findings. The PR does not introduce new deferred-work items, and none are needed for correctness. Security No findings. Documentation/Tests
Validation note: I could not run the targeted pytest in this sandbox because |
…aliases R0 review on the prior commit noted the new exclusion test covered the five flat aliases (att / se / conf_int / p_value / t_stat) but omitted the ContinuousDiDResults `overall_se` / `overall_conf_int` / `overall_p_value` / `overall_t_stat` aliases that the double-alias contract documents. Extend the ContinuousDiDResults branch to also lock those four out of `fields()` / `asdict()` output. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
🔁 AI review rerun (requested by @igerber) Head SHA: Overall Assessment ✅ Looks good — no unmitigated P0/P1 findings. Executive Summary
Methodology
Code Quality
Performance
Maintainability
Tech Debt
Security
Documentation/Tests
|
…st helper bugs Holistic re-audit of merged igerber#406 (inference-field aliases on staggered result classes) + igerber#428 (post-merge cleanup adding TripleDifferenceResults to alias mapping examples). Per-PR CI on igerber#428 couldn't see the combined post-PR holistic state. Local agentic codex review surfaced residuals across 4 rounds (R1-R4); a 5th round flagged Sphinx autosummary regen which is auto-handled on next docs build (not addressed here). **Test helper (R2)** — `_required_init_kwargs()` in `tests/test_result_aliases.py` had two bugs that are masked today but brittle as result dataclasses evolve: - Default-factory not honored: the `f.default is not f.default_factory and f.default is not MISSING` check returns False for factory-only fields (where default is MISSING and default_factory is a real callable), so the helper pre-filled those fields with a sentinel and the factory never ran. Replaced with explicit `_dc.MISSING` checks on both default and default_factory. - Dispatch order: `"float"` matched before `"Tuple"`, so `Tuple[float, float]` annotations were classified as scalar and got `0.0` instead of `(0.0, 0.0)`. Reordered the synthetic-value dispatch so container annotations (Tuple/List/Dict/DataFrame/ndarray) are checked before the scalar fallback. **Read-only assertion (R1)** — `test_aliases_are_read_only` checked `setattr`-fails on the flat aliases but not on the new `ContinuousDiDResults.overall_*` aliases. Extended the Continuous-specific branch. **Bundled guide documentation (R3)** — REGISTRY and CHANGELOG made the flat alias contract official, but bundled `practitioner.py`, `llms-practitioner.txt`, and `llms-full.txt` documented only the canonical `overall_*` / `overall_att_*` / `avg_*` names. Added a single "Flat-alias compatibility note" under the `## Results Objects` header in `llms-full.txt` (avoids per-class table bloat); extended the result-class snippet in `llms-practitioner.txt`; clarified the Step-4 covariate-comparison snippet in `practitioner.py`. **Registry scope correction (R4)** — the top-level REGISTRY note said "every scalar treatment-effect result class" exposes flat aliases, but flat-native classes (`DiDResults`, `SyntheticDiDResults`, `TROPResults`, `TripleDifferenceResults`, `HeterogeneousAdoptionDiDResults`) already carry these as native dataclass fields — they're unchanged by the alias contract. Narrowed the note to scope only the prefixed families. **Typo fix (R4)** — the R3 `llms-practitioner.txt` note named a nonexistent `overall_att_att` field. Replaced with the actual canonical field names (`overall_att` is the point estimate; `overall_att_se` etc. are the inference fields). 5 files, +51/-13. No behavior change; all edits are documentation alignment and test helper / test coverage hardening on the surface igerber#406 + igerber#428 already established.
Summary
Audit follow-up to PR #406. Restored CI reviewer's two actionable P3s:
REGISTRY + CHANGELOG: The flat-schema exemplar lists omit `TripleDifferenceResults`, even though `diff_diff/triple_diff.py:L43-L86` already uses the same native flat schema (`att` / `se` / `t_stat` / `p_value` / `conf_int`). Add it to both lists.
Alias regression suite: The existing `tests/test_result_aliases.py` locks read-through and read-only semantics but doesn't assert that aliases stay out of `dataclasses.fields()` / `asdict()` output. The registry explicitly documents that contract. If a future refactor converted an `@property` alias to a real field, serializers and field-walkers would silently start surfacing duplicate keys with no test catching it. Add a parametrized regression covering one Pattern B class (`CallawaySantAnnaResults`), the double-alias case (`ContinuousDiDResults`), and the `avg_*` mapping (`MultiPeriodDiDResults`) - asserts the five flat-alias names never appear in `fields(res)` or `asdict(res).keys()`.
No runtime behavior change.
Test plan
🤖 Generated with Claude Code