Add inference-field aliases on staggered result classes

igerber · claude · igerber · commit 5fbcb02c0102 · 2026-05-09T16:49:38.000-04:00
Adds read-only @property aliases (att / se / conf_int / p_value / t_stat) on every result class that previously only carried prefixed canonical fields, so external adapters that read getattr(res, "se") get populated values transparently.

Coverage:
- Pattern B (overall_*): CallawaySantAnna, Stacked, EfficientDiD, dCDH, StaggeredTripleDiff, Wooldridge, SunAbraham, ImputationDiD, TwoStage
- Pattern C (overall_att_*, ATT-side headline): ContinuousDiD, which gets both flat aliases and overall_* aliases for naming consistency
- Pattern D (avg_*): MultiPeriodDiD

Aliases are pure read-throughs over canonical fields: no recomputation, no behavior change. NaN-canonical → NaN-alias inheritance is regression-locked at tests/test_result_aliases.py::test_pattern_b_aliases_propagate_nan. The native overall_* / overall_att_* / avg_* names remain canonical for documentation and computation.

Motivated by an external adapter that reads getattr(res, "se", None) without a fallback to overall_se / overall_att_se. Before this change, every staggered result class returned None for those names; the aliases fix the adapter's diagnostic surface with no consumer-side change.

23 alias-mechanic + adapter-pattern regression tests at tests/test_result_aliases.py. Documented in CHANGELOG (Unreleased) and REGISTRY.md preamble.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
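A minimal sketch of the alias pattern described above, assuming a simplified stand-in class. `ToyStaggeredResults` and `read_inference` are hypothetical names for illustration only; they are not part of diff_diff or of the external adapter, and the real result classes carry many more fields.

```python
import math
from dataclasses import dataclass
from typing import Any, Dict, Tuple


@dataclass
class ToyStaggeredResults:
    """Hypothetical stand-in for a Pattern B result class (overall_* fields)."""

    overall_att: float
    overall_se: float
    overall_conf_int: Tuple[float, float]

    # Read-only aliases: plain read-throughs, nothing is recomputed.
    @property
    def att(self) -> float:
        return self.overall_att

    @property
    def se(self) -> float:
        return self.overall_se

    @property
    def conf_int(self) -> Tuple[float, float]:
        return self.overall_conf_int


def read_inference(res: Any) -> Dict[str, Any]:
    """Hypothetical adapter that reads flat names with no fallback chain."""
    return {name: getattr(res, name, None) for name in ("att", "se", "conf_int")}


ok = ToyStaggeredResults(1.5, 0.5, (0.52, 2.48))
bad = ToyStaggeredResults(1.5, math.nan, (math.nan, math.nan))

print(read_inference(ok))                      # flat names are populated
print(math.isnan(read_inference(bad)["se"]))   # NaN canonical field gives NaN alias
```

Because the properties define no setter, assigning to `res.se` raises `AttributeError`, so the prefixed field stays the single source of truth.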
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- **Inference-field aliases on staggered result classes** for adapter / external-consumer compatibility. Read-only `@property` aliases expose the flat `att` / `se` / `conf_int` / `p_value` / `t_stat` names (matching `DiDResults` / `TROPResults` / `SyntheticDiDResults` / `HeterogeneousAdoptionDiDResults`) on every result class that previously only carried prefixed canonical fields: `CallawaySantAnnaResults`, `StackedDiDResults`, `EfficientDiDResults`, `ChaisemartinDHaultfoeuilleResults`, `StaggeredTripleDiffResults`, `WooldridgeDiDResults`, `SunAbrahamResults`, `ImputationDiDResults`, `TwoStageDiDResults` (mapping to `overall_*`); `ContinuousDiDResults` (mapping to `overall_att_*`, ATT-side as the headline, ACRT-side accessible unchanged via `overall_acrt_*`); `MultiPeriodDiDResults` (mapping to `avg_*`). `ContinuousDiDResults` additionally exposes `overall_se` / `overall_conf_int` / `overall_p_value` / `overall_t_stat` aliases for naming consistency with the rest of the staggered family. Aliases are pure read-throughs over the canonical fields — no recomputation, no behavior change — so the `safe_inference()` joint-NaN contract (per CLAUDE.md "Inference computation") is inherited automatically (NaN canonical → NaN alias, locked at `tests/test_result_aliases.py::test_pattern_b_aliases_propagate_nan`). The native `overall_*` / `overall_att_*` / `avg_*` fields remain canonical for documentation and computation. Motivated by the `balance.interop.diff_diff.as_balance_diagnostic()` adapter (`facebookresearch/balance` PR #465) which calls `getattr(res, "se", None)` / `getattr(res, "conf_int", None)` without a fallback chain — pre-alias, every staggered result class returned `None` on those keys, silently dropping `se` and `conf_int` from the adapter's diagnostic dict. 23 alias-mechanic + balance-adapter regression tests at `tests/test_result_aliases.py`. Patch-level (additive on stable surfaces).
 - **`ChaisemartinDHaultfoeuille.by_path` + non-binary integer treatment** — `by_path=k` now accepts integer-coded discrete treatment (D in Z, e.g. ordinal `{0, 1, 2}`); path tuples become integer-state tuples like `(0, 2, 2, 2)`. The previous `NotImplementedError` gate at `chaisemartin_dhaultfoeuille.py:1870` is replaced by a `ValueError` for continuous D (e.g. `D=1.5`) at fit-time per the no-silent-failures contract — the existing `int(round(float(v)))` cast in `_enumerate_treatment_paths` is now defensive (no-op for integer-coded D). Validated against R `did_multiplegt_dyn(..., by_path)` for D in `{0, 1, 2}` via the new `multi_path_reversible_by_path_non_binary` golden-value scenario (78 switchers, 3 paths, single-baseline custom DGP, F_g >= 4): per-path point estimates match R bit-exactly (rtol ~1e-9 on event horizons; rtol+atol envelope for placebo near-zero values), per-path SE inherits the documented cross-path cohort-sharing deviation (~5% rtol observed; SE_RTOL=0.15 envelope). **Deviation from R for D >= 10:** R's `did_multiplegt_by_path` derives the per-path baseline via `path_index$baseline_XX <- substr(path_index$path, 1, 1)`, which captures only the first character of the comma-separated path string (e.g. for `path = "12,12,..."` it captures `"1"` instead of `"12"`); this mis-allocates R's per-path control-pool subset for D >= 10. Python's tuple-key matching is correct in this regime — the per-path point estimates we compute are correct; R's per-path subset for the same path is buggy. The shipped parity scenario stays in `D in {0, 1, 2}` to avoid the R bug. R-parity test at `tests/test_chaisemartin_dhaultfoeuille_parity.py::TestDCDHDynRParityByPathNonBinary`; cross-surface invariants regression-tested at `tests/test_chaisemartin_dhaultfoeuille.py::TestByPathNonBinary`.
 - **New `paths_of_interest` kwarg on `ChaisemartinDHaultfoeuille`** for user-specified treatment-path subsets, alternative to `by_path=k`'s top-k automatic ranking. Mutually exclusive with `by_path`; setting both raises `ValueError` at `__init__` and `set_params` time. Each path tuple must be a list/tuple of `int` of length `L_max + 1` (uniformity validated at `__init__`; length match against `L_max + 1` validated at fit-time); `bool` and `np.bool_` are explicitly rejected, `np.integer` accepted and canonicalized to Python `int` for tuple-key consistency. Duplicates emit a `UserWarning` and are deduplicated; paths not observed in the panel emit a `UserWarning` and are omitted from `path_effects`. Paths appear in `results.path_effects` in the user-specified order, modulo deduplication and unobserved-path filtering. Composes with non-binary D and all downstream `by_path` surfaces (bootstrap, per-path placebos, per-path joint sup-t bands, `controls`, `trends_linear`, `trends_nonparam`) — mechanical filter on observed paths via the same `_enumerate_treatment_paths` call site, no methodology change. **Python-only API extension; no R equivalent** — R's `did_multiplegt_dyn(..., by_path=k)` only accepts a positive int (top-k) or `-1` (all paths). The `by_path` precondition gate at `chaisemartin_dhaultfoeuille.py:1118` (drop_larger_lower / L_max / `heterogeneity` / `design2` / `honest_did` / `survey_design` mutex) and the 11 `self.by_path is not None` activation branches in `fit()` were rerouted to fire under either selector. Validation + behavior + cross-feature regressions at `tests/test_chaisemartin_dhaultfoeuille.py::TestPathsOfInterest`.
 - **HAD `practitioner_next_steps()` handler + `llms-full.txt` reference section** (Phase 5). Adds `_handle_had` and `_handle_had_event_study` to `diff_diff/practitioner.py::_HANDLERS`, routing both `HeterogeneousAdoptionDiDResults` (single-period) and `HeterogeneousAdoptionDiDEventStudyResults` (event-study) through HAD-specific Baker et al. (2025) step guidance: `did_had_pretest_workflow` (step 3 — paper Section 4.2 step-2 closure on the event-study path), an estimand-difference routing nudge to `ContinuousDiD` (step 4 — fires when the user wants per-dose ATT(d) / ACRT(d) curves rather than HAD's WAS estimand and has never-treated controls; framed around estimand difference, NOT around the existence of untreated units, since HAD remains valid with a small never-treated share per REGISTRY § HeterogeneousAdoptionDiD edge cases and explicitly retains never-treated units on the staggered event-study path per paper Appendix B.2 / `had.py:1325`), `results.bandwidth_diagnostics` inspection on continuous designs and simultaneous (sup-t) `cband_*` reading on weighted event-study fits (step 6), per-horizon WAS event-study disaggregation (step 7), and the explicit design-auto-detection / last-cohort-only-WAS framing (step 8). Symmetric pair: `_handle_continuous` gains a Step-4 nudge to `HeterogeneousAdoptionDiD` for ContinuousDiD users on no-untreated panels (this direction is correct because ContinuousDiD's identification requires never-treated controls). Extends `_check_nan_att` with an ndarray branch via lazy `numpy` import for HAD's per-horizon `att` array; uses `np.all(np.isnan(arr))` semantics so partial-NaN arrays (legitimate event-study output under degenerate horizon-specific designs) do not over-fire the warning. Scalar path is bit-exact preserved across all 12 untouched handlers. Adds full HAD section + `HeterogeneousAdoptionDiDResults` / `HeterogeneousAdoptionDiDEventStudyResults` blocks + `## HAD Pretests` index covering all 7 pretest entry points + Choosing-an-Estimator row to `diff_diff/guides/llms-full.txt` (the bundled-in-wheel agent reference); the documented constructor + `fit()` signatures match the real `HeterogeneousAdoptionDiD.__init__` / `.fit` API exactly (verified by `inspect.signature`-based regression tests). Tightens the existing `Continuous treatment intensity` Choosing row to surface ATT(d) vs WAS as the estimand differentiator. `docs/doc-deps.yaml` updated to remove the `llms-full.txt` deferral note on `had.py` and add `llms-full.txt` entries to `had.py`, `had_pretests.py`, and `practitioner.py` blocks. Patch-level (additive on stable surfaces). 26 new tests (16 in `tests/test_practitioner.py::TestHADDispatch` + 9 in `tests/test_guides.py::TestLLMsFullHADCoverage` + 1 fixture-minimality regression locking the "handlers are STRING-ONLY at runtime" stability invariant). Closes the Phase 5 "agent surfaces" gap; T21 pretest tutorial and T22 weighted/survey tutorial remain queued as separate notebook PRs.
diff --git a/diff_diff/chaisemartin_dhaultfoeuille_results.py b/diff_diff/chaisemartin_dhaultfoeuille_results.py
@@ -695,6 +695,27 @@ def _estimand_label(self) -> str:
             return f"{did_part}{suffix}_{sub_part}" if sub_part else f"{did_part}{suffix}"
         return base
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         """Concise string representation."""
         sig = _get_significance_stars(self.overall_p_value)
diff --git a/diff_diff/continuous_did_results.py b/diff_diff/continuous_did_results.py
@@ -143,6 +143,45 @@ class ContinuousDiDResults:
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None)
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    # ATT-side is the headline contract; ACRT remains accessible via overall_acrt_*.
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_att_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_att_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_att_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_att_t_stat
+
+    # `overall_*` aliases for naming consistency with the rest of the staggered family.
+    @property
+    def overall_se(self) -> float:
+        return self.overall_att_se
+
+    @property
+    def overall_conf_int(self) -> Tuple[float, float]:
+        return self.overall_att_conf_int
+
+    @property
+    def overall_p_value(self) -> float:
+        return self.overall_att_p_value
+
+    @property
+    def overall_t_stat(self) -> float:
+        return self.overall_att_t_stat
+
     def __repr__(self) -> str:
         sig_att = _get_significance_stars(self.overall_att_p_value)
         sig_acrt = _get_significance_stars(self.overall_acrt_p_value)
diff --git a/diff_diff/efficient_did_results.py b/diff_diff/efficient_did_results.py
@@ -167,6 +167,27 @@ class EfficientDiDResults:
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None)
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         sig = _get_significance_stars(self.overall_p_value)
         path = "DR" if self.estimation_path == "dr" else "nocov"
diff --git a/diff_diff/imputation_results.py b/diff_diff/imputation_results.py
@@ -143,6 +143,27 @@ class ImputationDiDResults:
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None, repr=False)
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         """Concise string representation."""
         sig = _get_significance_stars(self.overall_p_value)
diff --git a/diff_diff/results.py b/diff_diff/results.py
@@ -447,6 +447,27 @@ class MultiPeriodDiDResults:
     vcov_type: Optional[str] = field(default=None)
     cluster_name: Optional[str] = field(default=None)
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.avg_att
+
+    @property
+    def se(self) -> float:
+        return self.avg_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.avg_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.avg_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.avg_t_stat
+
     def __repr__(self) -> str:
         """Concise string representation."""
         sig = _get_significance_stars(self.avg_p_value)
@@ -1180,7 +1201,7 @@ def get_loo_effects_df(self) -> pd.DataFrame:
                 "back to fit-time unit IDs is not well-defined. See "
                 "``result.placebo_effects`` for the raw PSU-level replicate "
                 "array and ``docs/methodology/REGISTRY.md`` §SyntheticDiD "
-                "\"Note (survey + jackknife composition)\" for the "
+                '"Note (survey + jackknife composition)" for the '
                 "aggregation formula."
             )
         if self._loo_unit_ids is None or self._loo_roles is None or self.placebo_effects is None:
@@ -1386,9 +1407,7 @@ def in_time_placebo(
                 lambda_fake,
             )
             synthetic_pre_fake_n = Y_pre_c_n @ omega_eff_fake
-            pre_fit_n = float(
-                np.sqrt(np.mean((y_pre_t_mean_n - synthetic_pre_fake_n) ** 2))
-            )
+            pre_fit_n = float(np.sqrt(np.mean((y_pre_t_mean_n - synthetic_pre_fake_n) ** 2)))
             # ATT is scale-equivariant and shift-invariant in Y; RMSE is
             # scale-equivariant. Rescale back to original-Y units.
             row["att"] = float(att_fake_n * Y_scale)
@@ -1482,12 +1501,8 @@ def sensitivity_to_zeta_omega(
         Y_post_treated_n = (snap.Y_post_treated - Y_shift) / Y_scale
 
         if snap.w_treated is not None:
-            y_pre_t_mean_n = np.average(
-                Y_pre_treated_n, axis=1, weights=snap.w_treated
-            )
-            y_post_t_mean_n = np.average(
-                Y_post_treated_n, axis=1, weights=snap.w_treated
-            )
+            y_pre_t_mean_n = np.average(Y_pre_treated_n, axis=1, weights=snap.w_treated)
+            y_post_t_mean_n = np.average(Y_post_treated_n, axis=1, weights=snap.w_treated)
         else:
             y_pre_t_mean_n = np.mean(Y_pre_treated_n, axis=1)
             y_post_t_mean_n = np.mean(Y_post_treated_n, axis=1)
diff --git a/diff_diff/stacked_did_results.py b/diff_diff/stacked_did_results.py
@@ -97,6 +97,27 @@ class StackedDiDResults:
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None)
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         """Concise string representation."""
         sig = _get_significance_stars(self.overall_p_value)
diff --git a/diff_diff/staggered_results.py b/diff_diff/staggered_results.py
@@ -138,6 +138,27 @@ class CallawaySantAnnaResults:
     epv_threshold: float = 10
     pscore_fallback: str = "error"
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         """Concise string representation."""
         sig = _get_significance_stars(self.overall_p_value)
diff --git a/diff_diff/staggered_triple_diff_results.py b/diff_diff/staggered_triple_diff_results.py
@@ -95,6 +95,27 @@ class StaggeredTripleDiffResults:
     epv_threshold: float = 10
     pscore_fallback: str = "error"
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         """Concise string representation."""
         sig = _get_significance_stars(self.overall_p_value)
diff --git a/diff_diff/sun_abraham.py b/diff_diff/sun_abraham.py
@@ -92,6 +92,27 @@ class SunAbrahamResults:
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None)
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         """Concise string representation."""
         sig = _get_significance_stars(self.overall_p_value)
diff --git a/diff_diff/two_stage_results.py b/diff_diff/two_stage_results.py
@@ -141,6 +141,27 @@ class TwoStageDiDResults:
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None, repr=False)
 
+    # --- Inference-field aliases (balance/external-adapter compatibility) ---
+    @property
+    def att(self) -> float:
+        return self.overall_att
+
+    @property
+    def se(self) -> float:
+        return self.overall_se
+
+    @property
+    def conf_int(self) -> Tuple[float, float]:
+        return self.overall_conf_int
+
+    @property
+    def p_value(self) -> float:
+        return self.overall_p_value
+
+    @property
+    def t_stat(self) -> float:
+        return self.overall_t_stat
+
     def __repr__(self) -> str:
         """Concise string representation."""
         sig = _get_significance_stars(self.overall_p_value)
diff --git a/diff_diff/wooldridge_results.py b/diff_diff/wooldridge_results.py
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
diff --git a/tests/test_result_aliases.py b/tests/test_result_aliases.py