Address CI R9 codex review (1 P3) on PreTrendsPower PR-B

igerber · claude · igerber · commit 02b74a8ba4cd · 2026-05-19T06:19:10.000-04:00
R9 verdict ✅ Looks good with one actionable P3: CHANGELOG.md L39 and
REPORTING.md L324-342 overstated the legacy covariance-source path.
Both said the `"diag_fallback_available_full_vcov_unused"` sentinel is
gone / that legacy inference labels CS / SA + `event_study_vcov` as
`"full_pre_period_vcov"`, but the R4 fix RESTORED the conservative
sentinel for legacy precomputed results that lack the persisted
`covariance_source` field — because without it we cannot distinguish
a pre-PR-B fit (used diag) from a post-PR-B fit (used full Σ_22).

Code and tests are correct; docs were the inconsistent piece.

Fix: reword both surfaces to distinguish two paths explicitly:

- **New fits** (post-PR-B): persisted `covariance_source` is read
  directly, non-bootstrap CS / SA report `"full_pre_period_vcov"` and
  are NOT downgraded.
- **Legacy serialized results** (pre-PR-B, no field): legacy
  type-based inference still emits the conservative sentinel for
  CS / SA + populated `event_study_vcov`, and the
  `well_powered → moderately_powered` downgrade still applies. For
  legacy `MultiPeriodDiDResults` without `interaction_indices`, the
  fallback reports `"diag_fallback"` (genuine fallback, no
  downgrade).

CHANGELOG entry expanded to list all four covariance-source DR
regression tests by name; REPORTING.md "Pre-period covariance
routing for staggered-estimator power" Note rewritten with the
two-path structure.

R-parity P3 deferred to PR-C per the existing TODO row (codex
labeled it informational / non-blocker).

No source changes; 309 tests across pretrends + DR + BR continue to
pass.

Co-Authored-By: Claude Opus 4.7 (1M context) &lt;noreply@anthropic.com&gt;
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -36,7 +36,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 
 ### Fixed
 - **PreTrendsPower: `PreTrendsPowerResults.power_at(M)` for `violation_type='custom'` (PR-B Step 5).** PR-A R18 added a `NotImplementedError` guard to prevent silent equal-weights output when `power_at()` couldn't reconstruct the fitted custom weights. PR-B Step 5 persists the normalized `violation_weights` on `PreTrendsPowerResults` at fit time, so `power_at(M)` now works correctly for all four violation types (linear / constant / last_period / custom) on fresh fits. The PR-A guard is retained only for legacy serialized results lacking the new `violation_weights` field (refit with current library version to lift). Verified by the new `test_power_at_works_for_custom_violation_type` regression test and the companion `test_power_at_raises_on_legacy_custom_result_without_weights` (simulates a legacy serialized result by clearing `violation_weights` to None).
-- **`DiagnosticReport` / `BusinessReport` covariance-source provenance propagation (PR-B Step 3, R3 follow-up).** Before PR-B, `DiagnosticReport._infer_cov_source` flagged CS / SA fits with populated `event_study_vcov` as `"diag_fallback_available_full_vcov_unused"`, and `_apply_diag_fallback_downgrade` then conservatively downgraded the `well_powered` tier to `moderately_powered`. PR-B Step 3 routes those fits through the full `Σ_22` sub-block at the estimator layer — but the report layer kept the old type-based inference, so correctly-computed full-VCV power results were silently being downgraded. Fix: `PreTrendsPowerResults` gains a new `covariance_source` field that `pretrends.py:_extract_pre_period_params` populates with `"full_pre_period_vcov"` or `"diag_fallback"` based on the actual extraction path taken; `DiagnosticReport._check_pretrends_power` and `_format_precomputed_pretrends_power` prefer that persisted label and fall back to type-based inference only for legacy serialized results. The legacy inference at `_infer_cov_source` is also updated to correctly label CS / SA + `event_study_vcov` as `"full_pre_period_vcov"`. Effect: non-bootstrap CS / SA pre-trends power blocks now keep their well_powered tier through the report layer (instead of being downgraded by the dead-code-since-PR-B sentinel `"diag_fallback_available_full_vcov_unused"`). Verified by the rewritten `test_precomputed_pretrends_power_full_vcov_yields_no_downgrade` and the new `test_precomputed_pretrends_power_consumes_persisted_cov_source` that explicitly exercises the persisted-field path.
+- **`DiagnosticReport` / `BusinessReport` covariance-source provenance propagation (PR-B Step 3, R3 follow-up).** Before PR-B, `DiagnosticReport._infer_cov_source` flagged CS / SA fits with populated `event_study_vcov` as `"diag_fallback_available_full_vcov_unused"`, and `_apply_diag_fallback_downgrade` then conservatively downgraded the `well_powered` tier to `moderately_powered`. PR-B Step 3 routes those fits through the full `Σ_22` sub-block at the estimator layer — but the report layer kept the old type-based inference, so correctly-computed full-VCV power results were silently being downgraded. Fix: `PreTrendsPowerResults` gains a new `covariance_source` field that `pretrends.py:_extract_pre_period_params` populates with `"full_pre_period_vcov"` or `"diag_fallback"` based on the actual extraction path taken; `DiagnosticReport._check_pretrends_power` and `_format_precomputed_pretrends_power` prefer that persisted label and fall back to type-based inference only for legacy serialized results that lack the field. Two paths now coexist through the report layer: **new fits** (post-PR-B, `covariance_source` is persisted) consume the persisted label directly — non-bootstrap CS / SA report `"full_pre_period_vcov"` and are NOT downgraded; **legacy serialized results** (pre-PR-B, no `covariance_source` field on the object) fall through to `_infer_cov_source`, which STILL emits the conservative `"diag_fallback_available_full_vcov_unused"` sentinel for CS / SA + populated `event_study_vcov` because without the persisted label we cannot distinguish a pre-PR-B fit (which used `diag(ses^2)`) from a post-PR-B fit, and the PR-A conservative downgrade still applies to preserve backwards-compat. For `MultiPeriodDiDResults` without `interaction_indices`, the legacy fallback reports `"diag_fallback"` (a genuine fallback, no downgrade applies). Effect: non-bootstrap CS / SA pre-trends power blocks on fresh fits now keep their well_powered tier through the report layer (instead of being downgraded by the conservative sentinel); legacy serialized results are unchanged. Verified by `test_precomputed_pretrends_power_persisted_full_vcov_no_downgrade` (new fits), `test_precomputed_pretrends_power_legacy_missing_field_still_downgraded` (legacy fallback contract), `test_precomputed_pretrends_power_consumes_persisted_cov_source` (persisted label takes precedence over legacy inference), and `test_precomputed_pretrends_power_legacy_mpd_without_interaction_indices_reports_diag`.
 
 ## [3.3.3] - 2026-05-15
 
diff --git a/docs/methodology/REPORTING.md b/docs/methodology/REPORTING.md
@@ -330,16 +330,35 @@ a library setting.
   `PreTrendsPowerResults.covariance_source` field records the actual
   extraction path (`"full_pre_period_vcov"` vs `"diag_fallback"`), and
   the `DiagnosticReport.pretrends_power` block surfaces that label
-  unchanged. The PR-A-era `well_powered → moderately_powered`
-  conservative downgrade was a workaround for the implementation gap
-  PR-B closed; it now fires only for the dead-code legacy sentinel
-  label `"diag_fallback_available_full_vcov_unused"` (no in-tree path
-  produces this anymore — see
-  `_apply_diag_fallback_downgrade` in `diagnostic_report.py`).
-  Remaining `"diag_fallback"` cases — bootstrap / replicate-weight CS
-  and SA, plus ImputationDiD / Stacked / EfficientDiD / TwoStageDiD —
-  pass through unchanged because nothing better is available on those
-  result types yet.
+  unchanged. There are two paths through the report layer with
+  different downgrade semantics:
+
+  - **New fits** (post-PR-B, `PreTrendsPowerResults.covariance_source`
+    is populated): `DiagnosticReport` reads the persisted label
+    directly. Non-bootstrap CS / SA fits report
+    `"full_pre_period_vcov"` and are NOT downgraded; bootstrap /
+    replicate-weight paths report `"diag_fallback"` and also pass
+    through unchanged (no "available but unused" concern — the
+    estimator did its best with what was available).
+  - **Legacy serialized results** (pre-PR-B, no
+    `covariance_source` field on the object): the report layer falls
+    back to type-based inference in
+    `_infer_cov_source(source_fit)`. For event-study result types
+    (CS / SA / etc.) with populated `event_study_vcov`, the legacy-
+    ambiguous case still emits the conservative
+    `"diag_fallback_available_full_vcov_unused"` sentinel and the
+    `well_powered → moderately_powered` downgrade still applies —
+    because without the persisted provenance we cannot rule out that
+    the stored power was computed from `diag(ses^2)` under PR-A
+    semantics. For `MultiPeriodDiDResults` without
+    `interaction_indices`, the legacy fallback reports
+    `"diag_fallback"` (a genuine fallback, not the "available but
+    unused" case, so no downgrade applies).
+
+  Remaining `"diag_fallback"` cases on new fits — bootstrap /
+  replicate-weight CS and SA, plus ImputationDiD / Stacked /
+  EfficientDiD / TwoStageDiD — pass through unchanged because
+  nothing better is available on those result types yet.
 
 - **Note:** Unit-translation policy. BusinessReport does not
   arithmetically translate log-points to percents or level effects to