Skip to content

Commit ed57c7c

Browse files
igerberclaude
andcommitted
Address PR #366 CI review round 5 (1 P1 + 1 P2): add strictly-positive-dose gate to ContinuousDiD prerequisite summaries
P1 (newly identified — `dose_min > 0` ContinuousDiD gate omitted): Reviewer correctly noted that the round-3 prerequisite summary (`has_never_treated`, `treatment_varies_within_unit == False`, `is_balanced`, no `duplicate_unit_time_rows` alert) omits the estimator's strictly-positive-treated-dose restriction. Verified `continuous_did.py:287-294` raises `ValueError` ("Dose must be strictly positive for treated units (D > 0)") on negative treated dose support — a panel can satisfy every documented gate above and still hard-fail at fit time when `treatment_dose.dose_min < 0`. Updated wording in five surfaces to add `treatment_dose.dose_min > 0` as the fifth pre-fit gate: - `diff_diff/profile.py` `TreatmentDoseShape` docstring (now spells out all five gates with numbered list referencing the `continuous_did.py:287-294` line range). - `diff_diff/guides/llms-autonomous.txt` §2 field reference (notes `dose_min > 0` is itself a gate; the other treatment_dose sub-fields remain descriptive). - `diff_diff/guides/llms-autonomous.txt` §4.7 (continuous design feature paragraph). - `diff_diff/guides/llms-autonomous.txt` §5.2 worked example reasoning chain (now five gates checked; added a new counter-example covering the negative-dose path so an agent reading the example sees the contradictory case explicitly). - `CHANGELOG.md` Unreleased entry. - `ROADMAP.md` AI-Agent Track building-block. P2 (regression test for negative-dose path): Added `test_treatment_dose_min_flags_negative_dose_continuous_panels` in `tests/test_profile_panel.py` covering a balanced, never-treated, time-invariant continuous panel where every dose level is negative. The test asserts that all four other gates pass cleanly and that `dose.dose_min < 0` is correctly observed — the fixture an agent would see when reasoning about whether ContinuousDiD applies. Added `dose_min > 0` content-stability assertion in `tests/test_guides.py` so future wording changes can't silently drop the gate from the autonomous-guide summary. Rebased onto origin/main; resolved CHANGELOG conflicts (kept HAD Phase 4.5 B + dCDH by_path Wave 2 + my Wave 2 entries side by side). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent a8df32c commit ed57c7c

6 files changed

Lines changed: 135 additions & 47 deletions

File tree

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -9,7 +9,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
99

1010
### Added
1111
- **`HeterogeneousAdoptionDiD` mass-point `survey=` / `weights=` + event-study `aggregate="event_study"` survey composition + multiplier-bootstrap sup-t simultaneous confidence band (Phase 4.5 B).** Closes the two Phase 4.5 A `NotImplementedError` gates: `design="mass_point" + weights/survey` and `aggregate="event_study" + weights/survey`. Weighted 2SLS sandwich in `_fit_mass_point_2sls` follows the Wooldridge 2010 Ch. 12 pweight convention (`w²` in the HC1 meat, `w·u` in the CR1 cluster score, weighted bread `Z'WX`); HC1 and CR1 ("stata" `se_type`) bit-parity with `estimatr::iv_robust(..., weights=, clusters=)` at `atol=1e-10` (new cross-language golden at `benchmarks/data/estimatr_iv_robust_golden.json`, generated by `benchmarks/R/generate_estimatr_iv_robust_golden.R`; `estimatr` added to `benchmarks/R/requirements.R`). `_fit_mass_point_2sls` gains `weights=` + `return_influence=` kwargs and now always returns a 3-tuple `(beta, se, psi)` — `psi` is the per-unit IF on the β̂-scale scaled so `compute_survey_if_variance(psi, trivial_resolved) ≈ V_HC1[1,1]` at `atol=1e-10` (PR #359 IF scale convention applied uniformly; no `sum(psi²)` claims). Event-study per-horizon variance: `survey=` path composes Binder-TSL via `compute_survey_if_variance`; `weights=` shortcut uses the analytical weighted-robust SE (continuous: CCT-2014 `bc_fit.se_robust / |den|`; mass-point: weighted 2SLS pweight sandwich from `_fit_mass_point_2sls` — HC1 / classical / CR1). `survey_metadata` / `variance_formula` / `effective_dose_mean` populated in both regimes (previously hardcoded `None` at `had.py:3366`). New multiplier-bootstrap sup-t: `_sup_t_multiplier_bootstrap` reuses `diff_diff.bootstrap_utils.generate_survey_multiplier_weights_batch` for PSU-level draws with stratum centering + sqrt(n_h/(n_h-1)) small-sample correction + FPC scaling + lonely-PSU handling. On the `weights=` shortcut, sup-t calibration is routed through a synthetic trivial `ResolvedSurveyDesign` so the centered + small-sample-corrected branch fires uniformly — targets the analytical HC1 variance family (`compute_survey_if_variance(IF, trivial) ≈ V_HC1` per the PR #359 IF scale invariant) rather than the raw `sum(ψ²) = ((n-1)/n) · V_HC1` that unit-level Rademacher multipliers would produce on the HC1-scaled IF. Perturbations: `delta = weights @ IF` with NO `(1/n)` prefactor (matching `staggered_bootstrap.py:373` idiom), normalized by per-horizon analytical SE, `(1-alpha)`-quantile of the sup-t distribution. At H=1 the quantile reduces to `Φ⁻¹(1 − alpha/2) ≈ 1.96` up to MC noise (regression-locked by `TestSupTReducesToNormalAtH1`). `HeterogeneousAdoptionDiD.__init__` gains `n_bootstrap: int = 999` and `seed: Optional[int] = None` (CS-parity singular seed); `fit()` gains `cband: bool = True` (only consulted on weighted event-study). `HeterogeneousAdoptionDiDEventStudyResults` extended with `variance_formula`, `effective_dose_mean`, `cband_low`, `cband_high`, `cband_crit_value`, `cband_method`, `cband_n_bootstrap` (all `None` on unweighted fits); surfaced in `to_dict`, `to_dataframe`, `summary`, `__repr__`. Unweighted event-study with `cband=False` preserves pre-Phase 4.5 B numerical output bit-exactly (stability invariant, locked by regression tests). Zero-weight subpopulation convention carries over from PR #359 (filter for design decisions; preserve full ResolvedSurveyDesign for variance). Non-pweight SurveyDesigns (`aweight`, `fweight`, replicate designs) raise `NotImplementedError` on both new paths (reciprocal-guard discipline). Pretest surfaces (`qug_test`, `stute_test`, `yatchew_hr_test`, joint variants, `did_had_pretest_workflow`) remain unweighted in this release — Phase 4.5 C / C0. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted 2SLS (Phase 4.5 B)", "Event-study survey composition", and "Sup-t multiplier bootstrap" for derivations and invariants.
12-
- **`PanelProfile.outcome_shape` and `PanelProfile.treatment_dose` extensions + `llms-autonomous.txt` worked examples (Wave 2 of the AI-agent enablement track).** `profile_panel(...)` now populates two new optional sub-dataclasses on the returned `PanelProfile`: `outcome_shape: Optional[OutcomeShape]` (numeric outcomes only — exposes `n_distinct_values`, `pct_zeros`, `value_min` / `value_max`, `skewness` and `excess_kurtosis` (NaN-safe; `None` when `n_distinct_values < 3` or variance is zero), `is_integer_valued`, `is_count_like` (heuristic: integer-valued AND has zeros AND right-skewed AND > 2 distinct values; flags WooldridgeDiD QMLE consideration over linear OLS), `is_bounded_unit` ([0, 1] support)) and `treatment_dose: Optional[TreatmentDoseShape]` (continuous treatments only — exposes `n_distinct_doses`, `has_zero_dose`, `dose_min` / `dose_max` / `dose_mean` over non-zero doses). Both `OutcomeShape` and `TreatmentDoseShape` are descriptive only; the core field-based pre-fit gates for `ContinuousDiD` remain the existing `PanelProfile.has_never_treated` (unit-level), `PanelProfile.treatment_varies_within_unit == False` (per-unit full-path dose constancy, matching `ContinuousDiD.fit()`'s `df.groupby(unit)[dose].nunique() > 1` rejection), and `PanelProfile.is_balanced`, plus the absence of the `duplicate_unit_time_rows` alert (the precompute path silently resolves duplicate `(unit, time)` cells via last-row-wins). The shape extensions provide distributional context (effect-size range, count-shape detection) that supplements but does not replace those gates. Both fields are `None` when their classification gate is not met (e.g., `treatment_dose is None` for binary treatments). `to_dict()` serializes the nested dataclasses as JSON-compatible nested dicts. New exports: `OutcomeShape`, `TreatmentDoseShape` from top-level `diff_diff`. `llms-autonomous.txt` gains a new §5 "Worked examples" section with three end-to-end PanelProfile -> reasoning -> validation walkthroughs (binary staggered with never-treated controls, continuous dose with zero baseline, count-shaped outcome) plus §2 field-reference subsections for the new shape fields and §4.7 / §4.11 cross-references for outcome-shape considerations. Existing §5-§8 of the autonomous guide are renumbered to §6-§9. Descriptive only — no recommender language inside the worked examples.
12+
- **`PanelProfile.outcome_shape` and `PanelProfile.treatment_dose` extensions + `llms-autonomous.txt` worked examples (Wave 2 of the AI-agent enablement track).** `profile_panel(...)` now populates two new optional sub-dataclasses on the returned `PanelProfile`: `outcome_shape: Optional[OutcomeShape]` (numeric outcomes only — exposes `n_distinct_values`, `pct_zeros`, `value_min` / `value_max`, `skewness` and `excess_kurtosis` (NaN-safe; `None` when `n_distinct_values < 3` or variance is zero), `is_integer_valued`, `is_count_like` (heuristic: integer-valued AND has zeros AND right-skewed AND > 2 distinct values; flags WooldridgeDiD QMLE consideration over linear OLS), `is_bounded_unit` ([0, 1] support)) and `treatment_dose: Optional[TreatmentDoseShape]` (continuous treatments only — exposes `n_distinct_doses`, `has_zero_dose`, `dose_min` / `dose_max` / `dose_mean` over non-zero doses). Both `OutcomeShape` and `TreatmentDoseShape` are descriptive context except `treatment_dose.dose_min > 0`, which IS a `ContinuousDiD` pre-fit gate (the estimator rejects negative treated doses at line 287-294 of `continuous_did.py`). The full `ContinuousDiD` pre-fit gate set is `PanelProfile.has_never_treated` (unit-level), `PanelProfile.treatment_varies_within_unit == False` (per-unit full-path dose constancy, matching `ContinuousDiD.fit()`'s `df.groupby(unit)[dose].nunique() > 1` rejection), `PanelProfile.is_balanced`, absence of the `duplicate_unit_time_rows` alert (the precompute path silently resolves duplicate `(unit, time)` cells via last-row-wins), and `treatment_dose.dose_min > 0`. The shape extensions provide distributional context (effect-size range, count-shape detection) that supplements but does not replace those gates. Both fields are `None` when their classification gate is not met (e.g., `treatment_dose is None` for binary treatments). `to_dict()` serializes the nested dataclasses as JSON-compatible nested dicts. New exports: `OutcomeShape`, `TreatmentDoseShape` from top-level `diff_diff`. `llms-autonomous.txt` gains a new §5 "Worked examples" section with three end-to-end PanelProfile -> reasoning -> validation walkthroughs (binary staggered with never-treated controls, continuous dose with zero baseline, count-shaped outcome) plus §2 field-reference subsections for the new shape fields and §4.7 / §4.11 cross-references for outcome-shape considerations. Existing §5-§8 of the autonomous guide are renumbered to §6-§9. Descriptive only — no recommender language inside the worked examples.
1313
- **`HeterogeneousAdoptionDiD.fit(survey=..., weights=...)` on continuous-dose paths (Phase 4.5 survey support).** The `continuous_at_zero` (paper Design 1') and `continuous_near_d_lower` (Design 1 continuous-near-d̲) designs accept survey weights through two interchangeable kwargs: `weights=<array>` (pweight shortcut, weighted-robust SE from the CCT-2014 lprobust port) and `survey=SurveyDesign(weights, strata, psu, fpc)` (design-based inference via Binder-TSL variance using the existing `compute_survey_if_variance` helper at `diff_diff/survey.py:1802`). Point estimates match across both entry paths; SE diverges by design (pweight-only vs PSU-aggregated). `HeterogeneousAdoptionDiDResults.survey_metadata` is a repo-standard `SurveyMetadata` dataclass (weight_type / effective_n / design_effect / sum_weights / weight_range / n_strata / n_psu / df_survey); HAD-specific extras (`variance_formula` label, `effective_dose_mean`) are separate top-level result fields. `to_dict()` surfaces the full `SurveyMetadata` object plus `variance_formula` + `effective_dose_mean`; `summary()` renders `variance_formula`, `effective_n`, `effective_dose_mean`, and (when the survey= path is used) `df_survey`; `__repr__` surfaces `variance_formula` + `effective_dose_mean` when present. The HAD `mass_point` design and `aggregate="event_study"` path raise `NotImplementedError` under survey/weights (deferred to Phase 4.5 B: weighted 2SLS + event-study survey composition); the HAD pretests stay unweighted in this release (Phase 4.5 C). Parity ceiling acknowledged — no public weighted-CCF bias-corrected local-linear reference exists in any language; methodology confidence comes from (1) uniform-weights bit-parity at `atol=1e-14` on the full lprobust output struct, (2) cross-language weighted-OLS parity (manual R reference) at `atol=1e-12`, and (3) Monte Carlo oracle consistency on known-τ DGPs. `_nprobust_port.lprobust` gains `weights=` and `return_influence=` (used internally by the Binder-TSL path); `bias_corrected_local_linear` removes the Phase 1c `NotImplementedError` on `weights=` and forwards. Auto-bandwidth selection remains unweighted in this release — pass `h`/`b` explicitly for weight-aware bandwidths. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted extension (Phase 4.5 survey support)".
1414
- **`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test` + `StuteJointResult`** (HeterogeneousAdoptionDiD Phase 3 follow-up). Joint Cramér-von Mises pretests across K horizons with shared-η Mammen wild bootstrap (preserves vector-valued empirical-process unit-level dependence per Delgado-Manteiga 2001 / Hlávka-Hušková 2020). The core `stute_joint_pretest` is residuals-in; two thin data-in wrappers construct per-horizon residuals for the two nulls the paper spells out: mean-independence (step 2 pre-trends, `OLS(Y_t − Y_base ~ 1)` per pre-period) and linearity (step 3 joint, `OLS(Y_t − Y_base ~ 1 + D)` per post-period). Sum-of-CvMs aggregation (`S_joint = Σ_k S_k`); per-horizon scale-invariant exact-linear short-circuit. Closes the paper Section 4.2 step-2 gap that Phase 3 `did_had_pretest_workflow` previously flagged with an "Assumption 7 pre-trends test NOT run" caveat. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Joint Stute tests" for algorithm, invariants, and scope exclusion of Eq 18 linear-trend detrending (deferred to Phase 4 Pierce-Schott replication).
1515
- **`did_had_pretest_workflow(aggregate="event_study")`**: multi-period dispatch on balanced ≥3-period panels. Runs QUG at `F` + joint pre-trends Stute across earlier pre-periods + joint homogeneity-linearity Stute across post-periods. Step 2 closure requires ≥2 pre-periods; with only a single pre-period (the base `F-1`) `pretrends_joint=None` and the verdict flags the skip. Reuses the Phase 2b event-study panel validator (last-cohort auto-filter under staggered timing with `UserWarning`; `ValueError` when `first_treat_col=None` and the panel is staggered). The data-in wrappers `joint_pretrends_test` and `joint_homogeneity_test` also route through that same validator internally, so direct wrapper calls inherit the last-cohort filter and constant-post-dose invariant. `HADPretestReport` extended with `pretrends_joint`, `homogeneity_joint`, and `aggregate` fields; serialization methods (`summary`, `to_dict`, `to_dataframe`, `__repr__`) preserve the Phase 3 output bit-exactly on `aggregate="overall"` — no `aggregate` key, no header row, no schema drift — and only surface the new fields on `aggregate="event_study"`.

ROADMAP.md

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -138,7 +138,7 @@ Long-running program, framed as "building toward" rather than with discrete ship
138138
- Baker et al. (2025) 8-step workflow enforcement in `diff_diff/practitioner.py`.
139139
- `practitioner_next_steps()` context-aware guidance.
140140
- Runtime LLM guides via `get_llm_guide(...)` (`llms.txt`, `llms-full.txt`, `llms-practitioner.txt`, `llms-autonomous.txt`), bundled in the wheel.
141-
- `profile_panel(df, ...)` returns a `PanelProfile` dataclass of structural facts about the panel - factual, not opinionated. Pairs with the `"autonomous"` guide variant (reference-shaped: estimator-support matrix + per-design-feature reasoning) so agents describe the data then consult a bundled reference rather than calling a deterministic recommender. `PanelProfile.outcome_shape` and `PanelProfile.treatment_dose` extensions add descriptive distributional context (count-likeness / bounded-support hints on numeric outcomes; dose support and zero-dose presence on continuous treatments). They are descriptive only — `outcome_shape.is_count_like` informs the WooldridgeDiD-QMLE-vs-linear-OLS judgment but does not gate it, and the core field-based ContinuousDiD pre-fit gates remain the existing `has_never_treated`, `treatment_varies_within_unit`, `is_balanced`, and the absence of the `duplicate_unit_time_rows` alert. The autonomous guide §5 walks through three end-to-end PanelProfile -> reasoning -> validation worked examples.
141+
- `profile_panel(df, ...)` returns a `PanelProfile` dataclass of structural facts about the panel - factual, not opinionated. Pairs with the `"autonomous"` guide variant (reference-shaped: estimator-support matrix + per-design-feature reasoning) so agents describe the data then consult a bundled reference rather than calling a deterministic recommender. `PanelProfile.outcome_shape` and `PanelProfile.treatment_dose` extensions add descriptive distributional context (count-likeness / bounded-support hints on numeric outcomes; dose support and zero-dose presence on continuous treatments). They are descriptive context except `treatment_dose.dose_min > 0`, which IS a ContinuousDiD pre-fit gate (the estimator rejects negative treated doses). `outcome_shape.is_count_like` informs the WooldridgeDiD-QMLE-vs-linear-OLS judgment but does not gate it. The full ContinuousDiD pre-fit gate set is `has_never_treated`, `treatment_varies_within_unit == False`, `is_balanced`, absence of the `duplicate_unit_time_rows` alert, and `treatment_dose.dose_min > 0`. The autonomous guide §5 walks through three end-to-end PanelProfile -> reasoning -> validation worked examples.
142142
- Package docstring leads with an "For AI agents" entry block so `help(diff_diff)` surfaces the agent entry points automatically.
143143
- Silent-operation warnings so agents and humans see the same signals at the same time.
144144

0 commit comments

Comments
 (0)