Commit cc91a7d

igerber and claude committed
Address PR #366 CI review round 1: scope TreatmentDoseShape as descriptive-only; fix WooldridgeDiD method kwarg
P1 (TreatmentDoseShape vs ContinuousDiD contract):
- Reviewer correctly flagged that the new `is_time_invariant` field (per-unit non-zero distinct-count) does NOT match the actual `ContinuousDiD.fit()` gate at `continuous_did.py:222-228`, which uses `df.groupby(unit)[dose].nunique() > 1` over the FULL dose column (including pre-treatment zeros). My nonzero-only check silently classified `0,0,d,d` paths as time-invariant while ContinuousDiD would reject them.
- Removed `is_time_invariant` field from `TreatmentDoseShape` entirely. The pre-existing `PanelProfile.treatment_varies_within_unit` field already encodes the correct ContinuousDiD prerequisite (matches the estimator's nunique check at line 224) and is correctly documented in §2 of the autonomous guide. Adding a second, narrower, mismatched gate was confusing - the reviewer's "scope as descriptive-only" path is the cleaner fix.
- Reframed `TreatmentDoseShape` docstring + autonomous guide §2 field reference: explicitly NOT a ContinuousDiD prerequisite. `n_distinct_doses`, `has_zero_dose`, `dose_min/max/mean` provide descriptive distributional context; `has_never_treated` (unit-level) + `treatment_varies_within_unit == False` (full-path constancy) + `is_balanced` are the authoritative gates.
- Rewrote §5.2 worked example reasoning chain to use the existing correct gates and added a counter-example showing `has_zero_dose=True` does NOT imply `has_never_treated=True` (the row-level vs unit-level distinction).
- Added `test_treatment_dose_does_not_gate_continuous_did` covering the two contradictory cases the reviewer named:
  (1) `0,0,d,d` within-unit dose path, asserting `treatment_varies_within_unit=True` (the actual ContinuousDiD gate fires correctly);
  (2) row-level zeros without never-treated units, asserting `has_zero_dose=True` BUT `has_never_treated=False` (the two facts are distinct).
- Removed `test_treatment_dose_continuous_time_varying_within_unit` and `test_treatment_dose_distinguishes_doses_at_high_precision` - both tested the dropped `is_time_invariant` field.

P2 (WooldridgeDiD constructor kwarg):
- The autonomous guide §5.3 worked example used `WooldridgeDiD(family="poisson")` but the actual constructor at `wooldridge.py:264` takes `method=`. Following the example would raise `TypeError: __init__() got an unexpected keyword argument 'family'`. Fixed in two places (the prose and the code snippet) and added a negative assertion in `test_guides.py` to prevent regression: `assert 'WooldridgeDiD(family="poisson")' not in text`.

CHANGELOG updated to reflect the revised TreatmentDoseShape scope.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
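The two contradictory cases named under P1 can be reproduced on a toy panel. The following is a minimal pure-Python sketch, not the library's code: the `rows` data and the helper `distinct_doses_by_unit` are hypothetical, and "never treated" is approximated here as "dose is zero in every period", which is an assumption about how the unit-level flag is derived.

```python
from collections import defaultdict

# Hypothetical toy panel of (unit, period, dose) rows.
# Unit "a" follows a 0,0,d,d path; unit "b" adopts in period 2.
# No unit stays at zero dose in every period.
rows = [
    ("a", 1, 0.0), ("a", 2, 0.0), ("a", 3, 2.0), ("a", 4, 2.0),
    ("b", 1, 0.0), ("b", 2, 3.0), ("b", 3, 3.0), ("b", 4, 3.0),
]


def distinct_doses_by_unit(rows, include_zero=True):
    """Collect the set of distinct dose values per unit.

    With include_zero=True this plays the role of a per-unit nunique over
    the full dose column; with include_zero=False it mimics the dropped
    nonzero-only check described in the commit message.
    """
    doses = defaultdict(set)
    for unit, _period, dose in rows:
        if include_zero or dose != 0:
            doses[unit].add(dose)
    return dict(doses)


full = distinct_doses_by_unit(rows)
nonzero = distinct_doses_by_unit(rows, include_zero=False)

# Case 1: the full-path check sees {0.0, d} for each unit and flags variation,
# while the nonzero-only check sees a single value and reports invariance.
full_path_varies = any(len(values) > 1 for values in full.values())
nonzero_only_invariant = all(len(values) <= 1 for values in nonzero.values())

# Case 2: zero doses exist at the row level, but no unit is zero throughout.
has_zero_dose = any(dose == 0 for _unit, _period, dose in rows)
has_never_treated = any(values == {0.0} for values in full.values())

print(full_path_varies, nonzero_only_invariant, has_zero_dose, has_never_treated)
```

On this panel the nonzero-only check and the full-path check disagree, which is the mismatch that motivated dropping `is_time_invariant`.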
1 parent ce73058 commit cc91a7d

5 files changed

Lines changed: 146 additions & 117 deletions

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
-- **`PanelProfile.outcome_shape` and `PanelProfile.treatment_dose` extensions + `llms-autonomous.txt` worked examples (Wave 2 of the AI-agent enablement track).** `profile_panel(...)` now populates two new optional sub-dataclasses on the returned `PanelProfile`: `outcome_shape: Optional[OutcomeShape]` (numeric outcomes only — exposes `n_distinct_values`, `pct_zeros`, `value_min` / `value_max`, `skewness` and `excess_kurtosis` (NaN-safe; `None` when `n_distinct_values < 3` or variance is zero), `is_integer_valued`, `is_count_like` (heuristic: integer-valued AND has zeros AND right-skewed AND > 2 distinct values; flags WooldridgeDiD QMLE consideration over linear OLS), `is_bounded_unit` ([0, 1] support)) and `treatment_dose: Optional[TreatmentDoseShape]` (continuous treatments only — exposes `n_distinct_doses`, `has_zero_dose`, `dose_min` / `dose_max` / `dose_mean` over non-zero doses, `is_time_invariant` (per-unit non-zero doses have at most one distinct value; gates the ContinuousDiD `fit()`-time prerequisites pre-fit)). Both fields are `None` when their classification gate is not met (e.g., `treatment_dose is None` for binary treatments). `to_dict()` serializes the nested dataclasses as JSON-compatible nested dicts. New exports: `OutcomeShape`, `TreatmentDoseShape` from top-level `diff_diff`. `llms-autonomous.txt` gains a new §5 "Worked examples" section with three end-to-end PanelProfile -> reasoning -> validation walkthroughs (binary staggered with never-treated controls, continuous dose with zero baseline, count-shaped outcome) plus §2 field-reference subsections for the new shape fields and §4.7 / §4.11 cross-references for outcome-shape considerations. Existing §5-§8 of the autonomous guide are renumbered to §6-§9. Descriptive only — no recommender language inside the worked examples.
+- **`PanelProfile.outcome_shape` and `PanelProfile.treatment_dose` extensions + `llms-autonomous.txt` worked examples (Wave 2 of the AI-agent enablement track).** `profile_panel(...)` now populates two new optional sub-dataclasses on the returned `PanelProfile`: `outcome_shape: Optional[OutcomeShape]` (numeric outcomes only — exposes `n_distinct_values`, `pct_zeros`, `value_min` / `value_max`, `skewness` and `excess_kurtosis` (NaN-safe; `None` when `n_distinct_values < 3` or variance is zero), `is_integer_valued`, `is_count_like` (heuristic: integer-valued AND has zeros AND right-skewed AND > 2 distinct values; flags WooldridgeDiD QMLE consideration over linear OLS), `is_bounded_unit` ([0, 1] support)) and `treatment_dose: Optional[TreatmentDoseShape]` (continuous treatments only — exposes `n_distinct_doses`, `has_zero_dose`, `dose_min` / `dose_max` / `dose_mean` over non-zero doses). Both `OutcomeShape` and `TreatmentDoseShape` are descriptive only; the authoritative pre-fit gates for `ContinuousDiD` remain the existing `PanelProfile.has_never_treated` (unit-level), `PanelProfile.treatment_varies_within_unit == False` (per-unit full-path dose constancy, matching `ContinuousDiD.fit()`'s `df.groupby(unit)[dose].nunique() > 1` rejection), and `PanelProfile.is_balanced`. The shape extensions provide distributional context (effect-size range, count-shape detection) that supplements but does not replace those gates. Both fields are `None` when their classification gate is not met (e.g., `treatment_dose is None` for binary treatments). `to_dict()` serializes the nested dataclasses as JSON-compatible nested dicts. New exports: `OutcomeShape`, `TreatmentDoseShape` from top-level `diff_diff`. `llms-autonomous.txt` gains a new §5 "Worked examples" section with three end-to-end PanelProfile -> reasoning -> validation walkthroughs (binary staggered with never-treated controls, continuous dose with zero baseline, count-shaped outcome) plus §2 field-reference subsections for the new shape fields and §4.7 / §4.11 cross-references for outcome-shape considerations. Existing §5-§8 of the autonomous guide are renumbered to §6-§9. Descriptive only — no recommender language inside the worked examples.
- **`HeterogeneousAdoptionDiD.fit(survey=..., weights=...)` on continuous-dose paths (Phase 4.5 survey support).** The `continuous_at_zero` (paper Design 1') and `continuous_near_d_lower` (Design 1 continuous-near-d̲) designs accept survey weights through two interchangeable kwargs: `weights=<array>` (pweight shortcut, weighted-robust SE from the CCT-2014 lprobust port) and `survey=SurveyDesign(weights, strata, psu, fpc)` (design-based inference via Binder-TSL variance using the existing `compute_survey_if_variance` helper at `diff_diff/survey.py:1802`). Point estimates match across both entry paths; SE diverges by design (pweight-only vs PSU-aggregated). `HeterogeneousAdoptionDiDResults.survey_metadata` is a repo-standard `SurveyMetadata` dataclass (weight_type / effective_n / design_effect / sum_weights / weight_range / n_strata / n_psu / df_survey); HAD-specific extras (`variance_formula` label, `effective_dose_mean`) are separate top-level result fields. `to_dict()` surfaces the full `SurveyMetadata` object plus `variance_formula` + `effective_dose_mean`; `summary()` renders `variance_formula`, `effective_n`, `effective_dose_mean`, and (when the survey= path is used) `df_survey`; `__repr__` surfaces `variance_formula` + `effective_dose_mean` when present. The HAD `mass_point` design and `aggregate="event_study"` path raise `NotImplementedError` under survey/weights (deferred to Phase 4.5 B: weighted 2SLS + event-study survey composition); the HAD pretests stay unweighted in this release (Phase 4.5 C). Parity ceiling acknowledged — no public weighted-CCF bias-corrected local-linear reference exists in any language; methodology confidence comes from (1) uniform-weights bit-parity at `atol=1e-14` on the full lprobust output struct, (2) cross-language weighted-OLS parity (manual R reference) at `atol=1e-12`, and (3) Monte Carlo oracle consistency on known-τ DGPs. 
`_nprobust_port.lprobust` gains `weights=` and `return_influence=` (used internally by the Binder-TSL path); `bias_corrected_local_linear` removes the Phase 1c `NotImplementedError` on `weights=` and forwards. Auto-bandwidth selection remains unweighted in this release — pass `h`/`b` explicitly for weight-aware bandwidths. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted extension (Phase 4.5 survey support)".
- **`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test` + `StuteJointResult`** (HeterogeneousAdoptionDiD Phase 3 follow-up). Joint Cramér-von Mises pretests across K horizons with shared-η Mammen wild bootstrap (preserves vector-valued empirical-process unit-level dependence per Delgado-Manteiga 2001 / Hlávka-Hušková 2020). The core `stute_joint_pretest` is residuals-in; two thin data-in wrappers construct per-horizon residuals for the two nulls the paper spells out: mean-independence (step 2 pre-trends, `OLS(Y_t − Y_base ~ 1)` per pre-period) and linearity (step 3 joint, `OLS(Y_t − Y_base ~ 1 + D)` per post-period). Sum-of-CvMs aggregation (`S_joint = Σ_k S_k`); per-horizon scale-invariant exact-linear short-circuit. Closes the paper Section 4.2 step-2 gap that Phase 3 `did_had_pretest_workflow` previously flagged with an "Assumption 7 pre-trends test NOT run" caveat. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Joint Stute tests" for algorithm, invariants, and scope exclusion of Eq 18 linear-trend detrending (deferred to Phase 4 Pierce-Schott replication).
- **`did_had_pretest_workflow(aggregate="event_study")`**: multi-period dispatch on balanced ≥3-period panels. Runs QUG at `F` + joint pre-trends Stute across earlier pre-periods + joint homogeneity-linearity Stute across post-periods. Step 2 closure requires ≥2 pre-periods; with only a single pre-period (the base `F-1`) `pretrends_joint=None` and the verdict flags the skip. Reuses the Phase 2b event-study panel validator (last-cohort auto-filter under staggered timing with `UserWarning`; `ValueError` when `first_treat_col=None` and the panel is staggered). The data-in wrappers `joint_pretrends_test` and `joint_homogeneity_test` also route through that same validator internally, so direct wrapper calls inherit the last-cohort filter and constant-post-dose invariant. `HADPretestReport` extended with `pretrends_joint`, `homogeneity_joint`, and `aggregate` fields; serialization methods (`summary`, `to_dict`, `to_dataframe`, `__repr__`) preserve the Phase 3 output bit-exactly on `aggregate="overall"` — no `aggregate` key, no header row, no schema drift — and only surface the new fields on `aggregate="event_study"`.

diff_diff/guides/llms-autonomous.txt

Lines changed: 61 additions & 31 deletions
@@ -197,18 +197,27 @@ view. Every field below appears as a top-level key in that dict.
 model can predict outside `[0, 1]`).
 - **`treatment_dose: Optional[TreatmentDoseShape]`** - distributional
 facts for continuous-treatment dose columns; `None` unless
-`treatment_type == "continuous"`. Sub-fields:
+`treatment_type == "continuous"`. **Descriptive only — none of these
+sub-fields are `ContinuousDiD` prerequisites.** The authoritative
+pre-fit gates are `has_never_treated` (unit-level, above) for the
+zero-dose-control requirement and `treatment_varies_within_unit ==
+False` (above) for per-unit full-path dose constancy, matching
+`ContinuousDiD.fit()`'s `df.groupby(unit)[dose].nunique() > 1`
+rejection. Sub-fields:
 - `n_distinct_doses: int` - count of distinct non-NaN dose values
-(including zero if observed).
+(including zero if observed). Useful supplement to the gate
+checks for understanding the dose support.
 - `has_zero_dose: bool` - at least one unit-period has dose
-exactly zero. Required by `ContinuousDiD` (`P(D=0) > 0`); when
-`False` the estimator cannot identify the dose-response curve.
-- `dose_min: float`, `dose_max: float`, `dose_mean: float` - over
-the strictly non-zero doses (NaN if no non-zero values exist).
-- `is_time_invariant: bool` - within every unit, the set of
-distinct non-zero dose values has size at most one (always-zero
-units are skipped). Required by `ContinuousDiD`; when `False`,
-`fit()` will raise. See §5.2 for a worked example.
+exactly zero. **Row-level fact**: a panel can have
+`has_zero_dose == True` (some pre-treatment rows are zero) while
+`has_never_treated == False` (every unit eventually treated), in
+which case the panel still fails the ContinuousDiD never-treated
+gate. Consult `has_never_treated` for the unit-level gate.
+- `dose_min: float`, `dose_max: float`, `dose_mean: float` -
+computed over the strictly non-zero doses; useful for effect-size
+context and dose-response interpretation. See §5.2 for a worked
+example showing how these fields supplement (not replace) the
+authoritative gates.
 
 ### Alerts
 
@@ -334,8 +343,10 @@ estimator from the remaining rows (CS/SA/dCDH/Imputation/TwoStage/
 Stacked/ETWFE all accept unbalanced input, with some caveats in
 their own docs).
 
-For two common prerequisite-resolution patterns walked through end-to-end
-(continuous dose with `treatment_dose` introspection, and count-shaped
+For two common reasoning patterns walked through end-to-end (continuous
+dose checked against the existing `has_never_treated` /
+`treatment_varies_within_unit` / `is_balanced` gates with
+`treatment_dose` providing descriptive context, and count-shaped
 outcome with `outcome_shape` introspection), see §5.2 and §5.3.
 
 
@@ -485,10 +496,15 @@ When `treatment_type == "continuous"`:
 scalar first-stage adoption summary. Useful when adoption is
 graded rather than binary.
 
-The new `PanelProfile.treatment_dose` sub-fields (`has_zero_dose`,
-`is_time_invariant`, `n_distinct_doses`) let you check the three
-ContinuousDiD prerequisites pre-fit; §5.2 walks through the full
-profile -> reasoning -> validation flow.
+The authoritative ContinuousDiD pre-fit gates are
+`has_never_treated == True` (unit-level never-treated control) and
+`treatment_varies_within_unit == False` (per-unit full-path dose
+constancy, including pre-treatment zeros) plus `is_balanced == True`.
+The `PanelProfile.treatment_dose` sub-fields (`n_distinct_doses`,
+`dose_min/max/mean`, `has_zero_dose`) provide descriptive context
+about the dose support, but are NOT themselves prerequisite gates;
+§5.2 walks through the full profile -> reasoning -> validation flow
+including the gate check.
 
 ### §4.8 Few treated units (one or a handful)
 
@@ -700,7 +716,6 @@ PanelProfile(
 n_distinct_doses=4,
 has_zero_dose=True,
 dose_min=1.0, dose_max=4.0, dose_mean=2.4,
-is_time_invariant=True,
 ),
 outcome_shape=OutcomeShape(
 is_count_like=False, is_bounded_unit=False, ...
@@ -714,18 +729,33 @@ Reasoning chain:
 1. `treatment_type == "continuous"` -> §3 row narrows to
 `ContinuousDiD` (`✓`) and `HeterogeneousAdoptionDiD` (`partial`,
 for graded adoption). All other estimators are `✗` on continuous.
-2. ContinuousDiD prerequisites map directly onto the new dose fields:
-`treatment_dose.has_zero_dose == True` (P(D=0) > 0 satisfied),
-`treatment_dose.is_time_invariant == True` (per-unit constant
-dose), `is_balanced == True`. All three pass, so `ContinuousDiD`
-is in scope. (If any failed, `fit()` would raise `ValueError`.)
-3. Counter-example: had `is_time_invariant == False` (a unit's
-nonzero dose changed across periods), `ContinuousDiD` would not
-apply. The two paths from there are (a)
-`HeterogeneousAdoptionDiD` if a scalar adoption summary fits, or
-(b) aggregate the dose to a binary indicator and fall back to a
-binary staggered estimator.
-4. Fit `ContinuousDiD`; the result object exposes the dose-response
+2. ContinuousDiD prerequisites are checked against the authoritative
+gates (the existing top-level `PanelProfile` fields, not
+`treatment_dose`): `has_never_treated == True` (unit-level
+never-treated control, mapping to `first_treat == 0` units in
+`ContinuousDiD.fit()`), `treatment_varies_within_unit == False`
+(per-unit full-path dose constancy, matching
+`ContinuousDiD.fit()`'s `df.groupby(unit)[dose].nunique() > 1`
+check), and `is_balanced == True`. All three pass, so
+`ContinuousDiD` is in scope. The `treatment_dose` sub-fields
+(`n_distinct_doses`, `has_zero_dose`, `dose_min/max/mean`)
+provide descriptive context — useful for reasoning about dose
+support and the eventual dose-response interpretation, but not
+themselves gates.
+3. Counter-example: had `treatment_varies_within_unit == True` (any
+unit's full dose path - including pre-treatment zeros - has more
+than one distinct value, e.g., a `0,0,d,d` adoption path with
+varying nonzero `d`), `ContinuousDiD` would not apply. The two
+paths from there are (a) `HeterogeneousAdoptionDiD` if a scalar
+adoption summary fits, or (b) aggregate the dose to a binary
+indicator and fall back to a binary staggered estimator.
+4. Counter-example: had `has_never_treated == False` (every unit
+eventually treated, even if pre-treatment rows have zero dose so
+`treatment_dose.has_zero_dose == True`), `ContinuousDiD` would
+reject the panel under default `control_group="never_treated"`.
+Row-level zeros are not a substitute for unit-level
+never-treated controls.
+5. Fit `ContinuousDiD`; the result object exposes the dose-response
 curve (`ATT(d)`) and average causal response (`ACRT(d)`); choose
 the headline estimand based on the business question (overall
 ATT under PT, or the dose-response curve under Strong PT).
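Steps 2 to 4 of the revised reasoning chain amount to a three-field check on the profile. The sketch below illustrates that check under stated assumptions: `ProfileStub` and `failed_continuous_did_gates` are hypothetical names introduced for illustration, and only the three field names come from the diff above. It does not call the real `profile_panel` or `ContinuousDiD`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ProfileStub:
    """Hypothetical stand-in carrying only the three gate fields."""
    has_never_treated: bool
    treatment_varies_within_unit: bool
    is_balanced: bool


def failed_continuous_did_gates(profile):
    """Return the names of the gate fields that block ContinuousDiD.

    An empty list means all three gates described in the guide pass.
    The descriptive treatment_dose sub-fields are deliberately not consulted.
    """
    failed = []
    if not profile.has_never_treated:
        failed.append("has_never_treated")
    if profile.treatment_varies_within_unit:
        failed.append("treatment_varies_within_unit")
    if not profile.is_balanced:
        failed.append("is_balanced")
    return failed


# The worked example's panel: all three gates pass.
in_scope = ProfileStub(True, False, True)
# Step 3 counter-example: some unit's full dose path varies.
varying_path = ProfileStub(True, True, True)
# Step 4 counter-example: every unit is eventually treated.
all_treated = ProfileStub(False, False, True)

for profile in (in_scope, varying_path, all_treated):
    print(failed_continuous_did_gates(profile))
```

Returning the failed field names rather than a single boolean keeps the result descriptive, in line with the guide's stated preference for facts over recommendations.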
@@ -765,10 +795,10 @@ Reasoning chain:
 raw count gives unbiased point estimates of the additive effect
 but its asymptotic SEs assume normal-shaped errors, which a
 right-skewed count distribution violates. `WooldridgeDiD`
-(`family="poisson"`) estimates the multiplicative
+(`method="poisson"`) estimates the multiplicative
 (log-link) effect under QMLE with correct asymptotic SEs, and
 maps onto the staggered design natively.
-3. Decision: fit `WooldridgeDiD(family="poisson")` for the
+3. Decision: fit `WooldridgeDiD(method="poisson")` for the
 primary estimate; report the multiplicative effect (proportional
 change) rather than the additive effect. Optionally fit a linear
 DiD as a robustness check and document which scale the headline
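
The P2 fix pairs a corrected keyword with a negative assertion on the guide text. The following sketch shows both ideas in isolation and makes several assumptions: `EstimatorStub` is a hypothetical stand-in whose constructor accepts only `method=` (its `"ols"` default is invented for illustration), and `guide_text` is a one-line excerpt rather than the real guide file.

```python
class EstimatorStub:
    """Hypothetical stand-in: the constructor accepts method=, not family=."""

    def __init__(self, method="ols"):  # default value is illustrative only
        self.method = method


def rejects_family_kwarg():
    """Return True if passing family= raises TypeError, as the commit describes."""
    try:
        EstimatorStub(family="poisson")
    except TypeError:
        return True
    return False


# Regression guard in the spirit of the assertion added to test_guides.py:
# the stale constructor call must not reappear in the guide text.
guide_text = 'Decision: fit `WooldridgeDiD(method="poisson")` for the primary estimate'
stale_call = 'WooldridgeDiD(family="poisson")'
guide_is_clean = stale_call not in guide_text

print(rejects_family_kwarg(), guide_is_clean)
```

A plain substring check like this is cheap and catches a verbatim reintroduction of the stale call, though it would miss variants with different quoting or spacing.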
