
Commit 22ff5dc

igerber and claude committed
Address PR #366 CI review round 3 (1 P3): add duplicate-row gate to ContinuousDiD prerequisite summaries
Reviewer correctly noted that the round-2 wording lists `has_never_treated` + `treatment_varies_within_unit == False` + `is_balanced` as the "authoritative" ContinuousDiD pre-fit gates but omits the duplicate-cell hard stop. Verified that `continuous_did.py:_precompute_structures` (lines 818-823) builds `outcome_matrix` cell-by-cell with last-row-wins on duplicate `(unit, time)` keys, so absence of the `duplicate_unit_time_rows` alert is also a real prerequisite, not just a style preference.

Updated wording in six places to add "+ absence of the `duplicate_unit_time_rows` alert" alongside the other gates and explain the silent-overwrite behavior:

- `diff_diff/profile.py` `TreatmentDoseShape` docstring
- `diff_diff/guides/llms-autonomous.txt` §2 field reference
- `diff_diff/guides/llms-autonomous.txt` §4.7 (continuous design feature)
- `diff_diff/guides/llms-autonomous.txt` §5.2 worked example reasoning chain (now lists four gates instead of three)
- `CHANGELOG.md` Unreleased entry
- `ROADMAP.md` AI-Agent Track building-block

Also softened "authoritative" -> "core field-based" since the non-field-based duplicate-row gate makes the original phrasing slightly misleading.

Added a test_guides.py regression asserting the autonomous guide mentions `duplicate_unit_time_rows` so future wording changes can't silently drop the gate from the summary.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent 370915e commit 22ff5dc

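To make the silent-overwrite risk from the commit message concrete, here is a minimal sketch of a cell-by-cell matrix build with last-row-wins semantics. It is not the code in `continuous_did.py`; the function name `build_outcome_matrix` and the toy rows are assumptions made purely for illustration.

```python
# Illustrative sketch only: NOT the diff_diff implementation.
# `build_outcome_matrix` and the toy rows below are hypothetical.

def build_outcome_matrix(rows, units, times):
    """Fill a units x times grid cell by cell from (unit, time, outcome) rows."""
    unit_idx = {u: i for i, u in enumerate(units)}
    time_idx = {t: j for j, t in enumerate(times)}
    matrix = [[None] * len(times) for _ in units]
    for unit, time, outcome in rows:
        # A later row with the same (unit, time) key overwrites the earlier
        # value without any warning: this is last-row-wins.
        matrix[unit_idx[unit]][time_idx[time]] = outcome
    return matrix


rows = [
    ("a", 1, 10.0),
    ("a", 2, 11.0),
    ("b", 1, 20.0),
    ("b", 2, 21.0),
    ("a", 2, 99.0),  # duplicate (unit, time) key
]
matrix = build_outcome_matrix(rows, units=["a", "b"], times=[1, 2])
print(matrix)  # [[10.0, 99.0], [20.0, 21.0]]
```

The original value `11.0` for `("a", 2)` is gone and nothing signals the overwrite, which is why the duplicate-row alert is treated as a prerequisite rather than a style preference.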
5 files changed

Lines changed: 42 additions & 20 deletions

CHANGELOG.md

Lines changed: 1 addition & 1 deletion
@@ -8,7 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]

 ### Added
-- **`PanelProfile.outcome_shape` and `PanelProfile.treatment_dose` extensions + `llms-autonomous.txt` worked examples (Wave 2 of the AI-agent enablement track).** `profile_panel(...)` now populates two new optional sub-dataclasses on the returned `PanelProfile`: `outcome_shape: Optional[OutcomeShape]` (numeric outcomes only — exposes `n_distinct_values`, `pct_zeros`, `value_min` / `value_max`, `skewness` and `excess_kurtosis` (NaN-safe; `None` when `n_distinct_values < 3` or variance is zero), `is_integer_valued`, `is_count_like` (heuristic: integer-valued AND has zeros AND right-skewed AND > 2 distinct values; flags WooldridgeDiD QMLE consideration over linear OLS), `is_bounded_unit` ([0, 1] support)) and `treatment_dose: Optional[TreatmentDoseShape]` (continuous treatments only — exposes `n_distinct_doses`, `has_zero_dose`, `dose_min` / `dose_max` / `dose_mean` over non-zero doses). Both `OutcomeShape` and `TreatmentDoseShape` are descriptive only; the authoritative pre-fit gates for `ContinuousDiD` remain the existing `PanelProfile.has_never_treated` (unit-level), `PanelProfile.treatment_varies_within_unit == False` (per-unit full-path dose constancy, matching `ContinuousDiD.fit()`'s `df.groupby(unit)[dose].nunique() > 1` rejection), and `PanelProfile.is_balanced`. The shape extensions provide distributional context (effect-size range, count-shape detection) that supplements but does not replace those gates. Both fields are `None` when their classification gate is not met (e.g., `treatment_dose is None` for binary treatments). `to_dict()` serializes the nested dataclasses as JSON-compatible nested dicts. New exports: `OutcomeShape`, `TreatmentDoseShape` from top-level `diff_diff`. `llms-autonomous.txt` gains a new §5 "Worked examples" section with three end-to-end PanelProfile -> reasoning -> validation walkthroughs (binary staggered with never-treated controls, continuous dose with zero baseline, count-shaped outcome) plus §2 field-reference subsections for the new shape fields and §4.7 / §4.11 cross-references for outcome-shape considerations. Existing §5-§8 of the autonomous guide are renumbered to §6-§9. Descriptive only — no recommender language inside the worked examples.
+- **`PanelProfile.outcome_shape` and `PanelProfile.treatment_dose` extensions + `llms-autonomous.txt` worked examples (Wave 2 of the AI-agent enablement track).** `profile_panel(...)` now populates two new optional sub-dataclasses on the returned `PanelProfile`: `outcome_shape: Optional[OutcomeShape]` (numeric outcomes only — exposes `n_distinct_values`, `pct_zeros`, `value_min` / `value_max`, `skewness` and `excess_kurtosis` (NaN-safe; `None` when `n_distinct_values < 3` or variance is zero), `is_integer_valued`, `is_count_like` (heuristic: integer-valued AND has zeros AND right-skewed AND > 2 distinct values; flags WooldridgeDiD QMLE consideration over linear OLS), `is_bounded_unit` ([0, 1] support)) and `treatment_dose: Optional[TreatmentDoseShape]` (continuous treatments only — exposes `n_distinct_doses`, `has_zero_dose`, `dose_min` / `dose_max` / `dose_mean` over non-zero doses). Both `OutcomeShape` and `TreatmentDoseShape` are descriptive only; the core field-based pre-fit gates for `ContinuousDiD` remain the existing `PanelProfile.has_never_treated` (unit-level), `PanelProfile.treatment_varies_within_unit == False` (per-unit full-path dose constancy, matching `ContinuousDiD.fit()`'s `df.groupby(unit)[dose].nunique() > 1` rejection), and `PanelProfile.is_balanced`, plus the absence of the `duplicate_unit_time_rows` alert (the precompute path silently resolves duplicate `(unit, time)` cells via last-row-wins). The shape extensions provide distributional context (effect-size range, count-shape detection) that supplements but does not replace those gates. Both fields are `None` when their classification gate is not met (e.g., `treatment_dose is None` for binary treatments). `to_dict()` serializes the nested dataclasses as JSON-compatible nested dicts. New exports: `OutcomeShape`, `TreatmentDoseShape` from top-level `diff_diff`. `llms-autonomous.txt` gains a new §5 "Worked examples" section with three end-to-end PanelProfile -> reasoning -> validation walkthroughs (binary staggered with never-treated controls, continuous dose with zero baseline, count-shaped outcome) plus §2 field-reference subsections for the new shape fields and §4.7 / §4.11 cross-references for outcome-shape considerations. Existing §5-§8 of the autonomous guide are renumbered to §6-§9. Descriptive only — no recommender language inside the worked examples.
 - **`HeterogeneousAdoptionDiD.fit(survey=..., weights=...)` on continuous-dose paths (Phase 4.5 survey support).** The `continuous_at_zero` (paper Design 1') and `continuous_near_d_lower` (Design 1 continuous-near-d̲) designs accept survey weights through two interchangeable kwargs: `weights=<array>` (pweight shortcut, weighted-robust SE from the CCT-2014 lprobust port) and `survey=SurveyDesign(weights, strata, psu, fpc)` (design-based inference via Binder-TSL variance using the existing `compute_survey_if_variance` helper at `diff_diff/survey.py:1802`). Point estimates match across both entry paths; SE diverges by design (pweight-only vs PSU-aggregated). `HeterogeneousAdoptionDiDResults.survey_metadata` is a repo-standard `SurveyMetadata` dataclass (weight_type / effective_n / design_effect / sum_weights / weight_range / n_strata / n_psu / df_survey); HAD-specific extras (`variance_formula` label, `effective_dose_mean`) are separate top-level result fields. `to_dict()` surfaces the full `SurveyMetadata` object plus `variance_formula` + `effective_dose_mean`; `summary()` renders `variance_formula`, `effective_n`, `effective_dose_mean`, and (when the survey= path is used) `df_survey`; `__repr__` surfaces `variance_formula` + `effective_dose_mean` when present. The HAD `mass_point` design and `aggregate="event_study"` path raise `NotImplementedError` under survey/weights (deferred to Phase 4.5 B: weighted 2SLS + event-study survey composition); the HAD pretests stay unweighted in this release (Phase 4.5 C). Parity ceiling acknowledged — no public weighted-CCF bias-corrected local-linear reference exists in any language; methodology confidence comes from (1) uniform-weights bit-parity at `atol=1e-14` on the full lprobust output struct, (2) cross-language weighted-OLS parity (manual R reference) at `atol=1e-12`, and (3) Monte Carlo oracle consistency on known-τ DGPs. `_nprobust_port.lprobust` gains `weights=` and `return_influence=` (used internally by the Binder-TSL path); `bias_corrected_local_linear` removes the Phase 1c `NotImplementedError` on `weights=` and forwards. Auto-bandwidth selection remains unweighted in this release — pass `h`/`b` explicitly for weight-aware bandwidths. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted extension (Phase 4.5 survey support)".
 - **`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test` + `StuteJointResult`** (HeterogeneousAdoptionDiD Phase 3 follow-up). Joint Cramér-von Mises pretests across K horizons with shared-η Mammen wild bootstrap (preserves vector-valued empirical-process unit-level dependence per Delgado-Manteiga 2001 / Hlávka-Hušková 2020). The core `stute_joint_pretest` is residuals-in; two thin data-in wrappers construct per-horizon residuals for the two nulls the paper spells out: mean-independence (step 2 pre-trends, `OLS(Y_t − Y_base ~ 1)` per pre-period) and linearity (step 3 joint, `OLS(Y_t − Y_base ~ 1 + D)` per post-period). Sum-of-CvMs aggregation (`S_joint = Σ_k S_k`); per-horizon scale-invariant exact-linear short-circuit. Closes the paper Section 4.2 step-2 gap that Phase 3 `did_had_pretest_workflow` previously flagged with an "Assumption 7 pre-trends test NOT run" caveat. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Joint Stute tests" for algorithm, invariants, and scope exclusion of Eq 18 linear-trend detrending (deferred to Phase 4 Pierce-Schott replication).
 - **`did_had_pretest_workflow(aggregate="event_study")`**: multi-period dispatch on balanced ≥3-period panels. Runs QUG at `F` + joint pre-trends Stute across earlier pre-periods + joint homogeneity-linearity Stute across post-periods. Step 2 closure requires ≥2 pre-periods; with only a single pre-period (the base `F-1`) `pretrends_joint=None` and the verdict flags the skip. Reuses the Phase 2b event-study panel validator (last-cohort auto-filter under staggered timing with `UserWarning`; `ValueError` when `first_treat_col=None` and the panel is staggered). The data-in wrappers `joint_pretrends_test` and `joint_homogeneity_test` also route through that same validator internally, so direct wrapper calls inherit the last-cohort filter and constant-post-dose invariant. `HADPretestReport` extended with `pretrends_joint`, `homogeneity_joint`, and `aggregate` fields; serialization methods (`summary`, `to_dict`, `to_dataframe`, `__repr__`) preserve the Phase 3 output bit-exactly on `aggregate="overall"` — no `aggregate` key, no header row, no schema drift — and only surface the new fields on `aggregate="event_study"`.

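The changelog entry ties `treatment_varies_within_unit == False` to a `df.groupby(unit)[dose].nunique() > 1` rejection. The following is a pandas-free sketch of that same per-unit check, assuming rows arrive as `(unit, time, dose)` tuples; the helper name `units_with_varying_dose` and the sample panel are hypothetical, and NaN handling is deliberately left out.

```python
from collections import defaultdict

# Hypothetical helper: mirrors the idea of groupby(unit)[dose].nunique() > 1
# on plain tuples. It is not part of diff_diff.

def units_with_varying_dose(rows):
    """Return the units whose full dose path has more than one distinct value."""
    doses_by_unit = defaultdict(set)
    for unit, _time, dose in rows:
        doses_by_unit[unit].add(dose)
    return sorted(u for u, doses in doses_by_unit.items() if len(doses) > 1)


rows = [
    # Never-treated unit: dose is zero in every period.
    ("never", 1, 0.0), ("never", 2, 0.0), ("never", 3, 0.0), ("never", 4, 0.0),
    # Constant-dose unit: the same positive dose in every period.
    ("const", 1, 2.5), ("const", 2, 2.5), ("const", 3, 2.5), ("const", 4, 2.5),
    # Adoption path 0, 0, d, d: pre-treatment zeros count as part of the path.
    ("adopt", 1, 0.0), ("adopt", 2, 0.0), ("adopt", 3, 2.5), ("adopt", 4, 2.5),
]
varying = units_with_varying_dose(rows)
print(varying)  # ['adopt']
```

Under this reading, a single unit with a `0, 0, d, d` path is enough to make the whole panel fail the constancy gate.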
ROADMAP.md

Lines changed: 1 addition & 1 deletion
@@ -138,7 +138,7 @@ Long-running program, framed as "building toward" rather than with discrete ship
 - Baker et al. (2025) 8-step workflow enforcement in `diff_diff/practitioner.py`.
 - `practitioner_next_steps()` context-aware guidance.
 - Runtime LLM guides via `get_llm_guide(...)` (`llms.txt`, `llms-full.txt`, `llms-practitioner.txt`, `llms-autonomous.txt`), bundled in the wheel.
-- `profile_panel(df, ...)` returns a `PanelProfile` dataclass of structural facts about the panel - factual, not opinionated. Pairs with the `"autonomous"` guide variant (reference-shaped: estimator-support matrix + per-design-feature reasoning) so agents describe the data then consult a bundled reference rather than calling a deterministic recommender. `PanelProfile.outcome_shape` and `PanelProfile.treatment_dose` extensions add descriptive distributional context (count-likeness / bounded-support hints on numeric outcomes; dose support and zero-dose presence on continuous treatments). They are descriptive only — `outcome_shape.is_count_like` informs the WooldridgeDiD-QMLE-vs-linear-OLS judgment but does not gate it, and the authoritative ContinuousDiD pre-fit gates remain the existing `has_never_treated`, `treatment_varies_within_unit`, and `is_balanced` fields. The autonomous guide §5 walks through three end-to-end PanelProfile -> reasoning -> validation worked examples.
+- `profile_panel(df, ...)` returns a `PanelProfile` dataclass of structural facts about the panel - factual, not opinionated. Pairs with the `"autonomous"` guide variant (reference-shaped: estimator-support matrix + per-design-feature reasoning) so agents describe the data then consult a bundled reference rather than calling a deterministic recommender. `PanelProfile.outcome_shape` and `PanelProfile.treatment_dose` extensions add descriptive distributional context (count-likeness / bounded-support hints on numeric outcomes; dose support and zero-dose presence on continuous treatments). They are descriptive only — `outcome_shape.is_count_like` informs the WooldridgeDiD-QMLE-vs-linear-OLS judgment but does not gate it, and the core field-based ContinuousDiD pre-fit gates remain the existing `has_never_treated`, `treatment_varies_within_unit`, `is_balanced`, and the absence of the `duplicate_unit_time_rows` alert. The autonomous guide §5 walks through three end-to-end PanelProfile -> reasoning -> validation worked examples.
 - Package docstring leads with an "For AI agents" entry block so `help(diff_diff)` surfaces the agent entry points automatically.
 - Silent-operation warnings so agents and humans see the same signals at the same time.

diff_diff/guides/llms-autonomous.txt

Lines changed: 22 additions & 15 deletions
@@ -198,12 +198,15 @@ view. Every field below appears as a top-level key in that dict.
 - **`treatment_dose: Optional[TreatmentDoseShape]`** - distributional
 facts for continuous-treatment dose columns; `None` unless
 `treatment_type == "continuous"`. **Descriptive only — none of these
-sub-fields are `ContinuousDiD` prerequisites.** The authoritative
+sub-fields are `ContinuousDiD` prerequisites.** The core field-based
 pre-fit gates are `has_never_treated` (unit-level, above) for the
 zero-dose-control requirement and `treatment_varies_within_unit ==
 False` (above) for per-unit full-path dose constancy, matching
 `ContinuousDiD.fit()`'s `df.groupby(unit)[dose].nunique() > 1`
-rejection. Sub-fields:
+rejection, plus the absence of the `duplicate_unit_time_rows` alert
+(the precompute path silently resolves duplicate `(unit, time)`
+cells via last-row-wins, so duplicates must be removed before
+fitting). Sub-fields:
 - `n_distinct_doses: int` - count of distinct non-NaN dose values
 (including zero if observed). Useful supplement to the gate
 checks for understanding the dose support.
@@ -496,11 +499,13 @@ When `treatment_type == "continuous"`:
 scalar first-stage adoption summary. Useful when adoption is
 graded rather than binary.

-The authoritative ContinuousDiD pre-fit gates are
-`has_never_treated == True` (unit-level never-treated control) and
+The core field-based ContinuousDiD pre-fit gates are
+`has_never_treated == True` (unit-level never-treated control),
 `treatment_varies_within_unit == False` (per-unit full-path dose
-constancy, including pre-treatment zeros) plus `is_balanced == True`.
-The `PanelProfile.treatment_dose` sub-fields (`n_distinct_doses`,
+constancy, including pre-treatment zeros), `is_balanced == True`, and
+the absence of the `duplicate_unit_time_rows` alert (the precompute
+path silently resolves duplicate cells via last-row-wins). The
+`PanelProfile.treatment_dose` sub-fields (`n_distinct_doses`,
 `dose_min/max/mean`, `has_zero_dose`) provide descriptive context
 about the dose support, but are NOT themselves prerequisite gates;
 §5.2 walks through the full profile -> reasoning -> validation flow
@@ -729,19 +734,21 @@ Reasoning chain:
 1. `treatment_type == "continuous"` -> §3 row narrows to
 `ContinuousDiD` (`✓`) and `HeterogeneousAdoptionDiD` (`partial`,
 for graded adoption). All other estimators are `✗` on continuous.
-2. ContinuousDiD prerequisites are checked against the authoritative
-gates (the existing top-level `PanelProfile` fields, not
-`treatment_dose`): `has_never_treated == True` (unit-level
+2. ContinuousDiD prerequisites are checked against the core
+field-based gates (the existing top-level `PanelProfile` fields,
+not `treatment_dose`): `has_never_treated == True` (unit-level
 never-treated control, mapping to `first_treat == 0` units in
 `ContinuousDiD.fit()`), `treatment_varies_within_unit == False`
 (per-unit full-path dose constancy, matching
 `ContinuousDiD.fit()`'s `df.groupby(unit)[dose].nunique() > 1`
-check), and `is_balanced == True`. All three pass, so
-`ContinuousDiD` is in scope. The `treatment_dose` sub-fields
-(`n_distinct_doses`, `has_zero_dose`, `dose_min/max/mean`)
-provide descriptive context — useful for reasoning about dose
-support and the eventual dose-response interpretation, but not
-themselves gates.
+check), `is_balanced == True`, and the absence of a
+`duplicate_unit_time_rows` alert (the precompute path silently
+resolves duplicate cells via last-row-wins, so duplicates must
+be removed before fitting). All four pass, so `ContinuousDiD` is
+in scope. The `treatment_dose` sub-fields (`n_distinct_doses`,
+`has_zero_dose`, `dose_min/max/mean`) provide descriptive context
+— useful for reasoning about dose support and the eventual
+dose-response interpretation, but not themselves gates.
 3. Counter-example: had `treatment_varies_within_unit == True` (any
 unit's full dose path - including pre-treatment zeros - has more
 than one distinct value, e.g., a `0,0,d,d` adoption path with

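The four-gate reasoning in the §5.2 hunk can be sketched as a small check over a profile dict. The three boolean keys are named in the guide text and described there as top-level keys; how alerts are represented is not shown in this diff, so the `alerts` list with a `code` key is an assumption, as is the helper name `continuous_did_gate_failures`.

```python
# Hypothetical sketch. The three boolean keys come from the guide text;
# the "alerts" list and its "code" key are ASSUMED for illustration only.

def continuous_did_gate_failures(profile):
    """Return the names of the pre-fit gates that a profile dict fails."""
    failures = []
    if profile.get("has_never_treated") is not True:
        failures.append("has_never_treated")
    if profile.get("treatment_varies_within_unit") is not False:
        failures.append("treatment_varies_within_unit")
    if profile.get("is_balanced") is not True:
        failures.append("is_balanced")
    # Assumed alert layout: a list of dicts, each carrying a "code" string.
    alert_codes = {alert.get("code") for alert in profile.get("alerts", [])}
    if "duplicate_unit_time_rows" in alert_codes:
        failures.append("duplicate_unit_time_rows")
    return failures


clean_profile = {
    "treatment_type": "continuous",
    "has_never_treated": True,
    "treatment_varies_within_unit": False,
    "is_balanced": True,
    "alerts": [],
}
duplicated_profile = dict(
    clean_profile, alerts=[{"code": "duplicate_unit_time_rows"}]
)

print(continuous_did_gate_failures(clean_profile))       # []
print(continuous_did_gate_failures(duplicated_profile))  # ['duplicate_unit_time_rows']
```

The second profile passes all three field-based gates and still fails, which is the case the earlier three-gate wording would have missed.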
diff_diff/profile.py

Lines changed: 6 additions & 3 deletions
@@ -67,12 +67,15 @@ class TreatmentDoseShape:

     Populated on :class:`PanelProfile` only when ``treatment_type ==
     "continuous"``; ``None`` otherwise. **Descriptive only** — none of
-    these fields are ``ContinuousDiD`` prerequisites. The authoritative
-    gates are ``PanelProfile.has_never_treated`` (unit-level
+    these fields are ``ContinuousDiD`` prerequisites. The core
+    field-based gates are ``PanelProfile.has_never_treated`` (unit-level
     never-treated existence), ``PanelProfile.treatment_varies_within_unit
     == False`` (per-unit full-path dose constancy, matching
     ``ContinuousDiD.fit()``'s ``df.groupby(unit)[dose].nunique() > 1``
-    rejection), and ``PanelProfile.is_balanced``.
+    rejection), and ``PanelProfile.is_balanced``, plus the absence of
+    the ``duplicate_unit_time_rows`` alert (``ContinuousDiD``'s
+    precompute path silently resolves duplicate ``(unit, time)`` cells
+    via last-row-wins, so duplicates must be removed before fitting).

     ``has_zero_dose`` is a row-level fact ("at least one observation has
     dose == 0"); it is NOT a substitute for ``has_never_treated``, which

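The docstring's distinction between the row-level `has_zero_dose` and the unit-level `has_never_treated` can be illustrated with a toy panel. This sketch assumes a never-treated unit is one whose dose is zero in every observed period; that definition and both helper names are illustrative and are not taken from `profile.py`.

```python
# Illustrative only: these helpers are hypothetical stand-ins, and the
# "zero dose in every period" definition of never-treated is an assumption.

def any_zero_dose_row(rows):
    """Row-level fact: at least one observation has dose == 0."""
    return any(dose == 0 for _unit, _time, dose in rows)


def any_never_treated_unit(rows):
    """Unit-level fact: at least one unit has dose == 0 in all its rows."""
    doses_by_unit = {}
    for unit, _time, dose in rows:
        doses_by_unit.setdefault(unit, []).append(dose)
    return any(all(d == 0 for d in doses) for doses in doses_by_unit.values())


# Every unit starts at zero and is later dosed, so zero-dose rows exist
# but no unit stays untreated throughout.
rows = [
    ("a", 1, 0.0), ("a", 2, 1.5),
    ("b", 1, 0.0), ("b", 2, 2.0),
]
print(any_zero_dose_row(rows))       # True
print(any_never_treated_unit(rows))  # False
```

A panel like this one has zero-dose observations without offering any never-treated control unit, so the row-level fact cannot stand in for the unit-level gate.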
tests/test_guides.py

Lines changed: 12 additions & 0 deletions
@@ -45,6 +45,18 @@ def test_content_stability_autonomous_fingerprints():
     # has_never_treated is the authoritative ContinuousDiD gate;
     # treatment_dose fields are descriptive only.
     assert "has_never_treated" in text
+    # The ContinuousDiD prerequisite summary must continue to mention
+    # the duplicate-row hard stop alongside the field-based gates -
+    # `_precompute_structures()` silently resolves duplicate cells via
+    # last-row-wins, so a reader treating the summary as exhaustive
+    # could route duplicate-containing panels into a silent-overwrite
+    # path. Guard against that wording regression.
+    assert "duplicate_unit_time_rows" in text, (
+        "ContinuousDiD prerequisite summary must mention the "
+        "`duplicate_unit_time_rows` alert: the precompute path resolves "
+        "duplicate (unit, time) cells via last-row-wins, so duplicates "
+        "must be removed before fitting."
+    )


 def test_autonomous_contains_worked_examples_section():

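The assertion message says duplicates must be removed before fitting. As a rough sketch of how duplicate `(unit, time)` keys could be located beforehand, the helper below counts keys with the standard library; the name `duplicate_unit_time_keys` and the tuple layout are assumptions, and this is not how `profile_panel` raises its alert. On a pandas DataFrame, `DataFrame.duplicated(subset=[...], keep=False)` is one way to get the same information.

```python
from collections import Counter

# Hypothetical helper for illustration; not part of diff_diff.

def duplicate_unit_time_keys(rows):
    """Return the sorted (unit, time) keys that occur in more than one row."""
    counts = Counter((unit, time) for unit, time, *_rest in rows)
    return sorted(key for key, n in counts.items() if n > 1)


rows = [
    ("a", 1, 10.0),
    ("a", 2, 11.0),
    ("a", 2, 99.0),  # same (unit, time) key as the row above
    ("b", 1, 20.0),
]
duplicates = duplicate_unit_time_keys(rows)
print(duplicates)  # [('a', 2)]
```

Which of the conflicting rows to keep is a data-cleaning decision the sketch leaves to the caller; it only reports where the conflicts are.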