
Commit 335de3d

igerber and claude committed
Add PanelProfile outcome_shape + treatment_dose extensions and autonomous-guide worked examples (Wave 2)
Wave 2 of the AI-agent enablement track. Extends profile_panel() with two new optional sub-dataclasses:

- OutcomeShape (numeric outcomes only): n_distinct_values, pct_zeros, value_min/max, NaN-safe skewness + excess_kurtosis (gated on n_distinct >= 3 and std > 0), is_integer_valued, is_count_like (heuristic: integer-valued AND has zeros AND right-skewed AND > 2 distinct values), is_bounded_unit ([0, 1] support).
- TreatmentDoseShape (treatment_type == "continuous" only): n_distinct_doses, has_zero_dose, dose_min/max/mean over non-zero doses, is_time_invariant (per-unit non-zero doses have at most one distinct value).

Both fields are None when their classification gate is not met. to_dict() serializes the nested dataclasses as JSON-compatible nested dicts.

llms-autonomous.txt gains a new §5 "Worked examples" with three end-to-end PanelProfile -> reasoning -> validation walkthroughs (binary staggered with never-treated controls, continuous dose with zero baseline, count-shaped outcome) plus §2 field-reference subsections, §3 footnote cross-ref, §4.7 cross-ref, and a new §4.11 outcome-shape considerations section. Existing §5-§8 renumbered to §6-§9. Descriptive only - no recommender language inside the worked examples.

Tests: 16 new unit tests in tests/test_profile_panel.py covering each heuristic (count-like Poisson, binary-as-not-count-like, continuous normal, bounded unit, categorical returning None, skewness gating, JSON roundtrip, time-invariant dose, time-varying dose, no-zero-dose, binary-treatment returning None, categorical-treatment returning None, JSON roundtrip, frozen invariants on both new dataclasses). Two new content-stability tests in tests/test_guides.py guard the §5 worked examples and the new field references.

CHANGELOG and ROADMAP updated; ROADMAP marks Wave 2 shipped, promotes sanity_checks block to top of "Next blocks toward the vision," and documents why the originally-proposed post-hoc mismatch detection was rescoped (largely overlaps existing fit-time validators and caveats).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
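The OutcomeShape gates listed in the commit message can be illustrated with a minimal pure-Python sketch. This is not the `diff_diff` implementation: `OutcomeShapeSketch` and `sketch_outcome_shape` are hypothetical names, and several details are assumptions made only for illustration (population rather than sample moments, "right-skewed" read as skewness > 0, `pct_zeros` reported as a fraction, NaN-safety implemented by dropping NaNs).

```python
import math
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class OutcomeShapeSketch:
    """Hypothetical stand-in for illustration; not diff_diff.OutcomeShape."""

    n_distinct_values: int
    pct_zeros: float  # assumption: a fraction in [0, 1]
    value_min: float
    value_max: float
    skewness: Optional[float]
    excess_kurtosis: Optional[float]
    is_integer_valued: bool
    is_count_like: bool
    is_bounded_unit: bool


def sketch_outcome_shape(values: Iterable[float]) -> OutcomeShapeSketch:
    # Assumption: "NaN-safe" means missing values are dropped first.
    y = [float(v) for v in values if not math.isnan(float(v))]
    if not y:
        raise ValueError("need at least one non-NaN value")

    n = len(y)
    mean = sum(y) / n
    std = math.sqrt(sum((v - mean) ** 2 for v in y) / n)
    n_distinct = len(set(y))

    # Gate from the commit message: moments only when
    # n_distinct >= 3 and std > 0; otherwise None.
    skew: Optional[float] = None
    kurt: Optional[float] = None
    if n_distinct >= 3 and std > 0:
        skew = sum(((v - mean) / std) ** 3 for v in y) / n
        kurt = sum(((v - mean) / std) ** 4 for v in y) / n - 3.0

    is_int = all(v.is_integer() for v in y)
    has_zeros = any(v == 0 for v in y)

    # Heuristic from the commit message: integer-valued AND has zeros
    # AND right-skewed AND > 2 distinct values.
    # Assumption: right-skewed is taken to mean skewness > 0.
    is_count_like = (
        is_int and has_zeros and n_distinct > 2
        and skew is not None and skew > 0
    )

    return OutcomeShapeSketch(
        n_distinct_values=n_distinct,
        pct_zeros=sum(v == 0 for v in y) / n,
        value_min=min(y),
        value_max=max(y),
        skewness=skew,
        excess_kurtosis=kurt,
        is_integer_valued=is_int,
        is_count_like=is_count_like,
        is_bounded_unit=min(y) >= 0.0 and max(y) <= 1.0,
    )


count_shape = sketch_outcome_shape([0, 0, 0, 1, 1, 2, 5])
binary_shape = sketch_outcome_shape([0, 1, 0, 1])
```

Under these assumptions the binary outcome is integer-valued with zeros but has only two distinct values, so the moment gate leaves `skewness` as `None` and the count-like flag stays off, which is consistent with the "binary-as-not-count-like" test case named in the commit message.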
1 parent 4852b34 commit 335de3d

7 files changed

Lines changed: 818 additions & 17 deletions


CHANGELOG.md

Lines changed: 1 addition & 0 deletions
@@ -8,6 +8,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ## [Unreleased]
 
 ### Added
+- **`PanelProfile.outcome_shape` and `PanelProfile.treatment_dose` extensions + `llms-autonomous.txt` worked examples (Wave 2 of the AI-agent enablement track).** `profile_panel(...)` now populates two new optional sub-dataclasses on the returned `PanelProfile`: `outcome_shape: Optional[OutcomeShape]` (numeric outcomes only — exposes `n_distinct_values`, `pct_zeros`, `value_min` / `value_max`, `skewness` and `excess_kurtosis` (NaN-safe; `None` when `n_distinct_values < 3` or variance is zero), `is_integer_valued`, `is_count_like` (heuristic: integer-valued AND has zeros AND right-skewed AND > 2 distinct values; flags WooldridgeDiD QMLE consideration over linear OLS), `is_bounded_unit` ([0, 1] support)) and `treatment_dose: Optional[TreatmentDoseShape]` (continuous treatments only — exposes `n_distinct_doses`, `has_zero_dose`, `dose_min` / `dose_max` / `dose_mean` over non-zero doses, `is_time_invariant` (per-unit non-zero doses have at most one distinct value; gates the ContinuousDiD `fit()`-time prerequisites pre-fit)). Both fields are `None` when their classification gate is not met (e.g., `treatment_dose is None` for binary treatments). `to_dict()` serializes the nested dataclasses as JSON-compatible nested dicts. New exports: `OutcomeShape`, `TreatmentDoseShape` from top-level `diff_diff`. `llms-autonomous.txt` gains a new §5 "Worked examples" section with three end-to-end PanelProfile -> reasoning -> validation walkthroughs (binary staggered with never-treated controls, continuous dose with zero baseline, count-shaped outcome) plus §2 field-reference subsections for the new shape fields and §4.7 / §4.11 cross-references for outcome-shape considerations. Existing §5-§8 of the autonomous guide are renumbered to §6-§9. Descriptive only — no recommender language inside the worked examples.
 - **`HeterogeneousAdoptionDiD.fit(survey=..., weights=...)` on continuous-dose paths (Phase 4.5 survey support).** The `continuous_at_zero` (paper Design 1') and `continuous_near_d_lower` (Design 1 continuous-near-d̲) designs accept survey weights through two interchangeable kwargs: `weights=<array>` (pweight shortcut, weighted-robust SE from the CCT-2014 lprobust port) and `survey=SurveyDesign(weights, strata, psu, fpc)` (design-based inference via Binder-TSL variance using the existing `compute_survey_if_variance` helper at `diff_diff/survey.py:1802`). Point estimates match across both entry paths; SE diverges by design (pweight-only vs PSU-aggregated). `HeterogeneousAdoptionDiDResults.survey_metadata` is a repo-standard `SurveyMetadata` dataclass (weight_type / effective_n / design_effect / sum_weights / weight_range / n_strata / n_psu / df_survey); HAD-specific extras (`variance_formula` label, `effective_dose_mean`) are separate top-level result fields. `to_dict()` surfaces the full `SurveyMetadata` object plus `variance_formula` + `effective_dose_mean`; `summary()` renders `variance_formula`, `effective_n`, `effective_dose_mean`, and (when the survey= path is used) `df_survey`; `__repr__` surfaces `variance_formula` + `effective_dose_mean` when present. The HAD `mass_point` design and `aggregate="event_study"` path raise `NotImplementedError` under survey/weights (deferred to Phase 4.5 B: weighted 2SLS + event-study survey composition); the HAD pretests stay unweighted in this release (Phase 4.5 C). Parity ceiling acknowledged — no public weighted-CCF bias-corrected local-linear reference exists in any language; methodology confidence comes from (1) uniform-weights bit-parity at `atol=1e-14` on the full lprobust output struct, (2) cross-language weighted-OLS parity (manual R reference) at `atol=1e-12`, and (3) Monte Carlo oracle consistency on known-τ DGPs. `_nprobust_port.lprobust` gains `weights=` and `return_influence=` (used internally by the Binder-TSL path); `bias_corrected_local_linear` removes the Phase 1c `NotImplementedError` on `weights=` and forwards. Auto-bandwidth selection remains unweighted in this release — pass `h`/`b` explicitly for weight-aware bandwidths. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Weighted extension (Phase 4.5 survey support)".
 - **`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test` + `StuteJointResult`** (HeterogeneousAdoptionDiD Phase 3 follow-up). Joint Cramér-von Mises pretests across K horizons with shared-η Mammen wild bootstrap (preserves vector-valued empirical-process unit-level dependence per Delgado-Manteiga 2001 / Hlávka-Hušková 2020). The core `stute_joint_pretest` is residuals-in; two thin data-in wrappers construct per-horizon residuals for the two nulls the paper spells out: mean-independence (step 2 pre-trends, `OLS(Y_t − Y_base ~ 1)` per pre-period) and linearity (step 3 joint, `OLS(Y_t − Y_base ~ 1 + D)` per post-period). Sum-of-CvMs aggregation (`S_joint = Σ_k S_k`); per-horizon scale-invariant exact-linear short-circuit. Closes the paper Section 4.2 step-2 gap that Phase 3 `did_had_pretest_workflow` previously flagged with an "Assumption 7 pre-trends test NOT run" caveat. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Joint Stute tests" for algorithm, invariants, and scope exclusion of Eq 18 linear-trend detrending (deferred to Phase 4 Pierce-Schott replication).
 - **`did_had_pretest_workflow(aggregate="event_study")`**: multi-period dispatch on balanced ≥3-period panels. Runs QUG at `F` + joint pre-trends Stute across earlier pre-periods + joint homogeneity-linearity Stute across post-periods. Step 2 closure requires ≥2 pre-periods; with only a single pre-period (the base `F-1`) `pretrends_joint=None` and the verdict flags the skip. Reuses the Phase 2b event-study panel validator (last-cohort auto-filter under staggered timing with `UserWarning`; `ValueError` when `first_treat_col=None` and the panel is staggered). The data-in wrappers `joint_pretrends_test` and `joint_homogeneity_test` also route through that same validator internally, so direct wrapper calls inherit the last-cohort filter and constant-post-dose invariant. `HADPretestReport` extended with `pretrends_joint`, `homogeneity_joint`, and `aggregate` fields; serialization methods (`summary`, `to_dict`, `to_dataframe`, `__repr__`) preserve the Phase 3 output bit-exactly on `aggregate="overall"` — no `aggregate` key, no header row, no schema drift — and only surface the new fields on `aggregate="event_study"`.
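The `treatment_dose` facts described in the first CHANGELOG entry can be sketched in plain Python as follows. This is a hedged illustration, not the library's code: `TreatmentDoseSketch` and `sketch_treatment_dose` are hypothetical names, the input is assumed to be an iterable of `(unit, dose)` pairs with no missing doses, and counting the zero dose toward `n_distinct_doses` is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, Hashable, Iterable, List, Optional, Set, Tuple


@dataclass(frozen=True)
class TreatmentDoseSketch:
    """Hypothetical stand-in for illustration; not diff_diff.TreatmentDoseShape."""

    n_distinct_doses: int  # assumption: zero counts as a distinct dose
    has_zero_dose: bool
    dose_min: Optional[float]  # min / max / mean are over non-zero doses
    dose_max: Optional[float]
    dose_mean: Optional[float]
    is_time_invariant: bool


def sketch_treatment_dose(
    rows: Iterable[Tuple[Hashable, float]],
) -> TreatmentDoseSketch:
    doses: List[float] = []
    nonzero_by_unit: Dict[Hashable, Set[float]] = defaultdict(set)
    for unit, dose in rows:
        dose = float(dose)
        doses.append(dose)
        if dose != 0.0:
            nonzero_by_unit[unit].add(dose)

    nonzero = [d for d in doses if d != 0.0]

    # From the CHANGELOG entry: time-invariant when each unit's
    # non-zero doses take at most one distinct value.
    time_invariant = all(len(s) <= 1 for s in nonzero_by_unit.values())

    return TreatmentDoseSketch(
        n_distinct_doses=len(set(doses)),
        has_zero_dose=any(d == 0.0 for d in doses),
        dose_min=min(nonzero) if nonzero else None,
        dose_max=max(nonzero) if nonzero else None,
        dose_mean=sum(nonzero) / len(nonzero) if nonzero else None,
        is_time_invariant=time_invariant,
    )


# Units "a" and "c" switch from zero to a single fixed dose; "b" stays at zero.
fixed = sketch_treatment_dose(
    [("a", 0), ("a", 2.5), ("a", 2.5), ("b", 0), ("b", 0),
     ("c", 0), ("c", 1.0), ("c", 1.0)]
)
# Unit "a" moves between two different non-zero doses over time.
varying = sketch_treatment_dose([("a", 0), ("a", 1.0), ("a", 2.0)])
```

A unit that only ever has zero dose contributes no non-zero values, so in this sketch it never breaks time-invariance.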

ROADMAP.md

Lines changed: 3 additions & 3 deletions
@@ -138,14 +138,14 @@ Long-running program, framed as "building toward" rather than with discrete ship
 - Baker et al. (2025) 8-step workflow enforcement in `diff_diff/practitioner.py`.
 - `practitioner_next_steps()` context-aware guidance.
 - Runtime LLM guides via `get_llm_guide(...)` (`llms.txt`, `llms-full.txt`, `llms-practitioner.txt`, `llms-autonomous.txt`), bundled in the wheel.
-- `profile_panel(df, ...)` returns a `PanelProfile` dataclass of structural facts about the panel - factual, not opinionated. Pairs with the `"autonomous"` guide variant (reference-shaped: estimator-support matrix + per-design-feature reasoning) so agents describe the data then consult a bundled reference rather than calling a deterministic recommender.
+- `profile_panel(df, ...)` returns a `PanelProfile` dataclass of structural facts about the panel - factual, not opinionated. Pairs with the `"autonomous"` guide variant (reference-shaped: estimator-support matrix + per-design-feature reasoning) so agents describe the data then consult a bundled reference rather than calling a deterministic recommender. `PanelProfile.outcome_shape` and `PanelProfile.treatment_dose` extensions expose count-likeness, bounded-support, dose support, and time-invariance facts that gate WooldridgeDiD QMLE / ContinuousDiD prerequisites pre-fit. The autonomous guide §5 walks through three end-to-end PanelProfile -> reasoning -> validation worked examples.
 - Package docstring leads with an "For AI agents" entry block so `help(diff_diff)` surfaces the agent entry points automatically.
 - Silent-operation warnings so agents and humans see the same signals at the same time.
 
 **Next blocks toward the vision.**
 
-- **Post-hoc mismatch detection in BR/DR output** - surfaces structured warnings like "you fit TWFE on staggered data with 37% forbidden-comparison weights" when the profile and the fitted estimator disagree. Safety net, not a pre-emptive rules engine.
-- **Structured `sanity_checks` block in BR/DR** - machine-legible pass / warn / fail signals (pretrends, power, forbidden-comparisons, event-study cleanliness, placebo, sensitivity) so agents can dispatch on a stable schema rather than parsing prose.
+- **Structured `sanity_checks` block in BR/DR** - machine-legible pass / warn / fail signals for pretrends, power, forbidden-comparisons, event-study cleanliness, placebo, and sensitivity, so agents dispatch on a stable schema rather than parsing prose. Highest-leverage net-new agent decision surface; orthogonal to existing `caveats` and to fit-time validators.
+- **Post-hoc mismatch detection in BR/DR output** - originally proposed as Wave 2 but rescoped after a plan review showed most candidate checks duplicate fit-time validators (which raise `ValueError` before any fitted result exists) or the existing `caveats` block (TWFE-on-staggered is already surfaced via `bacon_contamination`). Held for revisiting only if the `sanity_checks` rollout uncovers genuine post-fit mismatch signals not caught by current surfaces.
 - **Context-aware `practitioner_next_steps()`** that substitutes actual column names - turns guidance into executable recommendations.
 - **Unified `assess_*` verb** across estimator native-diagnostic methods for a single discoverable convention.
 - **End-to-end scenario walkthrough templates** - reusable orchestration recipes an agent can adapt from data ingest through business-ready output.

diff_diff/__init__.py

Lines changed: 9 additions & 1 deletion
@@ -250,7 +250,13 @@
     DiagnosticReportResults,
 )
 from diff_diff._guides_api import get_llm_guide
-from diff_diff.profile import Alert, PanelProfile, profile_panel
+from diff_diff.profile import (
+    Alert,
+    OutcomeShape,
+    PanelProfile,
+    TreatmentDoseShape,
+    profile_panel,
+)
 from diff_diff.datasets import (
     clear_cache,
     list_datasets,
@@ -498,6 +504,8 @@
     "profile_panel",
     "PanelProfile",
     "Alert",
+    "OutcomeShape",
+    "TreatmentDoseShape",
     # LLM guide accessor
     "get_llm_guide",
 ]
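The commit message states that both new dataclasses are frozen, that the fields are `None` when their gate is not met, and that `to_dict()` serializes nested dataclasses as JSON-compatible nested dicts. One common way to get that behavior is `dataclasses.asdict`; the sketch below shows the pattern with hypothetical classes (`DoseFactsSketch`, `ProfileSketch`). Whether `diff_diff` implements `to_dict()` this way is not shown in the diff and is an assumption here.

```python
import json
from dataclasses import FrozenInstanceError, asdict, dataclass
from typing import Any, Dict, Optional


@dataclass(frozen=True)
class DoseFactsSketch:
    """Hypothetical nested sub-dataclass; field names borrowed from the commit."""

    n_distinct_doses: int
    has_zero_dose: bool


@dataclass(frozen=True)
class ProfileSketch:
    """Hypothetical outer profile; not diff_diff.PanelProfile."""

    treatment_type: str
    # None when the classification gate is not met (e.g. binary treatment).
    treatment_dose: Optional[DoseFactsSketch] = None

    def to_dict(self) -> Dict[str, Any]:
        # asdict recurses into nested dataclass instances and leaves None as-is,
        # so the result contains only dicts, strings, numbers, bools, and None.
        return asdict(self)


continuous = ProfileSketch("continuous", DoseFactsSketch(3, True))
binary = ProfileSketch("binary")

continuous_dict = continuous.to_dict()
# JSON roundtrip: dump to text, load back, compare with the original dict.
roundtripped = json.loads(json.dumps(continuous_dict))

# Frozen invariant: attribute assignment raises instead of mutating.
try:
    continuous.treatment_type = "binary"  # type: ignore[misc]
    mutation_blocked = False
except FrozenInstanceError:
    mutation_blocked = True
```

The roundtrip works here because every leaf is a JSON-native type; a field such as NaN or a NumPy scalar would need extra handling, which this sketch does not attempt.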
