
Commit e07e547

Merge pull request #357 from igerber/dcdh-by-path
dCDH: add `by_path` per-path event-study disaggregation
2 parents 869c19a + d4d9bd5 commit e07e547

9 files changed

Lines changed: 1186 additions & 87 deletions

BRIEFING.md

Lines changed: 59 additions & 83 deletions
```diff
@@ -1,83 +1,59 @@
-# SDID Practitioner Validation Tooling - Briefing
-
-## Problem
-
-A data scientist runs `SyntheticDiD`, gets an ATT and a p-value, and then
-faces the question: *should I trust this estimate?* The library gives them the
-point estimate and inference, but the validation workflow - the steps between
-"I got a number" and "I'm confident enough to present this" - is largely
-left to the practitioner to assemble from scratch.
-
-The standard validation workflow for synthetic control methods is well
-understood in the econometrics literature (Arkhangelsky et al. 2021,
-Abadie et al. 2010, Abadie 2021). The pieces include pre-treatment fit
-assessment, weight diagnostics, placebo/falsification tests, sensitivity
-analysis, and cross-estimator comparison. Our library provides some of the
-raw ingredients (pre-treatment RMSE, weight dicts, placebo effects array)
-but doesn't connect them into an accessible diagnostic workflow.
-
-The gap is most visible in `practitioner.py`, where `_handle_synthetic`
-recommends in-time placebos and leave-one-out analysis but provides only
-comment-only pseudo-code. A practitioner following that guidance hits a wall.
-
-## Current state
-
-What we have today:
-
-- `results.pre_treatment_fit` (RMSE) with a warning when it exceeds the
-treated pre-period SD
-- `results.get_unit_weights_df()` and `results.get_time_weights_df()`
-- Three variance methods: placebo (default), bootstrap, and jackknife (just
-landed in v3.1.1)
-- `results.placebo_effects` - stores per-iteration estimates for all three
-variance methods, but for jackknife these are positional LOO estimates
-with no unit labels
-- `results.summary()` shows top-5 unit weights and count of non-trivial weights
-- `practitioner.py` guidance that names the right steps but can't point to
-runnable code for most of them
-
-What the practitioner must currently build themselves:
-
-- Mapping jackknife LOO estimates back to unit identities to answer "which
-unit, when dropped, changes my estimate the most?"
-- In-time placebo tests (re-estimate with a fake treatment date)
-- Any weight concentration metric beyond eyeballing the sorted list
-- Any sense of whether their RMSE is "bad enough to worry about" beyond
-the binary warning
-- Regularization sensitivity (does the ATT change if I perturb zeta?)
-- Pre-treatment trajectory data for plotting (the Y matrices are internal
-to `fit()` and not returned)
-
-## Context from prior discussion
-
-The jackknife work created an interesting opportunity. The delete-one-re-estimate
-loop already runs for SE computation. The per-unit ATT estimates are stored in
-`results.placebo_effects`. The missing piece is a presentation layer that maps
-those estimates to unit identities and surfaces the diagnostic interpretation
-(which units are influential, how stable is the estimate to unit composition).
-
-More broadly, the validation gaps fall into two categories:
-
-1. **Low-marginal-cost additions** - things where the computation already
-exists and we just need to expose or label it (LOO diagnostic from
-jackknife, weight concentration metrics, trajectory data extraction)
-
-2. **New functionality** - things that require new estimation loops or
-helpers (in-time placebo, regularization sensitivity sweep)
-
-The practitioner guidance in `practitioner.py` should evolve alongside any
-new tooling so that the recommended steps point to real, runnable code paths.
-
-## What "done" looks like
-
-A practitioner using SyntheticDiD should be able to follow a credible
-validation workflow using library-provided tools and guidance, without
-needing to reverse-engineer internals or write substantial boilerplate.
-The validation steps recognized in the literature should either be directly
-supported or have clear, concrete guidance for how to perform them with
-the library's API.
-
-This is not about adding visualization or plotting (that's a separate
-concern). It's about making the computational and diagnostic building
-blocks accessible and well-documented through the results API and
-practitioner guidance.
+# dcdh-by-path — Briefing
+
+## The ask
+
+Clément de Chaisemartin (dCDH author) suggested implementing the `by_path`
+option from R's `did_multiplegt_dyn`. It disaggregates the dynamic event-study
+by observed treatment trajectory so practitioners can compare paths like:
+
+- `(0,1,0,0)` — one pulse
+- `(0,1,1,0)` — two periods on, then off
+- `(0,1,1,1)` — three periods on, then off
+- `(0,1,0,1)` vs `(0,1,1,0)` — sequencing
+
+Use case: "is a single pulse enough, or do you need sustained exposure?"
+
+## Where we stand today
+
+`diff_diff/chaisemartin_dhaultfoeuille.py` implements `ChaisemartinDHaultfoeuille`.
+
+- Supports reversible on/off treatments (the only estimator in the library
+that does)
+- **Currently drops multi-switch groups by default** (`drop_larger_lower=True`) —
+exactly the groups `by_path` wants to keep and compare
+- Stratifies by direction cohort (`DID_+`, `DID_-`, `S_g = sign(Δ)`) but not
+by trajectory
+- No `by_path`, `treatment_path`, or path-enumeration code exists anywhere
+- Not on ROADMAP.md; not in TODO.md
+
+## Shape of the work
+
+1. Parameter: likely `by_path: bool = False` (implies `drop_larger_lower=False`)
+2. Enumerate unique treatment histories `(D_{g,1}, …, D_{g,T})` per group;
+optionally accept a user-specified subset of paths of interest
+3. Per-path `DID_{g,l}` aggregation with influence-function SEs per path
+4. Result container extension: `path_effects` dict keyed by trajectory tuple,
+each holding ATT + SE + CI vectors
+5. Decide interaction with `drop_larger_lower`: probably forbid both being
+non-default simultaneously, or have `by_path` override
+6. REGISTRY.md section on path-heterogeneity methodology + deviation notes
+7. Methodology reference: `did_multiplegt_dyn` manual §on `by_path`; dCDH
+dynamic paper for the `DID_{g,l}` building block (already cited in REGISTRY)
+
+## Open methodology questions (for plan mode)
+
+- Which paths are enumerable? All observed, or user-specified subset only?
+R's default behavior on cardinality control is worth checking.
+- How does path stratification interact with the current cohort pooling
+`(D_{g,1}, F_g, S_g)` used for variance recentering — does it still apply
+per path?
+- Placebo and TWFE diagnostics: compute per-path or overall only?
+- Bootstrap interaction: per-path bootstrap blocks vs single bootstrap with
+per-path aggregation
+
+## Before starting
+
+- Pull the R manual section on `by_path` for `did_multiplegt_dyn` — the option
+spec there is load-bearing; don't infer from usage examples alone
+- Methodology changes: consult `docs/methodology/REGISTRY.md` first
+- New estimator surface → budget ~12-20 CI review rounds
```
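
Step 2 of the briefing (enumerate each group's treatment history) and its open question about path cardinality are answered by the ranking contract in this commit's CHANGELOG entry: keep the top-k most common observed paths in the window `[F_g-1, F_g-1+L_max]`, break frequency ties lexicographically on the path tuple, and warn when `k` exceeds the number of observed paths. Below is a minimal, stdlib-only sketch of that contract, not the library's implementation. The helper names (`first_switch_period`, `observed_paths`, `top_k_paths`), the dict-of-lists panel layout, and the choice to skip groups whose window runs past the last observed period are all assumptions made for illustration.

```python
import warnings
from collections import Counter
from typing import Dict, List, Optional, Tuple

Path = Tuple[int, ...]


def first_switch_period(history: List[int]) -> Optional[int]:
    """Return the 0-based index F_g of the first period whose treatment differs
    from the previous period's, or None for groups that never switch."""
    for t in range(1, len(history)):
        if history[t] != history[t - 1]:
            return t
    return None


def observed_paths(panel: Dict[str, List[int]], l_max: int) -> Dict[str, Path]:
    """Map each switching group to its treatment path over [F_g - 1, F_g - 1 + l_max].

    Assumption (not from the source): groups whose window runs past the last
    observed period are skipped rather than padded or truncated.
    """
    paths: Dict[str, Path] = {}
    for group, history in panel.items():
        f_g = first_switch_period(history)
        if f_g is None:
            continue  # never-switchers act as controls, not as paths
        start, stop = f_g - 1, f_g + l_max  # inclusive window -> half-open slice
        if stop > len(history):
            continue
        paths[group] = tuple(history[start:stop])
    return paths


def top_k_paths(paths: Dict[str, Path], k: int) -> List[Path]:
    """Rank paths by frequency (most common first), ties broken lexicographically."""
    counts = Counter(paths.values())
    ranked = sorted(counts, key=lambda p: (-counts[p], p))
    if k > len(ranked):
        warnings.warn(
            f"by_path={k} exceeds the {len(ranked)} observed paths; returning all.",
            UserWarning,
        )
    return ranked[:k]


# Toy binary panel: one list of treatments per group, five periods each.
panel = {
    "a": [0, 1, 0, 0, 0],  # one pulse
    "b": [0, 1, 1, 0, 0],  # two periods on, then off
    "c": [0, 1, 1, 0, 0],
    "d": [0, 0, 1, 1, 1],  # switches later, stays on
    "e": [0, 0, 0, 0, 0],  # never switches
    "f": [0, 0, 0, 1, 1],  # window would run past the panel, so skipped
}
paths = observed_paths(panel, l_max=3)
print(top_k_paths(paths, k=2))  # [(0, 1, 1, 0), (0, 1, 0, 0)]
```

Sorting on `(-count, path)` makes the ranking deterministic without extra work: Python compares tuples element by element, so equally frequent paths fall back to lexicographic order, which is the tie-break the CHANGELOG describes.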

CHANGELOG.md

Lines changed: 1 addition & 0 deletions
```diff
@@ -10,6 +10,7 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
 ### Added
 - **`stute_joint_pretest`, `joint_pretrends_test`, `joint_homogeneity_test` + `StuteJointResult`** (HeterogeneousAdoptionDiD Phase 3 follow-up). Joint Cramér-von Mises pretests across K horizons with shared-η Mammen wild bootstrap (preserves vector-valued empirical-process unit-level dependence per Delgado-Manteiga 2001 / Hlávka-Hušková 2020). The core `stute_joint_pretest` is residuals-in; two thin data-in wrappers construct per-horizon residuals for the two nulls the paper spells out: mean-independence (step 2 pre-trends, `OLS(Y_t − Y_base ~ 1)` per pre-period) and linearity (step 3 joint, `OLS(Y_t − Y_base ~ 1 + D)` per post-period). Sum-of-CvMs aggregation (`S_joint = Σ_k S_k`); per-horizon scale-invariant exact-linear short-circuit. Closes the paper Section 4.2 step-2 gap that Phase 3 `did_had_pretest_workflow` previously flagged with an "Assumption 7 pre-trends test NOT run" caveat. See `docs/methodology/REGISTRY.md` §HeterogeneousAdoptionDiD "Joint Stute tests" for algorithm, invariants, and scope exclusion of Eq 18 linear-trend detrending (deferred to Phase 4 Pierce-Schott replication).
 - **`did_had_pretest_workflow(aggregate="event_study")`**: multi-period dispatch on balanced ≥3-period panels. Runs QUG at `F` + joint pre-trends Stute across earlier pre-periods + joint homogeneity-linearity Stute across post-periods. Step 2 closure requires ≥2 pre-periods; with only a single pre-period (the base `F-1`) `pretrends_joint=None` and the verdict flags the skip. Reuses the Phase 2b event-study panel validator (last-cohort auto-filter under staggered timing with `UserWarning`; `ValueError` when `first_treat_col=None` and the panel is staggered). The data-in wrappers `joint_pretrends_test` and `joint_homogeneity_test` also route through that same validator internally, so direct wrapper calls inherit the last-cohort filter and constant-post-dose invariant. `HADPretestReport` extended with `pretrends_joint`, `homogeneity_joint`, and `aggregate` fields; serialization methods (`summary`, `to_dict`, `to_dataframe`, `__repr__`) preserve the Phase 3 output bit-exactly on `aggregate="overall"` — no `aggregate` key, no header row, no schema drift — and only surface the new fields on `aggregate="event_study"`.
+- **`ChaisemartinDHaultfoeuille.by_path`** — per-path event-study disaggregation, mirroring R `did_multiplegt_dyn(..., by_path=k)`. Passing `by_path=k` (positive int) to the estimator reports separate `DID_{path,l}` + SE + inference for the top-k most common observed treatment paths in the window `[F_g-1, F_g-1+L_max]`, answering the practitioner question "is a single pulse enough, or do you need sustained exposure?" across paths like `(0,1,0,0)` vs `(0,1,1,0)` vs `(0,1,1,1)`. The per-path SE follows the joiners-only / leavers-only IF precedent (switcher-side contribution zeroed for non-path groups; control pool and cohort structure unchanged; plug-in SE with path-specific divisor). Requires `drop_larger_lower=False` (multi-switch groups are the object of interest) and `L_max >= 1`. Binary treatment only in this release; combinations with `controls`, `trends_linear`, `trends_nonparam`, `heterogeneity`, `design2`, `honest_did`, `survey_design`, and `n_bootstrap > 0` raise `NotImplementedError` and are deferred to follow-up PRs. Results expose `results.path_effects: Dict[Tuple[int, ...], Dict[str, Any]]` and `results.to_dataframe(level="by_path")`; the summary grows a "Treatment-Path Disaggregation" block. Ties in path frequency are broken lexicographically on the path tuple for deterministic ranking. Overflow (`by_path > n_observed_paths`) returns all observed paths with a `UserWarning`. See `docs/methodology/REGISTRY.md` §ChaisemartinDHaultfoeuille `Note (Phase 3 by_path per-path event-study disaggregation)` for the full contract.
 - **`target_parameter` block in BR/DR schemas (experimental; schema version bumped to 2.0)** — `BUSINESS_REPORT_SCHEMA_VERSION` and `DIAGNOSTIC_REPORT_SCHEMA_VERSION` bumped from `"1.0"` to `"2.0"` because the new `"no_scalar_by_design"` value on the `headline.status` / `headline_metric.status` enum (dCDH `trends_linear=True, L_max>=2` configuration) is a breaking change per the REPORTING.md stability policy. BusinessReport and DiagnosticReport now emit a top-level `target_parameter` block naming what the headline scalar actually represents for each of the 16 result classes. Closes BR/DR foundation gap #6 (target-parameter clarity). Fields: `name`, `definition`, `aggregation` (machine-readable dispatch tag), `headline_attribute` (raw result attribute), `reference` (citation pointer). BR's summary emits the short `name` right after the headline; DR's overall-interpretation paragraph does the same; both full reports carry a "## Target Parameter" section with the full definition. Per-estimator dispatch is sourced from REGISTRY.md and lives in the new `diff_diff/_reporting_helpers.py::describe_target_parameter`. A few branches read fit-time config (`EfficientDiDResults.pt_assumption`, `StackedDiDResults.clean_control`, `ChaisemartinDHaultfoeuilleResults.L_max` / `covariate_residuals` / `linear_trends_effects`); others emit a fixed tag (the fit-time `aggregate` kwarg on CS / Imputation / TwoStage / Wooldridge does not change the `overall_att` scalar — disambiguating horizon / group tables is tracked under gap #9). See `docs/methodology/REPORTING.md` "Target parameter" section.
 - SyntheticDiD coverage Monte Carlo calibration table added to `docs/methodology/REGISTRY.md` §SyntheticDiD — rejection rates at α ∈ {0.01, 0.05, 0.10} across `placebo` / `bootstrap` / `jackknife` on 3 representative DGPs (balanced / exchangeable, unbalanced, and Arkhangelsky et al. (2021) AER §6.3 non-exchangeable). Artifact at `benchmarks/data/sdid_coverage.json` (500 seeds × B=200), regenerable via `benchmarks/python/coverage_sdid.py`.
 
```
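
The CHANGELOG entry describes the per-path estimate as a plug-in with a path-specific divisor: switchers that are not on the path have their contribution zeroed, while the control side is left unchanged. The sketch below illustrates only the point-estimate half of that idea for a single horizon `l`. It assumes per-switcher `DID_{g,l}` values are already available (control-side differencing done with the full control pool), and it assumes the pooled estimator has the form `DID_l = (1/N_l) * sum_g S_g * DID_{g,l}` over eligible switchers. The function names and inputs are hypothetical, and the influence-function SE is not reproduced here.

```python
from typing import Dict, Tuple

Path = Tuple[int, ...]


def per_path_did(
    did_gl: Dict[str, float],
    sign: Dict[str, int],
    path_of: Dict[str, Path],
) -> Dict[Path, Tuple[float, int]]:
    """Return {path: (DID_{path,l}, N_{path,l})} for a single horizon l.

    did_gl:  hypothetical per-switcher DID_{g,l}, control side already differenced.
    sign:    S_g, +1 for switchers into treatment and -1 for switchers out of it.
    path_of: observed treatment path of each switcher eligible at horizon l.
    """
    out: Dict[Path, Tuple[float, int]] = {}
    for path in sorted({path_of[g] for g in did_gl}):
        # 1 for switchers on this path, 0 otherwise: zeroes off-path contributions.
        mask = {g: int(path_of[g] == path) for g in did_gl}
        n_path = sum(mask.values())  # path-specific divisor N_{path,l}
        total = sum(mask[g] * sign[g] * did_gl[g] for g in did_gl)
        out[path] = (total / n_path, n_path)
    return out


def pooled_did(did_gl: Dict[str, float], sign: Dict[str, int]) -> float:
    """Pooled DID_l = (1/N_l) * sum_g S_g * DID_{g,l} over all eligible switchers."""
    return sum(sign[g] * did_gl[g] for g in did_gl) / len(did_gl)


did_gl = {"a": 1.0, "b": 2.0, "c": 4.0, "d": 3.0}
sign = dict.fromkeys(did_gl, 1)  # all joiners in this toy example
path_of = {"a": (0, 1, 0, 0), "b": (0, 1, 1, 0), "c": (0, 1, 1, 0), "d": (0, 1, 1, 1)}
effects = per_path_did(did_gl, sign, path_of)
# Recombine with weights N_{path,l} / N_l and compare with the pooled estimate.
recombined = sum(est * n for est, n in effects.values()) / len(did_gl)
print(effects[(0, 1, 1, 0)], recombined, pooled_did(did_gl, sign))  # (3.0, 2) 2.5 2.5
```

When every observed path is kept, the path estimates recombine into the pooled one with weights `N_{path,l} / N_l`, which makes a convenient sanity check. Under top-k truncation that identity no longer holds, because some switchers fall outside every reported path.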

ROADMAP.md

Lines changed: 1 addition & 1 deletion
```diff
@@ -58,7 +58,7 @@ See [Survey Design Support](docs/choosing_estimator.rst#survey-design-support) f
 Major landings since the prior roadmap revision. See [CHANGELOG.md](CHANGELOG.md) for the full history.
 
 - **`BusinessReport` and `DiagnosticReport`** - practitioner-ready output layer. Plain-English stakeholder summaries + unified diagnostic runner with a stable AI-legible `to_dict()` schema. `BusinessReport` auto-constructs `DiagnosticReport` by default so summaries mention pre-trends, robustness, and design-effect findings in one call. Estimator-native validation surfaces are routed through: SyntheticDiD uses `pre_treatment_fit` / `in_time_placebo` / `sensitivity_to_zeta_omega`; EfficientDiD uses its native `hausman_pretest`; TROP exposes factor-model fit metrics. See `docs/methodology/REPORTING.md` for methodology deviations including no-traffic-light gates, pre-trends verdict thresholds, and power-aware phrasing.
-- **ChaisemartinDHaultfoeuille (dCDH)** - full feature set: `DID_M` contemporaneous-switch, multi-horizon `DID_l` event study, analytical SE, multiplier bootstrap, TWFE decomposition diagnostic, dynamic placebos, normalized estimator, cost-benefit aggregate, sup-t bands, covariate adjustment (`DID^X`), group-specific linear trends (`DID^{fd}`), state-set-specific trends, heterogeneity testing, non-binary treatment, HonestDiD integration, and survey support (TSL + pweight).
+- **ChaisemartinDHaultfoeuille (dCDH)** - full feature set: `DID_M` contemporaneous-switch, multi-horizon `DID_l` event study, analytical SE, multiplier bootstrap, TWFE decomposition diagnostic, dynamic placebos, normalized estimator, cost-benefit aggregate, sup-t bands, covariate adjustment (`DID^X`), group-specific linear trends (`DID^{fd}`), state-set-specific trends, heterogeneity testing, non-binary treatment, HonestDiD integration, survey support (TSL + pweight), and per-path event-study disaggregation via `by_path=k` (mirrors R `did_multiplegt_dyn(..., by_path=k)`).
 - **SyntheticDiD jackknife variance** (`variance_method='jackknife'`) with survey-weighted jackknife.
 - **SyntheticDiD validation diagnostics**.
 - **Survey support completion** - all 16 estimators accept `survey_design`; `aggregate_survey()` microdata-to-panel bridge with `second_stage_weights` parameter; `conditional_pt` DGP parameter for conditional-PT scenarios.
```
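
The roadmap line advertises `by_path=k`, and the CHANGELOG entry lists its preconditions: a positive integer `k`, `drop_larger_lower=False`, `L_max >= 1`, and `NotImplementedError` for the not-yet-supported combinations (including `n_bootstrap > 0`). A hypothetical standalone guard collecting those rules might look like the sketch below. The function name, its signature, and the use of `ValueError` for the two "requires" conditions are assumptions; the source only names `NotImplementedError`, and only for the unsupported combinations. The binary-treatment restriction depends on the data rather than the options, so it is left out.

```python
from typing import Any, Optional

# Option names the CHANGELOG lists as not yet combinable with by_path.
UNSUPPORTED_WITH_BY_PATH = (
    "controls",
    "trends_linear",
    "trends_nonparam",
    "heterogeneity",
    "design2",
    "honest_did",
    "survey_design",
)


def validate_by_path(
    by_path: Optional[int],
    drop_larger_lower: bool,
    L_max: Optional[int],
    n_bootstrap: int = 0,
    **options: Any,
) -> None:
    """Hypothetical guard for the by_path=k preconditions; not the library's code."""
    if by_path is None:
        return  # feature disabled: nothing to check
    # bool is a subclass of int, so reject it explicitly before the int check.
    if isinstance(by_path, bool) or not isinstance(by_path, int) or by_path < 1:
        raise ValueError(f"by_path must be a positive int, got {by_path!r}")
    if drop_larger_lower:
        raise ValueError("by_path requires drop_larger_lower=False")
    if L_max is None or L_max < 1:
        raise ValueError("by_path requires L_max >= 1")
    if n_bootstrap > 0:
        raise NotImplementedError("by_path with n_bootstrap > 0 is not supported yet")
    for name in UNSUPPORTED_WITH_BY_PATH:
        # Treat None/False (or an absent option) as "not used"; anything else counts as set.
        value = options.get(name)
        if value is not None and value is not False:
            raise NotImplementedError(f"by_path with {name} is not supported yet")
```

Rejecting `bool` explicitly is a deliberate choice in this sketch: `True` is an `int` in Python, so without that check `by_path=True` would silently mean `k=1`.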
