1 | | -# SDID Practitioner Validation Tooling - Briefing |
2 | | - |
3 | | -## Problem |
4 | | - |
5 | | -A data scientist runs `SyntheticDiD`, gets an ATT and a p-value, and then |
6 | | -faces the question: *should I trust this estimate?* The library gives them the |
7 | | -point estimate and inference, but the validation workflow - the steps between |
8 | | -"I got a number" and "I'm confident enough to present this" - is largely |
9 | | -left to the practitioner to assemble from scratch. |
10 | | - |
11 | | -The standard validation workflow for synthetic control methods is well |
12 | | -understood in the econometrics literature (Arkhangelsky et al. 2021, |
13 | | -Abadie et al. 2010, Abadie 2021). The pieces include pre-treatment fit |
14 | | -assessment, weight diagnostics, placebo/falsification tests, sensitivity |
15 | | -analysis, and cross-estimator comparison. Our library provides some of the |
16 | | -raw ingredients (pre-treatment RMSE, weight dicts, placebo effects array) |
17 | | -but doesn't connect them into an accessible diagnostic workflow. |
18 | | - |
19 | | -The gap is most visible in `practitioner.py`, where `_handle_synthetic` |
20 | | -recommends in-time placebos and leave-one-out analysis but provides only |
21 | | -comment-only pseudo-code. A practitioner following that guidance hits a wall. |
22 | | - |
23 | | -## Current state |
24 | | - |
25 | | -What we have today: |
26 | | - |
27 | | -- `results.pre_treatment_fit` (RMSE) with a warning when it exceeds the |
28 | | - treated pre-period SD |
29 | | -- `results.get_unit_weights_df()` and `results.get_time_weights_df()` |
30 | | -- Three variance methods: placebo (default), bootstrap, and jackknife (just |
31 | | - landed in v3.1.1) |
32 | | -- `results.placebo_effects` - stores per-iteration estimates for all three |
33 | | - variance methods, but for jackknife these are positional LOO estimates |
34 | | - with no unit labels |
35 | | -- `results.summary()` shows top-5 unit weights and count of non-trivial weights |
36 | | -- `practitioner.py` guidance that names the right steps but can't point to |
37 | | - runnable code for most of them |
38 | | - |
39 | | -What the practitioner must currently build themselves: |
40 | | - |
41 | | -- Mapping jackknife LOO estimates back to unit identities to answer "which |
42 | | - unit, when dropped, changes my estimate the most?" |
43 | | -- In-time placebo tests (re-estimate with a fake treatment date) |
44 | | -- Any weight concentration metric beyond eyeballing the sorted list |
45 | | -- Any sense of whether their RMSE is "bad enough to worry about" beyond |
46 | | - the binary warning |
47 | | -- Regularization sensitivity (does the ATT change if I perturb zeta?) |
48 | | -- Pre-treatment trajectory data for plotting (the Y matrices are internal |
49 | | - to `fit()` and not returned) |
50 | | - |
51 | | -## Context from prior discussion |
52 | | - |
53 | | -The jackknife work created an interesting opportunity. The delete-one-re-estimate |
54 | | -loop already runs for SE computation. The per-unit ATT estimates are stored in |
55 | | -`results.placebo_effects`. The missing piece is a presentation layer that maps |
56 | | -those estimates to unit identities and surfaces the diagnostic interpretation |
57 | | -(which units are influential, how stable is the estimate to unit composition). |
58 | | - |
59 | | -More broadly, the validation gaps fall into two categories: |
60 | | - |
61 | | -1. **Low-marginal-cost additions** - things where the computation already |
62 | | - exists and we just need to expose or label it (LOO diagnostic from |
63 | | - jackknife, weight concentration metrics, trajectory data extraction) |
64 | | - |
65 | | -2. **New functionality** - things that require new estimation loops or |
66 | | - helpers (in-time placebo, regularization sensitivity sweep) |
67 | | - |
68 | | -The practitioner guidance in `practitioner.py` should evolve alongside any |
69 | | -new tooling so that the recommended steps point to real, runnable code paths. |
70 | | - |
71 | | -## What "done" looks like |
72 | | - |
73 | | -A practitioner using SyntheticDiD should be able to follow a credible |
74 | | -validation workflow using library-provided tools and guidance, without |
75 | | -needing to reverse-engineer internals or write substantial boilerplate. |
76 | | -The validation steps recognized in the literature should either be directly |
77 | | -supported or have clear, concrete guidance for how to perform them with |
78 | | -the library's API. |
79 | | - |
80 | | -This is not about adding visualization or plotting (that's a separate |
81 | | -concern). It's about making the computational and diagnostic building |
82 | | -blocks accessible and well-documented through the results API and |
83 | | -practitioner guidance. |
| 1 | +# dcdh-by-path — Briefing |
| 2 | + |
| 3 | +## The ask |
| 4 | + |
| 5 | +Clément de Chaisemartin (dCDH author) suggested implementing the `by_path` |
| 6 | +option from R's `did_multiplegt_dyn`. It disaggregates the dynamic event-study |
| 7 | +by observed treatment trajectory so practitioners can compare paths like: |
| 8 | + |
| 9 | +- `(0,1,0,0)`: one pulse
| 10 | +- `(0,1,1,0)`: two periods on, then off
| 11 | +- `(0,1,1,1)`: three periods on (sustained, never switched off in the window)
| 12 | +- `(0,1,0,1)` vs `(0,1,1,0)`: same total exposure, different sequencing
| 13 | + |
| 14 | +Use case: "is a single pulse enough, or do you need sustained exposure?" |
| 15 | + |
| 16 | +## Where we stand today |
| 17 | + |
| 18 | +`diff_diff/chaisemartin_dhaultfoeuille.py` implements `ChaisemartinDHaultfoeuille`. |
| 19 | + |
| 20 | +- Supports reversible on/off treatments (the only estimator in the library |
| 21 | + that does) |
| 22 | +- **Currently drops multi-switch groups by default** (`drop_larger_lower=True`) — |
| 23 | + exactly the groups `by_path` wants to keep and compare |
| 24 | +- Stratifies by direction cohort (`DID_+`, `DID_-`, `S_g = sign(Δ)`) but not |
| 25 | + by trajectory |
| 26 | +- No `by_path`, `treatment_path`, or path-enumeration code exists anywhere |
| 27 | +- Not on ROADMAP.md; not in TODO.md |
| 28 | + |
| 29 | +## Shape of the work |
| 30 | + |
| 31 | +1. Parameter: likely `by_path: bool = False` (implies `drop_larger_lower=False`) |
| 32 | +2. Enumerate unique treatment histories `(D_{g,1}, …, D_{g,T})` per group; |
| 33 | + optionally accept a user-specified subset of paths of interest |
| 34 | +3. Per-path `DID_{g,l}` aggregation with influence-function SEs per path |
| 35 | +4. Result container extension: `path_effects` dict keyed by trajectory tuple, |
| 36 | + each holding ATT + SE + CI vectors |
| 37 | +5. Decide interaction with `drop_larger_lower`: probably forbid both being |
| 38 | + non-default simultaneously, or have `by_path` override |
| 39 | +6. REGISTRY.md section on path-heterogeneity methodology + deviation notes |
| 40 | +7. Methodology reference: `did_multiplegt_dyn` manual §on `by_path`; dCDH |
| 41 | + dynamic paper for the `DID_{g,l}` building block (already cited in REGISTRY) |
| 42 | + |
| 43 | +## Open methodology questions (for plan mode) |
| 44 | + |
| 45 | +- Which paths are enumerable? All observed, or user-specified subset only? |
| 46 | + R's default behavior on cardinality control is worth checking. |
| 47 | +- How does path stratification interact with the current cohort pooling |
| 48 | + `(D_{g,1}, F_g, S_g)` used for variance recentering — does it still apply |
| 49 | + per path? |
| 50 | +- Placebo and TWFE diagnostics: compute per-path or overall only? |
| 51 | +- Bootstrap interaction: per-path bootstrap blocks vs single bootstrap with |
| 52 | + per-path aggregation |
| 53 | + |
| 54 | +## Before starting |
| 55 | + |
| 56 | +- Pull the R manual section on `by_path` for `did_multiplegt_dyn` — the option |
| 57 | + spec there is load-bearing; don't infer from usage examples alone |
| 58 | +- Methodology changes: consult `docs/methodology/REGISTRY.md` first |
| 59 | +- New estimator surface → budget ~12-20 CI review rounds |
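Step 2 of "Shape of the work" (enumerate unique treatment histories per group, optionally restricted to a user-specified subset) could be sketched roughly as below. This is only an illustration under stated assumptions: `enumerate_paths`, its `paths_of_interest` argument, and the `(group, period, treatment)` row layout are hypothetical and do not exist in the library. It uses the full calendar history as step 2 writes it; whether the path should instead be a window relative to each group's first switch is left open and should be checked against the R manual.

```python
from collections import defaultdict


def enumerate_paths(rows, paths_of_interest=None):
    """Group units by observed treatment trajectory (hypothetical helper).

    rows: iterable of (group, period, treatment) tuples, one per cell
          of a balanced panel (assumed layout, not the library's).
    paths_of_interest: optional iterable of trajectory tuples to keep.
    Returns {trajectory_tuple: sorted list of groups on that trajectory}.
    """
    # Collect each group's treatment value by period.
    history = defaultdict(dict)
    for group, period, treatment in rows:
        history[group][period] = treatment

    # Order each history by period and use the tuple as the path key.
    groups_by_path = defaultdict(list)
    for group, by_period in history.items():
        path = tuple(by_period[p] for p in sorted(by_period))
        groups_by_path[path].append(group)

    # Optionally restrict to a user-specified subset of paths.
    if paths_of_interest is not None:
        wanted = set(paths_of_interest)
        groups_by_path = {
            p: g for p, g in groups_by_path.items() if p in wanted
        }

    return {path: sorted(groups) for path, groups in groups_by_path.items()}


# Toy panel: one single-pulse group, two "on twice, then off" groups,
# and one never-treated group.
rows = [
    ("a", 1, 0), ("a", 2, 1), ("a", 3, 0), ("a", 4, 0),
    ("b", 1, 0), ("b", 2, 1), ("b", 3, 1), ("b", 4, 0),
    ("c", 1, 0), ("c", 2, 1), ("c", 3, 1), ("c", 4, 0),
    ("d", 1, 0), ("d", 2, 0), ("d", 3, 0), ("d", 4, 0),
]

all_paths = enumerate_paths(rows)
pulse_only = enumerate_paths(rows, paths_of_interest=[(0, 1, 0, 0)])
```

Keying on the trajectory tuple matches the `path_effects` container proposed in step 4. The number of groups per path also gives a quick view of how thin each cell would be before any per-path `DID_{g,l}` aggregation is attempted.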