
Commit 44a552f

igerber and claude committed
Address PR #356 CI review round 10 (1 P1 + 1 P2 + 1 P3)
Balanced-panel eligibility (P1): ContinuousDiD, EfficientDiD, SyntheticDiD, and HeterogeneousAdoptionDiD all hard-reject unbalanced panels at fit() time (continuous_did.py:329-338, efficient_did.py:407-414, synthetic_did.py:399-412, had.py:1173-1188; REGISTRY.md cross-refs). Guide updates surface this:

- A new "Balanced-panel eligibility" block after the §3 matrix footnotes names the four affected estimators and points at `PanelProfile.is_balanced == True` as the gate. It directs users with unbalanced panels to `diff_diff.prep.balance_panel()` or to a balance-tolerant estimator.
- The §4 per-estimator bullets for all four estimators prepend or append the balanced-panel requirement, along with the specific fit() error the caller would otherwise hit.
- The ContinuousDiD §4.7 bullet now lists THREE eligibility prerequisites (zero-dose controls, time-invariant dose, balanced panel) where it previously listed two.

Docstring (P3): the profile_panel() docstring notes block is updated to match the binary-only has_always_treated semantics shipped in round 9. The old wording claimed the field fired on "strictly positive treatment in every observed non-NaN row" across numeric types, which no longer matches the implementation.

Tests (P2):

- A semantic guide test asserts that `is_balanced` is mentioned in the guide and that each of the four balance-sensitive estimators appears within 400 characters of a "balanced" / "is_balanced" marker, so future edits cannot silently drop the eligibility gate from any of them.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
1 parent f30c121 commit 44a552f

3 files changed

Lines changed: 69 additions & 13 deletions


diff_diff/guides/llms-autonomous.txt

Lines changed: 22 additions & 6 deletions
```diff
@@ -260,6 +260,16 @@ supported / out of scope; `warn` supported but with documented caveats;
 intensity as a continuous first-stage variable; not a pure
 dose-response estimator - use `ContinuousDiD` for that.
 
+**Balanced-panel eligibility.** The following estimators hard-reject
+unbalanced panels (each raises `ValueError` at `fit()` when a unit is
+missing any period): `ContinuousDiD`, `EfficientDiD`, `SyntheticDiD`,
+`HeterogeneousAdoptionDiD`. Gate these on
+`PanelProfile.is_balanced == True`; if `False`, pre-process with
+`diff_diff.prep.balance_panel()` or pick a balance-tolerant
+estimator from the remaining rows (CS/SA/dCDH/Imputation/TwoStage/
+Stacked/ETWFE all accept unbalanced input, with some caveats in their
+own docs).
+
 
 ## §4. Estimator-choice reasoning by design feature
 
```
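The gate in the new block comes down to a set comparison plus a filter. The following is a minimal plain-Python sketch. It assumes "balanced" means every unit is observed in every period that appears anywhere in the panel. `is_balanced` and `eligible_estimators` are hypothetical stand-ins, not the `PanelProfile` field or any diff_diff API. The estimator names are plain strings, including the guide's CS/SA abbreviations.

```python
from collections import defaultdict

# Estimators the guide lists as hard-rejecting unbalanced panels at fit().
BALANCE_REQUIRED = frozenset(
    {"ContinuousDiD", "EfficientDiD", "SyntheticDiD", "HeterogeneousAdoptionDiD"}
)


def is_balanced(rows):
    """Return True if every unit is observed in every period of the panel.

    `rows` is an iterable of (unit, time) pairs; duplicate pairs are harmless.
    """
    periods_by_unit = defaultdict(set)
    for unit, time in rows:
        periods_by_unit[unit].add(time)
    # Union of all periods seen for any unit (an empty set when there are no rows).
    all_periods = set().union(*periods_by_unit.values())
    return all(seen == all_periods for seen in periods_by_unit.values())


def eligible_estimators(candidates, balanced):
    """Drop the balance-sensitive estimators when the panel is unbalanced."""
    if balanced:
        return list(candidates)
    return [name for name in candidates if name not in BALANCE_REQUIRED]


rows = [("a", 1), ("a", 2), ("b", 1)]  # unit b is missing period 2
print(eligible_estimators(["CS", "EfficientDiD", "SA", "SyntheticDiD"], is_balanced(rows)))
# prints ['CS', 'SA']
```

Because unit `b` lacks period 2, the panel is unbalanced. Only the balance-tolerant candidates therefore survive the filter.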
```diff
@@ -317,7 +327,8 @@ estimators:
 covariates interactions; heterogeneous covariate-by-cohort effects.
 - `EfficientDiD` (Chen, Sant'Anna, Xie 2025) - asymptotically efficient
 under either `PT-All` or `PT-Post`; use `EfficientDiD.hausman_pretest`
-to pick.
+to pick. Requires a balanced panel (`PanelProfile.is_balanced ==
+True`); `fit()` raises `ValueError` on unbalanced input.
 
 Diagnostic: `bacon_decompose(df, ...)` shows the weight allocation of a
 TWFE fit to 2×2 comparison types. Forbidden-comparison weight > 10% is a
```
```diff
@@ -382,12 +393,13 @@ worth considering.
 When `treatment_type == "continuous"`:
 
 - `ContinuousDiD` (Callaway, Goodman-Bacon, Sant'Anna 2024) -
-continuous / dose-response treatment. **Two eligibility
+continuous / dose-response treatment. **Three eligibility
 prerequisites**: (a) zero-dose control units must exist
 (`P(D=0) > 0`) because Remark 3.1 (lowest-dose-as-control) is not
-yet implemented, and (b) dose must be time-invariant per unit (rule
-out panels where `PanelProfile.treatment_varies_within_unit ==
-True`). `fit()` raises `ValueError` in either case. Note that
+yet implemented, (b) dose must be time-invariant per unit (rule out
+panels where `PanelProfile.treatment_varies_within_unit == True`),
+and (c) the panel must be balanced (`PanelProfile.is_balanced ==
+True`). `fit()` raises `ValueError` in any of the three cases. Note that
 staggered adoption IS supported natively (adoption timing is
 expressed via the `first_treat` column, not via within-unit dose
 variation). The estimator exposes several dose-indexed targets that
```
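The three ContinuousDiD prerequisites can be pre-checked before calling `fit()`. This sketch rests on several assumptions:

- Rows are `(unit, time, dose)` triples.
- NaN handling is omitted.
- A zero-dose control is a unit whose dose is 0 in every observed row, mirroring the `has_never_treated` wording in `profile_panel()`.

`continuous_did_blockers` is a hypothetical helper, not part of diff_diff.

```python
from collections import defaultdict


def continuous_did_blockers(rows):
    """Return the guide's ContinuousDiD prerequisites that a panel fails.

    `rows` holds (unit, time, dose) triples; NaN handling is omitted.
    An empty list means all three pre-checks pass.
    """
    doses = defaultdict(set)
    periods = defaultdict(set)
    for unit, time, dose in rows:
        doses[unit].add(dose)
        periods[unit].add(time)

    blockers = []
    # (a) at least one unit whose dose is zero in every observed row
    if not any(seen == {0} for seen in doses.values()):
        blockers.append("no zero-dose control units")
    # (b) dose must be constant within each unit
    if any(len(seen) > 1 for seen in doses.values()):
        blockers.append("dose varies within unit")
    # (c) every unit must be observed in every period
    all_periods = set().union(*periods.values())
    if any(seen != all_periods for seen in periods.values()):
        blockers.append("unbalanced panel")
    return blockers
```

The helper returns every failed check instead of stopping at the first one. A caller can then report all blockers together rather than hitting them one `ValueError` at a time.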
```diff
@@ -411,6 +423,8 @@ but derivable from `cohort_sizes` + `has_never_treated`):
 - `SyntheticDiD` - synthetic-control-meets-DiD. Requires never-treated
 donors and sufficient pre-treatment periods (Arkhangelsky et al. 2021).
 Block treatment only: all treated units must adopt at the same time.
+Requires a balanced panel (`PanelProfile.is_balanced == True`);
+`fit()` raises `ValueError` and points at `balance_panel()`.
 - `TROP` - factor-model-based generalized synthetic control. Uses every
 unit untreated at period `t` as the donor pool (via the absorbing-state
 D matrix); supports staggered adoption and more complex factor
```
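This commit does not show how `diff_diff.prep.balance_panel()` balances a panel. One common strategy is to keep only the units observed in every period. The sketch below implements that strategy with a hypothetical `drop_incomplete_units` helper. It should not be read as a description of the library function's behavior.

```python
from collections import defaultdict


def drop_incomplete_units(rows):
    """Keep only rows whose unit is observed in every period of the panel.

    One possible balancing strategy (an assumption, not necessarily what
    diff_diff.prep.balance_panel() does). `rows` holds (unit, time, ...) tuples.
    """
    periods = defaultdict(set)
    for row in rows:
        periods[row[0]].add(row[1])
    all_periods = set().union(*periods.values())
    complete = {unit for unit, seen in periods.items() if seen == all_periods}
    # Every surviving unit covers the same full set of periods, so the result is balanced.
    return [row for row in rows if row[0] in complete]
```

Dropping units discards data and can change which population the estimate describes. When many units are incomplete, a balance-tolerant estimator may therefore be the better choice.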
```diff
@@ -426,7 +440,9 @@ methods in the library are preferred.
 When adoption varies in strength across units (partial-adoption settings,
 intensity of exposure differs):
 
-- `HeterogeneousAdoptionDiD` - targets a Weighted Average Slope (WAS)
+- `HeterogeneousAdoptionDiD` - requires a balanced panel
+(`PanelProfile.is_balanced == True`; `fit()` raises `ValueError`
+when any unit is missing a period). Targets a Weighted Average Slope (WAS)
 on single-period Heterogeneous Adoption Designs where no genuinely
 untreated group exists (paper Equation 2 / Theorem 1). The
 `target_parameter` attribute on the results object is literally
```

diff_diff/profile.py

Lines changed: 15 additions & 7 deletions
```diff
@@ -192,13 +192,21 @@ def profile_panel(
     ``"categorical"``; cast to ``int`` if you want binary-treatment
     profiling.
 
-    ``has_never_treated`` and ``has_always_treated`` are computed
-    generically across numeric treatment types (both binary and
-    continuous). ``has_never_treated`` fires when some unit has
-    ``treatment == 0`` in every observed non-NaN row; for continuous
-    panels this flags zero-dose controls. ``has_always_treated`` fires
-    when some unit has strictly-positive treatment in every observed
-    non-NaN row. Both are always ``False`` for ``"categorical"``.
+    ``has_never_treated`` is computed across both binary and
+    continuous numeric treatment types: some unit has ``treatment ==
+    0`` in every observed non-NaN row. For binary this flags the
+    clean-control group; for continuous this flags zero-dose controls
+    (required by ``ContinuousDiD``). Always ``False`` for
+    ``"categorical"``.
+
+    ``has_always_treated`` has binary-only semantics: some unit has
+    ``treatment == 1`` in every observed non-NaN row (no pre-treatment
+    information in the DiD sense). For ``"continuous"`` and
+    ``"categorical"`` treatment this field is always ``False``
+    regardless of dose positivity — pre-treatment periods on
+    continuous DiD are determined by the separate ``first_treat``
+    column passed to ``ContinuousDiD.fit``, not by whether the dose
+    is strictly positive.
 
     Rows with ``NaN`` in ``unit`` or ``time`` are dropped up front and
     surfaced via the ``missing_id_rows_dropped`` alert; all subsequent
```
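The revised docstring makes `has_always_treated` binary-only, while `has_never_treated` stays generic across numeric types. Below is a sketch of those two semantics as the docstring words them. It assumes rows of `(unit, treatment)` pairs and a caller-supplied `treatment_type`. `treatment_flags` is hypothetical and is not the `profile_panel()` implementation.

```python
import math
from collections import defaultdict


def treatment_flags(rows, treatment_type):
    """Sketch of has_never_treated / has_always_treated per the new docstring.

    `rows` holds (unit, treatment) pairs; NaN treatments are skipped.
    `treatment_type` is "binary", "continuous", or "categorical".
    """
    if treatment_type == "categorical":
        return {"has_never_treated": False, "has_always_treated": False}

    values = defaultdict(list)
    for unit, treatment in rows:
        if isinstance(treatment, float) and math.isnan(treatment):
            continue  # only observed non-NaN rows count
        values[unit].append(treatment)

    # Binary and continuous: some unit is 0 in every observed row.
    never = any(all(v == 0 for v in vs) for vs in values.values())
    # Binary only: continuous doses never set this flag, however positive.
    always = treatment_type == "binary" and any(
        all(v == 1 for v in vs) for vs in values.values()
    )
    return {"has_never_treated": never, "has_always_treated": always}
```

Take a continuous panel in which one unit's dose is positive in every row. The sketch still reports `has_always_treated=False`, which is the behavior the updated docstring describes.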

tests/test_profile_panel.py

Lines changed: 32 additions & 0 deletions
```diff
@@ -624,6 +624,38 @@ def test_guide_api_strings_resolve_against_public_api():
     assert '`"pass"` / `"warn"` / `"inconclusive"`' not in text
     assert "verdict" in text.lower()
 
+    # Balanced-panel eligibility: ContinuousDiD, EfficientDiD,
+    # SyntheticDiD, and HeterogeneousAdoptionDiD all hard-reject
+    # unbalanced panels at fit() time. The guide must surface this
+    # so agents gate these estimators on PanelProfile.is_balanced
+    # before selecting them.
+    assert "is_balanced" in text, (
+        "Guide must mention PanelProfile.is_balanced as an eligibility "
+        "check for balance-sensitive estimators"
+    )
+    for estimator in (
+        "ContinuousDiD",
+        "EfficientDiD",
+        "SyntheticDiD",
+        "HeterogeneousAdoptionDiD",
+    ):
+        idx = 0
+        found = False
+        while idx < len(text):
+            loc = text.find(estimator, idx)
+            if loc < 0:
+                break
+            window = text[max(0, loc - 400) : loc + 400]
+            if "balanced" in window.lower() or "is_balanced" in window:
+                found = True
+                break
+            idx = loc + 1
+        assert found, (
+            f"Guide must mention a balanced-panel constraint near the "
+            f"{estimator!r} bullet / row (hard-rejects unbalanced panels "
+            "at fit time)"
+        )
+
 
 def test_min_pre_post_use_per_unit_observed_support():
     """On an unbalanced panel where one treated unit is missing its
```

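The inline while-loop in the new test can be read as a small reusable predicate. `mentioned_near` is a hypothetical helper, not part of the test suite. It reproduces the loop's windowing, which spans from 400 characters before the start of each occurrence to 400 characters after it.

```python
def mentioned_near(text, needle, marker="balanced", radius=400):
    """Return True if some occurrence of `needle` has `marker` within the window.

    Mirrors the new test's loop: the window runs from `radius` characters
    before each occurrence's start to `radius` characters after it.
    """
    start = 0
    while True:
        loc = text.find(needle, start)
        if loc < 0:
            return False  # no further occurrences to check
        window = text[max(0, loc - radius) : loc + radius]
        if marker in window.lower():
            return True
        start = loc + 1
```

Matching "balanced" case-insensitively also accepts "is_balanced" and "unbalanced". The check is therefore a proximity heuristic, not proof that the constraint is stated next to each estimator.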