Skip to content

Commit 787eae2

Browse files
igerberclaude
andcommitted
Round 4: doc/contract cleanups (joiners_leavers DataFrame, stale docstrings)
P2: split joiners_leavers DataFrame into n_cells + n_obs columns - to_dataframe(level="joiners_leavers") previously had a single n_obs column with mixed semantics by row (DID_M used switcher cell count; DID_+/DID_- used raw observation counts). Two columns with consistent units across all rows: n_cells (count of switching (g, t) cells) and n_obs (sum of n_gt over the same cells). DID_M row uses union of joiner + leaver cells. Updated test_to_dataframe_joiners_leavers to pin the new contract. P3: stale docstrings on results object - DCDHBootstrapResults class docstring now states explicitly that placebo bootstrap fields ALWAYS remain None in Phase 1 (the previous wording said they were "populated when available"). Per-field docstrings for placebo_se / placebo_ci / placebo_p_value now point back to the class-level note. - n_groups_dropped_never_switching docstring now reflects the Round 2 full-IF fix: never-switching groups participate in the variance via stable-control roles and the field is reported for backwards compatibility only — no actual exclusion happens. - n_groups_dropped_singleton_baseline docstring clarifies the variance-only filter scope (cell DataFrame retains them as period-based stable controls). P3: misleading R-script + prep_dgp comments - benchmarks/R/generate_dcdh_dynr_test_values.R: clarified that the Python and R generators mirror each other STRUCTURALLY (same pattern logic, same FE/effect/noise model), not at the RNG level. R's set.seed and NumPy's default_rng use different RNGs. Parity tests load the R script's golden-value JSON so both sides operate on byte-identical input regardless of how it was originally generated. - prep_dgp.py generate_reversible_did_data: clarified that the default single_switch pattern is A5-safe by construction (every group has at most one transition). Other patterns (random/cycles/marketing) ARE allowed to violate A5 and exist primarily as stress tests for the drop_larger_lower=True filter. The cohort-recentered variance formula is derived under A5, which is why drop_larger_lower defaults to True. Tests: 103 dCDH passing (no new tests; the existing test_to_dataframe_joiners_leavers was strengthened to assert the new n_cells / n_obs contract). Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
1 parent df7a0a3 commit 787eae2

4 files changed

Lines changed: 83 additions & 25 deletions

File tree

benchmarks/R/generate_dcdh_dynr_test_values.R

Lines changed: 10 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -35,9 +35,16 @@ output_path <- file.path("benchmarks", "data", "dcdh_dynr_golden_values.json")
3535

3636
# ---------------------------------------------------------------------------
3737
# Helper: Python-mirror reversible-treatment generator.
38-
# Mirrors generate_reversible_did_data() in diff_diff/prep_dgp.py.
39-
# Both use np.random.default_rng(seed) / set.seed(seed) so the same seed
40-
# produces an identical treatment matrix and outcomes.
38+
# Mirrors generate_reversible_did_data() in diff_diff/prep_dgp.py at the
39+
# STRUCTURAL level — the two implementations apply the same pattern logic
40+
# (single_switch / joiners_only / leavers_only / mixed_single_switch) and
41+
# the same fixed-effect / treatment-effect / time-trend / noise model. They
42+
# do NOT produce bit-identical draws even with the same seed: R's set.seed
43+
# and NumPy's default_rng use different RNGs and the parity tests don't
44+
# rely on RNG identity. Instead, the parity tests load THIS R script's
45+
# golden-value JSON output and pass the SAME data (group/period/treatment/
46+
# outcome columns) to the Python estimator, so both sides operate on
47+
# byte-identical input regardless of how it was originally generated.
4148
# ---------------------------------------------------------------------------
4249
gen_reversible <- function(n_groups, n_periods, pattern, seed,
4350
p_switch = 0.2, initial_treat_frac = 0.3,

diff_diff/chaisemartin_dhaultfoeuille_results.py

Lines changed: 43 additions & 14 deletions
Original file line numberDiff line numberDiff line change
@@ -42,11 +42,19 @@ class DCDHBootstrapResults:
4242
Web Appendix Section 3.7.3 of the dynamic companion paper. Provided
4343
for consistency with CallawaySantAnna / ImputationDiD / TwoStageDiD.
4444
45-
Per-target SE / CI / p-value are populated for each scalar dCDH
46-
estimand: overall (``DID_M``), joiners (``DID_+``), leavers
47-
(``DID_-``), and the placebo (``DID_M^pl``). When a target is not
48-
available in the underlying data (e.g., no leavers), the matching
49-
fields are ``None``.
45+
Per-target SE / CI / p-value are populated for the three scalar
46+
dCDH estimands implemented in Phase 1: overall (``DID_M``), joiners
47+
(``DID_+``), and leavers (``DID_-``). When a target is not available
48+
in the underlying data (e.g., no leavers), the matching fields are
49+
``None``.
50+
51+
**Phase 1 placebo bootstrap is intentionally NOT computed.** The
52+
dynamic companion paper Section 3.7.3 derives the cohort-recentered
53+
analytical variance for ``DID_l`` only, not for the placebo
54+
``DID_M^pl``. The ``placebo_se`` / ``placebo_ci`` / ``placebo_p_value``
55+
fields below ALWAYS remain ``None`` in Phase 1, even when
56+
``n_bootstrap > 0``. Phase 2 will add multiplier-bootstrap support
57+
for the placebo via the dynamic paper's machinery.
5058
5159
Attributes
5260
----------
@@ -76,11 +84,12 @@ class DCDHBootstrapResults:
7684
leavers_p_value : float, optional
7785
Bootstrap p-value for leavers-only ``DID_-``.
7886
placebo_se : float, optional
79-
Bootstrap SE for the placebo ``DID_M^pl`` (``None`` if T < 3).
87+
**Always ``None`` in Phase 1** — placebo bootstrap is deferred
88+
to Phase 2 (see class docstring above).
8089
placebo_ci : tuple of float, optional
81-
Bootstrap CI for the placebo.
90+
**Always ``None`` in Phase 1** (see class docstring above).
8291
placebo_p_value : float, optional
83-
Bootstrap p-value for the placebo.
92+
**Always ``None`` in Phase 1** (see class docstring above).
8493
bootstrap_distribution : np.ndarray, optional
8594
Full bootstrap distribution of the overall ``DID_M`` estimator
8695
(shape: ``(n_bootstrap,)``). Stored for advanced diagnostics;
@@ -232,12 +241,19 @@ class ChaisemartinDHaultfoeuilleResults:
232241
R's ``drop_larger_lower=TRUE`` behavior). ``0`` when
233242
``drop_larger_lower=False`` or no crossers exist.
234243
n_groups_dropped_singleton_baseline : int
235-
Number of groups dropped because their baseline ``D_{g,1}`` was
236-
unique (footnote 15 of the dynamic paper).
244+
Number of groups whose baseline ``D_{g,1}`` is unique in the
245+
post-drop panel (footnote 15 of the dynamic paper). They are
246+
excluded from the cohort-recentered VARIANCE computation only —
247+
they remain in the point-estimate sample as period-based stable
248+
controls (see REGISTRY.md ``ChaisemartinDHaultfoeuille`` for the
249+
period-vs-cohort deviation that makes this distinction matter).
237250
n_groups_dropped_never_switching : int
238-
Number of groups with ``S_g = 0`` (never switched). These are
239-
excluded from the variance computation but may still contribute
240-
to the point estimate via stable controls.
251+
Number of groups with ``S_g = 0`` (never switched). **Reported
252+
for backwards compatibility only.** Per the Round 2 full
253+
influence-function fix, never-switching groups are NOT excluded
254+
from the variance: they contribute via their stable-control
255+
roles in the per-period IF formula. The field name retains
256+
"dropped" for API stability but no actual exclusion happens.
241257
alpha : float
242258
Significance level used for confidence intervals.
243259
event_study_effects : dict, optional
@@ -643,6 +659,16 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame:
643659
)
644660

645661
elif level == "joiners_leavers":
662+
# Two separate count columns so each has consistent units
663+
# across all rows:
664+
# n_cells: total switching cells (each (g, t) cell counted once)
665+
# n_obs: actual observation count summed over the same cells
666+
# (equals n_cells on balanced 1-obs-per-cell panels;
667+
# larger on individual-level inputs with multiple
668+
# observations per cell).
669+
# For the DID_M row, both quantities use the overall switching
670+
# cell set: n_cells = sum of joiner + leaver cells, and n_obs
671+
# is the same sum of raw observation counts.
646672
rows = [
647673
{
648674
"estimand": "DID_M",
@@ -652,7 +678,8 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame:
652678
"p_value": self.overall_p_value,
653679
"conf_int_lower": self.overall_conf_int[0],
654680
"conf_int_upper": self.overall_conf_int[1],
655-
"n_obs": self.n_switcher_cells,
681+
"n_cells": self.n_switcher_cells,
682+
"n_obs": self.n_joiner_obs + self.n_leaver_obs,
656683
"available": True,
657684
},
658685
{
@@ -663,6 +690,7 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame:
663690
"p_value": self.joiners_p_value,
664691
"conf_int_lower": self.joiners_conf_int[0],
665692
"conf_int_upper": self.joiners_conf_int[1],
693+
"n_cells": self.n_joiner_cells,
666694
"n_obs": self.n_joiner_obs,
667695
"available": self.joiners_available,
668696
},
@@ -674,6 +702,7 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame:
674702
"p_value": self.leavers_p_value,
675703
"conf_int_lower": self.leavers_conf_int[0],
676704
"conf_int_upper": self.leavers_conf_int[1],
705+
"n_cells": self.n_leaver_cells,
677706
"n_obs": self.n_leaver_obs,
678707
"available": self.leavers_available,
679708
},

diff_diff/prep_dgp.py

Lines changed: 12 additions & 8 deletions
Original file line numberDiff line numberDiff line change
@@ -1874,14 +1874,18 @@ def generate_reversible_did_data(
18741874
will produce data where many or all groups are filtered out before
18751875
estimation.
18761876
1877-
For binary treatment (Phase 1 of dCDH), the formal Assumption 5
1878-
(no-crossing) of the dCDH paper is automatically satisfied for every
1879-
group. The "drop multi-switch groups" filter applied by R
1880-
``DIDmultiplegtDYN`` (and by the diff-diff dCDH estimator with
1881-
``drop_larger_lower=True``) is what removes groups that have more than
1882-
one switch — this matches the influence-function support of the
1883-
cohort-recentered variance formula in the dynamic companion paper
1884-
(Web Appendix Section 3.7.3).
1877+
The default ``pattern="single_switch"`` is **A5-safe by construction**:
1878+
every group has at most one transition, so no group can be a "crosser"
1879+
that switches in and back out. The dCDH estimator's
1880+
``drop_larger_lower=True`` filter (matching R ``DIDmultiplegtDYN``) is
1881+
a no-op on this pattern. Other patterns (``random``, ``cycles``,
1882+
``marketing``) ARE allowed to violate A5 and are useful primarily for
1883+
stress-testing the multi-switch drop filter — passing them through the
1884+
estimator with ``drop_larger_lower=True`` should drop a non-zero count
1885+
of crosser groups, which is the intended check. The cohort-recentered
1886+
variance formula in Web Appendix Section 3.7.3 of the dynamic
1887+
companion paper is derived under A5, which is why the drop filter is
1888+
on by default.
18851889
18861890
Examples
18871891
--------

tests/test_chaisemartin_dhaultfoeuille.py

Lines changed: 18 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -887,6 +887,24 @@ def test_to_dataframe_joiners_leavers(self, results):
887887
df = results.to_dataframe("joiners_leavers")
888888
assert len(df) == 3
889889
assert set(df["estimand"].tolist()) == {"DID_M", "DID_+", "DID_-"}
890+
# Round 4: n_cells and n_obs are separate columns with consistent
891+
# units across all rows. n_cells counts switching (g, t) cells,
892+
# n_obs sums raw observation counts over the same cells. The DID_M
893+
# row uses the union of joiner + leaver cells.
894+
assert "n_cells" in df.columns
895+
assert "n_obs" in df.columns
896+
# On balanced 1-obs-per-cell test data, n_cells == n_obs everywhere
897+
for _, row in df.iterrows():
898+
assert row["n_cells"] == row["n_obs"], (
899+
f"On balanced data n_cells should equal n_obs for row "
900+
f"{row['estimand']}, got n_cells={row['n_cells']}, "
901+
f"n_obs={row['n_obs']}"
902+
)
903+
# The DID_M row's count is the sum of the DID_+ and DID_- rows'
904+
did_m_row = df[df["estimand"] == "DID_M"].iloc[0]
905+
did_plus_row = df[df["estimand"] == "DID_+"].iloc[0]
906+
did_minus_row = df[df["estimand"] == "DID_-"].iloc[0]
907+
assert did_m_row["n_cells"] == did_plus_row["n_cells"] + did_minus_row["n_cells"]
890908

891909
def test_to_dataframe_per_period(self, results):
892910
df = results.to_dataframe("per_period")

0 commit comments

Comments
 (0)