Round 4: doc/contract cleanups (joiners_leavers DataFrame, stale docstrings)

igerber · claude · igerber · commit 787eae20a2aa · 2026-04-11T18:24:03.000-04:00
P2: split joiners_leavers DataFrame into n_cells + n_obs columns
- to_dataframe(level="joiners_leavers") previously had a single n_obs
  column with mixed semantics by row (DID_M used switcher cell count;
  DID_+/DID_- used raw observation counts). It now has two columns with
  consistent units across all rows: n_cells (count of switching (g, t)
  cells) and n_obs (sum of n_gt over the same cells). The DID_M row
  uses the union of joiner + leaver cells. Updated
  test_to_dataframe_joiners_leavers to pin the new contract.
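
A minimal sketch of the two-column contract, using made-up cell counts
rather than the estimator itself. The dictionaries and the count_row
helper are hypothetical and exist only for illustration; the real
values come from the results object's n_*_cells / n_*_obs attributes.

```python
# Hypothetical switching cells: (group, period) -> n_gt, the number of
# raw observations in that cell. Not taken from any real dataset.
joiner_cells = {("g1", 3): 4, ("g2", 5): 2}
leaver_cells = {("g3", 4): 3}


def count_row(cells):
    """Return (n_cells, n_obs) for a mapping of (g, t) -> n_gt."""
    return len(cells), sum(cells.values())


rows = {
    "DID_+": count_row(joiner_cells),
    "DID_-": count_row(leaver_cells),
    # A cell switches either into or out of treatment, so the joiner and
    # leaver sets are disjoint and the union is a plain merge.
    "DID_M": count_row({**joiner_cells, **leaver_cells}),
}

for estimand, (n_cells, n_obs) in rows.items():
    print(f"{estimand}: n_cells={n_cells}, n_obs={n_obs}")
```

With one observation per cell the two columns coincide, which is the
case the strengthened test pins.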

P3: stale docstrings on results object
- DCDHBootstrapResults class docstring now states explicitly that
  placebo bootstrap fields ALWAYS remain None in Phase 1 (the previous
  wording said they were "populated when available"). Per-field
  docstrings for placebo_se / placebo_ci / placebo_p_value now point
  back to the class-level note.
- n_groups_dropped_never_switching docstring now reflects the Round 2
  full influence-function (IF) fix: never-switching groups participate
  in the variance via stable-control roles, and the field is reported
  for backwards compatibility only; no actual exclusion happens.
- n_groups_dropped_singleton_baseline docstring clarifies that the
  filter applies to the variance computation only (these groups stay
  in the point-estimate sample as period-based stable controls).

P3: misleading R-script + prep_dgp comments
- benchmarks/R/generate_dcdh_dynr_test_values.R: clarified that the
  Python and R generators mirror each other STRUCTURALLY (same pattern
  logic, same FE/effect/noise model), not at the RNG level. R's
  set.seed and NumPy's default_rng use different RNGs (Mersenne Twister
  by default vs. PCG64), so the same seed does not give the same draws.
  Parity tests load the R script's golden-value JSON so both sides
  operate on byte-identical input regardless of how it was originally
  generated.
- prep_dgp.py generate_reversible_did_data: clarified that the default
  single_switch pattern is A5-safe by construction (every group has at
  most one transition). Other patterns (random/cycles/marketing) ARE
  allowed to violate A5 and exist primarily as stress tests for the
  drop_larger_lower=True filter. The cohort-recentered variance
  formula is derived under A5, which is why drop_larger_lower defaults
  to True.
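
A rough sketch of the "at most one transition" property described
above, assuming binary treatment paths. The panel data and the
n_transitions helper are hypothetical; this is not the estimator's
actual drop_larger_lower implementation, only an illustration of which
groups would count as crossers.

```python
def n_transitions(path):
    """Count period-to-period changes in a binary treatment path."""
    return sum(a != b for a, b in zip(path, path[1:]))


# Hypothetical treatment paths, one list of D_{g,t} values per group.
panel = {
    "g1": [0, 0, 1, 1],  # joiner: one transition
    "g2": [1, 1, 0, 0],  # leaver: one transition
    "g3": [0, 0, 0, 0],  # never switches
    "g4": [0, 1, 0, 0],  # switches in and back out: a crosser
}

# Groups with more than one transition are the ones a multi-switch
# drop filter would be expected to remove.
crossers = sorted(g for g, path in panel.items() if n_transitions(path) > 1)
print(crossers)
```

On a single_switch-style panel the crosser list is empty, which is
what makes the filter a no-op there.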

Tests: 103 dCDH passing (no new tests; the existing
test_to_dataframe_joiners_leavers was strengthened to assert the new
n_cells / n_obs contract).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/benchmarks/R/generate_dcdh_dynr_test_values.R b/benchmarks/R/generate_dcdh_dynr_test_values.R
@@ -35,9 +35,16 @@ output_path <- file.path("benchmarks", "data", "dcdh_dynr_golden_values.json")
 
 # ---------------------------------------------------------------------------
 # Helper: Python-mirror reversible-treatment generator.
-# Mirrors generate_reversible_did_data() in diff_diff/prep_dgp.py.
-# Both use np.random.default_rng(seed) / set.seed(seed) so the same seed
-# produces an identical treatment matrix and outcomes.
+# Mirrors generate_reversible_did_data() in diff_diff/prep_dgp.py at the
+# STRUCTURAL level — the two implementations apply the same pattern logic
+# (single_switch / joiners_only / leavers_only / mixed_single_switch) and
+# the same fixed-effect / treatment-effect / time-trend / noise model. They
+# do NOT produce bit-identical draws even with the same seed: R's set.seed
+# and NumPy's default_rng use different RNGs and the parity tests don't
+# rely on RNG identity. Instead, the parity tests load THIS R script's
+# golden-value JSON output and pass the SAME data (group/period/treatment/
+# outcome columns) to the Python estimator, so both sides operate on
+# byte-identical input regardless of how it was originally generated.
 # ---------------------------------------------------------------------------
 gen_reversible <- function(n_groups, n_periods, pattern, seed,
                            p_switch = 0.2, initial_treat_frac = 0.3,
diff --git a/diff_diff/chaisemartin_dhaultfoeuille_results.py b/diff_diff/chaisemartin_dhaultfoeuille_results.py
@@ -42,11 +42,19 @@ class DCDHBootstrapResults:
     Web Appendix Section 3.7.3 of the dynamic companion paper. Provided
     for consistency with CallawaySantAnna / ImputationDiD / TwoStageDiD.
 
-    Per-target SE / CI / p-value are populated for each scalar dCDH
-    estimand: overall (``DID_M``), joiners (``DID_+``), leavers
-    (``DID_-``), and the placebo (``DID_M^pl``). When a target is not
-    available in the underlying data (e.g., no leavers), the matching
-    fields are ``None``.
+    Per-target SE / CI / p-value are populated for the three scalar
+    dCDH estimands implemented in Phase 1: overall (``DID_M``), joiners
+    (``DID_+``), and leavers (``DID_-``). When a target is not available
+    in the underlying data (e.g., no leavers), the matching fields are
+    ``None``.
+
+    **Phase 1 placebo bootstrap is intentionally NOT computed.** The
+    dynamic companion paper Section 3.7.3 derives the cohort-recentered
+    analytical variance for ``DID_l`` only, not for the placebo
+    ``DID_M^pl``. The ``placebo_se`` / ``placebo_ci`` / ``placebo_p_value``
+    fields below ALWAYS remain ``None`` in Phase 1, even when
+    ``n_bootstrap > 0``. Phase 2 will add multiplier-bootstrap support
+    for the placebo via the dynamic paper's machinery.
 
     Attributes
     ----------
@@ -76,11 +84,12 @@ class DCDHBootstrapResults:
     leavers_p_value : float, optional
         Bootstrap p-value for leavers-only ``DID_-``.
     placebo_se : float, optional
-        Bootstrap SE for the placebo ``DID_M^pl`` (``None`` if T < 3).
+        **Always ``None`` in Phase 1** — placebo bootstrap is deferred
+        to Phase 2 (see class docstring above).
     placebo_ci : tuple of float, optional
-        Bootstrap CI for the placebo.
+        **Always ``None`` in Phase 1** (see class docstring above).
     placebo_p_value : float, optional
-        Bootstrap p-value for the placebo.
+        **Always ``None`` in Phase 1** (see class docstring above).
     bootstrap_distribution : np.ndarray, optional
         Full bootstrap distribution of the overall ``DID_M`` estimator
         (shape: ``(n_bootstrap,)``). Stored for advanced diagnostics;
@@ -232,12 +241,19 @@ class ChaisemartinDHaultfoeuilleResults:
         R's ``drop_larger_lower=TRUE`` behavior). ``0`` when
         ``drop_larger_lower=False`` or no crossers exist.
     n_groups_dropped_singleton_baseline : int
-        Number of groups dropped because their baseline ``D_{g,1}`` was
-        unique (footnote 15 of the dynamic paper).
+        Number of groups whose baseline ``D_{g,1}`` is unique in the
+        post-drop panel (footnote 15 of the dynamic paper). They are
+        excluded from the cohort-recentered VARIANCE computation only —
+        they remain in the point-estimate sample as period-based stable
+        controls (see REGISTRY.md ``ChaisemartinDHaultfoeuille`` for the
+        period-vs-cohort deviation that makes this distinction matter).
     n_groups_dropped_never_switching : int
-        Number of groups with ``S_g = 0`` (never switched). These are
-        excluded from the variance computation but may still contribute
-        to the point estimate via stable controls.
+        Number of groups with ``S_g = 0`` (never switched). **Reported
+        for backwards compatibility only.** Per the Round 2 full
+        influence-function fix, never-switching groups are NOT excluded
+        from the variance: they contribute via their stable-control
+        roles in the per-period IF formula. The field name retains
+        "dropped" for API stability but no actual exclusion happens.
     alpha : float
         Significance level used for confidence intervals.
     event_study_effects : dict, optional
@@ -643,6 +659,16 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame:
             )
 
         elif level == "joiners_leavers":
+            # Two separate count columns so each has consistent units
+            # across all rows:
+            #   n_cells: total switching cells (each (g, t) cell counted once)
+            #   n_obs:   actual observation count summed over the same cells
+            #            (equals n_cells on balanced 1-obs-per-cell panels;
+            #            larger on individual-level inputs with multiple
+            #            observations per cell).
+            # For the DID_M row, both quantities use the overall switching
+            # cell set: n_cells = sum of joiner + leaver cells, and n_obs
+            # is the same sum of raw observation counts.
             rows = [
                 {
                     "estimand": "DID_M",
@@ -652,7 +678,8 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame:
                     "p_value": self.overall_p_value,
                     "conf_int_lower": self.overall_conf_int[0],
                     "conf_int_upper": self.overall_conf_int[1],
-                    "n_obs": self.n_switcher_cells,
+                    "n_cells": self.n_switcher_cells,
+                    "n_obs": self.n_joiner_obs + self.n_leaver_obs,
                     "available": True,
                 },
                 {
@@ -663,6 +690,7 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame:
                     "p_value": self.joiners_p_value,
                     "conf_int_lower": self.joiners_conf_int[0],
                     "conf_int_upper": self.joiners_conf_int[1],
+                    "n_cells": self.n_joiner_cells,
                     "n_obs": self.n_joiner_obs,
                     "available": self.joiners_available,
                 },
@@ -674,6 +702,7 @@ def to_dataframe(self, level: str = "overall") -> pd.DataFrame:
                     "p_value": self.leavers_p_value,
                     "conf_int_lower": self.leavers_conf_int[0],
                     "conf_int_upper": self.leavers_conf_int[1],
+                    "n_cells": self.n_leaver_cells,
                     "n_obs": self.n_leaver_obs,
                     "available": self.leavers_available,
                 },
diff --git a/diff_diff/prep_dgp.py b/diff_diff/prep_dgp.py
@@ -1874,14 +1874,18 @@ def generate_reversible_did_data(
     will produce data where many or all groups are filtered out before
     estimation.
 
-    For binary treatment (Phase 1 of dCDH), the formal Assumption 5
-    (no-crossing) of the dCDH paper is automatically satisfied for every
-    group. The "drop multi-switch groups" filter applied by R
-    ``DIDmultiplegtDYN`` (and by the diff-diff dCDH estimator with
-    ``drop_larger_lower=True``) is what removes groups that have more than
-    one switch — this matches the influence-function support of the
-    cohort-recentered variance formula in the dynamic companion paper
-    (Web Appendix Section 3.7.3).
+    The default ``pattern="single_switch"`` is **A5-safe by construction**:
+    every group has at most one transition, so no group can be a "crosser"
+    that switches in and back out. The dCDH estimator's
+    ``drop_larger_lower=True`` filter (matching R ``DIDmultiplegtDYN``) is
+    a no-op on this pattern. Other patterns (``random``, ``cycles``,
+    ``marketing``) ARE allowed to violate A5 and are useful primarily for
+    stress-testing the multi-switch drop filter — passing them through the
+    estimator with ``drop_larger_lower=True`` should drop a non-zero count
+    of crosser groups, which is the intended check. The cohort-recentered
+    variance formula in Web Appendix Section 3.7.3 of the dynamic
+    companion paper is derived under A5, which is why the drop filter is
+    on by default.
 
     Examples
     --------
diff --git a/tests/test_chaisemartin_dhaultfoeuille.py b/tests/test_chaisemartin_dhaultfoeuille.py
@@ -887,6 +887,24 @@ def test_to_dataframe_joiners_leavers(self, results):
         df = results.to_dataframe("joiners_leavers")
         assert len(df) == 3
         assert set(df["estimand"].tolist()) == {"DID_M", "DID_+", "DID_-"}
+        # Round 4: n_cells and n_obs are separate columns with consistent
+        # units across all rows. n_cells counts switching (g, t) cells,
+        # n_obs sums raw observation counts over the same cells. The DID_M
+        # row uses the union of joiner + leaver cells.
+        assert "n_cells" in df.columns
+        assert "n_obs" in df.columns
+        # On balanced 1-obs-per-cell test data, n_cells == n_obs everywhere
+        for _, row in df.iterrows():
+            assert row["n_cells"] == row["n_obs"], (
+                f"On balanced data n_cells should equal n_obs for row "
+                f"{row['estimand']}, got n_cells={row['n_cells']}, "
+                f"n_obs={row['n_obs']}"
+            )
+        # The DID_M row's count is the sum of the DID_+ and DID_- rows'
+        did_m_row = df[df["estimand"] == "DID_M"].iloc[0]
+        did_plus_row = df[df["estimand"] == "DID_+"].iloc[0]
+        did_minus_row = df[df["estimand"] == "DID_-"].iloc[0]
+        assert did_m_row["n_cells"] == did_plus_row["n_cells"] + did_minus_row["n_cells"]
 
     def test_to_dataframe_per_period(self, results):
         df = results.to_dataframe("per_period")