Address CI R8 codex review (1 P1 + 1 P3) on PreTrendsPower PR-B

igerber · claude · igerber · commit cfb3200d1b4d · 2026-05-18T21:15:02.000-04:00
R8 CI codex caught a P1 my local R7 reviewer missed — exactly the
`feedback_local_codex_vs_ci_codex_divergence.md` pattern.

**P1 — MPD non-numeric labels silently fell back to count-based,
undocumented as a deviation in REGISTRY**

R3's MPD branch returned `relative_times=None` for non-numeric
`reference_period` values (string period IDs, etc.), silently using
the legacy count-based normalized direction — but the REGISTRY note
described the γ-unit deviation as "resolved" without qualifying that
exception. Two-part fix:

1. **Better coercion** for datetime-like labels: new module-level
   helper `_coerce_relative_times_from_reference` (`pretrends.py:92`)
   handles three regimes:
   - Numeric (`int` / `float` / `np.int64`) — direct `float()`
   - `pandas.Period` / `Timestamp` / `np.datetime64` — subtraction-
     based offset arithmetic (`.n` for Period, `.days` for Timedelta,
     fall through to `/ np.timedelta64(1, 'D')`)
   - Genuinely non-numeric (string period IDs, unranked categoricals)
     — emits an explicit `UserWarning` documenting that the reported
     MDV is NOT in Roth's γ units under this fallback, and recommends
     re-fitting with numeric labels.

2. **Documentation alignment**: REGISTRY `## PreTrendsPower`
   convention note and METHODOLOGY_REVIEW.md `## PreTrendsPower`
   Verified Components checklist both enumerate the supported label
   types (numeric + pandas.Period + Timestamp + datetime64) and
   explicitly call out the non-numeric warn-and-fallback behavior as
   a documented edge case (not a "resolved" deviation).
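
A minimal, stdlib-only sketch of the warn-and-fallback shape described in the two-part fix above. It is not the library helper: `coerce_relative_times` is a hypothetical name, `datetime.date` stands in for the pandas / numpy datetime types, and the warning text is abbreviated.

```python
import warnings
from datetime import date
from typing import Any, List, Optional


def coerce_relative_times(pre_periods: List[Any], reference: Any) -> Optional[List[float]]:
    """Illustrative stand-in (hypothetical name), not the library helper."""
    # Regime 1: numeric labels, offsets via direct float() coercion.
    try:
        ref = float(reference)
        return [float(p) - ref for p in pre_periods]
    except (TypeError, ValueError):
        pass

    # Regime 2: datetime-like labels whose difference exposes `.days`.
    try:
        return [float((p - reference).days) for p in pre_periods]
    except (TypeError, AttributeError):
        pass

    # Regime 3: genuinely non-numeric labels. Warn, then signal the
    # caller to use the legacy count-based direction by returning None.
    warnings.warn(
        f"reference_period {reference!r} is not numeric or datetime-like; "
        "the reported MDV is NOT in γ units.",
        UserWarning,
        stacklevel=2,
    )
    return None


print(coerce_relative_times([2016, 2017, 2018], 2019))
print(coerce_relative_times([date(2019, 1, 1), date(2019, 1, 21)], date(2019, 1, 31)))
```

Returning `None` rather than raising keeps the fallback path usable, and the warning makes the loss of the γ-unit contract visible to the caller.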

**P3 — `docs/api/pretrends.rst` still referenced removed `custom_delta`
parameter name**

The custom-violation entry in the violation-types section used the
parameter name `custom_delta`, but the actual API exposes
`violation_weights` (both on `PreTrendsPower` and on the helper
functions per PR-B Step 6). Fix: rename in docs and add a one-line
note that both the class and the helpers accept the kwarg.

**Tests** (`tests/test_methodology_pretrends.py::TestPretrendsLinearGrid`):

- `test_mpd_non_numeric_reference_falls_back_to_legacy_weights`
  renamed to `..._warns_and_falls_back...` and now asserts the
  explicit `UserWarning` is emitted (mentioning "γ units").
- NEW `test_mpd_pandas_period_reference_yields_numeric_relative_times`:
  constructs a `MultiPeriodDiDResults` with quarterly `pd.Period`
  pre-periods 2019Q1 through 2019Q3 and
  `reference_period=pd.Period('2019Q4')`, asserts the derived
  `relative_times == [-3, -2, -1]` (quarters) and linear weights
  `[3, 2, 1]` in γ units. Locks in the Period-arithmetic path that
  codex specifically flagged.
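
For reference, the `[3, 2, 1]` expectation follows from the two linear-weight conventions stated in the REGISTRY note. The sketch below restates them in plain Python under assumptions: `linear_weights` is a hypothetical name, not `_get_violation_weights`, and the zero-norm guard is the sketch's own choice.

```python
import math
from typing import List, Optional, Sequence


def linear_weights(
    n_pre: int, relative_times: Optional[Sequence[float]] = None
) -> List[float]:
    """Hypothetical restatement of the two conventions, for illustration."""
    if relative_times is not None:
        # γ-unit convention: weights are |t| with NO L2 normalization.
        return [float(abs(t)) for t in relative_times]
    # Legacy convention: count-based [n_pre-1, ..., 0], L2-normalized.
    raw = [float(k) for k in range(n_pre - 1, -1, -1)]
    norm = math.sqrt(sum(w * w for w in raw))
    # Guard against n_pre == 1, where the raw vector is all zeros
    # (an assumption of this sketch, not documented library behavior).
    return [w / norm for w in raw] if norm > 0 else raw


print(linear_weights(3, [-3, -2, -1]))  # γ-unit weights
print(linear_weights(3))                # legacy normalized direction
```

Only the first form scales with the actual relative-time labels, which is why the fallback MDV is not comparable to γ.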

The P3 R-parity-script placeholder is deferred to PR-C per the
existing TODO row (codex labeled it informational / non-blocker).

Tests: 403 pass across pretrends + DR + BR. 4 skipped (R-parity
stubs + 1 fixture skip). No regressions.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/METHODOLOGY_REVIEW.md b/METHODOLOGY_REVIEW.md
@@ -1063,7 +1063,7 @@ and covariate-adjusted specifications.)
 - [x] Non-bootstrap CS adapter consumes full `event_study_vcov` sub-block (not diag)
 - [x] Non-bootstrap SA adapter consumes full `event_study_vcov` sub-block (W-matrix construction `event_study_vcov = W @ vcov_cohort @ W.T` added to `SunAbrahamResults`)
 - [x] Bootstrap CS/SA and replicate-weight survey paths fall through to `diag(ses^2)` (analytical VCV cleared to prevent mixing with bootstrap/replicate SE overrides)
-- [x] `_get_violation_weights('linear')` honors actual pre-period relative-time labels via `fit()` threading → reported MDV is in Roth's γ units on irregular and anticipation-shifted grids
+- [x] `_get_violation_weights('linear')` honors actual pre-period relative-time labels via `fit()` threading → reported MDV is in Roth's γ units on irregular and anticipation-shifted grids. For `MultiPeriodDiDResults`, supported label types are numeric (`int` / `float` / `np.int64`) and `pandas.Period` / `pandas.Timestamp` / `np.datetime64`; **genuinely non-numeric labels** (string period IDs, unranked categoricals) emit an explicit `UserWarning` and fall through to the legacy count-based normalized direction (MDV is NOT in γ units in that case — re-fit with numeric labels)
 - [x] `PreTrendsPowerResults` persists fitted `violation_weights` + `pretest_form` + `nis_box_probability`; `power_at(M)` works for all four violation types on fresh fits
 - [x] Helper API (`compute_pretrends_power`, `compute_mdv`) accepts `violation_weights` and `pretest_form`; closes the PR-A R18 helper/class API gap
 - [x] Summary, `to_dict`, `to_dataframe` dispatch on `pretest_form` (NIS prints box probability; Wald prints noncentrality)
diff --git a/diff_diff/pretrends.py b/diff_diff/pretrends.py
@@ -25,6 +25,7 @@
 diff_diff.honest_did - Sensitivity analysis for parallel trends violations
 """
 
+import warnings
 from dataclasses import dataclass, field
 from typing import Any, Dict, List, Literal, Optional, Tuple, Union
 
@@ -88,6 +89,89 @@ def _compute_nis_acceptance_prob(
     return float(np.clip(accept_prob, 0.0, 1.0))
 
 
+def _coerce_relative_times_from_reference(
+    estimated_pre_periods: List[Any],
+    reference_period: Any,
+) -> Optional[np.ndarray]:
+    """
+    Convert ``estimated_pre_periods`` to Roth-style relative-time offsets
+    from a numeric / Period / datetime ``reference_period``.
+
+    Returns ``np.ndarray`` of float relative times when conversion succeeds,
+    or ``None`` when the labels are genuinely non-numeric / unordered
+    (string period IDs, categoricals, etc.). In the ``None`` case, the
+    caller's downstream linear-violation weight construction falls back to
+    the legacy count-based normalized direction — the reported MDV is then
+    NOT in Roth's γ units. We emit a ``UserWarning`` so the user knows
+    the γ-unit contract did not hold and can re-fit with numeric labels.
+
+    Supported regimes:
+
+    - Numeric (``int`` / ``float`` / ``np.int64``): direct ``float()``
+      coercion gives the correct relative offset.
+    - ``pandas.Period`` / ``pandas.Timestamp`` / ``np.datetime64``: period
+      arithmetic returns an offset / ``Timedelta`` that we coerce to a
+      float via ``.n`` (for Period frequencies) or ``.days`` (for
+      Timedelta-like). The result is in units of the reference's
+      frequency for Period, days for Timestamp / datetime64 — the linear
+      γ-units scale is per-unit-of-frequency.
+    - Anything else (string period IDs, categoricals with no ordering,
+      mixed types): returns ``None`` with a warning.
+    """
+    # Path 1: direct float coercion (numeric scalars).
+    try:
+        ref_float = float(reference_period)
+        return np.asarray(
+            [float(p) - ref_float for p in estimated_pre_periods],
+            dtype=float,
+        )
+    except (TypeError, ValueError):
+        pass
+
+    # Path 2: pandas.Period / pandas.Timestamp / datetime64 — try
+    # subtraction-based offset arithmetic.
+    try:
+        diffs = [p - reference_period for p in estimated_pre_periods]
+        floats: List[float] = []
+        for d in diffs:
+            # pandas.tseries.offsets.* or pandas.Period offset — has `.n`.
+            n_attr = getattr(d, "n", None)
+            if n_attr is not None:
+                floats.append(float(n_attr))
+                continue
+            # pandas.Timedelta / datetime.timedelta: whole days via `.days`.
+            days_attr = getattr(d, "days", None)
+            if days_attr is not None:
+                floats.append(float(days_attr))
+                continue
+            # Bare numpy.timedelta64 fallback.
+            try:
+                floats.append(float(d / np.timedelta64(1, "D")))
+                continue
+            except (TypeError, ValueError):
+                raise TypeError(
+                    f"cannot coerce difference {d!r} of type {type(d).__name__} "
+                    "to float days/periods"
+                )
+        return np.asarray(floats, dtype=float)
+    except (TypeError, ValueError):
+        pass
+
+    # Path 3: genuinely non-numeric labels — warn and fall back to legacy.
+    warnings.warn(
+        f"PreTrendsPower: reference_period {reference_period!r} (type "
+        f"{type(reference_period).__name__}) is not numeric or datetime-like, "
+        "so per-period relative times cannot be derived. Linear-violation "
+        "weights will use the legacy count-based [n_pre-1, ..., 0]/||·||_2 "
+        "direction; the reported MDV is NOT in Roth (2022) γ units. Re-fit "
+        "with numeric period labels (int year, pandas.Period, datetime) to "
+        "obtain γ-unit MDV.",
+        UserWarning,
+        stacklevel=3,
+    )
+    return None
+
+
 def _extract_event_study_vcov_subblock(
     results: Any,
     pre_periods: List[int],
@@ -914,27 +998,27 @@ def _extract_pre_period_params(
             # For MultiPeriodDiDResults, period identifiers are generic
             # (often calendar years, sometimes pre-shifted relative times).
             # Roth's δ_t = γ·t convention needs RELATIVE offsets from the
-            # treatment / reference period. Derive them from
-            # `results.reference_period` when numeric:
-            #   relative_times = estimated_pre_periods - reference_period
-            # If `reference_period` is None or non-numeric (string, categorical),
-            # return None so `_get_violation_weights('linear')` falls back to
-            # the legacy count-based [n_pre-1, ..., 0] / ||·||_2 direction
-            # (the pre-PR-B shipped behavior; preserves backwards-compat for
-            # MPD callers that don't expose a numeric reference period).
+            # treatment / reference period. Three label-type regimes:
+            #
+            #   1. Numeric (int / float / np.int64) — direct float() coercion
+            #      gives the correct relative offset.
+            #   2. pandas.Period: ``p - ref`` returns a DateOffset whose
+            #      `n` attribute is the signed number of frequency steps
+            #      (e.g. quarters), which we cast to float. Datetime-like
+            #      labels (Timestamp, np.datetime64) go through the same
+            #      subtraction path and are converted to days via
+            #      Timedelta / numpy timedelta semantics.
+            #   3. Genuinely non-numeric / unordered labels (string period
+            #      IDs, categoricals without a ranking) — emit an explicit
+            #      UserWarning and fall back to the legacy count-based
+            #      [n_pre-1, ..., 0] / ||·||_2 normalized direction. The
+            #      reported MDV under this fallback is NOT in Roth's γ
+            #      units; users on non-numeric labels who need γ-unit MDV
+            #      should re-fit with numeric period labels.
             ref = getattr(results, "reference_period", None)
             relative_times: Optional[np.ndarray] = None
             if ref is not None:
-                try:
-                    ref_float = float(ref)
-                    relative_times = np.asarray(
-                        [float(p) - ref_float for p in estimated_pre_periods],
-                        dtype=float,
-                    )
-                except (TypeError, ValueError):
-                    # Non-numeric labels (string period IDs, etc.) — fall
-                    # back to legacy normalized linear direction.
-                    relative_times = None
+                relative_times = _coerce_relative_times_from_reference(estimated_pre_periods, ref)
             return effects, ses, vcov, n_pre, relative_times, covariance_source
 
         # Try CallawaySantAnnaResults
diff --git a/docs/api/pretrends.rst b/docs/api/pretrends.rst
@@ -133,7 +133,9 @@ The module supports several types of pre-trends violations:
    ``delta[-1] = M``, all other pre-periods are zero.
 
 **custom**
-   User-specified violation pattern via the ``custom_delta`` parameter.
+   User-specified violation pattern via the ``violation_weights`` parameter.
+   Accepted by both ``PreTrendsPower`` (constructor kwarg) and the convenience
+   helpers ``compute_pretrends_power`` / ``compute_mdv`` (forwarded kwarg).
 
 Complete Example
 ----------------
diff --git a/docs/methodology/REGISTRY.md b/docs/methodology/REGISTRY.md
@@ -2817,7 +2817,7 @@ Violation types:
 
 - **Note (paper-supported alternative — Wald pretest form):** the library retains the Wald noncentral-χ² form as `pretest_form='wald'`. NIS is the paper's primary analysis convention (used for all 12 surveyed papers' empirical exercises in Section I), but the Wald form is also a paper-supported alternative: Roth's Propositions 1, 3, and 4 apply to any (measurable) acceptance region for the conditional moments (Props 1+3) and to any convex acceptance region for the variance-reduction guarantee (Prop 4). The Wald ellipsoid is convex, so all four propositions apply. Wald is faster (no MVN CDF call) and matches the pre-PR-B shipped numerical baseline. Use Wald for backwards-compat / speed; use NIS for canonical paper alignment and R `pretrends` parity.
 
-- **Note (convention — `linear` violation pattern, γ-unit MDV):** `_get_violation_weights('linear')` consumes actual pre-period relative-time labels threaded through `fit()` (PR-B 2026-05-17 resolution of the PR-A linear-pattern deviation). When `relative_times` is provided (e.g., `[-3, -2, -1]` for a regular grid or `[-5, -3, -1]` for an irregular grid), weights = `|t|` directly with NO L2 normalization, so `δ_pre = M · |t|` reflects Roth's `δ_t = γ · t` convention and the reported MDV equals γ. Callers that bypass `fit()` and supply only `n_pre` retain the previous count-based, L2-normalized `[n_pre-1, ..., 0]` direction (preserves shipped Wald numerical baselines for unit tests).
+- **Note (convention — `linear` violation pattern, γ-unit MDV):** `_get_violation_weights('linear')` consumes actual pre-period relative-time labels threaded through `fit()` (PR-B 2026-05-17 resolution of the PR-A linear-pattern deviation). When `relative_times` is provided (e.g., `[-3, -2, -1]` for a regular grid or `[-5, -3, -1]` for an irregular grid), weights = `|t|` directly with NO L2 normalization, so `δ_pre = M · |t|` reflects Roth's `δ_t = γ · t` convention and the reported MDV equals γ. Callers that bypass `fit()` and supply only `n_pre` retain the previous count-based, L2-normalized `[n_pre-1, ..., 0]` direction (preserves shipped Wald numerical baselines for unit tests). **MPD period-label coverage:** for `MultiPeriodDiDResults`, the relative-time derivation in `_extract_pre_period_params` supports numeric labels (`int` / `float` / `np.int64`) and `pandas.Period` / `pandas.Timestamp` / `np.datetime64` (via Period or Timedelta arithmetic with units of frequency / days respectively). For genuinely non-numeric or unordered labels (string period IDs, unranked categoricals), the helper emits an explicit `UserWarning` and falls back to the legacy count-based normalized direction — the reported MDV is then NOT in Roth's γ units. Users on string period IDs who need γ-unit MDV should re-fit with numeric labels.
 
 *Standard errors:*
 - Power calculations are exact (no sampling variability — power is computed against a hypothesized population trend, not estimated)
diff --git a/tests/test_methodology_pretrends.py b/tests/test_methodology_pretrends.py
@@ -526,15 +526,20 @@ def test_mpd_calendar_period_ids_derive_relative_times_from_reference(self):
         weights = pt._get_violation_weights(n_pre, relative_times=relative_times)
         np.testing.assert_allclose(weights, [4.0, 3.0, 2.0, 1.0])
 
-    def test_mpd_non_numeric_reference_falls_back_to_legacy_weights(self):
-        """MPD with non-numeric reference_period falls back to legacy direction.
+    def test_mpd_non_numeric_reference_warns_and_falls_back_to_legacy_weights(self):
+        """MPD with non-numeric reference_period warns + falls back to legacy.
 
-        When ``reference_period`` is a string / categorical (e.g., "2019Q4"),
-        the MPD branch returns ``relative_times=None`` so
+        When ``reference_period`` is a genuinely non-numeric / non-datetime
+        label (e.g., the string "REF_STRING"), the MPD branch emits an
+        explicit ``UserWarning`` and returns ``relative_times=None`` so
         ``_get_violation_weights('linear')`` uses the legacy count-based
-        direction. Preserves backwards-compat for MPD callers that don't
-        expose a numeric reference period.
+        direction. The warning surfaces the contract that the reported
+        MDV is NOT in Roth's γ units under this fallback (R8 CI codex
+        fix: was previously a silent fallback, undocumented as a
+        deviation in REGISTRY).
         """
+        import warnings as _warnings
+
         from diff_diff.results import MultiPeriodDiDResults, PeriodEffect
 
         period_ids = ["A", "B", "C"]
@@ -556,12 +561,72 @@ def test_mpd_non_numeric_reference_falls_back_to_legacy_weights(self):
             n_control=50,
             pre_periods=period_ids,
             post_periods=["D", "E"],
-            reference_period="REF_STRING",  # non-numeric
+            reference_period="REF_STRING",  # non-numeric, non-datetime
         )
 
         pt = PreTrendsPower(pretest_form="nis", violation_type="linear")
-        _, _, _, _, relative_times, _ = pt._extract_pre_period_params(mpd_results)
+        with _warnings.catch_warnings(record=True) as caught:
+            _warnings.simplefilter("always")
+            _, _, _, _, relative_times, _ = pt._extract_pre_period_params(mpd_results)
+
         assert relative_times is None, "Non-numeric reference should yield None"
+        nis_warns = [
+            w
+            for w in caught
+            if "reference_period" in str(w.message) and "γ units" in str(w.message)
+        ]
+        assert len(nis_warns) >= 1, (
+            "Non-numeric reference_period must emit an explicit UserWarning "
+            f"noting the γ-unit contract is not held; got warnings: {[str(w.message) for w in caught]}"
+        )
+
+    def test_mpd_pandas_period_reference_yields_numeric_relative_times(self):
+        """MPD with pandas.Period reference_period produces γ-unit weights.
+
+        Quarterly-Period labels ``[2019Q1, 2019Q2, 2019Q3]`` with
+        ``reference_period=2019Q4`` produce relative offsets in units of
+        quarters: ``[-3, -2, -1]``. Validates the R8 CI codex fix:
+        datetime-like labels no longer fall through silently, because
+        Period / Timestamp arithmetic supplies the γ-unit relative times
+        that the legacy fallback would have lost.
+        """
+        from diff_diff.results import MultiPeriodDiDResults, PeriodEffect
+
+        periods = [pd.Period(f"2019Q{q}", freq="Q") for q in (1, 2, 3)]
+        reference_period = pd.Period("2019Q4", freq="Q")
+        period_effects = {
+            p: PeriodEffect(
+                period=p, effect=0.1, se=0.2, t_stat=0.0, p_value=0.5, conf_int=(0.0, 0.0)
+            )
+            for p in periods
+        }
+        mpd_results = MultiPeriodDiDResults(
+            period_effects=period_effects,
+            avg_att=0.0,
+            avg_se=0.2,
+            avg_t_stat=0.0,
+            avg_p_value=0.5,
+            avg_conf_int=(0.0, 0.0),
+            n_obs=100,
+            n_treated=50,
+            n_control=50,
+            pre_periods=periods,
+            post_periods=[pd.Period(f"2020Q{q}", freq="Q") for q in (1, 2)],
+            reference_period=reference_period,
+        )
+
+        pt = PreTrendsPower(pretest_form="nis", violation_type="linear")
+        _, _, _, n_pre, relative_times, _ = pt._extract_pre_period_params(mpd_results)
+
+        # Period subtraction yields a Period offset whose `.n` is the
+        # number-of-frequencies difference; signs matter and pre-periods
+        # are NEGATIVE offsets from the reference.
+        assert relative_times is not None
+        np.testing.assert_allclose(relative_times, [-3.0, -2.0, -1.0])
+
+        # Plumbed through to linear weights: |t| = [3, 2, 1] in γ units.
+        weights = pt._get_violation_weights(n_pre, relative_times=relative_times)
+        np.testing.assert_allclose(weights, [3.0, 2.0, 1.0])
 
     def test_backwards_compat_no_relative_times_uses_legacy_normalized(self):
         """Without relative_times: legacy [n-1, ..., 0]/||·||_2 direction.