Thread vcov_type through MultiPeriodDiD and TwoWayFixedEffects

igerber · claude · igerber · commit d907eca4b034 · 2026-04-18T21:06:47.000-04:00
CI review caught that Phase 1a wired vcov_type into DifferenceInDifferences
__init__/get_params but not into the overridden fit() paths on
MultiPeriodDiD and TwoWayFixedEffects, so `vcov_type="hc2_bm"` on either
silently produced HC1 inference. Summary output also mislabeled
wild-bootstrap inference with the analytical variance family.
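
The failure mode can be reproduced with toy classes. This is a minimal
sketch, not the library's code: `solve`, `Base`, `BuggyChild` and
`FixedChild` are hypothetical stand-ins for solve_ols, the base estimator
and the two subclasses that override fit(). The parameter is stored and
reported by get_params(), but the overridden fit() never forwards it, so
the solver falls back to its default.

```python
def solve(vcov_type="hc1"):
    """Hypothetical solver stand-in: returns the variance family it used."""
    return vcov_type


class Base:
    def __init__(self, vcov_type="hc1"):
        self.vcov_type = vcov_type

    def get_params(self):
        return {"vcov_type": self.vcov_type}

    def fit(self):
        # The base class forwards the parameter correctly.
        return solve(vcov_type=self.vcov_type)


class BuggyChild(Base):
    def fit(self):
        # Overridden fit() forgets to forward self.vcov_type,
        # so the solver silently uses its default.
        return solve()


class FixedChild(Base):
    def fit(self):
        # Fix: thread the stored parameter through the override.
        return solve(vcov_type=self.vcov_type)


buggy = BuggyChild(vcov_type="hc2_bm")
fixed = FixedChild(vcov_type="hc2_bm")
buggy_used = buggy.fit()
fixed_used = fixed.fit()
print(buggy.get_params()["vcov_type"], buggy_used, fixed_used)
```

get_params() reports the requested family in both cases, which is why the
bug is invisible without an end-to-end check on the fitted output.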

- diff_diff/estimators.py MultiPeriodDiD.fit: pass vcov_type=self.vcov_type
  into the analytical solve_ols call; remove the `not self.robust`
  homoskedastic fallback (subsumed by compute_robust_vcov's classical
  branch). When vcov_type="hc2_bm" and no survey design, compute
  Bell-McCaffrey Satterthwaite DOF via _compute_bm_dof_from_contrasts for
  both per-coefficient period effects AND the post-period-average
  contrast; fall back to the shared analytical df otherwise. Store
  vcov_type and cluster_name on MultiPeriodDiDResults.
- diff_diff/twfe.py: forward self.robust and self.vcov_type into the two
  LinearRegression instantiations; store vcov_type and the TWFE
  auto-cluster label (or explicit self.cluster) on DiDResults.
- diff_diff/linalg.py: split _compute_bm_dof_oneway into a contrast-aware
  helper _compute_bm_dof_from_contrasts(X, bread, h_diag, contrasts) so
  MultiPeriodDiD can request BM DOF for the avg_att linear combination.
  The per-coefficient wrapper now delegates to the shared helper with
  contrasts=I_k.
- diff_diff/results.py DiDResults.summary and
  MultiPeriodDiDResults.summary: gate the Variance family label on
  inference_method == "analytical" so wild-bootstrap output is no longer
  mislabeled; add vcov_type, cluster_name, inference_method, n_bootstrap,
  n_clusters fields to MultiPeriodDiDResults for symmetry with DiDResults
  and to drive the summary label.
- tests/test_estimators_vcov_type.py: add five end-to-end tests exercising
  the previously-untested paths - MultiPeriodDiD classical vs hc1 SE
  differ; MultiPeriodDiD hc2_bm CI is finite; TWFE hc1 vs hc2_bm SE differ
  (CR1 vs CR2); TWFE records the unit auto-cluster label in summary;
  wild-bootstrap with cluster suppresses the Variance line.
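
The contrast-aware DOF computation can be sketched in a few lines of NumPy.
This is an unweighted illustration under stated assumptions, not the
library helper: `bm_dof_from_contrasts` is a hypothetical name, it builds
the hat diagonals itself rather than taking bread and h_diag as arguments,
and the design matrix and the averaging contrast are made up for the
example. It follows the same formula as the docstring in the diff below:
DOF(c) = (sum_i q(i)^2)^2 / sum_{i,k} a(i) a(k) M_{ik}^2.

```python
import numpy as np


def bm_dof_from_contrasts(X, contrasts):
    """Satterthwaite DOF per contrast column (unweighted sketch)."""
    n, k = X.shape
    bread = X.T @ X
    # q[:, j] = X (X'X)^{-1} c_j for each contrast column c_j.
    q = X @ np.linalg.solve(bread, contrasts)
    # Hat matrix and residual maker M = I - H, built once.
    H = X @ np.linalg.solve(bread, X.T)
    M_sq = (np.eye(n) - H) ** 2  # elementwise square
    one_minus_h = np.maximum(1.0 - np.diag(H), 1e-10)
    m = contrasts.shape[1]
    dof = np.empty(m)
    for j in range(m):
        q_sq = q[:, j] ** 2
        a = q_sq / one_minus_h
        den = float(a @ M_sq @ a)
        dof[j] = q_sq.sum() ** 2 / den if den > 0 else np.nan
    return dof


rng = np.random.default_rng(0)
n, k = 40, 4
X = np.column_stack([np.ones(n), rng.normal(size=(n, k - 1))])
# One column per coefficient plus one averaging contrast over the last
# two coefficients (playing the role of the post-period average).
avg = np.array([0.0, 0.0, 0.5, 0.5])
contrasts = np.column_stack([np.eye(k), avg])
dof = bm_dof_from_contrasts(X, contrasts)
print(dof)
```

The first k entries match what an identity-only call returns, which is the
relationship the per-coefficient wrapper relies on; the last entry is the
DOF for the linear combination.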

All 209 Phase 1a suites plus 145 estimator regression tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
diff --git a/diff_diff/estimators.py b/diff_diff/estimators.py
@@ -1303,6 +1303,7 @@ def fit(  # type: ignore[override]
             rank_deficient_action=self.rank_deficient_action,
             weights=survey_weights,
             weight_type=survey_weight_type,
+            vcov_type=self.vcov_type,
         )
 
         # Compute survey vcov if applicable
@@ -1423,25 +1424,70 @@ def _refit_mp_absorb(w_r):
             )
             df = None
 
-        # For non-robust, non-clustered case, we need homoskedastic vcov
-        # solve_ols returns HC1 by default, so compute homoskedastic if needed
-        if not self.robust and self.cluster is None and survey_weights is None:
-            n = len(y)
-            mse = np.sum(residuals**2) / (n - k_effective)
-            # Use solve() instead of inv() for numerical stability
-            # Only compute for identified columns (non-NaN coefficients)
-            identified_mask = ~np.isnan(coefficients)
-            if np.all(identified_mask):
-                vcov = np.linalg.solve(X.T @ X, mse * np.eye(X.shape[1]))
-            else:
-                # For rank-deficient case, compute vcov on reduced matrix then expand
-                X_reduced = X[:, identified_mask]
-                vcov_reduced = np.linalg.solve(
-                    X_reduced.T @ X_reduced, mse * np.eye(X_reduced.shape[1])
+        # Note: the prior homoskedastic-vcov fallback conditioned on
+        # `not self.robust` has been subsumed by the vcov_type dispatch in
+        # solve_ols above, which routes vcov_type="classical" through
+        # compute_robust_vcov's classical branch (identical math). The
+        # explicit branch is no longer needed; vcov above already matches the
+        # requested variance family.
+
+        # For hc2_bm with a non-survey fit, compute per-coefficient and
+        # per-contrast Bell-McCaffrey Satterthwaite DOF so period-specific
+        # effects and the post-period average use correct small-sample DOF
+        # rather than the shared n-k fallback.
+        _bm_dof_per_coef: Optional[np.ndarray] = None
+        _bm_dof_avg: Optional[float] = None
+        if (
+            self.vcov_type == "hc2_bm"
+            and not _use_survey_vcov
+            and vcov is not None
+            and not np.all(np.isnan(coefficients))
+        ):
+            from diff_diff.linalg import (
+                _compute_bm_dof_from_contrasts,
+                _compute_hat_diagonals,
+            )
+
+            _identified = ~np.isnan(coefficients)
+            _kept = np.where(_identified)[0]
+            if len(_kept) > 0:
+                X_kept = X[:, _kept]
+                bread_kept = X_kept.T @ (
+                    X_kept * survey_weights[:, np.newaxis]
+                    if survey_weights is not None
+                    else X_kept
+                )
+                h_diag_kept = _compute_hat_diagonals(
+                    X_kept, bread_kept, weights=survey_weights
+                )
+                # Build the contrast matrix: one column per identified coefficient
+                # plus one column for the post-period average contrast (1/n_post
+                # on each post-period interaction column, 0 elsewhere).
+                n_kept = len(_kept)
+                # Post-period contrast in full-width k dims, then subset to kept
+                post_contrast_full = np.zeros(X.shape[1])
+                _n_post = len(post_periods)
+                if _n_post > 0:
+                    for _p in post_periods:
+                        post_contrast_full[interaction_indices[_p]] = 1.0 / _n_post
+                post_contrast_kept = post_contrast_full[_kept]
+                contrasts = np.column_stack(
+                    [np.eye(n_kept), post_contrast_kept[:, np.newaxis]]
                 )
-                # Expand to full size with NaN for dropped columns
-                vcov = np.full((X.shape[1], X.shape[1]), np.nan)
-                vcov[np.ix_(identified_mask, identified_mask)] = vcov_reduced
+                _dof_all = _compute_bm_dof_from_contrasts(
+                    X_kept,
+                    bread_kept,
+                    h_diag_kept,
+                    contrasts,
+                    weights=survey_weights,
+                )
+                # Expand per-coefficient DOF back to full width (NaN for dropped).
+                _bm_dof_per_coef = np.full(X.shape[1], np.nan)
+                _bm_dof_per_coef[_kept] = _dof_all[:n_kept]
+                # Post-period average: last contrast column.
+                # Only meaningful if all post-period coefs are identified.
+                if np.all(_identified[[interaction_indices[p] for p in post_periods]]):
+                    _bm_dof_avg = float(_dof_all[-1])
 
         # Extract period-specific treatment effects for ALL non-reference periods
         period_effects = {}
@@ -1453,7 +1499,14 @@ def _refit_mp_absorb(w_r):
             idx = interaction_indices[period]
             effect = coefficients[idx]
             se = np.sqrt(vcov[idx, idx])
-            t_stat, p_value, conf_int = safe_inference(effect, se, alpha=self.alpha, df=df)
+            # Prefer per-coefficient BM DOF when available (hc2_bm path);
+            # otherwise fall back to the shared analytical df.
+            period_df = df
+            if _bm_dof_per_coef is not None and np.isfinite(_bm_dof_per_coef[idx]):
+                period_df = float(_bm_dof_per_coef[idx])
+            t_stat, p_value, conf_int = safe_inference(
+                effect, se, alpha=self.alpha, df=period_df
+            )
 
             period_effects[period] = PeriodEffect(
                 period=period,
@@ -1497,8 +1550,11 @@ def _refit_mp_absorb(w_r):
                 avg_conf_int = (np.nan, np.nan)
             else:
                 avg_se = float(np.sqrt(avg_var))
+                # Prefer the contrast-specific BM DOF for the post-period average
+                # when hc2_bm is in use; otherwise fall back to the shared df.
+                _avg_df = _bm_dof_avg if _bm_dof_avg is not None else df
                 avg_t_stat, avg_p_value, avg_conf_int = safe_inference(
-                    avg_att, avg_se, alpha=self.alpha, df=df
+                    avg_att, avg_se, alpha=self.alpha, df=_avg_df
                 )
 
         # Count observations (use raw counts to avoid demeaned values from absorb)
@@ -1530,6 +1586,13 @@ def _refit_mp_absorb(w_r):
             reference_period=reference_period,
             interaction_indices=interaction_indices,
             survey_metadata=survey_metadata,
+            vcov_type=self.vcov_type,
+            cluster_name=self.cluster,
+            n_clusters=(
+                len(np.unique(effective_cluster_ids))
+                if effective_cluster_ids is not None
+                else None
+            ),
         )
 
         self._coefficients = coefficients
diff --git a/diff_diff/linalg.py b/diff_diff/linalg.py
@@ -1341,69 +1341,96 @@ def _compute_cr2_bm(
     return vcov, dof_vec
 
 
-def _compute_bm_dof_oneway(
+def _compute_bm_dof_from_contrasts(
     X: np.ndarray,
     bread_matrix: np.ndarray,
     h_diag: np.ndarray,
+    contrasts: np.ndarray,
     weights: Optional[np.ndarray] = None,
 ) -> np.ndarray:
-    """Per-coefficient Bell-McCaffrey (Imbens-Kolesar 2016) DOF vector.
+    """Per-contrast Bell-McCaffrey (Imbens-Kolesar 2016) Satterthwaite DOF.
 
-    For contrast ``c_j = e_j`` (the j-th standard basis vector), define
-    ``q_j = X (X'WX)^{-1} c_j`` (length ``n``). Under a homoskedastic null,
-    the HC2 variance estimator for ``c_j' beta`` has a weighted-chi-squared
+    For each column ``c`` of ``contrasts`` (shape ``(k, m)``), define
+    ``q = X (X'WX)^{-1} c`` (length ``n``). Under a homoskedastic null, the
+    HC2 variance estimator for ``c' beta`` has a weighted-chi-squared
     distribution; matching mean and variance via Satterthwaite gives
 
-        DOF_j = (sum_i q_j(i)^2)^2 / sum_{i,k} a_j(i) a_j(k) M_{ik}^2
+        DOF(c) = (sum_i q(i)^2)^2 / sum_{i, k} a(i) a(k) M_{ik}^2
+
+    where ``M = I - H`` and ``a(i) = q(i)^2 / (1 - h_ii)``. Using the idempotent
+    identity ``M^2 = M``, ``trace(B) = sum_i q(i)^2`` matches the numerator.
+
+    Allocates an ``(n, n)`` temporary for ``M`` so the cost is ``O(n^2 k)`` for
+    the hat build plus ``O(n^2 m)`` for the per-contrast sums. Practical for
+    ``n < 10_000``; larger designs should switch to a scores-based formulation
+    (tracked in TODO.md).
 
-    where ``M = I - H`` and ``a_j(i) = q_j(i)^2 / (1 - h_ii)``. Using the
-    identity ``M^2 = M`` (M is idempotent), ``trace(B) = sum_i q_j(i)^2``
-    which matches the numerator.
+    Parameters
+    ----------
+    X : ndarray of shape (n, k)
+    bread_matrix : ndarray of shape (k, k) == (X'WX) or (X'X)
+    h_diag : ndarray of shape (n,), hat-matrix diagonals (already weighted)
+    contrasts : ndarray of shape (k, m). Pass ``np.eye(k)`` for per-coefficient DOF.
+    weights : optional weights (shape ``(n,)``) used to build the weighted hat
+        matrix. When ``None``, unweighted.
 
-    Allocates an ``(n, n)`` temporary for the sum and so is ``O(n^2 k)``.
-    Practical for ``n < 10_000``; larger designs should switch to a
-    scores-based formulation (tracked in TODO.md).
+    Returns
+    -------
+    ndarray of shape (m,) of Satterthwaite DOF per contrast column. NaN when
+    ``den <= 0`` (degenerate case).
     """
     n, k = X.shape
-    # q_cols[:, j] = X (bread_inv e_j) is column j of X bread_inv^T. Since
-    # bread_matrix is symmetric, bread_inv^T = bread_inv, so q_cols = X bread_inv.
+    if contrasts.ndim != 2 or contrasts.shape[0] != k:
+        raise ValueError(
+            f"contrasts must have shape (k={k}, m); got {contrasts.shape}"
+        )
     try:
-        q_cols = np.linalg.solve(bread_matrix, np.eye(k))  # (k, k), bread^{-1}
+        bread_inv_c = np.linalg.solve(bread_matrix, contrasts)
     except np.linalg.LinAlgError as e:
         if "Singular" in str(e):
             raise ValueError(
                 "Design matrix is rank-deficient (singular X'X matrix). "
                 "Cannot compute Bell-McCaffrey DOF."
             ) from e
         raise
-    # q_ij = X @ bread_inv has shape (n, k)
-    q = X @ q_cols
-    # M = I - H where H = X (X'WX)^{-1} X' (or its weighted analogue). For DOF,
-    # the relevant M is the residual-maker under the same weighting used for the
-    # hat diagonals, so H_ij = w_j * x_i' (X'WX)^{-1} x_j when weights are
-    # present. Build H explicitly (O(n^2 k) memory/time).
+    # q has shape (n, m); column j is X @ (bread_inv @ contrasts[:, j]).
+    q = X @ bread_inv_c
+    # Build the weighted residual-maker M = I - H once.
     if weights is not None:
         H = X @ np.linalg.solve(bread_matrix, (X * weights[:, np.newaxis]).T)
     else:
         H = X @ np.linalg.solve(bread_matrix, X.T)
     M = np.eye(n) - H
-    M_sq = M * M  # elementwise square; also equal to M*M^T when M is symmetric
-
-    # Guard 1 - h_ii away from zero so `a` stays finite. The calling function
-    # has already warned/fallback-handled the h_ii > 1 case; this is a
-    # float-stability belt-and-suspenders.
+    M_sq = M * M  # elementwise square
     one_minus_h = np.maximum(1.0 - h_diag, 1e-10)
-    dof = np.empty(k)
-    for j in range(k):
-        qj = q[:, j]
-        qj_sq = qj * qj
+    m = contrasts.shape[1]
+    dof = np.empty(m)
+    for j in range(m):
+        qj_sq = q[:, j] * q[:, j]
         num = qj_sq.sum() ** 2
         a_j = qj_sq / one_minus_h
         den = float(a_j @ M_sq @ a_j)
         dof[j] = num / den if den > 0 else np.nan
     return dof
 
 
+def _compute_bm_dof_oneway(
+    X: np.ndarray,
+    bread_matrix: np.ndarray,
+    h_diag: np.ndarray,
+    weights: Optional[np.ndarray] = None,
+) -> np.ndarray:
+    """Per-coefficient Bell-McCaffrey DOF vector (Imbens-Kolesar 2016).
+
+    Thin wrapper over :func:`_compute_bm_dof_from_contrasts` with
+    ``contrasts = I_k``, so each column picks out one coefficient.
+    """
+    k = X.shape[1]
+    return _compute_bm_dof_from_contrasts(
+        X, bread_matrix, h_diag, np.eye(k), weights=weights
+    )
+
+
 def _compute_robust_vcov_numpy(
     X: np.ndarray,
     residuals: np.ndarray,
diff --git a/diff_diff/results.py b/diff_diff/results.py
@@ -192,8 +192,10 @@ def summary(self, alpha: Optional[float] = None) -> str:
             if self.n_clusters is not None:
                 lines.append(f"{'Number of clusters:':<25} {self.n_clusters:>10}")
 
-        # Add variance family label (vcov_type) when set.
-        if self.vcov_type is not None:
+        # Add variance family label (vcov_type) only when inference was analytical.
+        # For wild-bootstrap etc. the reported SE/CI come from resampling, so the
+        # analytical variance family would mislabel the actual inference source.
+        if self.vcov_type is not None and self.inference_method == "analytical":
             label = _format_vcov_label(
                 self.vcov_type,
                 cluster_name=self.cluster_name,
@@ -426,6 +428,14 @@ class MultiPeriodDiDResults:
     interaction_indices: Optional[Dict[Any, int]] = field(default=None, repr=False)
     # Survey design metadata (SurveyMetadata instance from diff_diff.survey)
     survey_metadata: Optional[Any] = field(default=None)
+    # Inference method (always "analytical" today for MultiPeriodDiD; included for
+    # symmetry with DiDResults and so summary() can gate the Variance label).
+    inference_method: str = field(default="analytical")
+    n_bootstrap: Optional[int] = field(default=None)
+    n_clusters: Optional[int] = field(default=None)
+    # Variance-covariance family and cluster column for summary() labeling.
+    vcov_type: Optional[str] = field(default=None)
+    cluster_name: Optional[str] = field(default=None)
 
     def __repr__(self) -> str:
         """Concise string representation."""
@@ -493,6 +503,17 @@ def summary(self, alpha: Optional[float] = None) -> str:
             sm = self.survey_metadata
             lines.extend(_format_survey_block(sm, 80))
 
+        # Variance family label (only when inference was analytical).
+        if self.vcov_type is not None and self.inference_method == "analytical":
+            label = _format_vcov_label(
+                self.vcov_type,
+                cluster_name=self.cluster_name,
+                n_clusters=self.n_clusters,
+                n_obs=self.n_obs,
+            )
+            if label is not None:
+                lines.append(f"{'Variance:':<25} {label:>50}")
+
         # Pre-period effects (parallel trends test)
         pre_effects = {p: pe for p, pe in self.period_effects.items() if p in self.pre_periods}
         if pre_effects:
diff --git a/diff_diff/twfe.py b/diff_diff/twfe.py
@@ -216,21 +216,22 @@ def fit(  # type: ignore[override]
         if self.rank_deficient_action == "error":
             reg = LinearRegression(
                 include_intercept=False,
-                robust=True,
+                robust=self.robust,
                 cluster_ids=survey_cluster_ids if self.inference != "wild_bootstrap" else None,
                 alpha=self.alpha,
                 rank_deficient_action="error",
                 weights=survey_weights,
                 weight_type=survey_weight_type,
                 survey_design=_lr_survey_twfe,
+                vcov_type=self.vcov_type,
             ).fit(X, y, df_adjustment=df_adjustment)
         else:
             # Suppress generic warning, TWFE provides context-specific messages below
             with warnings.catch_warnings():
                 warnings.filterwarnings("ignore", message="Rank-deficient design matrix")
                 reg = LinearRegression(
                     include_intercept=False,
-                    robust=True,
+                    robust=self.robust,
                     cluster_ids=(
                         survey_cluster_ids if self.inference != "wild_bootstrap" else None
                     ),
@@ -239,6 +240,7 @@ def fit(  # type: ignore[override]
                     weights=survey_weights,
                     weight_type=survey_weight_type,
                     survey_design=_lr_survey_twfe,
+                    vcov_type=self.vcov_type,
                 ).fit(X, y, df_adjustment=df_adjustment)
 
         coefficients = reg.coefficients_
@@ -362,6 +364,10 @@ def _refit_twfe(w_r):
             n_bootstrap_used = self._bootstrap_results.n_bootstrap
             n_clusters_used = self._bootstrap_results.n_clusters
 
+        # Cluster label for summary: TWFE auto-clusters at unit level when
+        # self.cluster is None, so report that explicitly.
+        _twfe_cluster_label = self.cluster if self.cluster is not None else unit
+
         self.results_ = DiDResults(
             att=att,
             se=se,
@@ -381,6 +387,8 @@ def _refit_twfe(w_r):
             n_bootstrap=n_bootstrap_used,
             n_clusters=n_clusters_used,
             survey_metadata=survey_metadata,
+            vcov_type=self.vcov_type,
+            cluster_name=_twfe_cluster_label,
         )
 
         self.is_fitted_ = True
diff --git a/tests/test_estimators_vcov_type.py b/tests/test_estimators_vcov_type.py