Fix methodology descriptions per PR review feedback

igerber · claude · igerber · commit 9cdfa17a139e · 2026-02-16T16:02:53.000-05:00
- CallawaySantAnna inference: clarify analytical influence-function SEs
  by default, optional multiplier bootstrap when n_bootstrap > 0
- treatment_effects.weight: correct to 1/n_valid for finite tau_hat,
  0 for NaN rows (not 1/n_treated)
- Summary table: update CS variance description for consistency
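
For reviewers, the corrected `weight` rule can be illustrated with a minimal sketch. It is not the library's implementation: `aggregation_weights` is a hypothetical helper and the `tau_hat` values are made up, with plain Python lists standing in for the `treatment_effects` columns.

```python
import math

def aggregation_weights(tau_hat):
    """Hypothetical helper: 1/n_valid for finite tau_hat, 0 for NaN rows."""
    finite = [math.isfinite(t) for t in tau_hat]
    n_valid = sum(finite)
    if n_valid == 0:
        # No usable rows: every weight is zero.
        return [0.0] * len(tau_hat)
    return [1.0 / n_valid if ok else 0.0 for ok in finite]

# Made-up per-observation effects; the NaN row stands in for a
# rank-deficient case.
tau_hat = [0.5, float("nan"), 1.5, 1.0]
weights = aggregation_weights(tau_hat)

# Skip zero-weight rows explicitly, because 0 * NaN is NaN in floating point.
att = sum(w * t for w, t in zip(weights, tau_hat) if w > 0)
```

With weights of `1/n_treated`, the NaN row would still carry weight and the finite rows would sum to less than 1. With `1/n_valid`, the weights over finite rows sum to 1.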

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
diff --git a/docs/tutorials/12_two_stage_did.ipynb b/docs/tutorials/12_two_stage_did.ipynb
@@ -96,17 +96,7 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": [
-    "## Per-Observation Treatment Effects\n",
-    "\n",
-    "A feature unique to `TwoStageDiD` is the `treatment_effects` DataFrame, which contains one row per treated observation with:\n",
-    "- `tau_hat`: the residualized outcome (actual outcome minus estimated counterfactual)\n",
-    "- The unit and time columns (using the original column names from the input data, e.g., `unit` and `period`)\n",
-    "- `rel_time`: relative time since treatment\n",
-    "- `weight`: aggregation weight (1/n_treated)\n",
-    "\n",
-    "This enables granular analysis: examining which units or periods drive the aggregate effect, detecting outliers, or constructing custom aggregation schemes."
-   ]
+   "source": "## Per-Observation Treatment Effects\n\nA feature unique to `TwoStageDiD` is the `treatment_effects` DataFrame, which contains one row per treated observation with:\n- `tau_hat`: the residualized outcome (actual outcome minus estimated counterfactual)\n- The unit and time columns (using the original column names from the input data, e.g., `unit` and `period`)\n- `rel_time`: relative time since treatment\n- `weight`: aggregation weight — `1/n_valid` for observations with finite `tau_hat`, `0` for NaN rows (e.g., rank-deficient cases)\n\nThis enables granular analysis: examining which units or periods drive the aggregate effect, detecting outliers, or constructing custom aggregation schemes."
   },
   {
    "cell_type": "code",
@@ -125,15 +115,7 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": [
-    "## Comparison with Other Estimators\n",
-    "\n",
-    "TwoStageDiD and ImputationDiD produce **identical point estimates** because both estimate fixed effects on untreated observations and use them to residualize outcomes. The key difference is the variance estimator: TwoStageDiD uses the GMM sandwich from Butts & Gardner (2022), while ImputationDiD uses the conservative variance from Borusyak et al. (2024, Theorem 3).\n",
-    "\n",
-    "CallawaySantAnna uses a fundamentally different estimation approach — computing group-time ATT(g,t) effects via outcome regression, IPW, or doubly robust methods, then aggregating — so point estimates may differ, especially under heterogeneous effects. Its standard errors come from an analytical multiplier bootstrap on the influence function.\n",
-    "\n",
-    "*Note: Tutorial 11 compared ImputationDiD against CallawaySantAnna and SunAbraham. Here we focus on the TwoStageDiD vs ImputationDiD point-estimate identity, with CallawaySantAnna as a widely used reference point. For SunAbraham comparisons, see Tutorial 11.*"
-   ]
+   "source": "## Comparison with Other Estimators\n\nTwoStageDiD and ImputationDiD produce **identical point estimates** because both estimate fixed effects on untreated observations and use them to residualize outcomes. The key difference is the variance estimator: TwoStageDiD uses the GMM sandwich from Butts & Gardner (2022), while ImputationDiD uses the conservative variance from Borusyak et al. (2024, Theorem 3).\n\nCallawaySantAnna uses a fundamentally different estimation approach — computing group-time ATT(g,t) effects via outcome regression, IPW, or doubly robust methods, then aggregating — so point estimates may differ, especially under heterogeneous effects. It uses analytical influence-function standard errors by default, with optional multiplier bootstrap when `n_bootstrap > 0`.\n\n*Note: Tutorial 11 compared ImputationDiD against CallawaySantAnna and SunAbraham. Here we focus on the TwoStageDiD vs ImputationDiD point-estimate identity, with CallawaySantAnna as a widely used reference point. For SunAbraham comparisons, see Tutorial 11.*"
   },
   {
    "cell_type": "code",
@@ -255,22 +237,7 @@
   {
    "cell_type": "markdown",
    "metadata": {},
-   "source": [
-    "## Summary\n",
-    "\n",
-    "| Feature | TwoStageDiD | ImputationDiD | CallawaySantAnna |\n",
-    "|---------|-------------|---------------|------------------|\n",
-    "| **Approach** | Residualize via FE, regress on treatment | Impute Y(0) via FE model | Group-time ATT(g,t) |\n",
-    "| **Point estimates** | Identical to ImputationDiD | Identical to TwoStageDiD | Different weighting |\n",
-    "| **Variance** | GMM sandwich (influence function) | Conservative (Theorem 3) | Analytical (influence function) |\n",
-    "| **Per-obs effects** | Yes (`treatment_effects`) | No | No |\n",
-    "| **Pre-trend test** | Via event study pre-periods | Yes (built-in F-test) | Via event study pre-periods |\n",
-    "| **Best for** | Robustness check, granular effects | Maximum efficiency under homogeneity | Heterogeneous effects |\n",
-    "\n",
-    "**References:**\n",
-    "- Gardner, J. (2022). Two-stage differences in differences. *arXiv:2207.05943*.\n",
-    "- Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. *R Journal*, 14(1), 162-173."
-   ]
+   "source": "## Summary\n\n| Feature | TwoStageDiD | ImputationDiD | CallawaySantAnna |\n|---------|-------------|---------------|------------------|\n| **Approach** | Residualize via FE, regress on treatment | Impute Y(0) via FE model | Group-time ATT(g,t) |\n| **Point estimates** | Identical to ImputationDiD | Identical to TwoStageDiD | Different weighting |\n| **Variance** | GMM sandwich (influence function) | Conservative (Theorem 3) | Analytical influence function (optional bootstrap) |\n| **Per-obs effects** | Yes (`treatment_effects`) | No | No |\n| **Pre-trend test** | Via event study pre-periods | Yes (built-in F-test) | Via event study pre-periods |\n| **Best for** | Robustness check, granular effects | Maximum efficiency under homogeneity | Heterogeneous effects |\n\n**References:**\n- Gardner, J. (2022). Two-stage differences in differences. *arXiv:2207.05943*.\n- Butts, K. & Gardner, J. (2022). did2s: Two-Stage Difference-in-Differences. *R Journal*, 14(1), 162-173."
   }
  ],
  "metadata": {
@@ -280,4 +247,4 @@
  },
  "nbformat": 4,
  "nbformat_minor": 4
-}
+}