Fix PT-All baseline misstatement: period 1 is not a valid baseline

igerber · claude · igerber · commit dba7d2f82f19 · 2026-03-15T13:09:22.000-04:00
The tutorial incorrectly stated "periods 1, 2, 3 can all serve as valid
baselines". Period 1 is the universal Y_1 reference and is excluded per
REGISTRY.md and efficient_did_weights.py. Corrected to "periods 2 and 3".

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
diff --git a/docs/tutorials/15_efficient_did.ipynb b/docs/tutorials/15_efficient_did.ipynb
@@ -63,7 +63,7 @@
    "cell_type": "markdown",
    "id": "4d734cd9",
    "metadata": {},
-   "source": "## What Makes EDiD Different?\n\nConsider a staggered adoption design with cohorts treated at periods 3, 5, and 7, plus a never-treated group. To estimate ATT(g=5, t=6), **Callaway-Sant'Anna** uses a single 2x2 comparison:\n\n> *Compare the outcome change from period 4 to 6 for cohort 5 versus the never-treated group.*\n\nBut under **PT-All** (parallel trends across all pre-treatment periods), there are *additional* valid comparisons. Cohort 7 is also untreated at period 6, so it can serve as a comparison group too. And periods 1, 2, 3 can all serve as valid baselines, not just period 4.\n\nEach of these comparisons provides an unbiased estimate of ATT(g=5, t=6), but with different variances. **EDiD finds the optimal linear combination** --- the one that minimizes variance --- by computing the inverse covariance matrix of these \"generated outcomes\" (the paper calls this $\\Omega^*$).\n\nThe result: **matching post-treatment ATT(g,t) with CS under PT-Post**, but **tighter standard errors under PT-All** because EDiD exploits the overidentification.\n\n> **Key equation (for the curious):** The efficient weight vector is $w^* = \\frac{\\mathbf{1}' \\Omega^{*-1}}{\\mathbf{1}' \\Omega^{*-1} \\mathbf{1}}$, where $\\Omega^*$ is the covariance matrix of the generated outcomes across all valid (comparison group, baseline) pairs. This is the classic GLS optimal weighting. See REGISTRY.md or the paper for full derivations."
+   "source": "## What Makes EDiD Different?\n\nConsider a staggered adoption design with cohorts treated at periods 3, 5, and 7, plus a never-treated group. To estimate ATT(g=5, t=6), **Callaway-Sant'Anna** uses a single 2x2 comparison:\n\n> *Compare the outcome change from period 4 to 6 for cohort 5 versus the never-treated group.*\n\nBut under **PT-All** (parallel trends across all pre-treatment periods), there are *additional* valid comparisons. Cohort 7 is also untreated at period 6, so it can serve as a comparison group too. And periods 2 and 3 can serve as additional valid baselines beyond CS's default period 4. (Period 1 is excluded --- it is the fixed $Y_1$ reference used in every comparison's differencing, so using it as a baseline adds no information.)\n\nEach of these comparisons provides an unbiased estimate of ATT(g=5, t=6), but with different variances. **EDiD finds the optimal linear combination** --- the one that minimizes variance --- by computing the inverse covariance matrix of these \"generated outcomes\" (the paper calls this $\\Omega^*$).\n\nThe result: **matching post-treatment ATT(g,t) with CS under PT-Post**, but **tighter standard errors under PT-All** because EDiD exploits the overidentification.\n\n> **Key equation (for the curious):** The efficient weight vector is $w^* = \\frac{\\mathbf{1}' \\Omega^{*-1}}{\\mathbf{1}' \\Omega^{*-1} \\mathbf{1}}$, where $\\Omega^*$ is the covariance matrix of the generated outcomes across all valid (comparison group, baseline) pairs. This is the classic GLS optimal weighting. See REGISTRY.md or the paper for full derivations."
   },
   {
    "cell_type": "markdown",

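The key equation in the edited cell, $w^* = \frac{\mathbf{1}' \Omega^{*-1}}{\mathbf{1}' \Omega^{*-1} \mathbf{1}}$, can be illustrated in plain Python. This is a minimal sketch that assumes a made-up 3x3 covariance matrix for three unbiased estimates of the same ATT(g, t). It does not call diff-diff or reproduce `efficient_did_weights.py`, and the helper names (`solve`, `efficient_weights`, `combined_variance`) are hypothetical. It shows the two properties the tutorial relies on: the weights sum to 1, and the combined variance $1 / (\mathbf{1}' \Omega^{*-1} \mathbf{1})$ is no larger than the variance of any single comparison.

```python
# Hypothetical sketch of GLS-optimal weighting; Omega values are invented,
# not output from diff-diff.


def solve(a, b):
    """Solve a @ x = b by Gaussian elimination with partial pivoting."""
    n = len(a)
    # Augmented copy so the caller's matrix is not mutated.
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def efficient_weights(omega):
    """w* = Omega^{-1} 1 / (1' Omega^{-1} 1); Omega is symmetric, so this
    column form equals the row form 1' Omega^{-1} / (1' Omega^{-1} 1)."""
    ones = [1.0] * len(omega)
    z = solve(omega, ones)  # z = Omega^{-1} 1
    total = sum(z)          # 1' Omega^{-1} 1
    return [zi / total for zi in z]


def combined_variance(w, omega):
    """Variance of the weighted combination: w' Omega w."""
    n = len(w)
    return sum(w[i] * omega[i][j] * w[j] for i in range(n) for j in range(n))


# Invented covariance of three generated outcomes (e.g. different
# comparison-group / baseline pairs) for the same ATT(g, t).
omega = [
    [1.0, 0.5, 0.3],
    [0.5, 2.0, 0.4],
    [0.3, 0.4, 1.5],
]
w = efficient_weights(omega)
print("weights:", [round(x, 3) for x in w])
print("sum of weights:", round(sum(w), 10))
print("efficient variance:", round(combined_variance(w, omega), 4))
print("single-comparison variances:", [omega[i][i] for i in range(len(omega))])
```

With a diagonal $\Omega^*$ the formula reduces to inverse-variance weighting; the off-diagonal terms are what let the weights account for correlation between comparisons that share groups or periods.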