Commit 768ceea

igerber and claude committed
Use dedicated simultaneous-treatment dataset for MultiPeriodDiD tutorial
Instead of explaining rank deficiency behavior, create a clean dataset using generate_did_data() with simultaneous treatment timing. This shows MultiPeriodDiD working as intended without warnings or edge cases.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
1 parent 293e17f commit 768ceea

1 file changed

Lines changed: 2 additions & 2 deletions

docs/tutorials/02_staggered_did.ipynb

@@ -671,7 +671,7 @@
 {
 "cell_type": "markdown",
 "metadata": {},
-"source": "## 11. Comparing with MultiPeriodDiD\n\nFor comparison, here's how you would use `MultiPeriodDiD` which estimates period-specific effects. \n\n**Important**: `MultiPeriodDiD` assumes **simultaneous treatment timing** (all treated units get treated at the same time). For staggered adoption, always use `CallawaySantAnna` or `SunAbraham` instead.\n\nTo demonstrate `MultiPeriodDiD` properly, we'll filter to a single treatment cohort (cohort 3) plus never-treated units.\n\n**Note on rank deficiency**: `MultiPeriodDiD` creates a design matrix with period dummies and treatment interactions. Depending on the data structure, this can create linear dependencies. When this happens, the solver will:\n- Emit a warning listing the dropped columns\n- Set coefficients of dropped columns to NA\n- Compute valid estimates for the remaining (identified) parameters\n\nThis R-style handling ensures you get useful results while being warned about the structural issue."
+"source": "## 11. Comparing with MultiPeriodDiD\n\nFor comparison, here's how you would use `MultiPeriodDiD` which estimates period-specific effects. \n\n**Important**: `MultiPeriodDiD` assumes **simultaneous treatment timing** (all treated units get treated at the same time). For staggered adoption, always use `CallawaySantAnna` or `SunAbraham` instead.\n\nTo demonstrate `MultiPeriodDiD` properly, we'll create a simple dataset where all treated units receive treatment at the same time."
 },
 {
 "cell_type": "code",
@@ -685,7 +685,7 @@
 }
 },
 "outputs": [],
-"source": "# Filter to cohort 3 only (single treatment timing) plus never-treated\n# This is the appropriate data structure for MultiPeriodDiD\ncohort3_df = df[df['cohort'].isin([0, 3])].copy()\n\nmp_did = MultiPeriodDiD()\nresults_mp = mp_did.fit(\n    cohort3_df,\n    outcome=\"outcome\",\n    treatment=\"treated\",\n    time=\"period\",\n    post_periods=[3, 4, 5, 6, 7]\n)\n\nprint(results_mp.summary())"
+"source": "# Create a simple dataset with simultaneous treatment timing\n# This is the appropriate data structure for MultiPeriodDiD\nfrom diff_diff import generate_did_data\n\n# Generate data with simultaneous treatment at period 4\nmp_data = generate_did_data(\n    n_units=100,\n    n_periods=8,\n    treatment_period=4,  # All treated units get treatment at period 4\n    treatment_fraction=0.5,\n    treatment_effect=2.5,\n    seed=42\n)\n\nprint(f\"MultiPeriodDiD dataset: {len(mp_data)} obs\")\nprint(f\"Treatment starts at period 4 for all treated units\")\n\nmp_did = MultiPeriodDiD()\nresults_mp = mp_did.fit(\n    mp_data,\n    outcome=\"outcome\",\n    treatment=\"treated\",\n    time=\"period\",\n    post_periods=[4, 5, 6, 7]\n)\n\nprint(results_mp.summary())"
 },
 {
 "cell_type": "code",
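
The markdown cell in this diff says `MultiPeriodDiD` assumes simultaneous treatment timing. As a rough illustration of that precondition, the sketch below checks whether a panel has exactly one treatment cohort. It uses only the standard library and is not part of `diff_diff`: the helper names are hypothetical, and the `cohort` column with `0` marking never-treated units is an assumption inferred from the removed `df['cohort'].isin([0, 3])` filter.

```python
def treatment_cohorts(rows, cohort_key="cohort", never_treated=0):
    """Return the sorted distinct treatment-start periods, ignoring never-treated units.

    Assumption: each row is a dict with a cohort field, and `never_treated`
    is the sentinel value used for units that never receive treatment.
    """
    return sorted({row[cohort_key] for row in rows if row[cohort_key] != never_treated})


def has_simultaneous_timing(rows, cohort_key="cohort", never_treated=0):
    """Return True when every treated unit shares exactly one treatment-start period."""
    return len(treatment_cohorts(rows, cohort_key, never_treated)) == 1


# Toy data: one row per unit is enough for this check.
staggered = [
    {"unit": 1, "cohort": 0},  # never treated
    {"unit": 2, "cohort": 3},  # treated from period 3
    {"unit": 3, "cohort": 5},  # treated from period 5
]
simultaneous = [
    {"unit": 1, "cohort": 0},  # never treated
    {"unit": 2, "cohort": 4},  # treated from period 4
    {"unit": 3, "cohort": 4},  # treated from period 4
]

print(treatment_cohorts(staggered))           # distinct cohorts in the staggered panel
print(has_simultaneous_timing(staggered))     # more than one cohort
print(has_simultaneous_timing(simultaneous))  # a single cohort
```

A check like this could run before fitting, so that a staggered panel is routed to `CallawaySantAnna` or `SunAbraham` as the tutorial text recommends. Rows held in a pandas DataFrame could be passed in via `df.to_dict("records")`.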
