Refactored scikit-learn flavour of DifferenceInDifferences and allowed custom column names for post_treatment variable. #515

roesta07 · 2025-07-30T06:01:49Z

closes issues #390 and #514

causal impact calculation in scikit-learn flavour of DifferenceInDifferences
Allow the user to use whatever column name they want for the 'post_treatment' variable when constructing a DifferenceInDifferences object, via the new parameter post_treatment_variable_name. Its default value is 'post_treatment', so previously written code does not break.
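To make the intent of the new parameter concrete, here is a minimal pure-Python sketch. It is not CausalPy's implementation: `did_estimate` is a hypothetical helper that computes the 2x2 difference-in-differences from cell means, which is what the interaction coefficient equals in a saturated two-group, two-period regression. It only illustrates how a defaulted `post_treatment_variable_name` keeps existing calls working while allowing a custom column name such as "after".

```python
from statistics import mean


def did_estimate(rows, outcome, group_variable_name,
                 post_treatment_variable_name="post_treatment"):
    """Hypothetical helper: 2x2 difference-in-differences from cell means."""

    def cell_mean(group, post):
        # Mean outcome for one (group, period) cell.
        return mean(
            r[outcome]
            for r in rows
            if r[group_variable_name] == group
            and r[post_treatment_variable_name] == post
        )

    treated_change = cell_mean(1, 1) - cell_mean(1, 0)
    control_change = cell_mean(0, 1) - cell_mean(0, 0)
    return treated_change - control_change


rows = [
    {"y": 1.0, "group": 0, "post_treatment": 0},
    {"y": 2.0, "group": 0, "post_treatment": 1},
    {"y": 1.5, "group": 1, "post_treatment": 0},
    {"y": 4.0, "group": 1, "post_treatment": 1},
]

# Existing call sites keep working because of the default value.
default_estimate = did_estimate(rows, "y", "group")

# The same data with the indicator column renamed to "after".
renamed = [
    {"y": r["y"], "group": r["group"], "after": r["post_treatment"]}
    for r in rows
]
custom_estimate = did_estimate(
    renamed, "y", "group", post_treatment_variable_name="after"
)
```

Both calls return the same estimate; only the column lookup differs.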

📚 Documentation preview 📚: https://causalpy--515.org.readthedocs.build/en/515/

codecov · 2025-07-30T06:36:36Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 95.29%. Comparing base (09adfd7) to head (e222e9b).
⚠️ Report is 13 commits behind head on main.

Additional details and impacted files

@@            Coverage Diff             @@
##             main     #515      +/-   ##
==========================================
+ Coverage   95.19%   95.29%   +0.09%     
==========================================
  Files          28       28              
  Lines        2457     2507      +50     
==========================================
+ Hits         2339     2389      +50     
  Misses        118      118

drbenvincent

Looks like the remote checks are failing. Sometimes you need to run the pre-commit checks locally twice - the interrogate thing is a bit fiddly.
And looks like we'll need to increase test coverage. So obvious ones would be to include tests where we use the default, or a user-provided post treatment variable name.

Overall, this is looking good. Thanks for the PR :)

Oh, remember to update from main regularly :)

drbenvincent · 2025-07-30T07:52:00Z

causalpy/experiments/diff_in_diff.py

-            )
+        # Check if post_treatment_variable_name is in formula
+        if self.post_treatment_variable_name not in self.formula:
+            if self.post_treatment_variable_name == "post_treatment":


I've got a minor preference to just give one generic exception message, rather than a custom one dependent on self.post_treatment_variable_name. That will also cut down on the number of tests required to achieve high test coverage.

Yeah absolutely!! More generic ones like "Missing required variable '{self.post_treatment_variable_name}' in formula" can be used

drbenvincent · 2025-07-30T07:53:49Z

causalpy/experiments/diff_in_diff.py

+
+        # Check if post_treatment_variable_name is in data columns
+        if self.post_treatment_variable_name not in self.data.columns:
+            if self.post_treatment_variable_name == "post_treatment":


Same comment as above. Just give one more generic exception message, regardless of what self.post_treatment_variable_name is.
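A minimal sketch of what the single generic message could look like for both checks. The exception classes below are local stand-ins for the project's FormulaException and DataException, and `validate_post_treatment_variable` is a hypothetical free function (the PR performs these checks inside the class). The formula check is the same substring test shown in the diff, so it shares that test's limitations.

```python
class FormulaException(Exception):
    """Stand-in for the project's formula-related exception."""


class DataException(Exception):
    """Stand-in for the project's data-related exception."""


def validate_post_treatment_variable(
    formula, columns, post_treatment_variable_name="post_treatment"
):
    """Raise one generic error per check, whatever the variable is called."""
    # Substring check, mirroring the condition in the diff.
    if post_treatment_variable_name not in formula:
        raise FormulaException(
            f"Missing required variable '{post_treatment_variable_name}' in formula"
        )
    if post_treatment_variable_name not in columns:
        raise DataException(
            f"Missing required variable '{post_treatment_variable_name}' in dataset"
        )


# Passes: the custom name appears in both the formula and the columns.
validate_post_treatment_variable(
    "y ~ 1 + group*after", ["y", "group", "after"], "after"
)
```

With one message per exception type, a test only needs one case for the default name and one for a user-provided name.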

drbenvincent · 2025-07-30T07:55:54Z

causalpy/experiments/diff_in_diff.py

+            # Store the coefficient into dictionary {intercept:value}
+            coef_map = dict(zip(self.labels, self.model.get_coeffs()))
+            # Create and find the interaction term based on the values user provided
+            interaction_term = (


Nice. We'll need more tests anyway to ensure test coverage, so when you do that, can you add cases for when people specify formulas like post_treatment:a and post_treatment*b? It should work because we'll always get a coefficient for post_treatment:a, but it is worth adding the test.

Yeah, will add some tests for cases where a user provides a post-treatment variable name, and check for FormulaException and DataException.

But @drbenvincent, can you elaborate on this specific test? Are we also checking the coefficient value where two interaction terms are used?

I'd not thought of that. I guess it's easy to find an interaction term of the post treatment variable and something else. But if there are two interaction terms, both including the post treatment variable, then that might get messy. Can we think of any situations where that would be a good idea? If not, then maybe that could throw an exception and we just say we can't deal with a formula like that?

Since our users can write any formula freely, unlike other libraries that rely on closed systems, they could specify a formula like post_treatment * group + post_treatment * group * male, which might be uncommon but is entirely possible in our setup.

Users obtain estimates for exactly what they define in the formula. However, we've built this DiD object specifically for two-way diff-in-diff with a single interaction term, so the other features might get messed up, as you said.

So yeah @drbenvincent, I agree that, to move forward, we could throw an exception if we encounter more than one interaction term involving post_treatment.
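The outcome of this thread can be sketched as a lookup over the coefficient map that refuses ambiguous formulas. This is an illustration under assumptions, not the PR's code: `interaction_coefficient` is a hypothetical function, `ValueError` stands in for whichever exception the project raises, and the labels are assumed to be design-matrix column names where interactions are joined with ":" and the post-treatment indicator is a plain numeric column.

```python
def interaction_coefficient(
    labels, coeffs, post_treatment_variable_name="post_treatment"
):
    """Return the coefficient of the single interaction term that
    involves the post-treatment variable, or raise if it is ambiguous."""
    # Same construction as in the diff: {label: coefficient}.
    coef_map = dict(zip(labels, coeffs))

    # An interaction label looks like "a:b"; match the variable as a whole
    # component so that the order ("a:b" vs "b:a") does not matter.
    matches = [
        label
        for label in coef_map
        if ":" in label and post_treatment_variable_name in label.split(":")
    ]
    if len(matches) != 1:
        raise ValueError(
            f"Expected exactly one interaction term involving "
            f"'{post_treatment_variable_name}', found {len(matches)}"
        )
    return coef_map[matches[0]]


labels = ["Intercept", "group", "post_treatment", "group:post_treatment"]
causal_impact = interaction_coefficient(labels, [1.0, 0.5, 1.0, 1.5])
```

A formula such as post_treatment * group + post_treatment * group * male would produce several matching labels (including the three-way term), so this sketch would reject it rather than guess which coefficient is the causal impact.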

drbenvincent · 2025-07-30T07:58:52Z

causalpy/experiments/diff_in_diff.py

@@ -128,6 +130,12 @@ def __init__(
            }
            self.model.fit(X=self.X, y=self.y, coords=COORDS)
        elif isinstance(self.model, RegressorMixin):
+            # For scikit-learn models, automatically set fit_intercept=False


roesta07 · 2025-08-04T19:13:39Z

Hi @drbenvincent, here is a draft with the following changes:

set up validation to catch invalid interaction term specifications (e.g., three-way interactions, multiple interaction terms, repeated interaction terms)
expanded test coverage to test those invalid interaction terms and a custom post_treatment_variable_name
made error messages more generic, as stated above
Also, I'm not sure why the pre-commit checks are failing.

drbenvincent

Sorry about the late review - work has been rather busy!

Just a couple of requests/suggestions, and then I'm very happy to merge :)

drbenvincent · 2025-08-28T16:22:19Z

causalpy/experiments/diff_in_diff.py

@@ -84,6 +86,7 @@ def __init__(
        formula: str,
        time_variable_name: str,
        group_variable_name: str,
+        post_treatment_variable_name: str = "post_treatment",


Can you add post_treatment_variable_name to the docstring to make it ultra clear what it does?

drbenvincent · 2025-08-28T16:27:54Z

causalpy/experiments/diff_in_diff.py

@@ -236,6 +262,61 @@ def input_validation(self):
                coded. Consisting of 0's and 1's only."""
            )

+    def _get_interaction_terms(self):


Suggestion (not a requirement). This could be made a static method, and you could just pass in the formula string as an argument. Or it could be a simple utility function that goes in utils.py. It might make testing the function marginally simpler.

Ideally we'd add some test cases for _get_interaction_terms. As in, come up with a set of example formulas and the expected outputs of the function.
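As a rough illustration of the suggestion, a formula-in, list-out utility is easy to test with a table of examples. The function below is a hypothetical sketch, not the PR's `_get_interaction_terms`: it splits the right-hand side on "+" and keeps terms containing "*" or ":", and it assumes simple formulas without parentheses, "-" terms, or transforms that contain "+".

```python
from __future__ import annotations

import re


def get_interaction_terms(formula: str) -> list[str]:
    """Return the interaction terms of a simple formula, whitespace removed."""
    # Keep only the right-hand side of "y ~ ...".
    rhs = formula.split("~", 1)[-1]
    terms = [term.strip() for term in rhs.split("+")]
    return [
        re.sub(r"\s+", "", term)
        for term in terms
        if "*" in term or ":" in term
    ]


# Example formulas paired with the expected output.
CASES = [
    ("y ~ 1 + group*post_treatment", ["group*post_treatment"]),
    (
        "y ~ 1 + group + post_treatment + group:post_treatment",
        ["group:post_treatment"],
    ),
    ("y ~ 1 + group + post_treatment", []),
    (
        "y ~ 1 + post_treatment * group + post_treatment * group * male",
        ["post_treatment*group", "post_treatment*group*male"],
    ),
]

for formula, expected in CASES:
    assert get_interaction_terms(formula) == expected
```

Because the function takes only a string, the same table could be fed to a parametrized test without constructing an experiment object.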

drbenvincent · 2025-08-28T16:40:39Z

PS. I pushed a small change to get the pre-commit checks to work. So remember to pull the latest.

Rojan Shrestha added 2 commits July 29, 2025 22:06

Added post_treatment_variable_name parameter and sklearn model summary for did

4ebe1a7

Refactor DiD validation: segregate FormulaException and DataException

7fbb27a

drbenvincent requested changes Jul 30, 2025

added validations for interactions, test coverage expanded to test interaction terms,more generic messages

c232d89

get pre-commit checks to pass

e222e9b

drbenvincent requested changes Aug 28, 2025
