
Conversation

batuhanovski
Contributor

@batuhanovski batuhanovski commented Oct 1, 2025

Description

This PR updates the nuisance tuning procedure for the Sample Selection Model (SSM) in the nonignorable case.

Previously, the nuisance tuning routine did not mirror the estimation logic:

  • In estimation, a foldwise procedure was used in which each outer fold was split into two inner halves.
  • On the first half, a preliminary propensity score π̂ was estimated.
  • On the second half, these π̂ values were then used as additional covariates when estimating the nuisance functions m and g.
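The estimation-side preliminary step can be sketched as follows. This is an illustrative, self-contained sketch only, not DoubleML's internal code: the variable names (X, s, inner0, inner1) and the choice of logistic regression for the propensity score are assumptions for demonstration.

```python
# Illustrative sketch of the foldwise pi-hat preliminary step (not DoubleML's
# actual implementation). One outer fold is split into two inner halves:
# pi-hat is fit on inner0, predicted on inner1, and appended as a covariate.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(42)
n, p = 200, 3
X = rng.normal(size=(n, p))        # baseline covariates
s = rng.integers(0, 2, size=n)     # selection indicator

# Split one outer fold into two inner halves
idx = rng.permutation(n)
inner0, inner1 = idx[: n // 2], idx[n // 2:]

# Step 1: preliminary propensity score fit on the first inner half
pi_model = LogisticRegression().fit(X[inner0], s[inner0])

# Step 2: predict pi-hat on the second inner half and append it
# as an additional covariate for estimating m and g
pi_hat = pi_model.predict_proba(X[inner1])[:, 1]
X_aug = np.column_stack([X[inner1], pi_hat])

# The augmented feature matrix now has p + 1 columns
assert X_aug.shape == (n - n // 2, p + 1)
```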

However, the original tuning function bypassed this step and directly tuned m and g on the baseline covariates.
This led to an inconsistency between how nuisance functions were tuned vs. how they were actually estimated.

This PR aligns both procedures by applying the same foldwise π̂-preliminary logic in the tuning stage.
In particular:

  • The nuisance tuning function for the nonignorable case now generates preliminary π̂ values on the inner0 folds, predicts them on inner1, and uses them for m/g tuning.
  • This ensures that the feature sets and sample splits used in nuisance tuning are consistent with those in nuisance estimation.
  • As a result, the hyperparameter selection for m and g better reflects the actual estimation setting.
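The aligned tuning stage can be sketched in the same spirit. Again, this is a hedged illustration, not the PR's actual code: the learners, the hyperparameter grid, and the data-generating step are all hypothetical; the point is only that the search for m/g runs on the augmented features [X, π̂], matching the estimation setting.

```python
# Illustrative sketch (not DoubleML's internal tuning code): hyperparameter
# search for a nuisance learner on the augmented feature set [X, pi_hat],
# mirroring the estimation logic described above.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import GridSearchCV

rng = np.random.default_rng(0)
n, p = 300, 3
X = rng.normal(size=(n, p))
s = rng.integers(0, 2, size=n)                      # selection indicator
y = X @ np.array([1.0, -0.5, 0.25]) + rng.normal(size=n)

idx = rng.permutation(n)
inner0, inner1 = idx[: n // 2], idx[n // 2:]

# Preliminary pi-hat: fit on inner0, predict on inner1
pi_hat = (LogisticRegression()
          .fit(X[inner0], s[inner0])
          .predict_proba(X[inner1])[:, 1])
X_aug = np.column_stack([X[inner1], pi_hat])

# Tune the outcome learner on the augmented covariates (hypothetical grid),
# so hyperparameters are selected under the same feature set used in estimation
grid = GridSearchCV(RandomForestRegressor(random_state=0),
                    {"max_depth": [2, 4]}, cv=3)
grid.fit(X_aug, y[inner1])
best_depth = grid.best_params_["max_depth"]
```

Tuning on [X, π̂] rather than X alone is exactly the consistency this PR restores: without it, the selected hyperparameters would target a different feature space than the one the final nuisance estimators see.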

[Image: spl_method_dml_ssm]


Tests and Utilities

  • test_ssm_tune.py: Updated to validate the new tuning procedure for the nonignorable case.
  • _utils_ssm_manual.py:
    • The previous generic tune_nuisance_ssm helper was split into two functions:
      • tune_nuisance_ssm_mar: unchanged, reproduces the MAR case with the same behavior as before.
      • tune_nuisance_ssm_nonignorable: new routine implementing the updated foldwise π̂-preliminary logic for the nonignorable case.
    • This separation ensures that manual tuning for MAR remains backward compatible, while the nonignorable case mirrors the revised estimation procedure.
  • Manual tuning utilities are now used in the tests to check consistency of fold structure and feature usage between tuning and estimation.

PR Checklist

  • The title of the pull request summarizes the changes made.
  • The PR contains a detailed description of all changes and additions.
  • References to related issues or PRs are added.
  • The code passes all (unit) tests.
  • Enhancements or new features are equipped with unit tests.
  • The changes adhere to the PEP8 standards.

@batuhanovski batuhanovski marked this pull request as ready for review October 1, 2025 12:05
@SvenKlaassen
Member

Thank you very much @batuhanovski!

This already looks really great. I just refactored part of the tuning process so that it was more easily readable for me (to help with future maintainability).
Could you check if you are fine with the changes I made?
Afterwards, I would merge the PR.

@batuhanovski
Contributor Author

Thank you @SvenKlaassen! It looks great and the logic is fully preserved. The use of the tune_learner function definitely makes sense. From my side, everything looks good.

@SvenKlaassen SvenKlaassen merged commit d61c040 into DoubleML:main Oct 2, 2025
10 checks passed
@SvenKlaassen
Member

Thank you. I will mention the changes in the next release.
