Broaden dCDH by_path R-parser caveat to cover negative integers (re-audit follow-up to #401) by igerber · Pull Request #419 · igerber/diff-diff

igerber · 2026-05-12T19:16:51Z

Summary

Audit follow-up to PR dCDH by_path Wave 3 #8+#9: non-binary treatment + paths_of_interest #401. The restored CI reviewer's residual P2 noted that the R `did_multiplegt_by_path` parser caveat is documented only for `D >= 10`, even though dCDH by_path Wave 3 #8+#9: non-binary treatment + paths_of_interest #401 explicitly broadened support to all integer-coded discrete D - including negative integers.
R uses `substr(path, 1, 1)` to derive the per-path baseline from the comma-separated path string. For multi-character baselines this drops the rest of the value: `"12,12,..."` → `"1"`, and `"-1,-1,..."` → `"-"`. Both regimes mis-allocate R's per-path control-pool subset.
Python's tuple-key matching is correct in both regimes - the per-path point estimates we compute remain bit-correct. R-parity scenarios already stay in nonnegative single-digit `D in {0, 1, 2}` to avoid the R bug, and negative-integer baseline support is regression-tested on the Python side only.
Updated both surfaces carrying the original `D >= 10`-only caveat: CHANGELOG.md (unreleased) and the Phase 3 by_path Note in docs/methodology/REGISTRY.md.

Test plan

No code changed; pure documentation edit.
CI docs-snippet and existing by_path R-parity tests remain unaffected (no test references to the "D >= 10" string).

🤖 Generated with Claude Code

R's `did_multiplegt_by_path` uses `substr(path, 1, 1)` to derive the per-path baseline from the comma-separated path string. The caveat previously only documented `D >= 10` as the breaking regime (where `"12,12,..."` collapses to `"1"`), but the same single-character truncation also affects negative integer baselines such as `-1` (`"-1,-1,..."` collapses to `"-"`) - both regimes mis-allocate R's per-path control-pool subset. Python's tuple-key matching is correct in both regimes; the per-path point estimates we compute remain bit-correct. R-parity scenarios already stay in nonnegative single-digit `D in {0, 1, 2}` to avoid the R bug. Negative-integer baseline support is regression-tested on the Python side only. Updated both surfaces that carried the original `D >= 10`-only caveat: CHANGELOG.md (unreleased entry) and the Phase 3 by_path Note in docs/methodology/REGISTRY.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-12T19:20:06Z

Overall Assessment

✅ Looks good

Executive Summary

This is a documentation-only PR; it does not change estimator code, weighting, variance/SE, inference, or defaults.
Affected method: ChaisemartinDHaultfoeuille.by_path non-binary integer treatment / R-parity caveat for multi-character baseline states.
The broadened Deviation from R note is methodologically appropriate and matches the current estimator contract in code and tests.
No undocumented methodology deviation, assumption-check gap, or SE/inference defect is introduced.
One P3 documentation inconsistency remains: the registry still overstates parity coverage as D in {0..9} even though the cited parity fixture only exercises D in {0,1,2}.

Methodology

No P0/P1 findings. The broadened caveat in CHANGELOG.md:L15 and docs/methodology/REGISTRY.md:L643 matches the existing by_path contract in diff_diff/chaisemartin_dhaultfoeuille.py:L446-L460 and the negative-integer regression in tests/test_chaisemartin_dhaultfoeuille.py:L8228-L8268. Impact: no methodology defect; this is a documented R deviation and is correctly non-blocking. Concrete fix: none.
P3 — docs/methodology/REGISTRY.md:L643 still says “R-parity for D in {0..9} is asserted,” but the cited test only documents D in {0, 1, 2} at tests/test_chaisemartin_dhaultfoeuille_parity.py:L1226-L1229. Impact: the methodology registry overstates the actual parity evidence. Concrete fix: narrow that sentence to the tested set, or explicitly separate “single-digit states are string-compatible” from “shipped R-parity coverage is {0,1,2}.”

Code Quality

No findings. The diff is documentation-only.

Performance

No findings. No runtime paths changed.

Maintainability

No findings beyond the documentation consistency item above.

Tech Debt

No findings. The PR does not add untracked deferred work.

Security

No findings. No executable code, dependencies, or secret-bearing surfaces changed.

Documentation/Tests

P3 — The parity class docstring still uses the narrower “breaks for D >= 10” wording at tests/test_chaisemartin_dhaultfoeuille_parity.py:L1192-L1205. Impact: internal test documentation is slightly out of sync with the broadened public-facing caveat. Concrete fix: align the docstring with the updated “multi-character baseline states (D >= 10 or negative D)" wording used in the changelog and registry.

R0 review on the prior commit surfaced two actionable P3s: 1. The registry sentence "R-parity for D in {0..9} is asserted" overstated evidence: the cited scenario only exercises D in {0, 1, 2}. Narrow the claim to match what's actually tested and separately note that single-digit nonnegative D shares the same string encoding (no parity scenario currently runs outside {0, 1, 2}). 2. The TestDCDHDynRParityByPathNonBinary class docstring still used the narrower "breaks for D >= 10" wording. Align with the broader "multi-character baseline states (D >= 10 or negative D)" wording used in CHANGELOG and the registry. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-12T19:25:09Z

🔁 AI review rerun (requested by @igerber)

Head SHA: cf327e1a51460a2f01a1df071eb13f16bb45477e

Overall Assessment

✅ Looks good

Executive Summary

Re-review focus: the two prior P3 items are resolved. The registry now distinguishes shipped R-parity coverage on {0, 1, 2} from broader single-digit string compatibility, and the parity docstring now covers the negative-baseline caveat in docs/methodology/REGISTRY.md:L643-L643 and tests/test_chaisemartin_dhaultfoeuille_parity.py:L1192-L1206.
This PR is documentation-only. It does not change estimator code, weighting, variance/SE, identification checks, or defaults.
The broadened R-parser caveat is consistent with the shipped integer-coded by_path implementation in diff_diff/chaisemartin_dhaultfoeuille.py:L457-L460 and diff_diff/chaisemartin_dhaultfoeuille.py:L5787-L5832.
No P0/P1 findings.
One new minor P3 remains: the updated prose overstates what is Python-regression-tested for the negative-baseline regime.

Methodology

No findings. The changed text documents an existing Deviation from R rather than introducing a new estimator behavior, and it matches the current tuple-key path enumeration and integer-valued treatment contract in diff_diff/chaisemartin_dhaultfoeuille.py:L457-L460 and diff_diff/chaisemartin_dhaultfoeuille.py:L5787-L5832.

Code Quality

No findings. Documentation/test-docstring only.

Performance

No findings. No runtime path changed.

Maintainability

No findings. The prior re-review documentation mismatches were cleaned up in the changed registry and parity docstring.

Tech Debt

No findings. The PR does not introduce new deferred work that needs TODO.md tracking.

Security

No findings. No executable code, dependencies, or secret-bearing surfaces changed.

Documentation/Tests

Severity: P3. Impact: CHANGELOG.md:L15-L15 and docs/methodology/REGISTRY.md:L643-L643 now say that “negative-integer baseline support is regression-tested in Python only,” but the visible Python regression I found is tests/test_chaisemartin_dhaultfoeuille.py:L8228-L8268, which exercises paths (0, -1, -1, -1) and (0, 1, 1, 1). That covers negative treatment values in the path, not a negative baseline state such as (-1, 0, 0, 0), which is the exact regime behind the documented R substr(path, 1, 1) bug. Concrete fix: either narrow the prose to “negative treatment-state support is regression-tested in Python only,” or add a dedicated unit test with a genuinely negative baseline path and keep the current wording.

Review based on diff/source inspection only; I did not run tests.

R1 review on the prior commit flagged that the prose said "negative-integer baseline support is regression-tested in Python only" but the visible regression only exercises paths with negative D values in non-baseline positions (e.g. `(0, -1, -1, -1)`), not negative- baseline paths like `(-1, 0, 0, 0)` - which is the exact regime that would trigger R's `substr(path, 1, 1)` parser bug. Narrow the doc claim to match what's actually tested ("negative-D treatment-state support in non-baseline positions") and add a TODO entry to track the deferred negative-baseline regression so the gap is explicit and the doc claim is accurate. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-12T19:29:54Z

🔁 AI review rerun (requested by @igerber)

Head SHA: 053c195e1da91b2855e45ef835a4bdc6de817c7a

Overall Assessment

✅ Looks good

Executive Summary

Re-review focus: the prior documentation/test-coverage concern is addressed. The changed prose now distinguishes shipped Python coverage for negative values in non-baseline positions from the still-deferred negative-baseline regime in CHANGELOG.md:L15-L15, docs/methodology/REGISTRY.md:L643-L643, and tests/test_chaisemartin_dhaultfoeuille_parity.py:L1192-L1206.
This PR is documentation-only plus a tracked TODO entry; it does not change estimator code, weighting, variance/SE, identification checks, or defaults.
The updated caveat matches the implementation: per-path enumeration canonicalizes integer-valued windows generically via tuple casting in diff_diff/chaisemartin_dhaultfoeuille.py:L5752-L5833.
The currently shipped negative-D regression still only covers (0, -1, -1, -1)-style paths, not a negative baseline, exactly as the new docs now say in tests/test_chaisemartin_dhaultfoeuille.py:L8228-L8268.
The remaining negative-baseline regression gap is explicitly tracked under Tech Debt from Code Reviews in TODO.md:L51-L62, so it is mitigated and non-blocking.

Methodology

No findings. The affected method is ChaisemartinDHaultfoeuille by_path with non-binary integer treatment, but this PR only updates documentation for an already-shipped R deviation. The revised registry note is consistent with the implementation and existing test coverage in docs/methodology/REGISTRY.md:L643-L643, diff_diff/chaisemartin_dhaultfoeuille.py:L446-L460, diff_diff/chaisemartin_dhaultfoeuille.py:L5752-L5833, and tests/test_chaisemartin_dhaultfoeuille.py:L8228-L8268.

Code Quality

No findings. The diff changes prose/docstrings/TODO only.

Performance

No findings. No runtime path changed.

Maintainability

No findings. The PR reduces ambiguity by aligning the caveat across the changelog, methodology registry, parity docstring, and tracked TODO in CHANGELOG.md:L15-L15, docs/methodology/REGISTRY.md:L643-L643, tests/test_chaisemartin_dhaultfoeuille_parity.py:L1192-L1206, and TODO.md:L62-L62.

Tech Debt

Severity: P3 (informational, mitigated). Impact: A dedicated negative-baseline regression is still missing, but the limitation is now explicitly tracked under Tech Debt from Code Reviews, so it does not block approval. Concrete fix: None required in this PR; follow TODO.md:L62-L62 by adding a fixture with D_{g,1} = -1 and asserting the (-1, 0, 0, 0) path key in tests/test_chaisemartin_dhaultfoeuille.py.

Security

No findings. No executable code, dependency, or secret-bearing surface changed.

Documentation/Tests

No findings. The previous overstatement is resolved: the docs now point to the actual shipped negative-D regression, which only exercises non-baseline negative states in tests/test_chaisemartin_dhaultfoeuille.py:L8228-L8268, and they explicitly defer the negative-baseline case in CHANGELOG.md:L15-L15 and docs/methodology/REGISTRY.md:L643-L643.

Review based on diff/source inspection only; I did not run tests.

igerber added the ready-for-ci Triggers CI test workflows label May 12, 2026

igerber merged commit 7addf14 into main May 12, 2026
29 of 30 checks passed

igerber deleted the fix-audit-401 branch May 12, 2026 22:12

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Broaden dCDH by_path R-parser caveat to cover negative integers (re-audit follow-up to #401)#419

Broaden dCDH by_path R-parser caveat to cover negative integers (re-audit follow-up to #401)#419
igerber merged 3 commits into
mainfrom
fix-audit-401

igerber commented May 12, 2026

Uh oh!

github-actions Bot commented May 12, 2026

Uh oh!

github-actions Bot commented May 12, 2026

Uh oh!

github-actions Bot commented May 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

igerber commented May 12, 2026

Summary

Test plan

Uh oh!

github-actions Bot commented May 12, 2026

Uh oh!

github-actions Bot commented May 12, 2026

Uh oh!

github-actions Bot commented May 12, 2026

Uh oh!

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant