Skip to content

Broaden dCDH by_path R-parser caveat to cover negative integers (re-audit follow-up to #401)#419

Merged
igerber merged 3 commits into
mainfrom
fix-audit-401
May 12, 2026
Merged

Broaden dCDH by_path R-parser caveat to cover negative integers (re-audit follow-up to #401)#419
igerber merged 3 commits into
mainfrom
fix-audit-401

Conversation

@igerber
Copy link
Copy Markdown
Owner

@igerber igerber commented May 12, 2026

Summary

  • Audit follow-up to PR dCDH by_path Wave 3 #8+#9: non-binary treatment + paths_of_interest #401. The restored CI reviewer's residual P2 noted that the R `did_multiplegt_by_path` parser caveat is documented only for `D >= 10`, even though dCDH by_path Wave 3 #8+#9: non-binary treatment + paths_of_interest #401 explicitly broadened support to all integer-coded discrete D - including negative integers.
  • R uses `substr(path, 1, 1)` to derive the per-path baseline from the comma-separated path string. For multi-character baselines this drops the rest of the value: `"12,12,..."` → `"1"`, and `"-1,-1,..."` → `"-"`. Both regimes mis-allocate R's per-path control-pool subset.
  • Python's tuple-key matching is correct in both regimes - the per-path point estimates we compute remain bit-correct. R-parity scenarios already stay in nonnegative single-digit `D in {0, 1, 2}` to avoid the R bug, and negative-integer baseline support is regression-tested on the Python side only.
  • Updated both surfaces carrying the original `D >= 10`-only caveat: CHANGELOG.md (unreleased) and the Phase 3 by_path Note in docs/methodology/REGISTRY.md.

Test plan

  • No code changed; pure documentation edit.
  • CI docs-snippet and existing by_path R-parity tests remain unaffected (no test references to the "D >= 10" string).

🤖 Generated with Claude Code

R's `did_multiplegt_by_path` uses `substr(path, 1, 1)` to derive the
per-path baseline from the comma-separated path string. The caveat
previously only documented `D >= 10` as the breaking regime (where
`"12,12,..."` collapses to `"1"`), but the same single-character
truncation also affects negative integer baselines such as `-1`
(`"-1,-1,..."` collapses to `"-"`) - both regimes mis-allocate R's
per-path control-pool subset.

Python's tuple-key matching is correct in both regimes; the per-path
point estimates we compute remain bit-correct. R-parity scenarios
already stay in nonnegative single-digit `D in {0, 1, 2}` to avoid
the R bug. Negative-integer baseline support is regression-tested
on the Python side only.

Updated both surfaces that carried the original `D >= 10`-only
caveat: CHANGELOG.md (unreleased entry) and the Phase 3 by_path
Note in docs/methodology/REGISTRY.md.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Overall Assessment

✅ Looks good

Executive Summary

  • This is a documentation-only PR; it does not change estimator code, weighting, variance/SE, inference, or defaults.
  • Affected method: ChaisemartinDHaultfoeuille.by_path non-binary integer treatment / R-parity caveat for multi-character baseline states.
  • The broadened Deviation from R note is methodologically appropriate and matches the current estimator contract in code and tests.
  • No undocumented methodology deviation, assumption-check gap, or SE/inference defect is introduced.
  • One P3 documentation inconsistency remains: the registry still overstates parity coverage as D in {0..9} even though the cited parity fixture only exercises D in {0,1,2}.

Methodology

Code Quality

  • No findings. The diff is documentation-only.

Performance

  • No findings. No runtime paths changed.

Maintainability

  • No findings beyond the documentation consistency item above.

Tech Debt

  • No findings. The PR does not add untracked deferred work.

Security

  • No findings. No executable code, dependencies, or secret-bearing surfaces changed.

Documentation/Tests

  • P3 — The parity class docstring still uses the narrower “breaks for D >= 10” wording at tests/test_chaisemartin_dhaultfoeuille_parity.py:L1192-L1205. Impact: internal test documentation is slightly out of sync with the broadened public-facing caveat. Concrete fix: align the docstring with the updated “multi-character baseline states (D >= 10 or negative D)" wording used in the changelog and registry.

R0 review on the prior commit surfaced two actionable P3s:

1. The registry sentence "R-parity for D in {0..9} is asserted"
   overstated evidence: the cited scenario only exercises D in
   {0, 1, 2}. Narrow the claim to match what's actually tested and
   separately note that single-digit nonnegative D shares the same
   string encoding (no parity scenario currently runs outside
   {0, 1, 2}).

2. The TestDCDHDynRParityByPathNonBinary class docstring still
   used the narrower "breaks for D >= 10" wording. Align with the
   broader "multi-character baseline states (D >= 10 or negative D)"
   wording used in CHANGELOG and the registry.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: cf327e1a51460a2f01a1df071eb13f16bb45477e


Overall Assessment

✅ Looks good

Executive Summary

Methodology

Code Quality

  • No findings. Documentation/test-docstring only.

Performance

  • No findings. No runtime path changed.

Maintainability

  • No findings. The prior re-review documentation mismatches were cleaned up in the changed registry and parity docstring.

Tech Debt

  • No findings. The PR does not introduce new deferred work that needs TODO.md tracking.

Security

  • No findings. No executable code, dependencies, or secret-bearing surfaces changed.

Documentation/Tests

  • Severity: P3. Impact: CHANGELOG.md:L15-L15 and docs/methodology/REGISTRY.md:L643-L643 now say that “negative-integer baseline support is regression-tested in Python only,” but the visible Python regression I found is tests/test_chaisemartin_dhaultfoeuille.py:L8228-L8268, which exercises paths (0, -1, -1, -1) and (0, 1, 1, 1). That covers negative treatment values in the path, not a negative baseline state such as (-1, 0, 0, 0), which is the exact regime behind the documented R substr(path, 1, 1) bug. Concrete fix: either narrow the prose to “negative treatment-state support is regression-tested in Python only,” or add a dedicated unit test with a genuinely negative baseline path and keep the current wording.

Review based on diff/source inspection only; I did not run tests.

R1 review on the prior commit flagged that the prose said
"negative-integer baseline support is regression-tested in Python only"
but the visible regression only exercises paths with negative D values
in non-baseline positions (e.g. `(0, -1, -1, -1)`), not negative-
baseline paths like `(-1, 0, 0, 0)` - which is the exact regime that
would trigger R's `substr(path, 1, 1)` parser bug.

Narrow the doc claim to match what's actually tested ("negative-D
treatment-state support in non-baseline positions") and add a TODO
entry to track the deferred negative-baseline regression so the gap
is explicit and the doc claim is accurate.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

🔁 AI review rerun (requested by @igerber)

Head SHA: 053c195e1da91b2855e45ef835a4bdc6de817c7a


Overall Assessment

✅ Looks good

Executive Summary

  • Re-review focus: the prior documentation/test-coverage concern is addressed. The changed prose now distinguishes shipped Python coverage for negative values in non-baseline positions from the still-deferred negative-baseline regime in CHANGELOG.md:L15-L15, docs/methodology/REGISTRY.md:L643-L643, and tests/test_chaisemartin_dhaultfoeuille_parity.py:L1192-L1206.
  • This PR is documentation-only plus a tracked TODO entry; it does not change estimator code, weighting, variance/SE, identification checks, or defaults.
  • The updated caveat matches the implementation: per-path enumeration canonicalizes integer-valued windows generically via tuple casting in diff_diff/chaisemartin_dhaultfoeuille.py:L5752-L5833.
  • The currently shipped negative-D regression still only covers (0, -1, -1, -1)-style paths, not a negative baseline, exactly as the new docs now say in tests/test_chaisemartin_dhaultfoeuille.py:L8228-L8268.
  • The remaining negative-baseline regression gap is explicitly tracked under Tech Debt from Code Reviews in TODO.md:L51-L62, so it is mitigated and non-blocking.

Methodology

  • No findings. The affected method is ChaisemartinDHaultfoeuille by_path with non-binary integer treatment, but this PR only updates documentation for an already-shipped R deviation. The revised registry note is consistent with the implementation and existing test coverage in docs/methodology/REGISTRY.md:L643-L643, diff_diff/chaisemartin_dhaultfoeuille.py:L446-L460, diff_diff/chaisemartin_dhaultfoeuille.py:L5752-L5833, and tests/test_chaisemartin_dhaultfoeuille.py:L8228-L8268.

Code Quality

  • No findings. The diff changes prose/docstrings/TODO only.

Performance

  • No findings. No runtime path changed.

Maintainability

  • No findings. The PR reduces ambiguity by aligning the caveat across the changelog, methodology registry, parity docstring, and tracked TODO in CHANGELOG.md:L15-L15, docs/methodology/REGISTRY.md:L643-L643, tests/test_chaisemartin_dhaultfoeuille_parity.py:L1192-L1206, and TODO.md:L62-L62.

Tech Debt

  • Severity: P3 (informational, mitigated). Impact: A dedicated negative-baseline regression is still missing, but the limitation is now explicitly tracked under Tech Debt from Code Reviews, so it does not block approval. Concrete fix: None required in this PR; follow TODO.md:L62-L62 by adding a fixture with D_{g,1} = -1 and asserting the (-1, 0, 0, 0) path key in tests/test_chaisemartin_dhaultfoeuille.py.

Security

  • No findings. No executable code, dependency, or secret-bearing surface changed.

Documentation/Tests

  • No findings. The previous overstatement is resolved: the docs now point to the actual shipped negative-D regression, which only exercises non-baseline negative states in tests/test_chaisemartin_dhaultfoeuille.py:L8228-L8268, and they explicitly defer the negative-baseline case in CHANGELOG.md:L15-L15 and docs/methodology/REGISTRY.md:L643-L643.

Review based on diff/source inspection only; I did not run tests.

@igerber igerber added the ready-for-ci Triggers CI test workflows label May 12, 2026
@igerber igerber merged commit 7addf14 into main May 12, 2026
29 of 30 checks passed
@igerber igerber deleted the fix-audit-401 branch May 12, 2026 22:12
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

ready-for-ci Triggers CI test workflows

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant