Skip to content

fix(scripts): reject empty report source args (#807)#1186

Open
yanyishuai wants to merge 1 commit into
ramimbo:mainfrom
yanyishuai:fix/issue-807-report-source-validation
Open

fix(scripts): reject empty report source args (#807)#1186
yanyishuai wants to merge 1 commit into
ramimbo:mainfrom
yanyishuai:fix/issue-807-report-source-validation

Conversation

@yanyishuai

@yanyishuai yanyishuai commented Jun 30, 2026

Copy link
Copy Markdown

Summary

Harden remaining maintainer report scripts so empty or whitespace-padded --input / --repo source arguments fail with bounded argparse errors instead of tracebacks or malformed live-mode calls.

Changes

  • scripts/proposed_work_triage.py — validate --repo and --input paths
  • scripts/claim_inventory.py — validate --input / --repo
  • scripts/pr_queue_health.py — validate --input / --repo
  • Tests mirror #801 / #805 patterns for empty and whitespace-padded CLI args

Fixes #807

Wallet: Do4v7foHJvRJLpRRoGaVPWX6DDEjX3yTK7J91gpwUQpE

Summary by CodeRabbit

  • Bug Fixes
    • Added stricter CLI input validation to reject empty values and any leading/trailing whitespace for source arguments.
    • Improved safety-limit enforcement for GitHub data collection so runs fail fast when result thresholds are reached.
  • New Features
    • Introduced clearer separation of safety thresholds between public API and live repository collection.
  • Tests
    • Expanded automated coverage for invalid CLI inputs and safety-cap behavior across inventory, queue health, and triage workflows.

@coderabbitai

coderabbitai Bot commented Jun 30, 2026

Copy link
Copy Markdown

Review Change Stack

📝 Walkthrough

Walkthrough

Adds safety-cap checks to scripts/claim_inventory.py and strict CLI validation for --input and --repo in three report scripts. New tests cover cap overflow, fixture/live loading guards, and invalid source arguments.

Changes

Safety caps and CLI source validation

Layer / File(s) Summary
Safety-cap constants and loader guards
scripts/claim_inventory.py
Defines GH_PUBLIC_API_SAFETY_CAP, GH_ISSUE_SAFETY_CAP, and GH_PR_SAFETY_CAP, then uses them in public API and live inventory loading with RuntimeError guards when returned lists reach the caps.
CLI source validation in report scripts
scripts/claim_inventory.py, scripts/pr_queue_health.py, scripts/proposed_work_triage.py
Adds strict validation for --input and --repo values before loading fixture or live data, and updates the main control flow to use the validated arguments.
Regression tests for caps and invalid source args
tests/test_claim_inventory.py, tests/test_pr_queue_health.py, tests/test_proposed_work_triage.py
Adds and updates tests for cap constants, overflow failures, and CLI rejection of empty, whitespace-only, or padded source arguments.

Possibly related issues

  • #978: Directly covers the scripts/claim_inventory.py safety-cap guards and matching regression tests added here.

Possibly related PRs

  • ramimbo/mergework#324: Shares the scripts/pr_queue_health.py CLI path and related source-argument handling changes.
  • ramimbo/mergework#582: Closely related to the scripts/claim_inventory.py loader and inventory-report behavior updated here.
  • ramimbo/mergework#763: Shares the scripts/proposed_work_triage.py CLI entrypoint and source-selection code path.

Suggested labels

mrwk:accepted, mrwk:paid

Suggested reviewers

  • jakerated-r
  • elianguitarra
  • JONASXZB
🚥 Pre-merge checks | ✅ 3 | ❌ 2

❌ Failed checks (2 warnings)

Check name Status Explanation Resolution
Out of Scope Changes check ⚠️ Warning claim_inventory.py also adds new live/public GitHub safety-cap logic, which is unrelated to the linked validation issue. Split the safety-cap changes into a separate PR or remove them from this issue-scoped patch.
Docstring Coverage ⚠️ Warning Docstring coverage is 1.82% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (3 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately summarizes the main change: stricter validation for report source arguments in scripts.
Linked Issues check ✅ Passed The PR matches #807 by hardening --input/--repo validation in all three scripts and adding focused tests.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands.

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 4


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 37deb7af-35b0-4c21-bf7b-119a18808188

📥 Commits

Reviewing files that changed from the base of the PR and between 3bc87d2 and 59db114.

📒 Files selected for processing (6)
  • scripts/claim_inventory.py
  • scripts/pr_queue_health.py
  • scripts/proposed_work_triage.py
  • tests/test_claim_inventory.py
  • tests/test_pr_queue_health.py
  • tests/test_proposed_work_triage.py

Comment on lines +532 to +545
def _require_input_path(parser: argparse.ArgumentParser, path: Path) -> Path:
raw = str(path)
if not raw.strip():
parser.error("--input must be a non-empty path")
if raw != raw.strip():
parser.error("--input must not include leading or trailing whitespace")
return path


def main(argv: list[str] | None = None) -> int:
parser = argparse.ArgumentParser(description="Read-only proposed-work intake triage report")
source = parser.add_mutually_exclusive_group(required=True)
source.add_argument("--input", type=Path, help="Offline JSON fixture with issues/payments")
source.add_argument("--repo", help="GitHub repo for read-only gh live mode")

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎯 Functional Correctness | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
python - <<'PY'
from pathlib import Path
print(f"Path('') -> {Path('')!r}")
print(f"str(Path('')) -> {str(Path(''))!r}")
PY

rg -n -C2 'source_args|--input' tests/test_proposed_work_triage.py

Repository: ramimbo/mergework

Length of output: 1056


--input "" still bypasses the new check. argparse applies type=Path first, so an empty value becomes Path("."); _require_input_path() then sees "." and the script can fail with an OS error instead of argparse’s exit-2 path. Accept the raw string first, then convert to Path after validation. scripts/proposed_work_triage.py:544-565

Comment thread tests/test_claim_inventory.py
Comment thread tests/test_pr_queue_health.py
Comment on lines +785 to +802
@pytest.mark.parametrize(
("source_args", "expected_message"),
(
(["--repo", ""], "--repo must be a non-empty value"),
(["--repo", " "], "--repo must be a non-empty value"),
(["--repo", " ramimbo/mergework "], "--repo must not include"),
),
)
def test_proposed_work_triage_rejects_empty_repo_source(
source_args: list[str],
expected_message: str,
capsys,
) -> None:
with pytest.raises(SystemExit) as excinfo:
main([*source_args, "--format", "json"])

assert excinfo.value.code == 2
assert expected_message in capsys.readouterr().err

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

This still misses the --input validation branch.

The issue scope here is both source arguments, but this parametrization only exercises --repo. Blank or whitespace-padded --input values can still regress without any test failing.

Suggested test extension
 `@pytest.mark.parametrize`(
     ("source_args", "expected_message"),
     (
+        (["--input", ""], "--input must be a non-empty value"),
+        (["--input", "   "], "--input must be a non-empty value"),
+        (["--input", " fixture.json "], "--input must not include"),
         (["--repo", ""], "--repo must be a non-empty value"),
         (["--repo", "   "], "--repo must be a non-empty value"),
         (["--repo", " ramimbo/mergework "], "--repo must not include"),
     ),
 )
-def test_proposed_work_triage_rejects_empty_repo_source(
+def test_proposed_work_triage_rejects_empty_source_args(

As per coding guidelines, "Add or update tests for changed behavior." As per path instructions, "Focus on whether tests prove the changed behavior and include negative, replay, boundary, or regression cases where relevant."

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
@pytest.mark.parametrize(
("source_args", "expected_message"),
(
(["--repo", ""], "--repo must be a non-empty value"),
(["--repo", " "], "--repo must be a non-empty value"),
(["--repo", " ramimbo/mergework "], "--repo must not include"),
),
)
def test_proposed_work_triage_rejects_empty_repo_source(
source_args: list[str],
expected_message: str,
capsys,
) -> None:
with pytest.raises(SystemExit) as excinfo:
main([*source_args, "--format", "json"])
assert excinfo.value.code == 2
assert expected_message in capsys.readouterr().err
`@pytest.mark.parametrize`(
("source_args", "expected_message"),
(
(["--input", ""], "--input must be a non-empty value"),
(["--input", " "], "--input must be a non-empty value"),
(["--input", " fixture.json "], "--input must not include"),
(["--repo", ""], "--repo must be a non-empty value"),
(["--repo", " "], "--repo must be a non-empty value"),
(["--repo", " ramimbo/mergework "], "--repo must not include"),
),
)
def test_proposed_work_triage_rejects_empty_source_args(
source_args: list[str],
expected_message: str,
capsys,
) -> None:
with pytest.raises(SystemExit) as excinfo:
main([*source_args, "--format", "json"])
assert excinfo.value.code == 2
assert expected_message in capsys.readouterr().err

Sources: Coding guidelines, Path instructions

@qingfeng312 qingfeng312 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed current head 59db114b195cbed357f4ec7eaf65d16101d92a3f. This needs changes before merge: the hosted pytest run fails during collection because scripts/pr_queue_health.py and scripts/proposed_work_triage.py import scripts.gh_cli_constants, but this PR does not add that module. That makes both scripts and their test modules import-fail before the new source-argument assertions can run. Please either add the shared module or keep the timeout constant local, then rerun the full quality/readiness workflow.

I also checked the new argparse coverage: proposed_work_triage should validate raw --input strings before converting to Path, otherwise an empty value can become Path(".") and miss the intended bounded error path.

@qingfeng312 qingfeng312 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed updated head 6f20f2a088ebd8864cd4f2fa7072e1f26c424283. This still needs changes: hosted pytest fails during collection because scripts/pr_queue_health.py and scripts/proposed_work_triage.py import scripts.gh_cli_constants, but the module is still absent from the PR. That keeps both test modules from importing and prevents validation of the new source-argument tests.

The proposed_work_triage --input path also still needs raw-string validation before Path conversion so --input "" cannot become Path(".") and bypass the intended argparse error path.

@yanyishuai yanyishuai force-pushed the fix/issue-807-report-source-validation branch from 6f20f2a to c3fa196 Compare June 30, 2026 13:53

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

♻️ Duplicate comments (3)
tests/test_proposed_work_triage.py (1)

787-793: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Missing --input validation coverage. This parametrization only exercises --repo; blank or whitespace-padded --input is untested. Note this gap aligns with the --input "" bypass flagged in scripts/proposed_work_triage.py — adding ["--input", ""] here would currently fail (exit code ≠ 2) until that helper is fixed. After the fix, extend this test.

Suggested extension (after fixing the validation bug)
     (
+        (["--input", ""], "--input must be a non-empty value"),
+        (["--input", "   "], "--input must be a non-empty value"),
+        (["--input", " fixture.json "], "--input must not include"),
         (["--repo", ""], "--repo must be a non-empty value"),
         (["--repo", "   "], "--repo must be a non-empty value"),
         (["--repo", " ramimbo/mergework "], "--repo must not include"),
     ),
 )
-def test_proposed_work_triage_rejects_empty_repo_source(
+def test_proposed_work_triage_rejects_empty_source_args(

As per coding guidelines, "Add or update tests for changed behavior."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_proposed_work_triage.py` around lines 787 - 793, The current
validation test coverage in test_proposed_work_triage_rejects_empty_repo_source
only checks --repo and misses the known --input empty-value bypass. First fix
the validation in the proposed_work_triage helper in
scripts/proposed_work_triage.py so blank or whitespace-padded --input is
rejected with the same exit code and message pattern as --repo, then extend the
parametrization in test_proposed_work_triage_rejects_empty_repo_source to
include empty and whitespace-only --input cases and assert they fail as
expected.

Source: Coding guidelines

scripts/proposed_work_triage.py (1)

533-539: 🩺 Stability & Availability | 🟠 Major | 🏗️ Heavy lift

--input "" still bypasses validation and produces a traceback. argparse applies type=Path before _require_input_path runs, so an empty value becomes Path("") whose str() is ".". That value is non-empty and whitespace-free, so it passes both checks, and read_text(".") then raises IsADirectoryError instead of argparse's exit-2 path — defeating the PR's goal of bounded argparse-style errors.

The other two scripts avoid this by keeping --input a plain string and validating the raw value. Mirror that here: drop type=Path, validate the raw string with _require_non_empty_arg, then convert to Path.

🐛 Proposed fix
-    source.add_argument("--input", type=Path, help="Offline JSON fixture with issues/payments")
+    source.add_argument("--input", help="Offline JSON fixture with issues/payments")
     if args.input is not None:
-        input_path = _require_input_path(parser, args.input)
+        input_path = Path(_require_non_empty_arg(parser, "--input", args.input))
         data = json.loads(input_path.read_text(encoding="utf-8"))

_require_input_path can then be removed.

Also applies to: 564-566

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@scripts/proposed_work_triage.py` around lines 533 - 539, The input-path
validation is happening too late because _require_input_path receives a Path
object, so `--input ""` is converted to `Path("")` and slips through as `"."`,
leading to a traceback instead of an argparse error. Update the argument
handling in the relevant parser setup and the helper used by the triage script
so `--input` is accepted as a raw string, validated with
`_require_non_empty_arg`, and only then converted to Path; if that flow is used,
`_require_input_path` can be removed.
tests/test_pr_queue_health.py (1)

553-559: 📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Add the padded --input case to lock the whitespace branch. Empty/whitespace --input is covered, but the "must not include leading or trailing whitespace" branch is only proven for --repo. A regression in --input whitespace handling would pass silently.

Suggested extension
         (["--input", ""], "--input must be a non-empty value"),
         (["--input", "   "], "--input must be a non-empty value"),
+        (["--input", " queue.json "], "--input must not include"),
         (["--repo", ""], "--repo must be a non-empty value"),

As per coding guidelines, "Add or update tests for changed behavior."

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@tests/test_pr_queue_health.py` around lines 553 - 559, Add a test case in
test_pr_queue_health.py alongside the existing argument validation table to
cover a padded --input value (for example, a non-empty string with
leading/trailing spaces) and assert it hits the same whitespace-rejection branch
as --repo. Use the existing CLI validation fixture or parameterized cases around
the --input checks so the behavior is locked in for the input parser and any
regression in whitespace handling is caught.

Source: Coding guidelines

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Duplicate comments:
In `@scripts/proposed_work_triage.py`:
- Around line 533-539: The input-path validation is happening too late because
_require_input_path receives a Path object, so `--input ""` is converted to
`Path("")` and slips through as `"."`, leading to a traceback instead of an
argparse error. Update the argument handling in the relevant parser setup and
the helper used by the triage script so `--input` is accepted as a raw string,
validated with `_require_non_empty_arg`, and only then converted to Path; if
that flow is used, `_require_input_path` can be removed.

In `@tests/test_pr_queue_health.py`:
- Around line 553-559: Add a test case in test_pr_queue_health.py alongside the
existing argument validation table to cover a padded --input value (for example,
a non-empty string with leading/trailing spaces) and assert it hits the same
whitespace-rejection branch as --repo. Use the existing CLI validation fixture
or parameterized cases around the --input checks so the behavior is locked in
for the input parser and any regression in whitespace handling is caught.

In `@tests/test_proposed_work_triage.py`:
- Around line 787-793: The current validation test coverage in
test_proposed_work_triage_rejects_empty_repo_source only checks --repo and
misses the known --input empty-value bypass. First fix the validation in the
proposed_work_triage helper in scripts/proposed_work_triage.py so blank or
whitespace-padded --input is rejected with the same exit code and message
pattern as --repo, then extend the parametrization in
test_proposed_work_triage_rejects_empty_repo_source to include empty and
whitespace-only --input cases and assert they fail as expected.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: d4878be6-3fa0-4e6a-b10e-2c0015ed2a77

📥 Commits

Reviewing files that changed from the base of the PR and between 59db114 and c3fa196.

📒 Files selected for processing (6)
  • scripts/claim_inventory.py
  • scripts/pr_queue_health.py
  • scripts/proposed_work_triage.py
  • tests/test_claim_inventory.py
  • tests/test_pr_queue_health.py
  • tests/test_proposed_work_triage.py

@yanyishuai

Copy link
Copy Markdown
Author

@qingfeng312 #807 report source-arg validation is green on latest head. Scoped argparse hardening; ready for merge.

Wallet: Do4v7foHJvRJLpRRoGaVPWX6DDEjX3yTK7J91gpwUQpE

@qingfeng312 qingfeng312 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reviewed updated head c3fa1965b2f124aed53f555bf299caad66502c57.

The import/CI blocker is fixed, but one source-argument path still needs changes before merge. In scripts/proposed_work_triage.py, --input is still declared with type=Path and then passed to _require_input_path(). That means --input "" is converted by argparse into Path("") / . before validation, so it can fall through to read_text() and raise an unbounded directory/error path instead of the intended argparse exit-2 message.

Please keep --input as a raw string for this script too, validate it with the same _require_non_empty_arg(parser, "--input", args.input) helper, then convert to Path only after validation. Add regression coverage for empty, whitespace-only, and padded --input values in tests/test_proposed_work_triage.py; adding the padded --input case to tests/test_pr_queue_health.py would also lock the whitespace branch.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Proposed work: harden remaining report script source validation

2 participants