Unify live GitHub collection safety-cap policy across report scripts#1165
Unify live GitHub collection safety-cap policy across report scripts#1165yanyishuai wants to merge 1 commit into
Conversation
|
Warning Review limit reached
Next review available in: 44 minutes Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available. How can I continue?After more reviews become available, a review can be triggered using the To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews. How do review limits work?CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability. For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window. Please refer docs for additional details. Review details⚙️ Run configurationConfiguration used: Path: .coderabbit.yaml Review profile: ASSERTIVE Plan: Pro Plus Run ID: 📒 Files selected for processing (5)
📝 WalkthroughBelowCap: len(items) < cap[*] --> AtOrAboveCap: len(items) >= cap |
There was a problem hiding this comment.
Actionable comments posted: 5
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro Plus
Run ID: 99e997bf-9ef0-4f4e-956b-343a96c32fa9
📒 Files selected for processing (7)
scripts/check_live_bounty_closing_refs.pyscripts/gh_collection_caps.pyscripts/pr_queue_health.pyscripts/review_bounty_candidates.pyscripts/submission_quality_gate.pytests/test_gh_collection_caps.pytests/test_pr_queue_health.py
| def load_live_data(repo: str, api_host: str, state: str, pr_numbers: list[int]) -> dict[str, Any]: | ||
| return { | ||
| "bounties": load_public_bounty_list(api_host, query="status=open&limit=200"), | ||
| "pull_requests": _load_pull_requests(repo, state, pr_numbers), |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Filter the public bounty query by repo.
analyze_closing_refs() only matches on issue numbers, but Line 164 now loads every open bounty from the public API. GitHub issue numbers are repo-scoped, so an open bounty #123 in another repo can collide with this repo’s #123 and trigger a false violation; the global 200-row cap can also crowd out this repo’s own bounties.
Suggested fix
def load_live_data(repo: str, api_host: str, state: str, pr_numbers: list[int]) -> dict[str, Any]:
return {
- "bounties": load_public_bounty_list(api_host, query="status=open&limit=200"),
+ "bounties": load_public_bounty_list(
+ api_host,
+ query=f"status=open&limit=200&repo={repo}",
+ ),
"pull_requests": _load_pull_requests(repo, state, pr_numbers),
}📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def load_live_data(repo: str, api_host: str, state: str, pr_numbers: list[int]) -> dict[str, Any]: | |
| return { | |
| "bounties": load_public_bounty_list(api_host, query="status=open&limit=200"), | |
| "pull_requests": _load_pull_requests(repo, state, pr_numbers), | |
| def load_live_data(repo: str, api_host: str, state: str, pr_numbers: list[int]) -> dict[str, Any]: | |
| return { | |
| "bounties": load_public_bounty_list( | |
| api_host, | |
| query=f"status=open&limit=200&repo={repo}", | |
| ), | |
| "pull_requests": _load_pull_requests(repo, state, pr_numbers), |
| mode: str = FAIL_ON_SATURATION, | ||
| issue_warn_detail: str | None = None, | ||
| ) -> str | None: | ||
| if len(items) < cap: | ||
| return None | ||
| message = saturation_message( | ||
| collection=collection, | ||
| cap=cap, | ||
| warn_only=(mode == WARN_ON_SATURATION), | ||
| issue_warn_detail=issue_warn_detail, | ||
| ) | ||
| if mode == FAIL_ON_SATURATION: | ||
| raise RuntimeError(message) | ||
| return message |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟠 Major | ⚡ Quick win
Reject unsupported saturation modes explicitly.
Any value other than "fail" skips the exception path here. A typo silently downgrades fail-fast callers into warn-only behavior and can let saturated gh output be treated as trustworthy. Validate mode up front and raise ValueError for unknown values.
Proposed fix
def check_collection_saturation(
items: list[Any],
*,
cap: int,
collection: str,
mode: str = FAIL_ON_SATURATION,
issue_warn_detail: str | None = None,
) -> str | None:
+ if mode not in {FAIL_ON_SATURATION, WARN_ON_SATURATION}:
+ raise ValueError(f"unsupported saturation mode: {mode}")
if len(items) < cap:
return None
message = saturation_message(
collection=collection,
cap=cap,📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| mode: str = FAIL_ON_SATURATION, | |
| issue_warn_detail: str | None = None, | |
| ) -> str | None: | |
| if len(items) < cap: | |
| return None | |
| message = saturation_message( | |
| collection=collection, | |
| cap=cap, | |
| warn_only=(mode == WARN_ON_SATURATION), | |
| issue_warn_detail=issue_warn_detail, | |
| ) | |
| if mode == FAIL_ON_SATURATION: | |
| raise RuntimeError(message) | |
| return message | |
| mode: str = FAIL_ON_SATURATION, | |
| issue_warn_detail: str | None = None, | |
| ) -> str | None: | |
| if mode not in {FAIL_ON_SATURATION, WARN_ON_SATURATION}: | |
| raise ValueError(f"unsupported saturation mode: {mode}") | |
| if len(items) < cap: | |
| return None | |
| message = saturation_message( | |
| collection=collection, | |
| cap=cap, | |
| warn_only=(mode == WARN_ON_SATURATION), | |
| issue_warn_detail=issue_warn_detail, | |
| ) | |
| if mode == FAIL_ON_SATURATION: | |
| raise RuntimeError(message) | |
| return message |
| def test_check_collection_saturation_passes_below_cap() -> None: | ||
| caps.check_collection_saturation([1, 2], cap=3, collection="pr") | ||
|
|
||
|
|
||
| def test_check_collection_saturation_fails_at_cap() -> None: | ||
| with pytest.raises(RuntimeError, match="pr list reached the 2 item safety cap"): | ||
| caps.check_collection_saturation([1, 2], cap=2, collection="pr") | ||
|
|
||
|
|
||
| def test_check_collection_saturation_warn_mode_returns_message() -> None: | ||
| message = caps.check_collection_saturation( | ||
| [1, 2], | ||
| cap=2, | ||
| collection="issue", | ||
| mode=caps.WARN_ON_SATURATION, | ||
| ) | ||
| assert message is not None | ||
| assert "referenced-issue checks may be incomplete" in message |
There was a problem hiding this comment.
🎯 Functional Correctness | 🟡 Minor | ⚡ Quick win
Cover the full helper contract in the regression tests.
The new suite never asserts the below-cap None return, and it does not exercise the issue_warn_detail override that scripts/submission_quality_gate.py:682-688 now depends on. A regression in either branch would still pass here.
Proposed test additions
def test_check_collection_saturation_passes_below_cap() -> None:
- caps.check_collection_saturation([1, 2], cap=3, collection="pr")
+ assert caps.check_collection_saturation([1, 2], cap=3, collection="pr") is None
@@
def test_check_collection_saturation_warn_mode_returns_message() -> None:
message = caps.check_collection_saturation(
[1, 2],
cap=2,
collection="issue",
mode=caps.WARN_ON_SATURATION,
)
assert message is not None
assert "referenced-issue checks may be incomplete" in message
+
+
+def test_check_collection_saturation_warn_mode_uses_custom_issue_detail() -> None:
+ message = caps.check_collection_saturation(
+ [1, 2],
+ cap=2,
+ collection="issue",
+ mode=caps.WARN_ON_SATURATION,
+ issue_warn_detail="bounty discovery may be incomplete",
+ )
+ assert (
+ message
+ == "gh issue list reached the 2 item safety cap; "
+ "bounty discovery may be incomplete"
+ )📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| def test_check_collection_saturation_passes_below_cap() -> None: | |
| caps.check_collection_saturation([1, 2], cap=3, collection="pr") | |
| def test_check_collection_saturation_fails_at_cap() -> None: | |
| with pytest.raises(RuntimeError, match="pr list reached the 2 item safety cap"): | |
| caps.check_collection_saturation([1, 2], cap=2, collection="pr") | |
| def test_check_collection_saturation_warn_mode_returns_message() -> None: | |
| message = caps.check_collection_saturation( | |
| [1, 2], | |
| cap=2, | |
| collection="issue", | |
| mode=caps.WARN_ON_SATURATION, | |
| ) | |
| assert message is not None | |
| assert "referenced-issue checks may be incomplete" in message | |
| def test_check_collection_saturation_passes_below_cap() -> None: | |
| assert caps.check_collection_saturation([1, 2], cap=3, collection="pr") is None | |
| def test_check_collection_saturation_fails_at_cap() -> None: | |
| with pytest.raises(RuntimeError, match="pr list reached the 2 item safety cap"): | |
| caps.check_collection_saturation([1, 2], cap=2, collection="pr") | |
| def test_check_collection_saturation_warn_mode_returns_message() -> None: | |
| message = caps.check_collection_saturation( | |
| [1, 2], | |
| cap=2, | |
| collection="issue", | |
| mode=caps.WARN_ON_SATURATION, | |
| ) | |
| assert message is not None | |
| assert "referenced-issue checks may be incomplete" in message | |
| def test_check_collection_saturation_warn_mode_uses_custom_issue_detail() -> None: | |
| message = caps.check_collection_saturation( | |
| [1, 2], | |
| cap=2, | |
| collection="issue", | |
| mode=caps.WARN_ON_SATURATION, | |
| issue_warn_detail="bounty discovery may be incomplete", | |
| ) | |
| assert ( | |
| message | |
| == "gh issue list reached the 2 item safety cap; " | |
| "bounty discovery may be incomplete" | |
| ) |
Sources: Coding guidelines, Path instructions
| def test_pr_queue_health_fails_fast_when_issue_fetch_hits_cap(monkeypatch) -> None: | ||
| def fake_run(args, **kwargs): | ||
| if args[:3] == ["gh", "pr", "list"]: | ||
| stdout = "[]" | ||
| elif args[:3] == ["gh", "issue", "list"]: | ||
| stdout = json.dumps( | ||
| [ | ||
| {"number": number, "title": "MRWK bounty: many", "state": "OPEN"} | ||
| for number in range(1, 202) | ||
| ] | ||
| ) | ||
| else: | ||
| raise AssertionError(args) | ||
| return subprocess.CompletedProcess(args=args, returncode=0, stdout=stdout, stderr="") | ||
| monkeypatch.setattr(pr_queue_health.subprocess, "run", fake_run) | ||
| with pytest.raises(RuntimeError, match="issue list reached the 200 item safety cap"): | ||
| pr_queue_health.load_live_queue("ramimbo/mergework") | ||
| def test_pr_queue_health_fails_fast_when_pr_fetch_hits_cap(monkeypatch) -> None: | ||
| def fake_run(args, **kwargs): | ||
| if args[:3] == ["gh", "pr", "list"]: | ||
| stdout = json.dumps( | ||
| [ | ||
| { | ||
| "number": number, | ||
| "title": "Open PR", | ||
| "body": "Refs #1", | ||
| "labels": [], | ||
| "mergeStateStatus": "clean", | ||
| } | ||
| for number in range(1, 202) | ||
| ] | ||
| ) | ||
| elif args[:3] == ["gh", "issue", "list"]: | ||
| stdout = "[]" | ||
| else: | ||
| raise AssertionError(args) | ||
| return subprocess.CompletedProcess(args=args, returncode=0, stdout=stdout, stderr="") | ||
| monkeypatch.setattr(pr_queue_health.subprocess, "run", fake_run) | ||
| with pytest.raises(RuntimeError, match="pr list reached the 200 item safety cap"): | ||
| pr_queue_health.load_live_queue("ramimbo/mergework") |
There was a problem hiding this comment.
📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win
Assert the requested gh --limit in these saturation regressions.
Both fakes always return 201 rows, so these tests only prove the updated error wording. They would still pass if load_live_queue() stopped sending --limit 200, which is part of this change. Capture the command args and assert the list call used --limit 200, or make the fake honor the requested limit so the boundary wiring is actually covered. As per path instructions, "Do not request docstrings. Focus on whether tests prove the changed behavior and include negative, replay, boundary, or regression cases where relevant."
🧰 Tools
🪛 ast-grep (0.44.0)
[info] 507-512: use jsonify instead of json.dumps for JSON output
Context: json.dumps(
[
{"number": number, "title": "MRWK bounty: many", "state": "OPEN"}
for number in range(1, 202)
]
)
Note: [CWE-116] Improper Encoding or Escaping of Output.
(use-jsonify)
[info] 526-537: use jsonify instead of json.dumps for JSON output
Context: json.dumps(
[
{
"number": number,
"title": "Open PR",
"body": "Refs #1",
"labels": [],
"mergeStateStatus": "clean",
}
for number in range(1, 202)
]
)
Note: [CWE-116] Improper Encoding or Escaping of Output.
(use-jsonify)
Source: Path instructions
qingfeng312
left a comment
There was a problem hiding this comment.
Reviewed current head 2bf1f7822ac5d120404ba867f2402fecd7da98b2.
This branch is not merge-ready because the required CI gate fails before the suite can run. In CI run 28320025428, pytest collection reports three import errors:
tests/test_check_live_bounty_closing_refs.py->ModuleNotFoundError: No module named 'scripts.gh_cli'tests/test_pr_queue_health.py->ModuleNotFoundError: No module named 'scripts.gh_cli'tests/test_review_bounty_candidates.py->ModuleNotFoundError: No module named 'scripts.gh_cli'
The PR introduces scripts/gh_collection_caps.py and updates multiple report scripts to import shared helpers, but the scripts.gh_cli module they now depend on is not part of this branch. A clean checkout therefore cannot collect tests.
Please include the missing shared gh_cli helper or avoid the dependency on it in this PR.
Scope checked: CI log, current PR metadata, CodeRabbit status, and changed-file list only. No wallet, treasury, payout, private data, credentials, or external mutation paths were exercised.
2bf1f78 to
8a834b9
Compare
qingfeng312
left a comment
There was a problem hiding this comment.
Reviewed current head 8a834b9200fe1d3b312af181f2733969b4bbd25c.
This is still not merge-ready. The PR body says it adds scripts/gh_collection_caps.py and tests/test_gh_collection_caps.py, but the current diff contains neither file; the scripts continue to carry local collection cap constants/policy. That means the shared cap-policy abstraction described by #1141 is not actually implemented yet. The hosted quality/readiness check is also failing.
Please add the shared policy module and targeted tests, wire the scripts to that shared module, and reduce the large line-ending/no-op rewrite so the reviewable diff is limited to the intended behavior.
Summary
Implements proposed work for #1141.
eview_bounty_candidates.py, check_live_bounty_closing_refs.py, and submission_quality_gate.py.
Test plan
Closes #1141
Summary by CodeRabbit
New Features
Bug Fixes