Standardize public API response-shape validation across maintenance scripts #1164

Open

yanyishuai wants to merge 1 commit into ramimbo:main from yanyishuai:bounty-1143-public-api-json

yanyishuai commented Jun 28, 2026

Summary

Implements the proposed work for #1143 (and the shared fetch helper from #1144).

  • Add scripts/public_api_json.py with shared JSON fetch, list/object shape checks, bounty list loading, and activity validation.
  • Migrate check_bounty_issue_states.py, check_live_bounty_closing_refs.py, claim_inventory.py, and proposed_work_triage.py to use the shared helper instead of duplicated urllib blocks.
  • Add tests/test_public_api_json.py covering shape validation helpers.
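A minimal sketch of the list/object shape checks described above, for readers without the diff open. This is a hypothetical reconstruction, not the PR's code: the keyword-only `url=`/`label=` parameters and the error wording are inferred from the tests and review suggestions quoted later in this thread.

```python
from typing import Any


def ensure_json_list(value: Any, *, url: str, label: str = "response") -> list[Any]:
    """Return value unchanged if it is a JSON array, else raise RuntimeError."""
    if not isinstance(value, list):
        raise RuntimeError(
            f"public API {label} from {url}: expected a JSON list, got {type(value).__name__}"
        )
    return value


def ensure_json_object(value: Any, *, url: str, label: str = "response") -> dict[str, Any]:
    """Return value unchanged if it is a JSON object, else raise RuntimeError."""
    if not isinstance(value, dict):
        raise RuntimeError(
            f"public API {label} from {url}: expected a JSON object, got {type(value).__name__}"
        )
    return value


def dict_rows(value: Any, *, url: str, label: str = "response") -> list[dict[str, Any]]:
    """Require a JSON list, then keep only its object rows (non-dict entries are skipped)."""
    return [row for row in ensure_json_list(value, url=url, label=label) if isinstance(row, dict)]
```

Raising `RuntimeError` keeps shape failures on the same soft-failure path the migrated scripts already catch for transport errors.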

Test plan

  • Module imports cleanly
  • pytest tests/test_public_api_json.py tests/test_check_bounty_issue_states.py tests/test_claim_inventory.py

Closes #1143

Summary by CodeRabbit

  • New Features

    • Added centralized handling for public API data used by multiple scripts.
  • Bug Fixes

    • Improved reliability when loading bounty, activity, and related issue data.
    • Added stronger validation and clearer error handling for unexpected or malformed responses.
  • Tests

    • Added coverage for JSON validation, data filtering, and normalized public API state extraction.

coderabbitai Bot commented Jun 28, 2026

Warning

Review limit reached

@yanyishuai, you've reached your PR review limit, so we couldn't start this review.

Next review available in: 17 minutes


Review details
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: d6961596-0f3a-4a42-8a8b-da4c33a838ca

📥 Commits

Reviewing files that changed from the base of the PR and between 52c920f and e8ba33d.

📒 Files selected for processing (4)
  • scripts/check_bounty_issue_states.py
  • scripts/check_live_bounty_closing_refs.py
  • scripts/gh_cli.py
  • scripts/public_api_json.py
📝 Walkthrough

Adds scripts/public_api_json.py with centralized HTTP JSON fetching, type-shape validators, and endpoint loaders for the public bounty and activity APIs. Migrates check_bounty_issue_states.py, check_live_bounty_closing_refs.py, claim_inventory.py, and proposed_work_triage.py to use these shared helpers, removing their local urllib and subprocess implementations.

Shared public API module and script migration

| Layer | File(s) | Summary |
|---|---|---|
| New public_api_json module and tests | scripts/public_api_json.py, tests/test_public_api_json.py | Defines fetch_public_json, ensure_json_list, ensure_json_object, dict_rows, load_public_bounty_list, validate_public_activity, load_public_activity, extract_public_api_state, and load_public_api_state. Tests cover type-assertion failures, dict_rows filtering, validate_public_activity rejection, and extract_public_api_state normalization. |
| check_bounty_issue_states migration | scripts/check_bounty_issue_states.py | Removes local _fetch_json, _load_public_bounties, and _run_gh_json; load_live_data calls load_public_bounty_list; GH_TIMEOUT_SECONDS imported as DEFAULT_GH_TIMEOUT_SECONDS; mutating _run_gh stays local. |
| check_live_bounty_closing_refs migration | scripts/check_live_bounty_closing_refs.py | Replaces local urllib fetch and subprocess wrapper with load_public_bounty_list and imported run_gh_json; analysis and reporting logic unchanged. |
| claim_inventory migration | scripts/claim_inventory.py | Removes _get_json and urllib imports; load_public_api_state delegates to fetch_public_api_state(api_host, limit=GH_LIMIT); inventory analysis, surface extraction, and formatting unchanged. |
| proposed_work_triage migration | scripts/proposed_work_triage.py | _load_public_bounty_issue calls fetch_public_json for list and detail URLs; error handling switches from urllib/JSON exceptions to RuntimeError; fallback and warning behavior preserved. |
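As a rough illustration of the normalization `extract_public_api_state` performs, the sketch below reproduces what the new test expects: only `bounties`, `contributors`, and `recent` survive, and extra keys are dropped. Treating a non-list section as empty instead of raising is an assumption, based on the CodeRabbit suggestion further down that offers this function as the best-effort alternative to the strict loader; `_dict_rows_or_empty` is a hypothetical helper.

```python
from typing import Any


def _dict_rows_or_empty(value: Any) -> list[dict[str, Any]]:
    # Best-effort: a non-list section becomes an empty list instead of an error.
    if not isinstance(value, list):
        return []
    return [row for row in value if isinstance(row, dict)]


def extract_public_api_state(bounties: Any, activity: Any) -> dict[str, list[dict[str, Any]]]:
    """Keep only the bounties, contributors, and recent sections; drop everything else."""
    activity_obj = activity if isinstance(activity, dict) else {}
    return {
        "bounties": _dict_rows_or_empty(bounties),
        "contributors": _dict_rows_or_empty(activity_obj.get("contributors")),
        "recent": _dict_rows_or_empty(activity_obj.get("recent")),
    }
```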

Possibly related issues

Possibly related PRs

  • ramimbo/mergework#582: Directly connected — claim_inventory.py in that PR uses fetch_public_json/load_public_api_state helpers introduced by this PR's new scripts/public_api_json.py.
  • ramimbo/mergework#763: Both PRs modify scripts/proposed_work_triage.py; the retrieved PR adds triage report logic that this PR's refactor touches at the HTTP fetch layer.
  • ramimbo/mergework#1021: Introduced check_bounty_issue_states.py and check_live_bounty_closing_refs.py, both of which this PR migrates to shared helpers.
🚥 Pre-merge checks | ✅ 5 | ❌ 1

❌ Failed checks (1 warning)

| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Description check | ⚠️ Warning | The description has a summary and partial test plan, but it omits the required Evidence, Test Evidence, and MRWK sections. | Add the missing template sections: Evidence bullets, full Test Evidence checklist, and MRWK with the related bounty or issue. |

✅ Passed checks (5 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title clearly names the changed surface: public API response-shape validation across maintenance scripts. |
| Linked Issues check | ✅ Passed | The PR matches #1143 by centralizing JSON shape validation, migrating affected scripts, and adding focused regression tests. |
| Out of Scope Changes check | ✅ Passed | The diff stays within shared public-API validation and script migrations, with no unrelated feature or lifecycle changes. |
| Mergework Public Artifact Hygiene | ✅ Passed | No README/docs changes were present, and the PR description only describes helper migrations/tests—no investment, price, cash-out/off-ramp, payout, or security claims. |
| Bounty Pr Focus | ✅ Passed | PASS: Diff stays within the stated bounty/public-API maintenance scripts, shared helper, and tests; no unrelated subsystems surfaced. |


coderabbitai Bot left a comment

Actionable comments posted: 4


ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro Plus

Run ID: 44b91f10-03c4-4772-94c7-be845249b9b5

📥 Commits

Reviewing files that changed from the base of the PR and between 3bc87d2 and 52c920f.

📒 Files selected for processing (6)
  • scripts/check_bounty_issue_states.py
  • scripts/check_live_bounty_closing_refs.py
  • scripts/claim_inventory.py
  • scripts/proposed_work_triage.py
  • scripts/public_api_json.py
  • tests/test_public_api_json.py

Comment thread scripts/claim_inventory.py Outdated

from scripts.api_host_args import public_api_host
from scripts.bounty_refs import BOUNTY_REF_RE
from scripts.public_api_json import load_public_api_state as fetch_public_api_state


🩺 Stability & Availability | 🟠 Major | ⚡ Quick win

Keep claim inventory on the best-effort public API path.

fetch_public_api_state() is the strict loader: it calls validate_public_activity() and now raises if /api/v1/activity returns a non-list contributors or recent field. claim_inventory can still produce a useful live report from GitHub data plus any valid public rows, so this delegation turns a partial public-API shape problem into a hard failure for every --repo run.

Proposed fix
-from scripts.public_api_json import load_public_api_state as fetch_public_api_state
+from scripts.public_api_json import extract_public_api_state, fetch_public_json
...
 def load_public_api_state(api_host: str) -> dict[str, Any]:
-    return fetch_public_api_state(api_host, limit=GH_LIMIT)
+    host = api_host.rstrip("/")
+    bounties = fetch_public_json(f"{host}/api/v1/bounties?limit={GH_LIMIT}")
+    activity = fetch_public_json(f"{host}/api/v1/activity?limit={GH_LIMIT}")
+    return extract_public_api_state(bounties, activity)

Also applies to: 572-573
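To make the strict-versus-best-effort distinction concrete, here is a hypothetical sketch. `validate_public_activity` mirrors the strict behaviour described in this comment (its error text matches the "contributors list" test further down); `load_activity_best_effort` and its injected `fetch` callable are illustrative names, not part of the PR.

```python
from typing import Any, Callable

Fetch = Callable[[str], Any]


def validate_public_activity(activity: Any, *, url: str) -> dict[str, Any]:
    """Strict check: raise RuntimeError unless contributors and recent are both lists."""
    if not isinstance(activity, dict):
        raise RuntimeError(f"{url}: expected a JSON object")
    for key in ("contributors", "recent"):
        if not isinstance(activity.get(key), list):
            raise RuntimeError(f"{url}: expected a {key} list")
    return activity


def load_activity_best_effort(fetch: Fetch, host: str, limit: int) -> tuple[dict[str, Any], list[str]]:
    """Best-effort variant: keep any valid list sections and record a warning for the rest."""
    url = f"{host.rstrip('/')}/api/v1/activity?limit={limit}"
    activity = fetch(url)
    obj = activity if isinstance(activity, dict) else {}
    kept: dict[str, Any] = {}
    warnings: list[str] = []
    for key in ("contributors", "recent"):
        section = obj.get(key)
        if isinstance(section, list):
            kept[key] = section
        else:
            kept[key] = []
            warnings.append(f"{url}: ignoring non-list {key}")
    return kept, warnings
```

The strict form suits gates that must not act on partial data; the best-effort form suits reports like claim_inventory, where a warning plus the valid sections is more useful than a hard stop.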

Comment thread scripts/proposed_work_triage.py Outdated
Comment on lines +429 to +455
try:
    rows = fetch_public_json(list_url)
except RuntimeError as exc:
    warnings.append(
        "payment_state_incomplete: failed to load public bounty list "
        f"for issue #{issue_number} ({type(exc).__name__})"
    )
    return [], warnings
bounties: list[dict[str, Any]] = []
for row in rows if isinstance(rows, list) else []:
    if not isinstance(row, dict):
        continue
    bounty_id = row.get("id")
    if not isinstance(bounty_id, int):
        continue
    detail_url = f"{api_host.rstrip('/')}/api/v1/bounties/{bounty_id}"
    try:
        detail = fetch_public_json(detail_url)
    except RuntimeError as exc:
        warnings.append(
            "payment_state_incomplete: failed to load public bounty "
            f"detail for bounty {bounty_id}; using list row only ({type(exc).__name__})"
        )
        detail = row
    if isinstance(detail, dict):
        bounties.append(detail)
return bounties, warnings


🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

Validate the bounty list/detail shapes before deriving payment state.

This path now uses fetch_public_json() directly, which only guarantees “parseable JSON.” A non-list /api/v1/bounties?... response is silently treated as no rows, and a non-object /api/v1/bounties/{id} response is silently dropped. That turns malformed public API responses into missing payment data and can suppress accepted_pending_payout / proof_backed_paid warnings, which conflicts with the PR’s shared shape-validation goal.

Proposed fix
-from scripts.public_api_json import fetch_public_json
+from scripts.public_api_json import dict_rows, ensure_json_object, fetch_public_json
...
     try:
-        rows = fetch_public_json(list_url)
+        rows = dict_rows(fetch_public_json(list_url), url=list_url)
     except RuntimeError as exc:
         warnings.append(
             "payment_state_incomplete: failed to load public bounty list "
             f"for issue #{issue_number} ({type(exc).__name__})"
         )
         return [], warnings
     bounties: list[dict[str, Any]] = []
-    for row in rows if isinstance(rows, list) else []:
-        if not isinstance(row, dict):
-            continue
+    for row in rows:
         bounty_id = row.get("id")
         if not isinstance(bounty_id, int):
             continue
         detail_url = f"{api_host.rstrip('/')}/api/v1/bounties/{bounty_id}"
         try:
-            detail = fetch_public_json(detail_url)
+            detail = ensure_json_object(
+                fetch_public_json(detail_url), url=detail_url, label="bounty detail"
+            )
         except RuntimeError as exc:
             warnings.append(
                 "payment_state_incomplete: failed to load public bounty "
                 f"detail for bounty {bounty_id}; using list row only ({type(exc).__name__})"
             )
             detail = row
-        if isinstance(detail, dict):
-            bounties.append(detail)
+        bounties.append(detail)
📝 Committable suggestion

try:
    rows = dict_rows(fetch_public_json(list_url), url=list_url)
except RuntimeError as exc:
    warnings.append(
        "payment_state_incomplete: failed to load public bounty list "
        f"for issue #{issue_number} ({type(exc).__name__})"
    )
    return [], warnings
bounties: list[dict[str, Any]] = []
for row in rows:
    bounty_id = row.get("id")
    if not isinstance(bounty_id, int):
        continue
    detail_url = f"{api_host.rstrip('/')}/api/v1/bounties/{bounty_id}"
    try:
        detail = ensure_json_object(
            fetch_public_json(detail_url), url=detail_url, label="bounty detail"
        )
    except RuntimeError as exc:
        warnings.append(
            "payment_state_incomplete: failed to load public bounty "
            f"detail for bounty {bounty_id}; using list row only ({type(exc).__name__})"
        )
        detail = row
    bounties.append(detail)
return bounties, warnings
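The effect of the suggested change can be checked without network access. The sketch below is a simplified, hypothetical restatement of that loop: `fetch_public_json` is replaced by an injected `fetch` callable, the warning strings are shortened, and `load_bounties` and `detail_base` are illustrative names rather than the PR's.

```python
from typing import Any, Callable


def dict_rows(value: Any, *, url: str) -> list[dict[str, Any]]:
    # A non-list response is an error; non-object rows inside a list are skipped.
    if not isinstance(value, list):
        raise RuntimeError(f"{url}: expected a JSON list")
    return [row for row in value if isinstance(row, dict)]


def ensure_json_object(value: Any, *, url: str, label: str) -> dict[str, Any]:
    if not isinstance(value, dict):
        raise RuntimeError(f"{url}: expected a JSON object for {label}")
    return value


def load_bounties(
    fetch: Callable[[str], Any], list_url: str, detail_base: str
) -> tuple[list[dict[str, Any]], list[str]]:
    warnings: list[str] = []
    try:
        rows = dict_rows(fetch(list_url), url=list_url)
    except RuntimeError as exc:
        warnings.append(f"payment_state_incomplete: bounty list ({type(exc).__name__})")
        return [], warnings
    bounties: list[dict[str, Any]] = []
    for row in rows:
        bounty_id = row.get("id")
        if not isinstance(bounty_id, int):
            continue
        detail_url = f"{detail_base}/{bounty_id}"
        try:
            detail = ensure_json_object(fetch(detail_url), url=detail_url, label="bounty detail")
        except RuntimeError as exc:
            # Malformed detail: fall back to the list row and say so.
            warnings.append(
                f"payment_state_incomplete: bounty {bounty_id} detail ({type(exc).__name__})"
            )
            detail = row
        bounties.append(detail)
    return bounties, warnings
```

With the change, a malformed detail response degrades to the list row plus a warning instead of disappearing, and a non-list list response produces a warning instead of a silently empty result.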

Comment on lines +15 to +21
try:
    with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
        return json.loads(response.read().decode("utf-8"))
except (TimeoutError, urllib.error.URLError) as exc:
    raise RuntimeError(f"public API request failed: {url}") from exc
except json.JSONDecodeError as exc:
    raise RuntimeError(f"public API returned invalid JSON from {url}") from exc


🩺 Stability & Availability | 🟡 Minor | ⚡ Quick win

Normalize invalid UTF-8 into the RuntimeError path. response.read().decode("utf-8") can raise UnicodeDecodeError before the current handler, so callers that only treat RuntimeError as a soft failure will still see a raw exception here. Wrap decode/parse failures together.
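One way to apply this suggestion, sketched under assumptions: the 20-second default timeout and the `Accept` header are illustrative, not taken from the PR. Splitting the transport step from the decode step lets a single `except ValueError` cover both `UnicodeDecodeError` and `json.JSONDecodeError`, since both subclass `ValueError`.

```python
import json
import urllib.error
import urllib.request
from typing import Any


def fetch_public_json(url: str, *, timeout_seconds: float = 20.0) -> Any:
    """Fetch url and parse JSON, mapping transport and decode failures to RuntimeError."""
    request = urllib.request.Request(url, headers={"Accept": "application/json"})
    try:
        with urllib.request.urlopen(request, timeout=timeout_seconds) as response:
            body = response.read()
    except (TimeoutError, urllib.error.URLError) as exc:
        raise RuntimeError(f"public API request failed: {url}") from exc
    try:
        # UnicodeDecodeError and json.JSONDecodeError are both ValueError subclasses,
        # so invalid UTF-8 and malformed JSON share one RuntimeError path.
        return json.loads(body.decode("utf-8"))
    except ValueError as exc:
        raise RuntimeError(f"public API returned invalid JSON from {url}") from exc
```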

🧰 Tools
🪛 ast-grep (0.44.0)

[warning] 15-15: Request-controlled URL passed to urlopen; validate against an allowlist to prevent SSRF.
Context: urllib.request.urlopen(request, timeout=timeout_seconds)
Note: [CWE-918] Server-Side Request Forgery (SSRF).

(urlopen-unsanitized-data)
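If the maintainers decide the ast-grep finding is worth acting on, a scheme-and-host allowlist check before `urlopen` is one option. The sketch below is hypothetical: `ALLOWED_PUBLIC_API_HOSTS` and `require_allowed_url` are illustrative names, and the real public API hostnames are not stated in this PR.

```python
import urllib.parse

# Hypothetical allowlist; the real set of public API hosts is not stated in this PR.
ALLOWED_PUBLIC_API_HOSTS = frozenset({"api.example.test"})


def require_allowed_url(url: str, allowed_hosts: frozenset[str] = ALLOWED_PUBLIC_API_HOSTS) -> str:
    """Return url unchanged if it is https and its hostname is allowlisted; else raise RuntimeError."""
    parsed = urllib.parse.urlsplit(url)
    # urlsplit lowercases .hostname, so the comparison is case-insensitive.
    if parsed.scheme != "https" or parsed.hostname not in allowed_hosts:
        raise RuntimeError(f"refusing to fetch non-allowlisted public API URL: {url}")
    return url
```

Whether this is needed depends on who controls the host: the `scripts.api_host_args.public_api_host` import suggests the host is configured per run, in which case an allowlist mainly guards against a mistyped or unexpected host value. A strict https-only rule would also reject plain-http local test servers.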

Comment thread tests/test_public_api_json.py Outdated
Comment on lines +8 to +37
def test_ensure_json_list_rejects_object() -> None:
    with pytest.raises(RuntimeError, match="expected a JSON list"):
        pai.ensure_json_list({}, url="https://example.test/bounties")


def test_ensure_json_object_rejects_list() -> None:
    with pytest.raises(RuntimeError, match="expected a JSON object"):
        pai.ensure_json_object([], url="https://example.test/activity")


def test_validate_public_activity_rejects_bad_contributors() -> None:
    with pytest.raises(RuntimeError, match="contributors list"):
        pai.validate_public_activity({"contributors": "bad", "recent": []}, url="https://example.test/activity")


def test_dict_rows_skips_non_dict_entries() -> None:
    rows = pai.dict_rows([{"id": 1}, "skip", {"id": 2}], url="https://example.test/bounties")
    assert rows == [{"id": 1}, {"id": 2}]


def test_extract_public_api_state_keeps_valid_sections() -> None:
    state = pai.extract_public_api_state(
        [{"id": 1}],
        {"contributors": [{"name": "a"}], "recent": [{"id": 2}], "extra": True},
    )
    assert state == {
        "bounties": [{"id": 1}],
        "contributors": [{"name": "a"}],
        "recent": [{"id": 2}],
    }


📐 Maintainability & Code Quality | 🟡 Minor | ⚡ Quick win

Add a regression for the shared fetch-helper failure contract.

This suite never exercises fetch_public_json, even though the migrated scripts now depend on it raising RuntimeError on transport/parse failures. A small monkeypatch-based test for URLError plus malformed UTF-8/JSON would lock that contract down and catch regressions in the shared path. As per coding guidelines, "Add or update tests for changed behavior." As per path instructions, "Focus on whether tests prove the changed behavior and include negative, replay, boundary, or regression cases where relevant."

Sources: Coding guidelines, Path instructions
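A sketch of such a regression test. It uses `unittest.mock.patch` rather than pytest's `monkeypatch` fixture so it runs standalone, and it defines a simplified, hypothetical stand-in for `fetch_public_json` (one merged `except`) so the block is self-contained; in the repository the tests would import the real helper from `scripts.public_api_json` instead. The test functions take no fixtures, so pytest would also collect them as-is.

```python
import io
import json
import urllib.error
import urllib.request
from typing import Any
from unittest import mock


# Stand-in for scripts.public_api_json.fetch_public_json so this sketch runs alone;
# in the repo the tests would import the real helper instead.
def fetch_public_json(url: str, *, timeout_seconds: float = 20.0) -> Any:
    try:
        with urllib.request.urlopen(url, timeout=timeout_seconds) as response:
            return json.loads(response.read().decode("utf-8"))
    except (TimeoutError, urllib.error.URLError, ValueError) as exc:
        raise RuntimeError(f"public API request failed: {url}") from exc


def _expect_runtime_error(**patch_kwargs: Any) -> None:
    # Patch urlopen with the given return_value/side_effect and require a RuntimeError.
    with mock.patch("urllib.request.urlopen", **patch_kwargs):
        try:
            fetch_public_json("https://example.test/api/v1/bounties")
        except RuntimeError:
            return
    raise AssertionError("fetch_public_json did not raise RuntimeError")


def test_fetch_public_json_wraps_url_error() -> None:
    _expect_runtime_error(side_effect=urllib.error.URLError("connection refused"))


def test_fetch_public_json_wraps_invalid_utf8() -> None:
    _expect_runtime_error(return_value=io.BytesIO(b"\xff\xfe"))


def test_fetch_public_json_wraps_malformed_json() -> None:
    _expect_runtime_error(return_value=io.BytesIO(b"{not json"))
```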

qingfeng312 left a comment

Reviewed current head 52c920feab49f49ed78373f03035fe908698bcfe.

The PR currently fails the required quality gate during pytest collection. CI run 28318397540 reports:

  • tests/test_check_bounty_issue_states.py cannot import scripts.check_bounty_issue_states because scripts.gh_cli is missing.
  • tests/test_check_live_bounty_closing_refs.py cannot import scripts.check_live_bounty_closing_refs because scripts.gh_cli is missing.

This PR adds scripts/public_api_json.py and migrates several maintenance scripts to shared public API loaders, but the migrated scripts also import scripts.gh_cli, which is not present in the branch. That makes the shared response-shape validation unusable in a clean CI checkout.

Please add the missing helper or keep this refactor self-contained before merge.

Scope checked: CI log, current PR metadata, CodeRabbit status, and changed-file list only. No wallet, treasury, payout, private data, credentials, or external mutation paths were exercised.
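For context, the missing module needs at least a JSON-returning `gh` wrapper: the walkthrough mentions an imported `run_gh_json` and `GH_TIMEOUT_SECONDS`, which presumably come from `scripts.gh_cli`, and the later force-push to e8ba33d lists `scripts/gh_cli.py` among the reviewed files, though its contents are not shown here. The sketch below is hypothetical: the signature, the 60-second default, and the injectable `runner` parameter (added so the helper can be tested without the `gh` binary) are assumptions.

```python
import json
import subprocess
from typing import Any, Callable, Sequence

# Hypothetical default; the PR only shows that a GH_TIMEOUT_SECONDS constant exists.
GH_TIMEOUT_SECONDS = 60

Runner = Callable[..., subprocess.CompletedProcess]


def run_gh_json(
    args: Sequence[str],
    *,
    timeout_seconds: float = GH_TIMEOUT_SECONDS,
    runner: Runner = subprocess.run,
) -> Any:
    """Run `gh <args>` and parse stdout as JSON, mapping failures to RuntimeError."""
    cmd = ["gh", *args]
    try:
        completed = runner(cmd, capture_output=True, text=True, timeout=timeout_seconds, check=True)
    except (OSError, subprocess.CalledProcessError, subprocess.TimeoutExpired) as exc:
        # OSError covers a missing gh binary (FileNotFoundError).
        raise RuntimeError(f"gh command failed: {' '.join(cmd)}") from exc
    try:
        return json.loads(completed.stdout)
    except json.JSONDecodeError as exc:
        raise RuntimeError(f"gh returned invalid JSON for: {' '.join(cmd)}") from exc
```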

yanyishuai force-pushed the bounty-1143-public-api-json branch from ae50578 to e8ba33d on July 1, 2026 03:29

Development

Successfully merging this pull request may close these issues.

Proposed work: standardize public API response-shape validation across maintenance scripts

2 participants