fix(reward): add missing detail page patterns for 4 plugins#17
Conversation
|
@angosr please review this PR. thanks |
|
Hi, @angosr any updates for me, please. It took long time since I submitted this PR. |
angosr
left a comment
There was a problem hiding this comment.
Review: PR #17 — APPROVE
Significance Gate: PASS
Fixes a real bug: 4 of 8 active plugins had no detail page patterns, meaning agents received zero exploration reward for visiting detail pages on ArXiv, OpenLibrary, Open-Meteo, and HackerNews. This silently degrades RL training quality.
Independent verification (not relying on PR's own test claims)
1. Regex patterns — 10/10 correct (tested independently)
| URL | Expected | Result |
|---|---|---|
arxiv.org/abs/2603.16870 |
detail=True | ✅ |
arxiv.org/abs/2604.00005 (5-digit) |
detail=True | ✅ |
arxiv.org/abs/2603.16870v2 (version suffix) |
detail=True | ✅ |
arxiv.org/pdf/2603.16870 (PDF, not abs) |
detail=False | ✅ |
arxiv.org/list/cs.AI/new (listing) |
detail=False | ✅ |
openlibrary.org/works/OL103123W/Dune |
detail=True | ✅ |
openlibrary.org/search?q=dune |
detail=False | ✅ |
open-meteo.com/en/docs?latitude=-33.87&longitude=151.21 (negative coords) |
detail=True | ✅ |
open-meteo.com/en/docs (no params) |
detail=False | ✅ |
news.ycombinator.com/item?id=12345&goto=news (extra params) |
detail=True | ✅ |
2. URL normalization — preserves essential params, strips noise
- HN
?id=12345&p=2→?id=12345✅ - OpenMeteo
?latitude=35.68&longitude=139.65&hourly=temp→?latitude=35.68&longitude=139.65✅ - OpenMeteo missing longitude → falls through to default (strips all params) ✅
3. Asset extraction — correct for all edge cases
- ArXiv
v2suffix → extracts2603.16870(strips version) ✅ - OpenMeteo negative coords → extracts
-33.87,151.21✅
4. Existing tests: 30/30 passed, zero regressions
Minor note (non-blocking)
No new automated tests added for the 4 new plugins' patterns. The existing 30 tests only cover the original 4 plugins. Recommend adding parametrized tests in a follow-up, but the patterns are simple regex additions following the established pattern and verified above.
|
@angosr Thanks for approving. would you merge this PR? |
Problem
DETAIL_PAGE_PATTERNSincore/reward.pyonly covers the 4 original plugins (CoinGecko, Stooq, Taostats, Weather). The 4 newer plugins — ArXiv (#9), OpenLibrary (#2), Open-Meteo (#5), and HackerNews — have no detail page patterns at all.This means:
is_detail_page()always returnsFalsefor these pluginsDETAIL_PAGE_VISITreward signal (+0.03) never fires_extract_asset_from_url()returnsNone, so no asset confirmation tracking occurs_normalize_url()strips query params that identify resources on HackerNews (?id=) and Open-Meteo (?latitude=&longitude=), causing false "repeated URL" penaltiesImpact: Agents browsing detail pages on 4 out of 8 active plugins receive no exploration reward for drilling into content. This silently degrades RL training quality for half the plugin portfolio.
Summary
DETAIL_PAGE_PATTERNSfor all 4 missing plugins_extract_asset_from_url()for the same plugins_normalize_url()for HackerNews and Open-MeteoWhat changed
/abs/2603.168702603.16870/works/OL103123W/Duneol103123w/en/docs?latitude=35.68&longitude=139.6535.68,139.65/item?id=1234512345Each pattern was verified against the actual plugin URL parsing logic (
arxiv.py,openlibrary.py,openmeteo.py,hackernews.py) and live website URL structures.Test plan
is_detail_page()returns True for detail URLs, False for list/search/homepage URLs across all 4 plugins_extract_asset_from_url()correctly extracts asset IDs from all new URL formats_normalize_url()preserves query params that identify resources (HNid, Open-Meteolatitude/longitude)