feat: add Wikipedia plugin with 2 question templates #21

Open
dataCenter430 wants to merge 1 commit into AffineFoundation:main from dataCenter430:feat/wikipedia-plugin

Conversation

@dataCenter430

Summary

Adds a Wikipedia plugin that fills two capability gaps identified in
CLAUDE.md that no existing plugin covers:

| Gap | How it's addressed |
| --- | --- |
| Nested structure navigation | edit_count: article → "View history" tab → count dated revisions |
| Search-driven interaction | category_count: main page → navigate to Category page → read count |

Plugin (wikipedia)

  • Allowed domain: en.wikipedia.org
  • Blocked: */w/api.php* and */api/rest_v1/metrics*, which forces agents to use the web interface rather than the raw API
  • needs_api_data fires only for category pages and ?action=history pages; all other Wikipedia pages are navigation-only (no wasted API calls)
  • Session reuse via shared aiohttp.ClientSession, 1 req/s rate limit
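
The routing rules above can be sketched roughly as follows. This is a minimal illustration, not the plugin's actual code: the function names `is_blocked` and `is_allowed`, the constant names, and the exact way category and history URLs are recognised are assumptions made for the example. Only the domain, the two blocked patterns, and the `needs_api_data` behaviour come from the description.

```python
from fnmatch import fnmatchcase
from urllib.parse import parse_qs, unquote, urlparse

ALLOWED_HOST = "en.wikipedia.org"
# Glob patterns taken from the PR description.
BLOCKED_PATTERNS = ("*/w/api.php*", "*/api/rest_v1/metrics*")


def is_blocked(url: str) -> bool:
    """Return True if the URL matches one of the blocked API patterns."""
    return any(fnmatchcase(url, pattern) for pattern in BLOCKED_PATTERNS)


def is_allowed(url: str) -> bool:
    """Hypothetical helper: allowed domain and not a blocked API endpoint."""
    return urlparse(url).hostname == ALLOWED_HOST and not is_blocked(url)


def needs_api_data(url: str) -> bool:
    """Return True only for category pages and ?action=history pages."""
    parsed = urlparse(url)
    if parsed.hostname != ALLOWED_HOST:
        return False
    query = parse_qs(parsed.query)
    if query.get("action") == ["history"]:
        return True
    # Assumption: the title is either in the /wiki/<title> path
    # or in the ?title= query parameter.
    if parsed.path.startswith("/wiki/"):
        title = unquote(parsed.path[len("/wiki/"):])
    else:
        title = query.get("title", [""])[0]
    return title.startswith("Category:")
```

Under these assumptions, an ordinary article URL such as https://en.wikipedia.org/wiki/Main_Page is allowed but triggers no API fetch, while the same article's ?action=history URL does.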

Templates

wikipedia_category_count

How many articles are in the Wikipedia category "[X]"?
How many subcategories does the Wikipedia category "[X]" have?

  • Agent navigates to Category:X and reads the count from the page header
    ("The following N pages are in this category")
  • GT: categoryinfo.pages / categoryinfo.subcats — exact same number
    shown on page
  • Variant space: 251 categories × 2 metrics = 502 variants
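
As a rough sketch of how the ground truth could be read from a MediaWiki `prop=categoryinfo` response: the function name `category_ground_truth` is hypothetical, the response is assumed to use the default JSON layout (pages keyed by page ID), and the sample numbers are invented for illustration.

```python
from typing import Any, Dict, Optional


def category_ground_truth(api_response: Dict[str, Any], metric: str) -> Optional[int]:
    """Extract categoryinfo.pages or categoryinfo.subcats from an API response.

    Returns None when the response carries no categoryinfo, so the caller
    can report the data as not collected instead of silently defaulting.
    """
    if metric not in ("pages", "subcats"):
        raise ValueError(f"unsupported metric: {metric!r}")
    pages = api_response.get("query", {}).get("pages", {})
    for page in pages.values():
        info = page.get("categoryinfo")
        if info is not None and metric in info:
            return int(info[metric])
    return None


# Invented sample in the default (formatversion=1) response shape.
sample = {
    "query": {
        "pages": {
            "12345": {
                "pageid": 12345,
                "ns": 14,
                "title": "Category:Example",
                "categoryinfo": {"size": 42, "pages": 37, "files": 0, "subcats": 5},
            }
        }
    }
}

article_count = category_ground_truth(sample, "pages")
subcategory_count = category_ground_truth(sample, "subcats")
```

Returning None rather than 0 for a missing category keeps the "no silent fallbacks" rule from the checklist intact.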

wikipedia_edit_count

How many times has the Wikipedia article "[Title]" been edited in the
past 7/14/30 days?

  • Agent opens the article, clicks "View history", counts timestamped revisions within the window
  • GT: revision timestamps fetched over the past 35 days; count uses fetched_at as the reference "now" so GT reflects what the agent saw, not a later wall-clock time
  • Variant space: 201 articles × 3 time windows = 603 variants
  • Answers change every few hours for active articles
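
The windowed count can be illustrated with a small sketch. It assumes revision timestamps arrive as ISO 8601 UTC strings (the format the MediaWiki API returns for `rvprop=timestamp`) and that both window boundaries are inclusive; the function names are hypothetical. A malformed `fetched_at` raises `ValueError` here, which a caller could map to the `system_error` outcome mentioned in the tests.

```python
from datetime import datetime, timedelta
from typing import Iterable


def parse_ts(ts: str) -> datetime:
    """Parse an ISO 8601 timestamp such as '2024-03-31T00:00:00Z'."""
    # Replace the trailing 'Z' so older Python versions can parse it.
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def count_edits(timestamps: Iterable[str], fetched_at: str, window_days: int) -> int:
    """Count revisions within window_days before fetched_at.

    fetched_at, not the current wall-clock time, is the reference "now",
    so the count reflects what was visible when the page was fetched.
    """
    now = parse_ts(fetched_at)
    cutoff = now - timedelta(days=window_days)
    return sum(1 for ts in timestamps if cutoff <= parse_ts(ts) <= now)


# Invented revision timestamps for illustration.
revisions = [
    "2024-03-30T10:00:00Z",
    "2024-03-20T10:00:00Z",
    "2024-03-05T10:00:00Z",
]
fetched = "2024-03-31T00:00:00Z"
counts = {days: count_edits(revisions, fetched, days) for days in (7, 14, 30)}
```

With these sample values the three windows yield 1, 2, and 3 edits respectively.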

Red Team Review (all 6 checks pass)

| Check | category_count | edit_count |
| --- | --- | --- |
| 1. API semantic | categoryinfo.pages = page header count ✓ | rvprop=timestamp = history timestamps ✓ |
| 2. World knowledge | Exact counts unknown to LLM (<60%) ✓ | Edit counts not in training data ✓ |
| 3. Variant space | 502 ≥ 500 ✓ | 603 ≥ 500 ✓ |
| 4. Answer stability | Time-based categories change weekly ✓ | Changes every edit (hours) ✓ |
| 5. Random baseline | Wide integer range, ~0% ✓ | Wide integer range, ~0% ✓ |
| 6. Cross-param collapse | Different categories → different counts ✓ | Different articles → different rates ✓ |

Tests

tests/plugins/wikipedia/test_wikipedia_templates.py: 69 tests, all passing:

  • Template registration, generation determinism, JSON-serialisability
  • category_count GT: found, fuzzy/underscore title match, not_collected, no-collector system_error, ignores non-category data
  • edit_count GT: 7/14/30-day window counts, zero-edit case, malformed fetched_at → system_error, ignores non-history data
  • Plugin URL routing: 11 needs_api_data cases, dispatch mocked correctly
  • titles_match / normalize_title helpers
  • Pool size and duplicate assertions
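
For readers unfamiliar with the title helpers, here is one plausible shape for them. The bodies below are assumptions: the PR only names `titles_match` and `normalize_title` and mentions fuzzy/underscore matching, so the real implementations may differ (for example, this sketch ignores case entirely, which is looser than Wikipedia's own title rules).

```python
from urllib.parse import unquote


def normalize_title(title: str) -> str:
    """Normalise a page title for comparison (illustrative only).

    Decodes percent-escapes, treats underscores as spaces, collapses
    runs of whitespace, and ignores case.
    """
    decoded = unquote(title).replace("_", " ")
    return " ".join(decoded.split()).casefold()


def titles_match(a: str, b: str) -> bool:
    """Return True when two titles are equal after normalisation."""
    return normalize_title(a) == normalize_title(b)
```

This lets a URL-derived title such as Category:Living_people match the display form "Category:Living people".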

Checklist

  • eval.py tested with multiple seeds (10-minute timeout)
  • Red team self-attack pass completed (all 6 checks documented above)
  • All files under 500 lines
  • No silent fallbacks (no "or 0" defaults, no bare "except: pass")
  • needs_api_data returns False for navigation-only pages
  • 69 unit tests passing, no regressions in existing suite
