Champion challenge scoring (winner-takes-all) #336

Draft
angosr wants to merge 8 commits into main from champion-challenge-scoring

Conversation

@angosr (Collaborator) commented Apr 2, 2026

New Scoring Mechanism: Champion Challenge

Replaces the ELO + weight distribution system with a winner-takes-all champion challenge. Only the champion receives 100% weight.

Core Rules

| Param | Value | Meaning |
| --- | --- | --- |
| N | 10 | Challenger must Pareto-dominate the champion for N consecutive rounds to take the crown |
| M | 3 | Accumulate M total rounds without domination → terminate sampling |
| M-1 | 2 | M-1 consecutive rounds without domination → also terminate sampling |
| X | 3 | Champion offline for X consecutive rounds → reselect by geometric mean |
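The Pareto-domination test these rules rely on can be sketched as follows. This is a minimal sketch, not the actual comparison in `stage2_pareto.py`; the score representation (per-env mean scores as a dict) is an assumption:

```python
def pareto_dominates(challenger: dict, champion: dict) -> bool:
    """True if challenger is >= champion in every common env and > in at least one.

    Scores are assumed to be {env_name: mean_score} dicts over the envs
    both miners have been sampled on (a hypothetical representation).
    """
    common = challenger.keys() & champion.keys()
    if not common:
        return False
    at_least_one_better = False
    for env in common:
        if challenger[env] < champion[env]:
            return False
        if challenger[env] > champion[env]:
            at_least_one_better = True
    return at_least_one_better
```

Note that equality in every env does not count as domination, so a challenger cannot take the crown by merely matching the champion.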

Scoring Flow

Stage 1: Data collection (unchanged)
Stage 2: Champion challenge
  ├── Locate champion (hotkey+revision identity)
  ├── Each active challenger vs champion via Pareto comparison
  ├── Domination → consecutive wins +1 / No domination → total losses +1
  ├── Consecutive wins reach N → new champion (all states reset)
  ├── Losses reach M → terminate sampling
  └── Weight assignment: champion = 1.0, all others = 0.0
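The per-challenger bookkeeping in Stage 2 can be sketched roughly like this. The state class, function names, and threshold constants are hypothetical; the real `champion_challenge.py` may organize this differently:

```python
from dataclasses import dataclass

N_CONSECUTIVE_WINS = 10   # rounds of domination needed to take the crown
M_TOTAL_LOSSES = 3        # total non-dominating rounds before termination
M1_CONSEC_LOSSES = 2      # consecutive non-dominating rounds before termination

@dataclass
class ChallengeState:
    consecutive_wins: int = 0
    total_losses: int = 0
    consecutive_losses: int = 0
    terminated: bool = False

def record_round(state: ChallengeState, dominated: bool) -> bool:
    """Update one challenger's state; return True if it becomes the new champion."""
    if dominated:
        state.consecutive_wins += 1
        state.consecutive_losses = 0
        if state.consecutive_wins >= N_CONSECUTIVE_WINS:
            return True  # caller resets all states and assigns weight 1.0
    else:
        state.consecutive_wins = 0
        state.total_losses += 1
        state.consecutive_losses += 1
        if (state.total_losses >= M_TOTAL_LOSSES
                or state.consecutive_losses >= M1_CONSEC_LOSSES):
            state.terminated = True  # sampling stops for this miner
    return False
```

A single lost round resets `consecutive_wins` to zero, which is what makes N consecutive dominating rounds a meaningful bar.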

Champion Identity & Offline Handling

  • hotkey + revision defines identity — changing revision counts as going offline
  • Weight stays on the champion's UID during the offline grace period
  • After X rounds offline → reselect from current window by highest geometric mean
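The offline-handling rules above can be sketched as a small resolver. The function signature and the `active` lookup shape are assumptions for illustration; persistence of the offline counter lives elsewhere:

```python
X_OFFLINE_ROUNDS = 3  # grace rounds before the crown is reselected

def resolve_champion(champion_id, last_uid, active, offline_rounds):
    """Decide where the champion's weight goes this round.

    champion_id is the (hotkey, revision) identity, so a revision change
    makes the old identity look offline. `active` maps
    (hotkey, revision) -> uid for miners present this round (hypothetical
    lookup). Returns (weight_uid, offline_rounds, reselect_by_geo_mean).
    """
    if champion_id in active:
        return active[champion_id], 0, False   # online: counter resets
    offline_rounds += 1
    if offline_rounds >= X_OFFLINE_ROUNDS:
        return None, offline_rounds, True      # grace exhausted: reselect
    return last_uid, offline_rounds, False     # grace: weight stays on old UID
```

During the grace period the weight stays pinned to the champion's last-known UID, which avoids thrashing weights on a transient outage.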

Challenger Recovery

  • Terminated challengers can only resume sampling by submitting a new model revision
  • Scheduler reads challenge_status to skip terminated miners

Cold Start

On first deployment, inherits the highest ELO-rated miner as initial champion. Falls back to geometric mean selection if that miner is absent from current scoring data.
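The cold-start selection with its geometric-mean fallback can be sketched like this. The score shape and helper names are hypothetical, and scores are assumed positive so the log-based geometric mean is defined:

```python
import math

def best_by_geo_mean(scores: dict):
    """Pick the hotkey with the highest geometric mean across its envs.

    `scores` maps hotkey -> {env: score}, scores assumed positive
    (hypothetical shape). Also used for reselection after the offline
    grace period expires.
    """
    best, best_gm = None, -1.0
    for hotkey, envs in scores.items():
        if not envs:
            continue
        gm = math.exp(sum(math.log(v) for v in envs.values()) / len(envs))
        if gm > best_gm:
            best, best_gm = hotkey, gm
    return best

def cold_start_champion(elo_champion, scores: dict):
    """Inherit the top ELO miner if it appears in scoring data, else geo-mean."""
    if elo_champion is not None and elo_champion in scores:
        return elo_champion
    return best_by_geo_mean(scores)
```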

af get-rank Display

New Status and Challenge columns showing each miner's challenge progress:

Hotkey   | UID | Model        | env_a      | env_b      |     Status |  Challenge
hk_champ |   1 | best/model   | 75.23/500  | 82.10/500  | ★ CHAMPION |          —
hk_stron |   2 | strong/model | 78.50/480  | 85.20/490  |   sampling |     W:5/10
hk_weak  |   3 | weak/model   | 40.20/500  | 35.80/500  | TERMINATED |      L:3/3

Removed

  • ELO rating system (elo.py, stage3_subset.py, stage4_weights.py)
  • OpenSkill shadow scoring (openskill_scorer.py, openskill_config.py, associated DAOs and DB schemas)
  • ELO-related fields from API models and database layer

angosr added 4 commits April 2, 2026 08:50
Simplify the 4-stage scoring pipeline (Pareto→ELO→Weights→Normalize)
into a 2-stage champion challenge system (Collect→ChampionChallenge).

Only the champion gets 100% weight. Challengers must Pareto-dominate
the champion for N=10 consecutive rounds to take the crown. Challengers
that accumulate M=3 total losses or M-1=2 consecutive losses are
terminated to save sampling resources.

Key changes:
- New: champion_challenge.py — core challenge logic
- Remove: elo.py, stage3_subset.py, stage4_weights.py, openskill_*
- Remove: OpenSkill DB schemas and DAOs
- Clean: ELO fields from API models, scores DAO, miner rank display
- Add: challenge_info to scores table for af get-rank display
- Add: terminated miner check in sampling scheduler
- Add: 77 tests covering all code paths and edge cases

Champion offline handling:
- Weight preserved on old champion UID during grace period (X=3 rounds)
- Offline counter persisted to DB each round
- After threshold: reselect by geometric mean from current window

Cold start: inherit highest ELO-rated miner as initial champion
Recovery: only by changing model revision (resets challenge state)

Tests the full scorer pipeline with mocked DAO boundaries:
- save_results verifies correct DB calls (no ELO fields, has challenge_info)
- Multi-round state persistence via mock (3 rounds to dethrone champion)
- Offline counter saved to DB via system_config.set_param
- Backward compat: old DB records without challenge_* fields
- Scheduler terminated check (skip/allow/missing-record)
- 10-miner 10-round realistic scenario with invariant checks:
  at-most-one-champion, weights-sum-to-1, terminated-have-enough-losses

Bug 9: Cold start ELO miner not in scoring_data → X rounds zero weight
  Fix: Verify ELO champion exists in scoring_data before using;
  fall back to geometric mean selection if absent.

Bug 10: Per-miner challenge state loading wraps entire loop in try/except
  Fix: Per-miner try/except so one DB error only affects that miner,
  not all miners' accumulated challenge progress.

Bug 11: since_block overwritten every round even when champion unchanged
  Fix: Read existing champion from DB, preserve since_block if same
  hotkey+revision. Only update on actual champion change.

Also: consolidated two get_param_value calls into one read,
fixed async mock warning in integration tests.

champion_challenge.py:
- Restructure into 6 clear phases with private methods
- Extract _record_win/_record_loss, _best_by_geo_mean, _find_grace_uid
- Remove inline comments for obvious logic
- Single _empty_output for guard clauses

Tests: 8 files → 2 files (34 tests covering same behaviors)
- test_champion_challenge.py: unit tests by behavior category
- test_scorer_e2e.py: multi-round simulations, DB persistence, utils
@Tobias10086 Tobias10086 requested review from Tobias10086 and catoneone and removed request for Tobias10086 and catoneone April 2, 2026 09:46
angosr added 2 commits April 2, 2026 10:03
A comparison only fires when the challenger's common task count with the
champion crosses a new window-size boundary (checkpoint). This ensures
each comparison uses a meaningfully different dataset instead of repeating
on nearly identical data every scoring round.

checkpoint = min(common_tasks across envs) // min(sampling_count across envs)

When current_checkpoint > prev_checkpoint → comparison fires.
Otherwise the challenger is skipped this round.

New field: challenge_checkpoints_passed (persisted in miner_stats + scores)
New parameter: env_sampling_counts passed through main → scorer → challenge
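The checkpoint gating described in this commit can be sketched directly from the stated formula. Function names are hypothetical; only the formula and the fire condition come from the commit message:

```python
def checkpoint(common_tasks: dict, sampling_counts: dict) -> int:
    """checkpoint = min(common tasks across envs) // min(sampling count across envs)."""
    return min(common_tasks.values()) // min(sampling_counts.values())

def comparison_fires(common_tasks: dict, sampling_counts: dict,
                     prev_checkpoint: int):
    """Fire a Pareto comparison only when a new window boundary is crossed.

    Returns (fires, current_checkpoint); the caller persists the new
    checkpoint as challenge_checkpoints_passed when it fires.
    """
    cur = checkpoint(common_tasks, sampling_counts)
    return cur > prev_checkpoint, cur
```

Because the checkpoint only advances when the common-task count grows by at least a window, repeated rounds on nearly identical data are skipped rather than re-compared.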
@angosr angosr marked this pull request as draft April 2, 2026 10:08
angosr added 2 commits April 2, 2026 10:11
- SubsetInfo, Stage2Output, Stage2ParetoFilter.filter() — unused
- filtered_subsets, filter_reasons on MinerData — old Pareto filter state
- filter_info from scores DAO, API model, API router — replaced by challenge_info
- MAX_LAYERS config — old subset layer system
- _get_filter_reason_from_api in rank.py — old Pareto filter display
- Simplify stage2_pareto.py to only expose _compare_miners