Champion challenge scoring (winner-takes-all) #336

Draft
angosr wants to merge 8 commits into main from champion-challenge-scoring

Conversation

@angosr (Collaborator) commented Apr 2, 2026

New Scoring Mechanism: Champion Challenge

Replaces the ELO + weight distribution system with a winner-takes-all champion challenge. Only the champion receives 100% weight.

Core Rules

| Param | Value | Meaning |
| --- | --- | --- |
| N | 10 | Challenger must Pareto-dominate the champion for N consecutive rounds to take the crown |
| M | 3 | Accumulate M total rounds without domination → terminate sampling |
| M-1 | 2 | M-1 consecutive rounds without domination → also terminate sampling |
| X | 3 | Champion offline for X consecutive rounds → reselect by geometric mean |
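The Pareto-domination test these rules rely on can be sketched as follows. This is a minimal sketch, not the actual comparison in `stage2_pareto.py`; the score representation (per-env mean scores as a dict) is an assumption:

```python
def pareto_dominates(challenger: dict, champion: dict) -> bool:
    """True if challenger is >= champion in every common env and > in at least one.

    Scores are assumed to be {env_name: mean_score} dicts over the envs
    both miners have been sampled on (a hypothetical representation).
    """
    common = challenger.keys() & champion.keys()
    if not common:
        return False
    at_least_one_better = False
    for env in common:
        if challenger[env] < champion[env]:
            return False
        if challenger[env] > champion[env]:
            at_least_one_better = True
    return at_least_one_better
```

Note that equality in every env does not count as domination, so a challenger cannot take the crown by merely matching the champion.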

Scoring Flow

Stage 1: Data collection (unchanged)
Stage 2: Champion challenge
  ├── Locate champion (hotkey+revision identity)
  ├── Each active challenger vs champion via Pareto comparison
  ├── Domination → consecutive wins +1 / No domination → total losses +1
  ├── Consecutive wins reach N → new champion (all states reset)
  ├── Losses reach M → terminate sampling
  └── Weight assignment: champion = 1.0, all others = 0.0
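The per-challenger bookkeeping in Stage 2 can be sketched roughly like this. The state class, function names, and threshold constants are hypothetical; the real `champion_challenge.py` may organize this differently:

```python
from dataclasses import dataclass

N_CONSECUTIVE_WINS = 10   # rounds of domination needed to take the crown
M_TOTAL_LOSSES = 3        # total non-dominating rounds before termination
M1_CONSEC_LOSSES = 2      # consecutive non-dominating rounds before termination

@dataclass
class ChallengeState:
    consecutive_wins: int = 0
    total_losses: int = 0
    consecutive_losses: int = 0
    terminated: bool = False

def record_round(state: ChallengeState, dominated: bool) -> bool:
    """Update one challenger's state; return True if it becomes the new champion."""
    if dominated:
        state.consecutive_wins += 1
        state.consecutive_losses = 0
        if state.consecutive_wins >= N_CONSECUTIVE_WINS:
            return True  # caller resets all states and assigns weight 1.0
    else:
        state.consecutive_wins = 0
        state.total_losses += 1
        state.consecutive_losses += 1
        if (state.total_losses >= M_TOTAL_LOSSES
                or state.consecutive_losses >= M1_CONSEC_LOSSES):
            state.terminated = True  # sampling stops for this miner
    return False
```

A single lost round resets `consecutive_wins` to zero, which is what makes N consecutive dominating rounds a meaningful bar.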

Champion Identity & Offline Handling

  • hotkey + revision defines identity — changing revision counts as going offline
  • Weight stays on the champion's UID during the offline grace period
  • After X rounds offline → reselect from current window by highest geometric mean
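The offline-handling rules above can be sketched as a small resolver. The function signature and the `active` lookup shape are assumptions for illustration; persistence of the offline counter lives elsewhere:

```python
X_OFFLINE_ROUNDS = 3  # grace rounds before the crown is reselected

def resolve_champion(champion_id, last_uid, active, offline_rounds):
    """Decide where the champion's weight goes this round.

    champion_id is the (hotkey, revision) identity, so a revision change
    makes the old identity look offline. `active` maps
    (hotkey, revision) -> uid for miners present this round (hypothetical
    lookup). Returns (weight_uid, offline_rounds, reselect_by_geo_mean).
    """
    if champion_id in active:
        return active[champion_id], 0, False   # online: counter resets
    offline_rounds += 1
    if offline_rounds >= X_OFFLINE_ROUNDS:
        return None, offline_rounds, True      # grace exhausted: reselect
    return last_uid, offline_rounds, False     # grace: weight stays on old UID
```

During the grace period the weight stays pinned to the champion's last-known UID, which avoids thrashing weights on a transient outage.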

Challenger Recovery

  • Terminated challengers can only resume sampling by submitting a new model revision
  • Scheduler reads challenge_status to skip terminated miners

Cold Start

On first deployment, inherits the highest ELO-rated miner as initial champion. Falls back to geometric mean selection if that miner is absent from current scoring data.
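The cold-start selection with its geometric-mean fallback can be sketched like this. The score shape and helper names are hypothetical, and scores are assumed positive so the log-based geometric mean is defined:

```python
import math

def best_by_geo_mean(scores: dict):
    """Pick the hotkey with the highest geometric mean across its envs.

    `scores` maps hotkey -> {env: score}, scores assumed positive
    (hypothetical shape). Also used for reselection after the offline
    grace period expires.
    """
    best, best_gm = None, -1.0
    for hotkey, envs in scores.items():
        if not envs:
            continue
        gm = math.exp(sum(math.log(v) for v in envs.values()) / len(envs))
        if gm > best_gm:
            best, best_gm = hotkey, gm
    return best

def cold_start_champion(elo_champion, scores: dict):
    """Inherit the top ELO miner if it appears in scoring data, else geo-mean."""
    if elo_champion is not None and elo_champion in scores:
        return elo_champion
    return best_by_geo_mean(scores)
```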

af get-rank Display

New Status and Challenge columns showing each miner's challenge progress:

Hotkey   | UID | Model        | env_a      | env_b      |     Status |  Challenge
hk_champ |   1 | best/model   | 75.23/500  | 82.10/500  | ★ CHAMPION |          —
hk_stron |   2 | strong/model | 78.50/480  | 85.20/490  |   sampling |     W:5/10
hk_weak  |   3 | weak/model   | 40.20/500  | 35.80/500  | TERMINATED |      L:3/3

Removed

  • ELO rating system (elo.py, stage3_subset.py, stage4_weights.py)
  • OpenSkill shadow scoring (openskill_scorer.py, openskill_config.py, associated DAOs and DB schemas)
  • ELO-related fields from API models and database layer

angosr added 4 commits April 2, 2026 08:50
Simplify the 4-stage scoring pipeline (Pareto→ELO→Weights→Normalize)
into a 2-stage champion challenge system (Collect→ChampionChallenge).

Only the champion gets 100% weight. Challengers must Pareto-dominate
the champion for N=10 consecutive rounds to take the crown. Challengers
that accumulate M=3 total losses or M-1=2 consecutive losses are
terminated to save sampling resources.

Key changes:
- New: champion_challenge.py — core challenge logic
- Remove: elo.py, stage3_subset.py, stage4_weights.py, openskill_*
- Remove: OpenSkill DB schemas and DAOs
- Clean: ELO fields from API models, scores DAO, miner rank display
- Add: challenge_info to scores table for af get-rank display
- Add: terminated miner check in sampling scheduler
- Add: 77 tests covering all code paths and edge cases

Champion offline handling:
- Weight preserved on old champion UID during grace period (X=3 rounds)
- Offline counter persisted to DB each round
- After threshold: reselect by geometric mean from current window

Cold start: inherit highest ELO-rated miner as initial champion
Recovery: only by changing model revision (resets challenge state)

Tests the full scorer pipeline with mocked DAO boundaries:
- save_results verifies correct DB calls (no ELO fields, has challenge_info)
- Multi-round state persistence via mock (3 rounds to dethrone champion)
- Offline counter saved to DB via system_config.set_param
- Backward compat: old DB records without challenge_* fields
- Scheduler terminated check (skip/allow/missing-record)
- 10-miner 10-round realistic scenario with invariant checks:
  at-most-one-champion, weights-sum-to-1, terminated-have-enough-losses

Bug 9: Cold start ELO miner not in scoring_data → X rounds zero weight
  Fix: Verify ELO champion exists in scoring_data before using;
  fall back to geometric mean selection if absent.

Bug 10: Per-miner challenge state loading wraps entire loop in try/except
  Fix: Per-miner try/except so one DB error only affects that miner,
  not all miners' accumulated challenge progress.

Bug 11: since_block overwritten every round even when champion unchanged
  Fix: Read existing champion from DB, preserve since_block if same
  hotkey+revision. Only update on actual champion change.

Also: consolidated two get_param_value calls into one read,
fixed async mock warning in integration tests.

champion_challenge.py:
- Restructure into 6 clear phases with private methods
- Extract _record_win/_record_loss, _best_by_geo_mean, _find_grace_uid
- Remove inline comments for obvious logic
- Single _empty_output for guard clauses

Tests: 8 files → 2 files (34 tests covering same behaviors)
- test_champion_challenge.py: unit tests by behavior category
- test_scorer_e2e.py: multi-round simulations, DB persistence, utils
@Tobias10086 Tobias10086 requested review from Tobias10086 and catoneone and removed request for Tobias10086 and catoneone April 2, 2026 09:46
angosr added 2 commits April 2, 2026 10:03
A comparison only fires when the challenger's common task count with the
champion crosses a new window-size boundary (checkpoint). This ensures
each comparison uses a meaningfully different dataset instead of repeating
on nearly identical data every scoring round.

checkpoint = min(common_tasks across envs) // min(sampling_count across envs)

When current_checkpoint > prev_checkpoint → comparison fires.
Otherwise the challenger is skipped this round.

New field: challenge_checkpoints_passed (persisted in miner_stats + scores)
New parameter: env_sampling_counts passed through main → scorer → challenge
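The checkpoint gating described in this commit can be sketched directly from the stated formula. Function names are hypothetical; only the formula and the fire condition come from the commit message:

```python
def checkpoint(common_tasks: dict, sampling_counts: dict) -> int:
    """checkpoint = min(common tasks across envs) // min(sampling count across envs)."""
    return min(common_tasks.values()) // min(sampling_counts.values())

def comparison_fires(common_tasks: dict, sampling_counts: dict,
                     prev_checkpoint: int):
    """Fire a Pareto comparison only when a new window boundary is crossed.

    Returns (fires, current_checkpoint); the caller persists the new
    checkpoint as challenge_checkpoints_passed when it fires.
    """
    cur = checkpoint(common_tasks, sampling_counts)
    return cur > prev_checkpoint, cur
```

Because the checkpoint only advances when the common-task count grows by at least a window, repeated rounds on nearly identical data are skipped rather than re-compared.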
@angosr angosr marked this pull request as draft April 2, 2026 10:08
angosr added 2 commits April 2, 2026 10:11
- SubsetInfo, Stage2Output, Stage2ParetoFilter.filter() — unused
- filtered_subsets, filter_reasons on MinerData — old Pareto filter state
- filter_info from scores DAO, API model, API router — replaced by challenge_info
- MAX_LAYERS config — old subset layer system
- _get_filter_reason_from_api in rank.py — old Pareto filter display
- Simplify stage2_pareto.py to only expose _compare_miners