Simplify the 4-stage scoring pipeline (Pareto→ELO→Weights→Normalize) into a 2-stage champion challenge system (Collect→ChampionChallenge). Only the champion gets 100% weight. Challengers must Pareto-dominate the champion for N=10 consecutive rounds to take the crown. Challengers that accumulate M=3 total losses or M-1=2 consecutive losses are terminated to save sampling resources.

Key changes:
- New: champion_challenge.py (core challenge logic)
- Remove: elo.py, stage3_subset.py, stage4_weights.py, openskill_*
- Remove: OpenSkill DB schemas and DAOs
- Clean: ELO fields from API models, scores DAO, miner rank display
- Add: challenge_info to scores table for af get-rank display
- Add: terminated miner check in sampling scheduler
- Add: 77 tests covering all code paths and edge cases

Champion offline handling:
- Weight preserved on old champion UID during grace period (X=3 rounds)
- Offline counter persisted to DB each round
- After threshold: reselect by geometric mean from current window

Cold start: inherit highest ELO-rated miner as initial champion
Recovery: only by changing model revision (resets challenge state)
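The win/loss rules in this commit message can be sketched as a small state update. This is a minimal illustration, not the code in champion_challenge.py: the `ChallengeState` class, the `record_round` function, and the status strings are hypothetical names, and the sketch assumes that every round in which the challenger fails to Pareto-dominate the champion counts as a loss.

```python
from dataclasses import dataclass

WINS_TO_DETHRONE = 10                           # N in the commit message
MAX_TOTAL_LOSSES = 3                            # M
MAX_CONSECUTIVE_LOSSES = MAX_TOTAL_LOSSES - 1   # M-1


@dataclass
class ChallengeState:
    """Per-challenger progress (hypothetical field names)."""
    consecutive_wins: int = 0
    total_losses: int = 0
    consecutive_losses: int = 0
    status: str = "challenging"  # assumed values: challenging / champion / terminated


def record_round(state: ChallengeState, dominated_champion: bool) -> ChallengeState:
    """Apply one comparison result to a challenger's state."""
    if state.status != "challenging":
        return state  # champions and terminated miners are not updated here

    if dominated_champion:
        state.consecutive_wins += 1
        state.consecutive_losses = 0
        if state.consecutive_wins >= WINS_TO_DETHRONE:
            state.status = "champion"
    else:
        # Assumption: any non-dominating round is a loss and breaks the win streak.
        state.consecutive_wins = 0
        state.total_losses += 1
        state.consecutive_losses += 1
        if (state.total_losses >= MAX_TOTAL_LOSSES
                or state.consecutive_losses >= MAX_CONSECUTIVE_LOSSES):
            state.status = "terminated"
    return state
```

Under these assumptions a challenger is terminated after two losses in a row, or after a third loss overall even when wins are interleaved.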
Tests the full scorer pipeline with mocked DAO boundaries:
- save_results verifies correct DB calls (no ELO fields, has challenge_info)
- Multi-round state persistence via mock (3 rounds to dethrone champion)
- Offline counter saved to DB via system_config.set_param
- Backward compat: old DB records without challenge_* fields
- Scheduler terminated check (skip/allow/missing-record)
- 10-miner 10-round realistic scenario with invariant checks: at-most-one-champion, weights-sum-to-1, terminated-have-enough-losses
Bug 9: Cold start ELO miner not in scoring_data → X rounds zero weight
Fix: Verify ELO champion exists in scoring_data before using; fall back to geometric mean selection if absent.

Bug 10: Per-miner challenge state loading wraps entire loop in try/except
Fix: Per-miner try/except so one DB error only affects that miner, not all miners' accumulated challenge progress.

Bug 11: since_block overwritten every round even when champion unchanged
Fix: Read existing champion from DB, preserve since_block if same hotkey+revision. Only update on actual champion change.

Also: consolidated two get_param_value calls into one read, fixed async mock warning in integration tests.
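The Bug 11 fix can be illustrated with a small helper that decides which since_block to persist. This is only a sketch: `ChampionRecord` and `next_champion_record` are hypothetical names, and the record shape (hotkey, revision, since_block) is assumed from the commit message rather than taken from the actual DB schema.

```python
from typing import Optional, TypedDict


class ChampionRecord(TypedDict):
    """Assumed shape of the persisted champion entry."""
    hotkey: str
    revision: str
    since_block: int


def next_champion_record(
    existing: Optional[ChampionRecord],
    hotkey: str,
    revision: str,
    current_block: int,
) -> ChampionRecord:
    """Keep since_block when the champion is unchanged; reset it otherwise."""
    same_champion = (
        existing is not None
        and existing["hotkey"] == hotkey
        and existing["revision"] == revision
    )
    # Only a real champion change (new hotkey or new revision) moves since_block.
    since_block = existing["since_block"] if same_champion else current_block
    return ChampionRecord(hotkey=hotkey, revision=revision, since_block=since_block)
```

Comparing both hotkey and revision matches the identity rule used elsewhere in this PR, where a changed revision is treated as a different miner.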
champion_challenge.py:
- Restructure into 6 clear phases with private methods
- Extract _record_win/_record_loss, _best_by_geo_mean, _find_grace_uid
- Remove inline comments for obvious logic
- Single _empty_output for guard clauses

Tests: 8 files → 2 files (34 tests covering same behaviors)
- test_champion_challenge.py: unit tests by behavior category
- test_scorer_e2e.py: multi-round simulations, DB persistence, utils
A comparison only fires when the challenger's common task count with the champion crosses a new window-size boundary (checkpoint). This ensures each comparison uses a meaningfully different dataset instead of repeating on nearly identical data every scoring round.

checkpoint = min(common_tasks across envs) // min(sampling_count across envs)

When current_checkpoint > prev_checkpoint → comparison fires. Otherwise the challenger is skipped this round.

New field: challenge_checkpoints_passed (persisted in miner_stats + scores)
New parameter: env_sampling_counts passed through main → scorer → challenge
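As a worked illustration of the checkpoint formula, the gating logic might look like the sketch below. The function names `compute_checkpoint` and `should_compare`, the dictionary inputs keyed by environment name, and the handling of empty or non-positive inputs are assumptions made for the example, not the signatures used in the PR.

```python
from __future__ import annotations


def compute_checkpoint(common_tasks: dict[str, int],
                       sampling_counts: dict[str, int]) -> int:
    """checkpoint = min(common_tasks across envs) // min(sampling_count across envs)."""
    if not common_tasks or not sampling_counts:
        return 0  # assumption: no data means no checkpoint reached
    window = min(sampling_counts.values())
    if window <= 0:
        return 0  # assumption: guard against a zero or negative window size
    return min(common_tasks.values()) // window


def should_compare(common_tasks: dict[str, int],
                   sampling_counts: dict[str, int],
                   prev_checkpoint: int) -> tuple[bool, int]:
    """Return (fires, current_checkpoint); fires only on a newly crossed boundary."""
    current = compute_checkpoint(common_tasks, sampling_counts)
    return current > prev_checkpoint, current


# Example: 250 and 310 common tasks, window sizes 100 and 200.
fires, checkpoint = should_compare(
    {"env_a": 250, "env_b": 310},
    {"env_a": 100, "env_b": 200},
    prev_checkpoint=1,
)
```

In this example the smallest common task count is 250 and the smallest window is 100, so the checkpoint is 2. With a previous checkpoint of 1 the comparison fires; if the stored value were already 2, the challenger would be skipped this round.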
- SubsetInfo, Stage2Output, Stage2ParetoFilter.filter(): unused
- filtered_subsets, filter_reasons on MinerData: old Pareto filter state
- filter_info from scores DAO, API model, API router: replaced by challenge_info
- MAX_LAYERS config: old subset layer system
- _get_filter_reason_from_api in rank.py: old Pareto filter display
- Simplify stage2_pareto.py to only expose _compare_miners
New Scoring Mechanism: Champion Challenge
Replaces the ELO + weight distribution system with a winner-takes-all champion challenge. Only the champion receives 100% weight.
Core Rules
Scoring Flow
Champion Identity & Offline Handling
- hotkey + revision defines identity; changing revision counts as going offline
Challenger Recovery
- Sampling scheduler checks challenge_status to skip terminated miners
Cold Start
On first deployment, inherits the highest ELO-rated miner as initial champion. Falls back to geometric mean selection if that miner is absent from current scoring data.
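A rough sketch of this cold-start selection follows. The names `geometric_mean` and `pick_initial_champion`, the layout of `scoring_data` (UID mapped to per-environment scores), and the choice to treat any non-positive score as a geometric mean of zero are all assumptions for illustration; the PR's own helper is `_best_by_geo_mean`, whose exact behavior is not shown here.

```python
import math
from typing import Dict, List, Optional


def geometric_mean(scores: List[float]) -> float:
    """Geometric mean of positive scores; 0.0 if empty or any score is <= 0."""
    if not scores or any(s <= 0 for s in scores):
        return 0.0
    return math.exp(sum(math.log(s) for s in scores) / len(scores))


def pick_initial_champion(
    elo_best_uid: Optional[int],
    scoring_data: Dict[int, Dict[str, float]],
) -> Optional[int]:
    """Prefer the highest ELO-rated miner, but only if it is in scoring_data."""
    if elo_best_uid is not None and elo_best_uid in scoring_data:
        return elo_best_uid
    if not scoring_data:
        return None  # nobody to crown yet
    # Fallback: the miner with the best geometric mean across environments.
    return max(
        scoring_data,
        key=lambda uid: geometric_mean(list(scoring_data[uid].values())),
    )
```

The geometric mean rewards balanced performance: a miner scoring 0.5 in two environments ranks above one scoring 0.9 and 0.1, since the geometric means are 0.5 and 0.3 respectively.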
af get-rank Display
New Status and Challenge columns show each miner's challenge progress.
Removed
- elo.py, stage3_subset.py, stage4_weights.py
- openskill_scorer.py, openskill_config.py, associated DAOs and DB schemas