Conversation

baptistecolle (Collaborator)
Changes

  • Added capability to rerun individual benchmarks

    • Modularized the backend architecture to support selective benchmark reruns (a rough sketch of the idea follows this list)
    • Addresses cases where benchmarks produced erroneous results due to outdated dependencies
  • Implemented observability dashboard for LLM Performance Leaderboard

    • Provides monitoring of benchmark execution status
    • Tracks failed benchmark configurations to facilitate debugging
    • Enhances visibility into the overall health of the leaderboard system
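
As a rough illustration of the selective-rerun idea only: the sketch below is not the PR's actual implementation, and all names in it (`BenchmarkConfig`, `run_benchmark`, the results-directory layout) are assumptions rather than the project's API. The core of rerunning a single benchmark is re-executing one configuration and replacing only its result file, leaving every other leaderboard entry untouched.

```python
# Hypothetical sketch of a selective benchmark rerun; names and layout are illustrative.
import json
from dataclasses import dataclass
from pathlib import Path


@dataclass
class BenchmarkConfig:
    model_id: str   # e.g. "meta-llama/Llama-2-7b-hf"
    backend: str    # e.g. "pytorch"
    hardware: str   # e.g. "a10g"


def run_benchmark(config: BenchmarkConfig) -> dict:
    # Placeholder for the actual benchmark runner used by the leaderboard.
    raise NotImplementedError


def rerun_single(config: BenchmarkConfig, results_dir: Path) -> None:
    """Rerun one benchmark configuration and overwrite only its result file."""
    result = run_benchmark(config)
    out_file = (
        results_dir
        / config.hardware
        / config.backend
        / f"{config.model_id.replace('/', '_')}.json"
    )
    out_file.parent.mkdir(parents=True, exist_ok=True)
    out_file.write_text(json.dumps(result, indent=2))


# Usage: rerun only the configuration whose results were produced with
# outdated dependencies, e.g.
# rerun_single(BenchmarkConfig("meta-llama/Llama-2-7b-hf", "pytorch", "a10g"), Path("results"))
```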

Motivation

These changes improve the maintainability and reliability of the LLM Performance Leaderboard by enabling operators to:

  1. Quickly identify and rerun problematic benchmarks (previously, outdated results simply remained in the leaderboard)
  2. Monitor benchmark execution status through a centralized dashboard
  3. Better understand failed configurations in order to identify the root cause of a bug

@baptistecolle baptistecolle added the all_benchmarks [CI] Requires and enables running all benchmark workflows label Feb 6, 2025