Deploy evaluation stack in our Hugging Face space #25

deanwampler · 2024-12-12T23:18:12Z

Configure a Hugging Face (HF) Space with the evaluation stack (lm-eval-harness + unitxt). The most likely baseline is the HF demo here: https://huggingface.co/demo-leaderboard-backend
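As a rough starting point, here is a minimal sketch of how a Space backend might assemble an lm-eval-harness command line. The helper name `build_lm_eval_command`, the model ID `gpt2`, the task `hellaswag`, and the `results` output directory are illustrative assumptions, not part of the demo backend. The optional `--include_path` argument is one way to point the harness at extra task definitions (for example, unitxt-based task configs), assuming they live in a local directory.

```python
import shlex
from typing import List, Optional


def build_lm_eval_command(
    model_id: str,
    tasks: List[str],
    output_path: str = "results",
    include_path: Optional[str] = None,
) -> List[str]:
    """Assemble (but do not run) an lm-eval-harness CLI invocation.

    Hypothetical helper: the names and defaults here are assumptions
    made for illustration only.
    """
    cmd = [
        "lm_eval",
        "--model", "hf",                          # Hugging Face transformers backend
        "--model_args", f"pretrained={model_id}",
        "--tasks", ",".join(tasks),               # comma-separated task names
        "--output_path", output_path,             # where result files are written
    ]
    if include_path is not None:
        # Directory containing additional task YAML files.
        cmd += ["--include_path", include_path]
    return cmd


if __name__ == "__main__":
    # Print the command rather than executing it, so the sketch has no side effects.
    print(shlex.join(build_lm_eval_command("gpt2", ["hellaswag"])))
```

The list form can be passed to `subprocess.run(cmd, check=True)` once the Space image has lm-eval-harness installed. Building the command separately from running it keeps the configuration easy to inspect and test.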

bnayahu added this to Trust and Safety Evaluations Dec 9, 2024

deanwampler converted this from a draft issue Dec 12, 2024

deanwampler removed this from Trust and Safety Evaluations Dec 12, 2024

deanwampler added this to FA2: TSEI Tasks Dec 12, 2024

deanwampler changed the title ~~Evaluation stack~~ Deploy evaluation stack in our Hugging Face space Dec 12, 2024

deanwampler moved this to Todo in FA2: TSEI Tasks Dec 12, 2024

deanwampler added the reference stack (All tools for the reference stack.), evaluators (Implementations of evaluations, including benchmarks and datasets), and leaderboards (Leaderboards deployed to HF or other places) labels Jan 4, 2025
