Pull requests: EleutherAI/lm-evaluation-harness

Pull requests list

openai: better error messages; fix greedy matching
#2327 opened Sep 20, 2024 by baberabb
Fix float limit override
#2325 opened Sep 19, 2024 by cjluo-omniml
Mathvista
#2321 opened Sep 18, 2024 by baberabb Draft
change group to tags in task eus_exams task configs
#2320 opened Sep 18, 2024 by baberabb
add batch_size to get_sample_size
#2311 opened Sep 17, 2024 by baberabb Draft
Scrolls branch
#2309 opened Sep 16, 2024 by blitzionic
Fix missing key in custom task loading.
#2304 opened Sep 16, 2024 by giuliolovisotto
add new truncation strategy
#2300 opened Sep 15, 2024 by artemorloff Draft
fix some bugs of mmlu
#2299 opened Sep 14, 2024 by eyuansu62
Added TurkishMMLU to LM Evaluation Harness
#2283 opened Sep 6, 2024 by ArdaYueksel
add mmlu readme
#2282 opened Sep 6, 2024 by baberabb
Gen Prefix
#2274 opened Sep 2, 2024 by baberabb
Nvidia TensorRT-LLM
#2271 opened Sep 1, 2024 by abhishekvijeev Draft
Add Yue-Benchmark and update tasks description
#2270 opened Aug 31, 2024 by cpa2001
Ifeval: Dowload punkt_tab on rank 0
#2267 opened Aug 30, 2024 by baberabb
[Draft] llm-as-judge
#2251 opened Aug 25, 2024 by baberabb Draft
Minor features
#2249 opened Aug 25, 2024 by artemorloff
Add MBPP
#2247 opened Aug 23, 2024 by hjlee1371
Add GPTQModel support for inferencing GPTQ models
#2217 opened Aug 16, 2024 by Qubitium
add option for custom aggregation
#2209 opened Aug 12, 2024 by lintangsutawika
Add KoCommonGEN v2 benchmark
#2208 opened Aug 12, 2024 by metterian
CoverBench
#2207 opened Aug 11, 2024 by ysjprojects