Description
DSBenchEvaluator (nemo_skills/evaluation/evaluator/dsbench.py) is currently a subclass of MathEvaluator that adds a relaxed_equal fallback for MCQ, dict, and list answer types. The relaxed_equal logic is general enough to be useful beyond DSBench (e.g. for any benchmark with MCQ or structured answers), and there is no strong reason to gate it behind a separate evaluator class.
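For illustration, here is a minimal sketch of the kind of relaxed comparison described above. It is not the actual code in dsbench.py: the helper names (_parse, _normalize_mcq) and the normalization rules (literal parsing of dict/list strings, single-letter MCQ normalization, a 1e-6 numeric tolerance) are assumptions chosen to show how one function can cover MCQ, dict, and list answers.

```python
import ast
import re
from typing import Any


def _parse(value: Any) -> Any:
    """Hypothetical helper: turn strings like "{'a': 1}" or "[1, 2]" into Python objects."""
    if isinstance(value, str):
        text = value.strip()
        try:
            return ast.literal_eval(text)
        except (ValueError, SyntaxError, TypeError):
            # Not a Python literal (e.g. "(B)" or "Paris"): keep the raw string.
            return text
    return value


def _normalize_mcq(value: str) -> str:
    """Hypothetical helper: reduce "(b)", "B.", "b" to "B"; other strings are lower-cased."""
    match = re.fullmatch(r"\(?\s*([A-Za-z])\s*[\).:]?", value.strip())
    return match.group(1).upper() if match else value.strip().lower()


def relaxed_equal(pred: Any, gold: Any) -> bool:
    """Sketch of a relaxed comparison for MCQ, dict, list, and numeric answers."""
    pred, gold = _parse(pred), _parse(gold)
    # Dicts: same keys, and every value matches recursively.
    if isinstance(gold, dict) and isinstance(pred, dict):
        return pred.keys() == gold.keys() and all(
            relaxed_equal(pred[k], gold[k]) for k in gold
        )
    # Lists/tuples: same length, element-wise recursive match (order matters).
    if isinstance(gold, (list, tuple)) and isinstance(pred, (list, tuple)):
        return len(pred) == len(gold) and all(
            relaxed_equal(p, g) for p, g in zip(pred, gold)
        )
    # Strings: MCQ-letter / case-insensitive comparison.
    if isinstance(gold, str) and isinstance(pred, str):
        return _normalize_mcq(pred) == _normalize_mcq(gold)
    # Numbers: small absolute tolerance; otherwise compare as lower-cased strings.
    try:
        return abs(float(pred) - float(gold)) <= 1e-6
    except (TypeError, ValueError):
        return str(pred).strip().lower() == str(gold).strip().lower()
```

Because the function recurses into dict values and list elements, an MCQ letter or a number nested inside a structured answer gets the same relaxed treatment as a top-level one.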
Proposed change:
- Add relaxed_comparison as an option to MathEvaluatorConfig, either enabled by default or folded into the existing relaxed_extraction option
- Apply the relaxed_equal fallback in MathEvaluator.eval_single when relaxed_comparison=True
- This would remove the need for DSBenchEvaluator entirely, so dsbench.py can be deleted and __init__.py and the DSBench dataset config updated accordingly
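A minimal sketch of how the proposed flag could gate the fallback inside eval_single. Everything here is a simplified stand-in rather than the NeMo-Skills code: MathEvaluatorConfig shows only the relevant fields, strict_equal is a placeholder for the existing math-equivalence check, relaxed_equal is a deliberately reduced version, and the sample keys (predicted_answer, expected_answer, symbolic_correct) are assumptions.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class MathEvaluatorConfig:
    # Hypothetical stand-in: only the fields relevant to this sketch are shown.
    relaxed_extraction: bool = False
    relaxed_comparison: bool = False  # proposed option; off keeps current behaviour


def strict_equal(pred: Any, gold: Any) -> bool:
    # Placeholder for MathEvaluator's existing equivalence check.
    return str(pred).strip() == str(gold).strip()


def relaxed_equal(pred: Any, gold: Any) -> bool:
    # Simplified stand-in for relaxed_equal: only ignores case, whitespace,
    # and surrounding parentheses/periods (e.g. "(b)" vs "B").
    def norm(x: Any) -> str:
        return str(x).strip().strip("().").strip().lower()

    return norm(pred) == norm(gold)


def eval_single(sample: dict, config: MathEvaluatorConfig) -> dict:
    pred = sample["predicted_answer"]
    gold = sample["expected_answer"]
    correct = strict_equal(pred, gold)
    # Fall back to relaxed matching only if the strict check failed and the option is on.
    if not correct and config.relaxed_comparison:
        correct = relaxed_equal(pred, gold)
    return {**sample, "symbolic_correct": correct}
```

Running the relaxed check only after the strict one fails means it can only turn an incorrect verdict into a correct one. Defaulting the flag to False, as in this sketch, leaves existing benchmark scores unchanged; the DSBench config would then opt in explicitly.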
This keeps the evaluator hierarchy flat and makes the relaxed comparison logic reusable for other benchmarks that have MCQ or structured (dict/list) answers.