Merge DSBenchEvaluator into MathEvaluator #1268

@sgunasekar

Description

DSBenchEvaluator (nemo_skills/evaluation/evaluator/dsbench.py) is currently a subclass of MathEvaluator that adds a relaxed_equal fallback for MCQ, dict, and list answer types. The relaxed_equal logic is general enough to be useful beyond DSBench (e.g. any benchmark with MCQ or structured answers), and there's no strong reason to gate it behind a separate evaluator class.

Proposed change:

  • Add relaxed_comparison as an option to MathEvaluatorConfig, either enabled by default or by overloading the relaxed_extraction config
  • Apply the relaxed_equal fallback in MathEvaluator.eval_single when relaxed_comparison=True
  • This would remove the need for DSBenchEvaluator entirely, so we can delete dsbench.py and update __init__.py and the DSBench dataset config accordingly

This keeps the evaluator hierarchy flat and makes the relaxed comparison logic reusable for other benchmarks that have MCQ or structured (dict/list) answers.
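
A rough sketch of what the proposed change could look like, to make the discussion concrete. Everything here is a hypothetical stand-in rather than the actual nemo_skills code: MathEvaluatorConfigSketch, strict_equal, and the body of relaxed_equal are assumptions for illustration, and the real relaxed_equal in dsbench.py may behave differently. The only part taken from the proposal is the relaxed_comparison flag gating a fallback inside eval_single.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class MathEvaluatorConfigSketch:
    # Hypothetical stand-in for MathEvaluatorConfig; only the proposed flag is shown.
    relaxed_comparison: bool = False


def strict_equal(predicted: Any, expected: Any) -> bool:
    # Placeholder for the evaluator's existing comparison (assumption, not the real logic).
    return str(predicted).strip() == str(expected).strip()


def _normalize_scalar(value: Any) -> str:
    # Strip whitespace, then surrounding parentheses/periods, and uppercase: "(b)" becomes "B".
    return str(value).strip().strip("().").upper()


def relaxed_equal(predicted: Any, expected: Any) -> bool:
    # Illustrative fallback only; the real relaxed_equal may differ.
    if isinstance(expected, dict) and isinstance(predicted, dict):
        # Dicts match when they have the same keys and every value matches.
        return predicted.keys() == expected.keys() and all(
            relaxed_equal(predicted[key], expected[key]) for key in expected
        )
    if isinstance(expected, (list, tuple)) and isinstance(predicted, (list, tuple)):
        # Lists match element by element, in order.
        return len(predicted) == len(expected) and all(
            relaxed_equal(p, e) for p, e in zip(predicted, expected)
        )
    # MCQ-style and other scalar answers: compare normalized strings.
    return _normalize_scalar(predicted) == _normalize_scalar(expected)


def eval_single(predicted: Any, expected: Any, config: MathEvaluatorConfigSketch) -> bool:
    # Try the existing comparison first; fall back only when the flag is set.
    if strict_equal(predicted, expected):
        return True
    if config.relaxed_comparison:
        return relaxed_equal(predicted, expected)
    return False
```

With the flag off, behavior stays the same as the plain comparison, so existing benchmarks would be unaffected unless they opt in (or unless the flag is made the default, as the first bullet leaves open).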
