
[Task Submission] mmlusr (mmlusr)#3

Open
SkySuperCat wants to merge 1 commit into GenBench:main from SkySuperCat:mmlusr

@SkySuperCat SkySuperCat commented Oct 22, 2024

MMLU-SR

mmlusr aims to measure the true comprehension abilities of Large Language Models (LLMs) by testing them on question-answering tasks in which terms have been modified.

Authors

  • Wentian Wang, wwang834@usc.edu
  • Sarthak Jain
  • Paul Kantor
  • Jacob Feldman
  • Lazaros Gallos
  • Hao Wang

Implementation

We provide task.py in the mmlusr folder. It implements a custom method that loads the answer choices from HuggingFace.

Usage

We still need to figure out a way to run all tasks on GenBench. In our Git repo it is easy to run all tasks: we deliberately gave every task its own line in the config file, so a user can simply pick whichever task they want. With the loading strategy used here, however, we currently have to change the task name in config.jsonnet by hand. We cannot change this on the HuggingFace side, because the dataset is already used in the lm-eval-harness repo.
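One possible workaround, sketched here only as an illustration, is to keep a single config template and render one copy per task instead of editing config.jsonnet by hand. Everything in the sketch is an assumption: the template text, the `SUBSET_PLACEHOLDER` marker, the `DATASET_ID` stand-in, the subset names, and the `render_configs` helper are all hypothetical and are not part of this PR or of GenBench.

```python
# Hypothetical subset names; the real list lives in the HuggingFace dataset.
SUBSETS = ["example_subset_a", "example_subset_b"]

# Assumed shape of a config; the actual config.jsonnet in this PR may differ.
# DATASET_ID and SUBSET_PLACEHOLDER are stand-ins, not real identifiers.
TEMPLATE = """{
    name: 'mmlusr',
    data_source: {
        type: 'hf',
        hf_id: ['DATASET_ID', 'SUBSET_PLACEHOLDER'],
    },
}
"""


def render_configs(template, subsets, placeholder="SUBSET_PLACEHOLDER"):
    """Return a dict mapping each subset name to its rendered config text."""
    if placeholder not in template:
        raise ValueError(f"placeholder {placeholder!r} not found in template")
    # Plain string substitution: one rendered config per subset.
    return {subset: template.replace(placeholder, subset) for subset in subsets}


if __name__ == "__main__":
    for subset, text in render_configs(TEMPLATE, SUBSETS).items():
        print(f"--- config for {subset} ---")
        print(text)
```

Each rendered text could then be written to its own file or copied over config.jsonnet before a run. Whether GenBench can pick up several such configs for one task is still the open question described above.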

Checklist:

  • [√] I and my co-authors agree that, if this PR is merged, the code will be available under the same license as the genbench_cbt repository.
  • [√] Prior to submitting, I have run the GenBench CBT test suite using the genbench-cli test-task tool.
  • [√] I have read the description of what should be in the doc.md of my task, and have added the required arguments.
  • [√] I have submitted or will submit an accompanying paper to the GenBench workshop.
