Skip to content

Conversation

@yawen-d
Copy link
Collaborator

@yawen-d yawen-d commented Oct 13, 2024

Evaluation script for LLM RuLES task

Notes:

  • In rules.py, four methods _single_input, _single_eval_message, _single_eval_postprocess, run_pipeline were overwritten.
  • Most entries in the dataset are multi-round dialogues that only contain user messages - where the assistant messages were generated on the fly. This is the reason why tasks/RuLES.py#L100 takes in the LLM client to generate the intermediate assistant messages.
  • I haven't fully tested the script, but only tested it under debug mode.

@yawen-d yawen-d requested a review from haonan-li October 13, 2024 17:08
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants