Adding RuLES task #3

yawen-d · 2024-10-13T17:08:12Z

Evaluation script for LLM RuLES task

Notes:

In rules.py, four methods _single_input, _single_eval_message, _single_eval_postprocess, run_pipeline were overwritten.
Most entries in the dataset are multi-round dialogues that only contain user messages - where the assistant messages were generated on the fly. This is the reason why tasks/RuLES.py#L100 takes in the LLM client to generate the intermediate assistant messages.
I haven't fully tested the script, but only tested it under debug mode.

yawen-d added 3 commits October 14, 2024 00:57

add RuLEs dataset

ba4ccfe

add llm_rules task and utils

ef43380

remove unused line in base

cfc20c7

yawen-d requested a review from haonan-li October 13, 2024 17:08

Provide feedback