GitHub - OpenDFM/SciEval: [AAAI 2024] SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research

Description

SciEval is an evaluation benchmark for large language models in the scientific domain. It consists of approximately 18,000 objective evaluation questions and a few subjective questions, covering the fundamental scientific fields of chemistry, physics, and biology. The benchmark assesses how well large language models understand and generate scientific content along four dimensions: basic knowledge, knowledge application, scientific calculation, and research ability.

Files Description

scieval-dev.json is the dev set. It contains 5 samples for each task name, each ability, and each category, and is intended for few-shot prompting.
scieval-valid.json is the validation set, which includes the answer for each question.
scieval-test.json is the test set.
scieval-test-local.json is the test set with ground-truth answers; you can use it for local evaluation.
make_few_shot.py is the code for generating the few-shot data; modify it as needed.
eval.py is the evaluation code for the validation set, which is the same as the one we used for the test set. Note that the predictions should follow this format:

[{
    "id": "5534a4ef45aea8a6f1835750a54c01d0",
    "pred": "C"
}]
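
A prediction file in this format could be produced along the following lines. This is a minimal sketch, not code from the repository: `build_predictions`, `save_predictions`, and the `predict_fn` callback are hypothetical names, and it assumes each question record carries an "id" field matching the one shown in the format above.

```python
import json


def build_predictions(questions, predict_fn):
    """Pair each question's "id" with the model's answer under "pred"."""
    # Assumption: every question record is a dict with an "id" key.
    return [{"id": q["id"], "pred": predict_fn(q)} for q in questions]


def save_predictions(predictions, path):
    """Write the predictions to `path` as a single JSON array."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(predictions, f, ensure_ascii=False, indent=4)


# Toy stand-ins: one in-memory question and a "model" that always answers "C".
demo_questions = [{"id": "5534a4ef45aea8a6f1835750a54c01d0"}]
demo_predictions = build_predictions(demo_questions, lambda q: "C")
print(json.dumps(demo_predictions))
```

In practice, `demo_questions` would be replaced by the records loaded from scieval-valid.json or scieval-test.json, and the lambda by a call to the model under evaluation. Writing with `json.dump` rather than by hand avoids trailing commas, which strict JSON parsers reject.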

dynamic_chem.json and dynamic_phy.json are the dynamic data. They are a regenerated version and differ from the data used for the leaderboard. We will update them regularly.
eval_dynamic.py is the evaluation code for the dynamic data. To use this script, add a "pred" key directly to each entry of the original dynamic data.
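
Adding the "pred" key to the dynamic data might look like the sketch below. It assumes that each dynamic data file is a JSON array of objects, which the description above does not state; `attach_predictions` and `predict_fn` are hypothetical names, not part of the repository.

```python
import json


def attach_predictions(records, predict_fn):
    """Return copies of the records, each with a "pred" key added."""
    # Copying keeps the loaded data unchanged; all original keys are preserved.
    return [{**record, "pred": predict_fn(record)} for record in records]


def load_json(path):
    """Read a JSON file, e.g. dynamic_chem.json or dynamic_phy.json."""
    with open(path, encoding="utf-8") as f:
        return json.load(f)


# Toy record standing in for one entry of the dynamic data (fields are made up).
demo_records = [{"question": "placeholder question"}]
demo_with_pred = attach_predictions(demo_records, lambda record: "A")
print(json.dumps(demo_with_pred))
```

The augmented list can then be written back with `json.dump` and handed to eval_dynamic.py in whatever way that script expects its input.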

Reference

If you use any source code or datasets included in this repository in your work, please cite the corresponding paper. The BibTeX entry is listed below:

@article{sun2023scieval,
  title={SciEval: A Multi-Level Large Language Model Evaluation Benchmark for Scientific Research},
  author={Sun, Liangtai and Han, Yang and Zhao, Zihan and Ma, Da and Shen, Zhennan and Chen, Baocai and Chen, Lu and Yu, Kai},
  journal={arXiv preprint arXiv:2308.13149},
  year={2023}
}

Latest commit: Sunliangtai, Update README.md, aa61a9d, Aug 6, 2024 (25 commits)

Name	Last commit message	Last commit date
assets	Delete assets/title.png	Dec 13, 2023
README.md	Update README.md	Aug 6, 2024
dynamic_chem.json	initialize	Jul 27, 2023
dynamic_phy.json	initialize	Jul 27, 2023
eval.py	Update eval.py	Jun 28, 2024
eval_dynamic.py	initialize	Jul 27, 2023
make_few_shot.py	initialize	Jul 27, 2023
scieval-dev.json	Rename bai-scieval-dev.json to scieval-dev.json	Dec 12, 2023
scieval-test-local.json	Add files via upload	Aug 6, 2024
scieval-test.json	Add files via upload	Jun 28, 2024
scieval-valid.json	Rename bai-scieval-valid.json to scieval-valid.json	Dec 12, 2023
