benchflow-ai/paperbench

Preparedness Evals

This repository contains the code for multiple Preparedness evals that use nanoeval and alcatraz.

System requirements

  1. Python 3.11 (3.12 is untested; 3.13 will break chz)
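To confirm the interpreter version before installing, a minimal sketch along these lines may help. The helper name `check_python_version` and its status labels are hypothetical and not part of this repository, and the sketch assumes the interpreter is available on `PATH` as `python3`.

```shell
# Hypothetical helper (not part of this repository): classify a "major.minor"
# Python version according to the system requirements listed above.
check_python_version() {
    case "$1" in
        3.11) echo "supported" ;;
        3.12) echo "untested" ;;
        3.13) echo "incompatible" ;;   # 3.13 will break chz
        *)    echo "unlisted" ;;       # not mentioned in the requirements
    esac
}

# Report on the interpreter found on PATH (assumed to be named python3).
if command -v python3 >/dev/null 2>&1; then
    version="$(python3 -c 'import sys; print("%d.%d" % sys.version_info[:2])')"
    echo "Python $version: $(check_python_version "$version")"
fi
```

If the reported status is anything other than `supported`, consider creating a Python 3.11 virtual environment before running the install step.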

Install pre-requisites

for proj in nanoeval alcatraz nanoeval_alcatraz; do
    pip install -e project/"$proj"
done

Evals

  • PaperBench
  • SWELancer (Forthcoming)
  • MLE-bench (Forthcoming)
