work dirs #1307

Open
wants to merge 21 commits into
base: main

@laanak08 laanak08 commented Feb 20, 2025

Invocation

  • help: cargo run --bin goose -- bench --help
  • cargo run --bin goose -- bench to run the "core" suite of benchmarks
  • cargo run --bin goose -- bench -s $suite_name1,$suite_name2,... to run the listed suites
  • cargo run --bin goose -- bench --repeat 3 to run the evals 3 times
  • cargo run --bin goose -- bench -i "some_dir,some_other_dir" to have some_dir and some_other_dir copied into the relevant work-dir that needs them
  • add new benchmark suites to crates/goose-bench/src/eval_suites

How Work-Dirs...work

  • the purpose of the work-dir is to provide a place to read and write files that the evaluation code can reference as its "current directory"
  • each invocation of goose bench creates, if it does not already exist, a dir for the provider, under which is
    • a date-time dir for the run, under which is
      • a dir per eval-suite, under which is
        • a dir for the eval itself
  • multiple runs for the same provider will result in a tree like the following.
[Screenshot of the resulting work-dir tree: Screenshot 2025-02-20 at 12 03 56 PM]
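The nesting described above (provider, then run date-time, then suite, then eval) can be sketched as a path builder. This is a minimal illustration, not the code in this PR: the function names, the root directory, and the timestamp format are all assumptions made for the example.

```rust
use std::fs;
use std::io;
use std::path::{Path, PathBuf};

/// Builds <root>/<provider>/<run_stamp>/<suite>/<eval>.
/// All parameter names are hypothetical; the timestamp format is an assumption.
fn eval_work_dir(root: &Path, provider: &str, run_stamp: &str, suite: &str, eval: &str) -> PathBuf {
    root.join(provider).join(run_stamp).join(suite).join(eval)
}

/// Creates the eval work-dir and any missing parents ("create if not exists"),
/// then returns its path.
fn ensure_eval_work_dir(
    root: &Path,
    provider: &str,
    run_stamp: &str,
    suite: &str,
    eval: &str,
) -> io::Result<PathBuf> {
    let dir = eval_work_dir(root, provider, run_stamp, suite, eval);
    // create_dir_all succeeds if the directory already exists.
    fs::create_dir_all(&dir)?;
    Ok(dir)
}
```

Because the provider dir is reused and only the date-time component changes, repeated runs for one provider would accumulate sibling date-time dirs under the same provider dir.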

Semantics [DO NOT SKIP READING]

  • there is a core suite of evaluations that runs by default if the --suites cli flag is not set
    • differently stated, any evaluation not included in core will not run
  • if --suites is supplied, only the suites in that list will run, so if core isn't in the list passed to --suites, it will not run.
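The selection rule above can be sketched as a small function. This is an illustration of the stated semantics only: select_suites is a hypothetical name, and trimming whitespace and dropping empty entries are assumptions about how a comma-separated list might be parsed, not confirmed behavior of the CLI.

```rust
/// Returns the suites to run given the raw value of the --suites flag.
/// Hypothetical helper: `None` means the flag was not supplied.
fn select_suites(suites_flag: Option<&str>) -> Vec<String> {
    match suites_flag {
        // Flag not set: only the default "core" suite runs.
        None => vec!["core".to_string()],
        // Flag set: run exactly what was listed; "core" is NOT added implicitly.
        Some(list) => list
            .split(',')
            .map(str::trim) // assumption: surrounding whitespace is ignored
            .filter(|s| !s.is_empty()) // assumption: empty entries are skipped
            .map(String::from)
            .collect(),
    }
}
```

Under this sketch, passing a list that omits core means core is skipped, which matches the warning above.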

Individual Evals

  • an example can be examined here: crates/goose-bench/src/eval_suites/core/example.rs
  • groups of related evals can be placed together in a Rust module representing the suite, e.g. crates/goose-bench/src/eval_suites/core
    • in this example, core is the $suite_name
    • each eval lives in its own file at crates/goose-bench/src/eval_suites/core/$eval_name
    • register new evals under the suite name
      • ex. the suite core has one eval, example, so it is registered as follows:
      • register_evaluation!("core", ExampleEval)
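To show what registering an eval under a suite name amounts to conceptually, here is a self-contained sketch of a suite-to-evals registry. It is not the register_evaluation! macro or the goose-bench trait: the Evaluation trait, the Registry type, and their methods are hypothetical stand-ins, and the real implementation may look quite different.

```rust
use std::collections::HashMap;

/// Hypothetical stand-in for whatever trait goose-bench evals implement.
trait Evaluation {
    fn name(&self) -> &str;
}

/// Stand-in eval, named after the example in the PR description.
struct ExampleEval;

impl Evaluation for ExampleEval {
    fn name(&self) -> &str {
        "example"
    }
}

/// Hypothetical registry mapping a suite name to the evals registered under it.
#[derive(Default)]
struct Registry {
    suites: HashMap<String, Vec<Box<dyn Evaluation>>>,
}

impl Registry {
    /// Adds an eval to the named suite, creating the suite entry if needed.
    fn register(&mut self, suite: &str, eval: Box<dyn Evaluation>) {
        self.suites.entry(suite.to_string()).or_default().push(eval);
    }

    /// Lists the names of the evals registered under a suite (empty if unknown).
    fn evals_in(&self, suite: &str) -> Vec<&str> {
        self.suites
            .get(suite)
            .map(|evals| evals.iter().map(|e| e.name()).collect())
            .unwrap_or_default()
    }
}
```

In this sketch, the analogue of the registration line above would be registry.register("core", Box::new(ExampleEval)).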

Limitations

  • no namespacing until this PR is merged in.
    • until then, wherever it's run, and whatever it's allowed to do (via exts), it will do, without isolating its work to a tmp env
    • copy files needed for eval into eval work-dir
  • bug: building with --release affects which eval suites are run. To be debugged.
  • summary/run-report/errors-report
  • tracing: maybe it works, maybe it doesn't; not yet checked.
  • does not handle configuring ollama; it is still necessary to configure it manually before running bench
  • test multiple configs easily.
    • currently runs tests for the agent/config that's active in the environment where it's run.
  • parallelize at the evals level, the suite level, or the goose-bench level
    struck items are outside the scope of current bench-work.

github-actions bot commented Feb 20, 2025

PR Preview Action v1.6.0

🚀 View preview at
https://block.github.io/goose/pr-preview/pr-1307/

Built to branch gh-pages at 2025-02-21 03:19 UTC.
Preview will be ready when the GitHub Pages deployment is complete.
