@ZackMitchell910

Summary

  • add a replay-only RunLedger eval suite (suite/case/schema/cassette + stub agent)
  • add a baseline file for regression gating
  • add a GitHub Actions workflow using runledger/[email protected]
  • add a small README note + ignore runledger_out/

How to run locally

runledger run evals/runledger --mode replay --baseline baselines/runledger-demo.json

Notes

  • no external calls; replay-only cassette
  • feel free to remove the suite/workflow if it is not desired
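
For reviewers unfamiliar with replay-based gating, here is a rough Python sketch of the idea. It is not RunLedger's implementation or file format: the cassette layout, the `pass_rate` metric, and the `replay_tool` and `gate` helpers are all hypothetical, chosen only to illustrate "answer tool calls from a recording, then compare metrics to a baseline".

```python
import json

# Hypothetical cassette: recorded tool results keyed by (tool name, canonical JSON args).
CASSETTE = {
    ("search", json.dumps({"q": "cats"}, sort_keys=True)): {"hits": 3},
}

def replay_tool(name, args):
    """Return the recorded result for a tool call; never touch the network."""
    key = (name, json.dumps(args, sort_keys=True))
    if key not in CASSETTE:
        # An unrecorded call is an error in replay mode rather than a live request.
        raise KeyError(f"no recorded call for {name} {args}")
    return CASSETTE[key]

def gate(current, baseline, tolerance=0.0):
    """List the metrics that fell below their baseline by more than `tolerance`."""
    return [
        name
        for name, base in baseline.items()
        if current.get(name, 0.0) < base - tolerance
    ]

# Replay one recorded call, score it, and compare against the stored baseline.
result = replay_tool("search", {"q": "cats"})
current = {"pass_rate": 1.0 if result["hits"] == 3 else 0.0}
baseline = {"pass_rate": 1.0}
regressions = gate(current, baseline)
```

An empty `regressions` list would let CI pass; any entry would fail the gate. Raising on an unrecorded call is what keeps a run like this free of external calls.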

@ZackMitchell910
Author

Thanks for taking a look! This PR adds a replay-only RunLedger gate. Because the PR comes from a fork, the workflow runs are either waiting on approval (action_required) or have not started yet. If you are open to it, please approve the workflow run so CI can complete. Happy to adjust anything.
