feat(eval): add verbose CLI reporting and CI failure annotations by thegovind · Pull Request #107 · microsoft/skills

thegovind · 2026-02-07T00:41:10Z

Summary

- `tests/harness/runner.ts`: Add detailed verbose output for scenario evaluations and ralph loop results. In the CLI it prints pattern checks, acceptance criteria matches, individual findings with severity/suggestion/code-snippet, and per-skill failure summaries. Enrich the markdown report's Failed Scenarios section with severity labels, suggestions, and matched/incorrect acceptance-criteria sections.
- `.github/workflows/skill-evaluation.yml`: Add a new step that emits `::error::` annotations for every failed scenario (surfacing top-3 errors inline in the Actions UI). Set artifact `retention-days: 7` and `if-no-files-found: warn` on the results upload step.
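To illustrate the annotation step, here is a minimal sketch of turning failed scenarios into `::error::` workflow commands. It is not the code from this PR. The `Finding` and `ScenarioResult` shapes, the `toAnnotations` helper, and the `maxErrors` parameter are all hypothetical; only the `::error title=...::message` command syntax and its escaping rules come from GitHub Actions itself.

```typescript
// Hypothetical result shapes; the real types in tests/harness/runner.ts may differ.
interface Finding {
  severity: "error" | "warning" | "info";
  message: string;
}

interface ScenarioResult {
  skill: string;
  scenario: string;
  passed: boolean;
  findings: Finding[];
}

// Workflow-command message data must escape '%', CR, and LF.
function escapeData(value: string): string {
  return value.replace(/%/g, "%25").replace(/\r/g, "%0D").replace(/\n/g, "%0A");
}

// Property values (such as title) additionally escape ':' and ','.
function escapeProperty(value: string): string {
  return escapeData(value).replace(/:/g, "%3A").replace(/,/g, "%2C");
}

// Build one ::error:: line per failed scenario, keeping at most
// `maxErrors` error-severity findings in the message.
function toAnnotations(results: ScenarioResult[], maxErrors = 3): string[] {
  return results
    .filter((r) => !r.passed)
    .map((r) => {
      const top = r.findings
        .filter((f) => f.severity === "error")
        .slice(0, maxErrors)
        .map((f) => f.message)
        .join("; ");
      const title = escapeProperty(`${r.skill}/${r.scenario} failed`);
      const message = escapeData(top || "no error details recorded");
      return `::error title=${title}::${message}`;
    });
}
```

Writing each returned line to stdout from a workflow step is enough for the Actions runner to pick it up as an annotation; capping the findings keeps each annotation short enough to read inline.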

The goal is to make evaluation failures visible directly in the CLI and in the GitHub Actions summary without needing to download artifacts.
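As a rough picture of what the verbose CLI output could involve, the sketch below formats a single finding (severity, suggestion, code snippet) and a ralph loop score trail. The `VerboseFinding` shape, both function names, and the assumption that iteration scores are plain numbers are illustrative guesses, not the actual code in `tests/harness/runner.ts`.

```typescript
// Hypothetical finding shape, assumed for illustration only.
interface VerboseFinding {
  severity: "error" | "warning" | "info";
  message: string;
  suggestion?: string;
  codeSnippet?: string;
}

// Render one finding as indented lines: a severity-tagged message,
// an optional suggestion, and an optional snippet with a '|' gutter.
function formatFinding(f: VerboseFinding): string[] {
  const lines = [`  [${f.severity.toUpperCase()}] ${f.message}`];
  if (f.suggestion) {
    lines.push(`    suggestion: ${f.suggestion}`);
  }
  if (f.codeSnippet) {
    for (const line of f.codeSnippet.split("\n")) {
      lines.push(`    | ${line}`);
    }
  }
  return lines;
}

// Render per-iteration scores as a trail so progress across
// iterations is visible on a single line.
function formatScoreTrail(scores: number[]): string {
  return scores.map((s) => s.toFixed(1)).join(" -> ");
}
```

A runner would presumably call these only when a verbose flag is set, printing each returned line with `console.log`.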

Testing

- `pnpm typecheck`: passed locally.
- Note: LSP diagnostics could not run because `typescript-language-server` is not installed in this environment.
- No runtime tests were executed; changes are additive console output and workflow steps.

- Print scenario pattern checks, acceptance criteria, and findings in verbose mode
- Add per-skill failure summaries with error details
- Show ralph loop iteration score trails
- Enrich markdown report with severity labels, suggestions, and matched sections
- Add GitHub Actions error annotations for failed scenarios
- Set artifact retention to 7 days with if-no-files-found warning

Ultraworked with [Sisyphus](https://github.com/code-yeongyu/oh-my-opencode)

Co-authored-by: Sisyphus <clio-agent@sisyphuslabs.ai>

thegovind merged commit af29b8b into main Feb 7, 2026
2 checks passed

thegovind deleted the improve-eval-reporting branch February 7, 2026 00:42

thegovind merged 1 commit into main from improve-eval-reporting