Redact eval artifact secrets and clarify quickstart polling snippets #977

Merged
leggetter merged 4 commits into main from fix/eval-artifact-secret-redaction on Jun 25, 2026

@leggetter

Collaborator

Summary

Follow-up to #976 addressing Copilot review comments:

  • Redact known secrets when writing transcript.json, llm-score.json, llm-judge-failure.json, and eval failure sidecars
  • Run scripts/redact-eval-artifacts.ts in CI before uploading results/runs artifacts
  • Clarify that optional events-API polling snippets in quickstarts continue variables from the main script

Test plan

  • npm run typecheck in docs/agent-evaluation/
  • Docs agent eval CI on this branch (artifact upload path)

Made with Cursor

Address Copilot review on #976: redact known secrets when writing
transcripts and judge failures, and scan results/runs before CI upload.

Co-Authored-By: Claude <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot AI review requested due to automatic review settings June 24, 2026 16:06

Cover pattern and literal env redaction, JSON artifact output, and wire
npm run test:redact-secrets into npm run test.

Co-Authored-By: Claude <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>

Copilot AI left a comment

Contributor

Pull request overview

This PR tightens handling of sensitive data in agent-evaluation artifacts (transcripts/judge outputs/CI uploads) and clarifies quickstart “poll events API” snippets so readers understand they depend on variables defined earlier in the walkthrough.

Changes:

  • Introduces shared best-effort redaction utilities and applies them when writing eval artifacts (transcript.json, llm-score.json, llm-judge-failure.json, eval failure sidecars).
  • Adds a CI redaction pass over docs/agent-evaluation/results/runs before uploading artifacts.
  • Updates TypeScript/Python/curl quickstarts to clarify the optional polling snippets continue variables from the main script.
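
To make the "best-effort redaction utilities" item concrete, here is a minimal sketch of pattern-based redaction. It is not the code from redact-secrets.ts: the function name `redactPatterns`, the two regular expressions, and the `[REDACTED]` placeholder are all assumptions chosen for illustration.

```typescript
// Hypothetical pattern list; the real redact-secrets.ts may use different rules.
const SECRET_PATTERNS: RegExp[] = [
  /Bearer\s+[A-Za-z0-9._~+\/-]+=*/g, // Authorization header values
  /\bsk_[A-Za-z0-9]{16,}\b/g, // illustrative "sk_"-prefixed API keys
];

// Hypothetical placeholder written in place of each match.
const PLACEHOLDER = "[REDACTED]";

// Apply every pattern in turn; each pass replaces all matches of that pattern.
function redactPatterns(text: string): string {
  return SECRET_PATTERNS.reduce(
    (acc, pattern) => acc.replace(pattern, PLACEHOLDER),
    text,
  );
}
```

Pattern matching only catches secrets that look like the patterns, which is one reason to describe this kind of pass as best effort.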

Reviewed changes

Copilot reviewed 12 out of 12 changed files in this pull request and generated 2 comments.

| File | Description |
| --- | --- |
| docs/content/quickstarts/hookdeck-outpost-typescript.mdoc | Clarifies the optional polling snippet depends on published from the main script. |
| docs/content/quickstarts/hookdeck-outpost-python.mdoc | Clarifies the optional polling snippet continues the earlier variables (published, client, tenant_id, topic). |
| docs/content/quickstarts/hookdeck-outpost-curl.mdoc | Clarifies the optional polling snippet reuses env vars from the publish section. |
| docs/agent-evaluation/src/transcript-trajectory.ts | Reuses shared redaction helper for trajectory previews. |
| docs/agent-evaluation/src/run-agent-eval.ts | Applies JSON redaction when writing key eval artifacts and failure sidecars. |
| docs/agent-evaluation/src/redact-secrets.ts | Adds centralized redaction helpers for patterns + env-literal replacement. |
| docs/agent-evaluation/src/llm-judge.ts | Redacts raw judge attempt text before writing llm-judge-failure.json. |
| docs/agent-evaluation/scripts/redact-eval-artifacts.ts | Adds a CI-friendly in-place redaction walker for results/runs/**.json. |
| docs/agent-evaluation/README.md | Documents new redaction behavior and CI pre-upload scan. |
| .github/workflows/docs-agent-eval-ci.yml | Runs the new redaction script before artifact upload. |

Comment thread docs/agent-evaluation/src/redact-secrets.ts Outdated
Comment on lines +86 to +88
    - name: Redact secrets in eval artifacts (best effort)
      if: always()
      run: node --import tsx scripts/redact-eval-artifacts.ts

Add EVAL_TEST_DESTINATION_URL and OUTPOST_TEST_WEBHOOK_URL to literal
redaction; pass secrets into the CI redact step so re-scan can replace
plain JSON echoes (Copilot review on #977).

Co-Authored-By: Claude <noreply@anthropic.com>
Co-authored-by: Cursor <cursoragent@cursor.com>
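
The literal redaction described in this commit can be sketched as follows. Only the two environment variable names come from the commit message; the function name `redactEnvLiterals`, the `Env` type, the minimum-length guard, and the `[REDACTED]` placeholder are assumptions, and the real helper may differ.

```typescript
type Env = Record<string, string | undefined>;

// Names taken from the commit message; any further names in the real list are not shown here.
const LITERAL_ENV_NAMES = ["EVAL_TEST_DESTINATION_URL", "OUTPOST_TEST_WEBHOOK_URL"];

// Replace every occurrence of each configured env value with a placeholder.
function redactEnvLiterals(
  text: string,
  env: Env,
  names: string[] = LITERAL_ENV_NAMES,
): string {
  const values = names
    .map((name) => env[name])
    // Assumption: skip unset or very short values to avoid redacting ordinary text.
    .filter((v): v is string => typeof v === "string" && v.length >= 8)
    // Longest first, so a value that contains another is replaced as a whole.
    .sort((a, b) => b.length - a.length);

  let out = text;
  for (const value of values) {
    out = out.split(value).join("[REDACTED]");
  }
  return out;
}
```

A helper like this can only replace values it can see, which is presumably why the secrets have to be passed into the CI redact step: in Node the caller would hand over `process.env`, and an unset variable simply results in no replacement.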
Deep-walk artifact objects and redact string leaves before serialization so
transcript.json stays parseable for heuristic scoring.

Co-authored-by: Cursor <cursoragent@cursor.com>
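
A deep walk of this kind might look like the sketch below. The `Json` type and the `redactDeep` name are hypothetical, and the string redactor is passed in as a parameter rather than taken from the PR's code.

```typescript
type Json = string | number | boolean | null | Json[] | { [key: string]: Json };

// Return a copy of `value` with `redactString` applied to every string leaf.
// Numbers, booleans, null, and object keys are left untouched.
function redactDeep(value: Json, redactString: (s: string) => string): Json {
  if (typeof value === "string") return redactString(value);
  if (Array.isArray(value)) {
    return value.map((item) => redactDeep(item, redactString));
  }
  if (value !== null && typeof value === "object") {
    const out: { [key: string]: Json } = {};
    for (const [key, child] of Object.entries(value)) {
      out[key] = redactDeep(child, redactString);
    }
    return out;
  }
  return value;
}

// Usage: redact first, then serialize, so JSON.stringify handles all quoting.
const exampleArtifact: Json = { steps: [{ output: "token=abc" }], score: 1 };
const exampleJson = JSON.stringify(
  redactDeep(exampleArtifact, (s) => s.replace("abc", "[REDACTED]")),
  null,
  2,
);
```

Because the replacement happens on values before `JSON.stringify` runs, a match can never cut across a quote or escape sequence in the serialized text. That is one plausible reading of why the commit redacts leaves rather than the final string.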

@leggetter merged commit 79d32ac into main on Jun 25, 2026
2 checks passed
@leggetter deleted the fix/eval-artifact-secret-redaction branch June 25, 2026 08:59