[cli-tools-test] Behavior fingerprint data inconsistency between logs and audit tools for same run #23418
Description
Exploratory testing discovered that the logs and audit tools return different behavior_fingerprint values for the exact same workflow run, indicating a data consistency issue.
Problem Description
When comparing the behavior_fingerprint for the same run ID (23701814578) between the logs and audit tools, the values differ significantly:
| Field | `logs` tool output | `audit` tool output |
|---|---|---|
| `execution_style` | `"directed"` | `"exploratory"` |
| `resource_profile` | `"lean"` | `"heavy"` |
| `agentic_fraction` | `0` | `0.5` |
| `tool_breadth` | `"narrow"` | `"narrow"` ✅ |
| `actuation_style` | `"read_only"` | `"read_only"` ✅ |
| `dispatch_mode` | `"standalone"` | `"standalone"` ✅ |
Three out of six fields differ for the same run. This suggests the fingerprint computation is not deterministic or uses different data sources depending on which tool is called.
Tool
- Tool: `audit` and `logs` (both affected)
- Run: §23701814578, workflow "GPL Dependency Cleaner (gpclean)"
Steps to Reproduce
- Call the `logs` tool with `workflow_name: "GPL Dependency Cleaner (gpclean)"` and `count: 1`.
- Observe the `behavior_fingerprint` for run 23701814578: `{"execution_style":"directed","tool_breadth":"narrow","actuation_style":"read_only","resource_profile":"lean","dispatch_mode":"standalone","agentic_fraction":0}`
- Call the `audit` tool with `run_id_or_url: "23701814578"`.
- Observe the `behavior_fingerprint` for the same run: `{"execution_style":"exploratory","tool_breadth":"narrow","actuation_style":"read_only","resource_profile":"heavy","dispatch_mode":"standalone","agentic_fraction":0.5}`
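To make the discrepancy easy to re-check (for example after a fix), the two payloads observed in the steps above can be diffed mechanically. This is a minimal Python sketch that only uses the JSON copied from this issue; it does not call either tool, and `diff_fingerprints` is a hypothetical helper name.

```python
import json

# Payloads copied verbatim from the reproduction steps (run 23701814578).
LOGS_FP = json.loads(
    '{"execution_style":"directed","tool_breadth":"narrow","actuation_style":"read_only",'
    '"resource_profile":"lean","dispatch_mode":"standalone","agentic_fraction":0}'
)
AUDIT_FP = json.loads(
    '{"execution_style":"exploratory","tool_breadth":"narrow","actuation_style":"read_only",'
    '"resource_profile":"heavy","dispatch_mode":"standalone","agentic_fraction":0.5}'
)


def diff_fingerprints(a: dict, b: dict) -> dict:
    """Return {field: (a_value, b_value)} for every field whose values differ.

    Fields present in only one payload are reported with None for the missing side.
    """
    fields = sorted(set(a) | set(b))
    return {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}


if __name__ == "__main__":
    for field, (logs_val, audit_val) in diff_fingerprints(LOGS_FP, AUDIT_FP).items():
        print(f"{field}: logs={logs_val!r} audit={audit_val!r}")
```

Run against the captured payloads, this reports exactly the three fields listed in the table (`agentic_fraction`, `execution_style`, `resource_profile`); an empty result would indicate the tools agree.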
Expected Behavior
Both tools should return identical behavior_fingerprint values for the same run, since the fingerprint should be a deterministic function of the run's actual execution data.
Actual Behavior
Three fields (execution_style, resource_profile, agentic_fraction) differ between the two tools for the same run.
Impact
- Severity: High. Users who rely on `logs` for quick scanning and `audit` for deep dives will see conflicting signals about a run's execution behavior.
- Frequency: Observed consistently for the same run ID
- Affected: Any analysis or alerting built on top of `behavior_fingerprint` data
- Workaround: None, since the values differ depending on which tool is called
Hypothesis
The logs tool may be computing the fingerprint from a cached/lightweight summary, while the audit tool recomputes from raw log data. The discrepancy could be due to:
- Different data sources (cached vs. live computation)
- Different algorithms or weighting for field calculations
- A caching issue where the `logs` summary was generated before full log processing completed
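The caching hypothesis can be illustrated with a toy model. The Python sketch below assumes, purely for illustration, that `agentic_fraction` is the share of turns classified as agentic; this issue does not describe how gh-aw actually computes the field, and `Turn` and `agentic_fraction` here are hypothetical. It shows how a summary cached after only part of the log had been processed could report 0 while a recomputation over the full log reports 0.5, the same split observed for run 23701814578.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Turn:
    # Hypothetical per-turn record; the real log schema is not described in this issue.
    agentic: bool


def agentic_fraction(turns: list[Turn]) -> float:
    """Hypothetical metric: share of turns classified as agentic (0.0 when there are none)."""
    if not turns:
        return 0.0
    return sum(t.agentic for t in turns) / len(turns)


# Toy log: the agentic turns only appear later in the run.
full_log = [Turn(False), Turn(False), Turn(True), Turn(True)]

# Summary cached while only the first two turns had been processed.
cached_summary = agentic_fraction(full_log[:2])
# Recomputed from the complete log, as a deep-dive tool might do.
recomputed = agentic_fraction(full_log)

print(cached_summary, recomputed)  # 0.0 0.5
```

If this is the cause, keying the cached summary on how much of the log it covered (and recomputing when that changes) would make both tools converge on the full-log value.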
Additional Observations from Testing Session
Other observations from the same testing session (Run ID: 23702146707):
- `audit` with an invalid run ID returns a raw MCP error `-32603: failed to fetch run metadata` instead of a user-friendly error message.
- `logs` with a non-existent workflow name returns MCP error `-32602` instead of a structured empty response with an informative message.
- The `compile` tool requires the `.md` suffix (e.g., `ace-editor.md`), while the `logs` and `status` tools use display names or IDs without an extension, an inconsistent naming convention across tools.
- The failed-run audit (run 23701844529) does not surface the specific error message (`Authentication failed`) in the `errors` field, even though it is visible in the downloaded `detection.log`.
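For the two error-handling observations, MCP distinguishes protocol-level JSON-RPC errors (such as `-32602` Invalid params and `-32603` Internal error) from failures that occur while a tool runs, which the MCP specification recommends reporting inside the `tools/call` result with `isError` set to `true`. The Python sketch below shows result shapes that might replace the raw errors. It is a suggestion, not the current gh-aw behavior: the helper names, the payload fields (`runs`, `message`) and the message wording are all hypothetical.

```python
import json


def empty_logs_result(workflow_name: str) -> dict:
    """Hypothetical `logs` result when no workflow matches: a normal (non-error)
    CallToolResult whose text content is an empty run list plus an explanation."""
    payload = {
        "runs": [],  # hypothetical field name
        "message": f"No workflow named {workflow_name!r} was found.",  # hypothetical field
    }
    return {"content": [{"type": "text", "text": json.dumps(payload)}], "isError": False}


def audit_error_result(run_id: str, reason: str) -> dict:
    """Hypothetical `audit` result when run metadata cannot be fetched: a tool-level
    error (isError=True) with a readable message instead of JSON-RPC error -32603."""
    text = f"Could not fetch metadata for run {run_id}: {reason}"
    return {"content": [{"type": "text", "text": text}], "isError": True}
```

Keeping the "nothing found" case as a successful, empty result lets callers iterate over `runs` without special-casing errors, while genuine fetch failures still carry `isError` so agents can tell the two apart.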
Environment
- Repository: github/gh-aw
- Testing Run ID: 23702146707
- Date: 2026-03-29
- Affected Run: §23701814578
Generated by Daily CLI Tools Exploratory Tester
- expires on Apr 5, 2026, 5:26 AM UTC