[cli-tools-test] Behavior fingerprint data inconsistency between logs and audit tools for same run #23418

@github-actions

Exploratory testing discovered that the logs and audit tools return different behavior_fingerprint values for the exact same workflow run, indicating a data consistency issue.

Problem Description

When comparing the behavior_fingerprint for the same run ID (23701814578) between the logs and audit tools, the values differ significantly:

| Field | logs tool output | audit tool output |
| --- | --- | --- |
| execution_style | "directed" | "exploratory" |
| resource_profile | "lean" | "heavy" |
| agentic_fraction | 0 | 0.5 |
| tool_breadth | "narrow" | "narrow" |
| actuation_style | "read_only" | "read_only" |
| dispatch_mode | "standalone" | "standalone" |

Three out of six fields differ for the same run. This suggests the fingerprint computation is not deterministic or uses different data sources depending on which tool is called.

Tool

  • Tool: audit and logs (both affected)
  • Run: §23701814578 — "GPL Dependency Cleaner (gpclean)"

Steps to Reproduce

  1. Call logs tool with workflow_name: "GPL Dependency Cleaner (gpclean)" and count: 1
  2. Observe behavior_fingerprint for run 23701814578:
    {"execution_style":"directed","tool_breadth":"narrow","actuation_style":"read_only","resource_profile":"lean","dispatch_mode":"standalone","agentic_fraction":0}
  3. Call audit tool with run_id_or_url: "23701814578"
  4. Observe behavior_fingerprint for the same run:
    {"execution_style":"exploratory","tool_breadth":"narrow","actuation_style":"read_only","resource_profile":"heavy","dispatch_mode":"standalone","agentic_fraction":0.5}
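The comparison in steps 2 and 4 can be automated with a small field-by-field diff. The following is a minimal sketch, not part of either tool: the two JSON strings are copied verbatim from the outputs above, and `diff_fingerprints` is a hypothetical helper name introduced only for illustration.

```python
import json

# Fingerprints copied verbatim from steps 2 (logs) and 4 (audit).
logs_fp = json.loads(
    '{"execution_style":"directed","tool_breadth":"narrow",'
    '"actuation_style":"read_only","resource_profile":"lean",'
    '"dispatch_mode":"standalone","agentic_fraction":0}'
)
audit_fp = json.loads(
    '{"execution_style":"exploratory","tool_breadth":"narrow",'
    '"actuation_style":"read_only","resource_profile":"heavy",'
    '"dispatch_mode":"standalone","agentic_fraction":0.5}'
)


def diff_fingerprints(a: dict, b: dict) -> dict:
    """Return {field: (a_value, b_value)} for every field whose values differ.

    Hypothetical helper; fields missing on one side are reported as None.
    """
    fields = sorted(set(a) | set(b))
    return {f: (a.get(f), b.get(f)) for f in fields if a.get(f) != b.get(f)}


mismatches = diff_fingerprints(logs_fp, audit_fp)
for field, (logs_value, audit_value) in mismatches.items():
    print(f"{field}: logs={logs_value!r} audit={audit_value!r}")
```

Running this against the two outputs reports agentic_fraction, execution_style, and resource_profile as the mismatched fields.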

Expected Behavior

Both tools should return identical behavior_fingerprint values for the same run, since the fingerprint should be a deterministic function of the run's actual execution data.

Actual Behavior

Three fields (execution_style, resource_profile, agentic_fraction) differ between the two tools for the same run.

Impact

  • Severity: High — users who use logs for quick scanning and audit for deep-dives will see conflicting signals about a run's execution behavior
  • Frequency: Observed consistently for the same run ID
  • Affected: Any analysis or alerting built on top of behavior_fingerprint data
  • Workaround: None; the value returned depends on which tool is called

Hypothesis

The logs tool may be computing the fingerprint from a cached/lightweight summary, while the audit tool recomputes from raw log data. The discrepancy could be due to:

  • Different data sources (cached vs. live computation)
  • Different algorithms or weighting for field calculations
  • A caching issue where the logs summary was generated before full log processing completed
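To make the caching hypothesis concrete, the sketch below shows how a single deterministic function can still yield two different values when it is fed a stale snapshot versus fully processed data. Everything in it is an assumption for illustration: the `RunData` type, the definition of `agentic_fraction` as agentic turns divided by total turns, and the turn counts are all invented, and none of it reflects how gh-aw actually computes the fingerprint.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunData:
    """Hypothetical input to the fingerprint computation."""
    total_turns: int
    agentic_turns: int


def agentic_fraction(data: RunData) -> float:
    # Assumed definition, for illustration only.
    if data.total_turns == 0:
        return 0.0
    return data.agentic_turns / data.total_turns


# Hypothetical snapshot cached before log processing finished.
cached_summary = RunData(total_turns=0, agentic_turns=0)
# Hypothetical data for the same run after all logs were processed.
full_data = RunData(total_turns=4, agentic_turns=2)

stale = agentic_fraction(cached_summary)
fresh = agentic_fraction(full_data)
print(f"from cached summary: {stale}, from full data: {fresh}")
```

If something like this is happening, making the computation deterministic would not be enough on its own; both tools would also need to read the same input, or the cached summary would need to be invalidated once processing completes.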

Additional Observations from Testing Session

Other observations from the same testing session (Run ID: 23702146707):

  1. audit with an invalid run ID returns a raw MCP error -32603: failed to fetch run metadata instead of a user-friendly error message
  2. logs with a non-existent workflow name returns MCP error -32602 instead of a structured empty response with an informative message
  3. The compile tool requires a .md suffix (e.g., ace-editor.md), while the logs and status tools accept display names or IDs without an extension, so the naming convention is inconsistent across tools
  4. audit of a failed run (run 23701844529) does not surface the specific error message (Authentication failed) in the errors field, even though it is visible in the downloaded detection.log
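For items 1 and 2, one possible shape for a friendlier response is sketched below. The codes -32602 and -32603 are the standard JSON-RPC 2.0 "Invalid params" and "Internal error" codes; the payload layout, the `to_structured_error` name, and the hint wording are hypothetical and are not taken from gh-aw.

```python
# JSON-RPC 2.0 reserved error codes seen in items 1 and 2.
INVALID_PARAMS = -32602
INTERNAL_ERROR = -32603

# Hypothetical hints; the wording is illustrative only.
HINTS = {
    INVALID_PARAMS: "Check the arguments passed to the tool, such as the workflow name.",
    INTERNAL_ERROR: "The tool failed while handling the request; verify that the run ID exists.",
}


def to_structured_error(tool: str, code: int, raw_message: str) -> dict:
    """Wrap a raw MCP error in a structured payload (assumed layout)."""
    return {
        "tool": tool,
        "code": code,
        "detail": raw_message,
        "hint": HINTS.get(code, "Unexpected error; see detail."),
    }


print(to_structured_error("audit", INTERNAL_ERROR, "failed to fetch run metadata"))
```

Keeping the raw message in a detail field preserves the original error for debugging while giving callers a stable structure to render.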

Environment

  • Repository: github/gh-aw
  • Testing Run ID: 23702146707
  • Date: 2026-03-29
  • Affected Run: §23701814578

Generated by Daily CLI Tools Exploratory Tester

  • expires on Apr 5, 2026, 5:26 AM UTC
