bug: ls/find directory listings collapsed to "N modified" — destroys the payload and inverts token economics #148

@jhonatanjunio

squeez version

1.24.0

OS / arch

Windows 11 x86_64 (reproduced locally) — bug is platform-independent (the original report is from a Linux host running Claude Code).

Install method

built from source

Command that triggered the bug

ls /home/fernando/onfly/jailer/datamodel/ 2>/dev/null | head -50
ls -1 /home/fernando/onfly/jailer/datamodel/ 2>/dev/null; ls -1 /home/fernando/onfly/jailer/datamodel/Booking/ 2>/dev/null | grep -i -E "segment|leg|flight"
find /home/fernando/onfly/jailer/datamodel -maxdepth 1 -type d -printf '%f\n' | sort

Expected behaviour

ls/find exist to answer "what files/dirs are here?". The file/dir names are the payload — they are exactly what the caller asked for. When the listing is short it should pass through verbatim; when it is genuinely long it should be bounded by truncation (keep the first N names + a [squeez: +M more] marker), so the names are still visible.

Actual behaviour

# squeez wrap "ls -1 <dir>"   (8 files in the dir)
# squeez [ls] 18→8 tokens (-56%)
./  8 modified  [squeez grouped]

# squeez wrap "find <dir> -type f"   (8 files)
# squeez [find] 104→18 tokens (-83%)
<dir>/  8 modified  [squeez grouped]

Every file name is discarded and replaced by a single <dir>/ N modified [squeez grouped] line. Two distinct defects:

  1. Total information loss. The listing — the whole reason the command was run — is gone. The model cannot see any file names, so the output is useless for its purpose.
  2. Factually wrong verb. Nothing was "modified". ls/find only list. The word modified is borrowed from the git-status use case and actively misleads the model into reading a plain listing as a change-set.

Minimal reproduction

mkdir -p /tmp/sqtest && cd /tmp/sqtest
for f in segment leg flight booking voyage trip route fare; do touch "$f.rs"; done

squeez wrap "ls -1 /tmp/sqtest"
# => ./  8 modified  [squeez grouped]      (all 8 names lost)

squeez wrap "find /tmp/sqtest -type f"
# => /tmp/sqtest/  8 modified  [squeez grouped]

Boundary confirmed — the collapse is gated by a hardcoded threshold of 5 entries sharing one parent dir:

mkdir -p /tmp/sqtest4 && cd /tmp/sqtest4 && touch a.rs b.rs c.rs d.rs
squeez wrap "ls -1 /tmp/sqtest4"
# => a.rs / b.rs / c.rs / d.rs   (4 < 5: preserved)

In practice almost every real directory has ≥5 entries, so the listing is destroyed for essentially every non-trivial ls/find.


Root cause

FsHandler::compress runs grouping::group_files_by_dir(lines, 5) for every non-viewer fs command (src/commands/fs.rs:411-413):

let lines = if is_viewer {
    lines
} else {
    grouping::group_files_by_dir(lines, 5)
};

group_files_by_dir (src/strategies/grouping.rs) buckets lines by parent directory and, for any bucket with count >= threshold, replaces the whole bucket with "{prefix}{dir}/ {count} modified [squeez grouped]".
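
To make the collapse concrete, here is a minimal sketch of the behaviour described above. It is not the actual code in src/strategies/grouping.rs: the function name matches the issue, but the signature, the way the parent directory is derived, and the omission of the {prefix} handling are assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Hypothetical stand-in for `grouping::group_files_by_dir` (assumed signature).
/// Buckets lines by parent directory; any bucket with `count >= threshold`
/// is replaced by one count line, which drops every name in that bucket.
fn group_files_by_dir(lines: Vec<String>, threshold: usize) -> Vec<String> {
    // Parent of "a/b/c.rs" is "a/b"; a bare name such as "c.rs" has parent ".".
    fn parent(line: &str) -> &str {
        match line.rfind('/') {
            Some(i) => &line[..i],
            None => ".",
        }
    }

    let mut buckets: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for line in lines {
        buckets.entry(parent(&line).to_string()).or_default().push(line);
    }

    let mut out = Vec::new();
    for (dir, names) in buckets {
        if names.len() >= threshold {
            // The destructive step: the names are discarded, only the count survives.
            out.push(format!("{dir}/  {} modified  [squeez grouped]", names.len()));
        } else {
            out.extend(names);
        }
    }
    out
}
```

Fed the eight bare names from the repro with threshold 5, this sketch puts all of them in the "." bucket and returns a single count line; the four-name case falls under the threshold and passes through unchanged.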

This strategy was designed for git-status-style output, where the path list is incidental context and "collapse N siblings in src/ to src/ N modified" is a genuine, useful compression. git.rs rightly uses it (threshold 4).

For ls/find the same call is inappropriate:

  • A flat ls <dir> lists one directory, so every entry shares the same parent (.). The bucket therefore contains the entire listing and collapses to a single count line — maximal information loss by construction.
  • For ls/find the path list is the answer, not incidental context. Collapsing it to a count answers a different question than the one asked.

The "adaptive" layer does not (and cannot) save this

The reporter's hypothesis was that adaptive intensity should prevent this. It can't, as currently wired:

  • intensity::scale (src/context/intensity.rs) only scales numeric limits (max_lines, find_max_results, git_diff_max_lines, …) by ×0.6 (Full) / ×0.3 (Ultra).
  • It never touches the grouping threshold (hardcoded 5) and never gates whether grouping runs at all.

So the collapse fires identically at [adaptive: Full] and [adaptive: Ultra]. (The original screenshot shows it at Full; the repro above shows it at Ultra — same result.) Worse, grouping runs before truncation, so raising find_max_results has no effect: the listing is already gone by the time truncation would act.
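
A small model of that wiring shows why neither the intensity level nor find_max_results can rescue the listing. Only the ×0.6 / ×0.3 factors, the hardcoded threshold of 5, and the grouping-before-truncation order come from the analysis above; the Intensity enum, the Limits struct, the scale signature, and collapse_flat (a single-bucket simplification of the grouping step) are hypothetical names for illustration.

```rust
#[derive(Clone, Copy)]
enum Intensity {
    Full,
    Ultra,
}

#[derive(Clone, Copy)]
struct Limits {
    max_lines: usize,
    find_max_results: usize,
}

/// Assumed shape of `intensity::scale`: it only shrinks numeric limits.
fn scale(limits: Limits, intensity: Intensity) -> Limits {
    let pct = match intensity {
        Intensity::Full => 60,  // x0.6
        Intensity::Ultra => 30, // x0.3
    };
    Limits {
        max_lines: limits.max_lines * pct / 100,
        find_max_results: limits.find_max_results * pct / 100,
    }
}

/// Hardcoded: nothing in `scale` can reach this value.
const GROUP_THRESHOLD: usize = 5;

/// Simplified grouping for a flat listing, where every entry shares parent ".".
fn collapse_flat(lines: Vec<String>) -> Vec<String> {
    if lines.len() >= GROUP_THRESHOLD {
        vec![format!("./  {} modified  [squeez grouped]", lines.len())]
    } else {
        lines
    }
}

/// Grouping runs first; truncation only ever sees the already-collapsed output.
fn compress(lines: Vec<String>, base: Limits, intensity: Intensity) -> Vec<String> {
    let limits = scale(base, intensity);
    let grouped = collapse_flat(lines);
    grouped.into_iter().take(limits.find_max_results).collect()
}
```

In this model, calling compress on eight names gives the same single count line at Full and at Ultra, and raising find_max_results by orders of magnitude changes nothing, because truncation receives one line either way.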

Impact: inverted token economics

The header advertises 18→8 tokens (-56%) and 104→18 tokens (-83%), but the local saving (10 and 86 tokens respectively) triggers a far more expensive recovery:

  1. The model receives a useless count and cannot find the file it needs.
  2. It reasons about the failure ("squeez is compressing the ls/find output to the point of making it useless").
  3. It works around squeez — e.g. redirects the command to a file and Reads it back — costing hundreds of tokens and several extra tool round-trips.

Net effect: squeez spends tokens here instead of saving them, and trains the model to distrust and route around the tool — defeating its entire purpose. (This recovery sequence is visible in the original report's transcript.)

Proposed fix directions

The core principle: for listing commands, the names are payload — bound them with truncation, don't collapse them to a count. Grouping-to-a-count should stay reserved for status-style output (git) where the path list is incidental.

Concrete options (not mutually exclusive):

  • A — Don't group listings; truncate them. In FsHandler, route ls/find/du/etc. straight to truncation::apply(lines, find_max_results, Keep::Head) with a [squeez: +M more entries — narrow with a glob or -maxdepth] marker. Names stay visible; size stays bounded. This is the smallest, highest-value change.
  • B — Make grouping non-destructive even when it does fire. When a bucket collapses, keep a sample of names: dir/ 9 entries: segment.rs leg.rs flight.rs … (+6) instead of dir/ 9 modified. A count-only line is what makes the model give up.
  • C — Only group when it actually helps (multi-dir trees). Skip grouping when it would produce ≤1-2 buckets (the flat-ls case is exactly this). Collapse only deep find subtrees that exceed threshold, while still emitting a per-subtree sample.
  • D — Fix the misleading verb regardless. modified must not appear for listing commands; use entries/files. (Necessary but insufficient on its own — the information loss is the real defect.)
  • E — Let intensity gate grouping. At Full (low context pressure) listings should never be collapsed at all; reserve aggressive collapse for genuine Ultra pressure.
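
As a rough illustration of option A (with D's wording applied), head truncation with a visible marker could look like the sketch below. truncate_listing is a hypothetical helper, not the existing truncation::apply, and the marker text is shortened; the real fix would presumably reuse the project's own truncation strategy.

```rust
/// Hypothetical helper for option A: keep the first `max_results` names and
/// append a marker that says how many were left out. Names are never replaced
/// by a count, and the word "modified" never appears.
fn truncate_listing(lines: Vec<String>, max_results: usize) -> Vec<String> {
    if lines.len() <= max_results {
        // Short listings pass through verbatim.
        return lines;
    }
    let hidden = lines.len() - max_results;
    let mut out: Vec<String> = lines.into_iter().take(max_results).collect();
    out.push(format!("[squeez: +{hidden} more entries]"));
    out
}
```

With the eight-file repro and a limit of 3, this keeps segment.rs, leg.rs and flight.rs and ends with a marker for the remaining 5, so the model still sees real names and knows the listing was cut.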

Recommendation: A + D as the minimal fix (stop grouping listings, kill the wrong verb), with B/C as the richer follow-up so large multi-dir find output still compresses without becoming a dead end.
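
For the B/C follow-up, one possible shape is sketched below: grouping is skipped when everything falls into a single bucket (the flat-ls case), and a bucket that does collapse keeps a sample of names. The function name, its signature, and the exact output format are assumptions; the format only loosely follows the "dir/ 9 entries: … (+6)" example from option B.

```rust
use std::collections::BTreeMap;

/// Last path component, e.g. "src/leg.rs" -> "leg.rs".
fn file_name(path: &str) -> &str {
    path.rsplit('/').next().unwrap_or(path)
}

/// Hypothetical sketch of options B + C combined.
fn group_with_sample(lines: Vec<String>, threshold: usize, sample: usize) -> Vec<String> {
    let mut buckets: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for line in &lines {
        let dir = line.rfind('/').map_or(".", |i| &line[..i]);
        buckets.entry(dir.to_string()).or_default().push(line.clone());
    }

    // Option C: a single bucket means grouping cannot summarise anything,
    // so hand the listing back untouched (truncation can bound it later).
    if buckets.len() <= 1 {
        return lines;
    }

    let mut out = Vec::new();
    for (dir, paths) in buckets {
        if paths.len() < threshold {
            out.extend(paths);
            continue;
        }
        // Option B: collapse, but keep the first `sample` names visible.
        let shown: Vec<&str> = paths.iter().take(sample).map(|p| file_name(p)).collect();
        let hidden = paths.len() - shown.len();
        let mut line = format!("{dir}/ {} entries: {}", paths.len(), shown.join(" "));
        if hidden > 0 {
            line.push_str(&format!(" … (+{hidden})"));
        }
        out.push(line);
    }
    out
}
```

On a two-directory find result with six files under src/ and two under tests/, threshold 5 and sample 3, the src/ bucket collapses to one line that still names three files, while tests/ is listed in full. A flat listing of any size comes back unchanged.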

Confirmations

  • Searched existing issues — no duplicate (no open issues; nothing in closed history about ls/find grouping collapse).
  • Reproduces on the latest release / main (91adeb8, v1.24.0).

Labels

bug