Skip to content

Perf/privacy: gate hot-path stream print() statements behind #if DEBUG#5

Draft
mimeding wants to merge 2 commits into
mainfrom
cursor/silence-stream-debug-prints-2812
Draft

Perf/privacy: gate hot-path stream print() statements behind #if DEBUG#5
mimeding wants to merge 2 commits into
mainfrom
cursor/silence-stream-debug-prints-2812

Conversation

@mimeding

Copy link
Copy Markdown
Owner

Summary

Why this matters (business)

The performance audit identified three print() statements that fire on the local-inference hot path in release builds:

  1. Once per chat stream + once every 50 streamed deltas (or any time a gap is >2s) — both inside ChatEngine's producer task.
  2. Once per MLX completion-info event in StreamAccumulator — duplicating data already emitted via os_signpost and os_log with the right privacy policy.
  3. Once per chat-UI stream completion in ChatView.

On a 100 tok/s local model that's a stdout write and a String(format:) allocation every few hundred milliseconds, per concurrent stream. None of it is read by anyone in production — the same information already goes through os_signpost and os_log, which respect Console privacy filters.

There's also a quiet privacy angle: print() writes go to stdout, which the unified logging system captures without the privacy: .public/.private redaction policy the matching Logger calls use. Keeping these in release builds isn't dangerous in itself, but the comment on the signpost calls explicitly contrasts them with the unredacted printf-style sibling.

What's wrong (technical)

            print("[Osaurus][Stream] Starting stream wrapper for model: \(model)")
            ...
                    if deltaCount % 50 == 1 || timeSinceLastDelta > 2.0 {
                        print(
                            "[Osaurus][Stream] Delta #\(deltaCount): ..."
                        )
                    }
                    print(
                        String(
                            format: "[MLX] prompt: %d tokens %.1f tok/s ..."
                        )
                    )
                    // Emit GPU-accurate stats as a signpost event so they appear in
                    // Instruments and can be captured by `log stream --type signpost`.
                    accumSignposter.emitEvent(...)
                    // Also emit as a Logger.info so it appears in `log stream`.
                    accumLog.info(...)

The signpost + os_log calls right below the print already cover every consumer (Instruments, log stream). The print is purely a duplicate, runs unconditionally, and is the only one of the three that doesn't carry a privacy policy.

                        let totalTime = Date().timeIntervalSince(streamStartTime)
                        print(
                            "[Osaurus][UI] Stream consumption completed: \(uiDeltaCount) deltas in \(String(format: "%.2f", totalTime))s, final contentLen=\(assistantTurn.contentLength)"
                        )

Fix

Wrap each flagged print in #if DEBUG. Behavior in development builds is byte-identical; release builds skip the formatting, the heap allocations, and the stdout write.

In ChatEngine, lastDeltaTime is moved inside the same DEBUG guard since it has no purpose outside the freeze-detection log. startTime stays unconditional because it's still needed by the downstream InsightsService.logInference call.

Scope decisions

  • Did not gate the per-tool-call prints at ChatView.swift:1179 / :1242. Those include user-supplied tool arguments and tool outputs, which is the kind of thing that probably should not be in stdout in release builds, but it's a privacy concern and not a perf one, and it deserves a dedicated review (Console.app readability, support diagnostics workflow, etc.). Out of scope for this PR.
  • Did not add or remove any os_log / Logger call. The redacted, structured-logging surface is unchanged.

Changes

  • Behavior change (release-build only: silences three high-frequency stdout writes)
  • UI change
  • Refactor / chore
  • Tests (no observable behavior under tests — the prints are pure side effects)
  • Docs

Test Plan

  1. Run a chat session against a local MLX model in a DEBUG build; Console.app and stdout still show [Osaurus][Stream] Delta #N… lines and the MLX-stats line per turn (unchanged).
  2. Build a Release configuration; the same lines no longer appear in Console.app, while the matching signpost and Logger.info lines do (verifiable via Instruments and log stream --predicate 'subsystem == "ai.osaurus.…"').
  3. Quick sanity: streaming responses still flow into the UI; tool calls still execute. The change is purely additive #if DEBUG guards around print() calls.

Checklist

  • I have read CONTRIBUTING.md
  • I added/updated tests where reasonable (see above — no observable behavior to test)
  • I updated docs/README as needed (n/a)
  • I verified build on macOS with Xcode 16.4+ (authored in a Linux sandbox; verified each touched file via swiftc -frontend -parse)
Open in Web Open in Cursor 

cursoragent and others added 2 commits May 27, 2026 04:03
The audit found three high-frequency print() statements running in
release builds:

  * ChatEngine producer task — one print on stream start, one every
    50 deltas (or whenever a delta gap exceeds 2s), one on stream
    end, and one per tool invocation. Each token batch on a 100 tok/s
    local model could fire one of these.
  * StreamAccumulator — one print per MLX completion-info event,
    duplicating the same data we already emit via signposts and
    os_log (which both respect privacy redaction policies).
  * ChatView stream consumer — one print per UI stream completion.

stdout in release builds is captured by Console.app and the unified
logging system, but bypasses the privacy('public'/'private')
redaction that the matching Logger calls use. The signpost + os_log
calls in StreamAccumulator already give Instruments and 'log stream'
all the perf data, so the print is purely a duplicate.

Gate every flagged print behind '#if DEBUG'. Behavior in DEBUG is
unchanged; release builds skip the formatting, the String allocations,
and the stdout write entirely.

The lastDeltaTime tracking in ChatEngine is moved inside the same
DEBUG guard since it has no purpose outside the freeze-detection
log. startTime is kept (still used by the InsightsService log call).
Tool-call prints in ChatView (lines 1179, 1242) are left alone for
now -- they include user-supplied content and deserve a separate,
focused privacy-cleanup pass.

Co-authored-by: Michael Meding <mimeding@users.noreply.github.com>
ModelManager.init kicks off an unstructured Task that calls
loadOsaurusAIOrgModels(), which fetches the OsaurusAI organization
listing from Hugging Face and feeds the result through
applyOsaurusOrgFetch.

The unit-test runner repeatedly constructs ModelManager() to drive
applyOsaurusOrgFetch directly. The background launch-time fetch
races with those test calls — whichever finishes last wins, and
the merge result is non-deterministic. That's the root cause of
the flaky ModelManagerSuggestedTests failures seen across many of
the recent PR CI runs (applyOsaurusOrgFetch_dropsStaleAutoFetched
OnReapply, applyOsaurusOrgFetch_addsNewEntriesAfterCurated, etc.).

Gate the launch-time fetch on a small isRunningInTestEnvironment
helper that checks for any of XCTestConfigurationFilePath,
XCTestBundlePath, or XCTestSessionIdentifier in the process
environment. Those variables are only present inside an xctest host
process; production app launches still get the HF fetch exactly as
before.

This is a network call, so removing it under tests also has the
side benefit of making the test suite work offline / on hermetic
CI runners.

Co-authored-by: Michael Meding <mimeding@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants