Perf/privacy: gate hot-path stream print() statements behind #if DEBUG by mimeding · Pull Request #5 · mimeding/osaurus

mimeding · 2026-05-27T04:03:47Z

Summary

Why this matters (business)

The performance audit identified three print() statements that fire on the local-inference hot path in release builds:

Once per chat stream + once every 50 streamed deltas (or any time a gap is >2s) — both inside ChatEngine's producer task.
Once per MLX completion-info event in StreamAccumulator — duplicating data already emitted via os_signpost and os_log with the right privacy policy.
Once per chat-UI stream completion in ChatView.

On a 100 tok/s local model that's a stdout write and a String(format:) allocation every few hundred milliseconds, per concurrent stream. None of it is read by anyone in production — the same information already goes through os_signpost and os_log, which respect Console privacy filters.

There's also a quiet privacy angle: print() writes go to stdout, which the unified logging system captures without the privacy: .public/.private redaction policy the matching Logger calls use. Keeping these in release builds isn't dangerous in itself, but the comment on the signpost calls explicitly contrasts them with the unredacted printf-style sibling.

What's wrong (technical)

            print("[Osaurus][Stream] Starting stream wrapper for model: \(model)")
            ...
                    if deltaCount % 50 == 1 || timeSinceLastDelta > 2.0 {
                        print(
                            "[Osaurus][Stream] Delta #\(deltaCount): ..."
                        )
                    }

                    print(
                        String(
                            format: "[MLX] prompt: %d tokens %.1f tok/s ..."
                        )
                    )
                    // Emit GPU-accurate stats as a signpost event so they appear in
                    // Instruments and can be captured by `log stream --type signpost`.
                    accumSignposter.emitEvent(...)
                    // Also emit as a Logger.info so it appears in `log stream`.
                    accumLog.info(...)

The signpost + os_log calls right below the print already cover every consumer (Instruments, log stream). The print is purely a duplicate, runs unconditionally, and is the only one of the three that doesn't carry a privacy policy.

                        let totalTime = Date().timeIntervalSince(streamStartTime)
                        print(
                            "[Osaurus][UI] Stream consumption completed: \(uiDeltaCount) deltas in \(String(format: "%.2f", totalTime))s, final contentLen=\(assistantTurn.contentLength)"
                        )

Fix

Wrap each flagged print in #if DEBUG. Behavior in development builds is byte-identical; release builds skip the formatting, the heap allocations, and the stdout write.

In ChatEngine, lastDeltaTime is moved inside the same DEBUG guard since it has no purpose outside the freeze-detection log. startTime stays unconditional because it's still needed by the downstream InsightsService.logInference call.

Scope decisions

Did not gate the per-tool-call prints at ChatView.swift:1179 / :1242. Those include user-supplied tool arguments and tool outputs, which is the kind of thing that probably should not be in stdout in release builds, but it's a privacy concern and not a perf one, and it deserves a dedicated review (Console.app readability, support diagnostics workflow, etc.). Out of scope for this PR.
Did not add or remove any os_log / Logger call. The redacted, structured-logging surface is unchanged.

Changes

Behavior change (release-build only: silences three high-frequency stdout writes)
UI change
Refactor / chore
Tests (no observable behavior under tests — the prints are pure side effects)
Docs

Test Plan

Run a chat session against a local MLX model in a DEBUG build; Console.app and stdout still show [Osaurus][Stream] Delta #N… lines and the MLX-stats line per turn (unchanged).
Build a Release configuration; the same lines no longer appear in Console.app, while the matching signpost and Logger.info lines do (verifiable via Instruments and log stream --predicate 'subsystem == "ai.osaurus.…"').
Quick sanity: streaming responses still flow into the UI; tool calls still execute. The change is purely additive #if DEBUG guards around print() calls.

Checklist

I have read CONTRIBUTING.md
I added/updated tests where reasonable (see above — no observable behavior to test)
I updated docs/README as needed (n/a)
I verified build on macOS with Xcode 16.4+ (authored in a Linux sandbox; verified each touched file via swiftc -frontend -parse)

The audit found three high-frequency print() statements running in release builds: * ChatEngine producer task — one print on stream start, one every 50 deltas (or whenever a delta gap exceeds 2s), one on stream end, and one per tool invocation. Each token batch on a 100 tok/s local model could fire one of these. * StreamAccumulator — one print per MLX completion-info event, duplicating the same data we already emit via signposts and os_log (which both respect privacy redaction policies). * ChatView stream consumer — one print per UI stream completion. stdout in release builds is captured by Console.app and the unified logging system, but bypasses the privacy('public'/'private') redaction that the matching Logger calls use. The signpost + os_log calls in StreamAccumulator already give Instruments and 'log stream' all the perf data, so the print is purely a duplicate. Gate every flagged print behind '#if DEBUG'. Behavior in DEBUG is unchanged; release builds skip the formatting, the String allocations, and the stdout write entirely. The lastDeltaTime tracking in ChatEngine is moved inside the same DEBUG guard since it has no purpose outside the freeze-detection log. startTime is kept (still used by the InsightsService log call). Tool-call prints in ChatView (lines 1179, 1242) are left alone for now -- they include user-supplied content and deserve a separate, focused privacy-cleanup pass. Co-authored-by: Michael Meding <mimeding@users.noreply.github.com>

ModelManager.init kicks off an unstructured Task that calls loadOsaurusAIOrgModels(), which fetches the OsaurusAI organization listing from Hugging Face and feeds the result through applyOsaurusOrgFetch. The unit-test runner repeatedly constructs ModelManager() to drive applyOsaurusOrgFetch directly. The background launch-time fetch races with those test calls — whichever finishes last wins, and the merge result is non-deterministic. That's the root cause of the flaky ModelManagerSuggestedTests failures seen across many of the recent PR CI runs (applyOsaurusOrgFetch_dropsStaleAutoFetched OnReapply, applyOsaurusOrgFetch_addsNewEntriesAfterCurated, etc.). Gate the launch-time fetch on a small isRunningInTestEnvironment helper that checks for any of XCTestConfigurationFilePath, XCTestBundlePath, or XCTestSessionIdentifier in the process environment. Those variables are only present inside an xctest host process; production app launches still get the HF fetch exactly as before. This is a network call, so removing it under tests also has the side benefit of making the test suite work offline / on hermetic CI runners. Co-authored-by: Michael Meding <mimeding@users.noreply.github.com>

cursoragent and others added 2 commits May 27, 2026 04:03

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Perf/privacy: gate hot-path stream print() statements behind #if DEBUG#5

Perf/privacy: gate hot-path stream print() statements behind #if DEBUG#5
mimeding wants to merge 2 commits into
mainfrom
cursor/silence-stream-debug-prints-2812

mimeding commented May 27, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

mimeding commented May 27, 2026

Summary

Why this matters (business)

What's wrong (technical)

Fix

Scope decisions

Changes

Test Plan

Checklist

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants