feat: integrate Qwen2.5-Omni models with native audio input support#1581

Draft
ramkrishna2910 wants to merge 2 commits into main from claude/integrate-qwen-omni-models-LXY26

Conversation


@ramkrishna2910 ramkrishna2910 commented Apr 8, 2026

Add Qwen2.5-Omni-7B and Qwen2.5-Omni-3B to the model registry as
llamacpp models with a new "omni" label. Omni models activate
experience mode in the UI (unified chat with image upload and mic)
but are treated as single models rather than composites.

For omni models, the experience mic records raw audio via
useAudioCapture, builds a WAV, and sends it as a multimodal
content part (image_url with data:audio/wav;base64 URI) in the
chat completions request. This lets the omni model process audio
natively rather than transcribing it first with Whisper.

Changes:
- server_models.json: Add Qwen2.5-Omni-7B/3B with omni+vision labels
- experienceModels.ts: isOmniModel() helper; isExperienceModel()
  returns true for omni; download/load checks handle single-model omni
- ChatWindow.tsx: Compute and pass isOmni prop to LLMChatPanel
- LLMChatPanel.tsx: Raw audio recording, WAV encoding, audio preview,
  audio file upload, and audio content parts in chat messages
- ModelManager.tsx: Show 'omni' label; skip composite-only paths
- backendInstaller.ts: Skip composite experience path for omni models
- styles.css: Audio preview, recording indicator, message audio styles
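The audio path described above can be sketched end to end. This is not the PR's exact code: the helper names (encodeWav, buildWavDataUrl) and the 16 kHz mono PCM16 format are assumptions based on the description.

```typescript
// Sketch: wrap raw PCM16 samples in a minimal WAV container, base64-encode
// it, and attach it to the chat request as an image_url content part.
function encodeWav(pcm: Int16Array, sampleRate = 16000): ArrayBuffer {
  const dataSize = pcm.length * 2;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeStr = (off: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(off + i, s.charCodeAt(i));
  };
  writeStr(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);
  writeStr(8, 'WAVE');
  writeStr(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // audio format = PCM
  view.setUint16(22, 1, true);              // mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeStr(36, 'data');
  view.setUint32(40, dataSize, true);
  new Int16Array(buffer, 44).set(pcm);
  return buffer;
}

function buildWavDataUrl(buffer: ArrayBuffer): string {
  const bytes = new Uint8Array(buffer);
  let binary = '';
  const CHUNK = 8192; // chunked to avoid per-byte string concatenation
  for (let i = 0; i < bytes.length; i += CHUNK) {
    binary += String.fromCharCode(...bytes.subarray(i, i + CHUNK));
  }
  return `data:audio/wav;base64,${btoa(binary)}`;
}

// The audio rides in the vision slot of the multimodal content array:
const contentArray: Array<{ type: string; image_url?: { url: string } }> = [];
const url = buildWavDataUrl(encodeWav(new Int16Array(16000))); // 1 s of silence
contentArray.push({ type: 'image_url', image_url: { url } });
```

The content part deliberately reuses the existing vision structure so no server-side schema changes are needed; whether the model actually consumes audio delivered this way is discussed in the review below.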

https://claude.ai/code/session_01Pwbj1PmPJCpGMTSoba4ML4
Comment thread src/app/src/renderer/components/panels/LLMChatPanel.tsx Fixed
@ramkrishna2910 (Contributor, Author)

@claude review


github-actions bot commented Apr 8, 2026

Claude finished @ramkrishna2910's task in 2m 44s


PR Review: feat: integrate Qwen2.5-Omni models with native audio input support

  • Read changed files and gather context
  • Analyze security vulnerabilities flagged by CodeQL
  • Review model registry changes
  • Review frontend/UI changes
  • Review audio capture and multimodal content implementation
  • Post comprehensive review

Summary

Good overall approach — native audio-to-model without Whisper is the right architecture for an omni model. The implementation is mostly clean, but there are two security issues that need to be addressed before merge, plus several design concerns worth discussing.


🔴 Security Issues (Must Fix)

1. XSS + URL redirect via unvalidated src attribute (LLMChatPanel.tsx:801)

CodeQL is correct here. The url variable comes from item.image_url.url, which is populated from both user uploads and assistant/server responses. A compromised or malicious server response could inject javascript:alert(1) or a redirect URI.

// Current — line 801
return <img key={index} src={url} className="message-image" />;

The data:audio/ prefix check on line 798–799 already shows the right idea. Apply the same pattern to the image case:

if (item.type === 'image_url') {
  const url = item.image_url.url;
  if (url.startsWith('data:audio/')) {
    return <audio key={index} src={url} controls preload="metadata" className="message-audio" />;
  }
  // Only render images from safe sources
  if (url.startsWith('data:image/') || url.startsWith('https://') || url.startsWith('http://')) {
    return <img key={index} src={url} className="message-image" />;
  }
  return null;
}


🟡 Design Concerns

2. isExperienceModel returning true for omni models is fragile

In experienceModels.ts, omni models return true from isExperienceModel, but then 3 different call sites need && !isOmniModel() guards to opt out of the composite-model behavior:

  • ModelManager.tsx:862
  • ModelManager.tsx:935
  • backendInstaller.ts:530

There are also unguarded call sites that happen to work coincidentally (e.g. ModelManager.tsx:476 getModelSize — it works only because getExperienceComponents returns [] for omni, triggering the components.length === 0 fallback). This is an accident waiting to happen.

Suggested approach: Don't conflate the two. Keep isExperienceModel for true composite experiences, and handle isOmni separately where experience-mode UI behaviour is needed (e.g. ChatWindow.tsx:81 for isExperienceSelected).

3. Audio sent as type: 'image_url' in the chat completions payload

contentArray.push({ type: 'image_url', image_url: { url: omniAudioDataUrl } });

This reuses the vision multimodal structure for audio. The Qwen2.5-Omni model's expected multimodal format via llama.cpp may be different — for example, a top-level audio_url content type. This may silently fail or be ignored by the model. It's worth confirming whether llama.cpp's Qwen2.5-Omni implementation specifically handles data:audio/wav;base64,... via image_url before shipping.
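For comparison, if llama.cpp's server follows OpenAI's input_audio content-part convention (an assumption — verify against the server version actually in use), the audio part would look like this rather than riding in image_url:

```typescript
// Hypothetical alternative shape (OpenAI-style input_audio content part).
// wavBase64 is the base64 WAV payload WITHOUT the data: URI prefix.
const wavBase64 = '<base64-encoded WAV bytes>'; // placeholder
const audioPart = {
  type: 'input_audio' as const,
  input_audio: {
    data: wavBase64,
    format: 'wav' as const,
  },
};
```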

4. No recording duration limit

omniAudioChunksRef accumulates raw PCM16 indefinitely. At 16 kHz mono × 2 bytes = ~32 KB/s. A 10-minute accidental recording would allocate ~19 MB in the React tree, and the btoa conversion would be proportionally slow. Consider capping recording at e.g. 60 seconds and showing a timer in the audio-recording-indicator.
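A minimal sketch of such a cap — the 60-second limit and the 16 kHz mono PCM16 figures come from the math above; the helper names are hypothetical, not from the PR:

```typescript
// Assumed recording format: 16 kHz mono PCM16 (2 bytes per sample).
const MAX_RECORDING_MS = 60_000;
const SAMPLE_RATE = 16000;

// Pure helper: stop once the accumulated samples reach the cap.
function shouldStopRecording(totalSamples: number): boolean {
  return (totalSamples / SAMPLE_RATE) * 1000 >= MAX_RECORDING_MS;
}

// Bytes allocated so far — handy for a live timer / size indicator.
function recordedBytes(totalSamples: number): number {
  return totalSamples * 2;
}
```

In the recording callback, a `shouldStopRecording` check after each appended chunk could auto-stop the capture and attach the audio, keeping the worst case at roughly 60 s × 32 KB/s ≈ 1.9 MB.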

5. buildWavDataUrl — slow btoa encoding for large buffers

let binary = '';
const bytes = new Uint8Array(buffer);
for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
return `data:audio/wav;base64,${btoa(binary)}`;

Per-byte string concatenation over large arrays can degrade to O(n²) due to repeated reallocation, and a multi-megabyte recording makes that cost visible. For non-trivial recordings, prefer:

// Process in chunks to avoid stack overflow and improve perf
const CHUNK_SIZE = 8192;
const bytes = new Uint8Array(buffer);
let binary = '';
for (let i = 0; i < bytes.length; i += CHUNK_SIZE) {
  binary += String.fromCharCode(...bytes.subarray(i, i + CHUNK_SIZE));
}
return `data:audio/wav;base64,${btoa(binary)}`;

🟢 Minor Notes

6. Audio upload button uses the microphone icon

LLMChatPanel.tsx:1098 — The audio file upload button uses <MicrophoneIcon active={false} />, making it visually identical to the mic-record button. Users have no way to distinguish "upload audio file" from "record audio". An upload/attachment icon would be clearer.

7. omni not in getCategoryLabel map (ModelManager.tsx:512)

The label map doesn't include 'omni', so it falls back to the capitalization logic and shows "Omni". This is fine as-is, but if you want consistency with other labels (e.g. "Vision", "Experience"), it's worth adding explicitly.

8. Model registry — mmproj path resolves correctly

The legacy_mmproj_to_checkpoint function in model_manager.cpp derives the mmproj HuggingFace path as {repo_from_checkpoint}:{mmproj_filename} — so for Qwen2.5-Omni-7B-GGUF, it resolves to ggml-org/Qwen2.5-Omni-7B-GGUF:mmproj-Qwen2.5-Omni-7B-f16.gguf. Please verify that file actually exists in that HF repo at the expected path before merge.


Bottom Line

The security issue (#1) must be fixed. Issue #2 (the fragile isExperienceModel expansion) is a meaningful design risk that will likely cause a follow-up bug. Issue #3 (audio content type) needs verification against actual llama.cpp behavior. The rest are polish items.

@ramkrishna2910 ramkrishna2910 marked this pull request as draft April 8, 2026 04:44
1. Fix XSS + URL redirect (CodeQL): validate URL schemes before
   rendering <img>/<audio> src attributes — only allow data:image/,
   data:audio/, https://, and http:// schemes.

2. Refactor experience/omni separation: revert isExperienceModel to
   its original pure form (composite experience models only). Add new
   isExperienceOrOmni() helper used exclusively in the UI layer
   (ChatWindow, ModelSelector) where experience-mode behavior is
   needed. This eliminates the fragile !isOmniModel() guards that
   were scattered across ModelManager and backendInstaller.

3. Add 60-second recording duration limit with a live timer display
   in the recording indicator. Auto-stops and attaches audio when
   the limit is reached.

4. Fix O(n²) btoa encoding: use chunk-based String.fromCharCode
   with subarray slices (8KB chunks) instead of per-byte string
   concatenation.
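The refactor in item 2 could take roughly this shape — the set contents are illustrative and the real predicates live in experienceModels.ts:

```typescript
// Hypothetical sketch of the experience/omni split described above.
const OMNI_MODELS = new Set(['Qwen2.5-Omni-7B', 'Qwen2.5-Omni-3B']);
const COMPOSITE_EXPERIENCE_MODELS = new Set<string>([/* composite ids */]);

function isOmniModel(id: string): boolean {
  return OMNI_MODELS.has(id);
}

// Back to its original pure form: true composite experiences only.
function isExperienceModel(id: string): boolean {
  return COMPOSITE_EXPERIENCE_MODELS.has(id);
}

// UI-layer helper: experience-mode chrome for both kinds of model,
// so ModelManager/backendInstaller never need !isOmniModel() guards.
function isExperienceOrOmni(id: string): boolean {
  return isExperienceModel(id) || isOmniModel(id);
}
```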

https://claude.ai/code/session_01Pwbj1PmPJCpGMTSoba4ML4
@ramkrishna2910 ramkrishna2910 self-assigned this Apr 12, 2026