feat: integrate Qwen2.5-Omni models with native audio input support #1581

ramkrishna2910 wants to merge 2 commits into main from
Conversation
Add Qwen2.5-Omni-7B and Qwen2.5-Omni-3B to the model registry as llamacpp models with a new "omni" label. Omni models activate experience mode in the UI (unified chat with image upload and mic) but are treated as single models rather than composites.

For omni models, the experience mic records raw audio via useAudioCapture, builds a WAV, and sends it as a multimodal content part (image_url with a data:audio/wav;base64 URI) in the chat completions request. This lets the omni model process audio natively rather than transcribing it first with Whisper.

Changes:
- server_models.json: Add Qwen2.5-Omni-7B/3B with omni+vision labels
- experienceModels.ts: isOmniModel() helper; isExperienceModel() returns true for omni; download/load checks handle single-model omni
- ChatWindow.tsx: Compute and pass isOmni prop to LLMChatPanel
- LLMChatPanel.tsx: Raw audio recording, WAV encoding, audio preview, audio file upload, and audio content parts in chat messages
- ModelManager.tsx: Show 'omni' label; skip composite-only paths
- backendInstaller.ts: Skip composite experience path for omni models
- styles.css: Audio preview, recording indicator, and message audio styles

https://claude.ai/code/session_01Pwbj1PmPJCpGMTSoba4ML4
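The audio path described above can be sketched as follows: wrap raw PCM samples in a minimal WAV container, base64-encode the result, and package it as an OpenAI-style image_url content part carrying a data:audio/wav;base64 URI. The function names (encodeWav, buildAudioContentPart) and the mono/16-bit assumptions are illustrative, not the PR's actual identifiers.

```typescript
// Minimal mono 16-bit PCM WAV encoder (44-byte RIFF header + sample data).
function encodeWav(samples: Int16Array, sampleRate: number): Uint8Array {
  const dataSize = samples.length * 2;
  const buffer = new ArrayBuffer(44 + dataSize);
  const view = new DataView(buffer);
  const writeStr = (offset: number, s: string) => {
    for (let i = 0; i < s.length; i++) view.setUint8(offset + i, s.charCodeAt(i));
  };
  writeStr(0, 'RIFF');
  view.setUint32(4, 36 + dataSize, true);   // RIFF chunk size
  writeStr(8, 'WAVE');
  writeStr(12, 'fmt ');
  view.setUint32(16, 16, true);             // fmt chunk size
  view.setUint16(20, 1, true);              // audio format: PCM
  view.setUint16(22, 1, true);              // channels: mono
  view.setUint32(24, sampleRate, true);
  view.setUint32(28, sampleRate * 2, true); // byte rate
  view.setUint16(32, 2, true);              // block align
  view.setUint16(34, 16, true);             // bits per sample
  writeStr(36, 'data');
  view.setUint32(40, dataSize, true);
  for (let i = 0; i < samples.length; i++) {
    view.setInt16(44 + i * 2, samples[i], true);
  }
  return new Uint8Array(buffer);
}

// Package the WAV as a multimodal content part, reusing the image_url
// shape with an audio data URI, as the PR description states.
function buildAudioContentPart(wav: Uint8Array): { type: string; image_url: { url: string } } {
  const base64 = Buffer.from(wav).toString('base64');
  return { type: 'image_url', image_url: { url: `data:audio/wav;base64,${base64}` } };
}
```

Reusing the image_url part type means no server-side schema change is needed; the backend only has to recognize the data:audio/ scheme inside it.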
@claude review
Claude finished @ramkrishna2910's task in 2m 44s (View job)

PR Review: feat: integrate Qwen2.5-Omni models with native audio input support

Summary

Good overall approach — native audio-to-model without Whisper is the right architecture for an omni model. The implementation is mostly clean, but there are two security issues that need to be addressed before merge, plus several design concerns worth discussing.

🔴 Security Issues (Must Fix)

1. XSS + URL redirect via unvalidated media URLs rendered into <img>/<audio> src attributes
1. Fix XSS + URL redirect (CodeQL): validate URL schemes before rendering <img>/<audio> src attributes — only allow data:image/, data:audio/, https://, and http:// schemes.
2. Refactor the experience/omni separation: revert isExperienceModel to its original pure form (composite experience models only). Add a new isExperienceOrOmni() helper used exclusively in the UI layer (ChatWindow, ModelSelector) where experience-mode behavior is needed. This eliminates the fragile !isOmniModel() guards that were scattered across ModelManager and backendInstaller.
3. Add a 60-second recording duration limit with a live timer display in the recording indicator. Recording auto-stops and attaches the audio when the limit is reached.
4. Fix O(n²) btoa encoding: use chunk-based String.fromCharCode with subarray slices (8 KB chunks) instead of per-byte string concatenation.

https://claude.ai/code/session_01Pwbj1PmPJCpGMTSoba4ML4
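The scheme allow-list from item 1 can be factored into a single guard that both the `<img>` and `<audio>` render paths call before using a URL. The helper name isSafeMediaUrl is hypothetical, not the PR's actual identifier.

```typescript
// Allow-list of URL prefixes that are safe to place in media src attributes.
// Anything else (javascript:, vbscript:, arbitrary data: payloads) is rejected.
const ALLOWED_MEDIA_PREFIXES = ['data:image/', 'data:audio/', 'https://', 'http://'];

function isSafeMediaUrl(url: string): boolean {
  return ALLOWED_MEDIA_PREFIXES.some((prefix) => url.startsWith(prefix));
}
```

Prefix matching against a closed allow-list avoids the pitfalls of deny-listing dangerous schemes, which is easy to bypass with unusual casing or encodings.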
```tsx
      return <audio key={index} src={url} controls preload="metadata" className="message-audio" />;
    }
    if (url.startsWith('data:image/') || url.startsWith('https://') || url.startsWith('http://')) {
      return <img key={index} src={url} alt="Uploaded" className="message-image" />;
```
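The encoding fix from item 4 above can be sketched as follows: instead of appending one character per byte (which makes the intermediate string quadratic to build), convert fixed-size subarray slices and join them once. This is a hedged sketch; the function name bytesToBase64 is illustrative.

```typescript
// Chunked base64 encoding for large Uint8Arrays. Converting 8 KB slices at a
// time keeps String.fromCharCode within safe argument limits and avoids the
// O(n²) cost of per-byte string concatenation.
function bytesToBase64(bytes: Uint8Array): string {
  const CHUNK = 8192;
  const parts: string[] = [];
  for (let i = 0; i < bytes.length; i += CHUNK) {
    parts.push(String.fromCharCode(...bytes.subarray(i, i + CHUNK)));
  }
  return btoa(parts.join(''));
}
```

A few seconds of 16 kHz mono audio is tens of kilobytes, so the quadratic version is noticeably slow on exactly the payloads this feature produces.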