
After upgrading, small models on oMLX no longer run when called from openclaw: Qwen3.5-4B-MLX-4bit, Qwen3.5-9B-MLX-4bit, Qwen2.5-7B-Instruct-4bit, Qwen2.5-14B-Instruct-4bit #755

@misterluck


2026-04-14 20:54:57,359 - omlx.engine_pool - INFO - [-] - Loaded model: Qwen3.5-4B-MLX-4bit (estimated: 2.97GB, total: 2.97GB)
2026-04-14 20:56:57,402 - omlx.engine.vlm - INFO - [-] - [vlm_stream_generate] Aborting request ad839abe-2d3e-40ad-ac70-fc9e7eff03f8
2026-04-14 20:57:01,271 - omlx.scheduler - INFO - [-] - Prefill interrupted at 22528/24919 tokens: 1 request(s) aborted
2026-04-14 20:57:01,274 - omlx.scheduler - INFO - [-] - Rescheduled 1 requests for re-prefill
2026-04-14 20:58:57,788 - omlx.engine.vlm - INFO - [-] - [vlm_stream_generate] Aborting request dfefaf72-77a0-4fc6-b0bd-470787189f81
2026-04-14 20:59:04,620 - omlx.scheduler - INFO - [-] - Prefill interrupted at 22528/25037 tokens: 1 request(s) aborted
2026-04-14 20:59:04,623 - omlx.scheduler - INFO - [-] - Rescheduled 1 requests for re-prefill
2026-04-14 20:59:29,730 - omlx.admin.routes - INFO - [-] - Max model memory changed: 11.52GB -> 11.52GB
2026-04-14 20:59:29,730 - omlx.process_memory_enforcer - INFO - [-] - Process memory limit changed: 12.8GB -> 12.8GB
2026-04-14 20:59:29,730 - omlx.admin.routes - INFO - [-] - Process memory limit updated to 12.8GB
2026-04-14 20:59:29,730 - omlx.engine_pool - INFO - [-] - Unloading model: Qwen3.5-4B-MLX-4bit (immediate abort)
2026-04-14 20:59:29,731 - omlx.engine_core - INFO - [-] - Engine stopped
2026-04-14 20:59:29,731 - omlx.scheduler - INFO - [-] - Scheduler shutdown initiated...
2026-04-14 20:59:29,731 - omlx.cache.paged_ssd_cache - INFO - [-] - Shutting down PagedSSDCacheManager...
2026-04-14 20:59:29,731 - omlx.scheduler - INFO - [-] - Scheduler shutdown completed
2026-04-14 20:59:29,733 - omlx.cache.paged_cache - INFO - [-] - PagedCacheManager cleared (reset to 256 initial blocks)
2026-04-14 20:59:29,795 - omlx.scheduler - INFO - [-] - Deep reset completed - all caches cleared
2026-04-14 20:59:29,808 - omlx.engine.vlm - INFO - [-] - VLMBatchedEngine stopped
2026-04-14 20:59:29,883 - omlx.engine_pool - INFO - [-] - Unloaded model: Qwen3.5-4B-MLX-4bit, memory usage: 0.00B
2026-04-14 20:59:29,884 - omlx.admin.routes - INFO - [-] - Cache settings updated. Unloaded 1 models.
2026-04-14 20:59:29,884 - omlx.admin.routes - INFO - [-] - Sampling defaults updated: max_context_window=327680, max_tokens=327680, temperature=1.0, top_p=0.95, top_k=0, repetition_penalty=1.0
2026-04-14 20:59:29,884 - omlx.admin.routes - INFO - [-] - API key updated via admin settings
2026-04-14 20:59:29,884 - omlx.settings - INFO - [-] - Saved settings to /Users/zhaolei/.omlx/settings.json
2026-04-14 21:11:03,791 - omlx.model_settings - INFO - [-] - Loaded settings for 1 models
2026-04-14 21:11:03,794 - omlx.model_discovery - INFO - [-] - Discovered model: DeepSeek-R1-Distill-Qwen-14B-4bit (type: llm, engine: batched, size: 8.13GB)
2026-04-14 21:11:03,795 - omlx.model_discovery - INFO - [-] - Discovered model: Llama-3.1-8B-Instruct-4bit (type: llm, engine: batched, size: 4.42GB)
2026-04-14 21:11:03,796 - omlx.model_discovery - INFO - [-] - Discovered model: MLX-Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-4bit (type: llm, engine: batched, size: 4.93GB)
2026-04-14 21:11:03,797 - omlx.model_discovery - INFO - [-] - Discovered model: Qwen2.5-14B-Instruct-4bit (type: llm, engine: batched, size: 8.13GB)
2026-04-14 21:11:03,798 - omlx.model_discovery - INFO - [-] - Discovered model: Qwen2.5-7B-Instruct-4bit (type: llm, engine: batched, size: 4.19GB)
2026-04-14 21:11:03,799 - omlx.model_discovery - INFO - [-] - Discovered model: Qwen2.5-Coder-14B-Instruct-4bit (type: llm, engine: batched, size: 8.13GB)
2026-04-14 21:11:03,800 - omlx.model_discovery - INFO - [-] - Discovered model: Qwen3.5-4B-MLX-4bit (type: vlm, engine: vlm, size: 2.97GB)
2026-04-14 21:11:03,801 - omlx.model_discovery - INFO - [-] - Discovered model: Qwen3.5-9B-MLX-4bit (type: vlm, engine: vlm, size: 5.82GB)
2026-04-14 21:11:03,802 - omlx.model_discovery - INFO - [-] - Discovered model: gemma-4-e4b-it-4bit (type: vlm, engine: vlm, size: 5.10GB)
2026-04-14 21:11:03,802 - omlx.engine_pool - INFO - [-] - Discovered 9 models, max memory: 11.52GB
2026-04-14 21:11:48,699 - omlx.engine_pool - INFO - [-] - Loading model: Qwen3.5-9B-MLX-4bit
2026-04-14 21:11:52,220 - omlx.scheduler - INFO - [-] - Enlarging paged cache block_size=256 to 2048 for ArraysCache hybrid model (reduces boundary snapshot overhead)
2026-04-14 21:11:52,220 - omlx.scheduler - INFO - [-] - paged SSD-only mode: max_blocks=100000, block_size=2048 tokens
2026-04-14 21:11:52,220 - omlx.cache.paged_cache - INFO - [-] - PagedCacheManager initialized: block_size=2048, initial_blocks=256, max_blocks=100000, max_tokens=204800000
2026-04-14 21:11:52,220 - omlx.cache.paged_ssd_cache - INFO - [-] - Scanning SSD cache directory: /Users/zhaolei/.omlx/cache
2026-04-14 21:11:52,221 - omlx.cache.paged_ssd_cache - INFO - [-] - SSD cache scan complete: scanned=0, indexed=0, errors=0, total_size=0 B
2026-04-14 21:11:52,221 - omlx.cache.paged_ssd_cache - INFO - [-] - PagedSSDCacheManager initialized: dir=/Users/zhaolei/.omlx/cache, max_size=46.00 GB, existing_files=0
2026-04-14 21:11:52,221 - omlx.cache.paged_cache - INFO - [-] - paged SSD cache manager connected to PagedCacheManager
2026-04-14 21:11:52,221 - omlx.cache.prefix_cache - INFO - [-] - PagedSSDCacheManager connected to BlockAwarePrefixCache
2026-04-14 21:11:52,222 - omlx.scheduler - INFO - [-] - paged SSD cache enabled: cache_dir=/Users/zhaolei/.omlx/cache, max_size=46.00 GB, block_size=2048 tokens
2026-04-14 21:11:52,222 - omlx.scheduler - INFO - [-] - paged SSD cache enabled: /Users/zhaolei/.omlx/cache, block_size=2048, max_blocks=100000
2026-04-14 21:11:52,222 - omlx.engine_core - INFO - [-] - Engine started
2026-04-14 21:11:52,304 - omlx.engine.vlm - INFO - [-] - VLM tool calling enabled: parser=qwen3_coder
2026-04-14 21:11:52,307 - omlx.engine.vlm - INFO - [-] - VLMBatchedEngine loaded: /Users/zhaolei/.omlx/models/Qwen3.5-9B-MLX-4bit
2026-04-14 21:11:52,307 - omlx.engine_pool - INFO - [-] - Loaded model: Qwen3.5-9B-MLX-4bit (estimated: 5.82GB, total: 5.82GB)
2026-04-14 21:11:53,612 - omlx.server - INFO - [-] - Chat completion: 1 tokens in 1.30s (0.8 tok/s)
2026-04-14 21:13:54,314 - omlx.engine.vlm - INFO - [-] - [vlm_stream_generate] Aborting request 51bb6eef-0c29-4c8b-98cf-44d7ecd54b75
2026-04-14 21:13:58,165 - omlx.scheduler - INFO - [-] - Prefill interrupted at 2048/20292 tokens: 1 request(s) aborted
2026-04-14 21:13:58,165 - omlx.scheduler - INFO - [-] - Rescheduled 1 requests for re-prefill
2026-04-14 21:15:57,318 - omlx.engine.vlm - INFO - [-] - [vlm_stream_generate] Aborting request 2bf2ed3c-b310-4b89-9b7f-3fe399a81006
2026-04-14 21:16:14,007 - omlx.scheduler - INFO - [-] - Prefill interrupted at 14336/24837 tokens: 1 request(s) aborted
2026-04-14 21:16:14,032 - omlx.scheduler - INFO - [-] - Rescheduled 1 requests for re-prefill
2026-04-14 21:17:57,709 - omlx.engine.vlm - INFO - [-] - [vlm_stream_generate] Aborting request 69ca0ad6-2530-4e96-8d66-6189e041c280
2026-04-14 21:18:08,907 - omlx.scheduler - INFO - [-] - Prefill interrupted at 12288/24837 tokens: 1 request(s) aborted
2026-04-14 21:18:08,926 - omlx.scheduler - INFO - [-] - Rescheduled 1 requests for re-prefill
2026-04-14 21:22:44,247 - omlx.engine_pool - INFO - [-] - Unloading model: Qwen3.5-9B-MLX-4bit (immediate abort)
2026-04-14 21:22:44,249 - omlx.engine_core - INFO - [-] - Engine stopped
2026-04-14 21:22:44,249 - omlx.scheduler - INFO - [-] - Scheduler shutdown initiated...
2026-04-14 21:22:44,249 - omlx.cache.paged_ssd_cache - INFO - [-] - Shutting down PagedSSDCacheManager...
2026-04-14 21:22:44,251 - omlx.scheduler - INFO - [-] - Scheduler shutdown completed
2026-04-14 21:22:44,252 - omlx.cache.paged_cache - INFO - [-] - PagedCacheManager cleared (reset to 256 initial blocks)
2026-04-14 21:22:44,328 - omlx.scheduler - INFO - [-] - Deep reset completed - all caches cleared
2026-04-14 21:22:44,336 - omlx.engine.vlm - INFO - [-] - VLMBatchedEngine stopped
2026-04-14 21:22:44,799 - omlx.engine_pool - INFO - [-] - Unloaded model: Qwen3.5-9B-MLX-4bit, memory usage: 0.00B
2026-04-14 21:22:44,799 - omlx.admin.routes - INFO - [-] - Manually unloaded model: Qwen3.5-9B-MLX-4bit
2026-04-14 21:23:34,058 - omlx.engine_pool - INFO - [-] - Loading model: Qwen2.5-7B-Instruct-4bit
2026-04-14 21:23:35,698 - omlx.scheduler - INFO - [-] - paged SSD-only mode: max_blocks=100000, block_size=256 tokens
2026-04-14 21:23:35,698 - omlx.cache.paged_cache - INFO - [-] - PagedCacheManager initialized: block_size=256, initial_blocks=256, max_blocks=100000, max_tokens=25600000
2026-04-14 21:23:35,698 - omlx.cache.paged_ssd_cache - INFO - [-] - Scanning SSD cache directory: /Users/zhaolei/.omlx/cache
2026-04-14 21:23:35,699 - omlx.cache.paged_ssd_cache - INFO - [-] - SSD cache scan complete: scanned=0, indexed=0, errors=0, total_size=0 B
2026-04-14 21:23:35,699 - omlx.cache.paged_ssd_cache - INFO - [-] - PagedSSDCacheManager initialized: dir=/Users/zhaolei/.omlx/cache, max_size=46.00 GB, existing_files=0
2026-04-14 21:23:35,699 - omlx.cache.paged_cache - INFO - [-] - paged SSD cache manager connected to PagedCacheManager
2026-04-14 21:23:35,699 - omlx.cache.prefix_cache - INFO - [-] - PagedSSDCacheManager connected to BlockAwarePrefixCache
2026-04-14 21:23:35,699 - omlx.scheduler - INFO - [-] - paged SSD cache enabled: cache_dir=/Users/zhaolei/.omlx/cache, max_size=46.00 GB, block_size=256 tokens
2026-04-14 21:23:35,699 - omlx.scheduler - INFO - [-] - paged SSD cache enabled: /Users/zhaolei/.omlx/cache, block_size=256, max_blocks=100000
2026-04-14 21:23:35,700 - omlx.engine_core - INFO - [-] - Engine started
2026-04-14 21:23:35,700 - omlx.engine.batched - INFO - [-] - BatchedEngine loaded: /Users/zhaolei/.omlx/models/Qwen2.5-7B-Instruct-4bit
2026-04-14 21:23:35,700 - omlx.engine_pool - INFO - [-] - Loaded model: Qwen2.5-7B-Instruct-4bit (estimated: 4.19GB, total: 4.19GB)
2026-04-14 21:23:36,274 - omlx.server - INFO - [-] - Chat completion: 1 tokens in 0.57s (1.8 tok/s)
2026-04-14 21:27:18,473 - omlx.engine.batched - INFO - [-] - [stream_generate] Aborting request af6bcee3-4a99-4562-903c-8afd79850fd1 (finished_normally=False)
2026-04-14 21:27:28,479 - omlx.scheduler - INFO - [-] - Prefill interrupted at 20480/24305 tokens: 1 request(s) aborted
2026-04-14 21:27:28,480 - omlx.scheduler - INFO - [-] - Rescheduled 1 requests for re-prefill
2026-04-14 21:29:18,839 - omlx.engine.batched - INFO - [-] - [stream_generate] Aborting request 8ad59c38-f1db-4e21-8d64-5dd84e3f50c4 (finished_normally=False)
2026-04-14 21:29:22,283 - omlx.scheduler - INFO - [-] - Prefill interrupted at 18432/24305 tokens: 1 request(s) aborted
2026-04-14 21:29:22,283 - omlx.scheduler - INFO - [-] - Rescheduled 1 requests for re-prefill
2026-04-14 21:29:34,413 - omlx.server - WARNING - [-] - GET /admin/api/stats → 401: Admin authentication required
2026-04-14 21:34:28,770 - omlx.engine_pool - INFO - [-] - Unloading model: Qwen2.5-7B-Instruct-4bit (immediate abort)
2026-04-14 21:34:28,772 - omlx.engine_core - INFO - [-] - Engine stopped
2026-04-14 21:34:28,773 - omlx.scheduler - INFO - [-] - Scheduler shutdown initiated...
2026-04-14 21:34:28,773 - omlx.cache.paged_ssd_cache - INFO - [-] - Shutting down PagedSSDCacheManager...
2026-04-14 21:34:28,773 - omlx.scheduler - INFO - [-] - Scheduler shutdown completed
2026-04-14 21:34:28,775 - omlx.cache.paged_cache - INFO - [-] - PagedCacheManager cleared (reset to 256 initial blocks)
2026-04-14 21:34:28,833 - omlx.scheduler - INFO - [-] - Deep reset completed - all caches cleared
2026-04-14 21:34:28,848 - omlx.engine.batched - INFO - [-] - BatchedEngine stopped
2026-04-14 21:34:28,911 - omlx.engine_pool - INFO - [-] - Unloaded model: Qwen2.5-7B-Instruct-4bit, memory usage: 0.00B
2026-04-14 21:34:28,911 - omlx.admin.routes - INFO - [-] - Manually unloaded model: Qwen2.5-7B-Instruct-4bit
2026-04-14 21:47:00,332 - omlx.engine_pool - INFO - [-] - Loading model: Qwen2.5-14B-Instruct-4bit
2026-04-14 21:47:03,799 - omlx.scheduler - INFO - [-] - paged SSD-only mode: max_blocks=100000, block_size=256 tokens
2026-04-14 21:47:03,805 - omlx.cache.paged_cache - INFO - [-] - PagedCacheManager initialized: block_size=256, initial_blocks=256, max_blocks=100000, max_tokens=25600000
2026-04-14 21:47:03,806 - omlx.cache.paged_ssd_cache - INFO - [-] - Scanning SSD cache directory: /Users/zhaolei/.omlx/cache
2026-04-14 21:47:03,807 - omlx.cache.paged_ssd_cache - INFO - [-] - SSD cache scan complete: scanned=0, indexed=0, errors=0, total_size=0 B
2026-04-14 21:47:03,807 - omlx.cache.paged_ssd_cache - INFO - [-] - PagedSSDCacheManager initialized: dir=/Users/zhaolei/.omlx/cache, max_size=46.00 GB, existing_files=0
2026-04-14 21:47:03,807 - omlx.cache.paged_cache - INFO - [-] - paged SSD cache manager connected to PagedCacheManager
2026-04-14 21:47:03,807 - omlx.cache.prefix_cache - INFO - [-] - PagedSSDCacheManager connected to BlockAwarePrefixCache
2026-04-14 21:47:03,807 - omlx.scheduler - INFO - [-] - paged SSD cache enabled: cache_dir=/Users/zhaolei/.omlx/cache, max_size=46.00 GB, block_size=256 tokens
2026-04-14 21:47:03,807 - omlx.scheduler - INFO - [-] - paged SSD cache enabled: /Users/zhaolei/.omlx/cache, block_size=256, max_blocks=100000
2026-04-14 21:47:03,808 - omlx.engine_core - INFO - [-] - Engine started
2026-04-14 21:47:03,808 - omlx.engine.batched - INFO - [-] - BatchedEngine loaded: /Users/zhaolei/.omlx/models/Qwen2.5-14B-Instruct-4bit
2026-04-14 21:47:03,812 - omlx.engine_pool - INFO - [-] - Loaded model: Qwen2.5-14B-Instruct-4bit (estimated: 8.13GB, total: 8.13GB)
2026-04-14 21:47:05,833 - omlx.server - INFO - [-] - Chat completion: 1 tokens in 2.01s (0.5 tok/s)
2026-04-14 21:50:53,045 - omlx.engine.batched - INFO - [-] - [stream_generate] Aborting request a50e3d6e-6689-49b2-ba3e-02df6575b883 (finished_normally=False)
2026-04-14 21:50:54,949 - omlx.scheduler - INFO - [-] - Prefill interrupted at 10240/24319 tokens: 1 request(s) aborted
2026-04-14 21:50:54,950 - omlx.scheduler - INFO - [-] - Rescheduled 1 requests for re-prefill
2026-04-14 21:52:53,583 - omlx.engine.batched - INFO - [-] - [stream_generate] Aborting request 14e51312-9c5b-495d-9e27-b4c8d06c11a5 (finished_normally=False)
2026-04-14 21:52:57,220 - omlx.scheduler - INFO - [-] - Prefill interrupted at 10240/24380 tokens: 1 request(s) aborted
2026-04-14 21:52:57,221 - omlx.scheduler - INFO - [-] - Rescheduled 1 requests for re-prefill
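
For triage, here is a minimal sketch that pairs each `Aborting request` line with the `Prefill interrupted at X/Y tokens` line that follows it, and reports the time between consecutive aborts. It is not part of oMLX. The regexes only assume the message formats visible in the log above, and `summarize` and `abort_gaps` are hypothetical helper names. In the excerpt used here the two aborts are about 120 s apart, which might point to a client-side timeout firing before prefill finishes, but that is a guess and not something the log confirms.

```python
import re
from datetime import datetime

# Patterns mirror the message formats seen in the log above (assumed stable).
ABORT_RE = re.compile(
    r"^(\S+ \S+) - \S+ - INFO - \[-\] - "
    r"\[(?:vlm_)?stream_generate\] Aborting request (\S+)"
)
PREFILL_RE = re.compile(r"Prefill interrupted at (\d+)/(\d+) tokens")

# Four lines copied verbatim from the log above.
SAMPLE = """\
2026-04-14 20:56:57,402 - omlx.engine.vlm - INFO - [-] - [vlm_stream_generate] Aborting request ad839abe-2d3e-40ad-ac70-fc9e7eff03f8
2026-04-14 20:57:01,271 - omlx.scheduler - INFO - [-] - Prefill interrupted at 22528/24919 tokens: 1 request(s) aborted
2026-04-14 20:58:57,788 - omlx.engine.vlm - INFO - [-] - [vlm_stream_generate] Aborting request dfefaf72-77a0-4fc6-b0bd-470787189f81
2026-04-14 20:59:04,620 - omlx.scheduler - INFO - [-] - Prefill interrupted at 22528/25037 tokens: 1 request(s) aborted
"""


def parse_ts(text):
    """Parse a timestamp such as '2026-04-14 20:56:57,402'."""
    return datetime.strptime(text, "%Y-%m-%d %H:%M:%S,%f")


def summarize(lines):
    """Pair each abort line with the next 'Prefill interrupted' line."""
    events = []
    pending = None
    for line in lines:
        m = ABORT_RE.match(line)
        if m:
            pending = {"time": parse_ts(m.group(1)), "request": m.group(2)}
            continue
        m = PREFILL_RE.search(line)
        if m and pending is not None:
            pending["done"] = int(m.group(1))
            pending["total"] = int(m.group(2))
            events.append(pending)
            pending = None
    return events


def abort_gaps(events):
    """Seconds between consecutive abort events."""
    return [
        (b["time"] - a["time"]).total_seconds()
        for a, b in zip(events, events[1:])
    ]


if __name__ == "__main__":
    events = summarize(SAMPLE.splitlines())
    for e in events:
        print(e["time"], e["request"], f'{e["done"]}/{e["total"]} tokens prefilled')
    print("gaps between aborts (s):", abort_gaps(events))
```

To run it against a full log, pass the file's lines to `summarize` instead of `SAMPLE.splitlines()`.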
