Forward GGML_METAL_NO_RESIDENCY env var to llama-server on macOS#1526

Open
Geramy wants to merge 4 commits into main from macos_llamacp_apr3_26_fix
Conversation


@Geramy Geramy commented Apr 3, 2026

Summary

  • Explicitly forwards GGML_METAL_NO_RESIDENCY from lemond's environment to the llama-server subprocess on macOS
  • When lemond is started by launchd (e.g. after .pkg install), it does not inherit the shell environment, so the env var never reaches llama-server
  • This fixes llama-server b8648 crashing on macOS CI runners (MTLGPUFamilyApple5 paravirtualized GPU) due to unsupported Metal residency sets

Test plan

  • Verify macOS CI tests pass with GGML_METAL_NO_RESIDENCY=1 set in workflow
  • Verify no regression on macOS when the env var is not set (normal user machines)

Geramy added 4 commits April 3, 2026 11:42
When lemond is started by launchd (e.g. after .pkg install), it does
not inherit the shell environment. This explicitly forwards the
GGML_METAL_NO_RESIDENCY env var to the llama-server subprocess so
Metal residency sets can be disabled on paravirtualized GPUs like
GitHub Actions macOS runners (MTLGPUFamilyApple5).

Always set GGML_METAL_NO_RESIDENCY=1 when launching llama-server
unless the user has explicitly set the variable themselves. Residency
sets crash on paravirtualized GPUs (e.g. GitHub Actions macOS runners
with MTLGPUFamilyApple5).
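The behavior described in the commit messages above can be sketched as follows. This is a hypothetical Python sketch, not lemond's actual code; the helper names `build_llama_env` and `launch_llama_server` are illustrative:

```python
import os
import subprocess

def build_llama_env(base_env):
    """Build the environment for the llama-server subprocess.

    Hypothetical sketch: default GGML_METAL_NO_RESIDENCY to "1" so Metal
    residency sets are disabled on paravirtualized GPUs (e.g. GitHub
    Actions macOS runners with MTLGPUFamilyApple5), but respect an
    explicit user setting if one is present.
    """
    env = dict(base_env)
    # setdefault only writes the key if the user has not set it already.
    env.setdefault("GGML_METAL_NO_RESIDENCY", "1")
    return env

def launch_llama_server(args):
    # Pass the environment explicitly: a launchd-started parent does not
    # inherit the user's shell environment, so relying on implicit
    # inheritance would drop the variable before it reaches llama-server.
    return subprocess.Popen(["llama-server", *args],
                            env=build_llama_env(os.environ))
```

Explicitly constructing and passing `env` (rather than relying on implicit inheritance) is what makes the variable survive the launchd-started case.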

Geramy commented Apr 4, 2026

There may not be a fix for this until we get an M-series processor as a runner. See below.

- ggml-org/llama.cpp#16266 is the closest issue: Metal crashes on limited/older hardware after the "make backend async" commit. It was closed with "upgrade your macOS" as the resolution.
  - PR #18738 attempted a page-alignment fix but was closed without being merged. The maintainer (ggerganov) wants to find the root cause rather than add a workaround, but can't reproduce the issue on modern hardware.
  - No one has reported this specific issue with GitHub Actions paravirtualized GPUs (MTLGPUFamilyApple5). These aren't real Apple Silicon GPUs; they're VM-emulated GPUs with very limited capabilities.

The llama.cpp team isn't actively working on this because it only affects old macOS versions and limited virtualized GPUs that real users wouldn't run inference on. The CI runners just happen to have these fake GPUs.
