fix: prevent apps from crashing when LLMs are loaded #1063
…m/LLM.cpp Co-authored-by: Mateusz Sluszniak <56299341+msluszniak@users.noreply.github.com>
NorbertKlockiewicz
left a comment
I wasn't able to crash Private Mind during model load, only during generation, and only on an emulator with 2 GB of RAM. I'm wondering whether we should also load vision encoders in a similar way, since they can also use a lot of RAM and the app can crash when they are loaded together with the text_decoder.
Yeah, we should. I'll add the changes tomorrow.
Turns out we already do that, as vision encoders are part of the same module.
## Description

Fix crashes when loading LLMs by using mmap and avoiding reporting model file size as external memory pressure.

This PR changes how LLM models are loaded to prevent crashes with large models. Previously, reporting the full model file size via `setExternalMemoryPressure()` would cause Hermes to crash because it breaks the GC's heap accounting when external memory exceeds or approaches the 3GB max heap size.

We also set the `LoadMode` to `Mmap` instead of `File`, causing the ET runtime to lazy-load weights to RAM on-demand instead of storing the entire file content in memory, preventing the OS from killing the app.

### Introduces a breaking change?

- [ ] Yes
- [x] No

### Type of change

- [x] Bug fix (change which fixes an issue)
- [ ] New feature (change which adds functionality)
- [ ] Documentation update (improves or adds clarity to existing documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [x] iOS
- [x] Android

### Testing instructions

- [ ] Take a large model, verify it crashes the app on main and note the memory consumption
- [ ] Try running the same model on this branch, make sure it doesn't crash and note the memory consumption
- [ ] Verify models that would usually fit in your RAM are not slowed down significantly.

### Screenshots

<!-- Add screenshots here, if applicable -->

### Related issues

<!-- Link related issues here using #issue-number -->

### Checklist

- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have updated the documentation accordingly
- [ ] My changes generate no new warnings

### Additional notes

<!-- Include any additional information, assumptions, or context that reviewers might need to understand this PR. -->

---------

Co-authored-by: Mateusz Sluszniak <56299341+msluszniak@users.noreply.github.com>
## Summary

Patch release v0.8.3, which cherry-picks the following bug fixes from `main` into `release/0.8`:

- fix: add mutex to VoiceActivityDetection to prevent race between `generate()` and `unload()` (#1056)
- fix: prevent apps from crashing when LLMs are loaded (#1063)
- fix: add inference mutex to Text Embedding and Text-to-Image (#1060)

## Checklist

- [x] Commits cherry-picked from `main` in chronological order
- [x] Version bumped to `0.8.3` in `packages/react-native-executorch/package.json`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Radek Czemerys <7029942+radko93@users.noreply.github.com>
Co-authored-by: Bartosz Hanc <bartosz.hanc02@gmail.com>
Co-authored-by: Jakub Chmura <92989966+chmjkb@users.noreply.github.com>
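The mutex fixes listed in this release guard against one thread unloading a model while another is still running inference. A minimal sketch of that pattern follows; it assumes nothing about the library's actual classes. `FakeModel` and `GuardedModel` are hypothetical stand-ins, and the real fixes in #1056 and #1060 may be structured differently.

```cpp
#include <memory>
#include <mutex>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for a loaded model; not a type from the library.
struct FakeModel {
  std::string run(const std::string& input) const { return "out:" + input; }
};

// Hypothetical wrapper: generate() and unload() take the same mutex, so
// unload() cannot free the model while generate() is still using it.
class GuardedModel {
 public:
  GuardedModel() : model_(std::make_unique<FakeModel>()) {}

  std::string generate(const std::string& input) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (!model_) throw std::runtime_error("model is unloaded");
    return model_->run(input);
  }

  void unload() {
    std::lock_guard<std::mutex> lock(mutex_);
    model_.reset();
  }

 private:
  std::mutex mutex_;
  std::unique_ptr<FakeModel> model_;
};
```

Holding the lock for the full duration of `generate()` serializes inference with teardown. A call that arrives after `unload()` fails with an exception instead of dereferencing a freed model.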