fix: prevent apps from crashing when LLMs are loaded #1063

Merged
chmjkb merged 4 commits into main from @chmjkb/mmap-load-mode
Apr 9, 2026

@chmjkb chmjkb commented Apr 8, 2026

Collaborator

Description

Fix crashes when loading LLMs by memory-mapping the model and no longer reporting the model file size as external memory pressure. This PR changes how LLM models are loaded to prevent crashes with large models. Previously, the full model file size was reported via setExternalMemoryPressure(). For large models that amount approaches or exceeds Hermes's 3GB max heap size, which breaks the GC's heap accounting and crashes Hermes.
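
As an illustration of the principle (not the PR's actual code), the sketch below keeps two running totals for a native model object and only ever reports runtime-owned allocations as external memory pressure, never the size of the model file. `ExternalMemoryLedger` and `AllocationKind` are hypothetical names. In a JSI host object the reported value would be passed to `jsi::Object::setExternalMemoryPressure(runtime, bytes)`. Whether the PR reports some smaller amount instead, or nothing at all, is not stated in the description.

```cpp
#include <cstddef>

// Hypothetical bookkeeping for a native model object; this is NOT the
// react-native-executorch or JSI API, just an illustration.
enum class AllocationKind {
  OwnedHeap,   // buffers the native code allocated itself
  FileMapped,  // model weights backed by an mmap'd model file
};

class ExternalMemoryLedger {
 public:
  void record(AllocationKind kind, std::size_t bytes) {
    if (kind == AllocationKind::OwnedHeap) {
      ownedHeapBytes_ += bytes;
    } else {
      fileMappedBytes_ += bytes;
    }
  }

  // The amount one would hand to setExternalMemoryPressure. The model file
  // size is deliberately excluded: per the PR description, reporting
  // multi-GB weights pushes Hermes's accounting toward its max heap size.
  std::size_t bytesToReport() const { return ownedHeapBytes_; }

  // Tracked for diagnostics only; never reported to the JS runtime.
  std::size_t fileMappedBytes() const { return fileMappedBytes_; }

 private:
  std::size_t ownedHeapBytes_ = 0;
  std::size_t fileMappedBytes_ = 0;
};
```

For example, after recording a 4 GiB mapped model and 64 MiB of owned buffers, `bytesToReport()` returns 64 MiB, so the multi-GB weights never enter Hermes's GC accounting.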

We also set the LoadMode to Mmap instead of File, so the ExecuTorch (ET) runtime lazily loads weights into RAM on demand instead of keeping the entire file content in memory, which prevents the OS from killing the app.
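
The difference between the two load modes can be sketched with plain POSIX calls. This is an illustration under stated assumptions, not ExecuTorch's data loader: `loadWholeFile` behaves like File mode by copying every byte into a heap buffer up front, while the hypothetical `MappedFile` behaves like Mmap mode by mapping the file read-only so that pages are read from disk only when first touched. In the runtime itself, the switch is presumably just a matter of passing `Module::LoadMode::Mmap` rather than `Module::LoadMode::File` where the module is created.

```cpp
#include <cstddef>
#include <fstream>
#include <iterator>
#include <stdexcept>
#include <string>
#include <vector>

#include <fcntl.h>     // open
#include <sys/mman.h>  // mmap, munmap
#include <sys/stat.h>  // fstat
#include <unistd.h>    // close

// File-mode analogue (hypothetical helper): copy the whole model into an
// owned heap buffer, so resident memory grows by the full file size at once.
std::vector<char> loadWholeFile(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) throw std::runtime_error("cannot open " + path);
  std::istreambuf_iterator<char> begin(in), end;
  return std::vector<char>(begin, end);
}

// Mmap-mode analogue (hypothetical class): map the file read-only. The
// kernel pages data in lazily on first access, and clean file-backed pages
// can be dropped again under memory pressure.
class MappedFile {
 public:
  explicit MappedFile(const std::string& path) {
    int fd = ::open(path.c_str(), O_RDONLY);
    if (fd < 0) throw std::runtime_error("cannot open " + path);
    struct stat st {};
    if (::fstat(fd, &st) != 0) {
      ::close(fd);
      throw std::runtime_error("fstat failed for " + path);
    }
    size_ = static_cast<std::size_t>(st.st_size);
    if (size_ > 0) {
      void* p = ::mmap(nullptr, size_, PROT_READ, MAP_PRIVATE, fd, 0);
      ::close(fd);  // the mapping stays valid after the descriptor is closed
      if (p == MAP_FAILED) throw std::runtime_error("mmap failed for " + path);
      data_ = static_cast<const char*>(p);
    } else {
      ::close(fd);
    }
  }
  ~MappedFile() {
    if (data_ != nullptr) ::munmap(const_cast<char*>(data_), size_);
  }
  MappedFile(const MappedFile&) = delete;
  MappedFile& operator=(const MappedFile&) = delete;

  const char* data() const { return data_; }
  std::size_t size() const { return size_; }

 private:
  const char* data_ = nullptr;
  std::size_t size_ = 0;
};
```

Because the mapped pages are clean and file-backed, the OS can discard them under memory pressure and re-read them later, whereas the heap copy made by `loadWholeFile` is anonymous memory that the OS cannot simply discard.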

Introduces a breaking change?

  • [ ] Yes
  • [x] No

Type of change

  • [x] Bug fix (change which fixes an issue)
  • [ ] New feature (change which adds functionality)
  • [ ] Documentation update (improves or adds clarity to existing documentation)
  • [ ] Other (chores, tests, code style improvements etc.)

Tested on

  • [x] iOS
  • [x] Android

Testing instructions

  • [ ] Take a large model, verify that it crashes the app on main, and note the memory consumption.
  • [ ] Run the same model on this branch, make sure it doesn't crash, and note the memory consumption.
  • [ ] Verify that models which normally fit in your RAM are not slowed down significantly.
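
For the "note the memory consumption" steps, Xcode Instruments and Android Studio's Memory Profiler are the usual tools. As a rough alternative from native code, a hypothetical helper like the one below could log the process's peak resident set size right after the model loads, once on main and once on this branch. It is a sketch, not part of the PR.

```cpp
#include <cstdint>
#include <sys/resource.h>  // getrusage, struct rusage

// Hypothetical helper: peak resident set size of the current process, in bytes.
// ru_maxrss is reported in kilobytes on Linux/Android but in bytes on Apple
// platforms, so normalize it before comparing runs.
std::uint64_t peakResidentBytes() {
  struct rusage usage {};
  if (::getrusage(RUSAGE_SELF, &usage) != 0) return 0;
#if defined(__APPLE__)
  return static_cast<std::uint64_t>(usage.ru_maxrss);
#else
  return static_cast<std::uint64_t>(usage.ru_maxrss) * 1024u;
#endif
}
```

Keep in mind that with Mmap, weight pages that have been touched still count toward RSS; the gain is that those pages are file-backed and reclaimable. The numbers are therefore most meaningful when taken at the same point (for example, right after load) on both branches.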

Screenshots

Related issues

Checklist

  • [x] I have performed a self-review of my code
  • [x] I have commented my code, particularly in hard-to-understand areas
  • [ ] I have updated the documentation accordingly
  • [ ] My changes generate no new warnings

Additional notes

@chmjkb chmjkb marked this pull request as ready for review April 8, 2026 10:02
@chmjkb chmjkb linked an issue Apr 8, 2026 that may be closed by this pull request
@msluszniak msluszniak added the bug fix PRs that are fixing bugs label Apr 8, 2026
Comment thread packages/react-native-executorch/common/rnexecutorch/models/llm/LLM.cpp Outdated
…m/LLM.cpp

Co-authored-by: Mateusz Sluszniak <56299341+msluszniak@users.noreply.github.com>

@NorbertKlockiewicz NorbertKlockiewicz left a comment

Contributor

I wasn't able to crash Private Mind during model load, only during generation, and only on an emulator with 2 GB of RAM. I'm wondering, though, whether we should also load vision encoders in a similar way, since they can also use a lot of RAM and the app can crash when they are loaded together with the text_decoder.

chmjkb commented Apr 8, 2026

Collaborator Author

Yeah, we should. I'll add the changes tomorrow.

chmjkb commented Apr 9, 2026

Collaborator Author

Turns out we already do that, since the vision encoders are part of the same module.

@chmjkb chmjkb merged commit dc5664f into main Apr 9, 2026
4 checks passed
@chmjkb chmjkb deleted the @chmjkb/mmap-load-mode branch April 9, 2026 06:52
msluszniak added a commit that referenced this pull request Apr 10, 2026
## Description

Fix crashes when loading LLMs by using mmap and avoiding reporting model
file size as external memory pressure. This PR changes how LLM models
are loaded to prevent crashes with large models. Previously, reporting
the full model file size via setExternalMemoryPressure() would cause
Hermes to crash because it breaks the GC's heap accounting when external
memory exceeds or approaches the 3GB max heap size.

We also set the LoadMode to Mmap instead of File, so the ExecuTorch (ET)
runtime lazily loads weights into RAM on demand instead of keeping the
entire file content in memory, which prevents the OS from killing the app.

### Introduces a breaking change?

- [ ] Yes
- [x] No

### Type of change

- [x] Bug fix (change which fixes an issue)
- [ ] New feature (change which adds functionality)
- [ ] Documentation update (improves or adds clarity to existing
documentation)
- [ ] Other (chores, tests, code style improvements etc.)

### Tested on

- [x] iOS
- [x] Android

### Testing instructions

- [ ] Take a large model, verify it crashes the app on main and note the
memory consumption
- [ ] Try running the same model on this branch, make sure it doesn't
crash and note the memory consumption
- [ ] Verify models that would usually fit in your RAM are not slowed
down significantly.

### Screenshots

### Related issues

### Checklist

- [x] I have performed a self-review of my code
- [x] I have commented my code, particularly in hard-to-understand areas
- [ ] I have updated the documentation accordingly
- [ ] My changes generate no new warnings

### Additional notes

---------

Co-authored-by: Mateusz Sluszniak <56299341+msluszniak@users.noreply.github.com>
@msluszniak msluszniak mentioned this pull request Apr 10, 2026
msluszniak added a commit that referenced this pull request Apr 10, 2026
## Summary

Patch release v0.8.3 — cherry-picks the following bug fixes from `main`
into `release/0.8`:

- fix: add mutex to VoiceActivityDetection to prevent race between
`generate()` and `unload()` (#1056)
- fix: prevent apps from crashing when LLMs are loaded (#1063)
- fix: add inference mutex to Text Embedding and Text-to-Image (#1060)

## Checklist

- [x] Commits cherry-picked from `main` in chronological order
- [x] Version bumped to `0.8.3` in
`packages/react-native-executorch/package.json`

🤖 Generated with [Claude Code](https://claude.com/claude-code)

---------

Co-authored-by: Radek Czemerys <7029942+radko93@users.noreply.github.com>
Co-authored-by: Bartosz Hanc <bartosz.hanc02@gmail.com>
Co-authored-by: Jakub Chmura <92989966+chmjkb@users.noreply.github.com>

Labels

bug fix PRs that are fixing bugs

Development

Successfully merging this pull request may close these issues.

Prevent OOM-based app crashes

4 participants