
Misc. bug: Q4_0 repacking results in double RAM usage #12149

Open
bartowski1182 opened this issue Mar 2, 2025 · 4 comments
@bartowski1182
Contributor

Name and Version

b4792

Operating systems

Linux

Which llama.cpp modules do you know to be affected?

llama-cli

Command line

./llama-cli -m microsoft_Phi-4-mini-instruct-Q4_0.gguf

Problem description & steps to reproduce

When loading the model, it uses 4.3 GB of RAM.

When using Q4_K_S (a similar file size), it uses only 2.7 GB of RAM.

First Bad Commit

No response

Relevant log output

@bartowski1182
Contributor Author

Best guess: there's a missing "free" after the weights have been repacked, so the original weights are accidentally kept in memory.

@slaren
Member

slaren commented Mar 2, 2025

Some parts of the model file will remain mapped. The same thing happens when partially offloading a model; in practice it is unlikely to cause issues, because the OS can reclaim that memory if necessary. Disabling mmap with --no-mmap should avoid this.

@bartowski1182
Contributor Author

Ah okay, yes, I see the RAM usage drop when using that option. No performance concerns when using it, I assume? Thanks for the speedy response!
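For reference, this is the suggested fix applied to the command line from the original report (a usage fragment, not output from an actual run):

```shell
# Disable mmap so the repacked weights are the only copy counted in RAM
./llama-cli -m microsoft_Phi-4-mini-instruct-Q4_0.gguf --no-mmap
```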

@slaren
Member

slaren commented Mar 2, 2025

It may affect model loading time, but it will not affect inference performance.
