[BUG]: Get GGML_ASSERT when running KernelMemorySaveAndLoad.cs #1151
LLamaSharp is currently using this version of llama.cpp (three weeks old). Hopefully this will be resolved once we upgrade to a newer version (there's no set timeline, but it's usually around once a month).
Try setting the Embeddings parameter in ModelParams to true.
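For example (a minimal sketch; the model path is a placeholder):

```csharp
using LLama.Common;

var @params = new ModelParams(@"path/to/model.gguf") // placeholder path
{
    Embeddings = true // request an embeddings-capable context
};
```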
I get the same error when I create a LLamaSharpTextEmbeddingGenerator object by providing both the config and the model weights (which I have already loaded previously). If I use the other ctor (providing only the config object), the object is created as expected; this of course means the model weights are loaded twice, once for the LlamaSharpTextGenerator and again for the LLamaSharpTextEmbeddingGenerator. The two call paths are sketched below:
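A sketch of both paths, assuming the package's public surface at the time (the model path is a placeholder):

```csharp
using LLama;
using LLama.Common;
using LLamaSharp.KernelMemory;

var config = new LLamaSharpConfig(@"path/to/model.gguf"); // placeholder path

// Config-only ctor: works, but loads the model weights a second time.
var embedderA = new LLamaSharpTextEmbeddingGenerator(config);

// Config + preloaded weights: this call path hits the GGML_ASSERT.
using var weights = LLamaWeights.LoadFromFile(new ModelParams(config.ModelPath));
var embedderB = new LLamaSharpTextEmbeddingGenerator(config, weights);
```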
The example KernelMemorySaveAndLoad.cs uses the WithLLamaSharpDefaults() extension method, which first loads the model weights and then creates the text & embedding generators from the preloaded weights (this generates the ggml error as well). Commit #1036 (c27cfde) contains a fix in LLama.KernelMemory/LLamaSharpTextEmbeddingGenerator.cs, in the LLamaSharpTextEmbeddingGenerator(LLamaSharpConfig config) constructor, where it removes two lines from the ModelParams object (sketched below).
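Judging from the discussion further down, the two removed lines were presumably the GPU-split settings. A sketch of the initializer inside that constructor before the fix (field names and defaults are assumptions; the exact diff is in c27cfde):

```csharp
var @params = new ModelParams(config.ModelPath)
{
    ContextSize = config.ContextSize ?? 2048,
    GpuLayerCount = config.GpuLayerCount ?? 20,
    MainGpu = config.MainGpu,     // presumably one of the two removed lines
    SplitMode = config.SplitMode, // presumably the other removed line
    Embeddings = true
};
```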
Unfortunately, these two properties are left intact in the other LLamaSharpTextEmbeddingGenerator constructor:
public LLamaSharpTextEmbeddingGenerator(LLamaSharpConfig config, LLamaWeights weights)
Would you be interested in creating a PR to apply the same fixes to the other method?
Yes, I can do it, but it won't be fast (even though the fix is small). I haven't built the LlamaSharp library on my machine, so 'surprises' should be expected... :)
In the meantime I have found the problem. The distributed DLLs are wrong; if you replace them, the error disappears. I am wondering whether we have a security problem here, because the size of the DLLs is double what is normal... I do not recommend using the distributed DLLs.
I think the CUDA DLLs are enormous because we don't specify any architectures, so they're built to include all of them. I haven't had time to investigate whether we can do anything about that, though.
Security of the DLLs we build and distribute is something I addressed (to the best of my ability) a while ago by moving it all to GitHub. We will never distribute any binaries that don't come from a GitHub build action! That way you can check the file hashes to ensure the build came from GitHub, and you can go back and read the build script (at that exact point in time) to ensure it's not doing anything shady. For example, here is the entry for the latest/current set of DLLs: https://github.com/SciSharp/LLamaSharpBinaries/releases/tag/d79d8f39b4da6. That build process includes the CUDA binaries.
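Checking a local binary against the published hash is a one-liner (a sketch; the DLL path is illustrative):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Illustrative path; compare the printed hash against the one listed on the
// GitHub release page linked above.
var path = "runtimes/win-x64/native/cuda12/llama.dll";
using var stream = File.OpenRead(path);
Console.WriteLine($"{path}: {Convert.ToHexString(SHA256.HashData(stream))}");
```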
If you do not specify a CUDA architecture, then only one default architecture is used (the devkit has one default) instead of two (which is the usual number for us), so the DLLs should actually be even smaller. The only explanation for a much larger size would be the build script adding maybe four architectures (this could explain the doubling). You will need to check this; there also seems to be something wrong with the CUDA 12 DLL, because if I compile it myself and replace it, the error mentioned above does not appear.

I have also noted a few design problems with the embeddings. KernelMemory uses the built-in embedder, but that was downgraded to non-batched generation with no normalization, so it will not work well with KernelMemory. We will need to adjust the code to work much as it did before the modifications, while taking your update into account, since that also had its reasons... I will try to work on the examples and on this in the next few days.
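For reference, a minimal sketch of the kind of normalization that's missing (not the library's code):

```csharp
using System;
using System.Linq;

// L2-normalise an embedding vector so cosine-similarity comparisons behave.
static float[] Normalize(float[] v)
{
    var norm = MathF.Sqrt(v.Sum(x => x * x));
    return norm == 0f ? v : v.Select(x => x / norm).ToArray();
}

Console.WriteLine(string.Join(", ", Normalize(new[] { 3f, 4f }))); // 0.6, 0.8
```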
I have added some critical updates to the embedding generation for KernelMemory with this PR: #1170 |
I get the error with the CPU backend as well, though. I will try removing the SplitMode & MainGpu params from the config and see what happens (I'll let you know).
I think I have a working version of the LLamaSharpTextEmbeddingGenerator(config, weights) constructor. The issue wasn't due to the MainGpu & SplitMode parameters, but to Embeddings being statically set to true. With the Embeddings property set to false, it works for both the CUDA & CPU backends (at least in my environment!).
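A minimal sketch of the working shape (field names like _weights/_embedder are assumed; this mirrors what works in my environment rather than a definitive patch):

```csharp
using LLama;
using LLama.Common;
using LLamaSharp.KernelMemory;

// _weights and _embedder stand in for the class's fields (names assumed).
public LLamaSharpTextEmbeddingGenerator(LLamaSharpConfig config, LLamaWeights weights)
{
    var @params = new ModelParams(config.ModelPath)
    {
        ContextSize = config.ContextSize ?? 2048,
        GpuLayerCount = config.GpuLayerCount ?? 20
        // Embeddings is intentionally left at its default (false); statically
        // forcing it to true is what triggered the GGML_ASSERT on both backends.
    };

    _weights = weights;
    _embedder = new LLamaEmbedder(_weights, @params);
}
```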
Description
Run the embedding example KernelMemorySaveAndLoad.cs; after the weights and context are generated, I get the error: LLamaSharp/LLamaSharp/ggml/src/ggml.c:2703: GGML_ASSERT(ggml_can_mul_mat(a,b)) failed
But if I run a chat session with the model, it works perfectly, so it seems something is wrong with the embedding part of KernelMemory, maybe WithLLamaSharpTextEmbeddingGeneration. The setup the example uses is sketched below.
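Roughly this (a sketch, not the exact example code; the model path is a placeholder):

```csharp
using LLamaSharp.KernelMemory;
using Microsoft.KernelMemory;

var config = new LLamaSharpConfig(@"path/to/model.gguf"); // placeholder path

// WithLLamaSharpDefaults loads the weights once and registers both the text
// generator and the embedding generator on top of them.
var memory = new KernelMemoryBuilder()
    .WithLLamaSharpDefaults(config)
    .Build();

// The assert fires while importing/embedding, not during plain chat.
await memory.ImportTextAsync("Hello Kernel Memory!");
```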
Reproduction Steps
Here is my code:
Environment & Configuration
Known Workarounds
It seems to be related to llama.cpp: I found a possibly related issue in llama.cpp (#12517), and that bug was just resolved last week in #12545.