[BUG]: Get GGML_ASSERT when running KernelMemorySaveAndLoad.cs #1151
LLamaSharp is currently using this version of llama.cpp (three weeks old). Hopefully this will be resolved once we upgrade to a newer version (there's no set timeline, but it's usually around once a month).
Try setting the Embeddings parameter in ModelParams to true.
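For example (a minimal sketch; the model path is a placeholder):

```csharp
using LLama.Common;

var @params = new ModelParams(@"path/to/model.gguf") // placeholder path
{
    Embeddings = true // request an embeddings-capable context
};
```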
I get the same error when I create a LLamaSharpTextEmbeddingGenerator object by providing both the config and the model weights (which I have already loaded previously). If I use the other ctor (providing only the config object), the object is created as expected; this of course means the model weights are loaded twice, once for the LlamaSharpTextGenerator and again for the LLamaSharpTextEmbeddingGenerator. The two call paths are sketched below:
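A sketch of both paths, assuming the package's public surface at the time (the model path is a placeholder):

```csharp
using LLama;
using LLama.Common;
using LLamaSharp.KernelMemory;

var config = new LLamaSharpConfig(@"path/to/model.gguf"); // placeholder path

// Config-only ctor: works, but loads the model weights a second time.
var embedderA = new LLamaSharpTextEmbeddingGenerator(config);

// Config + preloaded weights: this call path hits the GGML_ASSERT.
using var weights = LLamaWeights.LoadFromFile(new ModelParams(config.ModelPath));
var embedderB = new LLamaSharpTextEmbeddingGenerator(config, weights);
```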
The example KernelMemorySaveAndLoad.cs uses the WithLLamaSharpDefaults() extension method, which first loads the model weights and then creates the text & embedding generators from the preloaded weights (this generates the ggml error as well). Commit #1036 (c27cfde) contains a fix in LLama.KernelMemory/LLamaSharpTextEmbeddingGenerator.cs, in the LLamaSharpTextEmbeddingGenerator(LLamaSharpConfig config) constructor, where it removes two lines from the ModelParams object (sketched below).
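Judging from the discussion further down, the two removed lines were presumably the GPU-split settings. A sketch of the initializer inside that constructor before the fix (field names and defaults are assumptions; the exact diff is in c27cfde):

```csharp
var @params = new ModelParams(config.ModelPath)
{
    ContextSize = config.ContextSize ?? 2048,
    GpuLayerCount = config.GpuLayerCount ?? 20,
    MainGpu = config.MainGpu,     // presumably one of the two removed lines
    SplitMode = config.SplitMode, // presumably the other removed line
    Embeddings = true
};
```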
Unfortunately, these two properties are left intact in the other LLamaSharpTextEmbeddingGenerator constructor:
public LLamaSharpTextEmbeddingGenerator(LLamaSharpConfig config, LLamaWeights weights)
Would you be interested in creating a PR to apply the same fixes to the other method?
Yes, I can do it, but it won't be fast (even though the fix is small). I haven't built the LlamaSharp library on my machine, so 'surprises' should be expected... :)
In the meantime I have found the problem. The distributed DLLs are wrong; if you replace them, the error disappears. I am wondering whether we have a security problem here, because the size of the DLLs is double what is normal... I do not recommend using the distributed DLLs.
I think the CUDA DLLs are enormous because we don't specify any architectures, so they're built to include all of them. I haven't had time to investigate whether we can do anything about that, though.
Security of the DLLs we build and distribute is something I addressed (to the best of my ability) a while ago by moving it all to GitHub. We will never distribute any binaries that don't come from a GitHub build action! That way you can check the file hashes to ensure the build came from GitHub, and you can go back and read the build script (at that exact point in time) to ensure it's not doing anything shady. For example, here is the entry for the latest/current set of DLLs: https://github.com/SciSharp/LLamaSharpBinaries/releases/tag/d79d8f39b4da6. That build process includes the CUDA binaries.
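Checking a local binary against the published hash is a one-liner (a sketch; the DLL path is illustrative):

```csharp
using System;
using System.IO;
using System.Security.Cryptography;

// Illustrative path; compare the printed hash against the one listed on the
// GitHub release page linked above.
var path = "runtimes/win-x64/native/cuda12/llama.dll";
using var stream = File.OpenRead(path);
Console.WriteLine($"{path}: {Convert.ToHexString(SHA256.HashData(stream))}");
```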
If you do not specify a CUDA architecture, then only one default architecture is used (the devkit has one default) instead of two (which is the usual number for us), so the DLLs should actually be even smaller. The only explanation for a much larger size would be the build script adding maybe four architectures (this could explain the doubling). You will need to check this; there also seems to be something wrong with the CUDA 12 DLL, because if I compile it myself and replace it, the error mentioned above does not appear.

I have also noted a few design problems with the embeddings. KernelMemory uses the built-in embedder, but that was downgraded to non-batched generation with no normalization, so it will not work well with KernelMemory. We will need to adjust the code to work much as it did before the modifications, while taking your update into account, since that also had its reasons... I will try to work on the examples and on this in the next few days.
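For reference, a minimal sketch of the kind of normalization that's missing (not the library's code):

```csharp
using System;
using System.Linq;

// L2-normalise an embedding vector so cosine-similarity comparisons behave.
static float[] Normalize(float[] v)
{
    var norm = MathF.Sqrt(v.Sum(x => x * x));
    return norm == 0f ? v : v.Select(x => x / norm).ToArray();
}

Console.WriteLine(string.Join(", ", Normalize(new[] { 3f, 4f }))); // 0.6, 0.8
```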
I have added some critical updates to the embedding generation for KernelMemory with this PR: #1170 |
I get the error with the CPU backend as well, though. I will try removing the SplitMode & MainGpu params from the config and see what happens (I'll let you know).
I think I have a working version of the LLamaSharpTextEmbeddingGenerator(config, weights) constructor. The issue wasn't due to the MainGpu & SplitMode parameters, but to Embeddings being statically set to true. With the Embeddings property set to false, it works for both the CUDA & CPU backends (at least in my environment!).
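A minimal sketch of the working shape (field names like _weights/_embedder are assumed; this mirrors what works in my environment rather than a definitive patch):

```csharp
using LLama;
using LLama.Common;
using LLamaSharp.KernelMemory;

// _weights and _embedder stand in for the class's fields (names assumed).
public LLamaSharpTextEmbeddingGenerator(LLamaSharpConfig config, LLamaWeights weights)
{
    var @params = new ModelParams(config.ModelPath)
    {
        ContextSize = config.ContextSize ?? 2048,
        GpuLayerCount = config.GpuLayerCount ?? 20
        // Embeddings is intentionally left at its default (false); statically
        // forcing it to true is what triggered the GGML_ASSERT on both backends.
    };

    _weights = weights;
    _embedder = new LLamaEmbedder(_weights, @params);
}
```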
Description
Run the embedding example KernelMemorySaveAndLoad.cs; after the weights and context are generated, I get the error: LLamaSharp/LLamaSharp/ggml/src/ggml.c:2703: GGML_ASSERT(ggml_can_mul_mat(a,b)) failed
But if I run a chat session with the model, it works perfectly, so it seems something is wrong with the embedding part of KernelMemory, maybe WithLLamaSharpTextEmbeddingGeneration. The setup the example uses is sketched below.
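Roughly this (a sketch, not the exact example code; the model path is a placeholder):

```csharp
using LLamaSharp.KernelMemory;
using Microsoft.KernelMemory;

var config = new LLamaSharpConfig(@"path/to/model.gguf"); // placeholder path

// WithLLamaSharpDefaults loads the weights once and registers both the text
// generator and the embedding generator on top of them.
var memory = new KernelMemoryBuilder()
    .WithLLamaSharpDefaults(config)
    .Build();

// The assert fires while importing/embedding, not during plain chat.
await memory.ImportTextAsync("Hello Kernel Memory!");
```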
Reproduction Steps
Here is my code:
Environment & Configuration
Known Workarounds
It seems to be related to llama.cpp: I found a possibly related issue in llama.cpp (#12517), and that bug was just resolved last week in #12545.