
Tests, examples fail to build with GGML_BACKEND_DL=ON #1120

Open
ckastner opened this issue Feb 23, 2025 · 8 comments

@ckastner
Contributor

When building with GGML_BACKEND_DL=ON, the build fails with linker errors. For example, from test-opt.cpp:843:

$ cmake -B build -DGGML_NATIVE=OFF -DGGML_BACKEND_DL=ON
...

$ cmake --build build
...
[ 25%] Building CXX object tests/CMakeFiles/test-opt.dir/test-opt.cpp.o
[ 26%] Linking CXX executable ../bin/test-opt
/usr/bin/ld: CMakeFiles/test-opt.dir/test-opt.cpp.o: in function `main':
test-opt.cpp:(.text.startup+0xbf): undefined reference to `ggml_backend_is_cpu'
/usr/bin/ld: test-opt.cpp:(.text.startup+0xd4): undefined reference to `ggml_backend_cpu_set_n_threads'
collect2: error: ld returned 1 exit status
gmake[2]: *** [tests/CMakeFiles/test-opt.dir/build.make:102: bin/test-opt] Error 1
gmake[1]: *** [CMakeFiles/Makefile2:633: tests/CMakeFiles/test-opt.dir/all] Error 2
gmake: *** [Makefile:146: all] Error 2

I was looking into a PR to fix this, but I'm not sure how best to go about it, or whether this even needs fixing, as it may be working as intended.

Using the above example: when building with GGML_BACKEND_DL, how would upstream recommend using ggml_backend_cpu_set_n_threads correctly? Should these symbols be dlsym'ed by the caller?

My naive expectation was that there would be some shim that takes care of this and errors out when certain functionality is not available, perhaps guarded by a ggml_backend_is_<foo> check; but the latter two seem to be defined in the modules themselves. I'm not that familiar with the codebase yet, though, so perhaps this already exists.

@slaren
Member

slaren commented Feb 23, 2025

Many tests are written assuming that the backends are linked statically and are not compatible with GGML_BACKEND_DL. These tests should be disabled when GGML_BACKEND_DL is enabled. test-opt is not normally built, so I suppose that you modified tests/CMakeLists.txt? That's kind of an important detail.

The way to deal with this and support dynamically loadable backends is to use ggml_backend_reg_get_proc_address to obtain the backend-specific functions, such as:
https://github.com/ggml-org/llama.cpp/blob/af7747c95ae71a5db4184947f837179e82c70b77/tests/test-backend-ops.cpp#L4483-L4487
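For illustration, a minimal sketch of that pattern, modeled on the linked snippet: the entry points are from ggml-backend.h, while the wrapper name set_backend_n_threads is hypothetical.

#include "ggml-backend.h"

static void set_backend_n_threads(ggml_backend_t backend, int n_threads) {
    ggml_backend_dev_t dev = ggml_backend_get_device(backend);
    ggml_backend_reg_t reg = dev ? ggml_backend_dev_backend_reg(dev) : nullptr;
    if (!reg) {
        return; // backend has no registry entry
    }
    // look up the backend-specific function by name instead of linking to it
    auto fn = (ggml_backend_set_n_threads_t)
        ggml_backend_reg_get_proc_address(reg, "ggml_backend_set_n_threads");
    if (fn) {
        fn(backend, n_threads); // silently a no-op if the backend lacks it
    }
}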
However, this would require updating a lot of code, and it may not necessarily be better. I don't think it is a real problem if some tests require disabling GGML_BACKEND_DL.

@slaren
Member

slaren commented Feb 23, 2025

Ah, I see the problem: I thought we were in the llama.cpp repository. The issue is that the tests CMakeLists.txt has not been updated here in the same way as in llama.cpp.

We should instead address that by moving all ggml tests and build scripts to the ggml directory, rather than having this weird situation where llama.cpp has a completely different way of building and running the ggml tests.

@ckastner
Contributor Author

Many tests are written assuming that the backends are linked statically and are not compatible with GGML_BACKEND_DL.

The way to deal with this and support dynamically loadable backends is to use ggml_backend_reg_get_proc_address to obtain the backend-specific functions, such as: https://github.com/ggml-org/llama.cpp/blob/af7747c95ae71a5db4184947f837179e82c70b77/tests/test-backend-ops.cpp#L4483-L4487

Thanks for the clarification, and for the example.

However, this would require updating a lot of code, and it may not necessarily be better. I don't think it is a real problem if some tests require disabling GGML_BACKEND_DL.

That's fine. I was asking more from a pure user POV: do users need to treat an installation of ggml built with GGML_BACKEND_DL (substantially) differently in their code, or does ggml take care of encapsulating most or all of the backend specifics? More like a plugin system, so to speak.

The tests themselves are not that important to me, they were just a proxy for how one might typically develop against libggml.

Concrete example: I'm building Debian packages, and I'd like to offer all possible backends, but as individually installable packages (-cpu, -blas, -cuda etc.), so that at installation time, a user may choose whatever fits their system best. On the surface, the DL approach seemed to point to that.

But I'd not go down that path if this is not the typical use case.

@slaren
Member

slaren commented Feb 23, 2025

I'm building Debian packages, and I'd like to offer all possible backends, but as individually installable packages (-cpu, -blas, -cuda etc.), so that at installation time, a user may choose whatever fits their system best. On the surface, the DL approach seemed to point to that.

But I'd not go down that path if this is not the typical use case.

That's great, and that's exactly the way it is intended to be used, so you are absolutely on the right path. However, I feel that we need to polish some details before this can be done.

For one, the only example of using dynamically loadable backends is llama.cpp; we need to update and add examples to the ggml repository. We also need to start versioning ggml so that different applications can use the same ggml package.

We need to deal with issues such as duplicated devices from different backends: e.g. if you install both the CUDA and Vulkan backends, both may expose the same devices, and that needs to be handled.

We may also need to change the way dynamic backends are located. As it is, they are searched for only in the same directory as the executable, and that may not fit well with the way the filesystem is organized in most Linux distributions. I imagine that you would like to install the backends to /lib or similar, but it's not always easy to find what the system lib directory is, so maybe that needs to be changed.
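As a possible interim workaround for the search path, a packaged application can point the loader at a fixed directory itself. A minimal sketch, assuming the ggml_backend_load_all_from_path entry point from ggml-backend.h is available in the build; the distribution-controlled directory shown is hypothetical:

#include "ggml-backend.h"

int main() {
    // hypothetical, distribution-controlled backend directory
    ggml_backend_load_all_from_path("/usr/libexec/x86_64-linux-gnu/ggml");
    // ... devices from the loaded backends are now visible via the registry
    return 0;
}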

So I would say it's pretty much a work in progress at this point. Any input that would help us create better packages for linux would be appreciated, ultimately that's one of the reasons GGML_BACKEND_DL was added.

@ckastner
Contributor Author

That's great and that's exactly the way it is intended to be used, so you are absolutely on the right path. However I feel that we need to polish some details before this can be done [...]

Awesome, you touched on many other open questions that I collected during the packaging. Knowing that these are open challenges or at least topics with potential for change, rather than already-made "not our use case" design decisions, helps a lot downstream.

Some patches we apply locally because they are Debian-specific, but work that we can merge back upstream is strongly preferred.

So I would say it's pretty much a work in progress at this point. Any input that would help us create better packages for linux would be appreciated, ultimately that's one of the reasons GGML_BACKEND_DL was added.

I'm very glad to hear that, and I look forward to sharing feedback upstream in the future.

Just a brief outlook on the Debian side: for now, a (local) design decision will probably be for ggml to be installed in a private libdir, as by policy, the lack of a stable SOVERSION precludes installation in one of the system paths (/usr/lib, /usr/lib/<multiarch triplet>). We thought that would be OK short-term, as the first consumers would be our own llama.cpp and whisper.cpp packages, which we can adjust to work with this. Once we have a better feel for the package and API/ABI stability, we could change that. whisper.cpp already has a stable SONAME, so that helps there.

We'd also just build "fat" packages for now, so e.g. the -cpu and -hip packages are actually full builds that are not co-installable; it's either/or. This is where my question about GGML_BACKEND_DL stems from.

The goal is to get something usable and performant (on par with built-from-source) out as soon as possible, and then to iterate frequently on improvements. Knowing which improvement ideas are on or off the table really helps for future work.

Side note: being open-source focused, we've been doing a lot of work getting HIP packaged, and we have our own CI with currently 17 different AMD GPU architectures, ranging from consumer cards to Instinct. So on that end, we may be able to provide useful feedback upstream. That's also where the tests come in, as we continuously evaluate our built packages on dependency/reverse-dependency changes.

@slaren
Member

slaren commented Feb 26, 2025

If the private libdir also includes the executables, then using GGML_BACKEND_DL together with GGML_CPU_ALL_VARIANTS should result in a package with compatibility for most CPU microarchitectures (only supported on x64 at the moment). If that's not possible, it will need to wait until we figure out a better way to do this. The problem is that the backend loading mechanism needs to enumerate the files in a directory to determine what backends are available, and currently it only checks the current directory and the executable directory.

Note that whisper.cpp does not support GGML_BACKEND_DL at the moment, so currently this would only work with llama.cpp. It would be a very small change to add support however, it only needs to add a call to ggml_backend_load_all.
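For reference, the application-side change amounts to loading the dynamic backends before any devices are used. A minimal sketch, using the backend-registry entry points from ggml-backend.h:

#include <cstdio>
#include "ggml-backend.h"

int main() {
    ggml_backend_load_all(); // discover and register the dynamic backends

    // the registered devices are then visible through the registry
    for (size_t i = 0; i < ggml_backend_dev_count(); i++) {
        ggml_backend_dev_t dev = ggml_backend_dev_get(i);
        printf("device %zu: %s - %s\n", i,
               ggml_backend_dev_name(dev), ggml_backend_dev_description(dev));
    }
    return 0;
}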

@mbaudier
Contributor

If the private libdir also includes the executables, then using GGML_BACKEND_DL together with GGML_CPU_ALL_VARIANTS should result in a package with compatibility for most CPU microarchitectures (only supported on x64 at the moment). If that's not possible, it will need to wait until we figure out a better way to do this. The problem is that the backend loading mechanism needs to enumerate the files in a directory to determine what backends are available, and currently it only checks the current directory and the executable directory.

As I have been discussing separately with @ckastner, this is the approach I have taken for our internal deb packaging since GGML_BACKEND_DL and GGML_CPU_ALL_VARIANTS have been available:

  • libggml and libggml-base are "proper" shared libraries under /usr/lib/*
  • the various backends libggml-cpu-* and libggml-* are under /usr/libexec/*/ggml
  • the llama-* and whisper-* executables are in the same /usr/libexec/*/ggml directory (so that they can find the backends) and then they are symlinked to /usr/bin in order to be in the PATH as expected.
$ ls -1 /usr/lib/x86_64-linux-gnu/{libggml*,libllama*,libwhisper*}
/usr/lib/x86_64-linux-gnu/libggml-base.so
/usr/lib/x86_64-linux-gnu/libggml-base.so.0
/usr/lib/x86_64-linux-gnu/libggml-base.so.0.0.1722
/usr/lib/x86_64-linux-gnu/libggml.so
/usr/lib/x86_64-linux-gnu/libggml.so.0
/usr/lib/x86_64-linux-gnu/libggml.so.0.0.1722
/usr/lib/x86_64-linux-gnu/libllama.so
/usr/lib/x86_64-linux-gnu/libllama.so.0
/usr/lib/x86_64-linux-gnu/libllama.so.0.0.4719
/usr/lib/x86_64-linux-gnu/libwhisper.so.1
/usr/lib/x86_64-linux-gnu/libwhisper.so.1.7.4
$ ls -1 /usr/libexec/x86_64-linux-gnu/ggml/*
/usr/libexec/x86_64-linux-gnu/ggml/libggml-cpu-alderlake.so
/usr/libexec/x86_64-linux-gnu/ggml/libggml-cpu-haswell.so
/usr/libexec/x86_64-linux-gnu/ggml/libggml-cpu-icelake.so
/usr/libexec/x86_64-linux-gnu/ggml/libggml-cpu-sandybridge.so
/usr/libexec/x86_64-linux-gnu/ggml/libggml-cpu-sapphirerapids.so
/usr/libexec/x86_64-linux-gnu/ggml/libggml-cpu-skylakex.so
/usr/libexec/x86_64-linux-gnu/ggml/libggml-vulkan.so
/usr/libexec/x86_64-linux-gnu/ggml/llama-bench
/usr/libexec/x86_64-linux-gnu/ggml/llama-cli
/usr/libexec/x86_64-linux-gnu/ggml/llama-quantize
/usr/libexec/x86_64-linux-gnu/ggml/whisper-talk-llama

(For i386 and arm64, only the libggml-cpu variant is built.)

This could be improved once the mechanisms for locating the backends are refined. But on the other hand, it feels like a good fit with the ggml architecture to expose only libggml and libggml-base as shared libraries at the system level (and for building against), while the backends are flexibly gathered "out of sight" in libexec, depending on a given set of constraints and requirements.

But, as I understand from @ckastner, such an approach causes problems at this stage with other architectures/settings, and portability is crucial for Debian as well as for the ggml-related projects (especially given how fast hardware is moving in this field).

Note that whisper.cpp does not support GGML_BACKEND_DL at the moment, so currently this would only work with llama.cpp. It would be a very small change to add support however, it only needs to add a call to ggml_backend_load_all.

Yes, that is blocking me from finalizing a clean whisper.cpp packaging based on the above approach.

But I also had issues with parts of whisper.cpp's code requiring the libggml-cpu backend in order to build. That does not work cleanly in terms of packaging if a shared library is not properly exposed and packaged the way libggml and libggml-base are above.

Is it something I could raise in the whisper.cpp issues? Ideally, whisper.cpp code would not need to link with libggml-cpu(-*).

@slaren
Member

slaren commented Feb 28, 2025

Thanks for the detailed explanation, that's very helpful.

But, as I understand from @ckastner, such an approach causes problems at this stage with other architectures/settings, and portability is crucial for Debian as well as for the ggml-related projects (especially given how fast hardware is moving in this field).

Can you give me more details about the issues that you are seeing on other systems? GGML_CPU_ALL_VARIANTS is only supported on x64 at the moment, but GGML_BACKEND_DL should still work on other architectures.

It may also make sense to add a symbolic link at /usr/lib/x86_64-linux-gnu/libggml-cpu.so pointing to the lowest supported CPU backend variant. This should be libggml-cpu-sandybridge.so, but we could also add a variant built for just the baseline x86-64 Linux ABI that would work on every x64 system. This would allow applications that link directly to the CPU backend without GGML_BACKEND_DL to work using this variant of the CPU backend (albeit very slowly).

Yes, that is blocking me for finalizing a clean whisper.cpp packaging based on the above approach.

But I also had issues with parts of whisper.cpp's code requiring a libggml-cpu backend in order to build. It does not work cleanly in terms of packaging if a shared library is not properly exposed and packaged like libggml and libggml-base are above.

Is it something I could raise in the whisper.cpp issues? Ideally, whisper.cpp code would not need to link with libggml-cpu(-*).

whisper.cpp should support GGML_BACKEND_DL now (added in ggerganov/whisper.cpp#2843).
