ggml : refactor ggml-cpu.c into multiple C++ source files #10180

ggerganov · 2024-11-05T07:12:48Z

As per recent discussions (e.g. #10144 (review)), we should split the large ggml-cpu.c implementation into smaller modules - similar to how the CUDA backend is organized. We should utilize ~~C++11~~ C++ to reduce code duplication.

The text was updated successfully, but these errors were encountered:

max-krasnyansky · 2024-11-05T18:42:56Z

@ggerganov @slaren
Any thoughts on bumping C++11 to C++20?
I'd love to enable things like std::span (C++20), std::string_view (C++17) and other C++17 features from atomic and thread libraries (shared_mutex, hardware_interference_size, etc) since they make the code a lot cleaner while keeping it efficient.

We could keep public API headers C11/C++11 only and build ggml/llama.cpp internally with C++20 enabled, and maybe update contrib guidelines to explicitly state that C++20 is enabled internally but only widely supported features are allowed.

C++20 support is fairly ubiquitous at this point.
It's often shown as partial but mostly due to the modules and a few tricky features, all the core bits have been supported for a while now.
https://en.cppreference.com/w/cpp/compiler_support#C.2B.2B20_library_features
https://clang.llvm.org/cxx_status.html
https://gcc.gnu.org/projects/cxx-status.html

ggerganov · 2024-11-05T20:03:04Z

My opinion is that we should bump the standard only when we really need some functionality. For example, we are switching from C to C++ because the higher-level logic in ggml can really benefit from having dynamic containers. We will also try to utilize templates to reduce the code for the ops implementation as a function of the data types, since it is becoming very cumbersome and difficult to maintain large chunks of copy-pasted code. But overall, we should be mindful about the C++ that we use and try to keep things minimal. It's easy for example to slip some extra unnecessary memory allocations by using the wrong containers, or respectively make the code too difficult to understand if templates are overused.

The std::span, std::string_view, etc. don't seem to be something that is really needed in the codebase atm, so I don't think it justifies to bump the standard for mostly QoL improvements. If we do that, the code will likely become less consistent due to being able to do the same thing in different ways. Additionally the compile-time tends to explode if we are not careful with the C++ features and headers that we use.

slaren · 2024-11-05T20:55:11Z

I think there are good reasons to bump the C++ standard, there are very useful features in C++14 and C+++17 that would help make the code simpler. We already have problems due to inconsistency in the coding style and features used in different parts of the code (mostly in llama.cpp), and I don't expect that bumping the standard would make it significantly worse. Ultimately, this is a matter of enforcing a code style by writing a coding standard and just not merging code that does not follow it. You can already write very hard to read template code in just C++98, there is no need to bump the standard for that. Probably more so, since the lack the newer features forces the use of things like SFINAE.

However, I would expect that C++20 would break the build for a lot of people. Support is good in recent compiler versions, but many people are using very old environments, and telling them to just update their compiler is not always a option. We have workarounds to support very old versions of gcc because of this. Some people are using llama.cpp in very old systems and we should be careful about not breaking it for them. C++14 or 17 should be safer, though.

max-krasnyansky · 2024-11-06T05:14:23Z

Sounds good. Looks like Diego and I are mostly on the same page already (C++17 as the default works for me). Once we start refactoring the CPU backend I'll try to share some specific examples that benefit from latest features to revisit this and convince Georgi.

ggerganov · 2024-11-06T08:55:57Z

@max-krasnyansky I'm OK to bump to C++17 - @slaren's opinion is good enough for me. I only hope we will be vigilant and utilize just the relevant features for our needs.

sirus20x6 · 2024-11-06T09:15:06Z

c++11 is ancient. It's always hard dealing with older versions of c++

Djip007 · 2024-11-08T20:30:24Z

Just to bring water to the mill 😉

c++11 is ancient. It's always hard dealing with older versions of c++

For me the main "problem" with c++11 is that it was the first release o many new interesting feature. As any first step it was not "perfect". The c++14 was a minor release that focus on solving some of the "defect". So for me c++14 is the minimal release to use.

Now for c++17/20.. If I look at gcc (we may have to do the same with clang/VisualC++/...): https://gcc.gnu.org/projects/cxx-status.html

C++20 Support in GCC:
[...]
Important: Because the ISO C++20 standard is very recent, GCC's support is experimental.

If I like to use some of there new feature, it may be not completely stable and need latest compiler release. If you use latest Fedora or next Rhel10 and have to create something new I'll really think of using it. But for this project with multi target code/os...

For C++17:

C++17 mode is the default since GCC 11.

And all Language Features is implemented for gcc8+. The oldest supported linux OS is RHEL8 that have GCC8.4 as default compiler (and devtoolset much more resent.)
With FP8 I had use std++17 and look all check pass on all target. And there is nice elements (like "if constexpr" ...) that make some code more simple/readable.

So If i had to vote I will vote for c++17.

max-krasnyansky · 2024-11-08T20:46:20Z

Yep. I think we're going with C++17 for now.
Though as I mentioned earlier that warning about C++20 being new is primarily due to the modules which are not yet supported. I was mainly interested in std::span from C++20 which has been fully supported for a few years now.

Djip007 · 2024-11-08T21:39:38Z

I was mainly interested in std::span from C++20 which has been fully supported for a few years now.

some elements of std::span need c++23 and 26 😉

may be more mdspan / mdarray (with will be available on c++23/26 at the end) that have c++17/20 "backport"
https://github.com/kokkos/mdspan

Djip007 · 2024-11-08T21:40:51Z

I'll have a look (for the FP8 support ) at the change for cpu backend (ggml-cpu)

I like to make that support more friendly with this change (in future MR?).
So it look we split the "convertion" part (quantize_row_zzz / dequantize_row_zzz) and the "compute" part (ggml_vec_dot_zzz)

For now, these elements are still grouped together in the same file (ggml_quant). If I'm correct for this "issues" we may have to splite it, and move the
ggml_vec_dot_zz in the backend?

For now, these elements are still grouped together in the same file (ggml_quant). If I'm not wrong about this "issue", we might have to split it,
and move the ggml_vec_dot_zz into the backend?

Do you have any idea what structure you want?
ie: for exemple:

create a ggml-cpu directory
add vec_dot.h/.cpp in it
move the ggml_vec_dot_ inside

Next:
for the C++ refactor, do you think we can use the namespace for "clean" naming:
For example, use the ggml:cpu namespace for all the internal functions of this backend.
This will lead to a name change for example: ggml_vec_dot_zz => ggml::vec ::dot_...

I'll may try to split my fp8 support.

last: do you think we will/can use c++ for ggml-cpu.c (ie ggml-cpu.cpp) ?

slaren · 2024-11-08T22:46:30Z

For now, these elements are still grouped together in the same file (ggml_quant). If I'm correct for this "issues" we may have to splite it, and move the ggml_vec_dot_zz in the backend?

This is something that I am working on at the moment. The goal is to separate the CPU backend completely from the core ggml functionality, and there is still quite a bit of work left to do to achieve this.

or the C++ refactor, do you think we can use the namespace for "clean" naming

I wouldn't be opposed to that, but I don't think it's critical either.

last: do you think we will/can use c++ for ggml-cpu.c (ie ggml-cpu.cpp) ?

Yes, we only need to make the public interface compatible with C, the rest can be C++.

cmdr2 · 2025-02-25T14:42:26Z

@ggerganov Continuing the discussion from ggml-org/ggml#1121 (comment) (regarding code de-duplication).

Unless someone's already working on this, could I take up the following two steps?

Submit a PR that only ~~renames~~ moves ggml-cpu.c to ggml-cpu.cpp. no other changes. Edit: Moves the code to ggml-cpu.cpp, my apologies.
A second PR that templatizes a few functions, e.g. the binbcast ones that were added recently in ggml in my PR.

PR 1 - The first step is purely so that future diffs are easier to read/review. Otherwise combining the file rename step along with a massive refactor would be confusing to read.

PR 2 - If the template style used for a few functions looks fine, then a lot more functions can be templatized progressively. I think a lot of functions are low-hanging fruits wrt function templates.

I'm happy to take these two up, unless someone's already working on this.

slaren · 2025-02-25T14:52:29Z

Submit a PR that only renames ggml-cpu.c to ggml-cpu.cpp. no other changes.

Sounds good, but be aware that likely a lot of code will need to be updated to work with the more strict C++ type system. A lot of the code in this file platform-specific, and checking that nothing is broken will take some work.

cmdr2 · 2025-02-25T16:02:40Z

@slaren Alternately, I could move a bunch of the operator functions to the existing ggml-cpu.cpp file, and templatize them. Not all at once, but progressively, while leaving the rest of ggml-cpu.c as it is.

This way, I can avoid the problems with a stricter type system, while still helping with reducing the size of the C file.

PS: Sorry, not sure how I missed that there's already a ggml-cpu.cpp file (while posting my earlier message)! :)

slaren · 2025-02-25T16:16:59Z

Yes, that would also work.

ggerganov added the refactoring Refactoring label Nov 5, 2024

ggerganov added this to ggml : roadmap Nov 5, 2024

ggerganov moved this to Todo in ggml : roadmap Nov 5, 2024

slaren mentioned this issue Nov 20, 2024

llama : add .clang-format file #10415

Merged

github-actions bot added the stale label Dec 10, 2024

ggerganov removed the stale label Dec 14, 2024

github-actions bot added the stale label Jan 14, 2025

ggerganov removed the stale label Jan 23, 2025

ggerganov added the roadmap Part of a roadmap project label Feb 4, 2025

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

ggml : refactor ggml-cpu.c into multiple C++ source files #10180

ggml : refactor ggml-cpu.c into multiple C++ source files #10180

ggerganov commented Nov 5, 2024 •

edited

Loading

max-krasnyansky commented Nov 5, 2024

ggerganov commented Nov 5, 2024

slaren commented Nov 5, 2024 •

edited

Loading

max-krasnyansky commented Nov 6, 2024

ggerganov commented Nov 6, 2024

sirus20x6 commented Nov 6, 2024

Djip007 commented Nov 8, 2024

max-krasnyansky commented Nov 8, 2024

Djip007 commented Nov 8, 2024 •

edited

Loading

Djip007 commented Nov 8, 2024

slaren commented Nov 8, 2024

cmdr2 commented Feb 25, 2025 •

edited

Loading

slaren commented Feb 25, 2025

cmdr2 commented Feb 25, 2025 •

edited

Loading

slaren commented Feb 25, 2025

ggml : refactor ggml-cpu.c into multiple C++ source files #10180

ggml : refactor ggml-cpu.c into multiple C++ source files #10180

Comments

ggerganov commented Nov 5, 2024 • edited Loading

max-krasnyansky commented Nov 5, 2024

ggerganov commented Nov 5, 2024

slaren commented Nov 5, 2024 • edited Loading

max-krasnyansky commented Nov 6, 2024

ggerganov commented Nov 6, 2024

sirus20x6 commented Nov 6, 2024

Djip007 commented Nov 8, 2024

max-krasnyansky commented Nov 8, 2024

Djip007 commented Nov 8, 2024 • edited Loading

Djip007 commented Nov 8, 2024

slaren commented Nov 8, 2024

cmdr2 commented Feb 25, 2025 • edited Loading

slaren commented Feb 25, 2025

cmdr2 commented Feb 25, 2025 • edited Loading

slaren commented Feb 25, 2025

ggerganov commented Nov 5, 2024 •

edited

Loading

slaren commented Nov 5, 2024 •

edited

Loading

Djip007 commented Nov 8, 2024 •

edited

Loading

cmdr2 commented Feb 25, 2025 •

edited

Loading

cmdr2 commented Feb 25, 2025 •

edited

Loading