Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

ggml : refactor ggml-cpu.c into multiple C++ source files #10180

Open
ggerganov opened this issue Nov 5, 2024 · 15 comments
Open

ggml : refactor ggml-cpu.c into multiple C++ source files #10180

ggerganov opened this issue Nov 5, 2024 · 15 comments
Labels
refactoring Refactoring roadmap Part of a roadmap project

Comments

@ggerganov
Copy link
Member

ggerganov commented Nov 5, 2024

As per recent discussions (e.g. #10144 (review)), we should split the large ggml-cpu.c implementation into smaller modules - similar to how the CUDA backend is organized. We should utilize C++11 C++ to reduce code duplication.

@ggerganov ggerganov added the refactoring Refactoring label Nov 5, 2024
@ggerganov ggerganov moved this to Todo in ggml : roadmap Nov 5, 2024
@max-krasnyansky
Copy link
Collaborator

@ggerganov @slaren
Any thoughts on bumping C++11 to C++20?
I'd love to enable things like std::span (C++20), std::string_view (C++17) and other C++17 features from atomic and thread libraries (shared_mutex, hardware_interference_size, etc) since they make the code a lot cleaner while keeping it efficient.

We could keep public API headers C11/C++11 only and build ggml/llama.cpp internally with C++20 enabled, and maybe update contrib guidelines to explicitly state that C++20 is enabled internally but only widely supported features are allowed.

C++20 support is fairly ubiquitous at this point.
It's often shown as partial but mostly due to the modules and a few tricky features, all the core bits have been supported for a while now.
https://en.cppreference.com/w/cpp/compiler_support#C.2B.2B20_library_features
https://clang.llvm.org/cxx_status.html
https://gcc.gnu.org/projects/cxx-status.html

@ggerganov
Copy link
Member Author

My opinion is that we should bump the standard only when we really need some functionality. For example, we are switching from C to C++ because the higher-level logic in ggml can really benefit from having dynamic containers. We will also try to utilize templates to reduce the code for the ops implementation as a function of the data types, since it is becoming very cumbersome and difficult to maintain large chunks of copy-pasted code. But overall, we should be mindful about the C++ that we use and try to keep things minimal. It's easy for example to slip some extra unnecessary memory allocations by using the wrong containers, or respectively make the code too difficult to understand if templates are overused.

The std::span, std::string_view, etc. don't seem to be something that is really needed in the codebase atm, so I don't think it justifies to bump the standard for mostly QoL improvements. If we do that, the code will likely become less consistent due to being able to do the same thing in different ways. Additionally the compile-time tends to explode if we are not careful with the C++ features and headers that we use.

@slaren
Copy link
Member

slaren commented Nov 5, 2024

I think there are good reasons to bump the C++ standard, there are very useful features in C++14 and C+++17 that would help make the code simpler. We already have problems due to inconsistency in the coding style and features used in different parts of the code (mostly in llama.cpp), and I don't expect that bumping the standard would make it significantly worse. Ultimately, this is a matter of enforcing a code style by writing a coding standard and just not merging code that does not follow it. You can already write very hard to read template code in just C++98, there is no need to bump the standard for that. Probably more so, since the lack the newer features forces the use of things like SFINAE.

However, I would expect that C++20 would break the build for a lot of people. Support is good in recent compiler versions, but many people are using very old environments, and telling them to just update their compiler is not always a option. We have workarounds to support very old versions of gcc because of this. Some people are using llama.cpp in very old systems and we should be careful about not breaking it for them. C++14 or 17 should be safer, though.

@max-krasnyansky
Copy link
Collaborator

Sounds good. Looks like Diego and I are mostly on the same page already (C++17 as the default works for me). Once we start refactoring the CPU backend I'll try to share some specific examples that benefit from latest features to revisit this and convince Georgi.

@ggerganov
Copy link
Member Author

@max-krasnyansky I'm OK to bump to C++17 - @slaren's opinion is good enough for me. I only hope we will be vigilant and utilize just the relevant features for our needs.

@sirus20x6
Copy link

c++11 is ancient. It's always hard dealing with older versions of c++

@Djip007
Copy link
Contributor

Djip007 commented Nov 8, 2024

Just to bring water to the mill 😉

c++11 is ancient. It's always hard dealing with older versions of c++

For me the main "problem" with c++11 is that it was the first release o many new interesting feature. As any first step it was not "perfect". The c++14 was a minor release that focus on solving some of the "defect". So for me c++14 is the minimal release to use.

Now for c++17/20.. If I look at gcc (we may have to do the same with clang/VisualC++/...): https://gcc.gnu.org/projects/cxx-status.html

C++20 Support in GCC:
[...]
Important: Because the ISO C++20 standard is very recent, GCC's support is experimental.

If I like to use some of there new feature, it may be not completely stable and need latest compiler release. If you use latest Fedora or next Rhel10 and have to create something new I'll really think of using it. But for this project with multi target code/os...

For C++17:

C++17 mode is the default since GCC 11.

And all Language Features is implemented for gcc8+. The oldest supported linux OS is RHEL8 that have GCC8.4 as default compiler (and devtoolset much more resent.)
With FP8 I had use std++17 and look all check pass on all target. And there is nice elements (like "if constexpr" ...) that make some code more simple/readable.

So If i had to vote I will vote for c++17.

@max-krasnyansky
Copy link
Collaborator

Yep. I think we're going with C++17 for now.
Though as I mentioned earlier that warning about C++20 being new is primarily due to the modules which are not yet supported. I was mainly interested in std::span from C++20 which has been fully supported for a few years now.

@Djip007
Copy link
Contributor

Djip007 commented Nov 8, 2024

I was mainly interested in std::span from C++20 which has been fully supported for a few years now.

some elements of std::span need c++23 and 26 😉

may be more mdspan / mdarray (with will be available on c++23/26 at the end) that have c++17/20 "backport"
https://github.com/kokkos/mdspan

@Djip007
Copy link
Contributor

Djip007 commented Nov 8, 2024

I'll have a look (for the FP8 support ) at the change for cpu backend (ggml-cpu)

I like to make that support more friendly with this change (in future MR?).
So it look we split the "convertion" part (quantize_row_zzz / dequantize_row_zzz) and the "compute" part (ggml_vec_dot_zzz)

For now, these elements are still grouped together in the same file (ggml_quant). If I'm correct for this "issues" we may have to splite it, and move the
ggml_vec_dot_zz in the backend?

For now, these elements are still grouped together in the same file (ggml_quant). If I'm not wrong about this "issue", we might have to split it,
and move the ggml_vec_dot_zz into the backend?

Do you have any idea what structure you want?
ie: for exemple:

  • create a ggml-cpu directory
  • add vec_dot.h/.cpp in it
  • move the ggml_vec_dot_ inside

Next:
for the C++ refactor, do you think we can use the namespace for "clean" naming:
For example, use the ggml:cpu namespace for all the internal functions of this backend.
This will lead to a name change for example: ggml_vec_dot_zz => ggml::vec ::dot_...

I'll may try to split my fp8 support.

last: do you think we will/can use c++ for ggml-cpu.c (ie ggml-cpu.cpp) ?

@slaren
Copy link
Member

slaren commented Nov 8, 2024

For now, these elements are still grouped together in the same file (ggml_quant). If I'm correct for this "issues" we may have to splite it, and move the ggml_vec_dot_zz in the backend?

This is something that I am working on at the moment. The goal is to separate the CPU backend completely from the core ggml functionality, and there is still quite a bit of work left to do to achieve this.

or the C++ refactor, do you think we can use the namespace for "clean" naming

I wouldn't be opposed to that, but I don't think it's critical either.

last: do you think we will/can use c++ for ggml-cpu.c (ie ggml-cpu.cpp) ?

Yes, we only need to make the public interface compatible with C, the rest can be C++.

@github-actions github-actions bot added the stale label Dec 10, 2024
@ggerganov ggerganov removed the stale label Dec 14, 2024
@github-actions github-actions bot added the stale label Jan 14, 2025
@ggerganov ggerganov removed the stale label Jan 23, 2025
@ggerganov ggerganov added the roadmap Part of a roadmap project label Feb 4, 2025
@cmdr2
Copy link
Contributor

cmdr2 commented Feb 25, 2025

@ggerganov Continuing the discussion from ggml-org/ggml#1121 (comment) (regarding code de-duplication).

Unless someone's already working on this, could I take up the following two steps?

  1. Submit a PR that only renames moves ggml-cpu.c to ggml-cpu.cpp. no other changes. Edit: Moves the code to ggml-cpu.cpp, my apologies.
  2. A second PR that templatizes a few functions, e.g. the binbcast ones that were added recently in ggml in my PR.

PR 1 - The first step is purely so that future diffs are easier to read/review. Otherwise combining the file rename step along with a massive refactor would be confusing to read.

PR 2 - If the template style used for a few functions looks fine, then a lot more functions can be templatized progressively. I think a lot of functions are low-hanging fruits wrt function templates.

I'm happy to take these two up, unless someone's already working on this.

@slaren
Copy link
Member

slaren commented Feb 25, 2025

  1. Submit a PR that only renames ggml-cpu.c to ggml-cpu.cpp. no other changes.

Sounds good, but be aware that likely a lot of code will need to be updated to work with the more strict C++ type system. A lot of the code in this file platform-specific, and checking that nothing is broken will take some work.

@cmdr2
Copy link
Contributor

cmdr2 commented Feb 25, 2025

@slaren Alternately, I could move a bunch of the operator functions to the existing ggml-cpu.cpp file, and templatize them. Not all at once, but progressively, while leaving the rest of ggml-cpu.c as it is.

This way, I can avoid the problems with a stricter type system, while still helping with reducing the size of the C file.

PS: Sorry, not sure how I missed that there's already a ggml-cpu.cpp file (while posting my earlier message)! :)

@slaren
Copy link
Member

slaren commented Feb 25, 2025

Yes, that would also work.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
refactoring Refactoring roadmap Part of a roadmap project
Projects
None yet
Development

No branches or pull requests

6 participants