-
Notifications
You must be signed in to change notification settings - Fork 11k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
ggml : refactor ggml-cpu.c into multiple C++ source files #10180
Comments
@ggerganov @slaren We could keep public API headers C11/C++11 only and build ggml/llama.cpp internally with C++20 enabled, and maybe update contrib guidelines to explicitly state that C++20 is enabled internally but only widely supported features are allowed. C++20 support is fairly ubiquitous at this point. |
My opinion is that we should bump the standard only when we really need some functionality. For example, we are switching from C to C++ because the higher-level logic in The |
I think there are good reasons to bump the C++ standard, there are very useful features in C++14 and C+++17 that would help make the code simpler. We already have problems due to inconsistency in the coding style and features used in different parts of the code (mostly in llama.cpp), and I don't expect that bumping the standard would make it significantly worse. Ultimately, this is a matter of enforcing a code style by writing a coding standard and just not merging code that does not follow it. You can already write very hard to read template code in just C++98, there is no need to bump the standard for that. Probably more so, since the lack the newer features forces the use of things like SFINAE. However, I would expect that C++20 would break the build for a lot of people. Support is good in recent compiler versions, but many people are using very old environments, and telling them to just update their compiler is not always a option. We have workarounds to support very old versions of gcc because of this. Some people are using llama.cpp in very old systems and we should be careful about not breaking it for them. C++14 or 17 should be safer, though. |
Sounds good. Looks like Diego and I are mostly on the same page already (C++17 as the default works for me). Once we start refactoring the CPU backend I'll try to share some specific examples that benefit from latest features to revisit this and convince Georgi. |
@max-krasnyansky I'm OK to bump to C++17 - @slaren's opinion is good enough for me. I only hope we will be vigilant and utilize just the relevant features for our needs. |
c++11 is ancient. It's always hard dealing with older versions of c++ |
Just to bring water to the mill 😉
For me the main "problem" with c++11 is that it was the first release o many new interesting feature. As any first step it was not "perfect". The c++14 was a minor release that focus on solving some of the "defect". So for me c++14 is the minimal release to use. Now for c++17/20.. If I look at gcc (we may have to do the same with clang/VisualC++/...): https://gcc.gnu.org/projects/cxx-status.html
If I like to use some of there new feature, it may be not completely stable and need latest compiler release. If you use latest Fedora or next Rhel10 and have to create something new I'll really think of using it. But for this project with multi target code/os... For C++17:
And all Language Features is implemented for gcc8+. The oldest supported linux OS is RHEL8 that have GCC8.4 as default compiler (and devtoolset much more resent.) So If i had to vote I will vote for c++17. |
Yep. I think we're going with C++17 for now. |
some elements of may be more mdspan / mdarray (with will be available on c++23/26 at the end) that have c++17/20 "backport" |
I'll have a look (for the FP8 support ) at the change for cpu backend (ggml-cpu) I like to make that support more friendly with this change (in future MR?). For now, these elements are still grouped together in the same file (ggml_quant). If I'm correct for this "issues" we may have to splite it, and move the For now, these elements are still grouped together in the same file (ggml_quant). If I'm not wrong about this "issue", we might have to split it, Do you have any idea what structure you want?
Next: I'll may try to split my fp8 support. last: do you think we will/can use c++ for ggml-cpu.c (ie ggml-cpu.cpp) ? |
This is something that I am working on at the moment. The goal is to separate the CPU backend completely from the core ggml functionality, and there is still quite a bit of work left to do to achieve this.
I wouldn't be opposed to that, but I don't think it's critical either.
Yes, we only need to make the public interface compatible with C, the rest can be C++. |
@ggerganov Continuing the discussion from ggml-org/ggml#1121 (comment) (regarding code de-duplication). Unless someone's already working on this, could I take up the following two steps?
PR 1 - The first step is purely so that future diffs are easier to read/review. Otherwise combining the file rename step along with a massive refactor would be confusing to read. PR 2 - If the template style used for a few functions looks fine, then a lot more functions can be templatized progressively. I think a lot of functions are low-hanging fruits wrt function templates. I'm happy to take these two up, unless someone's already working on this. |
Sounds good, but be aware that likely a lot of code will need to be updated to work with the more strict C++ type system. A lot of the code in this file platform-specific, and checking that nothing is broken will take some work. |
@slaren Alternately, I could move a bunch of the operator functions to the existing This way, I can avoid the problems with a stricter type system, while still helping with reducing the size of the C file. PS: Sorry, not sure how I missed that there's already a |
Yes, that would also work. |
As per recent discussions (e.g. #10144 (review)), we should split the large
ggml-cpu.c
implementation into smaller modules - similar to how the CUDA backend is organized. We should utilizeC++11C++ to reduce code duplication.The text was updated successfully, but these errors were encountered: