[GPU] GOSS boosting error on GPU H100 #6811
Comments
Thanks for using LightGBM. Are you able to share a minimal, reproducible example? Or at least, the exact parameters you passed to LightGBM? The LightGBM functions you use and the configuration you pass to them change what underlying code is called. Providing details like that reduces the effort required to investigate this.
Sorry, but I can't share any code because of an NDA.
Does it for you, on the H100(s) you have access to? You could help reduce the effort to debug this by coming up with a self-contained minimal example like that, which shouldn't be affected by any NDA if it's using publicly-available data and non-proprietary code like that.
Ok, I'll try it out.
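For anyone trying to build the self-contained example requested above, here is a minimal sketch using only synthetic data. It is an assumption, not the reporter's setup: the issue confirms only lightgbm 4.5.0, a binary classification task, the scikit-learn wrapper, and device="cuda". The use of data_sample_strategy="goss" is inferred from the issue title and the goss.hpp path in the error, and the helper names (make_data, run_repro, PARAMS) and data sizes are hypothetical.

```python
import random

# Parameters assumed for the repro attempt. Only device="cuda" is confirmed
# by the issue; GOSS is inferred from the title and the goss.hpp error path.
PARAMS = {
    "objective": "binary",
    "device": "cuda",
    "data_sample_strategy": "goss",
}


def make_data(n_samples=10_000, n_features=20, seed=0):
    """Build a synthetic binary classification set with the standard library."""
    rng = random.Random(seed)
    X = [[rng.gauss(0.0, 1.0) for _ in range(n_features)] for _ in range(n_samples)]
    # The label depends on the first two features plus noise (arbitrary choice).
    y = [int(row[0] + 0.5 * row[1] + rng.gauss(0.0, 0.5) > 0) for row in X]
    return X, y


def run_repro(n_estimators=10):
    """Train on the synthetic data; needs a CUDA-enabled LightGBM build."""
    # Imported lazily so the data helpers work without these packages.
    import numpy as np
    import lightgbm as lgb

    X, y = make_data()
    model = lgb.LGBMClassifier(n_estimators=n_estimators, **PARAMS)
    model.fit(np.asarray(X), np.asarray(y))
    return model


# On the affected machine, call run_repro() and post the full output.
# run_repro()
```

If run_repro() raises the same error on an H100, the script and its output can be posted without touching any proprietary code. Re-running with data_sample_strategy set to "bagging" (the default) might help show whether the failure is specific to GOSS.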
Description
I encountered the following error while training a binary classification model with lightgbm 4.5.0 on an H100 with device="cuda":
Traceback (most recent call last):
  File "/usr/local/lib/python3.9/site-packages/pywrapper_utils/run_thread/full_batch_run_thread.py", line 47, in _execute_user_function
    result = self.user_main_function(**kwargs)
  File "/opt/module/source/main.py", line 31, in main
    model.perform_all_calculations()
  File "/opt/module/source/model/feature_selector.py", line 61, in perform_all_calculations
    selected_features: List[Tuple] = self.select_features(base_model, kfold)
  File "/opt/module/source/model/feature_selector.py", line 84, in select_features
    model.fit(X_train, y_train)
  File "/tmp/.local/lib/python3.9/site-packages/lightgbm/sklearn.py", line 1284, in fit
    super().fit(
  File "/tmp/.local/lib/python3.9/site-packages/lightgbm/sklearn.py", line 955, in fit
    self._Booster = train(
  File "/tmp/.local/lib/python3.9/site-packages/lightgbm/engine.py", line 307, in train
    booster.update(fobj=fobj)
  File "/tmp/.local/lib/python3.9/site-packages/lightgbm/basic.py", line 4135, in update
    _safe_call(
  File "/tmp/.local/lib/python3.9/site-packages/lightgbm/basic.py", line 296, in _safe_call
    raise LightGBMError(_LIB.LGBM_GetLastError().decode("utf-8"))
lightgbm.basic.LightGBMError: [CUDA] invalid argument /tmp/pip-install-9rgzugd6/lightgbm_37941d8e64514c0e844ef71f72ef6b9c/src/boosting/goss.hpp 63
Environment info
python3.9
cuda 12.4
scikit-learn==1.6.1
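Version details like the ones above can be gathered with a short script instead of by hand. This is a sketch using only the standard library; the function name collect_versions and the default package list are illustrative assumptions, and it does not report the CUDA toolkit or driver version.

```python
import platform
from importlib import metadata


def collect_versions(packages=("lightgbm", "scikit-learn", "numpy")):
    """Return the Python version and the installed version of each package."""
    info = {"python": platform.python_version()}
    for name in packages:
        try:
            info[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # Report missing packages explicitly instead of failing.
            info[name] = "not installed"
    return info


if __name__ == "__main__":
    for name, version in collect_versions().items():
        print(f"{name}=={version}")
```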
Command(s) you used to install LightGBM