Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

mistral-rs: fix cuda support #343254

Open
wants to merge 1 commit into
base: master
Choose a base branch
from

Conversation

GaetanLepage
Copy link
Contributor

@GaetanLepage GaetanLepage commented Sep 20, 2024

Description of changes

cc @SomeoneSerge

Things done

  • Built on platform(s)
    • x86_64-linux
    • aarch64-linux
    • x86_64-darwin
    • aarch64-darwin
  • For non-Linux: Is sandboxing enabled in nix.conf? (See Nix manual)
    • sandbox = relaxed
    • sandbox = true
  • Tested, as applicable:
  • Tested compilation of all packages that depend on this change using nix-shell -p nixpkgs-review --run "nixpkgs-review rev HEAD". Note: all changes have to be committed, also see nixpkgs-review usage
  • Tested basic functionality of all binary files (usually in ./result/bin/)
  • 24.11 Release Notes (or backporting 23.11 and 24.05 Release notes)
    • (Package updates) Added a release notes entry if the change is major or breaking
    • (Module updates) Added a release notes entry if the change is significant
    • (Module addition) Added a release notes entry if adding a new NixOS module
  • Fits CONTRIBUTING.md.

Add a 👍 reaction to pull requests you find important.

@GaetanLepage
Copy link
Contributor Author

Result of nixpkgs-review pr 343254 run on aarch64-linux 1

1 package built:
  • mistral-rs

@GaetanLepage
Copy link
Contributor Author

Result of nixpkgs-review pr 343254 run on x86_64-linux 1

1 package built:
  • mistral-rs

let
patchelfCommand = binaryName: ''
patchelf \
--add-rpath ${
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we use autoPatchelfHook with runtimeDependencies (Since there are only this two executables)?

https://nixos.org/manual/nixpkgs/unstable/#setup-hook-autopatchelfhook

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for the tip. I went and try it and it doesn't work weirdly...
I'm pushing the changes.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The ldd output is the same, but I guess that the rpath is not about that.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

image

This is the result of patchelf --print-rpath for the two variants.

Comment on lines 210 to 227
runtimeDependencies = lib.optionals cudaSupport [
cudaPackages.libcublas
cudaPackages.libcurand
];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
runtimeDependencies = lib.optionals cudaSupport [
cudaPackages.libcublas
cudaPackages.libcurand
];

Try this. I guess it's because autoPatchelfHook overrides rpath (--set-rpath) instead of prepending/appending (--add-rpath) it.

if rpath:
print("setting RPATH to:", rpath_str)
subprocess.run(
["patchelf", "--set-rpath", rpath_str, path.as_posix()] + extra_args,
check=True)

But you already have these two in buildInputs, and it seems unneeded to add them to runtimeDependencies again, see torch as an example:

nativeBuildInputs = lib.optionals stdenv.isLinux [
addDriverRunpath
autoPatchelfHook
autoAddDriverRunpath
];
buildInputs = lib.optionals stdenv.isLinux (
with cudaPackages;
[
# $out/${sitePackages}/nvfuser/_C*.so wants libnvToolsExt.so.1 but torch/lib only ships
# libnvToolsExt-$hash.so.1
cuda_nvtx
cuda_cudart
cuda_cupti
cuda_nvrtc
cudnn
libcublas
libcufft
libcurand
libcusolver
libcusparse
nccl
]
);
autoPatchelfIgnoreMissingDeps = lib.optionals stdenv.isLinux [
# This is the hardware-dependent userspace driver that comes from
# nvidia_x11 package. It must be deployed at runtime in
# /run/opengl-driver/lib or pointed at by LD_LIBRARY_PATH variable, rather
# than pinned in runpath
"libcuda.so.1"
];

Copy link
Member

@Aleksanaa Aleksanaa Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we have to add something like extraRuntimeDependencies to autoPatchelfHook then.

Copy link
Member

@Aleksanaa Aleksanaa Sep 23, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also try this:

autoPatchelfIgnoreMissingDeps = lib.optionals stdenv.isLinux [
# This is the hardware-dependent userspace driver that comes from
# nvidia_x11 package. It must be deployed at runtime in
# /run/opengl-driver/lib or pointed at by LD_LIBRARY_PATH variable, rather
# than pinned in runpath
"libcuda.so.1"
];

I guess runtimeDependencies is still not necessary, unless cuda libraries are loaded by dlopen

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for your suggestions !
I tried removing the

runtimeDependencies = lib.optionals cudaSupport [
  cudaPackages.libcublas
  cudaPackages.libcurand
];

block but it doesn't work better.
Those libs are loaded at runtime (dlopen I guess) and thus are not seen as required by patchelf`.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess runtimeDependencies is still not necessary, unless cuda libraries are loaded by dlopen

This works for the cuda lib but it still complains about cublas...

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I guess we have to add something like extraRuntimeDependencies to autoPatchelfHook then.

appendRunpathsArray exists, used for adding $ORIGIN in a few places

@Aleksanaa
Copy link
Member

@GaetanLepage Could you fix this to a usable state? I suppose it's at least better than nothing

@FliegendeWurst FliegendeWurst added the 6.topic: cuda Parallel computing platform and API label Oct 29, 2024
@GaetanLepage
Copy link
Contributor Author

@GaetanLepage Could you fix this to a usable state? I suppose it's at least better than nothing

Sure. Maybe let's merge the update first and then I'll rebase this branch.
#352064

@Aleksanaa
Copy link
Member

Done

@GaetanLepage
Copy link
Contributor Author

Currently fails to build with cudaSupport enabled:

error[E0063]: missing field `seed_value` in initializer of `cuda_backend::device::CudaDevice`
   --> /build/cargo-vendor-dir/candle-core-0.7.2/src/cuda_backend/device.rs:203:12
    |
203 |         Ok(Self {
    |            ^^^^ missing `seed_value`

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
6.topic: cuda Parallel computing platform and API 10.rebuild-darwin: 1-10 10.rebuild-darwin: 1 10.rebuild-linux: 1-10 10.rebuild-linux: 1 11.by: package-maintainer This PR was created by the maintainer of the package it changes
Projects
Status: New
Development

Successfully merging this pull request may close these issues.

4 participants