Move pathfinder to cuda-python top level #723

Open. rwgk wants to merge 80 commits into main.

Conversation

Collaborator

@rwgk rwgk commented Jun 24, 2025

Description

Closes #719, #708


Note: The name is changed from path_finder to pathfinder. The word "pathfinder" is very common, and it avoids underscore/dash confusion between directory names and package names.


  • Move cuda.bindings.path_finder → cuda.pathfinder, with backward compatibility: cuda.bindings.path_finder forwards to cuda.pathfinder.

  • Make cuda-bindings dependent on cuda-pathfinder.

  • Remove 32-bit DLL names from SUPPORTED_WINDOWS_DLLS and add runtime guard in cuda.pathfinder.load_nvidia_dynamic_lib(): "requires 64-bit Python".

  • Add specific DynamicLibNotFound exception type to public API (this was RuntimeError before; now inherits from RuntimeError).

  • Run mypy-pathfinder from pre-commit (after systematic mypy cleanup of library code).

  • Adjust GitHub Actions jobs to the move.

  • Add nvidia_wheels_cu12 to [project.optional-dependencies] in cuda_pathfinder/pyproject.toml and ensure all libs are loaded successfully. Treat all NVIDIA libs as "supported".

  • Mark cuda.bindings.path_finder as version 1.0.0
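
To illustrate two of the bullets above, here is a minimal sketch of why making the new exception a RuntimeError subclass keeps existing `except RuntimeError` handlers working, and of one way a 64-bit check could be written. The class below is a local stand-in rather than the real `cuda.pathfinder` type, and `require_64bit_python` and `load_lib_stub` are hypothetical names used only for this example.

```python
import struct


class DynamicLibNotFound(RuntimeError):
    """Local stand-in for the exception type described in the PR description."""


def require_64bit_python() -> None:
    """Hypothetical guard: raise if the interpreter does not use 64-bit pointers."""
    pointer_bits = struct.calcsize("P") * 8  # size of a C pointer, in bits
    if pointer_bits != 64:
        raise RuntimeError(f"requires 64-bit Python (found {pointer_bits}-bit)")


def load_lib_stub(libname: str) -> None:
    """Hypothetical loader that always fails, to exercise the error path."""
    raise DynamicLibNotFound(f"could not find {libname!r}")


# Callers written against the old behavior (plain RuntimeError) still work ...
try:
    load_lib_stub("nvrtc")
except RuntimeError as exc:
    caught_as = type(exc).__name__

# ... while new callers can catch the more specific type.
try:
    load_lib_stub("nvrtc")
except DynamicLibNotFound as exc:
    message = str(exc)
```

Here `caught_as` ends up naming the subclass even though the handler only asked for RuntimeError, which is the backward-compatibility property the description relies on.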

Contributor

copy-pr-bot bot commented Jun 24, 2025

Auto-sync is disabled for draft pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

@leofang leofang self-requested a review June 24, 2025 12:51
@leofang leofang added the labels cuda.pathfinder (Everything related to the cuda.pathfinder module), enhancement (Any code-related improvements), and P0 (High priority - Must do!) Jun 24, 2025
@rwgk rwgk changed the title Move path_finder to cuda-python top level Move pathfinder to cuda-python top level Jun 24, 2025
Comment on lines 11 to 23
def load_nvidia_dynamic_lib(libname: str) -> LoadedDL:
    """Load a NVIDIA dynamic library by name.

    Args:
        libname: The name of the library to load (e.g. "cudart", "nvvm", etc.)

    Returns:
        A LoadedDL object containing the library handle and path

    Raises:
        RuntimeError: If the library cannot be found or loaded
    """
    return _load_nvidia_dynamic_lib.load_lib(libname)

Contributor

You could define this wrapper function in the _dynamic_libs module and just import it here. This way your __init__.py here wouldn't have a ton of function definitions.

Collaborator Author

This file is meant to be the public API in one view.

This way your __init__.py here wouldn't have a ton of function definitions.

I understood exactly that was your goal: a flat list of available APIs.

Note that I moved the docstring here.

Contributor

This file is meant to be the public API in one view.

Indeed, everything we define (or import) here that is not prefixed with _ constitutes the public API of the module cuda.path_finder. My above suggestion is just that we import load_nvidia_dynamic_lib rather than define it here.

Collaborator Author

That would make the public API far less obvious. E.g. to see the docstring, they'd need to open a private file.

Contributor

I don't think that's true. They could still do:

from cuda.path_finder import load_nvidia_dynamic_lib
help(load_nvidia_dynamic_lib)

Same as they would do now?

Collaborator Author

That'll only work interactively.

What I have now can be inspected in obvious ways directly in the sources, e.g. when looking at the sources on GitHub. I can send URLs pointing to specific APIs in this __init__.py file.

Each function here will just be:

def function(...) -> ...:
    """docstring"""
    return call_into_private_code(...)

That's exactly the public API, with a one-line call that's easy to ignore.

What's the point of hiding that away, especially hiding away the docstring and the type hints?

Contributor

@shwina shwina Jun 25, 2025

That'll only work interactively.

The docstrings and type hints would also get picked up:

  • Sphinx (which we use to generate API docs)
  • pydoc
  • the developers' IDEs/LSP

Those are primarily the ways consumers of a package interact with docstrings or type hints, rather than looking directly at the source.
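
As a rough way to check this claim, the sketch below builds a throwaway package on disk and inspects a re-exported function through the standard `inspect` module. The package name `demo_reexport` and its stub function are hypothetical; nothing here touches the real cuda.pathfinder code.

```python
import inspect
import sys
import tempfile
from pathlib import Path

# Build a throwaway package: the function lives in a private module ...
root = Path(tempfile.mkdtemp())
pkg = root / "demo_reexport"
pkg.mkdir()
(pkg / "_dynamic_libs.py").write_text(
    "def load_nvidia_dynamic_lib(libname: str) -> str:\n"
    '    """Load a NVIDIA dynamic library by name (demo stub)."""\n'
    '    return "pretend handle for " + libname\n'
)
# ... and __init__.py only re-exports it.
(pkg / "__init__.py").write_text(
    "from ._dynamic_libs import load_nvidia_dynamic_lib\n"
    '__all__ = ["load_nvidia_dynamic_lib"]\n'
)

sys.path.insert(0, str(root))
from demo_reexport import load_nvidia_dynamic_lib

# The docstring and type hints travel with the function object.
doc = inspect.getdoc(load_nvidia_dynamic_lib)
signature = str(inspect.signature(load_nvidia_dynamic_lib))
# The defining module is still the private one, which is where the source lives.
defined_in = load_nvidia_dynamic_lib.__module__
```

The last line is the flip side raised earlier in the thread: tooling sees the docstring and hints, but anyone reading the source is pointed at the private module.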

What's the point of hiding that away, especially hiding away the docstring and the type hints?

It tightens the scope of __init__.py, whose job is:

  • to include any initialization code for the module
  • to import stuff from submodules that the module wants to expose
  • to define __all__ for the module if needed

As examples, we can look at the __init__.py from some other popular libraries:

Collaborator Author

It tightens the scope of __init__.py, whose job is

The most important job: Show the public API

You didn't answer why you want to hide that away.

Contributor

@shwina shwina Jun 25, 2025

I'm advocating for keeping function and type definitions outside of __init__.py. I don't think that "hides" anything from the user as we still import them here.

The main reason I'm advocating for that is because __init__.py typically only defines functions and types needed for module initialization, and imports anything else. I think the examples I linked to above are a good demonstration of that.

Defining functions and types beyond that serves to clutter __init__.py.

I would argue it makes the code base less navigable rather than more for people looking at the source. Do I expect the function load_nvidia_dynamic_lib to be defined in a file called _dynamic_libs.py, or a file called __init__.py?

If this all seems nitpicky, I apologize. While I do have a strong opinion here, I don't mind at all if another reviewer (or you, as the author of this PR) made the final call about this.

Member

@leofang leofang Jun 25, 2025

cuda.core's __init__.py is a good example of what Ashwin suggested. As a developer who occasionally peeks into someone else's __init__.py, I would certainly be confused about why we are defining things in there directly, as opposed to properly organizing them in respective public/private modules and just importing them. Whatever is imported is considered public API (including types). Does that make sense?

Collaborator Author

rwgk commented Jun 25, 2025

I'm a bit lost, to be honest, and I’d appreciate some clarification so I can move this forward in the direction you want.

The goals that seemed most important to me originally (from a Slack post last Wednesday) were:

  1. Dependency isolation – avoid importing unrelated logic (e.g. dynamic lib loading) when only working with headers
  2. Public API should be obvious from the code – minimal need for external docs or tooling to understand what’s exposed

After Ashwin’s comment on Slack, I interpreted the request as: prioritize a flat public API, even if that means giving up on some of that isolation — especially when type hints are involved.

Then Leo wrote:

as opposed to properly organizing them in respective public/private modules

That seems more aligned with what I had before the last commit (e.g., a file like nvidia_dynamic_libs.py directly under pathfinder/). But that seemed in conflict with Ashwin’s preference for flatness.

I’m happy to implement what you both want here — I just need a bit more clarity.

If we go back to what I proposed last week:

from cuda.pathfinder import nvidia_dynamic_libs, nvidia_static_libs, nvidia_headers

nvidia_dynamic_libs.load("nvrtc")
nvidia_static_libs.find("cudart")
nvidia_headers.find_file("cuda.h")

The file structure was:

cuda/pathfinder/
├── __init__.py  # empty
├── nvidia_dynamic_libs.py
├── nvidia_static_libs.py
└── nvidia_headers.py

I thought that structure balanced modularity and clarity, and users could still import just what they needed. But it sounds like we might instead prefer:

from cuda.pathfinder import load_nvidia_dynamic_lib, find_nvidia_static_lib, find_nvidia_header_file

That keeps the API flat, but does give up on the isolation goal unless we resort to function-local imports or other workarounds.
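
One possible workaround of this kind, sketched here purely as an illustration and not as something the PR does, is a module-level `__getattr__` (PEP 562) that imports the private submodule only when a public name is first requested. The package `demo_lazy` and its stub functions are hypothetical.

```python
import sys
import tempfile
import textwrap
from pathlib import Path

# Build a throwaway package on disk; all names in it are hypothetical.
root = Path(tempfile.mkdtemp())
pkg = root / "demo_lazy"
pkg.mkdir()

(pkg / "_dynamic_libs.py").write_text(textwrap.dedent('''\
    def load_nvidia_dynamic_lib(libname):
        return f"loaded {libname}"
'''))
(pkg / "_headers.py").write_text(textwrap.dedent('''\
    def find_nvidia_header_file(name):
        return f"found {name}"
'''))
(pkg / "__init__.py").write_text(textwrap.dedent('''\
    import importlib

    # Public name -> private submodule that defines it.
    _LAZY = {
        "load_nvidia_dynamic_lib": "._dynamic_libs",
        "find_nvidia_header_file": "._headers",
    }
    __all__ = list(_LAZY)

    def __getattr__(name):
        # PEP 562 hook: runs only when normal attribute lookup fails.
        if name in _LAZY:
            module = importlib.import_module(_LAZY[name], __name__)
            value = getattr(module, name)
            globals()[name] = value  # cache so the hook is not hit again
            return value
        raise AttributeError(f"module {__name__!r} has no attribute {name!r}")
'''))

sys.path.insert(0, str(root))
from demo_lazy import find_nvidia_header_file  # triggers import of _headers only

header_result = find_nvidia_header_file("cuda.h")
dynamic_libs_imported = "demo_lazy._dynamic_libs" in sys.modules
```

The API stays flat for callers, while working with headers never imports the dynamic-library module. The trade-off is that the public names are no longer visible as plain definitions or imports in `__init__.py`, which runs against the "obvious from the code" goal listed above.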

Could you please help me understand: What would be the preferred file/module structure to go along with the flat API? Should I keep separate modules like nvidia_dynamic_libs.py and re-export in __init__.py, or move everything into a single file (what would be the name)?

Contributor

shwina commented Jun 25, 2025

Given there's some urgency for getting this PR in, @rwgk @leofang do you want to proceed with the PR as is and address the questions Ralf posed above as a follow-up?

Member

leofang commented Jun 26, 2025

At this point we need to follow up offline. I think there is a gap that we unfortunately did not really cover during the weekly meeting. I suggest we don't cancel the dev meeting tomorrow and use the time to finalize it.

This PR also needs at least @kkraus14's approval for the change of license.

Collaborator Author

rwgk commented Jul 7, 2025

/ok to test

@rwgk rwgk marked this pull request as ready for review July 7, 2025 20:05
Contributor

copy-pr-bot bot commented Jul 7, 2025

Auto-sync is disabled for ready for review pull requests in this repository. Workflows must be run manually.

Contributors can view more details about this message here.

Collaborator Author

rwgk commented Jul 7, 2025

@kkraus14 @leofang This is ready for review now. Please see the PR description for a high-level overview of all the things included in this PR.

@@ -286,6 +292,11 @@ jobs:
- name: Set up compute-sanitizer
run: setup-sanitizer

- name: Run cuda.pathfinder tests with see_what_works

Collaborator

Not a blocker for getting this in, but in theory, since we're just testing the viability of finding and loading libraries, we shouldn't need a GPU to run these tests, so we may want to split them into a separate workflow.

I think the only tricky part would be handling the driver, libcuda, where I'm not sure how easy it is to install and load without a GPU.

Collaborator Author

That sounds good.

I think it'll be useful to keep the see_what_works testing here, for very inexpensive extra test coverage, but probably testing can and should indeed be run on machines without a GPU.

Currently libcuda isn't covered by the pathfinder.

@@ -27,6 +27,7 @@ dynamic = [
"readme",
]
dependencies = [
"cuda-pathfinder",

Collaborator

This is a problem: when we release a 2.0 with breaking changes, older releases of cuda_bindings would pick it up and break.

* `cuda.pathfinder.load_nvidia_dynamic_lib(libname: str) -> LoadedDL`

* `cuda.pathfinder.LoadedDL`:
* `handle` (platform-specific type)

Collaborator

Any concern with this being platform-specific type and API stability? Is this effectively the handle returned from dlopen on Linux?

Collaborator Author

Yes, for Linux it's a plain int, for Windows a pywintypes.HANDLE. This seems to be mainstream for each platform, and very unlikely to require changes.

I guess we could introduce a level of indirection (some wrapper type), but I figure that'll be far more trouble (confused users) than we could hope to avoid.

Collaborator

Agreed on the wrapper type. Last question, since this is public API: does this handle need to be a public attribute today? Do we expect usage outside of how we use it for cuda.bindings?

Collaborator Author

rwgk commented Jul 7, 2025

/ok to test

kkraus14 previously approved these changes Jul 8, 2025
Collaborator

@kkraus14 kkraus14 left a comment

Approved, but would like to finish the conversation on whether handle is needed to be a public attribute

Collaborator Author

rwgk commented Jul 8, 2025

Approved, but would like to finish the conversation on whether handle is needed to be a public attribute

Thanks!

I'm thinking of that as essential, with the primary use case in mind, like here:

It's unfortunate that there is no universal handle type, but I believe what we're using (in the checked-in code already) are de-facto standards.

Introducing our own handle type: presumed over-engineering, more likely to cause confusion than being helpful.

Hiding the handle object: presumed to cause usability hardships.

While writing this, I realized that I lost comments that I had earlier, about how to use the handle objects in cython. I'll work on adding them back here (in load_dl_common.py) asap:

if IS_WINDOWS:
    import pywintypes

    HandleType = pywintypes.HANDLE
else:
    HandleType = int

Collaborator Author

rwgk commented Jul 9, 2025

In my previous comment I wrote:

It's unfortunate that there is no universal handle type, but I believe what we're using (in the checked-in code already) are de-facto standards.

I quizzed the LLMs a bit, and I'm surprised to learn:

  • ChatGPT: ctypes.WinDLL (and CDLL) is much more widely used and idiomatic for dynamically loading and calling functions from DLLs in Python — especially in cross-platform projects like yours.

  • claude.ai: ctypes.windll is significantly more widely used than win32api for dynamic library loading, ...

@kkraus14 @leofang does that match your experience?

If yes, I believe I should rework load_dl_windows.py, in this PR, and change the LoadedDL.handle type to ctypes.CDLL. It will be much nicer to have a platform-independent type.

(I'm also much better set up now to get this done quickly, compared to my situation a couple months ago: I have easy interactive access to a Windows development environment now, and fairly comprehensive CI testing is in place already.)

Collaborator

kkraus14 commented Jul 9, 2025

Yes, ctypes is probably the most common way that a shared library is loaded in Python in this way. That being said, I don't think we particularly want to encourage users to start making function calls into the loaded cuda library, so returning a CDLL doesn't feel like the right choice in my opinion.

My 2c: I think we should remove the handle attribute from being part of the public API for now. For our usage in cuda.bindings we can access an internal attribute in _handle or something similar for now and as we gauge usage we can see if / how we'd like to expose something like this.

If someone wanted a ctypes object based on LoadedDL, it would be quite straightforward: CDLL(LoadedDL.abs_path), which I think is sufficient for now.
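
A minimal sketch of that idea, assuming only that the returned object exposes an `abs_path` that may be None. `LoadedDLStub` and `as_cdll` are hypothetical names for this example and are not part of cuda.pathfinder.

```python
import ctypes
from dataclasses import dataclass
from typing import Optional


@dataclass
class LoadedDLStub:
    """Stand-in exposing only the abs_path attribute mentioned above."""

    abs_path: Optional[str]


def as_cdll(loaded: LoadedDLStub) -> ctypes.CDLL:
    """Hypothetical helper: open the same library through ctypes by path."""
    if loaded.abs_path is None:
        # Without a recorded path there is nothing to hand to ctypes.
        raise ValueError("no absolute path recorded for this library")
    return ctypes.CDLL(loaded.abs_path)


# Usage (requires a real library path, so it is shown but not executed here):
#     lib = as_cdll(LoadedDLStub(abs_path="/path/to/some/library.so"))

# The error path can be exercised without any library installed.
try:
    as_cdll(LoadedDLStub(abs_path=None))
    path_missing_detected = False
except ValueError:
    path_missing_detected = True
```

Since the library is already loaded in the process, opening it again by path would normally hand back the same loaded module rather than a second copy, which is what makes the path alone sufficient.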

Collaborator Author

rwgk commented Jul 9, 2025

Yes, ctypes is probably the most common way that a shared library is loaded in Python in this way. That being said, I don't think we particularly want to encourage users to start making function calls into the loaded cuda library, so returning a CDLL doesn't feel like the right choice in my opinion.

My 2c: I think we should remove the handle attribute from being part of the public API for now. For our usage in cuda.bindings we can access an internal attribute in _handle or something similar for now and as we gauge usage we can see if / how we'd like to expose something like this.

If someone wanted a ctypes object based on LoadedDL, it would be quite straightforward: CDLL(LoadedDL.abs_path), which I think is sufficient for now.

Cool! Then we can simplify and standardize to:

@dataclass
class LoadedDL:
    """Represents a loaded dynamic library with minimal public API."""
    abs_path: Optional[str]
    was_already_loaded_from_elsewhere: bool
    _handle: int  # Platform-agnostic unsigned pointer value

With that, win32api and pywintypes will appear exclusively in load_dl_windows.py.

This is based on the observation that ultimately _handle is a pointer type on all platforms:

For Linux that's extremely obvious:

void *dlopen(const char *filename, int flags);

For Windows we have to drill down:

HMODULE LoadLibraryEx(
    LPCTSTR lpLibFileName,
    HANDLE hFile,
    DWORD dwFlags);

HMODULE — A handle to a module. This is the base address of the module in memory. HMODULE and HINSTANCE are the same in current versions of Windows, but represented different things in 16-bit Windows. This type is declared in WinDef.h as follows: typedef HINSTANCE HMODULE;

HINSTANCE — A handle to an instance. This is the base address of the module in memory. HMODULE and HINSTANCE are the same today, but represented different things in 16-bit Windows. This type is declared in WinDef.h as follows: typedef HANDLE HINSTANCE;

HANDLE — A handle to an object. This type is declared in WinNT.h as follows: typedef PVOID HANDLE;

PVOID — A pointer to any type. This type is declared in WinNT.h as follows: typedef void *PVOID;

With that, we just have to be careful to consistently cast from/to unsigned integers.

Linux: CDLL._handle is already an unsigned integer.

For Windows, claude.ai suggests:

def _handle_to_int(native_handle: pywintypes.HANDLE) -> int:
    """Convert Windows HANDLE to unsigned int."""
    # Windows HANDLE is already an integer representation of a pointer
    # Just ensure it's treated as unsigned
    handle_int = int(native_handle)
    if handle_int < 0:
        # Convert from signed to unsigned representation
        handle_int += 2**64  # Assuming 64-bit pointers
    return handle_int
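
As a variation on the snippet above, the conversion can avoid hard-coding 64 bits by deriving the pointer width from `ctypes`. This is a sketch under that assumption, not the code that ended up in the PR; `to_unsigned` and `to_signed` are hypothetical helper names.

```python
import ctypes

# Pointer width of the running interpreter, in bits (64 on 64-bit Python).
POINTER_BITS = ctypes.sizeof(ctypes.c_void_p) * 8
POINTER_MASK = (1 << POINTER_BITS) - 1


def to_unsigned(handle: int) -> int:
    """Reinterpret a possibly negative pointer-sized int as unsigned."""
    # Masking keeps the low POINTER_BITS bits, i.e. the two's-complement value.
    return handle & POINTER_MASK


def to_signed(handle_uint: int) -> int:
    """Inverse of to_unsigned, for APIs that expect a signed pointer-sized int."""
    if handle_uint >= 1 << (POINTER_BITS - 1):
        return handle_uint - (1 << POINTER_BITS)
    return handle_uint


# Round trip: a negative (signed) handle survives conversion in both directions.
round_trip = to_signed(to_unsigned(-4096))
```

Keeping both directions next to each other makes it easier to apply the cast consistently at the boundary with whichever API expects the signed form.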

rwgk added 4 commits July 9, 2025 09:25
…finder/tests pass, cuda_bindings/tests are broken).

Interactive testing:

CUDA_PATHFINDER_TEST_LOAD_NVIDIA_DYNAMIC_LIB_STRICTNESS=all_must_work pytest -ra -s -v tests/

python ../toolshed/run_cuda_pathfinder.py
This does not break any cuda_pathfinder tests. (cuda_bindings tests are "more broken".)
Collaborator Author

rwgk commented Jul 9, 2025

/ok to test

Collaborator Author

rwgk commented Jul 9, 2025

@kkraus14 Could you please take another look?

The changes in cuda_pathfinder turned out to be really straightforward:

  • 46fd972 — Make LoadedDL.handle an unsigned integer also for Windows.
  • 428a2dc — Rename LoadedDL.handle → LoadedDL._handle_uint
  • 4096885 — Move _handle_uint last in LoadedDL, to emphasize that this is private member.

The changes in cuda/bindings/path_finder.py probably look a bit hairy, but I believe in a month or two we can simply delete the entire file.

Collaborator Author

rwgk commented Jul 9, 2025

@kkraus14 I'm testing the mostly auto-generated load_dl_windows.py win32api → ctypes conversion under PR #751. Assuming it passes, we could add the changes to this PR, so that cuda-pathfinder does not have any external dependencies from the start.

Collaborator

kkraus14 commented Jul 9, 2025

I'm quite happy with the latest set of changes in terms of the state of the public API at this point.

As far as the type of the now private handle attribute, I would defer entirely to your best judgement on how it is typed and represented based on our own consumption of it.

Will do a deeper dive implementation review when I get some spare cycles.

Collaborator Author

rwgk commented Jul 9, 2025

As far as the type of the now private handle attribute, I would defer entirely to your best judgement on how it is typed and represented based on our own consumption of it.

I'm very happy with the latest state of the pathfinder code, as under PR #751.

Without your feedback, I'd probably have standardized on ctypes.CDLL, but standardizing on unsigned int instead seems clearly better for these reasons:

  • It doesn't get more robust / future-proof than that. We're not committing to any library.
  • We're not "inviting" people to make ctypes-based function calls into the loaded shared libraries.
  • For power users, a platform-agnostic convention will be the easiest to work with.

To the last point: unsigned ints are the de-facto standard under Linux, signed ints seem to be more common under Windows. In theory, we could go with that, but then people would always have to think twice. A simple "it's always an unsigned int" gives them a firm reference point on our side, then they only need to figure out what the "other side" (e.g. win32api) expects, and only if they actually need to interface.

Collaborator Author

rwgk commented Jul 9, 2025

Testing under PR #751 passed: https://github.com/NVIDIA/cuda-python/actions/runs/16180272303?pr=751

I'll hold off moving the changes over to this PR until after your review. (I assume the split makes the review a little easier.)

Collaborator Author

rwgk commented Jul 10, 2025

For last-minute polishing, I worked under #751 on hardening the ruff.lint config (going by LLM suggestions) ... and I'm so glad I got to it, this was a pretty silly oversight:

ruff (legacy alias)......................................................Failed
- hook id: ruff
- exit code: 1

cuda_pathfinder/cuda/pathfinder/_dynamic_libs/load_dl_common.py:10:7: N818 Exception name `DynamicLibNotFound` should be named with an Error suffix
   |
10 | class DynamicLibNotFound(RuntimeError):
   |       ^^^^^^^^^^^^^^^^^^ N818
11 |     pass
   |

Found 1 error.

(The other changes in e90855e and 15c5791 are just nice to haves.)

Labels
cuda.pathfinder (Everything related to the cuda.pathfinder module), enhancement (Any code-related improvements), P0 (High priority - Must do!)
Projects
Status: In Progress
Development

Successfully merging this pull request may close these issues.

[FEA]: Move path_finder to cuda-python top level
4 participants