Implement venv/site-packages based binaries #2156

groodt · 2024-08-24T09:47:51Z

Context

This is a tracking issue to recognise that the lack of a site-packages layout causes friction when making use of third-party distribution packages (wheels and sdists) from indexes such as PyPI.

Outside Bazel and rules_python, it is common for distribution packages to assume that they will be installed into a single site-packages folder, either in a "virtual environment" or directly into a Python user or global site installation.

Notable examples are libraries in the AI/ML ecosystem that use the NVIDIA CUDA shared libraries. These shared libraries embed relative rpaths in their ELF/Mach-O/DLL binaries, and those rpaths fail to resolve when the packages are not installed as siblings in a single site-packages layout.
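To make the rpath failure concrete, here is a minimal sketch of how a relative rpath resolves under the two layouts. The rpath value, file names, and repository names are hypothetical and chosen only for illustration; the one fact relied on is that `$ORIGIN` in an ELF rpath stands for the directory containing the library being loaded.

```python
import posixpath


def resolve_rpath(lib_path: str, rpath: str) -> str:
    """Expand $ORIGIN to the directory of the loading library and normalize."""
    origin = posixpath.dirname(lib_path)
    return posixpath.normpath(rpath.replace("$ORIGIN", origin))


# Hypothetical rpath entry that expects a sibling package under the same parent.
RPATH = "$ORIGIN/../../cublas/lib"

# Single site-packages layout: both packages share one 'nvidia' parent,
# so the rpath lands on the directory where cublas was actually installed.
flat = resolve_rpath("site-packages/nvidia/cudnn/lib/libcudnn.so", RPATH)

# One repository per wheel (hypothetical repo names): the rpath still points
# inside the cudnn repository, where no cublas directory exists.
split = resolve_rpath(
    "pip_deps_cudnn/site-packages/nvidia/cudnn/lib/libcudnn.so", RPATH
)

print(flat)
print(split)
```

In the split layout the dependency would live under a different repository root, so the loader cannot find it through the relative rpath.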

The lack of a single site-packages folder also complicates the rules themselves. Namespace packages in rules_python are all processed into pkgutil-style namespace packages. This seems to work, but wouldn't be necessary if site-packages were used.

Another, rarer issue is failure to load *.pth files. Python provides site-specific configuration hooks that can customize sys.path at startup. rules_python could perhaps work around this issue, but if a site-packages layout were used and discovered by the interpreter at startup, no workaround would be necessary.

Distribution packages on PyPI known to have issues:

torch
onnxruntime-gpu
rerun-sdk

Known workarounds

Patch the third-party dependencies using rules_python patching support
Use an alternative set of rules such as rules_py
Patch the third-party dependencies outside rules_python and push the patched dependencies to a private index

# Imports required by the two functions below.
import ctypes
import glob
import os
import platform
import sys
from typing import Dict


def _preload_cuda_deps(lib_folder: str, lib_name: str) -> None:
    """Preloads cuda deps if they could not be found otherwise."""
    # Should only be called on Linux if default path resolution has failed
    assert platform.system() == 'Linux', 'Should only be called on Linux'
    lib_path = None
    # Search every sys.path entry for an 'nvidia' directory holding the library.
    for path in sys.path:
        nvidia_path = os.path.join(path, 'nvidia')
        if not os.path.exists(nvidia_path):
            continue
        print(f"Checking nvidia_path {nvidia_path}")
        if "nvimgcodec" == lib_folder:
            # nvimgcodec keeps its shared library directly in its package folder.
            candidate_lib_paths = glob.glob(os.path.join(nvidia_path, lib_folder, 'libnvimgcodec.so.*[0-9]'))
        else:
            candidate_lib_paths = glob.glob(os.path.join(nvidia_path, lib_folder, 'lib', lib_name))
        print(f"Found candidate_lib_paths {candidate_lib_paths}")
        if candidate_lib_paths and not lib_path:
            lib_path = candidate_lib_paths[0]
        print(f"Found lib_path {lib_path}")
        if lib_path:
            break
    print(f"Preloading {lib_name} from {lib_path}")
    if not lib_path:
        raise ValueError(f"{lib_name} not found in the system path {sys.path}")
    ctypes.CDLL(lib_path)


def preload_cuda_deps() -> None:
    # Maps each package folder under 'nvidia' to a glob for its shared library.
    cuda_libs: Dict[str, str] = {
        'cublas': 'libcublas.so.*[0-9]',
        'cudnn': 'libcudnn.so.*[0-9]',
        'cuda_nvrtc': 'libnvrtc.so.*[0-9].*[0-9]',
        'cuda_runtime': 'libcudart.so.*[0-9].*[0-9]',
        'cuda_cupti': 'libcupti.so.*[0-9].*[0-9]',
        'cufft': 'libcufft.so.*[0-9]',
        'curand': 'libcurand.so.*[0-9]',
        'cusolver': 'libcusolver.so.*[0-9]',
        'cusparse': 'libcusparse.so.*[0-9]',
        'nccl': 'libnccl.so.*[0-9]',
        'nvtx': 'libnvToolsExt.so.*[0-9]',
        'nvimgcodec': 'libnvimgcodec.so.*[0-9]',
    }
    for lib_folder, lib_name in cuda_libs.items():
        _preload_cuda_deps(lib_folder, lib_name)

I have several Nvidia libraries and I wanted to use 'from nvidia import nvimgcodec', but multiple libraries have their own 'nvidia' directory under site-packages (e.g., pip_deps_cublas_cu11/site-packages/nvidia/ and pip_deps_nvimagecodec_cu11/site-packages/nvidia/), and 'from nvidia' always directs me to the cublas library.

My workaround is to copy the nvimgcodec library from my local Python environment to the Bazel directory, place it under pip_deps_nvimagecodec_cu11/site-packages/nvidia_img/, and then use 'from nvidia_img import nvimgcodec'.

I also tried just copying the nvimgcodec library from pip_deps_nvimagecodec_cu11/site-packages/nvidia and modifying the linking, but that didn't work, so I copied it from my local environment instead.

I'm not sure if I can add this as a patch because it doesn't really make sense. Do you know if there's a better solution for this? Thanks so much for your help!

guoriyue · 2024-09-11T18:32:39Z

> I also tried just copying the nvimgcodec library from pip_deps_nvimagecodec_cu11/site-packages/nvidia and modifying the linking, but that didn't work, so I copied it from my local environment instead.

By the way, to clarify this: I could use 'from nvidia_img import nvimgcodec', but it seems the library is not initialized correctly. When I run the sample code to get a decoder, I just get None. I'm not sure whether that's related to the copying and re-linking.

from nvidia import nvimgcodec
decoder = nvimgcodec.Decoder()

1e100 · 2024-12-07T05:56:31Z

Could someone comment on which version of rules_python this was not broken in? PyTorch did work before, at the very minimum. It'd be great to know if there's a rollback path.

1e100 · 2024-12-09T19:27:42Z

rules_py does seem to fix it, with some workarounds.

keith · 2024-12-09T19:28:48Z

rules_python has always worked this way, so yes, it's not a regression.

rickeylev · 2025-02-04T17:05:05Z

I've renamed this to better reflect what is desired: to use a virtual environment with site-packages.

I've made a bit of progress on that front. --bootstrap_impl=script creates and uses a venv for the interpreter. So half the problem (use a venv) is solved.

The other half is populating the site-packages directory.

My current thinking is to have PyInfo carry info so that py_binary can then create symlinks in the site-packages directory. Probably a depset[tuple[str site_packages_path, str runfiles_actual_path]]. Or maybe depset[File marker]? It might be possible to combine the import strings (from attr.imports) and the marker file to infer the site-packages directories. For .pth files, I think we'd just glob() any (distro root level) pth files and stick them in site-packages? Which would go into another PyInfo field.
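As a rough illustration of the tuple idea, the sketch below shows what consuming (site_packages_path, runfiles_actual_path) pairs could look like at the filesystem level. It is plain Python rather than Starlark, and the function name, directory names, and the `foo` package are all hypothetical; it is not how rules_python implements anything.

```python
import os
import tempfile
from typing import Iterable, Tuple


def populate_site_packages(site_packages: str, links: Iterable[Tuple[str, str]]) -> None:
    """Create one symlink per (site_packages_path, runfiles_actual_path) pair."""
    for site_packages_path, runfiles_actual_path in links:
        link = os.path.join(site_packages, site_packages_path)
        # Make sure the parent of the link exists (matters for nested paths).
        os.makedirs(os.path.dirname(link), exist_ok=True)
        os.symlink(runfiles_actual_path, link)


# Hypothetical layout: one external repo holding package 'foo', plus a venv.
root = tempfile.mkdtemp()
actual = os.path.join(root, "runfiles", "pip_deps_foo", "site-packages", "foo")
os.makedirs(actual)
open(os.path.join(actual, "__init__.py"), "w").close()

site_packages = os.path.join(root, "venv", "lib", "site-packages")
os.makedirs(site_packages)

populate_site_packages(site_packages, [("foo", actual)])
print(os.listdir(site_packages))
```

Each binary could get its own set of links this way, so different dependency closures would not need to share one physical directory.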

I haven't thought it through extensively or tried anything, yet. I mean, it sounds good at a high level. My optimism is tempered, though, because bootstrap=script has brought up more edge cases than I was hoping. It definitely would have gone better with some more eyes for design, exploration, and verification.

groodt · 2025-02-06T23:57:26Z

I was quite careful to avoid the mention of a "venv" in the original issue to be honest. 😂 The new title is probably more appropriate for a PR that addresses some/all of the combined friction described in this issue, but I don't think it matters too much as long as this issue is easily discoverable.

While a venv might be one path towards a solution to the friction I describe, it's likely just an implementation detail. For example, when packaging Python applications into Docker containers or distroless images (without Bazel), you don't need to use a venv and you typically just install into the site-packages folder of the interpreter directly. Using a venv in the bootstrap doesn't directly solve any of the friction outlined in the original issue.

I'm not sure about the commentary on .pth files. When using a traditional site-packages or any folder marked as a "site dir" (site.addsitedir(dir)), .pth files are handled automatically by the interpreter and standard lib.
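A small sketch of that behaviour, using only the standard library: a directory registered with site.addsitedir() has its .pth files processed, and each existing path listed in them is appended to sys.path. The directory and file names here are made up for the demonstration.

```python
import os
import site
import sys
import tempfile

root = tempfile.mkdtemp()
site_dir = os.path.join(root, "site-packages")
extra_dir = os.path.join(root, "extra")
os.makedirs(site_dir)
os.makedirs(extra_dir)

# A .pth file lists one path per line; only paths that exist are added.
with open(os.path.join(site_dir, "demo.pth"), "w") as f:
    f.write(extra_dir + "\n")

# Registering the directory as a site dir processes its .pth files.
site.addsitedir(site_dir)

pth_was_processed = extra_dir in sys.path
print(pth_was_processed)
```

Simply appending the same directory to sys.path would not process the .pth file, which is why the distinction between a plain import path and a site dir matters here.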

The same is true for Namespace packages. In a normal "site dir", no special handling is required.

rickeylev · 2025-02-16T22:28:30Z

re: venv not relevant to solving the site-packages issue: ok, good points, fair enough. A key part of the issue is that, right now, the various libraries aren't collected into a single location ("site-packages"). Collecting them like that is problematic because each binary can have a different set of dependencies. Hence why I've closely associated venvs with the issue: the binary-specific venv, which has its own site-packages directory, is a natural place to collect them.

In any case, I have a PR almost ready for review. It has 3 key parts:

PyInfo.site_packages_symlinks. This allows propagating up information about what should be put into site-packages.
py_library.site_packages_root. This allows interpreting the files in srcs as following a site-packages layout; it feeds into the PyInfo field above
In py_binary, it uses the PyInfo field to populate the site-packages directory in its venv.

Note that the above only applies to bootstrap=script. One of the neat things about this is it's pretty cheap and supports multiple dependency closures in a single build.

Something I ran into as I implemented the above is how the pip repo rules handle namespace packages. Right now, they create a pkgutil style shim -- this confuses the build-phase logic into thinking they aren't namespace packages and causes conflicts trying to figure out what files to symlink. This should be easy to address, though.

> I have several Nvidia libraries and I wanted to use 'from nvidia import nvimgcodec', but multiple libraries have their own 'nvidia' directory under site-packages (e.g., pip_deps_cublas_cu11/site-packages/nvidia/ and pip_deps_nvimagecodec_cu11/site-packages/nvidia/)

This seems problematic no matter what? Unless nvidia is a namespace package?
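The distinction behind that question can be sketched with two throwaway directory trees. The package names (`demo_ns`, `demo_regular`) and repo names are hypothetical stand-ins, and this does not claim to reproduce the exact layout in the report. It only shows that a top-level directory without `__init__.py` is merged across sys.path entries (a PEP 420 namespace package), while a regular package is taken from the first entry that has it.

```python
import importlib
import os
import sys
import tempfile

root = tempfile.mkdtemp()
for repo, sub in [("repo_a", "alpha"), ("repo_b", "beta")]:
    site_packages = os.path.join(root, repo, "site-packages")
    for top, is_namespace in [("demo_ns", True), ("demo_regular", False)]:
        leaf = os.path.join(site_packages, top, sub)
        os.makedirs(leaf)
        open(os.path.join(leaf, "__init__.py"), "w").close()
        if not is_namespace:
            # An __init__.py at the top level makes this a regular package.
            open(os.path.join(site_packages, top, "__init__.py"), "w").close()
    sys.path.append(site_packages)
importlib.invalidate_caches()

# Namespace package: both portions are visible, one from each repo.
ns_alpha = importlib.import_module("demo_ns.alpha")
ns_beta = importlib.import_module("demo_ns.beta")

# Regular package: the copy in repo_a wins, so repo_b's subpackage is hidden.
try:
    importlib.import_module("demo_regular.beta")
    regular_beta_found = True
except ModuleNotFoundError:
    regular_beta_found = False

print(ns_alpha.__name__, ns_beta.__name__, regular_beta_found)
```

With a single site-packages directory both wheels would unpack into the same top-level folder, so the question of which sys.path entry wins would not arise.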

groodt pinned this issue Aug 24, 2024

groodt mentioned this issue Sep 24, 2024

Efficient way to relocate files in runfiles bazelbuild/bazel#23728

Open

aignas mentioned this issue Nov 14, 2024

PyQt6 .dll/.so load fails on Windows 10 and Ubuntu #508

Closed

aignas mentioned this issue Dec 10, 2024

--bootstrap_impl=script breaks pkg_tar, bazel-lib tar and py_binary with py_package + py_wheel #2489

Closed

rickeylev mentioned this issue Dec 23, 2024

2025 Priorities #2520

Open

aignas mentioned this issue Dec 26, 2024

Issues with PYTHONPATH resolution in recent python/rules_python versions #1221

Closed

rickeylev changed the title ~~Lack of site-packages breaks assumptions of third-party packages causing friction~~ Implement venv/site-packages based binaries Feb 4, 2025

rickeylev added the core-rules Issues concerning core bin/test/lib rules label Feb 4, 2025

aignas marked this as a duplicate of #2605 Feb 10, 2025

aignas mentioned this issue Feb 10, 2025

Recent versions of orbax-checkpoint fail during import #2605

Closed

rickeylev mentioned this issue Feb 17, 2025

feat: allow populating binary's venv site-packages with symlinks #2617

Open
