Missing oneCCL libs in 1.13.100+gpu #43

@robogast

Hi! I've installed oneccl_bindings_for_pt==1.13.100+gpu from https://developer.intel.com/ipex-whl-stable-xpu, but importing the package fails with a "libccl.so.1: cannot open shared object file" error:

$ python
Python 3.10.4 (main, Oct 26 2022, 02:21:10) [GCC 11.3.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import oneccl_bindings_for_pytorch
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/gpfs/home5/robertsc/2D-VQ-AE-2/.venv/py310-XPU/lib/python3.10/site-packages/oneccl_bindings_for_pytorch/__init__.py", line 26, in <module>
    from . import _C as ccl_lib
ImportError: libccl.so.1: cannot open shared object file: No such file or directory

It looks like oneCCL was left out of the latest build: in a previous version (1.13.0+cpu), libccl.so.1 is shipped inside oneccl_bindings_for_pytorch:

$ grep -r libccl.so.1
Binary file lib/python3.10/site-packages/oneccl_bindings_for_pytorch/lib/libccl.so.1.0 matches
Binary file lib/python3.10/site-packages/oneccl_bindings_for_pytorch/lib/libccl.so.1 matches
Binary file lib/python3.10/site-packages/oneccl_bindings_for_pytorch/lib/libccl.so matches
Binary file lib/python3.10/site-packages/oneccl_bindings_for_pytorch/lib/liboneccl_bindings_for_pytorch.so matches
lib/python3.10/site-packages/oneccl_bind_pt-1.13.0+cpu.dist-info/RECORD:oneccl_bindings_for_pytorch/lib/libccl.so.1,sha256=QsFq3umZ-WRQHD69SAZ9ilXdYcEwwZfBVS4b8P48KjQ,4544872
lib/python3.10/site-packages/oneccl_bind_pt-1.13.0+cpu.dist-info/RECORD:oneccl_bindings_for_pytorch/lib/libccl.so.1.0,sha256=QsFq3umZ-WRQHD69SAZ9ilXdYcEwwZfBVS4b8P48KjQ,4544872

In the 1.13.100+gpu version it is missing; only the binding libraries that reference it are present:

$ grep -r libccl.so.1
Binary file lib/python3.10/site-packages/oneccl_bindings_for_pytorch/lib/liboneccl_bindings_for_pytorch_xpu.so matches
Binary file lib/python3.10/site-packages/oneccl_bindings_for_pytorch/lib/liboneccl_bindings_for_pytorch.so matches
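
The same comparison can be sketched in Python without importing the package (the import is what fails). This is only a rough illustration: the helper names `package_dir` and `find_bundled_libs` are made up for this example, and unlike `grep -r`, which matches file contents, it only looks at file names.

```python
import importlib.util
from pathlib import Path


def package_dir(name):
    """Locate a top-level package's directory without executing its __init__.py."""
    spec = importlib.util.find_spec(name)
    if spec is None or not spec.submodule_search_locations:
        return None
    return Path(list(spec.submodule_search_locations)[0])


def find_bundled_libs(root, prefix="libccl.so"):
    """Return the sorted names of files under root whose name starts with prefix."""
    return sorted(p.name for p in Path(root).rglob(prefix + "*"))


if __name__ == "__main__":
    pkg = package_dir("oneccl_bindings_for_pytorch")
    if pkg is None:
        print("oneccl_bindings_for_pytorch is not installed in this environment")
    else:
        libs = find_bundled_libs(pkg)
        print(libs if libs else f"no libccl.so* found under {pkg}")
```

Run inside each virtual environment, this should list the libccl.so* files for the 1.13.0+cpu install and report none for 1.13.100+gpu, matching the grep output above.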

As a temporary workaround, I can install oneccl-devel==2021.8.0 from PyPI, which still bundles it:

$ grep -r libccl.so.1
Binary file lib/cpu_gpu_dpcpp/libccl.so.1.0 matches
Binary file lib/cpu_gpu_dpcpp/libccl.so.1 matches
Binary file lib/cpu_gpu_dpcpp/libccl.so matches
Binary file lib/python3.10/site-packages/oneccl_bindings_for_pytorch/lib/liboneccl_bindings_for_pytorch_xpu.so matches
Binary file lib/python3.10/site-packages/oneccl_bindings_for_pytorch/lib/liboneccl_bindings_for_pytorch.so matches
lib/python3.10/site-packages/oneccl_devel-2021.8.0.dist-info/RECORD:../../cpu/libccl.so.1,sha256=Mb1k7Cr0EMbtwcPLheTP5ipnzpMYizaUkqVlKC7SJ-s,4847184
lib/python3.10/site-packages/oneccl_devel-2021.8.0.dist-info/RECORD:../../cpu/libccl.so.1.0,sha256=Mb1k7Cr0EMbtwcPLheTP5ipnzpMYizaUkqVlKC7SJ-s,4847184
lib/python3.10/site-packages/oneccl_devel-2021.8.0.dist-info/RECORD:../../cpu_gpu_dpcpp/libccl.so.1,sha256=bYQ16wi5o1aOEmM-x3n2G1-3GVXjVzDsL15XpNRu5u0,7543928
lib/python3.10/site-packages/oneccl_devel-2021.8.0.dist-info/RECORD:../../cpu_gpu_dpcpp/libccl.so.1.0,sha256=bYQ16wi5o1aOEmM-x3n2G1-3GVXjVzDsL15XpNRu5u0,7543928
Binary file lib/cpu/libccl.so.1.0 matches
Binary file lib/cpu/libccl.so.1 matches
Binary file lib/cpu/libccl.so matches
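
To check whether the dynamic loader can actually resolve the library after a workaround like this, something along these lines may help. It is a minimal sketch using `ctypes` from the standard library; `can_load` is a hypothetical helper, and note that `ctypes.CDLL` really loads the library, so a failure can also come from one of libccl's own dependencies (for example a missing MPI library) rather than from libccl.so.1 itself.

```python
import ctypes


def can_load(soname):
    """Return (True, None) if the loader can open soname, else (False, error message)."""
    try:
        ctypes.CDLL(soname)
    except OSError as exc:
        return False, str(exc)
    return True, None


if __name__ == "__main__":
    ok, err = can_load("libccl.so.1")
    if ok:
        print("libccl.so.1 resolves")
    else:
        print(f"libccl.so.1 does not resolve: {err}")
```

If this reports a failure even though the file exists on disk, the directory containing it is probably not on the loader's search path for the current process.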

The default build option is to ship with oneCCL. Perhaps this flag was accidentally set incorrectly when the latest version was built?
Could you please rebuild with the latest oneCCL version? :)

Edit: the same applies to Intel MPI; its libraries and binaries are also missing.
