This repository has been archived by the owner on Oct 19, 2024. It is now read-only.

Install Alpa without GPU #943

Open
AlbertZhangHIT opened this issue Jul 6, 2023 · 1 comment

Comments

@AlbertZhangHIT

AlbertZhangHIT commented Jul 6, 2023

Please describe the bug
The Alpa-modified jaxlib cannot be built in a CPU-only environment.

Please describe the expected behavior

System information and environment

  • OS Platform and Distribution (e.g., Linux Ubuntu 16.04, docker): Linux Ubuntu 18.04
  • Python version: 3.7
  • CUDA version: None
  • NCCL version: None
  • cupy version: None
  • GPU model and memory: None
  • Alpa version: 0.2.3
  • TensorFlow version: 2.11
  • JAX version: 0.3.22

To Reproduce
Steps to reproduce the behavior:

  1. cd build_jaxlib
  2. python build/build.py --dev_install --bazel_options=--override_repository=org_tensorflow=$(pwd)/../third_party/tensorflow-alpa
  3. See error: Failed to build alpa-jax without GPU
Bazel binary path: ./bazel-5.1.1-linux-x86_64
Bazel version: 5.1.1
Python binary path: /usr/bin/python
Python version: 3.7
NumPy version: 1.21.6
MKL-DNN enabled: yes
Target CPU: x86_64
Target CPU features: release
CUDA enabled: no
TPU enabled: no
Remote TPU enabled: no
ROCm enabled: no
Plugin device enabled: no

Building XLA and installing it in the jaxlib source tree...
./bazel-5.1.1-linux-x86_64 run --verbose_failures=true :build_wheel -- --output_path=/home/stack/softwares/alpa/build_jaxlib/dist --cpu=x86_64 --dev_install
Extracting Bazel installation...
Starting local Bazel server and connecting to it...
INFO: Options provided by the client:
  Inherited 'common' options: --isatty=0 --terminal_columns=80
INFO: Reading rc options for 'run' from /home/stack/softwares/alpa/build_jaxlib/.bazelrc:
  Inherited 'common' options: --experimental_repo_remote_exec
INFO: Reading rc options for 'run' from /home/stack/softwares/alpa/build_jaxlib/.bazelrc:
  Inherited 'build' options: --apple_platform_type=macos --macos_minimum_os=10.14 --announce_rc --define open_source_build=true --spawn_strategy=standalone --enable_platform_specific_config --experimental_cc_shared_library --define=no_aws_support=true --define=no_gcp_support=true --define=no_hdfs_support=true --define=no_kafka_support=true --define=no_ignite_support=true --define=grpc_no_ares=true -c opt --config=short_logs --copt=-DMLIR_PYTHON_PACKAGE_PREFIX=jaxlib.mlir. --@org_tensorflow//tensorflow/compiler/xla/python:enable_gpu=false --@org_tensorflow//tensorflow/compiler/xla/python:enable_tpu=false --@org_tensorflow//tensorflow/compiler/xla/python:enable_plugin_device=false
INFO: Reading rc options for 'run' from /home/stack/softwares/alpa/build_jaxlib/.jax_configure.bazelrc:
  Inherited 'build' options: --strategy=Genrule=standalone --repo_env PYTHON_BIN_PATH=/usr/bin/python --action_env=PYENV_ROOT --python_path=/usr/bin/python --distinct_host_configuration=false --override_repository=org_tensorflow=/home/stack/softwares/alpa/build_jaxlib/../third_party/tensorflow-alpa --config=avx_posix --config=mkl_open_source_only
INFO: Found applicable config definition build:short_logs in file /home/stack/softwares/alpa/build_jaxlib/.bazelrc: --output_filter=DONT_MATCH_ANYTHING
INFO: Found applicable config definition build:avx_posix in file /home/stack/softwares/alpa/build_jaxlib/.bazelrc: --copt=-mavx --host_copt=-mavx
INFO: Found applicable config definition build:mkl_open_source_only in file /home/stack/softwares/alpa/build_jaxlib/.bazelrc: --define=tensorflow_mkldnn_contraction_kernel=1
INFO: Found applicable config definition build:linux in file /home/stack/softwares/alpa/build_jaxlib/.bazelrc: --config=posix --copt=-Wno-unknown-warning-option --copt=-Wno-stringop-truncation --copt=-Wno-array-parameter
INFO: Found applicable config definition build:posix in file /home/stack/softwares/alpa/build_jaxlib/.bazelrc: --copt=-fvisibility=hidden --copt=-Wno-sign-compare --cxxopt=-std=c++17 --host_cxxopt=-std=c++17
INFO: Reading rc options for 'run' from /home/stack/softwares/alpa/build_jaxlib/.bazelrc:
  Inherited 'build' options: --apple_platform_type=macos --macos_minimum_os=10.14 --announce_rc --define open_source_build=true --spawn_strategy=standalone --enable_platform_specific_config --experimental_cc_shared_library --define=no_aws_support=true --define=no_gcp_support=true --define=no_hdfs_support=true --define=no_kafka_support=true --define=no_ignite_support=true --define=grpc_no_ares=true -c opt --config=short_logs --copt=-DMLIR_PYTHON_PACKAGE_PREFIX=jaxlib.mlir. --@org_tensorflow//tensorflow/compiler/xla/python:enable_gpu=false --@org_tensorflow//tensorflow/compiler/xla/python:enable_tpu=false --@org_tensorflow//tensorflow/compiler/xla/python:enable_plugin_device=false
ERROR: @org_tensorflow//tensorflow/compiler/xla/python:enable_gpu :: Error loading option @org_tensorflow//tensorflow/compiler/xla/python:enable_gpu: error loading package '': Every .bzl file must have a corresponding package, but '//third_party/ducc:workspace.bzl' does not have one. Please create a BUILD file in the same or any parent directory. Note that this BUILD file does not need to do anything except exist.
b''
Traceback (most recent call last):
  File "build/build.py", line 580, in <module>
    main()
  File "build/build.py", line 575, in main
    shell(command)
  File "build/build.py", line 53, in shell
    output = subprocess.check_output(cmd)
  File "/usr/lib64/python3.7/subprocess.py", line 421, in check_output
    **kwargs).stdout
  File "/usr/lib64/python3.7/subprocess.py", line 522, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['./bazel-5.1.1-linux-x86_64', 'run', '--verbose_failures=true', ':build_wheel', '--', '--output_path=/home/stack/softwares/alpa/build_jaxlib/dist', '--cpu=x86_64', '--dev_install']' returned non-zero exit status 2.


@matthewygf

matthewygf commented Jan 22, 2024

This seems to happen for the CUDA build here as well.

...
CUDA enabled: yes
NCCL enabled: yes
...

EDIT:
To answer my own question: the build works for me once I made sure of the following.

  1. JAX is at commit 41417ee or version 0.3.22.
  2. For some reason, most of the symlinks in build_jaxlib did not work properly after git clone, so I had to recreate the jax, jaxlib and third_party symlinks.
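
For anyone hitting the same thing, the symlink check described above could be scripted roughly as follows. This is only a sketch: the helper names `find_broken_links` and `relink` are made up for illustration, and the thread does not say where each link should point, so the target passed to `relink` is something you would need to confirm against the Alpa repository yourself.

```python
import os
from pathlib import Path


def find_broken_links(directory, names):
    """Return the entries in `names` that are not symlinks to an existing target."""
    broken = []
    for name in names:
        entry = Path(directory) / name
        # A healthy entry is a symlink whose target exists. A dangling link,
        # a plain file left behind by a clone without symlink support, or a
        # missing entry all count as broken.
        if not (entry.is_symlink() and entry.exists()):
            broken.append(name)
    return broken


def relink(directory, name, target):
    """Replace `directory/name` with a symlink pointing at `target`.

    `target` is an assumption supplied by the caller; a relative target is
    resolved relative to `directory`. A real directory at that path is left
    alone and makes os.symlink raise FileExistsError.
    """
    entry = Path(directory) / name
    if entry.is_symlink() or entry.is_file():
        entry.unlink()
    os.symlink(target, entry, target_is_directory=True)


if __name__ == "__main__":
    # Link names taken from the comment above; run from the repository root.
    print(find_broken_links("build_jaxlib", ["jax", "jaxlib", "third_party"]))
```

An empty list means the three links resolve. Any name that is printed is a candidate for `relink`, once you know the correct target for it.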
