about PyTorch 2.5 install te #1309

Closed
klhhhhh opened this issue Nov 4, 2024 · 7 comments

klhhhhh commented Nov 4, 2024

1. Error log from installing from source:

pip install .
Processing /global/u2/k/klhhhhh/TransformerEngine
  Installing build dependencies ... done
  Getting requirements to build wheel ... error
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [17 lines of output]
      Traceback (most recent call last):
        File "/pscratch/sd/k/klhhhhh/envs/megatron/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/pscratch/sd/k/klhhhhh/envs/megatron/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/pscratch/sd/k/klhhhhh/envs/megatron/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/tmp/pip-build-env-unwcz34q/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 333, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=[])
        File "/tmp/pip-build-env-unwcz34q/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 303, in _get_build_requires
          self.run_setup()
        File "/tmp/pip-build-env-unwcz34q/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 521, in run_setup
          super().run_setup(setup_script=setup_script)
        File "/tmp/pip-build-env-unwcz34q/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 319, in run_setup
          exec(code, locals())
        File "<string>", line 37, in <module>
      ModuleNotFoundError: No module named 'torch'
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

2. My torch version:

torch                    2.5.1
torchaudio               2.5.1
torchvision              0.20.1

3. My Python, CUDA, and cuDNN versions:

python 3.9.18
cuda 12.4
cudnn 9.01
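
The traceback in item 1 runs through /tmp/pip-build-env-unwcz34q/overlay, which is the temporary environment pip creates for an isolated build. Packages from the active environment, including torch 2.5.1, are not visible there, so the `import torch` in setup.py fails. The thread does not confirm that this was the cause, so the following is only a minimal Python sketch of a pre-flight check under that assumption. The helper names `module_available` and `suggested_install_command` are hypothetical; `--no-build-isolation` is a standard pip option.

```python
import importlib.util
import sys
from typing import List


def module_available(name: str) -> bool:
    """Return True if the top-level module `name` can be found by this interpreter."""
    return importlib.util.find_spec(name) is not None


def suggested_install_command(torch_present: bool) -> List[str]:
    """Build a pip command line (hypothetical helper, not part of TransformerEngine)."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if torch_present:
        # Build in the active environment so setup.py can import the installed torch.
        cmd.append("--no-build-isolation")
    cmd.append(".")
    return cmd


if __name__ == "__main__":
    has_torch = module_available("torch")
    if not has_torch:
        print("torch is not importable here; install PyTorch before building.")
    print(" ".join(suggested_install_command(has_torch)))
```

With build isolation disabled, pip does not install the build requirements itself, so they have to be present in the active environment already.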

klhhhhh commented Nov 4, 2024

I followed this guide to install:

# Clone repository, checkout stable branch, clone submodules
git clone --branch stable --recursive https://github.com/NVIDIA/TransformerEngine.git

cd TransformerEngine
export NVTE_FRAMEWORK=pytorch   # Optionally set framework
pip install .                   # Build and install

https://docs.nvidia.com/deeplearning/transformer-engine/user-guide/installation.html

klhhhhh commented Nov 4, 2024

New log: the build now fails on cudnn.h, but I do have cuDNN installed.

      In file included from /pscratch/sd/k/klhhhhh/envs/megatron/lib/python3.9/site-packages/torch/include/ATen/cudnn/Handle.h:4,
                       from /tmp/pip-install-09kq6iqa/transformer-engine-torch_82f7491f7d0549d7b08336753af73906/csrc/common.h:14,
                       from /tmp/pip-install-09kq6iqa/transformer-engine-torch_82f7491f7d0549d7b08336753af73906/csrc/common.cu:7:
      /pscratch/sd/k/klhhhhh/envs/megatron/lib/python3.9/site-packages/torch/include/ATen/cudnn/cudnn-wrapper.h:3:10: fatal error: cudnn.h: No such file or directory
          3 | #include <cudnn.h>
            |          ^~~~~~~~~
      compilation terminated.
      ninja: build stopped: subcommand failed.
      Traceback (most recent call last):
        File "/pscratch/sd/k/klhhhhh/envs/megatron/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2104, in _run_ninja_build
          subprocess.run(
        File "/global/common/software/nersc/pe/conda-envs/24.1.0/python-3.9/nersc-python/lib/python3.9/subprocess.py", line 528, in run
          raise CalledProcessError(retcode, process.args,
      subprocess.CalledProcessError: Command '['ninja', '-v', '-j', '1']' returned non-zero exit status 1.
      
      The above exception was the direct cause of the following exception:
      
      Traceback (most recent call last):
        File "<string>", line 2, in <module>
        File "<pip-setuptools-caller>", line 34, in <module>
        File "/tmp/pip-install-09kq6iqa/transformer-engine-torch_82f7491f7d0549d7b08336753af73906/setup.py", line 53, in <module>
          setuptools.setup(
        File "/pscratch/sd/k/klhhhhh/envs/megatron/lib/python3.9/site-packages/setuptools/__init__.py", line 153, in setup
          return distutils.core.setup(**attrs)
        File "/global/common/software/nersc/pe/conda-envs/24.1.0/python-3.9/nersc-python/lib/python3.9/distutils/core.py", line 148, in setup
          dist.run_commands()
        File "/global/common/software/nersc/pe/conda-envs/24.1.0/python-3.9/nersc-python/lib/python3.9/distutils/dist.py", line 966, in run_commands
          self.run_command(cmd)
        File "/global/common/software/nersc/pe/conda-envs/24.1.0/python-3.9/nersc-python/lib/python3.9/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/pscratch/sd/k/klhhhhh/envs/megatron/lib/python3.9/site-packages/wheel/_bdist_wheel.py", line 378, in run
          self.run_command("build")
        File "/global/common/software/nersc/pe/conda-envs/24.1.0/python-3.9/nersc-python/lib/python3.9/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/global/common/software/nersc/pe/conda-envs/24.1.0/python-3.9/nersc-python/lib/python3.9/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/global/common/software/nersc/pe/conda-envs/24.1.0/python-3.9/nersc-python/lib/python3.9/distutils/command/build.py", line 135, in run
          self.run_command(cmd_name)
        File "/global/common/software/nersc/pe/conda-envs/24.1.0/python-3.9/nersc-python/lib/python3.9/distutils/cmd.py", line 313, in run_command
          self.distribution.run_command(command)
        File "/global/common/software/nersc/pe/conda-envs/24.1.0/python-3.9/nersc-python/lib/python3.9/distutils/dist.py", line 985, in run_command
          cmd_obj.run()
        File "/tmp/pip-install-09kq6iqa/transformer-engine-torch_82f7491f7d0549d7b08336753af73906/build_tools/build_ext.py", line 129, in run
          super().run()
        File "/pscratch/sd/k/klhhhhh/envs/megatron/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 79, in run
          _build_ext.run(self)
        File "/global/common/software/nersc/pe/conda-envs/24.1.0/python-3.9/nersc-python/lib/python3.9/distutils/command/build_ext.py", line 340, in run
          self.build_extensions()
        File "/tmp/pip-install-09kq6iqa/transformer-engine-torch_82f7491f7d0549d7b08336753af73906/build_tools/build_ext.py", line 255, in build_extensions
          super().build_extensions()
        File "/pscratch/sd/k/klhhhhh/envs/megatron/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 868, in build_extensions
          build_ext.build_extensions(self)
        File "/global/common/software/nersc/pe/conda-envs/24.1.0/python-3.9/nersc-python/lib/python3.9/distutils/command/build_ext.py", line 449, in build_extensions
          self._build_extensions_serial()
        File "/global/common/software/nersc/pe/conda-envs/24.1.0/python-3.9/nersc-python/lib/python3.9/distutils/command/build_ext.py", line 474, in _build_extensions_serial
          self.build_extension(ext)
        File "/pscratch/sd/k/klhhhhh/envs/megatron/lib/python3.9/site-packages/setuptools/command/build_ext.py", line 202, in build_extension
          _build_ext.build_extension(self, ext)
        File "/pscratch/sd/k/klhhhhh/envs/megatron/lib/python3.9/site-packages/Cython/Distutils/build_ext.py", line 135, in build_extension
          super(build_ext, self).build_extension(ext)
        File "/global/common/software/nersc/pe/conda-envs/24.1.0/python-3.9/nersc-python/lib/python3.9/distutils/command/build_ext.py", line 529, in build_extension
          objects = self.compiler.compile(sources,
        File "/pscratch/sd/k/klhhhhh/envs/megatron/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 681, in unix_wrap_ninja_compile
          _write_ninja_file_and_compile_objects(
        File "/pscratch/sd/k/klhhhhh/envs/megatron/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 1784, in _write_ninja_file_and_compile_objects
          _run_ninja_build(
        File "/pscratch/sd/k/klhhhhh/envs/megatron/lib/python3.9/site-packages/torch/utils/cpp_extension.py", line 2120, in _run_ninja_build
          raise RuntimeError(message) from e
      RuntimeError: Error compiling objects for extension

klhhhhh commented Nov 4, 2024

After setting

export CUDNN_PATH=/opt/nersc/pe/modulefiles/cudnn/8.9.3_cuda12.lua

the cudnn.h error was fixed (see #918).
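
The compile error is about the header cudnn.h missing from the include path, which can happen even when the cuDNN library itself is installed. A minimal Python sketch for checking where the header actually lives, assuming the variable is meant to point at a cuDNN installation root that contains include/cudnn.h. The function name `find_cudnn_header` is hypothetical, CUDA_HOME is an assumed extra candidate, and how TransformerEngine's build consumes CUDNN_PATH is not confirmed here.

```python
import os
from pathlib import Path
from typing import Iterable, Optional


def find_cudnn_header(roots: Iterable[Optional[str]]) -> Optional[Path]:
    """Return the first existing <root>/include/cudnn.h, or None if there is none."""
    for root in roots:
        if not root:
            continue  # skip unset or empty environment variables
        header = Path(root) / "include" / "cudnn.h"
        if header.is_file():
            return header
    return None


if __name__ == "__main__":
    # CUDNN_PATH is the variable from the thread; CUDA_HOME is an assumption.
    candidates = [os.environ.get("CUDNN_PATH"), os.environ.get("CUDA_HOME")]
    found = find_cudnn_header(candidates)
    print(found if found else "cudnn.h not found under the candidate roots")
```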

klhhhhh commented Nov 5, 2024

But I then ran into another error:

/tmp/pip-req-build-t606l4qk/transformer_engine/common/util/cuda_driver.cpp:9:10: fatal error: filesystem: No such file or directory
       #include <filesystem>
                ^~~~~~~~~~~~
      compilation terminated.

ptrendx (Member) commented Nov 5, 2024

What is your g++ version? <filesystem> is a C++17 standard library header, and g++ has shipped it since version 8.1.

You can also try our prebuilt pip wheels via pip install transformer_engine[pytorch].
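
A quick way to answer the version question is to parse what the compiler reports. This is a minimal Python sketch, assuming the relevant compiler is the `g++` found on PATH. The helper names `parse_gxx_version` and `has_std_filesystem` are hypothetical. Since 8.1 was the first release of the GCC 8 series, the check compares only the major version.

```python
import re
import shutil
import subprocess
from typing import Optional, Tuple


def parse_gxx_version(text: str) -> Optional[Tuple[int, ...]]:
    """Extract the first version number (e.g. 12.3.0 or 8) from compiler output."""
    match = re.search(r"(\d+)(?:\.(\d+))?(?:\.(\d+))?", text)
    if match is None:
        return None
    return tuple(int(part) for part in match.groups() if part is not None)


def has_std_filesystem(version: Tuple[int, ...]) -> bool:
    """True if the major version is 8 or newer (8.1 was the first GCC 8 release)."""
    return version[0] >= 8


if __name__ == "__main__":
    gxx = shutil.which("g++")
    if gxx is None:
        print("g++ not found on PATH")
    else:
        # These two flags together print only the version number.
        result = subprocess.run(
            [gxx, "-dumpfullversion", "-dumpversion"],
            capture_output=True, text=True,
        )
        version = parse_gxx_version(result.stdout)
        if version is None:
            print("could not parse a version from:", result.stdout.strip())
        else:
            print(gxx, version, "<filesystem> expected:", has_std_filesystem(version))
```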

klhhhhh commented Nov 6, 2024

Hello, thanks for your help.
My gcc version is 12.3. I've tried several approaches, but it still does not work, e.g.:

CXXFLAGS="-std=c++17 -D_GLIBCXX_USE_CXX11_ABI=0 -lstdc++fs" pip install git+https://github.com/NVIDIA/TransformerEngine.git@stable
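
A missing-header error comes from the preprocessor, so a linker flag such as -lstdc++fs cannot change it. One possibility, not confirmed in this thread, is that the build invokes a different and older compiler than the gcc 12.3 on PATH. The Python sketch below lists what each candidate resolves to. The helper name `resolve_compilers` is hypothetical, and the choice of CXX and CUDAHOSTCXX as the variables worth checking is an assumption based on their use by CMake.

```python
import os
import shutil
from typing import Dict, Mapping, Optional


def resolve_compilers(env: Mapping[str, str]) -> Dict[str, Optional[str]]:
    """Map each candidate source to the executable it resolves to, or None."""
    path = env.get("PATH")
    candidates = {
        "CXX": env.get("CXX"),                  # assumes a bare name or path, no flags
        "CUDAHOSTCXX": env.get("CUDAHOSTCXX"),  # host compiler hint for CUDA builds
        "g++ on PATH": "g++",
        "c++ on PATH": "c++",
    }
    return {
        label: shutil.which(name, path=path) if name else None
        for label, name in candidates.items()
    }


if __name__ == "__main__":
    for label, resolved in resolve_compilers(os.environ).items():
        print(f"{label}: {resolved}")
```

Each resolved path can then be asked for its version to see whether any of them predates g++ 8.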

klhhhhh commented Nov 11, 2024

I've completed the installation, thanks.

klhhhhh closed this as completed Nov 11, 2024