Closed
Description
Hello,
I am currently trying to train the equivariant transformer model using the OptimizedDistance module by replacing the call to Distance() with OptimizedDistance() in torchmd-net/torchmdnet/models/torchmd_et.py. I want to train on a system with periodic boundary conditions. However, when I try running the training, I get the following traceback:
Traceback (most recent call last):
  File "/home/frankhu/torchmd-net/torchmdnet/scripts/train.py", line 189, in <module>
    main()
  File "/home/frankhu/torchmd-net/torchmdnet/scripts/train.py", line 137, in main
    model = LNNP(args, prior_model=prior_models, mean=data.mean, std=data.std)
  File "/home/frankhu/torchmd-net/torchmdnet/module.py", line 29, in __init__
    self.model = create_model(self.hparams, prior_model, mean, std)
  File "/home/frankhu/torchmd-net/torchmdnet/models/model.py", line 70, in create_model
    representation_model = TorchMD_ET(
  File "/home/frankhu/torchmd-net/torchmdnet/models/torchmd_et.py", line 118, in __init__
    self.distance = OptimizedDistance(
  File "/home/frankhu/torchmd-net/torchmdnet/models/utils.py", line 199, in __init__
    from torchmdnet.neighbors import get_neighbor_pairs_kernel
  File "/home/frankhu/torchmd-net/torchmdnet/neighbors/__init__.py", line 15, in <module>
    compile_extension()
  File "/home/frankhu/torchmd-net/torchmdnet/neighbors/__init__.py", line 11, in compile_extension
    cpp_extension.load(
  File "/home/frankhu/mambaforge/envs/torchmd-net/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1269, in load
    return _jit_compile(
  File "/home/frankhu/mambaforge/envs/torchmd-net/lib/python3.10/site-packages/torch/utils/cpp_extension.py", line 1453, in _jit_compile
    version = JIT_EXTENSION_VERSIONER.bump_version_if_changed(
  File "/home/frankhu/mambaforge/envs/torchmd-net/lib/python3.10/site-packages/torch/utils/_cpp_extension_versioner.py", line 45, in bump_version_if_changed
    hash_value = hash_source_files(hash_value, source_files)
  File "/home/frankhu/mambaforge/envs/torchmd-net/lib/python3.10/site-packages/torch/utils/_cpp_extension_versioner.py", line 15, in hash_source_files
    with open(filename) as file:
FileNotFoundError: [Errno 2] No such file or directory: '/home/frankhu/torchmd-net/torchmdnet/neighbors/backwards.cu'
I saw in a previous commit that this file was removed, but it seems like the model cannot proceed with training without it. For reference, here is the change I made within torchmd_et.py:
self.distance = OptimizedDistance(
    cutoff_lower,
    cutoff_upper,
    max_num_pairs=-max_num_neighbors,
    return_vecs=False,
    loop=False,
    strategy='brute',
    include_transpose=True,
    resize_to_fit=False,
    check_errors=False,
    box=torch.diag(torch.tensor(pbc_box)),
)
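Since the traceback fails while the extension loader is opening its source files, one quick diagnostic is to check which of the expected files are actually present in the checkout. The sketch below is only an illustration: `find_missing_sources` is a hypothetical helper, not part of torchmd-net, and the only file name known from the traceback is `backwards.cu`; any other names would have to be copied from the `sources` list in `torchmdnet/neighbors/__init__.py`.

```python
from pathlib import Path


def find_missing_sources(src_dir, filenames):
    """Return the entries of `filenames` that are not regular files under `src_dir`."""
    src_dir = Path(src_dir)
    return [name for name in filenames if not (src_dir / name).is_file()]


if __name__ == "__main__":
    # Directory taken from the traceback; adjust to your own checkout.
    neighbors_dir = "/home/frankhu/torchmd-net/torchmdnet/neighbors"
    # Only backwards.cu is confirmed by the traceback. Extend this list
    # (assumption) with the names the loader passes to cpp_extension.load.
    expected = ["backwards.cu"]
    print("Missing source files:", find_missing_sources(neighbors_dir, expected))
```

If a file is reported missing while `__init__.py` still lists it, the loader and the source tree are likely out of sync (for example, a partially updated checkout), which would be consistent with the error above.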
I am running on one Nvidia H100 GPU. Any help/clarification would be greatly appreciated.
Thank you!