Ability to use different GPU in cluster in seggpt.py #4

Open
natehaddon opened this issue Oct 13, 2023 · 2 comments
@natehaddon

Currently the code is hard-coded to use device 0 via device = ("cuda" if torch.cuda.is_available() else "cpu"). It would be a great enhancement to be able to set an environment variable with the ID of the GPU to use when multiple GPUs are available.

I attempted to set CUDA_VISIBLE_DEVICES=1 to see if it would use the next GPU, but it did not.

So maybe something like this: in a terminal, the user could set:

    $ export CUDA_DEVICE=1

And in seggpt.py (with import os at the top):

    if os.getenv("CUDA_DEVICE"):
        device = torch.device(f"cuda:{os.getenv('CUDA_DEVICE')}" if torch.cuda.is_available() else "cpu")
    else:
        device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
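As a self-contained sketch of that selection logic: select_device below is a hypothetical helper, and the cuda_available flag stands in for torch.cuda.is_available() so the behavior can be checked without a GPU (or torch) installed.

```python
import os

# Hypothetical helper mirroring the proposed logic; `cuda_available`
# replaces torch.cuda.is_available() so this runs without torch or a GPU.
def select_device(cuda_available: bool) -> str:
    if os.getenv("CUDA_DEVICE"):
        return f"cuda:{os.getenv('CUDA_DEVICE')}" if cuda_available else "cpu"
    return "cuda" if cuda_available else "cpu"

os.environ["CUDA_DEVICE"] = "1"
print(select_device(cuda_available=True))   # cuda:1
print(select_device(cuda_available=False))  # cpu
```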

@yeldarby

That looks like a reasonable solution to me. I might simplify it to a one-liner (using the default param of os.getenv as the fallback):

device = torch.device(os.getenv("CUDA_DEVICE_OVERRIDE", "cuda") if torch.cuda.is_available() else "cpu")

And then export CUDA_DEVICE_OVERRIDE=cuda:1 to set it.
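The one-liner behaves the same way; a sketch under the same assumptions (hypothetical select_device helper, cuda_available flag in place of torch.cuda.is_available()):

```python
import os

# Hypothetical helper mirroring the one-liner; note the env var now holds
# a full device string such as "cuda:1", falling back to "cuda" when unset.
def select_device(cuda_available: bool) -> str:
    return os.getenv("CUDA_DEVICE_OVERRIDE", "cuda") if cuda_available else "cpu"

os.environ["CUDA_DEVICE_OVERRIDE"] = "cuda:1"
print(select_device(cuda_available=True))   # cuda:1
print(select_device(cuda_available=False))  # cpu
```

One difference from the first proposal: the override carries the full device string ("cuda:1") rather than a bare index.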

Could you submit a PR after ensuring that it works?

Not sure why CUDA_VISIBLE_DEVICES isn't working; possibly related to this, depending on your torch version -- though it looks to have been fixed for quite a while.

@yeldarby yeldarby self-assigned this Oct 13, 2023
@natehaddon

Thank you. I will submit a PR after testing the new code. I didn't try setting the environment variable within the script, and there is a possibility it has to do with the torch version. I figure having this option in the seggpt.py script would make for easier implementation across different setups.
