Install TDW on a remote Linux server problem: cannot communicate between 2 shells #510
Comments
You need to set the display value. Let me know if that works.
Thanks for your advice, I did set it this way, but it doesn't work, unfortunately. What I did is: first run my_controller.py in a CPU-only shell and wait for it to communicate with the second shell; the second shell is an interactive node with 4 GPUs, so I launched `DISPLAY=:<display number> ./TDW.x86_64 -port=1071` there to start the rendering. The problem is that the first shell printed nothing and got stuck after running my_controller.py, and in the second shell the TDW window launched and quickly disappeared. By the way, if I run TDW.x86_64 in the CPU node shell and my_controller.py in the GPU node shell, the situation is similar. These two shells get random, distinct DISPLAY numbers generated by the HPC every time; should I manually set them to the same value? It also seems that no communication is established between the controller and the build in the first and second shells. May I ask if you have any idea?
@LuZeking You need to set the display value in the shell that is running TDW.x86_64. Are the shells on two different machines? If so, you need to set the expected IP address:
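For reference, here is a minimal controller-side sketch of that setup (assuming the standard `tdw` pip package; the actual contents of my_controller.py are not shown in this thread). The key points are that the controller must use the same port the build is started with, and must not try to launch a local build of its own:

```python
# Minimal controller sketch (assumes the tdw pip package is installed).
# launch_build=False tells the controller not to start a local build; it will
# instead wait for a build launched manually in another shell, e.g.:
#   DISPLAY=:<display number> ./TDW.x86_64 -port=1071
from tdw.controller import Controller
from tdw.tdw_utils import TDWUtils

c = Controller(port=1071, launch_build=False)

# The controller blocks (here or in the constructor, depending on the TDW
# version) until the build connects on port 1071, so the script sits silently
# until the build is up and reachable.
resp = c.communicate(TDWUtils.create_empty_room(12, 12))
print("Build connected; received", len(resp), "byte array(s).")
c.communicate({"$type": "terminate"})
```

If the build is started with a different `-port`, or the DISPLAY it is given has no working X server behind it, the controller will wait indefinitely, which matches the "prints nothing and gets stuck" behaviour described above.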
Hi @alters-mit, thanks for your suggestions, but unfortunately it still doesn't work after following the above instructions (e.g. `export DISPLAY=:0.0`). To better illustrate the problem, as shown in the following figure, I use a VNC server on the remote HPC. Here we have three shells: the top left is the shell for my_controller.py, the top right is the shell for the build simulation, and the bottom left one just shows the content of the my_controller.py example. They all run on the remote HPC. When we run the controller in the top left first and then run the build in the top right, we can see that the controller and the build are not successfully connected.
@LuZeking Please send your xorg config file and the Player log.
Thanks for your reply. The contents of /etc/X11/xorg-4gpus.conf and the Player log are shown below.

xorg-4gpus.conf:

```
nvidia-xconfig: version 1.0 (buildmeister@builder58)  Fri Apr 17 00:40:10 PDT 2009

Section "DRI"

Section "InputDevice"
EndSection

Section "InputDevice"

Section "Monitor"

Section "Device"
    Option "UseDisplayDevice" "none"
EndSection

Section "Device"
    Option "UseDisplayDevice" "none"
EndSection

Section "Device"
    Option "UseDisplayDevice" "none"
EndSection

Section "Device"
    Option "UseDisplayDevice" "none"
EndSection

Section "Screen"

Section "Screen"

Section "Screen"

Section "Screen"

Section "ServerLayout"
```

And the Player.log is shown below:
Sorry, the Player.log shown above was on CPU; if it runs on GPU, it looks like this:

```
Mono path[0] = '/home/hpczeji1/tdw_build/TDW/TDW_Data/Managed'
Setting up 64 worker threads for Enlighten.
```
Sorry for the messy report above, and thanks for your patience in reading this far. Now I have a big update: interestingly, when I launch these shells with an X server from WSL2 or VS Code it fails, but it works when I switch to Cygwin. However, I have now run into another problem: the simulation is too slow, and it seems that it didn't recognize the GPU cards, judging by "ALSA lib confmisc.c:855:(parse_card) cannot find card '0'". I used 4 A100 GPUs to run a simple UR5 controller, which somehow takes longer than 4.5 minutes. May I ask if you know how to solve this? PS: the xorg-4gpus.conf is controlled by the administrators, and I am not sure I could persuade them to change it. The Player log from running the UR5 example controller is shown below:
`Setting up 64 worker threads for Enlighten.`
If you're using Windows 10, that's to be expected: WSL2 in Windows 10 doesn't support UI applications.

If you're using a Linux server, the most likely problem is that your DISPLAY variable isn't correct. The slowdown could be caused by obsolete GPU drivers, by someone else currently using the GPU, or both. Please send the output of
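As a quick way to gather that kind of information, a small check like the following could be run in the shell that will launch TDW.x86_64 (a sketch only; it assumes `nvidia-smi` is on the PATH and is not specific to TDW):

```python
# Quick environment check for the shell that will run TDW.x86_64.
# Assumption: nvidia-smi is installed and on PATH (it ships with the NVIDIA driver).
import os
import subprocess

print("DISPLAY =", os.environ.get("DISPLAY", "<not set>"))

# nvidia-smi prints the driver version, the GPU models, and who is using them.
try:
    subprocess.run(["nvidia-smi"], check=True)
except FileNotFoundError:
    print("nvidia-smi not found; the NVIDIA driver may not be installed on this node.")
```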
If you're running the build on a server, it shouldn't matter whether you're using Cygwin. I took a closer look at your Player log and it looks like you're using Mesa drivers, which is not optimal. If possible, replace them with the latest proprietary NVIDIA drivers (if you're using Ubuntu and have the required permissions, there's a PPA and it's pretty easy to do).
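One way to confirm which driver a given display actually renders with is to query its OpenGL renderer string (a sketch, assuming `glxinfo` from the mesa-utils package is installed; an `llvmpipe`/Mesa software renderer here would explain both the slowness and the idle GPUs):

```python
# Print the OpenGL renderer/version for the display that TDW.x86_64 will use.
# Assumptions: glxinfo (mesa-utils) is installed; ":0.0" is the display in question.
import os
import subprocess

env = dict(os.environ, DISPLAY=":0.0")
out = subprocess.run(["glxinfo"], env=env, capture_output=True, text=True).stdout
for line in out.splitlines():
    if line.startswith("OpenGL renderer") or line.startswith("OpenGL version"):
        print(line)
```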
Hello everyone, I recently came across this issue and I'm wondering if anything has happened in the last two years that would facilitate running TDW on a |
Hi,
Thanks for your information before. This time I am testing on a remote HPC with X11 servers, following the document that explains how to run TDW on a virtual display (which still requires X), specifically the "Install TDW on a remote Linux server" instructions.
But the command `DISPLAY=:0.0 ./TDW.x86_64 -port=1071` just gets stuck there without any response.
So I ran `echo $DISPLAY` and found that my display number is 113.0,
so I ran `DISPLAY=:113.0 ./TDW.x86_64 -port=1071` again, and then the TDW program launched on my Windows laptop.
But the problem is: it closed quickly and was never connected to the other shell where I run my_controller.py (the display number for that shell is 103.0), which is still just stuck there and prints nothing.
Do you have any idea about this? Thanks:)
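A note on the changing display numbers: the DISPLAY value only affects where the build renders, while the controller and the build find each other purely through the TCP port, so the two shells do not need the same DISPLAY, only the same `-port`. One hedged way to avoid juggling two shells is to launch the build from the controller's own session, as in the sketch below (the binary path, port, and display handling are assumptions based on this thread and would need adjusting for the HPC setup):

```python
# Launch the build from the same session as the controller so the port always
# matches. Assumptions: TDW.x86_64 is in the current directory, port 1071 is
# free, and the session's own DISPLAY points at a working X server.
import os
import subprocess

from tdw.controller import Controller

display = os.environ.get("DISPLAY", ":0.0")
build = subprocess.Popen(
    ["./TDW.x86_64", "-port=1071"],
    env=dict(os.environ, DISPLAY=display),
)

# The controller waits until the build connects on the same port.
c = Controller(port=1071, launch_build=False)
c.communicate({"$type": "terminate"})
build.wait()
```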