Problem Description
When using vLLM as the local inference backend in GraphGen, the Ray Actor is created with `lifetime="detached"` to ensure stable execution.
However, after the task finishes, the vLLM-related processes continue to occupy GPU memory and cannot be reclaimed by Ray, even after explicitly shutting Ray down.
The only reliable way to release the GPU resources is to kill the processes manually with `kill -9`.
Ray Actor Lifecycle Configuration
In `LLMFactory.create_llm`, the Ray actor is explicitly created as a detached actor (`init_llm.py:135–143`):
```python
actor = (
    ray.remote(LLMServiceActor)
    .options(
        name=actor_name,
        num_gpus=num_gpus,
        lifetime="detached",  # critical configuration
        get_if_exists=True,
    )
    .remote(backend, config)
)
```
Observed Behavior
- After the task completes, calling `ray.shutdown()` or `ray stop`
- vLLM GPU worker processes continue running
- From `nvidia-smi` and `ps -ef`:
  - vLLM subprocesses do not exit
  - Their parent process becomes PID 1 (`systemd`), i.e. the workers are orphaned
This indicates that vLLM never receives a shutdown signal, and Ray does not reclaim subprocesses spawned by detached actors.
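The orphaning can be checked programmatically. Below is a small diagnostic sketch (the helper name and the column parsing are mine, not GraphGen code) that scans `ps -ef` output for vLLM processes whose parent has become PID 1:

```python
import subprocess


def orphaned_vllm_pids(ps_ef_output: str) -> list[int]:
    """Return PIDs of vLLM processes whose parent is PID 1.

    `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD,
    so splitting into at most 8 fields keeps the full command intact.
    """
    pids = []
    for line in ps_ef_output.splitlines():
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        _uid, pid, ppid, _c, _stime, _tty, _time, cmd = fields
        if ppid == "1" and "vllm" in cmd.lower():
            pids.append(int(pid))
    return pids


if __name__ == "__main__":
    out = subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout
    print(orphaned_vllm_pids(out))  # non-empty after the leak reproduces
```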
Reproduction Steps
- Run a GraphGen task using `vllm` as the backend
- Run `nvidia-smi` after the task completes
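The post-task check can be sketched as the two commands below (the `--query-compute-apps` fields are standard `nvidia-smi` query options; the `vllm` pattern is an assumption and should be adjusted to the actual worker process names):

```shell
# Show compute processes still holding GPU memory (skipped if nvidia-smi is absent)
command -v nvidia-smi >/dev/null && \
  nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader || true

# List vLLM workers reparented to PID 1 (PPID is column 3 of `ps -ef`)
ps -ef | awk '$3 == 1 && tolower($0) ~ /vllm/ {print $2}'
```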
Expected Result
✅ vLLM processes exit and GPU memory is released
Actual Result
❌ vLLM processes remain alive and GPU memory is not released
Attempted Workarounds (All Ineffective)
- ❌ Removing `lifetime="detached"`
  → Causes runtime crashes; the system becomes unusable
- ❌ `ray.shutdown()` / `ray stop`
  → No effect on vLLM processes
- ✅ Only `kill -9 <pid>` reliably releases GPU memory
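One cleanup that might sit between `ray.shutdown()` (no effect) and `kill -9` (works, but blunt) is Ray's documented teardown for detached actors: `ray.kill(handle, no_restart=True)`, since a detached actor is never reclaimed by driver shutdown and must be destroyed explicitly. The helper below is only a sketch of that idea; the `get_actor`/`kill` parameters are injectable purely so the logic can be exercised without a running cluster, and whether vLLM's own subprocesses then exit cleanly is exactly what this issue puts in question.

```python
def kill_detached_actor(name: str, *, get_actor=None, kill=None) -> bool:
    """Explicitly destroy the named detached actor.

    A detached actor outlives the driver, so ray.shutdown() never
    reclaims it; ray.kill(handle, no_restart=True) is Ray's documented
    way to remove it. get_actor/kill default to ray.get_actor/ray.kill
    and are injectable for testing without a cluster.
    """
    if get_actor is None or kill is None:
        import ray  # imported lazily; only needed for the real defaults
        get_actor = get_actor or ray.get_actor
        kill = kill or ray.kill
    try:
        handle = get_actor(name)   # raises ValueError if no such actor
    except ValueError:
        return False
    kill(handle, no_restart=True)  # tear down the actor; do not restart it
    return True
```

In GraphGen this would be called with the same `actor_name` used in `LLMFactory.create_llm` before the driver exits; it still needs verifying whether the vLLM subprocesses follow the actor down or orphan regardless.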