Problem Description
When using vLLM as the local inference backend in GraphGen, the Ray Actor is created with `lifetime="detached"` to ensure stable execution.
However, after the task finishes, the vLLM-related processes continue to occupy GPU memory and cannot be reclaimed by Ray, even after explicitly shutting Ray down.
The only reliable way to release the GPU resources is to kill the processes manually with `kill -9`.
Ray Actor Lifecycle Configuration
In `LLMFactory.create_llm`, the Ray actor is explicitly created as a detached actor (`init_llm.py:135–143`):
```python
actor = (
    ray.remote(LLMServiceActor)
    .options(
        name=actor_name,
        num_gpus=num_gpus,
        lifetime="detached",  # critical configuration
        get_if_exists=True,
    )
    .remote(backend, config)
)
```
Observed Behavior
- After the task completes, calling `ray.shutdown()` or `ray stop`
- vLLM GPU worker processes continue running
- From `nvidia-smi` and `ps -ef`:
  - vLLM subprocesses do not exit
  - Their parent process becomes PID 1 (`systemd`), i.e. the workers are orphaned
This indicates that vLLM never receives a shutdown signal, and Ray does not reclaim subprocesses spawned by detached actors.
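The orphaning can be checked programmatically. Below is a small diagnostic sketch (the helper name and the column parsing are mine, not GraphGen code) that scans `ps -ef` output for vLLM processes whose parent has become PID 1:

```python
import subprocess


def orphaned_vllm_pids(ps_ef_output: str) -> list[int]:
    """Return PIDs of vLLM processes whose parent is PID 1.

    `ps -ef` columns: UID PID PPID C STIME TTY TIME CMD,
    so splitting into at most 8 fields keeps the full command intact.
    """
    pids = []
    for line in ps_ef_output.splitlines():
        fields = line.split(None, 7)
        if len(fields) < 8:
            continue
        _uid, pid, ppid, _c, _stime, _tty, _time, cmd = fields
        if ppid == "1" and "vllm" in cmd.lower():
            pids.append(int(pid))
    return pids


if __name__ == "__main__":
    out = subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout
    print(orphaned_vllm_pids(out))  # non-empty after the leak reproduces
```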
Reproduction Steps
- Run a GraphGen task using `vllm` as the backend
- Run `nvidia-smi` after the task completes
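The post-task check can be sketched as the two commands below (the `--query-compute-apps` fields are standard `nvidia-smi` query options; the `vllm` pattern is an assumption and should be adjusted to the actual worker process names):

```shell
# Show compute processes still holding GPU memory (skipped if nvidia-smi is absent)
command -v nvidia-smi >/dev/null && \
  nvidia-smi --query-compute-apps=pid,process_name,used_memory --format=csv,noheader || true

# List vLLM workers reparented to PID 1 (PPID is column 3 of `ps -ef`)
ps -ef | awk '$3 == 1 && tolower($0) ~ /vllm/ {print $2}'
```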
Expected Result
✅ vLLM processes exit and GPU memory is released
Actual Result
❌ vLLM processes remain alive and GPU memory is not released
Attempted Workarounds (All Ineffective)
- ❌ Removing `lifetime="detached"`
  → Causes runtime crashes; the system becomes unusable
- ❌ `ray.shutdown()` / `ray stop`
  → No effect on vLLM processes
- ✅ Only `kill -9 <pid>` reliably releases GPU memory
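One cleanup that might sit between `ray.shutdown()` (no effect) and `kill -9` (works, but blunt) is Ray's documented teardown for detached actors: `ray.kill(handle, no_restart=True)`, since a detached actor is never reclaimed by driver shutdown and must be destroyed explicitly. The helper below is only a sketch of that idea; the `get_actor`/`kill` parameters are injectable purely so the logic can be exercised without a running cluster, and whether vLLM's own subprocesses then exit cleanly is exactly what this issue puts in question.

```python
def kill_detached_actor(name: str, *, get_actor=None, kill=None) -> bool:
    """Explicitly destroy the named detached actor.

    A detached actor outlives the driver, so ray.shutdown() never
    reclaims it; ray.kill(handle, no_restart=True) is Ray's documented
    way to remove it. get_actor/kill default to ray.get_actor/ray.kill
    and are injectable for testing without a cluster.
    """
    if get_actor is None or kill is None:
        import ray  # imported lazily; only needed for the real defaults
        get_actor = get_actor or ray.get_actor
        kill = kill or ray.kill
    try:
        handle = get_actor(name)   # raises ValueError if no such actor
    except ValueError:
        return False
    kill(handle, no_restart=True)  # tear down the actor; do not restart it
    return True
```

In GraphGen this would be called with the same `actor_name` used in `LLMFactory.create_llm` before the driver exits; it still needs verifying whether the vLLM subprocesses follow the actor down or orphan regardless.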