
vLLM Processes Launched by Detached Ray Actors Are Not Reclaimed After Shutdown #130

@mdjhacker

Description


Problem Description

When using vLLM as the local inference backend in GraphGen, the Ray actor is created with lifetime="detached" to ensure stable execution.

However, after the task finishes, vLLM-related processes continue to occupy GPU memory, and cannot be reclaimed by Ray, even after explicitly shutting down Ray.
The only reliable way to release GPU resources is to manually kill the process using kill -9.


Ray Actor Lifecycle Configuration

In LLMFactory.create_llm, the Ray actor is explicitly created as a detached actor (init_llm.py:135–143):

actor = (
    ray.remote(LLMServiceActor)
    .options(
        name=actor_name,
        num_gpus=num_gpus,
        lifetime="detached",  # critical configuration
        get_if_exists=True,
    )
    .remote(backend, config)
)

Observed Behavior

  • After the task completes, calling ray.shutdown() (or running ray stop) has no effect: vLLM GPU worker processes continue running
  • From nvidia-smi and ps -ef:

    • vLLM subprocesses do not exit
    • Their parent process becomes PID 1 (systemd), i.e. they are orphaned

This indicates that vLLM never receives a shutdown signal, and Ray does not reclaim subprocesses spawned by detached actors.
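This matches Ray's documented semantics: a detached actor is decoupled from the job that created it, so ray.shutdown() does not destroy it; it must be terminated explicitly with ray.kill(). A minimal cleanup sketch follows. The helper name, the status strings, and the lazy-import guard are assumptions for illustration, not GraphGen code, and note that ray.kill() force-terminates the actor without running its exit handlers, so vLLM may still need an explicit user-defined shutdown RPC first:

```python
def teardown_llm_actor(actor_name: str) -> str:
    """Best-effort teardown of a named detached actor; returns a status string."""
    try:
        import ray  # imported lazily so this helper loads even without Ray installed
    except ImportError:
        return "ray-not-installed"
    try:
        # Look up the detached actor by the name it was registered under
        actor = ray.get_actor(actor_name)
    except ValueError:
        return "not-found"   # actor already gone
    except Exception:
        return "error"       # e.g. no running Ray cluster to connect to
    # Force-terminate the detached actor; Ray then releases its GPU bundle
    ray.kill(actor, no_restart=True)
    return "killed"
```

Calling this after the task completes (with the same actor_name passed to .options()) should let Ray reclaim the actor's resources, assuming vLLM's own subprocesses exit when the actor dies.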


Reproduction Steps

  1. Run a GraphGen task using vllm as the backend

  2. After the task completes and Ray has been shut down, run:

    nvidia-smi

Expected Result
✅ vLLM processes exit and GPU memory is released

Actual Result
❌ vLLM processes remain alive and GPU memory is not released


Attempted Workarounds

  • ❌ Removing lifetime="detached"
    → Causes runtime crashes; the system becomes unusable

  • ❌ ray.shutdown() / ray stop
    → No effect on vLLM processes

  • ✅ Only kill -9 <pid> reliably releases GPU memory
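As a stopgap, the manual kill -9 workaround can be scripted by looking for vLLM processes that have been re-parented to PID 1. A hedged sketch (not part of GraphGen; the function name and field parsing are assumptions based on standard ps -ef output):

```python
import os
import signal

def find_orphaned_vllm_pids(ps_output: str) -> list[int]:
    """Parse `ps -ef` output and return PIDs of vLLM processes whose
    parent is PID 1, i.e. processes orphaned after Ray exited."""
    pids = []
    for line in ps_output.splitlines()[1:]:   # skip the header row
        fields = line.split(None, 7)          # UID PID PPID C STIME TTY TIME CMD
        if len(fields) < 8:
            continue
        pid, ppid, cmd = fields[1], fields[2], fields[7]
        if ppid == "1" and "vllm" in cmd.lower():
            pids.append(int(pid))
    return pids

# Usage -- destructive, equivalent to the manual `kill -9` workaround:
#   import subprocess
#   out = subprocess.run(["ps", "-ef"], capture_output=True, text=True).stdout
#   for pid in find_orphaned_vllm_pids(out):
#       os.kill(pid, signal.SIGKILL)
```

This only automates the symptom; the underlying issue remains that the detached actor's vLLM subprocesses never receive a shutdown signal.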
