Summary
Running tutorial_notebooks/rag-contexteng/rf-tutorial-rag-fiqa.ipynb locally (Linux, non-Colab) fails at experiment.run_evals(...) with:
RayTaskError(RuntimeError)
Failed to initialize pipeline: RuntimeError: Engine core initialization failed. See root cause above. Failed core proc(s): {}
The top-level error is not actionable and hides the real root cause.
Repro
- Open and run tutorial_notebooks/rag-contexteng/rf-tutorial-rag-fiqa.ipynb locally.
- Execute:
results = experiment.run_evals(
    config_group=config_group,
    dataset=fiqa_dataset,
    num_actors=1,
    num_shards=4,
    seed=42,
)
- Observe actor init failure with generic engine-core message.
Actual Root Cause (from Ray worker logs)
In Ray worker stderr, the underlying failure is:
fatal error: Python.h: No such file or directory
and then Triton/vLLM fails while compiling runtime CUDA helper code:
CalledProcessError ... /usr/bin/gcc ... -I/usr/include/python3.12 ... returned non-zero exit status 1
The machine was missing the Python dev headers package (python3.12-dev / python3-dev).
Why this happens
vllm + torch._inductor + Triton can compile native/CUDA helper modules at runtime. On Linux, this requires the Python C headers (Python.h) and a compiler toolchain. If the headers are missing, the compile step fails and model engine init fails with it.
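The failing step can be approximated outside vLLM with a small probe. This is only a sketch, not the command Triton actually runs: it assumes gcc or cc is the compiler in use and that the relevant include directory is the one reported by sysconfig. The function name probe_python_header_compile is made up for illustration.

```python
import os
import shutil
import subprocess
import sysconfig
import tempfile


def probe_python_header_compile():
    """Try to syntax-check a C file that includes Python.h.

    Returns (ok, detail). This only approximates the runtime compile
    step; it is not the exact command issued by Triton or vLLM.
    """
    compiler = shutil.which("gcc") or shutil.which("cc")
    if compiler is None:
        return False, "no C compiler (gcc/cc) found on PATH"

    # Directory where this interpreter expects its C headers to live.
    include_dir = sysconfig.get_paths()["include"]

    with tempfile.TemporaryDirectory() as tmp:
        source = os.path.join(tmp, "probe.c")
        with open(source, "w") as f:
            f.write("#include <Python.h>\nint main(void) { return 0; }\n")
        # -fsyntax-only parses the file without producing an object file.
        result = subprocess.run(
            [compiler, "-fsyntax-only", f"-I{include_dir}", source],
            capture_output=True,
            text=True,
        )
    return result.returncode == 0, result.stderr.strip()


ok, detail = probe_python_header_compile()
print("ok" if ok else f"failed: {detail}")
```

On a machine without the dev headers, the detail string should contain the same "Python.h: No such file or directory" message seen in the Ray worker stderr.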
Confirmed Fix
Installing Python dev headers resolved the issue:
sudo apt-get update
sudo apt-get install -y python3.12-dev
(Equivalent distro package names apply, e.g. python3-dev.)
Requested Improvements
- Preflight dependency checks before actor/model init in evals mode:
  - verify Python.h exists (sysconfig.get_paths()["include"]/Python.h)
  - verify compiler exists (gcc/cc)
  - fail fast with actionable install instructions.
- Improve surfaced error message in QueryProcessingActor.initialize_for_pipeline path:
  - include root cause snippet from worker stderr (not only generic "Engine core initialization failed").
- Docs update for local/tutorial setup (non-Colab):
  - Linux prerequisites should explicitly include Python dev headers and build essentials.
- Add CI smoke test for missing headers scenario:
  - assert user-facing error is clear and actionable.
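A rough sketch of what the first two items could look like, in case it helps the discussion. Every name here (find_preflight_problems, run_preflight, extract_root_cause, ROOT_CAUSE_PATTERNS) is hypothetical and not part of the existing codebase, and the install hints assume a Debian/Ubuntu-style system.

```python
import os
import re
import shutil
import sysconfig

# Hypothetical patterns for lines worth surfacing from worker stderr.
ROOT_CAUSE_PATTERNS = [
    re.compile(r"fatal error: .*"),
    re.compile(r"CalledProcessError.*"),
]


def find_preflight_problems():
    """Return a list of human-readable problems; empty if all checks pass."""
    problems = []

    header = os.path.join(sysconfig.get_paths()["include"], "Python.h")
    if not os.path.isfile(header):
        problems.append(
            f"Python.h not found at {header}. Install the Python dev headers, "
            "e.g. 'sudo apt-get install -y python3-dev'."
        )

    if not any(shutil.which(name) for name in ("gcc", "cc")):
        problems.append(
            "No C compiler (gcc/cc) found on PATH. Install a toolchain, "
            "e.g. 'sudo apt-get install -y build-essential'."
        )
    return problems


def run_preflight():
    """Fail fast with install instructions before any actor/model init."""
    problems = find_preflight_problems()
    if problems:
        details = "\n".join(f"- {p}" for p in problems)
        raise RuntimeError("Preflight check failed:\n" + details)


def extract_root_cause(stderr_text):
    """Return the first stderr fragment matching a known pattern, else None."""
    for line in stderr_text.splitlines():
        for pattern in ROOT_CAUSE_PATTERNS:
            match = pattern.search(line)
            if match:
                return match.group(0).strip()
    return None
```

run_preflight() would be called before actor creation in evals mode, and the result of extract_root_cause(...) could be appended to the "Failed to initialize pipeline" message when it is not None. How the worker stderr gets collected is left open here.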
Environment
- OS: Linux
- Python: 3.12
- vLLM: 0.10.2
- GPU: NVIDIA L4
User Impact
This is a first-run blocker for local users and can be mistaken for GPU/vLLM incompatibility. Better guardrails and messaging would significantly improve onboarding and reduce support load.