forked from vllm-project/vllm
Closed
Description
Your current environment
The output of vllm python collect_env.py
vllm commit link: c1378b8
The output of vllm-ascend python collect_env.py
vllm-ascend commit link: f78db0894660f3e64afb29b204aeb204806ffe08
The output of llm-service
commit link: 5c37e8dbc71bfefd0c0fc2e00cca219221000e21

🐛 Describe the bug
Start the servers with the following arguments to reproduce the error:
1E1PD

    e_server_args = [
        "--model", model,
        "--gpu-memory-utilization", "0.0",
        "--tensor-parallel-size", "1",
        "--enforce-eager",
        "--no-enable-prefix-caching",
        "--max-model-len", "20000",
        "--max-num-batched-tokens", "20000",
        "--max-num-seqs", "1",
        "--ec-transfer-config",
        '{"ec_connector_extra_config":{"shared_storage_path":"' +
        SHARED_STORAGE_PATH +
        '"},"ec_connector":"ECSharedStorageConnector","ec_role": "ec_producer"}'
    ]

    pd_server_args = [
        "--model", model,
        "--gpu-memory-utilization", "0.9",
        "--tensor-parallel-size", "4",
        "--enforce-eager",
        "--max-model-len", "20000",
        "--max-num-batched-tokens", "20000",
        "--max-num-seqs", "128",
        "--ec-transfer-config",
        '{"ec_connector_extra_config":{"shared_storage_path":"' +
        SHARED_STORAGE_PATH +
        '"},"ec_connector":"ECSharedStorageConnector","ec_role": "ec_consumer"}'
    ]

Error output:
RuntimeError: Gloo connnectFullMesh failed with [/pytorch/third_party/gloo/gloo/transport/tcp/pair.cc:144] no error

Before submitting a new issue...
- Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
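
A side note on the reproduction arguments: the `--ec-transfer-config` values are assembled by string concatenation, which produces invalid JSON if the storage path contains a quote or backslash. The sketch below shows one way to build the same two values with `json.dumps`. The helper name `ec_transfer_config` and the path `/tmp/ec_cache` are placeholders introduced for illustration; the issue does not give the real value of `SHARED_STORAGE_PATH`. This only tidies how the argument is built and is not presented as a fix for the Gloo error.

```python
import json

# Placeholder value (assumption): the issue does not state the real path.
SHARED_STORAGE_PATH = "/tmp/ec_cache"


def ec_transfer_config(role: str, shared_storage_path: str) -> str:
    """Serialize an --ec-transfer-config value for the given role.

    Hypothetical helper; the keys and connector name are copied from the
    reproduction arguments in this issue.
    """
    return json.dumps(
        {
            "ec_connector_extra_config": {
                "shared_storage_path": shared_storage_path,
            },
            "ec_connector": "ECSharedStorageConnector",
            "ec_role": role,
        }
    )


# One value per server, matching the roles used in the issue.
producer_cfg = ec_transfer_config("ec_producer", SHARED_STORAGE_PATH)
consumer_cfg = ec_transfer_config("ec_consumer", SHARED_STORAGE_PATH)

# Each string can replace the concatenated literal after "--ec-transfer-config".
e_config_args = ["--ec-transfer-config", producer_cfg]
pd_config_args = ["--ec-transfer-config", consumer_cfg]
```

Because the value is serialized from a dictionary, it parses back into the same structure whatever characters the path contains.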