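[Bug]: Data system long-term stability test, 1E1P1D (proxy/E/P/D across machines) and single-machine 1E1P1D, Qwen2.5-VL-7B-Instruct, 0.84 QPS, IPv4, simulated ByteDance dataset, prefix caching enabled, random scheduling policy: after 20 min of the stability run, encode reports "Failed to put bytes_list for keys xxxxxxx with error"; after 40 min, the proxy reports "Request xxxxx timed out after 300s without worker response." #191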

@zhumingjue138

Description

Your current environment

vllm: v0.11.0rc4-EPD
vllm-ascend: v0.11.0rc4-EPD-post1
lm-service: v0.11.0rc4-EPD

🐛 Describe the bug

Encode error
[PROXY] : INFO: 127.0.0.1:41716 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys 13d1add2937853125baffb2cf2ff613d20c50ad11e62c1641739867f7ebde493 with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys 296129f9d443f44a4f86569f20a91c05b213af9bdf5ce26bc403daa4592fa9f4 with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys ea40d5862cc66cbb929efbb70d9e068babe54bc4cffb9b4ea165b246a141014b with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys 7f6076b75c5797a22bf554733b2ff119a535936491f1ee11a080e3250f7b2c16 with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys dcf5d669ca345e102249690568b65c5cecb442bb3f023e120f30b91337e48c1d with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys 6d83a1aeff02e31ce51c687419994a34f5c08c100421fee0c210d1294e7cd493 with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys 03eb5b1a1db1d746378d0aec20de64f79f8aee746f8e023ebc4f64c3a005f0c6 with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys 276bb8fab730807a3a63526cdc26c7785f9fd23e4029c71cb7040ae9ac80efd2 with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys 54d456cdc06d138acb612be9151ea3005d619cb28a48f9baa9e17f1b89112b90 with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys 6f44f28279b845c21b1a5a3b2c2a7e33ba84fe6b47057c3de290d8a67caed76c with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys e2ce010fd47b664ccaec721c740f29c1d3c514896806a89d7ff4e596653152db with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys df2d0a986c33477b4c6912db52ce7278a4969f687f8bb354b3728b192d5b7412 with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys a941256479369a674ef4f86c98eb1eff26b51fcfff009cd26e86e067bcd22abd with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys f267f1fa96e8e7e04f5c6ca81ce4f73c19310bf39aaeb04c9222dad64534761c with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys cf4747ded35a67b9fbd8ef8592999c0d764b6000c3f001132c998671008c1d75 with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys 974727d1de33a56a6a6074b7dbaaa71c252789b7a997e62ecd0edff6b80385a3 with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys 71b565ec08022d52d70107cbd87ea85bf72bbd69b5ed5d416fa8a36441769be5 with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys 01348ef53bf62900dd09fc3c6a4b0a9f7f4c04b57506362ce98e1c3adb6afb3d with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys 46e5f81f02cf4c3eaa7a92ac292d5f36a5607975fda0801a71c1cadf0e680b62 with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys 143940909f40ef956972490658b189a7cf7102aef5d64fe17f6fcbf64034bf97 with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys 4350516873773f0e4d99c64bed3d34b0615cdde88f06bb2f023f5884e164fe17 with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys 63b6879f84c32a24aa27fe2895e5e80b4a3df2e686b31266ed37532afae3f70d with error
[ENCODE_0] : (EngineCore_DP0 pid=2691220) ERROR 12-22 10:07:48 [datasystem_store.py:436] Failed to put bytes_list for keys 516a95721ee7c211c1244921fc384b7ebd2b8a429997659646ee5202ef47cbb9 with error
[PROXY] : INFO: 127.0.0.1:41718 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[PROXY] : INFO: 127.0.0.1:41720 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[PROXY] : INFO: 127.0.0.1:41722 - "POST /v1/chat/completions HTTP/1.1" 200 OK

Proxy error
[PROXY] : INFO: 127.0.0.1:44586 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[D_0] : INFO 12-22 10:21:57 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 334.7 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 11.0%
[ENCODE_0] : INFO 12-22 10:22:03 [loggers.py:127] Engine 000: Avg prompt throughput: 2096.4 tokens/s, Avg generation throughput: 5.4 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
[D_0] : INFO 12-22 10:22:07 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 11.0%
[ENCODE_0] : INFO 12-22 10:22:13 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
[PROXY] : ERROR 12-22 10:26:39 [proxy.py:439] Runtime error during generate: Request 2061934b-05a4-4aeb-bd7a-a9776d511777 timed out after 300s without worker response.
[PROXY] : INFO: 127.0.0.1:46112 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[PROXY] : ERROR 12-22 10:26:42 [proxy.py:439] Runtime error during generate: Request 7e7628f4-0104-4b9b-b99f-b0fc5a6353f5 timed out after 300s without worker response.
[PROXY] : ERROR 12-22 10:26:42 [proxy.py:439] Runtime error during generate: Request 532b738d-aacd-4bd0-92bb-c8293ac56aad timed out after 300s without worker response.
[PROXY] : INFO: 127.0.0.1:45974 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[PROXY] : INFO: 127.0.0.1:45976 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[PROXY] : ERROR 12-22 10:26:42 [proxy.py:439] Runtime error during generate: Request e7fa97e7-0f4b-41da-b9d5-7d560818bc7a timed out after 300s without worker response.
[PROXY] : ERROR 12-22 10:26:42 [proxy.py:439] Runtime error during generate: Request 89a9b689-1b33-4e51-8304-feff3b60b595 timed out after 300s without worker response.
[PROXY] : INFO: 127.0.0.1:45978 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[PROXY] : INFO: 127.0.0.1:45980 - "POST /v1/chat/completions HTTP/1.1" 200 OK
[PROXY] : ERROR 12-22 10:26:42 [proxy.py:439] Runtime error during generate: Request 8a3eba1f-4577-4a7e-9682-627295ab81fa timed out after 300s without worker response.
[PROXY] : ERROR 12-22 10:26:42 [proxy.py:439] Runtime error during generate: Request 36e4b93b-d236-4b1f-b187-376f1671eef0 timed out after 300s without worker response.
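
For triage, it may help to count both error types per timestamp across the full run log and see how the put failures and the proxy timeouts line up in time. The following is a minimal sketch, not part of vllm, vllm-ascend, or lm-service: the `summarize` function and the regex names are hypothetical, and the patterns assume the log line shapes shown in the excerpts above (including ANSI color codes, which may appear either as a real ESC byte or as a replacement character in copied text).

```python
import re
from collections import Counter

# Strip ANSI color sequences; the leading byte may be ESC (\x1b) or,
# in copied/scraped logs, the U+FFFD replacement character.
ANSI_RE = re.compile(r"[\x1b\ufffd]\[[0-9;]*m")

# Patterns assume the line shapes seen in this issue's log excerpts.
PUT_FAIL_RE = re.compile(
    r"ERROR (\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[datasystem_store\.py:\d+\] "
    r"Failed to put bytes_list for keys ([0-9a-f]+)"
)
TIMEOUT_RE = re.compile(
    r"ERROR (\d{2}-\d{2} \d{2}:\d{2}:\d{2}) \[proxy\.py:\d+\] "
    r"Runtime error during generate: Request ([0-9a-f-]+) timed out after (\d+)s"
)


def summarize(lines):
    """Return (put_failures, timeouts): Counters keyed by log timestamp."""
    put_failures = Counter()
    timeouts = Counter()
    for raw in lines:
        line = ANSI_RE.sub("", raw)
        match = PUT_FAIL_RE.search(line)
        if match:
            put_failures[match.group(1)] += 1
            continue
        match = TIMEOUT_RE.search(line)
        if match:
            timeouts[match.group(1)] += 1
    return put_failures, timeouts
```

Calling `summarize(open(path, errors="replace"))` on the combined log (where `path` is wherever the run's output was saved) yields two counters whose keys can be sorted to check whether the proxy timeouts begin roughly 300 s after a burst of put failures or are unrelated in time.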

Before submitting a new issue...

  • Make sure you already searched for relevant issues, and asked the chatbot living at the bottom right corner of the documentation page, which can answer lots of frequently asked questions.
