Shirley125 commented Dec 26, 2025

Purpose

Adapt to the Ascend direct transport (Mooncake must be compiled with -DUSE_ASCEND_DIRECT=ON).

Test Plan

D2D (device to device)
D2H (device to host)

Test Result

D2D connector
curl:

curl http://10.170.27.165:20009/v1/chat/completions -H "Content-Type: application/json" -d '{
        "model": "/data/models/Qwen2.5-VL-7B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {"url":"file:///workspace/w00613184/testvqa_val/train_images/40c6b4dd3caa006f.jpg"}},
                {"type": "text", "text": "how man price tags are on the bottom shelf?"}
            ]}
        ]
    }'
{"id":"chatcmpl-f28e10cd-a638-496a-a3cd-46cc8ef0a49c","object":"chat.completion","created":1766833895,"model":"/data/models/Qwen2.5-VL-7B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"There are six price tags on the bottom shelf of the image.","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":1400,"total_tokens":1414,"completion_tokens":14,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null,"metrics":null}
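For scripted reruns of this test, the same request could be issued from Python. This is a minimal sketch using only the standard library: the endpoint and model path are copied from the curl command above, while the helper names `build_payload` and `send_chat_request` are hypothetical and not part of vLLM or Mooncake.

```python
import json
import urllib.request

# Endpoint and model path copied from the curl test; adjust for your deployment.
ENDPOINT = "http://10.170.27.165:20009/v1/chat/completions"
MODEL = "/data/models/Qwen2.5-VL-7B-Instruct"


def build_payload(image_url: str, question: str) -> dict:
    """Build the same OpenAI-style chat request body that the curl test sends."""
    return {
        "model": MODEL,
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {
                "role": "user",
                "content": [
                    {"type": "image_url", "image_url": {"url": image_url}},
                    {"type": "text", "text": question},
                ],
            },
        ],
    }


def send_chat_request(payload: dict, endpoint: str = ENDPOINT, timeout: float = 60.0) -> dict:
    """POST the payload as JSON and return the decoded response body."""
    req = urllib.request.Request(
        endpoint,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

For example, `send_chat_request(build_payload("file:///path/to/image.jpg", "how many price tags are on the bottom shelf?"))` would POST the request and return the decoded JSON body. The image path is a placeholder, and this assumes the server accepts local `file://` URLs, as it did in this test.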

Log (transfer succeeded):

(APIServer pid=194185) WARNING 12-27 19:11:28 [sampling_params.py:320] temperature 1e-06 is less than 0.01, which may cause numerical errors nan or inf in tensors. We have maxed it out to 0.01.
I20251227 19:11:33.971918 281455917789600 ascend_direct_transport.cpp:825] Connected to segment: 10.170.27.165:20748
I20251227 19:11:33.972998 281455917789600 ascend_direct_transport.cpp:605] Transfer to:10.170.27.165:20748, cost: 968 us
I20251227 19:11:34.031614 281455917789600 ascend_direct_transport.cpp:605] Transfer to:10.170.27.165:20748, cost: 1957 us
[rank0]:[W1227 19:11:34.254606751 compiler_depend.ts:117] Warning: Driver Version: 24.1.rc1.b020 is invalid or not supported yet. (function operator())
(EngineCore_DP0 pid=194631) INFO 12-27 19:11:34 [mooncake_connector.py:834] Delaying free of 11 blocks for request chatcmpl-f28e10cd-a638-496a-a3cd-46cc8ef0a49c
(APIServer pid=194185) INFO:     127.0.0.1:45882 - "POST /v1/chat/completions HTTP/1.1" 200 OK
(APIServer pid=194185) INFO 12-27 19:11:42 [loggers.py:127] Engine 000: Avg prompt throughput: 140.0 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.0%, Prefix cache hit rate: 0.0%
(APIServer pid=194185) INFO 12-27 19:11:52 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 1.0%, Prefix cache hit rate: 0.0%
(APIServer pid=194185) WARNING 12-27 19:12:03 [sampling_params.py:320] temperature 1e-06 is less than 0.01, which may cause numerical errors nan or inf in tensors. We have maxed it out to 0.01.
I20251227 19:12:03.732793 281455917789600 ascend_direct_transport.cpp:605] Transfer to:10.170.27.165:20748, cost: 1181 us
I20251227 19:12:03.759117 281455917789600 ascend_direct_transport.cpp:605] Transfer to:10.170.27.165:20748, cost: 1880 us
(EngineCore_DP0 pid=194631) INFO 12-27 19:12:03 [mooncake_connector.py:834] Delaying free of 11 blocks for request chatcmpl-39508f53-95d5-4f37-b916-248ab8f969eb
(APIServer pid=194185) INFO:     127.0.0.1:40518 - "POST /v1/chat/completions HTTP/1.1" 200 OK
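The transfer latencies in these logs can be extracted mechanically instead of read by eye. The sketch below assumes only the line format visible above (`Transfer to:<host>:<port>, cost: <N> us`, emitted from ascend_direct_transport.cpp); the function names are hypothetical.

```python
import re
from statistics import mean

# Matches lines such as:
#   ... ascend_direct_transport.cpp:605] Transfer to:10.170.27.165:20748, cost: 968 us
# The format is inferred from the logs in this PR, not from a documented spec.
TRANSFER_RE = re.compile(
    r"ascend_direct_transport\.cpp:\d+\] Transfer to:(?P<peer>[^,\s]+), cost: (?P<us>\d+) us"
)


def parse_transfer_costs(log_text: str) -> list[tuple[str, int]]:
    """Return (peer, cost in microseconds) for each transfer line, in log order."""
    return [(m.group("peer"), int(m.group("us"))) for m in TRANSFER_RE.finditer(log_text)]


def summarize(costs: list[tuple[str, int]]) -> dict:
    """Aggregate per-transfer costs; raise ValueError if no transfers were found."""
    values = [us for _, us in costs]
    if not values:
        raise ValueError("no transfer lines found")
    return {
        "count": len(values),
        "min_us": min(values),
        "max_us": max(values),
        "mean_us": mean(values),
    }
```

Applied to the four D2D `Transfer to:` lines above, this gives a count of 4, a minimum of 968 us, a maximum of 1957 us and a mean of 1496.5 us. The `Connected to segment:` line is deliberately not matched.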



D2H

curl http://10.170.27.165:20009/v1/chat/completions -H "Content-Type: application/json" -d '{
        "model": "/data/models/Qwen2.5-VL-7B-Instruct",
        "messages": [
            {"role": "system", "content": "You are a helpful assistant."},
            {"role": "user", "content": [
                {"type": "image_url", "image_url": {"url":"file:///workspace/w00613184/testvqa_val/train_images/40c6b4dd3caa006f.jpg"}},
                {"type": "text", "text": "how man price tags are on the bottom shelf?"}
            ]}
        ]
    }'
{"id":"chatcmpl-b1ee8ca8-cee0-49a7-b466-7fddbebb0b3d","object":"chat.completion","created":1766834770,"model":"/data/models/Qwen2.5-VL-7B-Instruct","choices":[{"index":0,"message":{"role":"assistant","content":"There are six price tags on the bottom shelf of the image.","refusal":null,"annotations":null,"audio":null,"function_call":null,"tool_calls":[],"reasoning_content":null},"logprobs":null,"finish_reason":"stop","stop_reason":null,"token_ids":null}],"service_tier":null,"system_fingerprint":null,"usage":{"prompt_tokens":1400,"total_tokens":1414,"completion_tokens":14,"prompt_tokens_details":null},"prompt_logprobs":null,"prompt_token_ids":null,"kv_transfer_params":null,"metrics":null}
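The D2D and D2H runs return the same answer. To check a response body programmatically rather than by eye, a minimal sketch like the following could be used. It relies only on fields present in the responses above (`object`, `choices[0].finish_reason`, `usage`), and `check_chat_response` is a hypothetical helper name.

```python
import json


def check_chat_response(raw: str) -> str:
    """Parse a chat.completion body and return the assistant text, raising on anomalies."""
    body = json.loads(raw)
    if body.get("object") != "chat.completion":
        raise ValueError(f"unexpected object type: {body.get('object')!r}")
    choice = body["choices"][0]
    # "stop" means generation ended normally rather than hitting a length limit.
    if choice.get("finish_reason") != "stop":
        raise ValueError(f"generation did not finish normally: {choice.get('finish_reason')!r}")
    usage = body["usage"]
    # total_tokens should equal prompt_tokens + completion_tokens (1400 + 14 = 1414 here).
    if usage["total_tokens"] != usage["prompt_tokens"] + usage["completion_tokens"]:
        raise ValueError("inconsistent token usage")
    return choice["message"]["content"]
```

Calling it on the D2D and D2H bodies and comparing the returned strings would confirm that both transfer paths produced identical output for this prompt.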

Log (transfer succeeded):

I20251227 19:25:22.415610 281457243058592 ascend_direct_transport.cpp:605] Transfer to:10.170.27.165:25522, cost: 878 us
I20251227 19:25:22.417544 281457243058592 ascend_direct_transport.cpp:605] Transfer to:10.170.27.165:25522, cost: 1906 us
(APIServer pid=201147) INFO 12-27 19:25:22 [loggers.py:127] Engine 000: Avg prompt throughput: 139.0 tokens/s, Avg generation throughput: 0.1 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=201147) INFO 12-27 19:25:32 [loggers.py:127] Engine 000: Avg prompt throughput: 0.0 tokens/s, Avg generation throughput: 0.0 tokens/s, Running: 0 reqs, Waiting: 0 reqs, GPU KV cache usage: 0.0%, Prefix cache hit rate: 0.0%
(APIServer pid=201147) WARNING 12-27 19:26:09 [sampling_params.py:320] temperature 1e-06 is less than 0.01, which may cause numerical errors nan or inf in tensors. We have maxed it out to 0.01.
(APIServer pid=201147) INFO:     127.0.0.1:46304 - "POST /v1/chat/completions HTTP/1.1" 200 OK
I20251227 19:26:09.710890 281457243058592 ascend_direct_transport.cpp:605] Transfer to:10.170.27.165:25522, cost: 904 us

Signed-off-by: CHEN <[email protected]>

adapt ascend direct transport

amy-why-3459 merged commit ce0871d into JiusiServe:v0.11.0 on Dec 29, 2025
2 checks passed