Description
Operating system and version
openEuler 24.03 (LTS)
Python environment where the tool is installed
The Python environment inside the Docker container
Python version
3.11
AISBench tool version
Version: 3.0.0
AISBench command
ais_bench --models vllm_api_general_stream --datasets synthetic_gen -m perf --debug
Model configuration file or custom configuration file contents
/usr/local/lib/python3.11/site-packages/ais_bench/benchmark/configs/models/vllm_api/vllm_api_stream_chat.py
from ais_bench.benchmark.utils.model_postprocessors import extract_non_reasoning_content
# NOTE: the import of VLLMCustomAPIChatStream is not shown in this snippet.

models = [
    dict(
        attr="service",
        type=VLLMCustomAPIChatStream,
        abbr='vllm-api-stream-chat',
        path="",
        model="",
        request_rate=0,
        retry=2,
        host_ip="localhost",
        host_port=8080,
        max_out_len=512,
        batch_size=1,
        trust_remote_code=False,
        generation_kwargs=dict(
            temperature=0.5,
            top_k=10,
            top_p=0.95,
            seed=None,
            repetition_penalty=1.03,
        ),
        pred_postprocessor=dict(type=extract_non_reasoning_content),
    )
]
/usr/local/lib/python3.11/site-packages/ais_bench/datasets/synthetic/synthetic_config.py
synthetic_config = {
    "Type": "string",
    "RequestCount": 80,
    "TrustRemoteCode": False,
    "StringConfig": {
        "Input": {
            "Method": "uniform",
            "Params": {"MinValue": 2048, "MaxValue": 2048}
        },
        "Output": {
            "Method": "uniform",
            "Params": {"MinValue": 2048, "MaxValue": 2048}
        }
    },
    "TokenIdConfig": {
        "RequestSize": 2048
    }
}
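For clarity on what this config requests: a "uniform" distribution with `MinValue == MaxValue` should make every one of the 80 requests use a fixed length of 2048 tokens for both input and output. A minimal sketch of that sampling logic (the `sample_uniform` helper is my own illustration, not an AISBench function):

```python
# Hypothetical illustration of the "uniform" Method in StringConfig:
# sampling from an inclusive integer range [MinValue, MaxValue].
import random

def sample_uniform(params: dict) -> int:
    # My assumption of the semantics; when MinValue == MaxValue,
    # every sampled length is the same constant.
    return random.randint(params["MinValue"], params["MaxValue"])

params = {"MinValue": 2048, "MaxValue": 2048}
lengths = [sample_uniform(params) for _ in range(80)]  # RequestCount: 80
assert all(n == 2048 for n in lengths)  # fixed-length workload
```

So the benchmark is a fixed 2048-in / 2048-out workload, with no length variance between requests.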
/usr/local/Ascend/mindie/latest/mindie-service/conf/config.json
{
    "Version": "1.0.0",
    "ServerConfig": {
        "ipAddress": "127.0.0.1",
        "managementIpAddress": "127.0.0.2",
        "port": 1025,
        "managementPort": 1026,
        "metricsPort": 1027,
        "allowAllZeroIpListening": false,
        "maxLinkNum": 1000,
        "httpsEnabled": false,
        "fullTextEnabled": false,
        "tlsCaPath": "security/ca/",
        "tlsCaFile": ["ca.pem"],
        "tlsCert": "security/certs/server.pem",
        "tlsPk": "security/keys/server.key.pem",
        "tlsPkPwd": "security/pass/key_pwd.txt",
        "tlsCrlPath": "security/certs/",
        "tlsCrlFiles": ["server_crl.pem"],
        "managementTlsCaFile": ["management_ca.pem"],
        "managementTlsCert": "security/certs/management/server.pem",
        "managementTlsPk": "security/keys/management/server.key.pem",
        "managementTlsPkPwd": "security/pass/management/key_pwd.txt",
        "managementTlsCrlPath": "security/management/certs/",
        "managementTlsCrlFiles": ["server_crl.pem"],
        "kmcKsfMaster": "tools/pmt/master/ksfa",
        "kmcKsfStandby": "tools/pmt/standby/ksfb",
        "inferMode": "standard",
        "interCommTLSEnabled": true,
        "interCommPort": 1121,
        "interCommTlsCaPath": "security/grpc/ca/",
        "interCommTlsCaFiles": ["ca.pem"],
        "interCommTlsCert": "security/grpc/certs/server.pem",
        "interCommPk": "security/grpc/keys/server.key.pem",
        "interCommPkPwd": "security/grpc/pass/key_pwd.txt",
        "interCommTlsCrlPath": "security/grpc/certs/",
        "interCommTlsCrlFiles": ["server_crl.pem"],
        "openAiSupport": "vllm",
        "tokenTimeout": 3600,
        "e2eTimeout": 65535,
        "distDPServerEnabled": false
    },
    "BackendConfig": {
        "backendName": "mindieservice_llm_engine",
        "modelInstanceNumber": 1,
        "npuDeviceIds": [[0, 1]],
        "tokenizerProcessNumber": 8,
        "multiNodesInferEnabled": false,
        "multiNodesInferPort": 1120,
        "interNodeTLSEnabled": true,
        "interNodeTlsCaPath": "security/grpc/ca/",
        "interNodeTlsCaFiles": ["ca.pem"],
        "interNodeTlsCert": "security/grpc/certs/server.pem",
        "interNodeTlsPk": "security/grpc/keys/server.key.pem",
        "interNodeTlsPkPwd": "security/grpc/pass/mindie_server_key_pwd.txt",
        "interNodeTlsCrlPath": "security/grpc/certs/",
        "interNodeTlsCrlFiles": ["server_crl.pem"],
        "interNodeKmcKsfMaster": "tools/pmt/master/ksfa",
        "interNodeKmcKsfStandby": "tools/pmt/standby/ksfb",
        "kvPoolConfig": {
            "backend": "",
            "configPath": ""
        },
        "ModelDeployConfig": {
            "maxSeqLen": 6144,
            "maxInputTokenLen": 4096,
            "truncation": false,
            "ModelConfig": [
                {
                    "modelInstanceType": "Standard",
                    "modelName": "dony_w8a8_test",
                    "modelWeightPath": "/mnt/DeepSeek-R1-Distill-Llama-70B-w8a8",
                    "worldSize": 2,
                    "cpuMemSize": 5,
                    "npuMemSize": 10,
                    "backendType": "atb",
                    "trustRemoteCode": false,
                    "async_scheduler_wait_time": 120,
                    "kv_trans_timeout": 10,
                    "kv_link_timeout": 1080
                }
            ]
        },
        "ScheduleConfig": {
            "templateType": "Standard",
            "templateName": "Standard_LLM",
            "cacheBlockSize": 128,
            "maxPrefillBatchSize": 50,
            "maxPrefillTokens": 6144,
            "prefillTimeMsPerReq": 150,
            "prefillPolicyType": 0,
            "decodeTimeMsPerReq": 50,
            "decodePolicyType": 0,
            "maxBatchSize": 200,
            "maxIterTimes": 4096,
            "maxPreemptCount": 0,
            "supportSelectBatch": true,
            "maxQueueDelayMicroseconds": 5000,
            "maxFirstTokenWaitTime": 2500
        }
    },
    "LogConfig": {
        "dynamicLogLevel": "",
        "dynamicLogLevelValidHours": 2,
        "dynamicLogLevelValidTime": ""
    }
}
Docker container version
REPOSITORY TAG IMAGE ID CREATED SIZE
swr.cn-south-1.myhuaweicloud.com/ascendhub/mindie 2.2.RC1-800I-A2-py311-openeuler24.03-lts 3006ee810455 4 weeks ago 18.6GB
Expected behavior
With insize 2048, outsize 2048, and batch 20, the run should produce output normally with no errors.
Actual behavior
12/23 17:22:53 - AISBench - INFO - Loading synthetic_gen: /usr/local/lib/python3.11/site-packages/ais_bench/benchmark/configs/./datasets/synthetic/synthetic_gen.py
12/23 17:22:53 - AISBench - INFO - Loading vllm_api_general_stream: /usr/local/lib/python3.11/site-packages/ais_bench/benchmark/configs/./models/vllm_api/vllm_api_general_stream.py
12/23 17:22:53 - AISBench - INFO - Loading example: /usr/local/lib/python3.11/site-packages/ais_bench/benchmark/configs/./summarizers/example.py
12/23 17:22:53 - AISBench - INFO - Current exp folder: outputs/default/20251223_172253
12/23 17:22:53 - AISBench - INFO - Starting performance evaluation tasks...
12/23 17:22:53 - AISBench - INFO - Partitioned into 1 tasks.
12/23 17:23:00 - AISBench - INFO - Task [vllm-api-general-stream/synthetic]
12/23 17:23:03 - AISBench - INFO - Start load data of [vllm-api-general-stream/synthetic]
12/23 17:23:03 - AISBench - WARNING - Parameter 'burstiness' is None. Using default: 0.0
12/23 17:23:03 - AISBench - WARNING - Parameter 'ramp_up_strategy' is None. Using default: None
12/23 17:23:03 - AISBench - WARNING - Parameter 'ramp_up_start_rps' is None. Using default: None
12/23 17:23:03 - AISBench - WARNING - Parameter 'ramp_up_end_rps' is None. Using default: None
12/23 17:23:04 - AISBench - INFO - RPS distribution charts saved to outputs/default/20251223_172253/performances/vllm-api-general-stream/syntheticdataset_rps_distribution_plot.html
12/23 17:23:04 - AISBench - INFO - RPS distribution chart JSON data saved to outputs/default/20251223_172253/performances/vllm-api-general-stream/syntheticdataset_rps_distribution_plot.json
12/23 17:23:04 - AISBench - INFO -
Request Per Second (RPS) Distribution Summary
Metric Value
Total Requests 80
Request Classification Normal: 80 | Timing Anomaly: 0 | Burstiness Anomaly: 0 | Infinite RPS Anomaly: 0
Target Rate 1000.00 RPS
Burstiness 0.000
Normal RPS 1000.00 ± 0.00
Normal RPS Range 1000.00-1000.00
Interval Stats Avg: 0.001s | Min: 0.001s | Max: 0.001s
Interval Classification Normal (Normal + Burstiness Anomaly): 80 | Anomaly (Timing Anomaly + Infinite RPS Anomaly): 0
12/23 17:23:04 - AISBench - INFO - Process 0 using precomputed sleep offsets with 80 requests
12/23 18:24:46 - AISBench - ERROR - /usr/local/lib/python3.11/site-packages/ais_bench/benchmark/clients/base_client.py - raise_error - 35 - [AisBenchClientException] Error processing stream response: [StreamResponseError] Expecting value: line 1 column 1 (char 0)! Raw server response: b'Engine callback timeout: server tokenTimeout'
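One observation, for whatever it is worth: the error is logged roughly one hour after the requests were dispatched (17:23:04 → 18:24:46), which lines up with `"tokenTimeout": 3600` (seconds) in `ServerConfig` above, and the raw server response itself says "server tokenTimeout". A minimal sketch of that arithmetic, using the timestamps from the log (the causal link is my reading of the log, not a confirmed diagnosis):

```python
# Compare the elapsed wall-clock time in the log against
# ServerConfig.tokenTimeout from config.json.
from datetime import datetime

start = datetime.strptime("17:23:04", "%H:%M:%S")  # requests dispatched
error = datetime.strptime("18:24:46", "%H:%M:%S")  # tokenTimeout error logged
elapsed = (error - start).total_seconds()

token_timeout = 3600  # "tokenTimeout" in ServerConfig, in seconds
print(elapsed)                  # 3702.0
print(elapsed > token_timeout)  # True: just past the 3600 s limit
```

If that reading is right, raising `tokenTimeout` in the MindIE `config.json` (or reducing RequestCount / output length so requests complete within the window) might be a workaround, but I have not verified this.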