
Commit 4e4134d — Merge branch 'develop' into dev_2

2 parents: dac4e26 + 951800d

15 files changed: +273 additions, -61 deletions

README.md (file mode changed: 100644 → 100755)

Lines changed: 5 additions & 5 deletions
```diff
@@ -176,8 +176,8 @@ python3 -m paddle_serving_server.serve --model uci_housing_model --thread 10 --p
 | Argument | Type | Default | Description |
 | ---------------------------------------------- | ---- | ------- | ----------------------------------------------------- |
 | `thread` | int | `2` | Number of brpc service thread |
-| `op_num` | int[]| `0` | Thread Number for each model in asynchronous mode |
-| `op_max_batch` | int[]| `0` | Batch Number for each model in asynchronous mode |
+| `runtime_thread_num` | int[]| `0` | Thread Number for each model in asynchronous mode |
+| `batch_infer_size` | int[]| `0` | Batch Number for each model in asynchronous mode |
 | `gpu_ids` | str[]| `"-1"` | Gpu card id for each model |
 | `port` | int | `9292` | Exposed port of current service to users |
 | `model` | str[]| `""` | Path of paddle model directory to be served |
@@ -197,16 +197,16 @@ python3 -m paddle_serving_server.serve --model uci_housing_model --thread 10 --p
 In asynchronous mode, each model will start n threads of the number you specify, and each thread contains a model instance. In other words, each model is equivalent to a thread pool containing N threads, and the task is taken from the task queue of the thread pool to execute.
 In asynchronous mode, each RPC server thread is only responsible for putting the request into the task queue of the model thread pool. After the task is executed, the completed task is removed from the task queue.
 In the above table, the number of RPC server threads is specified by --thread, and the default value is 2.
---op_num specifies the number of threads in the thread pool of each model. The default value is 0, indicating that asynchronous mode is not used.
---op_max_batch specifies the number of batches for each model. The default value is 32. It takes effect when --op_num is not 0.
+--runtime_thread_num specifies the number of threads in the thread pool of each model. The default value is 0, indicating that asynchronous mode is not used.
+--batch_infer_size specifies the number of batches for each model. The default value is 32. It takes effect when --runtime_thread_num is not 0.
 #### When you want a model to use multiple GPU cards.
 python3 -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --gpu_ids 0,1,2
 #### When you want 2 models.
 python3 -m paddle_serving_server.serve --model uci_housing_model_1 uci_housing_model_2 --thread 10 --port 9292
 #### When you want 2 models, and want each of them use multiple GPU cards.
 python3 -m paddle_serving_server.serve --model uci_housing_model_1 uci_housing_model_2 --thread 10 --port 9292 --gpu_ids 0,1 1,2
 #### When a service contains two models, and each model needs to specify multiple GPU cards, and needs asynchronous mode, each model specifies different concurrency number.
-python3 -m paddle_serving_server.serve --model uci_housing_model_1 uci_housing_model_2 --thread 10 --port 9292 --gpu_ids 0,1 1,2 --op_num 4 8
+python3 -m paddle_serving_server.serve --model uci_housing_model_1 uci_housing_model_2 --thread 10 --port 9292 --gpu_ids 0,1 1,2 --runtime_thread_num 4 8
 </center>
 
 ```python
```
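
Taken together, the renamed flags combine with the existing ones. For example, serving one model asynchronously would look like this (a sketch in the same style as the commands above; the thread-pool size and batch size are illustrative):

python3 -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --runtime_thread_num 4 --batch_infer_size 32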

README_CN.md (file mode changed: 100644 → 100755)

Lines changed: 5 additions & 5 deletions
```diff
@@ -175,8 +175,8 @@ python3 -m paddle_serving_server.serve --model uci_housing_model --thread 10 --p
 | Argument | Type | Default | Description |
 | ---------------------------------------------- | ---- | ------- | ----------------------------------------------------- |
 | `thread` | int | `2` | Number of brpc service thread |
-| `op_num` | int[]| `0` | Thread Number for each model in asynchronous mode |
-| `op_max_batch` | int[]| `32` | Batch Number for each model in asynchronous mode |
+| `runtime_thread_num` | int[]| `0` | Thread Number for each model in asynchronous mode |
+| `batch_infer_size` | int[]| `32` | Batch Number for each model in asynchronous mode |
 | `gpu_ids` | str[]| `"-1"` | Gpu card id for each model |
 | `port` | int | `9292` | Exposed port of current service to users |
 | `model` | str[]| `""` | Path of paddle model directory to be served |
@@ -195,8 +195,8 @@ python3 -m paddle_serving_server.serve --model uci_housing_model --thread 10 --p
 Asynchronous mode helps improve the throughput (QPS) of the service, but adds a small amount of latency to individual requests.
 In asynchronous mode, each model starts the N threads you specify, and each thread contains one model instance; in other words, each model is equivalent to a thread pool with N threads, and tasks are taken from the pool's task queue for execution.
 In asynchronous mode, each RPC server thread is only responsible for putting requests into the task queue of the model's thread pool; after a task is executed, the completed task is removed from the queue.
---thread 10 in the table above specifies the number of RPC server threads (default 2); --op_num specifies the number of threads N in each model's thread pool (default 0, meaning asynchronous mode is not used).
---op_max_batch specifies the batch size for each model (default 32); it only takes effect when --op_num is not 0.
+--thread 10 in the table above specifies the number of RPC server threads (default 2); --runtime_thread_num specifies the number of threads N in each model's thread pool (default 0, meaning asynchronous mode is not used).
+--batch_infer_size specifies the batch size for each model (default 32); it only takes effect when --runtime_thread_num is not 0.
 
 #### When you want to deploy one model on multiple GPU cards.
 python3 -m paddle_serving_server.serve --model uci_housing_model --thread 10 --port 9292 --gpu_ids 0,1,2
@@ -205,7 +205,7 @@ python3 -m paddle_serving_server.serve --model uci_housing_model_1 uci_housing_m
 #### When one service contains two models, and each model needs to be deployed on multiple GPU cards.
 python3 -m paddle_serving_server.serve --model uci_housing_model_1 uci_housing_model_2 --thread 10 --port 9292 --gpu_ids 0,1 1,2
 #### When one service contains two models, each model needs multiple GPU cards, asynchronous mode is required, and each model specifies a different concurrency.
-python3 -m paddle_serving_server.serve --model uci_housing_model_1 uci_housing_model_2 --thread 10 --port 9292 --gpu_ids 0,1 1,2 --op_num 4 8
+python3 -m paddle_serving_server.serve --model uci_housing_model_1 uci_housing_model_2 --thread 10 --port 9292 --gpu_ids 0,1 1,2 --runtime_thread_num 4 8
 
 </center>
```

Lines changed: 12 additions & 1 deletion

```diff
@@ -1,18 +1,29 @@
 dag:
+  # Op resource type: True = thread model; False = process model
   is_thread_op: false
+  # Profiling: True generates Timeline performance data (at some performance cost); False disables it
   tracer:
     interval_s: 30
+# HTTP port. rpc_port and http_port must not both be empty; when rpc_port is usable and http_port is empty, http_port is not generated automatically
 http_port: 18082
 op:
   faster_rcnn:
+    # Concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
     concurrency: 2
-
     local_service_conf:
+      # Client type: brpc, grpc, or local_predictor; local_predictor does not start a Serving service and predicts in-process
       client_type: local_predictor
+      # device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
       device_type: 1
+      # Compute device IDs: "" or unset means CPU prediction; "0" or "0,1,2" means GPU prediction on the listed cards
       devices: '2'
+      # Fetch list, based on the alias_name of fetch_var in the model; if unset, all outputs are returned
       fetch_list:
       - save_infer_model/scale_0.tmp_1
+      # Model path
       model_config: serving_server/
+# RPC port. rpc_port and http_port must not both be empty; when rpc_port is empty and http_port is not, rpc_port is automatically set to http_port+1
 rpc_port: 9998
+# worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each building a gRPC server and a DAG
+# When build_dag_each_worker=False, the framework sets max_workers=worker_num on the main thread's gRPC thread pool
 worker_num: 20
```
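
With client_type local_predictor, the service hosts the model in-process; a caller then talks to the pipeline over the ports above. Below is a minimal client sketch, assuming the PipelineClient API of this release: the endpoint matches rpc_port: 9998 in the config, while the feed key "image", the fetch name "res", and the file demo.jpg are illustrative and depend on the op's preprocess.

```python
import base64
from paddle_serving_server.pipeline import PipelineClient

client = PipelineClient()
# Endpoint matches rpc_port: 9998 in the config above
client.connect(['127.0.0.1:9998'])

# Pipeline feeds are strings; images are typically base64-encoded (illustrative)
with open('demo.jpg', 'rb') as f:
    image = base64.b64encode(f.read()).decode('utf8')

ret = client.predict(feed_dict={"image": image}, fetch=["res"])
print(ret)
```
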
Lines changed: 12 additions & 0 deletions

```diff
@@ -1,18 +1,30 @@
 dag:
+  # Op resource type: True = thread model; False = process model
   is_thread_op: false
+  # Profiling: True generates Timeline performance data (at some performance cost); False disables it
   tracer:
     interval_s: 30
+# HTTP port. rpc_port and http_port must not both be empty; when rpc_port is usable and http_port is empty, http_port is not generated automatically
 http_port: 18082
 op:
   ppyolo_mbv3:
+    # Concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
     concurrency: 10
 
     local_service_conf:
+      # Client type: brpc, grpc, or local_predictor; local_predictor does not start a Serving service and predicts in-process
       client_type: local_predictor
+      # device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
       device_type: 1
+      # Compute device IDs: "" or unset means CPU prediction; "0" or "0,1,2" means GPU prediction on the listed cards
       devices: '2'
+      # Fetch list, based on the alias_name of fetch_var in the model; if unset, all outputs are returned
       fetch_list:
       - save_infer_model/scale_0.tmp_1
+      # Model path
       model_config: serving_server/
+# RPC port. rpc_port and http_port must not both be empty; when rpc_port is empty and http_port is not, rpc_port is automatically set to http_port+1
 rpc_port: 9998
+# worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each building a gRPC server and a DAG
+# When build_dag_each_worker=False, the framework sets max_workers=worker_num on the main thread's gRPC thread pool
 worker_num: 20
```
Lines changed: 12 additions & 1 deletion

```diff
@@ -1,18 +1,29 @@
 dag:
+  # Op resource type: True = thread model; False = process model
   is_thread_op: false
+  # Profiling: True generates Timeline performance data (at some performance cost); False disables it
   tracer:
     interval_s: 30
+# HTTP port. rpc_port and http_port must not both be empty; when rpc_port is usable and http_port is empty, http_port is not generated automatically
 http_port: 18082
 op:
   yolov3:
+    # Concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
     concurrency: 10
-
     local_service_conf:
+      # Client type: brpc, grpc, or local_predictor; local_predictor does not start a Serving service and predicts in-process
       client_type: local_predictor
+      # device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
       device_type: 1
+      # Compute device IDs: "" or unset means CPU prediction; "0" or "0,1,2" means GPU prediction on the listed cards
       devices: '2'
+      # Fetch list, based on the alias_name of fetch_var in the model; if unset, all outputs are returned
       fetch_list:
       - save_infer_model/scale_0.tmp_1
+      # Model path
       model_config: serving_server/
+# RPC port. rpc_port and http_port must not both be empty; when rpc_port is empty and http_port is not, rpc_port is automatically set to http_port+1
 rpc_port: 9998
+# worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each building a gRPC server and a DAG
+# When build_dag_each_worker=False, the framework sets max_workers=worker_num on the main thread's gRPC thread pool
 worker_num: 20
```
Lines changed: 16 additions & 1 deletion

```diff
@@ -1,17 +1,32 @@
+# worker_num: maximum concurrency. When build_dag_each_worker=True, the framework creates worker_num processes, each building a gRPC server and a DAG
+# When build_dag_each_worker=False, the framework sets max_workers=worker_num on the main thread's gRPC thread pool
 worker_num: 20
+# build_dag_each_worker: False, the framework creates one DAG inside the process; True, the framework creates an independent DAG in each worker process
+build_dag_each_worker: false
+
 dag:
+  # Op resource type: True = thread model; False = process model
   is_thread_op: false
+  # Profiling: True generates Timeline performance data (at some performance cost); False disables it
   tracer:
     interval_s: 10
+# HTTP port. rpc_port and http_port must not both be empty; when rpc_port is usable and http_port is empty, http_port is not generated automatically
 http_port: 18082
+# RPC port. rpc_port and http_port must not both be empty; when rpc_port is empty and http_port is not, rpc_port is automatically set to http_port+1
 rpc_port: 9998
 op:
   bert:
+    # Concurrency: thread concurrency when is_thread_op=True, otherwise process concurrency
    concurrency: 2
-
+    # When the op config has no server_endpoints, the local service configuration is read from local_service_conf
    local_service_conf:
+      # Client type: brpc, grpc, or local_predictor; local_predictor does not start a Serving service and predicts in-process
      client_type: local_predictor
+      # device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
      device_type: 1
+      # Compute device IDs: "" or unset means CPU prediction; "0" or "0,1,2" means GPU prediction on the listed cards
      devices: '2'
+      # Fetch list, based on the alias_name of fetch_var in bert_seq128_model; if unset, all outputs are returned
      fetch_list:
+      # Path of the bert model
      model_config: bert_seq128_model/
```
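
On the server side, a config annotated like this one is consumed by the pipeline's WebService. A minimal sketch follows, assuming the WebService/Op API of this release; the op name "bert" must match the `op:` key above, and the preprocess body is only illustrative:

```python
from paddle_serving_server.web_service import WebService, Op

class BertOp(Op):
    def preprocess(self, input_dicts, data_id, log_id):
        # Map the request into feed tensors for bert_seq128_model (illustrative)
        (_, input_dict), = input_dicts.items()
        return input_dict, False, None, ""

class BertService(WebService):
    def get_pipeline_response(self, read_op):
        # The name must match the `op: bert:` key in config.yml
        return BertOp(name="bert", input_ops=[read_op])

bert_service = BertService(name="bert")
bert_service.prepare_pipeline_config("config.yml")  # the annotated file above
bert_service.run_service()
```
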

python/examples/pipeline/ocr/config.yml

Lines changed: 5 additions & 0 deletions

```diff
@@ -38,6 +38,9 @@ op:
 
     # Fetch list, based on the alias_name of fetch_var in client_config
     fetch_list: ["concat_1.tmp_0"]
+
+    # device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
+    device_type: 0
 
     # Compute device IDs: "" or unset means CPU prediction; "0" or "0,1,2" means GPU prediction on the listed cards
     devices: ""
@@ -71,6 +74,8 @@ op:
 
     # Fetch list, based on the alias_name of fetch_var in client_config
     fetch_list: ["ctc_greedy_decoder_0.tmp_0", "softmax_0.tmp_0"]
+    # device_type, 0=cpu, 1=gpu, 2=tensorRT, 3=arm cpu, 4=kunlun xpu
+    device_type: 0
 
     # Compute device IDs: "" or unset means CPU prediction; "0" or "0,1,2" means GPU prediction on the listed cards
     devices: ""
```

python/paddle_serving_server/serve.py

Lines changed: 24 additions & 14 deletions

```diff
@@ -109,7 +109,12 @@ def is_gpu_mode(unformatted_gpus):
 
 def serve_args():
     parser = argparse.ArgumentParser("serve")
-    parser.add_argument("server", type=str, default="start",nargs="?", help="stop or start PaddleServing")
+    parser.add_argument(
+        "server",
+        type=str,
+        default="start",
+        nargs="?",
+        help="stop or start PaddleServing")
     parser.add_argument(
         "--thread",
         type=int,
@@ -123,9 +128,13 @@ def serve_args():
     parser.add_argument(
         "--gpu_ids", type=str, default="", nargs="+", help="gpu ids")
     parser.add_argument(
-        "--op_num", type=int, default=0, nargs="+", help="Number of each op")
+        "--runtime_thread_num",
+        type=int,
+        default=0,
+        nargs="+",
+        help="Number of each op")
     parser.add_argument(
-        "--op_max_batch",
+        "--batch_infer_size",
         type=int,
         default=32,
         nargs="+",
@@ -251,11 +260,11 @@ def start_gpu_card_model(gpu_mode, port, args):  # pylint: disable=doc-string-missing
     if args.gpu_multi_stream and device == "gpu":
         server.set_gpu_multi_stream()
 
-    if args.op_num:
-        server.set_op_num(args.op_num)
+    if args.runtime_thread_num:
+        server.set_runtime_thread_num(args.runtime_thread_num)
 
-    if args.op_max_batch:
-        server.set_op_max_batch(args.op_max_batch)
+    if args.batch_infer_size:
+        server.set_batch_infer_size(args.batch_infer_size)
 
     if args.use_lite:
         server.set_lite()
@@ -370,7 +379,7 @@ def do_POST(self):
         self.wfile.write(json.dumps(response).encode())
 
 
-def stop_serving(command : str, port : int = None):
+def stop_serving(command: str, port: int=None):
     '''
     Stop PaddleServing by port.
 
@@ -400,7 +409,7 @@ def stop_serving(command: str, port: int=None):
         start_time = info["start_time"]
         if port is not None:
             if port in storedPort:
-                kill_stop_process_by_pid(command ,pid)
+                kill_stop_process_by_pid(command, pid)
                 infoList.remove(info)
                 if len(infoList):
                     with open(filepath, "w") as fp:
@@ -410,17 +419,18 @@ def stop_serving(command: str, port: int=None):
                 return True
             else:
                 if lastInfo == info:
-                    raise ValueError(
-                        "Please confirm the port [%s] you specified is correct." %
-                        port)
+                    raise ValueError(
+                        "Please confirm the port [%s] you specified is correct."
+                        % port)
                 else:
                     pass
         else:
-            kill_stop_process_by_pid(command ,pid)
+            kill_stop_process_by_pid(command, pid)
             if lastInfo == info:
                 os.remove(filepath)
                 return True
 
+
 if __name__ == "__main__":
     # args.device is not used at all.
     # just keep the interface.
@@ -436,7 +446,7 @@ def stop_serving(command: str, port: int=None):
             os._exit(0)
         else:
             os._exit(-1)
-
+
     for single_model_config in args.model:
         if os.path.isdir(single_model_config):
             pass
```
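
For completeness, the renamed setters can also be driven through the low-level Server API, mirroring what start_gpu_card_model does above. This is a minimal sketch, assuming the standard OpMaker/OpSeqMaker pattern of this release; the model path is illustrative, and passing scalars to the two renamed setters (rather than the per-model lists the nargs="+" flags allow) is an assumption:

```python
from paddle_serving_server import OpMaker, OpSeqMaker, Server

# Build the standard read -> infer -> response op sequence
op_maker = OpMaker()
read_op = op_maker.create('general_reader')
infer_op = op_maker.create('general_infer')
response_op = op_maker.create('general_response')

op_seq_maker = OpSeqMaker()
op_seq_maker.add_op(read_op)
op_seq_maker.add_op(infer_op)
op_seq_maker.add_op(response_op)

server = Server()
server.set_op_sequence(op_seq_maker.get_op_sequence())
server.set_num_threads(10)          # brpc service threads (--thread)
server.set_runtime_thread_num(4)    # async thread pool per model (--runtime_thread_num); scalar form assumed
server.set_batch_infer_size(32)     # batch size per model (--batch_infer_size); scalar form assumed
server.load_model_config('uci_housing_model')  # illustrative path
server.prepare_server(workdir='workdir', port=9292, device='cpu')
server.run_server()
```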
