docs/installation.mdx (1 addition, 1 deletion)
@@ -23,7 +23,7 @@ The basic environment is as follows:
All dependencies mentioned above come from a specific default backend. The construction of the C++ core does not rely on any of the above dependencies.
:::
-## Using NGC base image
+## Using NGC base image{#NGC}
The easiest way is to use an NGC base image for source-code compilation (the official image may still be able to run on lower-version drivers through Forward Compatibility or Minor Version Compatibility).
- Achieves near-optimal performance (peak throughput / TP99) from a business perspective, reducing widespread performance regressions and performance loss between nodes.
- With a fine-grained generic backend, it is easy to extend to new hardware and to lower the difficulty of migrating between hardware vendor ecosystems.
- Simple and high-performance modeling, including complex business systems such as multi-model fusion. Typical industrial scenarios include smart-city AI systems with up to 10 model nodes, and OCR systems that involve independent subgraph scheduling, bucket scheduling, and intelligent batch grouping for extreme optimization.
docs/quick_start_new_user.md (248 additions, 0 deletions)
@@ -4,3 +4,251 @@ title: Beginner's Guide - A Small Step Forward
type: explainer
---
# Trial in 30 minutes (new users)
TorchPipe is a multi-instance pipeline-parallel library that provides seamless integration between lower-level acceleration libraries (such as TensorRT and OpenCV) and RPC frameworks. It guarantees high service throughput while meeting latency requirements. This document is mainly for new users, that is, users who are at the introductory stage of acceleration-related theory, know some Python syntax, and can read simple code. It mainly covers the use of TorchPipe to accelerate service deployment, complemented by performance and effect comparisons.
## Contents
* [1. Basic knowledge](#1)
* [2. Environmental installation and configuration](#2)
* [3. Acceleration Case - The service includes only a single model, using ResNet50 as an example](#3)
  * [3.1 Using TensorRT Acceleration Scheme](#3.1)
  * [3.2 Using TorchPipe Acceleration Scheme](#3.2)
* [4. Performance and Effect Comparison](#4)
<a name='1'></a>
## 1. Basic knowledge
The field of deep learning has seen rapid advancement in recent years, with significant progress in areas such as image recognition, text recognition, and speech recognition. Several model acceleration techniques, including those based on TensorRT and TVM, enhance the inference speed of deep learning models through computational and hardware optimization, and they have produced notable results in practical applications. This tutorial uses the simplest business case from actual deployment to demonstrate how to use TorchPipe for online service deployment. The entire service includes only a single ResNet50 model; the overall service flow is illustrated below.
We will briefly explain some concepts that need to be understood for model deployment, which we hope will be helpful to readers trying TorchPipe for the first time. For details, please refer to [Preliminary Knowledge](./preliminaries).
<a name='2'></a>
## 2. Environmental installation and configuration
For specific installation steps, please refer to [installation](installation.mdx). We provide two methods for configuring the TorchPipe environment:
## 3. Acceleration Case: Advancing from TensorRT to TorchPipe
This section begins by discussing the application of the TensorRT acceleration solution and provides a general acceleration strategy for service deployment. Then, building on that solution, we employ TorchPipe to further optimize the entire service.
<a name='3.1'></a>
### 3.1 Using TensorRT Acceleration Scheme {#UTAS}
TensorRT is an SDK for high-performance machine learning inference. It focuses specifically on running an already-trained network quickly and efficiently on NVIDIA hardware. However, TensorRT only optimizes and accelerates the model itself. Therefore, during the deployment of this service, we still use conventional operations for data decoding and preprocessing, both of which are done in Python; the model acceleration is achieved by using TensorRT to build an engine.
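As a rough illustration of that engine-building step, one widely used approach is NVIDIA's torch2trt converter, which builds the engine online from a PyTorch model. The sketch below is only an assumption-laden example (the repository's main_trt.py may construct its engine differently, and torch2trt must be installed separately):

```py
import torch
import torchvision
from torch2trt import torch2trt  # NVIDIA's PyTorch-to-TensorRT converter (assumed installed)

# Build a TensorRT engine online from a PyTorch ResNet50 (illustrative only)
model = torchvision.models.resnet50().eval().cuda()   # load your trained weights in practice
x = torch.randn(1, 3, 224, 224).cuda()                # example input used for optimization
model_trt = torch2trt(model, [x], fp16_mode=True)     # returns a module usable like a torch model

with torch.no_grad():
    y = model_trt(x)                                  # inference runs through the TensorRT engine
```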
The details of each part are as follows:
1. Data decoding
This part relies primarily on the CPU to decode the image data.
```py
import cv2  # OpenCV decodes on the CPU; `img` holds the encoded bytes as a 1-D uint8 array

# Data decoding (CPU decoding)
img = cv2.imdecode(img, flags=cv2.IMREAD_COLOR)
```
2. Preprocessing
This part mainly uses PyTorch's built-in functions to complete the preprocessing, along the lines of the sketch below.
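The exact transforms used by the service are not shown in this excerpt, so the following is only a minimal sketch of typical ResNet50 preprocessing with torchvision, continuing from the decoded `img` above; the resize/crop sizes and ImageNet normalization values are common defaults assumed for illustration:

```py
import cv2
import torchvision.transforms as T

# OpenCV decodes to BGR; convert to RGB before applying torchvision transforms
img_rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)

# Typical ImageNet-style preprocessing for ResNet50 (illustrative values)
preprocess = T.Compose([
    T.ToPILImage(),
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),                                 # HWC uint8 -> CHW float32 in [0, 1]
    T.Normalize(mean=[0.485, 0.456, 0.406],
                std=[0.229, 0.224, 0.225]),
])

input_tensor = preprocess(img_rgb).unsqueeze(0)   # shape: (1, 3, 224, 224)
```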
The overall online service deployment can be found in [main_trt.py](https://g.hz.netease.com/deploy/torchpipe/-/blob/develop/examples/resnet50/main_trt.py).
:::tip
Since TensorRT is not thread-safe, when using this method for model acceleration it is necessary to handle locking (`with self.lock:`) during service deployment, as sketched below.
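A minimal sketch of that pattern, with `self.trt_model` as a placeholder for the TensorRT-backed model:

```py
import threading

class TrtHandler:
    def __init__(self, trt_model):
        self.trt_model = trt_model    # placeholder: the TensorRT-backed model object
        self.lock = threading.Lock()  # one lock per engine / execution context

    def infer(self, input_tensor):
        with self.lock:               # serialize access, since TensorRT contexts are not thread-safe
            return self.trt_model(input_tensor)
```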
:::
<a name='3.2'></a>
### 3.2 Using TorchPipe Acceleration Scheme
From the above process, it's clear that when accelerating a single model, the focus is primarily on the acceleration of the model itself, while other factors in the service, such as data decoding and preprocessing operations, are overlooked. These preprocessing steps can impact the service's throughput and latency. Therefore, to achieve optimal throughput and latency, we use TorchPipe to optimize the entire service. The specific steps include:
- Multi-instance, dynamic batch processing, and bucketing on a single computing node
- Pipeline scheduling across multiple nodes
- Logical control flow between nodes

We've made adjustments to the deployment of our service using TorchPipe. The overall online service deployment can be found in [main_torchpipe.py](https://g.hz.netease.com/deploy/torchpipe/-/blob/develop/examples/resnet50/main_torchpipe.py).
From the above, we see a reduction in code volume compared to the original main function. The key lies in the contents of the TOML file, which includes three nodes: `[cpu_decoder]`, `[cpu_posdecoder]`, and `[resnet50]`. These nodes operate in sequence, corresponding to the three parts mentioned in [section 3.1](quick_start_new_user.md#UTAS); the tail of the configuration is shown below:
"model::cache"="/you/model/path/resnet50.trt"# or resnet50.trt.encrypted
212
+
213
+
```
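For orientation, a sketch of what the full three-node configuration might look like follows. The node names come from the text above; the backend and key names follow the backend references linked in the tip below, but the concrete values (instance counts, batch sizes, image sizes) are illustrative assumptions rather than a copy of the repository's file.

```toml
# Illustrative sketch only, not the repository's actual configuration
batching_timeout = 5                 # assumed: ms to wait while assembling a batch

[cpu_decoder]
backend = "DecodeMat"                # CPU image decoding
next = "cpu_posdecoder"

[cpu_posdecoder]
backend = "Sequential[ResizeMat,cvtColorMat,Mat2Tensor,SyncTensor]"
resize_h = 224
resize_w = 224
color = "rgb"
next = "resnet50"

[resnet50]
backend = "Sequential[TensorrtTensor,SyncTensor]"
model = "/your/model/path/resnet50.onnx"
"model::cache" = "/your/model/path/resnet50.trt"  # or resnet50.trt.encrypted
instance_num = 4                     # assumed number of model instances
max = 4                              # assumed maximum batch size
```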
:::tip
- For the specific usage and functionality of other backend operators, please refer to [Basic Backend](./backend-reference/basic), [OpenCV Backend](./backend-reference/opencv), [Torch Backend](./backend-reference/torch), and [Log](./backend-reference/log).
- This deviates slightly from the original method of generating engines online, as the model needs to be converted to ONNX format first. For the conversion method, see [Converting Torch to ONNX](faq/onnx.mdx); a minimal export sketch is shown after these tips.
- TorchPipe has resolved the issue of TensorRT objects not being thread-safe and has undergone extensive experimental testing. Therefore, the lock can be disabled during service operation, i.e., the line `with self.lock:` from [section 3.1](quick_start_new_user.md#UTAS) can be commented out.
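As referenced in the tips above, a minimal sketch of exporting ResNet50 from PyTorch to ONNX; the opset version and dynamic-batch settings are common choices assumed for illustration rather than taken from the repository:

```py
import torch
import torchvision

model = torchvision.models.resnet50().eval()     # load your trained weights in practice
dummy = torch.randn(1, 3, 224, 224)              # example input that defines the export shape

torch.onnx.export(
    model, dummy, "resnet50.onnx",
    opset_version=13,
    input_names=["input"], output_names=["output"],
    dynamic_axes={"input": {0: "batch"}, "output": {0: "batch"}},  # allow dynamic batch sizes
)
```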