Now run the fleet to process the PDFs. In this tutorial we use the static array index with `--task 11` to specify the tasks for the 11 pdfs. The command is a bash script which is using the `CE_TASK_ID`, which contains values `0..10`, to fetch the pdf file. It's then running docling with 24 CPUs on the mx3d-24x240 worker. Therefore it's only running one instance per worker and utilizing the full worker. We run 4 instance and workers in parallel. Run the fleet with the following command in the `tutorials/docling` directory.
+
Now run the fleet to process the PDFs. In this tutorial we use the static array index with `--tasks-from-file commands.jsonl` to specify the tasks for the 11 PDFs. We give each task 24 vCPU, run docling with `--num-threads 24`, and choose an mx3d-24x240 worker profile with 24 vCPU. As a result, only 1 docling command runs per worker at a time, and each PDF is processed with the full worker. With `--max-scale 4`, 4 instances run in parallel, each on its own worker. Launch the fleet with the following command in the `tutorials/docling` directory.
Preparing your tasks: ⠼ Please wait...took 11.233582 seconds.
Preparing your tasks: ⠴ Please wait...
COS Bucket used 'ce-fleet-sandbox-data-fbfdde1d'...
@@ -139,14 +157,23 @@ Succeeded Tasks: 0
</details>
<br/>
-
If you like you can jump to the machine and see docling processing by running the following command in the root directory:
+
(Optional) If you like, you can jump to the machine and watch docling processing by running the following command in the root directory:
```
./jump <IP>
```
You can use `htop` to see that docling is processing the PDFs.

+
+
#### Playing with more parallelism
+
+
If you want to modify the tutorial to add more parallelism, e.g. to run 4 docling commands per worker, you could change the arguments and the run script as follows:
+
1. the arguments in `commands.jsonl` to `--num-threads 6`
+
2. the CPU per task to `--cpu 6`
+
With `--max-scale 4` you would now get only a single worker, because four 6-vCPU tasks fit on one 24-vCPU worker. Set `--max-scale 8` to get 2 workers, each processing 4 docling commands.
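
The arithmetic behind these worker counts can be sketched in a few lines of shell. The `workers_needed` function is a hypothetical helper for illustration, not part of the fleet CLI, and it assumes that tasks are packed onto workers by vCPU count alone (memory and other limits are ignored) and that a task never needs more vCPU than one worker offers.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the tutorial or the fleet CLI):
# estimate how many workers a setting implies, assuming tasks are
# packed onto workers purely by vCPU count.

# Usage: workers_needed MAX_SCALE CPU_PER_TASK WORKER_VCPU
workers_needed() {
  local max_scale=$1 cpu_per_task=$2 worker_vcpu=$3
  # Number of tasks that fit on one worker (integer division).
  local tasks_per_worker=$(( worker_vcpu / cpu_per_task ))
  # Round up: a partially filled worker still counts as a worker.
  echo $(( (max_scale + tasks_per_worker - 1) / tasks_per_worker ))
}

workers_needed 4 24 24   # prints 4: one 24-vCPU task per worker
workers_needed 4 6 24    # prints 1: four 6-vCPU tasks share one worker
workers_needed 8 6 24    # prints 2: two workers with four tasks each
```

The three calls correspond to the tutorial default, the modified settings with `--max-scale 4`, and the modified settings with `--max-scale 8`.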
+
+
### Step 4 - Download results
Download the results from the COS by running the following command in the root directory:
@@ -156,7 +183,7 @@ Download the results from the COS by running the following command in the root d