@@ -264,7 +264,19 @@ If you want to continue to train the model, simply re-run the above command beca
Just add ``` --strategy=gpus ```
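+ For example, a single-machine multi-GPU run might look like this (a sketch that reuses the illustrative VOC dataset paths and hyperparameters from the commands below):
+ ```
+ python -m tf2.train --strategy=gpus --mode=train --train_file_pattern=tfrecord/pascal*.tfrecord --model_name=efficientdet-d0 --model_dir=/tmp/efficientdet-d0 --batch_size=64 --num_examples_per_epoch=5717 --num_epochs=50 --hparams=voc_config.yaml
+ ```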
- ## 10. Training EfficientDets on TPUs.
+ ## 10. Training on multi-node GPUs.
+ The following scripts start a training job on 2 nodes.
+
+ Start the chief training node (worker index 0).
+ ```
+ python -m tf2.train --strategy=multi-gpus --worker=server_address1:12345,server_address2:23456 --worker_index=0 --mode=train --train_file_pattern=tfrecord/pascal*.tfrecord --model_name=efficientdet-d0 --model_dir=/tmp/efficientdet-d0 --batch_size=64 --num_examples_per_epoch=5717 --num_epochs=50 --hparams=voc_config.yaml
+ ```
+ Start the second training node (same worker list, worker_index=1, and its own model directory).
+ ```
+ python -m tf2.train --strategy=multi-gpus --worker=server_address1:12345,server_address2:23456 --worker_index=1 --mode=train --train_file_pattern=tfrecord/pascal*.tfrecord --model_name=efficientdet-d0 --model_dir=/tmp/efficientdet-d0_1 --batch_size=64 --num_examples_per_epoch=5717 --num_epochs=50 --hparams=voc_config.yaml
+ ```
+
+ ## 11. Training EfficientDets on TPUs.
To train this model on Cloud TPU, you will need:
@@ -286,7 +298,7 @@ For more instructions about training on TPUs, please refer to the following tuto
* EfficientNet tutorial: https://cloud.google.com/tpu/docs/tutorials/efficientnet
- ## 11 . Reducing Memory Usage when Training EfficientDets on GPU.
+ ## 12. Reducing Memory Usage when Training EfficientDets on GPU.
EfficientDets use a lot of GPU memory for a few reasons:
@@ -306,7 +318,7 @@ If set to True, keras model uses ```tf.recompute_grad``` to achieve gradient che
Testing shows that:
Testing shows that:
* It allows training a d7x network with a batch size of 2 on an 11GB GPU (1080Ti).
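+ As a hedged sketch, assuming gradient checkpointing is exposed as a ```grad_checkpoint``` hparam and that ```--hparams``` accepts key=value overrides (neither is shown in this excerpt), enabling it could look like:
+ ```
+ python -m tf2.train --mode=train --train_file_pattern=tfrecord/pascal*.tfrecord --model_name=efficientdet-d7x --model_dir=/tmp/efficientdet-d7x --batch_size=2 --num_examples_per_epoch=5717 --num_epochs=50 --hparams="grad_checkpoint=True"
+ ```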
- ## 12 . Visualize TF-Records.
+ ## 13. Visualize TF-Records.
You can visualize tf-records with the following commands:
@@ -331,7 +343,7 @@ python dataset/inspect_tfrecords.py --file_pattern dataset/sample.record\
* save_samples_dir: directory in which to save the samples.
* eval: flag for eval data.
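+ Putting these flags together, a hedged example (the save directory is only an illustration):
+ ```
+ python dataset/inspect_tfrecords.py --file_pattern dataset/sample.record --save_samples_dir /tmp/tfrecord_samples
+ ```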
- ## 13 . Export to ONNX
+ ## 14. Export to ONNX
(1) Install tf2onnx
```
pip install tf2onnx
@@ -352,7 +364,7 @@ nms_configs:
python -m tf2onnx.convert --saved-model=<saved model directory> --output=<onnx filename> --opset=11
```
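+ For instance, with hypothetical paths (the saved-model directory comes from the export step elsewhere in this README), plus an optional onnxruntime load check that is not part of this README's workflow:
+ ```
+ python -m tf2onnx.convert --saved-model=/tmp/saved_model --output=/tmp/efficientdet-d0.onnx --opset=11
+ pip install onnxruntime
+ python -c "import onnxruntime as ort; print([i.name for i in ort.InferenceSession('/tmp/efficientdet-d0.onnx', providers=['CPUExecutionProvider']).get_inputs()])"
+ ```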
- ## 14 . Debug
+ ## 15. Debug
Just add ``` --debug ``` after the command; then you can use pdb to debug the model with eager execution and deterministic operations.
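+ For example (a sketch appending the flag to the illustrative single-machine training command used above):
+ ```
+ python -m tf2.train --strategy=gpus --mode=train --train_file_pattern=tfrecord/pascal*.tfrecord --model_name=efficientdet-d0 --model_dir=/tmp/efficientdet-d0 --batch_size=64 --num_examples_per_epoch=5717 --num_epochs=50 --hparams=voc_config.yaml --debug
+ ```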
NOTE: this is not an official Google product.