Commit 2649d9a

Add TF multi-gpu example
1 parent d956861 commit 2649d9a

4 files changed, with 105 additions and 0 deletions

AI/TensorFlow/Example4/README.md

Lines changed: 14 additions & 0 deletions
@@ -0,0 +1,14 @@
## Purpose

Show how to use multiple GPUs with TensorFlow

## Contents

- `tf_test_multi_gpu.py`: Modified version of [`tf_test.py`](../tf_test.py) that uses all available GPUs on a node
- `run.sbatch`: Slurm batch-job submission script that pulls the Singularity image and runs `tf_test_multi_gpu.py`
- `tf_test.out`: Output file

## Important notes

1. In this example, the Slurm batch script pulls a Singularity container with TensorFlow and runs the example inside that container. However, you can modify the `run.sbatch` script to run within a conda/mamba environment instead.

AI/TensorFlow/Example4/run.sbatch

Lines changed: 16 additions & 0 deletions
@@ -0,0 +1,16 @@
#!/bin/bash
#SBATCH -p gpu
#SBATCH -c 8
#SBATCH -t 00:30:00
#SBATCH -J tf_test
#SBATCH -o tf_test.out
#SBATCH -e tf_test.err
#SBATCH --gres=gpu:4
#SBATCH --mem=8G

# Pull the Singularity image.
# This is a one-time setup: once the image is downloaded, you do not need to pull it again.
srun -c $SLURM_CPUS_PER_TASK singularity pull --disable-cache docker://tensorflow/tensorflow:latest-gpu

# --- run code tf_test_multi_gpu.py ---
singularity exec --nv tensorflow_latest-gpu.sif python tf_test_multi_gpu.py
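
Before training starts, it can help to confirm that the job really sees the four GPUs requested with `--gres=gpu:4`. The following is a minimal sketch, assuming the cluster's Slurm configuration exports `CUDA_VISIBLE_DEVICES` for the job (common, but site-dependent). The helper `visible_gpu_ids` is hypothetical and not part of this commit.

```python
import os


def visible_gpu_ids(environ=None):
    """Return the GPU ids listed in CUDA_VISIBLE_DEVICES.

    An empty list means the variable is unset or empty. Note that an unset
    variable does not hide GPUs; it only means no restriction was exported.
    """
    environ = os.environ if environ is None else environ
    raw = environ.get("CUDA_VISIBLE_DEVICES", "")
    return [item.strip() for item in raw.split(",") if item.strip()]


if __name__ == "__main__":
    gpus = visible_gpu_ids()
    listing = ", ".join(gpus) if gpus else "none listed"
    print("GPU ids exported to this job: {} ({})".format(len(gpus), listing))
```

If the count printed here differs from the `Number of devices` line that `tf_test_multi_gpu.py` prints, the container or driver setup is a likely place to look.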

AI/TensorFlow/Example4/tf_test.out

Lines changed: 29 additions & 0 deletions
@@ -0,0 +1,29 @@
2.14.0
Number of devices: 4
Epoch 1/10
1875/1875 - 13s - loss: 0.5010 - accuracy: 0.8232 - 13s/epoch - 7ms/step
Epoch 2/10
1875/1875 - 7s - loss: 0.3747 - accuracy: 0.8646 - 7s/epoch - 4ms/step
Epoch 3/10
1875/1875 - 7s - loss: 0.3367 - accuracy: 0.8768 - 7s/epoch - 4ms/step
Epoch 4/10
1875/1875 - 7s - loss: 0.3129 - accuracy: 0.8856 - 7s/epoch - 4ms/step
Epoch 5/10
1875/1875 - 7s - loss: 0.2960 - accuracy: 0.8910 - 7s/epoch - 4ms/step
Epoch 6/10
1875/1875 - 7s - loss: 0.2799 - accuracy: 0.8973 - 7s/epoch - 4ms/step
Epoch 7/10
1875/1875 - 7s - loss: 0.2685 - accuracy: 0.9000 - 7s/epoch - 4ms/step
Epoch 8/10
1875/1875 - 7s - loss: 0.2580 - accuracy: 0.9036 - 7s/epoch - 4ms/step
Epoch 9/10
1875/1875 - 7s - loss: 0.2480 - accuracy: 0.9083 - 7s/epoch - 4ms/step
Epoch 10/10
1875/1875 - 7s - loss: 0.2383 - accuracy: 0.9110 - 7s/epoch - 4ms/step
313/313 - 1s - loss: 0.3294 - accuracy: 0.8843 - 1s/epoch - 5ms/step

Test accuracy: 0.8842999935150146
313/313 - 1s - 1s/epoch - 4ms/step
[4.0451411e-07 4.3211493e-12 1.8949876e-10 1.1165977e-12 3.1353355e-08
1.5895354e-03 4.6215266e-08 5.1007383e-03 7.3685516e-07 9.9330854e-01]
9
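
The last three lines of `tf_test.out` are the softmax probabilities for the first test image, followed by the index of the largest one. As a small sketch of how to read them, the snippet below copies those probabilities and maps the winning index to its label, using the `class_names` order from `tf_test_multi_gpu.py`. It uses plain Python instead of NumPy so it runs anywhere; the script itself uses `np.argmax`.

```python
# Probabilities copied from the end of tf_test.out (first test image).
probs = [4.0451411e-07, 4.3211493e-12, 1.8949876e-10, 1.1165977e-12, 3.1353355e-08,
         1.5895354e-03, 4.6215266e-08, 5.1007383e-03, 7.3685516e-07, 9.9330854e-01]

# Label order as defined in tf_test_multi_gpu.py.
class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat',
               'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

# Index of the largest probability (the same index np.argmax would return).
predicted = max(range(len(probs)), key=probs.__getitem__)

print(predicted, class_names[predicted], round(probs[predicted], 4))
```

For this run the model assigns about 99.3% probability to index 9, which corresponds to `Ankle boot`.
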
AI/TensorFlow/Example4/tf_test_multi_gpu.py

Lines changed: 46 additions & 0 deletions
@@ -0,0 +1,46 @@
#!/usr/bin/env python
from __future__ import absolute_import, division, print_function, unicode_literals
import os
# Silence TensorFlow's C++ log output; this must be set before importing tensorflow.
os.environ['TF_CPP_MIN_LOG_LEVEL'] = '3'
import tensorflow as tf
from tensorflow import keras
import numpy as np

print(tf.__version__)

# Create a MirroredStrategy.
strategy = tf.distribute.MirroredStrategy()
print("Number of devices: {}".format(strategy.num_replicas_in_sync))

# Open a strategy scope.
with strategy.scope():
    # Everything that creates variables should be under the strategy scope.
    # In general this is only model construction & `compile()`.
    fashion_mnist = keras.datasets.fashion_mnist
    (train_images, train_labels), (test_images, test_labels) = fashion_mnist.load_data()

    class_names = ['T-shirt/top', 'Trouser', 'Pullover', 'Dress', 'Coat', 'Sandal', 'Shirt', 'Sneaker', 'Bag', 'Ankle boot']

    # Scale pixel values from [0, 255] to [0, 1].
    train_images = train_images / 255.0
    test_images = test_images / 255.0
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(28, 28)),
        keras.layers.Dense(128, activation='relu'),
        keras.layers.Dense(10, activation='softmax')
    ])

    model.compile(optimizer='adam', loss='sparse_categorical_crossentropy', metrics=['accuracy'])

# Set verbose=1 to see progress bars when running interactively.
model.fit(train_images, train_labels, epochs=10, verbose=2)

# Set verbose=1 to see progress bars when running interactively.
test_loss, test_acc = model.evaluate(test_images, test_labels, verbose=2)
print('\nTest accuracy:', test_acc)

# Set verbose=1 to see progress bars when running interactively.
predictions = model.predict(test_images, verbose=2)
print(predictions[0])
print(np.argmax(predictions[0]))
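
The script calls `model.fit` without a `batch_size`, so Keras uses its default of 32. Under `MirroredStrategy` that value is the global batch, which is split across the replicas. The sketch below works through the arithmetic for the logged run (60,000 Fashion-MNIST training images, 10,000 test images, 4 replicas). The helper `batch_plan` and the scaled-batch variant are illustrative assumptions, not part of this commit.

```python
import math


def batch_plan(num_samples, global_batch_size, num_replicas):
    """Return (steps_per_epoch, per_replica_batch) for synchronous data parallelism.

    Assumes the global batch divides evenly across the replicas.
    """
    steps_per_epoch = math.ceil(num_samples / global_batch_size)
    per_replica_batch = global_batch_size // num_replicas
    return steps_per_epoch, per_replica_batch


# Values for this example: default batch_size=32 on 4 GPUs.
train_plan = batch_plan(60000, 32, 4)
test_plan = batch_plan(10000, 32, 4)

# Hypothetical variant: keep 32 samples per GPU by scaling the global batch.
scaled_plan = batch_plan(60000, 32 * 4, 4)

print(train_plan, test_plan, scaled_plan)
```

The step counts for the default setting match the `1875/1875` and `313/313` lines in `tf_test.out`, and each GPU processes only 8 samples per step. One way to try the scaled variant would be to pass `batch_size=32 * strategy.num_replicas_in_sync` to `model.fit`; the learning rate may then need retuning.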
