This turned out to be a longer write-up than I anticipated. The main points are:

- Keras' Theano backend runs more than 10X slower when batch normalization is used.
- The issue does not occur with the TensorFlow backend.
- Issue #1309 seems to say the problem was fixed, though in my experience it persists.

I don't know whether my problem stems from:

a. Theano's implementation of batch normalization
b. Keras' use of Theano's batch normalization procedures
c. My use of Keras' use of Theano's batch normalization procedures
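To help separate these, here is a minimal sketch (my own, not part of the script below) that times Theano's batch normalization directly, with Keras out of the loop; if this alone is slow, hypothesis (a) is the likely culprit. I'm assuming Theano 1.0's `theano.tensor.nnet.bn.batch_normalization_train` in `'spatial'` mode, which I believe is what Keras' backend ultimately wraps for conv features:

```python
# Minimal sketch: time Theano's batch normalization with Keras out of
# the loop (assumes theano.tensor.nnet.bn.batch_normalization_train,
# 'spatial' mode = normalize per channel over batch/height/width).
import time
import numpy as np
import theano
import theano.tensor as T
from theano.tensor.nnet.bn import batch_normalization_train

x = T.tensor4('x')
# gamma/beta must broadcast against x, so lift the per-channel vectors
# to 4D with broadcastable batch and spatial axes.
gamma = theano.shared(np.ones(16, dtype='float32')).dimshuffle('x', 0, 'x', 'x')
beta = theano.shared(np.zeros(16, dtype='float32')).dimshuffle('x', 0, 'x', 'x')

out, mean, invstd = batch_normalization_train(x, gamma, beta, axes='spatial')
f = theano.function([x], out)

data = np.random.rand(32, 16, 32, 32).astype('float32')
f(data)  # first call includes compilation; keep it out of the timing
start = time.time()
for _ in range(100):
    f(data)
print('100 forward BN calls: %.3fs' % (time.time() - start))
```

Compiling a second function for `theano.grad(out.sum(), x)` and timing it the same way would exercise the backward pass as well, since training hits both.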
I'm currently using fairly recent versions of Keras, Theano, and cuDNN:

```
Using Theano backend.
Using cuDNN version 7104 on context None
Mapped name None to device cuda: GeForce GTX 1080 with Max-Q Design (0000:01:00.0)
```

```python
>>> keras.__version__
'2.2.4'
>>> theano.__version__
u'1.0.3'
```
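(For reproduction, the same information can be dumped in one shot; checking `theano.config.dnn.enabled` is my own addition here, based on Theano's config documentation, to confirm cuDNN isn't disabled:)

```
$ python -c "import keras, theano; print(keras.__version__, theano.__version__, theano.config.dnn.enabled)"
```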
When I run the following modified/simplified version of keras/examples/cifar10_resnet.py, I get a significant slowdown when batch normalization is used. The code is:
"""Adapted from cifar10_resnet.py"""
from future import print_function
import argparse
import keras
from keras.layers import Dense, Conv2D, BatchNormalization, Activation
from keras.layers import AveragePooling2D, Input, Flatten
from keras.optimizers import Adam
from keras.callbacks import ModelCheckpoint, LearningRateScheduler
from keras.callbacks import ReduceLROnPlateau
from keras.preprocessing.image import ImageDataGenerator
from keras.regularizers import l2
from keras import backend as K
from keras.models import Model
from keras.datasets import cifar10
import numpy as np
import os
import pdb
def get_data():
""" Loads CIFAR10 Data and converts to numpy arrays
for net"""
Load the CIFAR10 data.
(x_train, y_train), (x_test, y_test) = cifar10.load_data()
Normalize data.
x_train = x_train.astype('float32') / 255
x_test = x_test.astype('float32') / 255
Convert class vectors to binary class matrices.
num_classes = 10
y_train = keras.utils.to_categorical(y_train, num_classes)
y_test = keras.utils.to_categorical(y_test, num_classes)
return (x_train, x_test, y_train, y_test)
def resnet_layer(inputs, batch_norm):
"""2D Convolution-Batch Normalization-Activation stack builder
Arguments
Returns
"""
conv = Conv2D(16,
kernel_size=3,
strides=1,
padding='same',
kernel_initializer='he_normal',
kernel_regularizer=l2(1e-4))
x = inputs
x = conv(x)
if batch_norm:
x = BatchNormalization()(x)
x = Activation('relu')(x)
return x
def resnet_v1(input_shape, batch_norm):
"""ResNet Version 1 Model builder [a]
Arguments
Returns
"""
Start model definition.
inputs = Input(shape=input_shape)
x = resnet_layer(inputs, batch_norm)
Instantiate the stack of residual units
for res_block in range(3):
y = resnet_layer(x, batch_norm)
x = keras.layers.add([x, y])
x = Activation('relu')(x)
Add classifier on top.
v1 does not use BN after last shortcut connection-ReLU
x = AveragePooling2D(pool_size=8)(x)
y = Flatten()(x)
outputs = Dense(10,
activation='softmax',
kernel_initializer='he_normal')(y)
Instantiate model.
model = Model(inputs=inputs, outputs=outputs)
return model
if name == 'main':
ap = argparse.ArgumentParser()
ap.add_argument("--bn", action='store_true',
help="batch_normalization flag")
args = ap.parse_args()
batch_norm_flag = args.bn
Get CIFAR10 data
x_train, x_test, y_train, y_test = get_data()
Build net
input_shape = x_train.shape[1:]
model = resnet_v1(input_shape, batch_norm_flag)
model.compile(loss='categorical_crossentropy',
optimizer=Adam(lr=1e-3),
metrics=['accuracy'])
#model.summary()
Run training, without data augmentation.
model.fit(x_train, y_train,
batch_size=32,
epochs=1,
validation_data=(x_test, y_test),
shuffle=True)
Score trained model.
scores = model.evaluate(x_test, y_test, verbose=1)
print('Test loss:', scores[0])
print('Test accuracy:', scores[1])
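In case the full ResNet obscures the effect, here is a standalone sketch (my own, not part of the script above) that times a single Conv2D block with and without `BatchNormalization` via `train_on_batch`; the sizes are arbitrary, and it assumes a channels-last `image_data_format`. It should show the same gap with far less machinery:

```python
# Standalone sketch: time one Conv2D block with/without BatchNormalization.
# Arbitrary sizes; intended only to isolate the BN cost from the ResNet.
import time
import numpy as np
from keras.layers import Input, Conv2D, BatchNormalization, Activation
from keras.layers import Flatten, Dense
from keras.models import Model

def build(batch_norm):
    inp = Input(shape=(32, 32, 3))  # assumes channels_last data format
    x = Conv2D(16, 3, padding='same')(inp)
    if batch_norm:
        x = BatchNormalization()(x)
    x = Activation('relu')(x)
    x = Flatten()(x)
    out = Dense(10, activation='softmax')(x)
    model = Model(inp, out)
    model.compile(loss='categorical_crossentropy', optimizer='adam')
    return model

x = np.random.rand(32, 32, 32, 3).astype('float32')
y = np.eye(10, dtype='float32')[np.random.randint(0, 10, 32)]  # one-hot labels

for bn in (False, True):
    model = build(bn)
    model.train_on_batch(x, y)  # warm-up: compilation happens here
    start = time.time()
    for _ in range(100):
        model.train_on_batch(x, y)
    print('batch_norm=%s: %.3fs for 100 batches' % (bn, time.time() - start))
```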
The results from the full script, which differ mainly in speed, are as follows.

WITH Batchnorm:

```
$ python cifar10_resnet_batchnorm_test.py --bn
Using Theano backend.
Using cuDNN version 7104 on context None
Mapped name None to device cuda: GeForce GTX 1080 with Max-Q Design (0000:01:00.0)
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 82s 2ms/step - loss: 1.7749 - acc: 0.3629 - val_loss: 1.5035 - val_acc: 0.4697
10000/10000 [==============================] - 4s 383us/step
Test loss: 1.5035479030609131
Test accuracy: 0.4697
```

WITHOUT Batchnorm:

```
$ python cifar10_resnet_batchnorm_test.py
Using Theano backend.
Using cuDNN version 7104 on context None
Mapped name None to device cuda: GeForce GTX 1080 with Max-Q Design (0000:01:00.0)
Train on 50000 samples, validate on 10000 samples
Epoch 1/1
50000/50000 [==============================] - 7s 132us/step - loss: 1.7122 - acc: 0.3863 - val_loss: 1.4815 - val_acc: 0.4750
10000/10000 [==============================] - 0s 27us/step
Test loss: 1.481479389190674
Test accuracy: 0.475
```
This is a slowdown of more than a factor of 10 (82s vs. 7s per epoch). With TensorFlow, the timings are:

WITH Batch Norm:

```
50000/50000 [==============================] - 10s 204us/step - loss: 1.5723 - acc: 0.4416 - val_loss: 1.3853 - val_acc: 0.5129
10000/10000 [==============================] - 1s 57us/step
Test loss: 1.3852726093292236
Test accuracy: 0.5129
```

WITHOUT Batch Norm:

```
50000/50000 [==============================] - 10s 194us/step - loss: 1.7507 - acc: 0.3728 - val_loss: 1.5280 - val_acc: 0.4503
10000/10000 [==============================] - 1s 53us/step
Test loss: 1.527961152267456
Test accuracy: 0.4503
```
So, the issue seems to be the use of batch norm with Theano. It's also puzzling that Theano's validation accuracy is slightly worse with batch norm (0.4697 vs. 0.4750, about -0.5%), while TensorFlow's is noticeably better (0.5129 vs. 0.4503, about +6%), though the two backends converge with more training epochs.
I note that issue #1309 seems to describe this same problem and to regard it as solved as of Feb. 14, 2017. Yet I'm still having this problem. Is it something I'm doing, an issue with Keras' interfacing with Theano, or an issue with Theano's implementation of batch normalization?
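One diagnostic that might separate (a) from (b): compile the training function and check whether cuDNN's batch-norm op actually appears in the graph. The sketch below relies on Keras 2.2.4 internals (`_make_train_function` and the `function` attribute on the Theano backend's `Function` wrapper), so treat the attribute names as assumptions that may need adjusting:

```python
# Sketch: after model.compile(...), check whether the compiled training
# graph contains cuDNN's GpuDnnBatchNorm op (internal APIs; may vary).
model._make_train_function()        # force compilation of the train step
fn = model.train_function.function  # the raw theano.function underneath
ops = {type(node.op).__name__ for node in fn.maker.fgraph.apply_nodes}
print('GpuDnnBatchNorm' in ops)
```

If the op is missing, Theano is falling back to a non-cuDNN batch-norm implementation, which would point at (a) or at how Keras builds the graph (b).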
(Note: this is a distillation of an issue I raised in #12173, where I noted that TensorFlow does not experience this slowdown. I closed that issue and opened this one because I can now state the problem more precisely. I hope this is the correct protocol for redefining an issue after further study.)