CPU and GPU Performance

TensorFlow supports deep learning on both standard CPUs and GPUs. This page compares the performance of a model trained on a CPU with the same model trained on a GPU.
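
Before comparing the two, you can check which devices TensorFlow sees on your machine using the standard tf.config API:

import tensorflow as tf

# list the physical devices visible to TensorFlow
print(tf.config.list_physical_devices('CPU'))
print(tf.config.list_physical_devices('GPU'))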

GPU Model

A GPU (Graphics Processing Unit) has many cores, which lets it run computations simultaneously (parallelism). This makes it ideal for massive mathematical workloads such as the matrix operations used to process images.
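
To see the kind of workload this suits, here is a minimal sketch that places a large matrix multiplication explicitly on the GPU (it assumes your machine exposes a '/GPU:0' device):

import tensorflow as tf

a = tf.random.normal((4096, 4096))
b = tf.random.normal((4096, 4096))
with tf.device('/GPU:0'):
    # every element of the result can be computed independently,
    # so thousands of GPU cores work on the product at once
    c = tf.matmul(a, b)
print(c.shape)  # (4096, 4096)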

GPUs can be used to train a TensorFlow model. The following program trains a model on a GPU.

import tensorflow as tf
from tensorflow import keras
import numpy as np
import matplotlib.pyplot as plt

(X_train, y_train), (X_test, y_test) = keras.datasets.cifar10.load_data()
# scale pixel values into the range [0, 1]
X_train_scaled = X_train / 255
X_test_scaled = X_test / 255
# one-hot encode the labels
y_train_encoded = keras.utils.to_categorical(y_train, num_classes=10, dtype='float32')
y_test_encoded = keras.utils.to_categorical(y_test, num_classes=10, dtype='float32')
# Model Building
def get_model():
    model = keras.Sequential([
        keras.layers.Flatten(input_shape=(32,32,3)),
        keras.layers.Dense(3000, activation='relu'),
        keras.layers.Dense(1000, activation='relu'),
        keras.layers.Dense(10, activation='sigmoid')
    ])
    model.compile(optimizer='SGD',
              loss='categorical_crossentropy',
              metrics=['accuracy'])
    return model

Run the benchmark in a separate notebook cell, because the %%timeit cell magic must be the first line of its cell:

%%timeit -n1 -r1
# benchmark: train for 10 epochs on the GPU
with tf.device('/GPU:0'):
    model_gpu = get_model()
    model_gpu.fit(X_train_scaled, y_train_encoded, epochs=10)

The above code builds an image classifier for the well-known CIFAR-10 dataset, which consists of 32x32 color images split into 50,000 training and 10,000 test images across ten classes.
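
You can confirm these numbers by inspecting the arrays returned by load_data() (this assumes the variables from the listing above are still in scope):

print(X_train.shape)                   # (50000, 32, 32, 3) - 50,000 training images
print(X_test.shape)                    # (10000, 32, 32, 3) - 10,000 test images
print(sorted(set(y_train.flatten())))  # the ten class labels, 0 through 9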

Running this program in a Jupyter notebook produces the following output:

Epoch 1/10
1563/1563 [==============================] - 13s 6ms/step - loss: 1.8124 - accuracy: 0.3540
Epoch 2/10
1563/1563 [==============================] - 9s 6ms/step - loss: 1.6242 - accuracy: 0.4272
Epoch 3/10
1563/1563 [==============================] - 9s 6ms/step - loss: 1.5429 - accuracy: 0.4577
Epoch 4/10
1563/1563 [==============================] - 9s 6ms/step - loss: 1.4840 - accuracy: 0.4771
Epoch 5/10
1563/1563 [==============================] - 9s 6ms/step - loss: 1.4330 - accuracy: 0.4961
Epoch 6/10
1563/1563 [==============================] - 9s 6ms/step - loss: 1.3922 - accuracy: 0.5121
Epoch 7/10
1563/1563 [==============================] - 9s 6ms/step - loss: 1.3531 - accuracy: 0.5246
Epoch 8/10
1563/1563 [==============================] - 9s 6ms/step - loss: 1.3154 - accuracy: 0.5383
Epoch 9/10
1563/1563 [==============================] - 9s 6ms/step - loss: 1.2848 - accuracy: 0.5494
Epoch 10/10
1563/1563 [==============================] - 9s 6ms/step - loss: 1.2541 - accuracy: 0.5606
2min 26s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

The output shows that it took 2 min 26 s to train the model on a GPU.
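
As a side note, the listing also prepares X_test_scaled and y_test_encoded but never uses them; a minimal sketch of evaluating the trained GPU model on that held-out test set:

test_loss, test_accuracy = model_gpu.evaluate(X_test_scaled, y_test_encoded)
print(test_accuracy)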

CPU Model

You can also train your TensorFlow model on a CPU. The same example is reused with one modification: instead of running on a GPU, it runs on a CPU. Only the benchmarking cell at the end changes, as shown below:

%%timeit -n1 -r1
# benchmark: train for 10 epochs on the CPU
with tf.device('/CPU:0'):
    model_cpu = get_model()
    model_cpu.fit(X_train_scaled, y_train_encoded, epochs=10)

Running this program in a Jupyter notebook produces the following output:

Epoch 1/10
1563/1563 [==============================] - 60s 38ms/step - loss: 1.8113 - accuracy: 0.3539
Epoch 2/10
1563/1563 [==============================] - 59s 38ms/step - loss: 1.6242 - accuracy: 0.4289
Epoch 3/10
1563/1563 [==============================] - 59s 38ms/step - loss: 1.5450 - accuracy: 0.4565
Epoch 4/10
1563/1563 [==============================] - 59s 38ms/step - loss: 1.4852 - accuracy: 0.4765
Epoch 5/10
1563/1563 [==============================] - 59s 38ms/step - loss: 1.4359 - accuracy: 0.4952
Epoch 6/10
1563/1563 [==============================] - 59s 38ms/step - loss: 1.3914 - accuracy: 0.5089
Epoch 7/10
1563/1563 [==============================] - 59s 38ms/step - loss: 1.3525 - accuracy: 0.5255
Epoch 8/10
1563/1563 [==============================] - 59s 38ms/step - loss: 1.3180 - accuracy: 0.5369
Epoch 9/10
1563/1563 [==============================] - 59s 38ms/step - loss: 1.2877 - accuracy: 0.5485
Epoch 10/10
1563/1563 [==============================] - 59s 38ms/step - loss: 1.2544 - accuracy: 0.5619
9min 53s ± 0 ns per loop (mean ± std. dev. of 1 run, 1 loop each)

The output shows that it took 9 min 53 s to train the model on a CPU.

Conclusion

From the output of the two examples, you can see that the GPU clearly outperformed the CPU: training the model took 9 min 53 s on the CPU, about four times the 2 min 26 s it took on the GPU.
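
The roughly fourfold figure follows directly from the two timings:

gpu_seconds = 2 * 60 + 26         # 2 min 26 s = 146 s
cpu_seconds = 9 * 60 + 53         # 9 min 53 s = 593 s
print(cpu_seconds / gpu_seconds)  # ~4.06, about a 4x speedup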