TadaoYamaoka's Diary

I mainly write about development topics for the programs published on 山岡忠夫Home (Tadao Yamaoka's Home).

NHWC vs NCHW on Google Colab

Convolution input data can be laid out as either NHWC or NCHW. I ran an experiment to see which one suits the TPU better.

TensorFlow defaults to NHWC, while Chainer defaults to NCHW.
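
To make the difference between the two layouts concrete, here is a minimal sketch in plain Python that computes where one element of a CIFAR-10-sized tensor lands in flat memory under each layout. The helper names `offset_nhwc` and `offset_nchw` are hypothetical (they are not TensorFlow or Chainer functions), and contiguous row-major storage is assumed.

```python
# Flat memory offset of element (n, h, w, c), assuming contiguous row-major storage.
def offset_nhwc(n, h, w, c, H, W, C):
    # Axis order N, H, W, C: the channel index varies fastest.
    return ((n * H + h) * W + w) * C + c

def offset_nchw(n, h, w, c, H, W, C):
    # Axis order N, C, H, W: the width index varies fastest.
    return ((n * C + c) * H + h) * W + w

H, W, C = 32, 32, 3  # one CIFAR-10 image: 32x32 pixels, 3 color channels

# Distance in memory between two channels of the same pixel.
nhwc_channel_stride = offset_nhwc(0, 0, 0, 1, H, W, C) - offset_nhwc(0, 0, 0, 0, H, W, C)
nchw_channel_stride = offset_nchw(0, 0, 0, 1, H, W, C) - offset_nchw(0, 0, 0, 0, H, W, C)

print(nhwc_channel_stride)  # 1    (channels of a pixel are adjacent)
print(nchw_channel_stride)  # 1024 (each channel is a separate 32x32 plane)
```

In NHWC the three color values of a pixel sit next to each other, whereas in NCHW each channel forms its own contiguous plane. This is why a given piece of hardware or library may prefer one layout over the other.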

cuDNN is optimized for NCHW.
Performance | TensorFlow

Tensor Cores, however, are optimized for NHWC.
Volta Tensor Core GPU Achieves New AI Performance Milestones | NVIDIA

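Note that in TensorFlow's CPU mode, max pooling supports only NHWC.

I searched but could not find out which layout the TPU is optimized for.

So I trained a network with two convolutional layers on CIFAR-10, a color image dataset, and measured the difference directly.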

NHWC on TPU

import tensorflow as tf
import os

(x_train, y_train),(x_test, y_test) = tf.keras.datasets.cifar10.load_data()
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(input_shape=(32, 32, 3), filters=256, kernel_size=3, padding='same', activation=tf.nn.relu),
  tf.keras.layers.MaxPool2D(),
  tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding='same', activation=tf.nn.relu),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dense(10)
])

def loss(y_true, y_pred):
    return tf.keras.backend.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)

def accuracy(y_true, y_pred):
    return tf.keras.metrics.sparse_categorical_accuracy(y_true, tf.nn.softmax(y_pred))

# Convert the Keras model to run on the Colab TPU (tf.contrib API, TensorFlow 1.x)
model = tf.contrib.tpu.keras_to_tpu_model(
    model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
    )
)

model.compile(optimizer='adam',
              loss=loss,
              metrics=[accuracy])
  
model.fit(x_train, y_train, batch_size=1024, epochs=5)
model.evaluate(x_test, y_test)
Execution result
INFO:tensorflow:Querying Tensorflow master (grpc://10.38.219.210:8470) for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 14236975761588034518)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 18323984429862649922)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 8554836709659081505)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 14275939449935473668)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 12561286578595138955)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 4788377722856588897)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 6433296346626590566)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 17630926159361266221)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 1107264524916151670)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 17179869184, 9476024643346962673)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 17179869184, 5663133602415764923)
WARNING:tensorflow:tpu_model (from tensorflow.contrib.tpu.python.tpu.keras_support) is experimental and may change or be removed at any time, and without warning.
Epoch 1/5
INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(128,), dtype=tf.int32, name='core_id_40'), TensorSpec(shape=(128, 32, 32, 3), dtype=tf.float32, name='conv2d_8_input_10'), TensorSpec(shape=(128, 1), dtype=tf.float32, name='dense_7_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Cloning Adam {'lr': 0.0010000000474974513, 'beta_1': 0.8999999761581421, 'beta_2': 0.9990000128746033, 'decay': 0.0, 'epsilon': 1e-07, 'amsgrad': False}
INFO:tensorflow:Remapping placeholder for conv2d_8_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f16729f7390> []
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 8.377423286437988 secs
INFO:tensorflow:Setting weights on TPU model.
INFO:tensorflow:CPU -> TPU lr: 0.0010000000474974513 {0.001}
INFO:tensorflow:CPU -> TPU beta_1: 0.8999999761581421 {0.9}
INFO:tensorflow:CPU -> TPU beta_2: 0.9990000128746033 {0.999}
INFO:tensorflow:CPU -> TPU decay: 0.0 {0.0}
WARNING:tensorflow:Cannot update non-variable config: epsilon
WARNING:tensorflow:Cannot update non-variable config: amsgrad
48128/50000 [===========================>..] - ETA: 1s - loss: 1.9748 - accuracy: 0.2997INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(106,), dtype=tf.int32, name='core_id_40'), TensorSpec(shape=(106, 32, 32, 3), dtype=tf.float32, name='conv2d_8_input_10'), TensorSpec(shape=(106, 1), dtype=tf.float32, name='dense_7_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for conv2d_8_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f16729f7390> [<tf.Variable 'tpu_139734392716536/Adam/iterations:0' shape=() dtype=int64>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f167229c898>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16722451d0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1672245780>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16721fea58>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16721c5c50>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f167218fbe0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16720fde10>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16720c4668>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16720903c8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1672059b70>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1672025d68>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671f909e8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671f5bf98>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671f213c8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671eec198>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671eb1cf8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671dfcdd8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671dc3e10>, 
<tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671db2b70>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671cfc7b8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671cc5d68>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671cb2f60>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671bfaa90>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671bc66d8>]
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 8.441371440887451 secs
50000/50000 [==============================] - 43s 850us/sample - loss: 1.9587 - accuracy: 0.3050
Epoch 2/5
50000/50000 [==============================] - 7s 137us/sample - loss: 1.5943 - accuracy: 0.4338
Epoch 3/5
50000/50000 [==============================] - 7s 138us/sample - loss: 1.8239 - accuracy: 0.4031
Epoch 4/5
50000/50000 [==============================] - 7s 135us/sample - loss: 1.4532 - accuracy: 0.4980
Epoch 5/5
50000/50000 [==============================] - 7s 136us/sample - loss: 1.2632 - accuracy: 0.5597
INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(4,), dtype=tf.int32, name='core_id_50'), TensorSpec(shape=(4, 32, 32, 3), dtype=tf.float32, name='conv2d_8_input_10'), TensorSpec(shape=(4, 1), dtype=tf.float32, name='dense_7_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Cloning Adam {'lr': 0.0010000000474974513, 'beta_1': 0.8999999761581421, 'beta_2': 0.9990000128746033, 'decay': 0.0, 'epsilon': 1e-07, 'amsgrad': False}
INFO:tensorflow:Remapping placeholder for conv2d_8_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f167066ac88> []
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 4.571932315826416 secs
 9952/10000 [============================>.] - ETA: 0s - loss: 2.0134 - accuracy: 0.3715INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(2,), dtype=tf.int32, name='core_id_50'), TensorSpec(shape=(2, 32, 32, 3), dtype=tf.float32, name='conv2d_8_input_10'), TensorSpec(shape=(2, 1), dtype=tf.float32, name='dense_7_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for conv2d_8_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f167066ac88> []
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 3.0498554706573486 secs
10000/10000 [==============================] - 14s 1ms/sample - loss: 2.0130 - accuracy: 0.3718
[2.0130336515426634, 0.3718]
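
The per-core shapes that trigger "(re-)compiling" in the log above (128 and 106 for training, 4 and 2 for evaluation) can be reproduced with simple arithmetic. The sketch below assumes each batch is split evenly across the 8 TPU cores and that `model.evaluate` uses the Keras default batch size of 32, since the script does not pass `batch_size` to it. `per_core_batches` is a hypothetical helper, not part of TensorFlow.

```python
NUM_CORES = 8  # "Num TPU Cores: 8" in the log

def per_core_batches(num_samples, batch_size, num_cores=NUM_CORES):
    """Return the distinct per-core batch sizes seen in one pass over the data."""
    sizes = [batch_size // num_cores]
    remainder = num_samples % batch_size  # size of the final, partial batch
    if remainder:
        sizes.append(remainder // num_cores)
    return sizes

print(per_core_batches(50000, 1024))  # training:   [128, 106]
print(per_core_batches(10000, 32))    # evaluation: [4, 2]
```

Each distinct shape is compiled once, so under these assumptions the final partial batch adds a second compilation to both training and evaluation. A dataset size that is a multiple of the batch size would avoid that second compilation.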

NCHW on TPU

import tensorflow as tf
import numpy as np
import os

(x_train, y_train),(x_test, y_test) = tf.keras.datasets.cifar10.load_data()
# Convert the images from NHWC (N, 32, 32, 3) to NCHW (N, 3, 32, 32)
x_train, x_test = np.transpose(x_train, [0, 3, 1, 2]), np.transpose(x_test, [0, 3, 1, 2])
x_train, x_test = x_train / 255.0, x_test / 255.0

model = tf.keras.models.Sequential([
  tf.keras.layers.Conv2D(input_shape=(3, 32, 32), filters=256, kernel_size=3, padding='same', activation=tf.nn.relu, data_format='channels_first'),
  tf.keras.layers.MaxPool2D(data_format='channels_first'),
  tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding='same', activation=tf.nn.relu, data_format='channels_first'),
  tf.keras.layers.Flatten(),
  tf.keras.layers.Dense(512, activation=tf.nn.relu),
  tf.keras.layers.Dense(10)
])

def loss(y_true, y_pred):
    return tf.keras.backend.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)

def accuracy(y_true, y_pred):
    return tf.keras.metrics.sparse_categorical_accuracy(y_true, tf.nn.softmax(y_pred))

# Convert the Keras model to run on the Colab TPU (tf.contrib API, TensorFlow 1.x)
model = tf.contrib.tpu.keras_to_tpu_model(
    model,
    strategy=tf.contrib.tpu.TPUDistributionStrategy(
        tf.contrib.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
    )
)

model.compile(optimizer='adam',
              loss=loss,
              metrics=[accuracy])
  
model.fit(x_train, y_train, batch_size=1024, epochs=5)
model.evaluate(x_test, y_test)
Execution result
INFO:tensorflow:Querying Tensorflow master (grpc://10.38.219.210:8470) for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 14236975761588034518)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 18323984429862649922)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 8554836709659081505)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 14275939449935473668)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 12561286578595138955)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 4788377722856588897)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 6433296346626590566)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 17630926159361266221)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 1107264524916151670)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 17179869184, 9476024643346962673)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 17179869184, 5663133602415764923)
WARNING:tensorflow:tpu_model (from tensorflow.contrib.tpu.python.tpu.keras_support) is experimental and may change or be removed at any time, and without warning.
Epoch 1/5
INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(128,), dtype=tf.int32, name='core_id_70'), TensorSpec(shape=(128, 3, 32, 32), dtype=tf.float32, name='conv2d_12_input_10'), TensorSpec(shape=(128, 1), dtype=tf.float32, name='dense_11_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Cloning Adam {'lr': 0.0010000000474974513, 'beta_1': 0.8999999761581421, 'beta_2': 0.9990000128746033, 'decay': 0.0, 'epsilon': 1e-07, 'amsgrad': False}
INFO:tensorflow:Remapping placeholder for conv2d_12_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f166d50c518> []
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 16.735506296157837 secs
INFO:tensorflow:Setting weights on TPU model.
INFO:tensorflow:CPU -> TPU lr: 0.0010000000474974513 {0.001}
INFO:tensorflow:CPU -> TPU beta_1: 0.8999999761581421 {0.9}
INFO:tensorflow:CPU -> TPU beta_2: 0.9990000128746033 {0.999}
INFO:tensorflow:CPU -> TPU decay: 0.0 {0.0}
WARNING:tensorflow:Cannot update non-variable config: epsilon
WARNING:tensorflow:Cannot update non-variable config: amsgrad
48128/50000 [===========================>..] - ETA: 1s - loss: 2.0741 - accuracy: 0.2895INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(106,), dtype=tf.int32, name='core_id_70'), TensorSpec(shape=(106, 3, 32, 32), dtype=tf.float32, name='conv2d_12_input_10'), TensorSpec(shape=(106, 1), dtype=tf.float32, name='dense_11_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for conv2d_12_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f166d50c518> [<tf.Variable 'tpu_139734303609520/Adam/iterations:0' shape=() dtype=int64>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cdb1c50>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cd53518>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cd53ac8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cd125f8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166ccd9e80>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cc460f0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cc10278>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cbdc8d0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cba36a0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cb6b710>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cab64e0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166caa3550>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166ca6c2e8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c9b3b38>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c97d828>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c947be0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c8b5e10>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c879b00>, 
<tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c849748>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c80ef98>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c780ef0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c749a20>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c70f668>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c6d8eb8>]
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 18.13627290725708 secs
50000/50000 [==============================] - 79s 2ms/sample - loss: 2.0549 - accuracy: 0.2950
Epoch 2/5
50000/50000 [==============================] - 6s 129us/sample - loss: 1.6440 - accuracy: 0.4250
Epoch 3/5
50000/50000 [==============================] - 7s 130us/sample - loss: 1.5291 - accuracy: 0.4633
Epoch 4/5
50000/50000 [==============================] - 6s 129us/sample - loss: 1.5411 - accuracy: 0.4643
Epoch 5/5
50000/50000 [==============================] - 6s 129us/sample - loss: 1.4702 - accuracy: 0.4918
INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(4,), dtype=tf.int32, name='core_id_80'), TensorSpec(shape=(4, 3, 32, 32), dtype=tf.float32, name='conv2d_12_input_10'), TensorSpec(shape=(4, 1), dtype=tf.float32, name='dense_11_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Cloning Adam {'lr': 0.0010000000474974513, 'beta_1': 0.8999999761581421, 'beta_2': 0.9990000128746033, 'decay': 0.0, 'epsilon': 1e-07, 'amsgrad': False}
INFO:tensorflow:Remapping placeholder for conv2d_12_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f166b0f7c88> []
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 11.619062662124634 secs
 9952/10000 [============================>.] - ETA: 0s - loss: 1.9147 - accuracy: 0.3833INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(2,), dtype=tf.int32, name='core_id_80'), TensorSpec(shape=(2, 3, 32, 32), dtype=tf.float32, name='conv2d_12_input_10'), TensorSpec(shape=(2, 1), dtype=tf.float32, name='dense_11_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for conv2d_12_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f166b0f7c88> []
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 6.101793527603149 secs
10000/10000 [==============================] - 27s 3ms/sample - loss: 1.9154 - accuracy: 0.3828
[1.91538902759552, 0.38279998]

Comparison

        Epoch 1  Epoch 2  Epoch 3  Epoch 4  Epoch 5  eval.
NHWC    43s      7s       7s       7s       7s       14s
NCHW    79s      6s       7s       6s       6s       27s

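NCHW is slow in the first epoch but slightly faster from the second epoch onward. In the logs, each of the two compilations during training took about 17 to 18 seconds for NCHW versus about 8 seconds for NHWC, which accounts for roughly half of the first-epoch gap.
Inference is faster with NHWC. This too appears to be caused by NCHW's slow first run: evaluation triggers two compilations in both cases, and they take longer for NCHW (about 11.6 s and 6.1 s, versus 4.6 s and 3.0 s for NHWC).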

On the TPU, NCHW takes longer on the first run, but with a large amount of training data it is likely to end up faster overall.
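
As a rough check of how much training it would take for NCHW to pay off on the TPU, the sketch below estimates the break-even point from the figures in the logs above. It is only a back-of-the-envelope calculation: it is based on a single run, the logged times are rounded, and it assumes the per-sample difference seen in epochs 2 to 5 stays constant.

```python
SAMPLES_PER_EPOCH = 50000

# First-epoch wall time in seconds, as reported in the logs.
first_epoch_s = {"NHWC": 43, "NCHW": 79}

# Per-sample time in microseconds for epochs 2 to 5, as reported in the logs.
steady_us_per_sample = {
    "NHWC": [137, 138, 135, 136],
    "NCHW": [129, 130, 129, 129],
}

def mean(values):
    return sum(values) / len(values)

# Average steady-state time per epoch in seconds for each layout.
steady_epoch_s = {
    layout: mean(us) * 1e-6 * SAMPLES_PER_EPOCH
    for layout, us in steady_us_per_sample.items()
}

# Extra time NCHW spends in the first epoch, and what it saves per later epoch.
extra_first_epoch_s = first_epoch_s["NCHW"] - first_epoch_s["NHWC"]
saving_per_epoch_s = steady_epoch_s["NHWC"] - steady_epoch_s["NCHW"]

# Number of later epochs needed for the savings to repay the first-epoch cost.
break_even_epochs = extra_first_epoch_s / saving_per_epoch_s

print(f"extra first-epoch cost: {extra_first_epoch_s} s")
print(f"saving per later epoch: {saving_per_epoch_s:.2f} s")
print(f"break-even after about {break_even_epochs:.0f} further epochs")
```

With these numbers NCHW saves well under a second per epoch, so it would need on the order of 100 further epochs of this size before the slower first epoch is recovered.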

Follow-up experiment

I also ran the comparison on a GPU.

        Epoch 1  Epoch 2  Epoch 3  Epoch 4  Epoch 5  eval.
NHWC    24s      23s      23s      23s      23s      3s
NCHW    22s      22s      23s      22s      22s      3s

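On the GPU as well, NCHW is slightly faster.

Follow-up experiment 2

I also measured on the 1080 Ti in my local PC.

        Epoch 1  Epoch 2  Epoch 3  Epoch 4  Epoch 5  eval.
NHWC    10s      8s       8s       8s       8s       1s
NCHW    9s       8s       7s       7s       7s       1s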

Again, NCHW is slightly faster.