Input data for convolutions comes in two layouts, NHWC and NCHW. I ran an experiment to see which one suits the TPU better.
TensorFlow defaults to NHWC, while Chainer defaults to NCHW.
cuDNN is optimized for NCHW.
https://www.tensorflow.org/guide/performance/overview
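However, Tensor Cores are optimized for NHWC.
Volta Tensor Core GPU Achieves New AI Performance Milestones | NVIDIA
Note that in TensorFlow's CPU mode, max pooling supports only NHWC.
I searched but could not find out which layout the TPU is optimized for.
So I trained a network with two convolutional layers on CIFAR-10, a dataset of color images, and measured it directly.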
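For reference, the two layouts hold the same values and differ only in axis order, which changes what sits next to what in memory. The sketch below is a minimal illustration, assuming plain row-major storage and CIFAR-10 image dimensions; the helper names `offset_nhwc` and `offset_nchw` are hypothetical and not part of any library.

```python
# Row-major flat offsets of element (n, h, w, c) under the two layouts.
# Dimensions follow CIFAR-10 images: height = width = 32, channels = 3.
H, W, C = 32, 32, 3


def offset_nhwc(n, h, w, c):
    """Flat index when axes are ordered (batch, height, width, channel)."""
    return ((n * H + h) * W + w) * C + c


def offset_nchw(n, h, w, c):
    """Flat index when axes are ordered (batch, channel, height, width)."""
    return ((n * C + c) * H + h) * W + w


# Distance in memory between neighbouring channels of the same pixel.
channel_step_nhwc = offset_nhwc(0, 0, 0, 1) - offset_nhwc(0, 0, 0, 0)
channel_step_nchw = offset_nchw(0, 0, 0, 1) - offset_nchw(0, 0, 0, 0)

# Distance between horizontally neighbouring pixels of the same channel.
pixel_step_nhwc = offset_nhwc(0, 0, 1, 0) - offset_nhwc(0, 0, 0, 0)
pixel_step_nchw = offset_nchw(0, 0, 1, 0) - offset_nchw(0, 0, 0, 0)

print(channel_step_nhwc, channel_step_nchw)  # 1 1024
print(pixel_step_nhwc, pixel_step_nchw)      # 3 1
```

In NHWC the channels of one pixel are contiguous, while in NCHW each channel forms its own contiguous 32x32 plane. Which arrangement a given accelerator prefers depends on how its kernels walk memory, which is why the answer has to be measured.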
NHWC on TPU

    import tensorflow as tf
    import os

    # Load CIFAR-10 (images are 32x32x3, i.e. NHWC) and scale pixels to [0, 1].
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Two convolutional layers; Keras uses channels_last (NHWC) by default.
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(input_shape=(32, 32, 3), filters=256, kernel_size=3,
                               padding='same', activation=tf.nn.relu),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Conv2D(filters=256, kernel_size=3,
                               padding='same', activation=tf.nn.relu),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dense(10)  # outputs logits; softmax is applied in loss/metric
    ])

    def loss(y_true, y_pred):
        return tf.keras.backend.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)

    def accuracy(y_true, y_pred):
        return tf.keras.metrics.sparse_categorical_accuracy(y_true, tf.nn.softmax(y_pred))

    # TPU
    model = tf.contrib.tpu.keras_to_tpu_model(
        model,
        strategy=tf.contrib.tpu.TPUDistributionStrategy(
            tf.contrib.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
        )
    )

    model.compile(optimizer='adam', loss=loss, metrics=[accuracy])
    model.fit(x_train, y_train, batch_size=1024, epochs=5)
    model.evaluate(x_test, y_test)

Output
INFO:tensorflow:Querying Tensorflow master (grpc://10.38.219.210:8470) for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 14236975761588034518)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 18323984429862649922)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 8554836709659081505)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 14275939449935473668)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 12561286578595138955)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 4788377722856588897)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 6433296346626590566)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 17630926159361266221)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 1107264524916151670)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 17179869184, 9476024643346962673)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 17179869184, 5663133602415764923)
WARNING:tensorflow:tpu_model (from tensorflow.contrib.tpu.python.tpu.keras_support) is experimental and may change or be removed at any time, and without warning.
Epoch 1/5
INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(128,), dtype=tf.int32, name='core_id_40'), TensorSpec(shape=(128, 32, 32, 3), dtype=tf.float32, name='conv2d_8_input_10'), TensorSpec(shape=(128, 1), dtype=tf.float32, name='dense_7_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Cloning Adam {'lr': 0.0010000000474974513, 'beta_1': 0.8999999761581421, 'beta_2': 0.9990000128746033, 'decay': 0.0, 'epsilon': 1e-07, 'amsgrad': False}
INFO:tensorflow:Remapping placeholder for conv2d_8_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f16729f7390> []
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 8.377423286437988 secs
INFO:tensorflow:Setting weights on TPU model.
INFO:tensorflow:CPU -> TPU lr: 0.0010000000474974513 {0.001}
INFO:tensorflow:CPU -> TPU beta_1: 0.8999999761581421 {0.9}
INFO:tensorflow:CPU -> TPU beta_2: 0.9990000128746033 {0.999}
INFO:tensorflow:CPU -> TPU decay: 0.0 {0.0}
WARNING:tensorflow:Cannot update non-variable config: epsilon
WARNING:tensorflow:Cannot update non-variable config: amsgrad
48128/50000 [===========================>..] - ETA: 1s - loss: 1.9748 - accuracy: 0.2997INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(106,), dtype=tf.int32, name='core_id_40'), TensorSpec(shape=(106, 32, 32, 3), dtype=tf.float32, name='conv2d_8_input_10'), TensorSpec(shape=(106, 1), dtype=tf.float32, name='dense_7_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for conv2d_8_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f16729f7390> [<tf.Variable 'tpu_139734392716536/Adam/iterations:0' shape=() dtype=int64>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f167229c898>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16722451d0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1672245780>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16721fea58>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16721c5c50>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f167218fbe0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16720fde10>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16720c4668>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16720903c8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1672059b70>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1672025d68>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671f909e8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671f5bf98>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671f213c8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671eec198>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671eb1cf8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671dfcdd8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671dc3e10>, 
<tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671db2b70>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671cfc7b8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671cc5d68>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671cb2f60>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671bfaa90>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671bc66d8>]
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 8.441371440887451 secs
50000/50000 [==============================] - 43s 850us/sample - loss: 1.9587 - accuracy: 0.3050
Epoch 2/5
50000/50000 [==============================] - 7s 137us/sample - loss: 1.5943 - accuracy: 0.4338
Epoch 3/5
50000/50000 [==============================] - 7s 138us/sample - loss: 1.8239 - accuracy: 0.4031
Epoch 4/5
50000/50000 [==============================] - 7s 135us/sample - loss: 1.4532 - accuracy: 0.4980
Epoch 5/5
50000/50000 [==============================] - 7s 136us/sample - loss: 1.2632 - accuracy: 0.5597
INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(4,), dtype=tf.int32, name='core_id_50'), TensorSpec(shape=(4, 32, 32, 3), dtype=tf.float32, name='conv2d_8_input_10'), TensorSpec(shape=(4, 1), dtype=tf.float32, name='dense_7_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Cloning Adam {'lr': 0.0010000000474974513, 'beta_1': 0.8999999761581421, 'beta_2': 0.9990000128746033, 'decay': 0.0, 'epsilon': 1e-07, 'amsgrad': False}
INFO:tensorflow:Remapping placeholder for conv2d_8_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f167066ac88> []
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 4.571932315826416 secs
9952/10000 [============================>.] - ETA: 0s - loss: 2.0134 - accuracy: 0.3715INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(2,), dtype=tf.int32, name='core_id_50'), TensorSpec(shape=(2, 32, 32, 3), dtype=tf.float32, name='conv2d_8_input_10'), TensorSpec(shape=(2, 1), dtype=tf.float32, name='dense_7_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for conv2d_8_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f167066ac88> []
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 3.0498554706573486 secs
10000/10000 [==============================] - 14s 1ms/sample - loss: 2.0130 - accuracy: 0.3718
[2.0130336515426634, 0.3718]
NCHW on TPU

    import tensorflow as tf
    import numpy as np
    import os

    # Load CIFAR-10 and move the channel axis forward: NHWC -> NCHW.
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    x_train, x_test = np.transpose(x_train, [0, 3, 1, 2]), np.transpose(x_test, [0, 3, 1, 2])
    x_train, x_test = x_train / 255.0, x_test / 255.0

    # Same network as the NHWC version, with data_format='channels_first' (NCHW).
    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(input_shape=(3, 32, 32), filters=256, kernel_size=3,
                               padding='same', activation=tf.nn.relu,
                               data_format='channels_first'),
        tf.keras.layers.MaxPool2D(data_format='channels_first'),
        tf.keras.layers.Conv2D(filters=256, kernel_size=3,
                               padding='same', activation=tf.nn.relu,
                               data_format='channels_first'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dense(10)  # outputs logits; softmax is applied in loss/metric
    ])

    def loss(y_true, y_pred):
        return tf.keras.backend.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)

    def accuracy(y_true, y_pred):
        return tf.keras.metrics.sparse_categorical_accuracy(y_true, tf.nn.softmax(y_pred))

    # TPU
    model = tf.contrib.tpu.keras_to_tpu_model(
        model,
        strategy=tf.contrib.tpu.TPUDistributionStrategy(
            tf.contrib.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
        )
    )

    model.compile(optimizer='adam', loss=loss, metrics=[accuracy])
    model.fit(x_train, y_train, batch_size=1024, epochs=5)
    model.evaluate(x_test, y_test)

Output
INFO:tensorflow:Querying Tensorflow master (grpc://10.38.219.210:8470) for TPU system metadata.
INFO:tensorflow:Found TPU system:
INFO:tensorflow:*** Num TPU Cores: 8
INFO:tensorflow:*** Num TPU Workers: 1
INFO:tensorflow:*** Num TPU Cores Per Worker: 8
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 14236975761588034518)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 18323984429862649922)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 8554836709659081505)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 14275939449935473668)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 12561286578595138955)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 4788377722856588897)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 6433296346626590566)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 17630926159361266221)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 1107264524916151670)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 17179869184, 9476024643346962673)
INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 17179869184, 5663133602415764923)
WARNING:tensorflow:tpu_model (from tensorflow.contrib.tpu.python.tpu.keras_support) is experimental and may change or be removed at any time, and without warning.
Epoch 1/5
INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(128,), dtype=tf.int32, name='core_id_70'), TensorSpec(shape=(128, 3, 32, 32), dtype=tf.float32, name='conv2d_12_input_10'), TensorSpec(shape=(128, 1), dtype=tf.float32, name='dense_11_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Cloning Adam {'lr': 0.0010000000474974513, 'beta_1': 0.8999999761581421, 'beta_2': 0.9990000128746033, 'decay': 0.0, 'epsilon': 1e-07, 'amsgrad': False}
INFO:tensorflow:Remapping placeholder for conv2d_12_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f166d50c518> []
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 16.735506296157837 secs
INFO:tensorflow:Setting weights on TPU model.
INFO:tensorflow:CPU -> TPU lr: 0.0010000000474974513 {0.001}
INFO:tensorflow:CPU -> TPU beta_1: 0.8999999761581421 {0.9}
INFO:tensorflow:CPU -> TPU beta_2: 0.9990000128746033 {0.999}
INFO:tensorflow:CPU -> TPU decay: 0.0 {0.0}
WARNING:tensorflow:Cannot update non-variable config: epsilon
WARNING:tensorflow:Cannot update non-variable config: amsgrad
48128/50000 [===========================>..] - ETA: 1s - loss: 2.0741 - accuracy: 0.2895INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(106,), dtype=tf.int32, name='core_id_70'), TensorSpec(shape=(106, 3, 32, 32), dtype=tf.float32, name='conv2d_12_input_10'), TensorSpec(shape=(106, 1), dtype=tf.float32, name='dense_11_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for conv2d_12_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f166d50c518> [<tf.Variable 'tpu_139734303609520/Adam/iterations:0' shape=() dtype=int64>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cdb1c50>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cd53518>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cd53ac8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cd125f8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166ccd9e80>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cc460f0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cc10278>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cbdc8d0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cba36a0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cb6b710>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cab64e0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166caa3550>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166ca6c2e8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c9b3b38>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c97d828>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c947be0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c8b5e10>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c879b00>, 
<tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c849748>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c80ef98>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c780ef0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c749a20>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c70f668>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c6d8eb8>]
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 18.13627290725708 secs
50000/50000 [==============================] - 79s 2ms/sample - loss: 2.0549 - accuracy: 0.2950
Epoch 2/5
50000/50000 [==============================] - 6s 129us/sample - loss: 1.6440 - accuracy: 0.4250
Epoch 3/5
50000/50000 [==============================] - 7s 130us/sample - loss: 1.5291 - accuracy: 0.4633
Epoch 4/5
50000/50000 [==============================] - 6s 129us/sample - loss: 1.5411 - accuracy: 0.4643
Epoch 5/5
50000/50000 [==============================] - 6s 129us/sample - loss: 1.4702 - accuracy: 0.4918
INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(4,), dtype=tf.int32, name='core_id_80'), TensorSpec(shape=(4, 3, 32, 32), dtype=tf.float32, name='conv2d_12_input_10'), TensorSpec(shape=(4, 1), dtype=tf.float32, name='dense_11_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Cloning Adam {'lr': 0.0010000000474974513, 'beta_1': 0.8999999761581421, 'beta_2': 0.9990000128746033, 'decay': 0.0, 'epsilon': 1e-07, 'amsgrad': False}
INFO:tensorflow:Remapping placeholder for conv2d_12_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f166b0f7c88> []
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 11.619062662124634 secs
9952/10000 [============================>.] - ETA: 0s - loss: 1.9147 - accuracy: 0.3833INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(2,), dtype=tf.int32, name='core_id_80'), TensorSpec(shape=(2, 3, 32, 32), dtype=tf.float32, name='conv2d_12_input_10'), TensorSpec(shape=(2, 1), dtype=tf.float32, name='dense_11_target_10')]
INFO:tensorflow:Overriding default placeholder.
INFO:tensorflow:Remapping placeholder for conv2d_12_input
INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f166b0f7c88> []
INFO:tensorflow:Started compiling
INFO:tensorflow:Finished compiling. Time elapsed: 6.101793527603149 secs
10000/10000 [==============================] - 27s 3ms/sample - loss: 1.9154 - accuracy: 0.3828
[1.91538902759552, 0.38279998]
Comparison
| Format | Epoch 1 | Epoch 2 | Epoch 3 | Epoch 4 | Epoch 5 | eval. |
|---|---|---|---|---|---|---|
| NHWC | 43s | 7s | 7s | 7s | 7s | 14s |
| NCHW | 79s | 6s | 7s | 6s | 6s | 27s |
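NCHW is slow in the first epoch but slightly faster from the second epoch onward (about 129 µs/sample versus about 136 µs/sample for NHWC in the logs above).
Evaluation is faster with NHWC. This also seems to come from the slow first run of NCHW: each compile step in its log takes about twice as long as the corresponding NHWC step, or longer.
On the TPU, NCHW has a higher one-time startup cost, but with a large amount of training data it looks like it would end up faster overall.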
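To get a feel for how much training it would take for NCHW to pay back its slower start, here is a rough break-even estimate using only the numbers in the TPU logs above. It is a back-of-the-envelope sketch: it assumes the per-sample times of epochs 2 to 5 are representative and that the whole first-epoch gap is a one-time cost, and a single run like this carries measurement noise.

```python
# Steady-state per-sample times read off the TPU logs (epochs 2-5), in microseconds.
nhwc_us_per_sample = [137, 138, 135, 136]
nchw_us_per_sample = [129, 130, 129, 129]
samples_per_epoch = 50000

# First-epoch wall time in seconds (includes XLA compilation).
nhwc_first_epoch_s = 43
nchw_first_epoch_s = 79


def mean(values):
    return sum(values) / len(values)


# Time NCHW saves per steady-state epoch, in seconds.
saving_per_epoch_s = (
    (mean(nhwc_us_per_sample) - mean(nchw_us_per_sample)) * 1e-6 * samples_per_epoch
)

# Extra time NCHW spends in the first epoch, in seconds.
extra_first_epoch_s = nchw_first_epoch_s - nhwc_first_epoch_s

# Number of steady-state epochs needed to recover that extra time.
break_even_epochs = extra_first_epoch_s / saving_per_epoch_s

print(round(saving_per_epoch_s, 2), round(break_even_epochs))
```

With these figures the saving is roughly 0.36 s per epoch against 36 s of extra startup, so it takes on the order of 100 CIFAR-10-sized epochs for NCHW to come out ahead. A larger dataset reaches that point in proportionally fewer epochs.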
Follow-up test
I also ran the same comparison on a GPU.
| Format | Epoch 1 | Epoch 2 | Epoch 3 | Epoch 4 | Epoch 5 | eval. |
|---|---|---|---|---|---|---|
| NHWC | 24s | 23s | 23s | 23s | 23s | 3s |
| NCHW | 22s | 22s | 23s | 22s | 22s | 3s |
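On the GPU, too, NCHW is slightly faster.
Follow-up test 2
I also measured it on the 1080 Ti in my local PC.
| Format | Epoch 1 | Epoch 2 | Epoch 3 | Epoch 4 | Epoch 5 | eval. |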
|---|---|---|---|---|---|---|
| NHWC | 10s | 8s | 8s | 8s | 8s | 1s |
| NCHW | 9s | 8s | 7s | 7s | 7s | 1s |
Again, NCHW is slightly faster.