Convolution input data can be laid out as either NHWC or NCHW, so I ran an experiment to see which one suits the TPU better.
TensorFlow defaults to NHWC, while Chainer defaults to NCHW.
cuDNN is optimized for NCHW.
https://www.tensorflow.org/guide/performance/overview
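Tensor Cores, however, are optimized for NHWC.
Reference: "Volta Tensor コア GPU が AI パフォーマンスの新記録を達成" (Volta Tensor Core GPU Achieves New AI Performance Milestones) | NVIDIA
Note that in TensorFlow's CPU mode, max pooling supports only NHWC.
I searched but could not find out which layout the TPU is optimized for.
So I trained a network with two convolutional layers on CIFAR-10, a dataset of color images, and measured it directly.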
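To make the two layouts concrete: NHWC and NCHW hold the same 4-D tensor but order the axes differently in memory. The sketch below is only an illustration, assuming plain row-major storage; the helper names `offset_nhwc` and `offset_nchw` are made up for this example and are not part of TensorFlow. It computes where an element lands in the flat buffer under each layout.

```python
def offset_nhwc(n, h, w, c, H, W, C):
    """Flat index of element (n, h, w, c) when axes are stored as N, H, W, C."""
    return ((n * H + h) * W + w) * C + c


def offset_nchw(n, h, w, c, H, W, C):
    """Flat index of the same element when axes are stored as N, C, H, W."""
    return ((n * C + c) * H + h) * W + w


# CIFAR-10 image size: 32 x 32 pixels, 3 channels.
H, W, C = 32, 32, 3

# Distance in memory between neighbouring channels of one pixel.
channel_stride_nhwc = offset_nhwc(0, 0, 0, 1, H, W, C) - offset_nhwc(0, 0, 0, 0, H, W, C)
channel_stride_nchw = offset_nchw(0, 0, 0, 1, H, W, C) - offset_nchw(0, 0, 0, 0, H, W, C)

print(channel_stride_nhwc)  # 1: the channels of a pixel sit next to each other
print(channel_stride_nchw)  # 1024: each channel is its own 32 x 32 plane
```

Which access pattern is cheaper depends on how the hardware and its kernels read the data, which is why the question has to be settled by measurement.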
NHWC on TPU
    import tensorflow as tf
    import os

    # CIFAR-10 images are loaded as NHWC arrays of shape (N, 32, 32, 3).
    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(input_shape=(32, 32, 3), filters=256, kernel_size=3,
                               padding='same', activation=tf.nn.relu),
        tf.keras.layers.MaxPool2D(),
        tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding='same',
                               activation=tf.nn.relu),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dense(10)
    ])

    # The last Dense layer outputs logits, so the loss is computed with from_logits=True.
    def loss(y_true, y_pred):
        return tf.keras.backend.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)

    def accuracy(y_true, y_pred):
        return tf.keras.metrics.sparse_categorical_accuracy(y_true, tf.nn.softmax(y_pred))

    # TPU (TensorFlow 1.x contrib API; COLAB_TPU_ADDR is set by the Colab TPU runtime)
    model = tf.contrib.tpu.keras_to_tpu_model(
        model,
        strategy=tf.contrib.tpu.TPUDistributionStrategy(
            tf.contrib.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
        )
    )

    model.compile(optimizer='adam', loss=loss, metrics=[accuracy])
    model.fit(x_train, y_train, batch_size=1024, epochs=5)
    model.evaluate(x_test, y_test)

Output
    INFO:tensorflow:Querying Tensorflow master (grpc://10.38.219.210:8470) for TPU system metadata.
    INFO:tensorflow:Found TPU system:
    INFO:tensorflow:*** Num TPU Cores: 8
    INFO:tensorflow:*** Num TPU Workers: 1
    INFO:tensorflow:*** Num TPU Cores Per Worker: 8
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 14236975761588034518)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 18323984429862649922)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 8554836709659081505)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 14275939449935473668)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 12561286578595138955)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 4788377722856588897)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 6433296346626590566)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 17630926159361266221)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 1107264524916151670)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 17179869184, 9476024643346962673)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 17179869184, 5663133602415764923)
    WARNING:tensorflow:tpu_model (from tensorflow.contrib.tpu.python.tpu.keras_support) is experimental and may change or be removed at any time, and without warning.
    Epoch 1/5
    INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(128,), dtype=tf.int32, name='core_id_40'), TensorSpec(shape=(128, 32, 32, 3), dtype=tf.float32, name='conv2d_8_input_10'), TensorSpec(shape=(128, 1), dtype=tf.float32, name='dense_7_target_10')]
    INFO:tensorflow:Overriding default placeholder.
    INFO:tensorflow:Cloning Adam {'lr': 0.0010000000474974513, 'beta_1': 0.8999999761581421, 'beta_2': 0.9990000128746033, 'decay': 0.0, 'epsilon': 1e-07, 'amsgrad': False}
    INFO:tensorflow:Remapping placeholder for conv2d_8_input
    INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f16729f7390> []
    INFO:tensorflow:Started compiling
    INFO:tensorflow:Finished compiling. Time elapsed: 8.377423286437988 secs
    INFO:tensorflow:Setting weights on TPU model.
    INFO:tensorflow:CPU -> TPU lr: 0.0010000000474974513 {0.001}
    INFO:tensorflow:CPU -> TPU beta_1: 0.8999999761581421 {0.9}
    INFO:tensorflow:CPU -> TPU beta_2: 0.9990000128746033 {0.999}
    INFO:tensorflow:CPU -> TPU decay: 0.0 {0.0}
    WARNING:tensorflow:Cannot update non-variable config: epsilon
    WARNING:tensorflow:Cannot update non-variable config: amsgrad
    48128/50000 [===========================>..] - ETA: 1s - loss: 1.9748 - accuracy: 0.2997
    INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(106,), dtype=tf.int32, name='core_id_40'), TensorSpec(shape=(106, 32, 32, 3), dtype=tf.float32, name='conv2d_8_input_10'), TensorSpec(shape=(106, 1), dtype=tf.float32, name='dense_7_target_10')]
    INFO:tensorflow:Overriding default placeholder.
    INFO:tensorflow:Remapping placeholder for conv2d_8_input
    INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f16729f7390> [<tf.Variable 'tpu_139734392716536/Adam/iterations:0' shape=() dtype=int64>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f167229c898>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16722451d0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1672245780>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16721fea58>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16721c5c50>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f167218fbe0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16720fde10>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16720c4668>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f16720903c8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1672059b70>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1672025d68>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671f909e8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671f5bf98>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671f213c8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671eec198>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671eb1cf8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671dfcdd8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671dc3e10>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671db2b70>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671cfc7b8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671cc5d68>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671cb2f60>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671bfaa90>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f1671bc66d8>]
    INFO:tensorflow:Started compiling
    INFO:tensorflow:Finished compiling. Time elapsed: 8.441371440887451 secs
    50000/50000 [==============================] - 43s 850us/sample - loss: 1.9587 - accuracy: 0.3050
    Epoch 2/5
    50000/50000 [==============================] - 7s 137us/sample - loss: 1.5943 - accuracy: 0.4338
    Epoch 3/5
    50000/50000 [==============================] - 7s 138us/sample - loss: 1.8239 - accuracy: 0.4031
    Epoch 4/5
    50000/50000 [==============================] - 7s 135us/sample - loss: 1.4532 - accuracy: 0.4980
    Epoch 5/5
    50000/50000 [==============================] - 7s 136us/sample - loss: 1.2632 - accuracy: 0.5597
    INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(4,), dtype=tf.int32, name='core_id_50'), TensorSpec(shape=(4, 32, 32, 3), dtype=tf.float32, name='conv2d_8_input_10'), TensorSpec(shape=(4, 1), dtype=tf.float32, name='dense_7_target_10')]
    INFO:tensorflow:Overriding default placeholder.
    INFO:tensorflow:Cloning Adam {'lr': 0.0010000000474974513, 'beta_1': 0.8999999761581421, 'beta_2': 0.9990000128746033, 'decay': 0.0, 'epsilon': 1e-07, 'amsgrad': False}
    INFO:tensorflow:Remapping placeholder for conv2d_8_input
    INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f167066ac88> []
    INFO:tensorflow:Started compiling
    INFO:tensorflow:Finished compiling. Time elapsed: 4.571932315826416 secs
    9952/10000 [============================>.] - ETA: 0s - loss: 2.0134 - accuracy: 0.3715
    INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(2,), dtype=tf.int32, name='core_id_50'), TensorSpec(shape=(2, 32, 32, 3), dtype=tf.float32, name='conv2d_8_input_10'), TensorSpec(shape=(2, 1), dtype=tf.float32, name='dense_7_target_10')]
    INFO:tensorflow:Overriding default placeholder.
    INFO:tensorflow:Remapping placeholder for conv2d_8_input
    INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f167066ac88> []
    INFO:tensorflow:Started compiling
    INFO:tensorflow:Finished compiling. Time elapsed: 3.0498554706573486 secs
    10000/10000 [==============================] - 14s 1ms/sample - loss: 2.0130 - accuracy: 0.3718
    [2.0130336515426634, 0.3718]
NCHW on TPU
    import tensorflow as tf
    import numpy as np
    import os

    (x_train, y_train), (x_test, y_test) = tf.keras.datasets.cifar10.load_data()
    # Convert NHWC -> NCHW by moving the channel axis to position 1.
    x_train, x_test = np.transpose(x_train, [0, 3, 1, 2]), np.transpose(x_test, [0, 3, 1, 2])
    x_train, x_test = x_train / 255.0, x_test / 255.0  # scale pixels to [0, 1]

    model = tf.keras.models.Sequential([
        tf.keras.layers.Conv2D(input_shape=(3, 32, 32), filters=256, kernel_size=3,
                               padding='same', activation=tf.nn.relu,
                               data_format='channels_first'),
        tf.keras.layers.MaxPool2D(data_format='channels_first'),
        tf.keras.layers.Conv2D(filters=256, kernel_size=3, padding='same',
                               activation=tf.nn.relu, data_format='channels_first'),
        tf.keras.layers.Flatten(),
        tf.keras.layers.Dense(512, activation=tf.nn.relu),
        tf.keras.layers.Dense(10)
    ])

    # The last Dense layer outputs logits, so the loss is computed with from_logits=True.
    def loss(y_true, y_pred):
        return tf.keras.backend.sparse_categorical_crossentropy(y_true, y_pred, from_logits=True)

    def accuracy(y_true, y_pred):
        return tf.keras.metrics.sparse_categorical_accuracy(y_true, tf.nn.softmax(y_pred))

    # TPU (TensorFlow 1.x contrib API; COLAB_TPU_ADDR is set by the Colab TPU runtime)
    model = tf.contrib.tpu.keras_to_tpu_model(
        model,
        strategy=tf.contrib.tpu.TPUDistributionStrategy(
            tf.contrib.cluster_resolver.TPUClusterResolver(tpu='grpc://' + os.environ['COLAB_TPU_ADDR'])
        )
    )

    model.compile(optimizer='adam', loss=loss, metrics=[accuracy])
    model.fit(x_train, y_train, batch_size=1024, epochs=5)
    model.evaluate(x_test, y_test)

Output
    INFO:tensorflow:Querying Tensorflow master (grpc://10.38.219.210:8470) for TPU system metadata.
    INFO:tensorflow:Found TPU system:
    INFO:tensorflow:*** Num TPU Cores: 8
    INFO:tensorflow:*** Num TPU Workers: 1
    INFO:tensorflow:*** Num TPU Cores Per Worker: 8
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:CPU:0, CPU, -1, 14236975761588034518)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:XLA_CPU:0, XLA_CPU, 17179869184, 18323984429862649922)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:0, TPU, 17179869184, 8554836709659081505)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:1, TPU, 17179869184, 14275939449935473668)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:2, TPU, 17179869184, 12561286578595138955)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:3, TPU, 17179869184, 4788377722856588897)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:4, TPU, 17179869184, 6433296346626590566)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:5, TPU, 17179869184, 17630926159361266221)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:6, TPU, 17179869184, 1107264524916151670)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU:7, TPU, 17179869184, 9476024643346962673)
    INFO:tensorflow:*** Available Device: _DeviceAttributes(/job:worker/replica:0/task:0/device:TPU_SYSTEM:0, TPU_SYSTEM, 17179869184, 5663133602415764923)
    WARNING:tensorflow:tpu_model (from tensorflow.contrib.tpu.python.tpu.keras_support) is experimental and may change or be removed at any time, and without warning.
    Epoch 1/5
    INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(128,), dtype=tf.int32, name='core_id_70'), TensorSpec(shape=(128, 3, 32, 32), dtype=tf.float32, name='conv2d_12_input_10'), TensorSpec(shape=(128, 1), dtype=tf.float32, name='dense_11_target_10')]
    INFO:tensorflow:Overriding default placeholder.
    INFO:tensorflow:Cloning Adam {'lr': 0.0010000000474974513, 'beta_1': 0.8999999761581421, 'beta_2': 0.9990000128746033, 'decay': 0.0, 'epsilon': 1e-07, 'amsgrad': False}
    INFO:tensorflow:Remapping placeholder for conv2d_12_input
    INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f166d50c518> []
    INFO:tensorflow:Started compiling
    INFO:tensorflow:Finished compiling. Time elapsed: 16.735506296157837 secs
    INFO:tensorflow:Setting weights on TPU model.
    INFO:tensorflow:CPU -> TPU lr: 0.0010000000474974513 {0.001}
    INFO:tensorflow:CPU -> TPU beta_1: 0.8999999761581421 {0.9}
    INFO:tensorflow:CPU -> TPU beta_2: 0.9990000128746033 {0.999}
    INFO:tensorflow:CPU -> TPU decay: 0.0 {0.0}
    WARNING:tensorflow:Cannot update non-variable config: epsilon
    WARNING:tensorflow:Cannot update non-variable config: amsgrad
    48128/50000 [===========================>..] - ETA: 1s - loss: 2.0741 - accuracy: 0.2895
    INFO:tensorflow:New input shapes; (re-)compiling: mode=train (# of cores 8), [TensorSpec(shape=(106,), dtype=tf.int32, name='core_id_70'), TensorSpec(shape=(106, 3, 32, 32), dtype=tf.float32, name='conv2d_12_input_10'), TensorSpec(shape=(106, 1), dtype=tf.float32, name='dense_11_target_10')]
    INFO:tensorflow:Overriding default placeholder.
    INFO:tensorflow:Remapping placeholder for conv2d_12_input
    INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f166d50c518> [<tf.Variable 'tpu_139734303609520/Adam/iterations:0' shape=() dtype=int64>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cdb1c50>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cd53518>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cd53ac8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cd125f8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166ccd9e80>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cc460f0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cc10278>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cbdc8d0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cba36a0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cb6b710>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166cab64e0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166caa3550>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166ca6c2e8>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c9b3b38>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c97d828>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c947be0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c8b5e10>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c879b00>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c849748>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c80ef98>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c780ef0>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c749a20>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c70f668>, <tensorflow.contrib.tpu.python.tpu.keras_tpu_variables.ReplicatedVariable object at 0x7f166c6d8eb8>]
    INFO:tensorflow:Started compiling
    INFO:tensorflow:Finished compiling. Time elapsed: 18.13627290725708 secs
    50000/50000 [==============================] - 79s 2ms/sample - loss: 2.0549 - accuracy: 0.2950
    Epoch 2/5
    50000/50000 [==============================] - 6s 129us/sample - loss: 1.6440 - accuracy: 0.4250
    Epoch 3/5
    50000/50000 [==============================] - 7s 130us/sample - loss: 1.5291 - accuracy: 0.4633
    Epoch 4/5
    50000/50000 [==============================] - 6s 129us/sample - loss: 1.5411 - accuracy: 0.4643
    Epoch 5/5
    50000/50000 [==============================] - 6s 129us/sample - loss: 1.4702 - accuracy: 0.4918
    INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(4,), dtype=tf.int32, name='core_id_80'), TensorSpec(shape=(4, 3, 32, 32), dtype=tf.float32, name='conv2d_12_input_10'), TensorSpec(shape=(4, 1), dtype=tf.float32, name='dense_11_target_10')]
    INFO:tensorflow:Overriding default placeholder.
    INFO:tensorflow:Cloning Adam {'lr': 0.0010000000474974513, 'beta_1': 0.8999999761581421, 'beta_2': 0.9990000128746033, 'decay': 0.0, 'epsilon': 1e-07, 'amsgrad': False}
    INFO:tensorflow:Remapping placeholder for conv2d_12_input
    INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f166b0f7c88> []
    INFO:tensorflow:Started compiling
    INFO:tensorflow:Finished compiling. Time elapsed: 11.619062662124634 secs
    9952/10000 [============================>.] - ETA: 0s - loss: 1.9147 - accuracy: 0.3833
    INFO:tensorflow:New input shapes; (re-)compiling: mode=eval (# of cores 8), [TensorSpec(shape=(2,), dtype=tf.int32, name='core_id_80'), TensorSpec(shape=(2, 3, 32, 32), dtype=tf.float32, name='conv2d_12_input_10'), TensorSpec(shape=(2, 1), dtype=tf.float32, name='dense_11_target_10')]
    INFO:tensorflow:Overriding default placeholder.
    INFO:tensorflow:Remapping placeholder for conv2d_12_input
    INFO:tensorflow:KerasCrossShard: <tensorflow.python.keras.optimizers.Adam object at 0x7f166b0f7c88> []
    INFO:tensorflow:Started compiling
    INFO:tensorflow:Finished compiling. Time elapsed: 6.101793527603149 secs
    10000/10000 [==============================] - 27s 3ms/sample - loss: 1.9154 - accuracy: 0.3828
    [1.91538902759552, 0.38279998]
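The per-epoch times compared next can be read off the Keras progress lines by hand, but they can also be pulled out mechanically. The following is a minimal sketch using only the standard library, not something the original experiment used; the regular expression is my own assumption about the progress-line format seen in these logs, and the sample text is copied from the NHWC run above.

```python
import re

# Progress lines copied from the NHWC run; the first one is an unfinished
# "ETA" line and should be ignored.
log = """\
48128/50000 [===========================>..] - ETA: 1s - loss: 1.9748 - accuracy: 0.2997
50000/50000 [==============================] - 43s 850us/sample - loss: 1.9587 - accuracy: 0.3050
50000/50000 [==============================] - 7s 137us/sample - loss: 1.5943 - accuracy: 0.4338
50000/50000 [==============================] - 7s 138us/sample - loss: 1.8239 - accuracy: 0.4031
50000/50000 [==============================] - 7s 135us/sample - loss: 1.4532 - accuracy: 0.4980
50000/50000 [==============================] - 7s 136us/sample - loss: 1.2632 - accuracy: 0.5597
"""

# Match only finished epochs: "<total>/<total> [=====] - <seconds>s <t>us/sample".
# The backreference \1 requires the processed count to equal the total.
pattern = re.compile(r"^(\d+)/\1 \[=+\] - (\d+)s (\d+)(us|ms)/sample", re.MULTILINE)

epoch_seconds = [int(m.group(2)) for m in pattern.finditer(log)]
print(epoch_seconds)  # [43, 7, 7, 7, 7]
```

The whole-second figures are rounded by Keras, so the per-sample times in the same lines give a finer comparison than the seconds column.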
Comparison
Layout | Epoch 1 | Epoch 2 | Epoch 3 | Epoch 4 | Epoch 5 | eval. |
---|---|---|---|---|---|---|
NHWC | 43s | 7s | 7s | 7s | 7s | 14s |
NCHW | 79s | 6s | 7s | 6s | 6s | 27s |
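NCHW is slow in the first epoch but slightly faster from the second epoch onward (about 129 us/sample versus about 136 us/sample for NHWC).
Evaluation is faster with NHWC. This also appears to be caused by NCHW's slow first run: in the logs, each compilation takes about twice as long or more for NCHW (16.7 s and 18.1 s versus 8.4 s and 8.4 s in training, 11.6 s and 6.1 s versus 4.6 s and 3.0 s in evaluation).
On the TPU, NCHW costs more time up front, but with a large amount of training data it looks likely to come out ahead in the end.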
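How much training is "a large amount" can be estimated roughly from the numbers in the logs. The sketch below is a back-of-the-envelope calculation, not a measurement: it assumes the entire first-epoch gap is a one-time cost, that later epochs keep running at the per-sample speeds reported for epochs 2 to 5, and it ignores evaluation time.

```python
SAMPLES_PER_EPOCH = 50000

# Per-sample times in microseconds reported for epochs 2-5 in the logs.
nhwc_us = [137, 138, 135, 136]
nchw_us = [129, 130, 129, 129]

# Wall time of the first epoch in seconds (includes compilation).
nhwc_first, nchw_first = 43, 79

mean_nhwc_us = sum(nhwc_us) / len(nhwc_us)
mean_nchw_us = sum(nchw_us) / len(nchw_us)

# Seconds saved by NCHW in every epoch after the first.
saving_per_epoch = (mean_nhwc_us - mean_nchw_us) * 1e-6 * SAMPLES_PER_EPOCH

# Extra seconds NCHW spends in the first epoch.
extra_startup = nchw_first - nhwc_first

# Number of further epochs needed before NCHW has paid back its slow start.
break_even_epochs = extra_startup / saving_per_epoch

print(f"NCHW saves about {saving_per_epoch:.2f} s per epoch")
print(f"Extra first-epoch cost: {extra_startup} s")
print(f"Break-even after roughly {break_even_epochs:.0f} further epochs")
```

Under these assumptions the break-even point is on the order of 100 epochs of CIFAR-10, so for a short run like this one NHWC finishes sooner overall.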
Follow-up experiment
I ran the same comparison on a GPU.
Layout | Epoch 1 | Epoch 2 | Epoch 3 | Epoch 4 | Epoch 5 | eval. |
---|---|---|---|---|---|---|
NHWC | 24s | 23s | 23s | 23s | 23s | 3s |
NCHW | 22s | 22s | 23s | 22s | 22s | 3s |
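NCHW is slightly faster on the GPU as well.
Follow-up experiment 2
I also measured on the 1080 Ti in my local PC.
Layout | Epoch 1 | Epoch 2 | Epoch 3 | Epoch 4 | Epoch 5 | eval. |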
---|---|---|---|---|---|---|
NHWC | 10s | 8s | 8s | 8s | 8s | 1s |
NCHW | 9s | 8s | 7s | 7s | 7s | 1s |
Again, NCHW is slightly faster.