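When I repeated self-play and training starting from a freshly initialized model, the number of game records ending in an entering-king declaration (nyugyoku) grew excessively, so here I look at excluding games that ended by reaching the maximum move count.
Excluding the max-move games raises the relative share of game records that end in checkmate, so the hope is that the model learns to deliver mate.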
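The filtering idea can be sketched as follows. This is only an illustration of the concept behind the `--skip_max_moves` option, not the actual gumbel-dlshogi code: `Termination`, `GameRecord`, and `keep_for_training` are hypothetical names introduced for this example.

```python
from dataclasses import dataclass
from enum import Enum, auto


class Termination(Enum):
    """How a self-play game ended (hypothetical categories)."""
    CHECKMATE = auto()
    NYUGYOKU = auto()   # entering-king declaration
    DRAW = auto()       # e.g. repetition
    MAX_MOVES = auto()  # stopped because the move limit was reached


@dataclass
class GameRecord:
    moves: list[str]
    termination: Termination


def keep_for_training(games, skip_max_moves=True):
    """Return the games that would be written to the training data."""
    if not skip_max_moves:
        return list(games)
    # Drop only the games that were cut off at the move limit.
    return [g for g in games if g.termination is not Termination.MAX_MOVES]


games = [
    GameRecord(["7g7f", "3c3d"], Termination.CHECKMATE),
    GameRecord(["2g2f"], Termination.MAX_MOVES),
    GameRecord(["5g5f"], Termination.NYUGYOKU),
]
kept = keep_for_training(games)
print([g.termination.name for g in kept])
```

Note that the filter removes only the max-move games, so declaration games remain in the data and their relative share rises along with the checkmate games.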
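Results
The results after three cycles of training are shown below.
The trend of the entering-king declaration ratio increasing did not change.
It reached 41.2% in the third cycle, so I stopped the experiment there.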
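The declaration ratio per cycle can be recomputed from the self-play progress lines in the log. The sketch below assumes that the ratio is `Nyugyoku / Games`; whether `Games` includes the skipped max-move games cannot be determined from the log alone. The helper names `parse_counters` and `nyugyoku_ratio` are made up for this example.

```python
import re

# Final progress lines of the three self-play runs, copied from the log.
PROGRESS_LINES = [
    "1000259pos [50:51, 327.84pos/s, Games=3409, AverageMoves=293, Nyugyoku=34, Draw=208, MaxMoves=1856]",
    "1000173pos [40:23, 412.77pos/s, Games=3356, AverageMoves=298, Nyugyoku=1029, Draw=389, MaxMoves=897]",
    "1000008pos [40:35, 410.57pos/s, Games=3143, AverageMoves=318, Nyugyoku=1298, Draw=436, MaxMoves=951]",
]


def parse_counters(line):
    """Extract every `Name=integer` pair from a progress line."""
    return {key: int(value) for key, value in re.findall(r"(\w+)=(\d+)", line)}


def nyugyoku_ratio(line):
    """Share of games that ended in an entering-king declaration (assumed definition)."""
    counters = parse_counters(line)
    return counters["Nyugyoku"] / counters["Games"]


ratios = [nyugyoku_ratio(line) for line in PROGRESS_LINES]
for cycle, ratio in enumerate(ratios, start=1):
    print(f"cycle {cycle}: {ratio:.1%}")
```

Under this definition the ratio climbs from roughly 1% in the first cycle to roughly 31% in the second and roughly 41% in the third.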
(base) PS D:\src\gumbel-dlshogi> d:; cd 'd:\src\gumbel-dlshogi'; & 'anaconda3\python.exe' '.vscode\extensions\ms-python.debugpy-2025.10.0-win32-x64\bundled\libs\debugpy\launcher' '61166' '--' '-m' 'gumbel_dlshogi.selfplay' 'initial_model.pt' '--batch_size' '64' '--num_simulations' '16' '--num_positions' '1000000' '--amp' '--num_processes' '16' '--skip_max_moves'
1000259pos [50:51, 327.84pos/s, Games=3409, AverageMoves=293, Nyugyoku=34, Draw=208, MaxMoves=1856]
All processes terminated.
bugpy-2025.10.0-win32-x64\bundled\libs\debugpy\launcher' '65266' '--' '-m' 'gumbel_dlshogi.train' 'training_data' '--test_file' 'F:\hcpe3\floodgate.hcpe' '--initial_model' 'initial_model.pth' '--num_files' '1' '-e' '1' '--amp' '-b' '256' '--num_workers' '4' '--save_torchscript' 'model.pt'
Using device: cuda
Initial model loaded from initial_model.pth
Using 1 files for training:
  - training_data\20250718_223536_107.data
Epoch 1/1
Training: 100%|████████████████████████████████████████████████████| 3907/3907 [01:42<00:00, 38.05batch/s, Loss=4.2612, Policy=3.7455, Value=0.5157]
Train - Loss: 4.5323, Policy Loss: 3.9408, Value Loss: 0.5914, Time: 102.69s
Evaluating: 100%|████████████████████████████████████████████| 837/837 [00:28<00:00, 29.64batch/s, Loss=5.5059, Policy Acc=0.0335, Value Acc=0.5060]
Eval - Loss: 5.2054, Policy Loss: 4.4424, Value Loss: 0.7630, Policy Acc: 0.0335, Value Acc: 0.5060, Time: 28.24s
Checkpoint saved to checkpoints\checkpoint_epoch_001.pth
TorchScript model saved to model.pt
(base) PS D:\src\gumbel-dlshogi> d:; cd 'd:\src\gumbel-dlshogi'; & 'anaconda3\python.exe' '.vscode\extensions\ms-python.debugpy-2025.10.0-win32-x64\bundled\libs\debugpy\launcher' '65468' '--' '-m' 'gumbel_dlshogi.selfplay' 'model.pt' '--batch_size' '64' '--num_simulations' '16' '--num_positions' '1000000' '--amp' '--num_processes' '16' '--skip_max_moves'
1000173pos [40:23, 412.77pos/s, Games=3356, AverageMoves=298, Nyugyoku=1029, Draw=389, MaxMoves=897]
All processes terminated.
(base) PS D:\src\gumbel-dlshogi> d:; cd 'd:\src\gumbel-dlshogi'; & 'anaconda3\python.exe' '.vscode\extensions\ms-python.debugpy-2025.10.0-win32-x64\bundled\libs\debugpy\launcher' '52133' '--' '-m' 'gumbel_dlshogi.train' 'training_data' '--test_file' 'F:\hcpe3\floodgate.hcpe' '--num_files' '1' '-e' '1' '--amp' '-b' '256' '--num_workers' '4' '--resume' 'checkpoints\checkpoint_epoch_001.pth' '--save_torchscript' 'model.pt'
Using device: cuda
Using 1 files for training:
  - training_data\20250718_234247_671.data
Checkpoint loaded from checkpoints\checkpoint_epoch_001.pth, epoch 0, loss 4.5323
Epoch 2/2
Training: 100%|████████████████████████████████████████████████████| 3906/3906 [01:55<00:00, 33.81batch/s, Loss=4.3049, Policy=3.8389, Value=0.4660]
Train - Loss: 4.3065, Policy Loss: 3.8116, Value Loss: 0.4949, Time: 115.52s
Evaluating: 100%|████████████████████████████████████████████| 837/837 [00:28<00:00, 29.56batch/s, Loss=5.4782, Policy Acc=0.0558, Value Acc=0.5119]
Eval - Loss: 5.2772, Policy Loss: 4.3865, Value Loss: 0.8907, Policy Acc: 0.0558, Value Acc: 0.5119, Time: 28.32s
Checkpoint saved to checkpoints\checkpoint_epoch_002.pth
TorchScript model saved to model.pt
(base) PS D:\src\gumbel-dlshogi> d:; cd 'd:\src\gumbel-dlshogi'; & 'anaconda3\python.exe' '.vscode\extensions\ms-python.debugpy-2025.10.0-win32-x64\bundled\libs\debugpy\launcher' '52659' '--' '-m' 'gumbel_dlshogi.selfplay' 'model.pt' '--batch_size' '64' '--num_simulations' '16' '--num_positions' '1000000' '--amp' '--num_processes' '16' '--skip_max_moves'
1000008pos [40:35, 410.57pos/s, Games=3143, AverageMoves=318, Nyugyoku=1298, Draw=436, MaxMoves=951]
All processes terminated.
(base) PS D:\src\gumbel-dlshogi> d:; cd 'd:\src\gumbel-dlshogi'; & 'anaconda3\python.exe' '.vscode\extensions\ms-python.debugpy-2025.10.0-win32-x64\bundled\libs\debugpy\launcher' '58965' '--' '-m' 'gumbel_dlshogi.train' 'training_data' '--test_file' 'F:\hcpe3\floodgate.hcpe' '--num_files' '1' '-e' '1' '--amp' '-b' '256' '--num_workers' '4' '--resume' 'checkpoints\checkpoint_epoch_002.pth' '--save_torchscript' 'model.pt'
Using device: cuda
Using 1 files for training:
  - training_data\20250719_003226_422.data
Checkpoint loaded from checkpoints\checkpoint_epoch_002.pth, epoch 1, loss 4.3065
Epoch 3/3
Training: 100%|████████████████████████████████████████████████████| 3906/3906 [01:54<00:00, 33.99batch/s, Loss=4.0271, Policy=3.6966, Value=0.3305]
Train - Loss: 4.1502, Policy Loss: 3.7170, Value Loss: 0.4332, Time: 114.94s
Evaluating: 100%|████████████████████████████████████████████| 837/837 [00:28<00:00, 29.03batch/s, Loss=5.5846, Policy Acc=0.0752, Value Acc=0.5130]
Eval - Loss: 5.3237, Policy Loss: 4.3428, Value Loss: 0.9809, Policy Acc: 0.0752, Value Acc: 0.5130, Time: 28.84s
Checkpoint saved to checkpoints\checkpoint_epoch_003.pth
TorchScript model saved to model.pt