In previous posts, the self-play and training cycle was run by hand; this time I look into automating it.
Approach
The automation is implemented in Python rather than as an OS shell script, so that it runs on both Windows and Linux.
The main routines for self-play and training are already implemented as functions, so the automation can call them directly as Python functions.
The training data generated by self-play is fed into the training step, and the TorchScript model produced by training is used for the next round of self-play.
The command-line parameters for both self-play and training are managed in one place by the automation script.
So that training progress can be monitored, the training step saves metrics to TensorBoard.
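Put another way, each cycle hands the newest model to self-play and the newest data to training. A minimal sketch of that handoff (the `selfplay` and `train` functions and the paths below are placeholders for illustration, not the actual gumbel_dlshogi API):

```python
# Placeholder functions, not the actual gumbel_dlshogi API.
def selfplay(model_path: str, cycle: int) -> str:
    # Would generate training data using the current TorchScript model.
    return f"data/cycle_{cycle:08d}.data"

def train(data_path: str, cycle: int) -> str:
    # Would train on the new data and export the next TorchScript model.
    return f"models/model_{cycle:08d}.pt"

def run_cycles(initial_model: str, cycles: int) -> str:
    model_path = initial_model
    for cycle in range(1, cycles + 1):
        data_path = selfplay(model_path, cycle)  # data produced by the current model
        model_path = train(data_path, cycle)     # new model fed back into self-play
    return model_path
```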
Implementation
The automation simply calls self-play and training in a loop, so there is nothing particularly difficult about it.
import argparse
from pathlib import Path

from gumbel_dlshogi.selfplay import selfplay_multiprocess
from gumbel_dlshogi.train import train


def main():
    parser = argparse.ArgumentParser(
        description="Automate self-play and training cycles."
    )
    # Cycle arguments
    parser.add_argument(
        "--cycles", type=int, default=10, help="Number of cycles to run."
    )
    parser.add_argument(
        "--start_cycle",
        type=int,
        default=1,
        help="The cycle number to start from.",
    )
    parser.add_argument(
        "--initial_model",
        type=str,
        help="Path to the initial model (.pt file).",
    )
    parser.add_argument(
        "--initial_model_state",
        type=str,
        help="Path to the initial model state (.pth file).",
    )
    parser.add_argument(
        "--workspace",
        type=str,
        default="workspace",
        help="Directory to store models and data.",
    )
    # Self-play arguments
    parser.add_argument("--selfplay_batch_size", type=int, default=64)
    parser.add_argument("--max_num_considered_actions", type=int, default=16)
    parser.add_argument("--num_simulations", type=int, default=32)
    parser.add_argument("--num_positions", type=int, default=1000000)
    parser.add_argument("--selfplay_amp", action="store_true")
    parser.add_argument("--num_processes", type=int, default=4)
    parser.add_argument("--skip_max_moves", action="store_true")
    # Training arguments
    parser.add_argument("--train_epochs", type=int, default=1)
    parser.add_argument("--train_batch_size", type=int, default=256)
    parser.add_argument("--test_file")
    parser.add_argument("--num_files", type=int, default=1)
    parser.add_argument("--eval_batch_size", type=int, default=1024)
    parser.add_argument("--lr", type=float, default=0.001)
    parser.add_argument("--weight_decay", type=float, default=1e-4)
    parser.add_argument("--train_num_workers", type=int, default=4)
    parser.add_argument("--train_amp", action="store_true")
    parser.add_argument("--save_interval", type=int, default=1)
    parser.add_argument("--eval_interval", type=int, default=1)
    parser.add_argument("--blocks", type=int, default=10)
    parser.add_argument("--channels", type=int, default=192)
    parser.add_argument("--fcl", type=int, default=256)
    args = parser.parse_args()

    workspace = Path(args.workspace)
    models_dir = workspace / "models"
    data_dir = workspace / "data"
    log_dir = workspace / "logs"
    models_dir.mkdir(parents=True, exist_ok=True)
    data_dir.mkdir(parents=True, exist_ok=True)

    # Determine the starting model path
    if args.start_cycle == 1:
        if not args.initial_model:
            parser.error("--initial_model is required when starting from cycle 1.")
        latest_model_path = args.initial_model
    else:
        # Resuming from a later cycle
        prev_cycle_num = args.start_cycle - 1
        resume_model_path = models_dir / f"model_{prev_cycle_num:08d}.pt"
        if not resume_model_path.exists():
            parser.error(
                f"Model for cycle {prev_cycle_num} not found at {resume_model_path}. "
                "Cannot resume."
            )
        latest_model_path = str(resume_model_path)

    for cycle in range(args.start_cycle - 1, args.cycles):
        print(f"--- Starting Cycle {cycle + 1}/{args.cycles} ---")

        # --- Self-play Phase ---
        selfplay_multiprocess(
            model_path=latest_model_path,
            batch_size=args.selfplay_batch_size,
            max_num_considered_actions=args.max_num_considered_actions,
            num_simulations=args.num_simulations,
            output_dir=str(data_dir),
            num_positions=args.num_positions,
            amp=args.selfplay_amp,
            num_processes=args.num_processes,
            skip_max_moves=args.skip_max_moves,
        )

        # --- Training Phase ---
        print("--- Training Phase ---")
        checkpoint_dir = workspace / "checkpoints"
        checkpoint_dir.mkdir(exist_ok=True)
        next_model_path = models_dir / f"model_{cycle + 1:08d}.pt"

        # Determine which model/checkpoint to use for training
        initial_model_state = None
        resume_path = None
        if cycle == 0:
            # For the very first cycle, use the initial model provided
            initial_model_state = args.initial_model_state
        else:
            # For subsequent cycles, resume from the last checkpoint of the previous cycle
            checkpoints = sorted(
                checkpoint_dir.glob("*.pth"),
                key=lambda p: int(p.name.split("_")[-1].split(".")[0]),
            )
            resume_path = str(checkpoints[-1])

        train(
            train_dir=str(data_dir),
            test_file=args.test_file,
            num_files=args.num_files,
            blocks=args.blocks,
            channels=args.channels,
            fcl=args.fcl,
            initial_model=initial_model_state,
            epochs=args.train_epochs,
            batch_size=args.train_batch_size,
            eval_batch_size=args.eval_batch_size,
            lr=args.lr,
            weight_decay=args.weight_decay,
            num_workers=args.train_num_workers,
            amp=args.train_amp,
            checkpoint_dir=str(checkpoint_dir),
            log_dir=str(log_dir),
            resume=resume_path,
            save_interval=args.save_interval,
            eval_interval=args.eval_interval,
            save_torchscript=str(next_model_path),
        )

        latest_model_path = str(next_model_path)

    print("--- All cycles completed ---")


if __name__ == "__main__":
    main()
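One detail worth noting in the script above is the sort key used when resuming: it parses the numeric suffix out of checkpoint filenames so that checkpoints order by epoch number rather than lexicographically. A small standalone check of that key (the filenames here are illustrative):

```python
from pathlib import Path

# Same sort key as in the script: extract the numeric suffix of a
# checkpoint filename, e.g. "checkpoint_epoch_010.pth" -> 10.
def checkpoint_epoch(p: Path) -> int:
    return int(p.name.split("_")[-1].split(".")[0])

# Illustrative filenames only (no files are touched).
names = [
    Path("checkpoint_epoch_010.pth"),
    Path("checkpoint_epoch_002.pth"),
    Path("checkpoint_epoch_001.pth"),
]
latest = sorted(names, key=checkpoint_epoch)[-1]
print(latest.name)  # checkpoint_epoch_010.pth
```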
Results
As in the previous post, I ran 10 cycles; the output is shown below.
--- Starting Cycle 1/10 ---
1000061pos [25:46, 646.64pos/s, Games=6715, AverageMoves=149, Nyugyoku=2, Draw=145, MaxMoves=3]
All processes terminated.
--- Training Phase ---
Using device: cuda
Initial model loaded from initial_model.pth
Using 1 files for training:
- workspace\data\20250720_131815_135.data
Epoch 1/1
Training: 100%|█████████████████████████████████████████████████████████| 3906/3906 [01:44<00:00, 37.30batch/s, Loss=4.3057, Policy=3.6852, Value=0.6205]
Train - Loss: 4.4387, Policy Loss: 3.7777, Value Loss: 0.6609, Time: 104.72s, Total Steps: 3906
Evaluating: 100%|█████████████████████████████████████████████████| 837/837 [00:29<00:00, 28.61batch/s, Loss=5.5570, Policy Acc=0.0326, Value Acc=0.5144]
Eval - Loss: 5.2559, Policy Loss: 4.5317, Value Loss: 0.7242, Policy Acc: 0.0326, Value Acc: 0.5144, Time: 29.26s
Checkpoint saved to workspace\models\checkpoints\checkpoint_epoch_001.pth
TorchScript model saved to workspace\models\model_00000001.pt
--- Starting Cycle 2/10 ---
1000028pos [25:25, 655.44pos/s, Games=8879, AverageMoves=113, Nyugyoku=2, Draw=51, MaxMoves=1]
All processes terminated.
--- Training Phase ---
Using device: cuda
Using 1 files for training:
- workspace\data\20250720_134629_436.data
Checkpoint loaded from workspace\models\checkpoints\checkpoint_epoch_001.pth, epoch 0, steps 3906, loss 4.4387
Epoch 2/2
Training: 100%|█████████████████████████████████████████████████████████| 3906/3906 [01:54<00:00, 34.10batch/s, Loss=4.1691, Policy=3.5933, Value=0.5758]
Train - Loss: 4.2536, Policy Loss: 3.6297, Value Loss: 0.6239, Time: 114.53s, Total Steps: 7812
Evaluating: 100%|█████████████████████████████████████████████████| 837/837 [00:29<00:00, 28.79batch/s, Loss=5.5701, Policy Acc=0.0740, Value Acc=0.5212]
Eval - Loss: 5.2208, Policy Loss: 4.3840, Value Loss: 0.8368, Policy Acc: 0.0740, Value Acc: 0.5212, Time: 29.07s
Checkpoint saved to workspace\models\checkpoints\checkpoint_epoch_002.pth
TorchScript model saved to workspace\models\model_00000002.pt
--- Starting Cycle 3/10 ---
1000084pos [25:24, 656.09pos/s, Games=9254, AverageMoves=108, Nyugyoku=7, Draw=30, MaxMoves=7]
All processes terminated.
--- Training Phase ---
Using device: cuda
Using 1 files for training:
- workspace\data\20250720_141430_176.data
Checkpoint loaded from workspace\models\checkpoints\checkpoint_epoch_002.pth, epoch 1, steps 7812, loss 4.2536
Epoch 3/3
Training: 100%|█████████████████████████████████████████████████████████| 3906/3906 [01:53<00:00, 34.30batch/s, Loss=3.8404, Policy=3.3009, Value=0.5395]
Train - Loss: 4.0003, Policy Loss: 3.3951, Value Loss: 0.6052, Time: 113.87s, Total Steps: 11718
Evaluating: 100%|█████████████████████████████████████████████████| 837/837 [00:29<00:00, 28.70batch/s, Loss=5.5844, Policy Acc=0.0852, Value Acc=0.5039]
Eval - Loss: 5.1932, Policy Loss: 4.3543, Value Loss: 0.8388, Policy Acc: 0.0852, Value Acc: 0.5039, Time: 29.17s
Checkpoint saved to workspace\models\checkpoints\checkpoint_epoch_003.pth
TorchScript model saved to workspace\models\model_00000003.pt
--- Starting Cycle 4/10 ---
1000076pos [25:16, 659.34pos/s, Games=10584, AverageMoves=94.5, Nyugyoku=5, Draw=13, MaxMoves=3]
All processes terminated.
--- Training Phase ---
Using device: cuda
Using 1 files for training:
- workspace\data\20250720_144228_769.data
Checkpoint loaded from workspace\models\checkpoints\checkpoint_epoch_003.pth, epoch 2, steps 11718, loss 4.0003
Epoch 4/4
Training: 100%|█████████████████████████████████████████████████████████| 3906/3906 [01:55<00:00, 33.94batch/s, Loss=3.5326, Policy=3.0157, Value=0.5169]
Train - Loss: 3.6435, Policy Loss: 3.0485, Value Loss: 0.5950, Time: 115.08s, Total Steps: 15624
Evaluating: 100%|█████████████████████████████████████████████████| 837/837 [00:29<00:00, 28.59batch/s, Loss=5.5962, Policy Acc=0.0916, Value Acc=0.5026]
Eval - Loss: 5.2728, Policy Loss: 4.3757, Value Loss: 0.8972, Policy Acc: 0.0916, Value Acc: 0.5026, Time: 29.28s
Checkpoint saved to workspace\models\checkpoints\checkpoint_epoch_004.pth
TorchScript model saved to workspace\models\model_00000004.pt
--- Starting Cycle 5/10 ---
1000078pos [25:13, 660.74pos/s, Games=11129, AverageMoves=89.9, Nyugyoku=5, Draw=23, MaxMoves=3]
All processes terminated.
--- Training Phase ---
Using device: cuda
Using 1 files for training:
- workspace\data\20250720_151021_312.data
Checkpoint loaded from workspace\models\checkpoints\checkpoint_epoch_004.pth, epoch 3, steps 15624, loss 3.6435
Epoch 5/5
Training: 100%|█████████████████████████████████████████████████████████| 3906/3906 [01:53<00:00, 34.34batch/s, Loss=3.3969, Policy=2.9011, Value=0.4958]
Train - Loss: 3.3703, Policy Loss: 2.7783, Value Loss: 0.5920, Time: 113.75s, Total Steps: 19530
Evaluating: 100%|█████████████████████████████████████████████████| 837/837 [00:28<00:00, 28.95batch/s, Loss=5.6582, Policy Acc=0.1040, Value Acc=0.5156]
Eval - Loss: 5.3833, Policy Loss: 4.4369, Value Loss: 0.9463, Policy Acc: 0.1040, Value Acc: 0.5156, Time: 28.92s
Checkpoint saved to workspace\models\checkpoints\checkpoint_epoch_005.pth
TorchScript model saved to workspace\models\model_00000005.pt
--- Starting Cycle 6/10 ---
1000427pos [25:16, 659.58pos/s, Games=11903, AverageMoves=84, Nyugyoku=19, Draw=38, MaxMoves=9]
All processes terminated.
--- Training Phase ---
Using device: cuda
Using 1 files for training:
- workspace\data\20250720_153808_966.data
Checkpoint loaded from workspace\models\checkpoints\checkpoint_epoch_005.pth, epoch 4, steps 19530, loss 3.3703
Epoch 6/6
Training: 100%|█████████████████████████████████████████████████████████| 3907/3907 [01:53<00:00, 34.29batch/s, Loss=3.1205, Policy=2.5762, Value=0.5443]
Train - Loss: 3.1426, Policy Loss: 2.5558, Value Loss: 0.5868, Time: 113.94s, Total Steps: 23437
Evaluating: 100%|█████████████████████████████████████████████████| 837/837 [00:29<00:00, 28.67batch/s, Loss=5.7500, Policy Acc=0.1079, Value Acc=0.5103]
Eval - Loss: 5.5193, Policy Loss: 4.5732, Value Loss: 0.9462, Policy Acc: 0.1079, Value Acc: 0.5103, Time: 29.20s
Checkpoint saved to workspace\models\checkpoints\checkpoint_epoch_006.pth
TorchScript model saved to workspace\models\model_00000006.pt
--- Starting Cycle 7/10 ---
1000041pos [24:55, 668.68pos/s, Games=12720, AverageMoves=78.6, Nyugyoku=10, Draw=22, MaxMoves=0]
All processes terminated.
--- Training Phase ---
Using device: cuda
Using 1 files for training:
- workspace\data\20250720_160600_356.data
Checkpoint loaded from workspace\models\checkpoints\checkpoint_epoch_006.pth, epoch 5, steps 23437, loss 3.1426
Epoch 7/7
Training: 100%|█████████████████████████████████████████████████████████| 3906/3906 [01:53<00:00, 34.27batch/s, Loss=2.7871, Policy=2.2842, Value=0.5029]
Train - Loss: 2.8651, Policy Loss: 2.2847, Value Loss: 0.5804, Time: 113.97s, Total Steps: 27343
Evaluating: 100%|█████████████████████████████████████████████████| 837/837 [00:28<00:00, 28.89batch/s, Loss=5.9259, Policy Acc=0.1102, Value Acc=0.5147]
Eval - Loss: 5.7643, Policy Loss: 4.6706, Value Loss: 1.0936, Policy Acc: 0.1102, Value Acc: 0.5147, Time: 28.97s
Checkpoint saved to workspace\models\checkpoints\checkpoint_epoch_007.pth
TorchScript model saved to workspace\models\model_00000007.pt
--- Starting Cycle 8/10 ---
1000088pos [24:59, 667.11pos/s, Games=13030, AverageMoves=76.8, Nyugyoku=17, Draw=29, MaxMoves=1]
All processes terminated.
--- Training Phase ---
Using device: cuda
Using 1 files for training:
- workspace\data\20250720_163330_284.data
Checkpoint loaded from workspace\models\checkpoints\checkpoint_epoch_007.pth, epoch 6, steps 27343, loss 2.8651
Epoch 8/8
Training: 100%|█████████████████████████████████████████████████████████| 3906/3906 [01:53<00:00, 34.43batch/s, Loss=2.6677, Policy=2.1300, Value=0.5377]
Train - Loss: 2.6749, Policy Loss: 2.0930, Value Loss: 0.5819, Time: 113.44s, Total Steps: 31249
Evaluating: 100%|█████████████████████████████████████████████████| 837/837 [00:29<00:00, 28.82batch/s, Loss=6.1204, Policy Acc=0.1182, Value Acc=0.5190]
Eval - Loss: 5.7907, Policy Loss: 4.7554, Value Loss: 1.0353, Policy Acc: 0.1182, Value Acc: 0.5190, Time: 29.04s
Checkpoint saved to workspace\models\checkpoints\checkpoint_epoch_008.pth
TorchScript model saved to workspace\models\model_00000008.pt
--- Starting Cycle 9/10 ---
1000028pos [24:51, 670.65pos/s, Games=13820, AverageMoves=72.4, Nyugyoku=11, Draw=48, MaxMoves=0]
All processes terminated.
--- Training Phase ---
Using device: cuda
Using 1 files for training:
- workspace\data\20250720_170103_349.data
Checkpoint loaded from workspace\models\checkpoints\checkpoint_epoch_008.pth, epoch 7, steps 31249, loss 2.6749
Epoch 9/9
Training: 100%|█████████████████████████████████████████████████████████| 3906/3906 [01:54<00:00, 34.17batch/s, Loss=2.4190, Policy=1.8587, Value=0.5603]
Train - Loss: 2.5047, Policy Loss: 1.9250, Value Loss: 0.5797, Time: 114.32s, Total Steps: 35155
Evaluating: 100%|█████████████████████████████████████████████████| 837/837 [00:29<00:00, 28.72batch/s, Loss=6.2612, Policy Acc=0.1211, Value Acc=0.5249]
Eval - Loss: 5.8662, Policy Loss: 4.9052, Value Loss: 0.9609, Policy Acc: 0.1211, Value Acc: 0.5249, Time: 29.15s
Checkpoint saved to workspace\models\checkpoints\checkpoint_epoch_009.pth
TorchScript model saved to workspace\models\model_00000009.pt
--- Starting Cycle 10/10 ---
1000061pos [24:41, 675.24pos/s, Games=15209, AverageMoves=65.8, Nyugyoku=9, Draw=21, MaxMoves=0]
All processes terminated.
--- Training Phase ---
Using device: cuda
Using 1 files for training:
- workspace\data\20250720_172829_398.data
Checkpoint loaded from workspace\models\checkpoints\checkpoint_epoch_009.pth, epoch 8, steps 35155, loss 2.5047
Epoch 10/10
Training: 100%|█████████████████████████████████████████████████████████| 3906/3906 [01:54<00:00, 34.21batch/s, Loss=2.3739, Policy=1.7836, Value=0.5904]
Train - Loss: 2.3480, Policy Loss: 1.7716, Value Loss: 0.5763, Time: 114.17s, Total Steps: 39061
Evaluating: 100%|█████████████████████████████████████████████████| 837/837 [00:28<00:00, 29.03batch/s, Loss=6.2254, Policy Acc=0.1311, Value Acc=0.5310]
Eval - Loss: 6.0087, Policy Loss: 5.0092, Value Loss: 0.9995, Policy Acc: 0.1311, Value Acc: 0.5310, Time: 28.84s
Checkpoint saved to workspace\models\checkpoints\checkpoint_epoch_010.pth
TorchScript model saved to workspace\models\model_00000010.pt
--- All cycles completed ---
TensorBoard
Accuracy over the course of training can now be checked in TensorBoard.
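The logging code inside train() is not shown in this post; the following is only a sketch of how per-evaluation metrics might be written, and the tag names are my assumptions rather than what train() actually uses. A duck-typed writer stands in for torch.utils.tensorboard.SummaryWriter so the snippet runs without PyTorch:

```python
# Hedged sketch: tag names are assumptions, not necessarily what train() uses.
def log_eval_metrics(writer, step, loss, policy_acc, value_acc):
    # `writer` can be a torch.utils.tensorboard.SummaryWriter; anything with
    # an add_scalar(tag, value, step) method works.
    writer.add_scalar("eval/loss", loss, step)
    writer.add_scalar("eval/policy_accuracy", policy_acc, step)
    writer.add_scalar("eval/value_accuracy", value_acc, step)

# Minimal stand-in writer so this runs without PyTorch installed.
class FakeWriter:
    def __init__(self):
        self.records = []

    def add_scalar(self, tag, value, step):
        self.records.append((tag, value, step))

writer = FakeWriter()
# Values taken from the cycle-1 evaluation log above.
log_eval_metrics(writer, 3906, 5.2559, 0.0326, 0.5144)
```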
Summary
The self-play and training cycle can now be run automatically from Python.
Training progress can also be visualized with TensorBoard.
Next, I want to implement a scheduler for hyperparameters such as the number of simulations, and then start training in earnest.
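As a rough illustration of what such a scheduler could look like (this is hypothetical, not code from this project), one option is a linear ramp of the simulation count over cycles:

```python
# Hypothetical per-cycle schedule: linearly increase num_simulations
# from `start` to `final` over the first `ramp_cycles` cycles.
def num_simulations_schedule(
    cycle: int, start: int = 32, final: int = 256, ramp_cycles: int = 20
) -> int:
    if cycle >= ramp_cycles:
        return final
    return start + (final - start) * cycle // ramp_cycles
```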