I left a model training overnight with:
docker-compose exec app mpirun -np 8 python3 train.py -e connect4
But after just an hour or so, it crashed with the error:
A load persistent id instruction was encountered, but no persistent_load function was specified.
Subsequently, I could not restart training, as whenever the program attempted to load best_model.zip it produced the same error. Investigation revealed that the best_model.zip file had somehow become malformed/corrupted. I had to replace it with a prior saved model in order to resume training.
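
I don't know the root cause, but one guess is that the file was left half-written (for example by an interrupted save, or by more than one of the 8 MPI processes writing it at once). As a rough sketch of a workaround, assuming that guess is right, the save could go to a temporary file first, be checked, and only then be moved over best_model.zip, keeping the previous copy as a backup. The names `is_valid_zip`, `atomic_save` and `save_fn` are hypothetical and not part of the project; `save_fn` stands in for whatever call writes the model to a given path. Note that the check only catches truncated or CRC-damaged archives, not a well-formed zip with wrong contents.

```python
import os
import shutil
import tempfile
import zipfile
import zlib


def is_valid_zip(path):
    """Return True if path is a zip archive whose members all pass their CRC check."""
    if not os.path.isfile(path) or not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() returns the name of the first bad member, or None.
            return zf.testzip() is None
    except (zipfile.BadZipFile, zlib.error, EOFError, OSError):
        return False


def atomic_save(save_fn, dest_path, backup_suffix=".bak"):
    """Write via save_fn(tmp_path), then replace dest_path only if the result is valid.

    save_fn is a hypothetical callable that writes the model zip to the path it is given.
    """
    dest_dir = os.path.dirname(os.path.abspath(dest_path))
    # Create the temp file in the same directory so os.replace stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(suffix=".zip", dir=dest_dir)
    os.close(fd)
    try:
        save_fn(tmp_path)
        if not is_valid_zip(tmp_path):
            raise OSError("refusing to overwrite %s: new file is not a valid zip" % dest_path)
        # Keep the last known-good model so training can be resumed from it.
        if is_valid_zip(dest_path):
            shutil.copy2(dest_path, dest_path + backup_suffix)
        os.replace(tmp_path, dest_path)
    finally:
        # Clean up the temp file if it was not moved into place.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
```

With something like this, a failed or partial save would leave the existing best_model.zip untouched, and `is_valid_zip("best_model.zip")` could be called before loading to fall back to the `.bak` copy instead of crashing. If several processes really are saving at once, restricting the save to a single rank would probably also be needed, since this sketch does no locking.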