Is there an existing issue / discussion for this? | 是否已有关于该错误的issue或讨论?
Is there an existing answer for this in tutorial? | 该问题是否在教程中有解答?
Current Behavior | 当前行为
Checkpoint filenames depend on num_epochs (e.g., ..._1.pt vs ..._01.pt).
When a checkpoint is saved and training is later resumed with a num_epochs value that has a different number of digits, basicts_runner._get_ckpt_path() generates a different filename pattern.
This causes an error when the old checkpoint is renamed in checkpoint.backup_last_ckpt() after the first epoch of the new run.
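To illustrate the mismatch, here is a minimal sketch. It assumes the epoch number is zero-padded to the digit count of num_epochs, which is what the filenames above suggest; `ckpt_name` is a hypothetical stand-in and not the actual body of basicts_runner._get_ckpt_path().

```python
def ckpt_name(model_name: str, epoch: int, num_epochs: int) -> str:
    # Hypothetical stand-in: pad the epoch to the number of digits in num_epochs.
    width = len(str(num_epochs))
    return f"{model_name}_{str(epoch).zfill(width)}.pt"


# Name written to disk by the first run (num_epochs=5).
saved = ckpt_name("model", 5, num_epochs=5)
# Name the resumed run (num_epochs=50) computes for the same epoch.
expected = ckpt_name("model", 5, num_epochs=50)

print(saved, expected, saved == expected)
```

Under this assumption the two runs disagree on the name of the same checkpoint, so a rename that targets the second name cannot find the file saved by the first run.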
Expected Behavior | 期望行为
Checkpoint paths should remain compatible across runs, regardless of the number of digits in num_epochs.
Proposed solutions:
- Use fixed-width padding (e.g. {epoch:04d}), or
- Use pattern matching (as in checkpoint.get_last_ckpt_path()), or
- Avoid padding entirely
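As a rough sketch of the first two options: the helper names, the default width of 4, and the `<model>_<epoch>.pt` naming scheme below are assumptions made for illustration, not the existing BasicTS code.

```python
import re
from typing import Iterable, Optional


def fixed_width_name(model_name: str, epoch: int, width: int = 4) -> str:
    # Option 1: pad to a constant width that does not depend on num_epochs.
    return f"{model_name}_{epoch:0{width}d}.pt"


# Option 2: match the epoch numerically, ignoring how it was padded.
_CKPT_RE = re.compile(r"^(?P<name>.+)_(?P<epoch>\d+)\.pt$")


def find_ckpt(filenames: Iterable[str], model_name: str, epoch: int) -> Optional[str]:
    # Return the first filename whose model name and integer epoch both match.
    for filename in filenames:
        match = _CKPT_RE.match(filename)
        if match and match.group("name") == model_name and int(match.group("epoch")) == epoch:
            return filename
    return None


print(fixed_width_name("model", 5))
print(find_ckpt(["model_5.pt", "model_15.pt"], "model", 5))
```

Fixed-width padding alone would not recognize checkpoints already saved under the old naming, so the pattern-matching lookup may be the more backward-compatible choice, possibly combined with fixed-width names for new files.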
I’d be happy to open a PR for this if one approach sounds reasonable.
Environment | 运行环境
- OS:
- DEVICE:
- NVIDIA Driver:
- CUDA:
- NVIDIA GPU Memory:
- PyTorch:
BasicTS logs | BasicTS日志
No response
Steps To Reproduce | 复现方法
1. Train with num_epochs=5.
2. Resume with num_epochs=50.
3. Wait until the first epoch of the new run has finished.
Anything else? | 备注
No response