Add CLI workflow for diffusion UNet training#33
Conversation
Greptile SummaryThis PR adds
Confidence Score: 3/5The new workflow script orchestrates existing steps correctly but has a defect where using --skip-create-training-data on a fresh working directory crashes before training begins. The unconditional call to scripts/diff_model_train_workflow.py — specifically the unconditional _create_embedding_sidecars call in main() and the flag-conflict handling around --train-from-scratch. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[CLI args parsed] --> B[Set default modality]
B --> C{--download-model-data?}
C -- yes --> D[download_model_data]
C -- no --> E
D --> E[_stage_configs\nwrite env/model/network JSONs\nstage datalist]
E --> F{--skip-create-training-data?}
F -- no --> G[diff_model_create_training_data\ncreate latent embeddings]
F -- yes --> H
G --> H[_create_embedding_sidecars\nwrite .nii.gz.json files]
H --> I{--skip-train?}
I -- no --> J[diff_model_train]
I -- yes --> K
J --> K{--run-inference?}
K -- yes --> L[diff_model_infer]
K -- no --> M
L --> M[Write workflow_summary.json]
style H fill:#ffcccc,stroke:#cc0000
H -.->|crashes if embedding dir does not exist| H
Reviews (1): Last reviewed commit: "Add CLI workflow for diffusion UNet trai..." | Re-trigger Greptile |
| sidecars = _create_embedding_sidecars( | ||
| staged["embedding_dir"], | ||
| args.modality, | ||
| staged["include_body_region"], | ||
| args.top_region_index, | ||
| args.bottom_region_index, | ||
| ) |
There was a problem hiding this comment.
_create_embedding_sidecars crashes when embedding dir doesn't exist
_create_embedding_sidecars is called unconditionally regardless of --skip-create-training-data. When that flag is used on a fresh --work-dir (i.e., the embedding directory has never been populated), sorted(embedding_base_dir.rglob("*_emb.nii.gz")) raises FileNotFoundError because the directory doesn't exist yet. The error surface is unhelpful — there's no message indicating the user should drop --skip-create-training-data or point to an existing work dir with embeddings. Consider guarding with embedding_base_dir.is_dir() or only calling _create_embedding_sidecars when embeddings are present.
| def _modality_mapping(repo_root: Path) -> dict[str, int]: | ||
| path = repo_root / "configs" / "modality_mapping.json" | ||
| if not path.is_file(): | ||
| return {} | ||
| return {str(k): int(v) for k, v in _load_json(path).items()} |
There was a problem hiding this comment.
Misleading error when
modality_mapping.json is absent
_modality_mapping silently returns {} when modality_mapping.json doesn't exist. The subsequent ValueError then says the modality was "not found in configs/modality_mapping.json", implying the file exists but lacks the entry — when in fact the file itself is missing. Distinguishing the two cases helps the user understand what's wrong faster.
| def _modality_mapping(repo_root: Path) -> dict[str, int]: | |
| path = repo_root / "configs" / "modality_mapping.json" | |
| if not path.is_file(): | |
| return {} | |
| return {str(k): int(v) for k, v in _load_json(path).items()} | |
| def _modality_mapping(repo_root: Path) -> dict[str, int]: | |
| path = repo_root / "configs" / "modality_mapping.json" | |
| if not path.is_file(): | |
| raise FileNotFoundError(f"modality_mapping.json not found at {path}") | |
| return {str(k): int(v) for k, v in _load_json(path).items()} |
Summary
Validation