Disambiguate modality-CFG vs tumor-CFG#30
Conversation
Closes NVIDIA-Medtech#29. The repo has two distinct `cfg_guidance_scale` flavors that happen to share a key name: modality-CFG in `scripts.diff_model_infer` (image-only) and tumor-mask-CFG in `scripts.inference` / `scripts.infer_image_from_mask` (CT ControlNet). CT works at 0 in both, but MR variants need ~10 on the image-only path. The skills previously documented only the tumor flavor's "0 disables CFG" rule, which silently breaks MR. Doc changes: - README §2: CFG-defaults callout next to the FOV warning; inline notes in §2.2 (MR Brain) and §2.5 (MR). - skills/infer_image-only.md: new CFG section explaining the effect of small/large values per modality, plus cross-link warning. - skills/infer_image-from-mask.md: rewrite §6 to lead with value effects (0 = off; 1..5 = stronger tumor enforcement), CT-only scope, cross-link. - skills/infer_mask-image-paired.md: tighten the config-table cell. Config change: - configs/config_maisi_diff_model_rflow-mr.json: cfg_guidance_scale 15 → 10 so the shipped default matches the documented recommendation. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Disambiguates the two distinct CFG flavors in this repo so the same key name no longer overloads two unconditional-branch semantics: - `cfg_guidance_scale` stays in `scripts.diff_model_infer` (image-only path), where the unconditional branch zeros the modality class label. This is the CFG that MR variants need ~10 for. - `cfg_guidance_scale_tumor` is the new key in `scripts.inference` / `scripts.infer_image_from_mask` / `LDMSampler` (CT mask-conditioned path), where the unconditional branch is the same mask with tumors removed via `remove_tumors()`. CT-only; `0` remains the default. Back-compat: `scripts.inference` and `scripts.infer_image_from_mask` accept the legacy `cfg_guidance_scale` key in `config_infer*.json` for one release, emitting a DeprecationWarning before copying to the new attribute. All shipped `config_infer*.json` files updated to the new key. Skills updated to reference the new key name and re-emphasize that the two CFG flavors are now keyed distinctly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Parallels the cfg_guidance_scale_tumor rename so both CFG flavors are now named explicitly by what they steer: - `cfg_guidance_scale_modality` (new) in scripts.diff_model_infer and all configs/config_maisi_diff_model_*.json — the unconditional branch zeros the modality class label. CT → 0, MR → 10. - `cfg_guidance_scale_tumor` (renamed earlier) in scripts.inference / scripts.infer_image_from_mask / LDMSampler — the unconditional branch rebuilds the mask with tumors removed. CT-only; 0 = off. Back-compat: diff_model_infer accepts the legacy `cfg_guidance_scale` key in config_maisi_diff_model_*.json for one release with a DeprecationWarning. Also: - ruff-format auto-fixes on multi-line warning strings. - README.md MD028 fix (fused the FOV and CFG blockquotes with `>` separator). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Consolidates the configuration-by-GPU-memory knowledge that was previously implicit across ten `config_infer_<XXg>_<dim>.json` preset files into one canonical reference table. - skills/infer_mask-image-paired.md: new "How to configure a run" section walking through the five things users set in `config_infer.json` — (1) modality (CT-only for this pipeline), (2) `output_size` + `spacing` from FOV with the `spacing = FOV / output_size` formula, (3) AE sliding-window knobs (`_size`, `_overlap`, `_tp_num_splits`) by (GPU memory, output_size) with a 9-row validated-presets table covering 16/24/32/80 GB cards, (4) `cfg_guidance_scale_tumor`, (5) `num_inference_steps`. - skills/infer_image-from-mask.md: cross-link to the mask-image-paired table (same configs apply), and add `autoencoder_tp_num_splits` to the Configuration-knobs table. - skills/infer_image-only.md: spell out `spacing = FOV / dim`; note that this path does NOT expose the AE sliding-window knobs (they're hardcoded in `scripts.diff_model_infer`), with cross-link to the presets table for users who need them. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per the original request, the "How to configure a run" table should only carry the five columns: GPU memory, output_size, and the three AE sliding-window knobs (size, overlap, tp_num_splits). Removes spacing, num_inference_steps, and the config-file-name columns I added in the prior pass. Also reorders the numbered subsections to match the requested workflow: modality → AE knobs (GPU+output_size) → spacing formula → cfg_guidance_scale_modality cross-link → cfg_guidance_scale_tumor. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stays out of the GPU-memory table (it's scheduler-driven, not memory-driven) but is restored as a numbered step in the workflow — mainly so users see the rflow-ct → 30 / ddpm-ct → 1000 distinction and the always-1000 rule for the mask DM. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Brings the file name in line with the other skill names in this directory (infer_image-only, infer_image-from-mask, infer_mask-image-paired). Updates the frontmatter `name:` field and all cross-references in the other skills. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drops the "Algorithm step by step" walkthrough and the `ldm_conditional_sample_one_image` function-args table — those were developer-internal references, not what a user running the script needs. New structure (user-operator scoped): - Command to run (the README §2.6 invocation with arg explanations). - Input: mask format, two production paths (nv-segment-ct + body envelope; or another segmenter + remap), 0..124 vs 132-class pitfall, CLI validation behavior. - Configuration: cross-link to the GPU-memory presets and per-knob walkthrough in infer_mask-image-paired.md, plus the hard output_size/spacing constraints and the tumor-CFG knob (the one user-facing knob that's unique to this skill). - Output: file names, intensity ranges, background HU handling. - Related scripts: short table of the four entry points a user might call into. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Applies the same operator-focused trim to the remaining three inference skills that was applied to infer_image-from-mask.md. Drops algorithm walkthroughs and function-args tables; keeps the workflow figures (per user direction). - infer_mask-image-paired.md: keep the "How LDMSampler chooses the mask path" flowchart. Drop the "Two paths in detail" algorithm prose, the `LDMSampler.__init__ — required state` table (internal API), and the Quality check loop details. Rename "Code references" to "Related scripts" (5 rows, user-relevant scripts only). - infer_mask-only.md: drop "Inputs to ldm_conditional_sample_one_mask" function-args table, the 7-step "Algorithm step by step" section, and the internal "Two paths" prose. Keep the TL;DR workflow figure and the anatomy_size 10-d slot table (which is itself a user-input reference). New "Configuration: Path A vs Path B" + "Related scripts" sections. - infer_image-only.md: already operator-focused; add brief Output and Related scripts sections for consistency. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per follow-up direction: keep the Path A / Path B explainer in infer_mask-image-paired.md — it's a must-have for understanding the dispatch, just kept short (one bullet per path, plus when-to-use guidance) rather than the long algorithmic detail the original had. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirrors the same six-step template used in infer_mask-image-paired.md so the two skills are visually parallel: 1. modality (with body-region indices folded in as a ddpm-ct sub-knob) 2. AE sliding-window knobs (N/A here — hardcoded in diff_model_infer) 3. dim / spacing from FOV (recommended table + hard constraints + spacing = FOV / dim formula) 4. cfg_guidance_scale_modality (the relevant CFG for image-only) 5. cfg_guidance_scale_tumor (N/A here — CT mask-conditioned only) 6. num_inference_steps Removes the appended "Workflow summary" recap (the numbered walkthrough now IS the workflow) and the separate Modality codes / CFG / Body-region sections (folded into the numbered steps). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Each inference skill that uses a `cfg_guidance_scale_*` knob now expands "CFG" to "classifier-free guidance" at the first occurrence and gives a one-sentence description of what the unconditional branch is in that specific path: - infer_image-only.md (step 4): CFG on modality conditioning, unconditional zeros the modality label. - infer_image-from-mask.md (Configuration > cfg_guidance_scale_tumor): CFG on tumor presence, unconditional uses the mask with `remove_tumors()` applied. - infer_mask-image-paired.md (step 5): same as above (this skill's pipeline uses the tumor-CFG flavor). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Drop the cross-link explanation paragraph under "### 5. cfg_guidance_scale_tumor → N/A in this path". The header already says it all. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two consistency fixes to make the skill set easier for an AI to navigate as a unit: - All skills with a CLI now use "## Command to run" as the cmd section header (was "Quick Start commands" / "Quick Start command" in two). - download-models.md renames "## Code reference" to "## Related scripts" so it parallels the other skills' bottom-section template, and switches its Related-skills bullets from bare backticks to `[name](name.md)` markdown links to match the convention used by every other inference skill. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The biggest gap a fresh AI hits when trying to run inference from these skills is synthesizing the per-knob explanations into a concrete config that actually works. Each inference skill now has an "End-to-end example" block right after the Command-to-run section showing: - The exact `download_model_data` invocation. - The exact config JSON values for a representative use case (no template substitutions, no "edit X to Y" pointers — concrete values). - The exact `python -m scripts.X` invocation that ties it together. - The expected output file-name pattern. Examples: - infer_image-only.md → T2-weighted whole-brain MRI with rflow-mr-brain (the modality + CFG behaviour that issue NVIDIA-Medtech#29 cares about). - infer_image-from-mask.md → CT synthesis from a user-supplied MAISI-vocabulary mask with rflow-ct on a 24 GB GPU. - infer_mask-image-paired.md → Path B chest CT pair with rflow-ct on a 24 GB GPU, plus a one-line note for switching to Path A. These are concrete copy-paste starting points. The per-knob walkthrough later in each skill is still the reference for any non-default tuning. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
New FOV reference for the MR-Brain model, computed from the MR-RATE training set (batches 0–27, train+val splits, 128³ embeddings only — each 128³ emb = one source image). Total unique images per skull condition: 323 221. Whole-brain and skull-stripped share identical FOV (two preprocessings of the same subject). Placed in: - README §2.2 (MR Brain Image Generation) — new "Recommended FOV for MR `rflow-mr-brain` model" subsection. - skills/infer_image-only.md — new "Per-plane FOV table for `rflow-mr-brain`" subsection under the existing FOV-recommendation block. The two simplified brain rows in the parent table now link to this detailed per-plane breakdown. Oblique acquisition plane omitted (rare); MRA included for completeness even though no `mri_mra` modality code exists at inference. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous example used an isotropic 256³ × 1 mm volume — that matched the (now-removed) simplified Brain row in the FOV table but sits outside the actual MR-RATE training FOV distribution. Updated to T1 axial (the most-supported combination at 47 810 training images) with `dim=[256,256,128]`, `spacing=[0.94, 0.94, 1.36]`, reproducing the median training FOV of 240 × 240 × 174 mm. Modality code updated to 9 (T1 whole-brain) accordingly. The tail instructions tell the user how to switch to other (modality, plane) combinations using the per-plane FOV table. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Per-plane FOV table for rflow-mr-brain previously lived in both README §2.2 and skills/infer_image-only.md — duplicated. Moved to docs/inference.md so all three FOV references (rflow-ct, rflow-mr, rflow-mr-brain) sit in one canonical location. - docs/inference.md: new "Recommended FOV for MR `rflow-mr-brain` model" section right after the existing rflow-mr section. Same per-plane data (T1/T2/FLAIR/SWI/MRA × axial/sagittal/coronal, N counts the 128³ embeddings = unique images). - README §2.2: 23-line table replaced with one-line cross-link parallel to how §2.5 already cross-links to the rflow-mr table. - skills/infer_image-only.md: per-plane brain table removed; parent FOV-recommendation table now has a cross-link row to docs for rflow-mr-brain AND a new cross-link row for rflow-mr (which was missing entirely from this skill). Worked-example anchor links re-pointed to docs/inference.md. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The top-of-§2 warning callout already pointed readers to the CT and rflow-mr FOV tables in docs/inference.md but was missing the rflow-mr-brain one. Added it as a third bullet so all three model variants' FOV references are surfaced in one place. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
For consistency with the MR-variant rows (which already cross-link to docs/inference.md), the four inline CT rows in infer_image-only.md's parent FOV table are now one row pointing to the canonical "Recommended Spacing for CT" section in docs/inference.md. The whole table is now a 3-row variant→docs-anchor lookup instead of mixing inline values with cross-links. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Reflow the existing content so a human reader gets: 1. Overview — what each variant is for + which skill backs it. 2. Execute inference — one subsection per (variant, mode), with the rflow-mr-brain run command added (previously missing — only rflow-mr was shown). 3. Recommended FOV — the three tables (CT spacing, rflow-mr, rflow-mr-brain) live as sibling H2 sections in one place. Anchor names preserved so the cross-links from README and skills keep working. 4. Modality codes — consolidated table covering all four variants (previously scattered in "MR Modality Control"). 5. Configuration parameter reference — turned the dense bulleted parameter list into a table, added the new cfg_guidance_scale_modality / cfg_guidance_scale_tumor / autoencoder_tp_num_splits entries. 6. Quality check, Tuning checklist (new — OOM / seam / speed / washed-out MR / OOD FOV symptoms with concrete remedies). 7. Architecture (moved to end — reference, not flow). Also drops "(batches 0–27, train+val splits)" and the "one 128³ embedding per image" implementation note from the rflow-mr-brain FOV section per follow-up — those are noise for the reader. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
|
@Sidharth1743 If you have time, it would be appreciated if you can take a look. |
Greptile SummaryThis PR disambiguates the two overloaded
Confidence Score: 5/5Safe to merge — the rename is applied consistently across all 24 files and both previously-raised gaps are now closed. All back-compat shims handle the three key-lookup outcomes (new key / legacy key with warning / missing key with default), the LDMSampler signature change is backward-compatible for both keyword and positional callers, and the modality-agnostic core run_controlnet_conditioned_image_dm is untouched. No regressions found in the call chains for either CFG flavor. No files require special attention. Important Files Changed
Flowchart%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[User config JSON] --> B{Which config?}
B -->|config_maisi_diff_model_*.json| C[cfg_guidance_scale_modality]
B -->|config_infer*.json| D[cfg_guidance_scale_tumor]
B -->|Legacy: cfg_guidance_scale| E[Back-compat shim\nDeprecationWarning]
E -->|in diff_model_infer| C
E -->|in inference / infer_image_from_mask| D
C --> F[scripts.diff_model_infer run_inference]
F -->|scale=0| G[CT: single UNet forward]
F -->|scale >0| H[MR: doubled inputs, class_labels zeroed]
D --> I[ldm_conditional_sample_one_image_from_mask]
I -->|scale=0| J[CT: standard controlnet forward]
I -->|scale >0| K[CT: tumor-free mask branch]
Reviews (11): Last reviewed commit: "Add cfg_guidance_scale → cfg_guidance_sc..." | Re-trigger Greptile |
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
Two breakages identified in code review: 1. Cell 12 reads `config_infer.json` into `args` via setattr. After the key rename in this PR, the shipped configs use `cfg_guidance_scale_tumor`, so `args.cfg_guidance_scale` no longer exists and the LDMSampler call below would raise AttributeError on any user re-running the notebook. 2. Cell 18 passes `cfg_guidance_scale=` to LDMSampler, which now only accepts `cfg_guidance_scale_tumor` — TypeError on construction. Fixes: - Cell 12: add the same back-compat shim that scripts/inference.py already does — if the loaded config has the legacy `cfg_guidance_scale` key (and not the new one), copy the value to `args.cfg_guidance_scale_tumor` with a DeprecationWarning. Default to 0.0 if neither is set. - Cell 18: rename the kwarg to `cfg_guidance_scale_tumor`. With these, the notebook works against both the new shipped configs and any user's older fork that still has `cfg_guidance_scale`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Surfaces the two user-facing changes from this PR in the README News section so existing users see them at a glance. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Quick start should be quick. Drops the Why column and the configuration-file-path / sub-warning copy. Leaves just the 2-row "models → values" table that lets a reader scan and move on. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
If the user is just keeping the default, no callout is needed. The top-of-§2 CFG-defaults table already enumerates the shipped values. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Co-authored-by: greptile-apps[bot] <165735046+greptile-apps[bot]@users.noreply.github.com>
|
Want your agent to iterate on Greptile's feedback? Try greploops. |
build.nvidia.com/nvidia/maisi is retired; the live demo now lives at huggingface.co/spaces/nvidia/nv-generate. Updates both references in README (the Overview Live-Demo line and the Resources list). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
These two changes are independent of the issue-NVIDIA-Medtech#29 CFG-key rename and inference-skill restructure, and were extracted into a smaller PR (branch: inference-md-and-demo-link) so they can be reviewed and merged on a faster track. - Reverts docs/inference.md to upstream/main state (the reorg + new rflow-mr-brain FOV section live in the extracted PR). - Reverts the README live-demo link change (also in the extracted PR). After the extracted PR merges, this PR's cross-links from README §2 and skills/infer_image-only.md to the rflow-mr-brain FOV anchor in docs/inference.md will resolve cleanly. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The operator-focused rewrite of skills/* + the infer_mask-generation → infer_mask-only rename are extracted into a separate PR (branch: skills-restructure) so this PR stays focused on the cfg_guidance_scale key rename + tutorial fix. - skills/* restored to upstream/main state (every file reverted; the renamed-away infer_mask-only.md deleted and infer_mask-generation.md reinstated). - README News entry trimmed to mention only the CFG-key rename. The "Added operator-focused inference skills" portion moves to the skills-restructure PR.
…VIDIA-Medtech#31) Brings the post-restructure skills layout and reorganized docs/inference.md onto this branch so the cfg key renames can be re-applied against the now-canonical doc structure. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After merging upstream/main (PR NVIDIA-Medtech#32 restructured skills, PR NVIDIA-Medtech#31 reorg'd docs/inference.md), re-apply the cfg key renames to the new structure: - `infer_image-only.md` — image-only path uses `cfg_guidance_scale_modality` (modality CFG: required `> 0` for MR). - `infer_image-from-mask.md`, `infer_mask-image-paired.md` — mask-conditioned path uses `cfg_guidance_scale_tumor` (CT tumor CFG). - `docs/inference.md` — both keys differentiated in the knobs table and the MR-contrast tuning bullet. Each skill notes that the legacy un-suffixed `cfg_guidance_scale` key is still accepted for one release with a `DeprecationWarning`. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
…ampler Address review feedback on the cfg rename PR: LDMSampler.__init__ was the one rename site missing a deprecation shim. Config-file callers already get a graceful DeprecationWarning in scripts.inference / scripts.infer_image_from_mask / scripts.diff_model_infer, but direct Python-API callers passing the legacy `cfg_guidance_scale` kwarg got a hard TypeError with no migration path. LDMSampler now accepts both kwargs; passing the legacy name remaps to `cfg_guidance_scale_tumor` with a DeprecationWarning, matching the config-level shim policy. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Closes #29.
Why
cfg_guidance_scaleoverloaded two unrelated CFG flavors:scripts.diff_model_infer): unconditional branch zerosthe modality class label. MR needs > 0 (default 10);
0washes out contrast —issue Add guidence to how to set cfg in readme and skills #29.
scripts.inference/scripts.infer_image_from_mask):unconditional branch is the mask with tumors removed. CT-only;
0is thecorrect default.
Users applying "0 disables CFG" (the tumor-flavor default) to MR silently
broke output.
Code
Renamed the two keys to disambiguate, with back-compat shims that emit
DeprecationWarning:cfg_guidance_scale(inconfig_maisi_diff_model_*.json)cfg_guidance_scale_modalityscripts.diff_model_infercfg_guidance_scale(inconfig_infer*.json)cfg_guidance_scale_tumorscripts.inference,scripts.infer_image_from_mask,LDMSamplerAligned
configs/config_maisi_diff_model_rflow-mr.jsondefault15 → 10tomatch
rflow-mr-brain.inference_tutorial.ipynb(cells 12 + 18) fixed to use the new key +back-compat shim, so the notebook keeps working against both the new shipped
configs and any user's older fork.
Docs
which value to keep.
The
docs/inference.mdreorganization + the operator-focusedskills/restructure are in separate PRs (#31 already merged;
skills-restructureopen).
Test plan
cfg_guidance_scalestill run;DeprecationWarningfires; outputs unchanged.
python -m scripts.diff_model_inferforrflow-mr-brainwith shippedconfig produces contrast-faithful MR (verified in CI smoke + manual H100 run).
pre-commit run --all-filespasses.