docs: clarify ControlNet finetuning data prep (§4.3) + add data-prep … by Can-Zhao · Pull Request #39 · NVIDIA-Medtech/NV-Generate-CTMR

Can-Zhao · 2026-06-10T22:50:56Z

…skill

Rewrite data/README.md §4.3 into clear steps and document how to derive the preprocessed files (image embedding, VISTA-3D pseudo labels, combined labels) from a user's own original image + mask. Add skills/finetune_data-prep.md covering the same flow, including remapping a new class onto any unclaimed label index (0-255). Fix two errors in the old text: the backwards fold comment (held-out fold = validation) and stale dataset paths (maisi/dataset -> datasets/).


Rename skills/finetune_data-prep.md -> controlnet_finetune_data-prep.md, update the skill `name` field and the README link to match. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

greptile-apps · 2026-06-10T23:01:54Z

Greptile Summary

This PR rewrites data/README.md §4.3 into structured sub-sections and adds a new skills/finetune_image-from-mask_data-prep.md skill file, together documenting the full pipeline for turning raw images + masks into the three preprocessed files the CT ControlNet training loop needs. It also fixes two pre-existing errors: the backwards fold comment (held-out fold = validation, not training) and stale maisi/dataset/ paths replaced with datasets/.

data/README.md gains a download block, a per-case file-layout diagram, a three-step derivation walkthrough (VAE embedding → NV-Segment pseudo labels → combined mask), and a corrected, annotated data-list JSON example.
skills/finetune_image-from-mask_data-prep.md (new) provides a self-contained step-by-step guide covering embeddings, body-envelope generation, label remapping onto any unclaimed index, fold splits, and weighted_loss_label configuration.
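The corrected fold semantics (held-out fold = validation) can be illustrated with a minimal sketch. The entry keys `image`, `label`, `spacing`, and `fold` follow the data-list fields named in this summary; the helper name `split_by_fold`, the case paths, and the spacing values are hypothetical, and the real training script may perform the split differently.

```python
from typing import Any


def split_by_fold(entries: list[dict[str, Any]], held_out_fold: int):
    """Split data-list entries: the held-out fold is validation, all others train."""
    train = [e for e in entries if e["fold"] != held_out_fold]
    val = [e for e in entries if e["fold"] == held_out_fold]
    return train, val


# Hypothetical data-list entries; paths and spacing are placeholders.
data_list = [
    {"image": "case_000/image_emb.nii.gz",
     "label": "case_000/mask_combined_label.nii.gz",
     "spacing": [1.0, 1.0, 1.0], "fold": 0},
    {"image": "case_001/image_emb.nii.gz",
     "label": "case_001/mask_combined_label.nii.gz",
     "spacing": [1.0, 1.0, 1.0], "fold": 1},
    {"image": "case_002/image_emb.nii.gz",
     "label": "case_002/mask_combined_label.nii.gz",
     "spacing": [1.0, 1.0, 1.0], "fold": 1},
]

# Holding out fold 0 means fold-0 cases are validated on, not trained on.
train, val = split_by_fold(data_list, held_out_fold=0)
print(len(train), len(val))
```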

Confidence Score: 5/5

Documentation-only change with no impact on runnable code; safe to merge.

Both changed files are markdown documentation. The fold-semantics fix and path corrections are accurate and verified against the surrounding codebase. Cross-references to infer_image-from-mask.md resolve correctly. The one new finding is a // comment in a json fenced block that would break copy-paste into a config file, but it carries no runtime risk.

No files require special attention beyond the minor //-in-JSON note in skills/finetune_image-from-mask_data-prep.md.
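The `//`-in-JSON finding can be reproduced with Python's standard `json` module, which rejects comments. The snippet below is a small illustration, not the repository's config loader; the `fold` key is used only as a placeholder.

```python
import json

# A JSON snippet with a `//` comment, as it might be copy-pasted from a doc.
commented = """{
    // held-out fold is used for validation
    "fold": 0
}"""

try:
    json.loads(commented)
    parsed_ok = True
except json.JSONDecodeError:
    # Standard JSON has no comment syntax, so strict parsers fail here.
    parsed_ok = False

# The same content without the comment parses cleanly.
config = json.loads('{"fold": 0}')
print(parsed_ok, config)
```

One way to avoid the problem is to keep explanatory notes in the surrounding prose rather than inside the `json` fence.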

Important Files Changed

| Filename | Overview |
| --- | --- |
| data/README.md | Rewrites §4.3 into clear sub-sections: download, per-case layout, step-by-step derivation, and data-list JSON; fixes backwards fold comment and stale maisi/dataset → datasets/ paths. No functional code changes. |
| skills/finetune_image-from-mask_data-prep.md | New skill doc covering end-to-end data prep for ControlNet finetuning; a `//` comment in a `json` fenced block is invalid JSON and will break config parsing if copy-pasted. |

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[Original Image\n*.nii.gz] --> B[Step 1: VAE Encode\nscripts/diff_model_create_training_data.py\nautoencoder_v1.pt]
    A --> C[Step 2: NV-Segment\nexternal tool]
    B --> D[image_emb.nii.gz]
    C --> E[Organ labels only\nno body envelope]
    E --> F[add_body_envelope\nscripts.utils]
    F --> G[mask_pseudo_label.nii.gz\norgans + body 200]
    H[Original Mask\n*.nii.gz] --> I[Step 3a: remap_labels\nto MAISI indices]
    I --> J[remapped mask]
    G --> K[Step 3b: Overlay\nwrite remapped mask on top]
    J --> K
    K --> L[mask_combined_label.nii.gz]
    D --> M[JSON data list\nimage + label + spacing + fold]
    L --> M
    M --> N[scripts.train_controlnet\nControlNet finetuning]
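
Steps 3a and 3b of the flowchart (remap the original mask, then write it on top of the pseudo label) can be sketched with plain NumPy. This is an illustrative sketch only: it does not call the repository's `remap_labels` or `add_body_envelope` helpers, whose signatures are not shown in this thread. The function names, the toy arrays, and the target index 150 are all hypothetical, and it assumes that background (0) in the user mask should not overwrite the pseudo label.

```python
import numpy as np


def remap_mask_sketch(mask: np.ndarray, mapping: dict[int, int]) -> np.ndarray:
    """Return a new mask with each source label replaced by its target index.

    Labels absent from `mapping` become 0 (background) in this sketch.
    """
    out = np.zeros_like(mask)
    for src, dst in mapping.items():
        out[mask == src] = dst
    return out


def overlay_sketch(pseudo_label: np.ndarray, remapped: np.ndarray) -> np.ndarray:
    """Write the non-zero voxels of `remapped` on top of `pseudo_label`."""
    combined = pseudo_label.copy()
    foreground = remapped > 0
    combined[foreground] = remapped[foreground]
    return combined


# Toy 1-D "volumes"; 200 is the body-envelope label from the flowchart.
pseudo = np.array([0, 200, 200, 1, 1, 200], dtype=np.uint8)
user_mask = np.array([0, 0, 1, 1, 0, 0], dtype=np.uint8)

# Map the user's label 1 onto a hypothetical unclaimed index (150).
remapped = remap_mask_sketch(user_mask, {1: 150})
combined = overlay_sketch(pseudo, remapped)
print(combined.tolist())  # [0, 200, 150, 150, 1, 200]
```

On real data the arrays would be 3-D volumes loaded from the `.nii.gz` files, and the image and mask must share the same grid before the overlay.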

Reviews (4): Last reviewed commit: "docs: fix markdownlint MD051/MD028 in fi..."

Document how to produce the body envelope (label 200): the segmenter never emits it, so defer to Option A of infer_image-from-mask.md (NV-Segment CT_BODY -> add_body_envelope) instead of duplicating it. Replace all VISTA-3D references with NV-Segment in both the skill and README §4.3. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

The body-region one-hots are consumed only when include_body_region is true, which is set in config_network_ddpm.json. rflow-ct (config_network_rflow.json) sets it false and ignores the fields, so they can be omitted. Note this in the JSON examples and field notes in both the skill and README §4.3. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
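
A minimal sketch of how a data-list entry might be assembled depending on `include_body_region`. The field names `top_region_index` and `bottom_region_index` are assumptions (this thread only says "body-region one-hots"), as are the helper name `build_entry`, the example paths, and the one-hot values; check the actual README §4.3 example before relying on them.

```python
from typing import Any, Optional


def build_entry(image: str, label: str, spacing: list[float], fold: int,
                include_body_region: bool,
                top_region: Optional[list[int]] = None,
                bottom_region: Optional[list[int]] = None) -> dict[str, Any]:
    """Build one data-list entry; add body-region one-hots only when needed."""
    entry: dict[str, Any] = {
        "image": image, "label": label, "spacing": spacing, "fold": fold,
    }
    if include_body_region:
        if top_region is None or bottom_region is None:
            raise ValueError("body-region one-hots are required "
                             "when include_body_region is true")
        # Assumed field names, not confirmed by this thread.
        entry["top_region_index"] = top_region
        entry["bottom_region_index"] = bottom_region
    return entry


# rflow-ct style: include_body_region is false, so the fields are omitted.
rflow_entry = build_entry("case_000/image_emb.nii.gz",
                          "case_000/mask_combined_label.nii.gz",
                          [1.0, 1.0, 1.0], 0, include_body_region=False)

# ddpm style: include_body_region is true, so the one-hots are included.
ddpm_entry = build_entry("case_000/image_emb.nii.gz",
                         "case_000/mask_combined_label.nii.gz",
                         [1.0, 1.0, 1.0], 0, include_body_region=True,
                         top_region=[0, 1, 0, 0], bottom_region=[0, 0, 1, 0])
print(sorted(rflow_entry), sorted(ddpm_entry))
```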

A new/unseen label is not mandatory — finetuning to a new site/dataset with only existing MAISI classes follows the same pipeline. Reframe the skill intro, description, and the weighted_loss section as new-class-only/optional. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>


weighted_loss_label up-weights the L1 loss on any label index (e.g. a tumor), new or existing — it is not required for adding a class and not new-class-only. Decouple it from the label_dict.json rename in the skill and README §4.3. Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
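
The idea of up-weighting the L1 loss on selected label indices can be sketched in NumPy. This is a simplified illustration under stated assumptions, not the training script's implementation: the function name `weighted_l1_sketch`, the `weight` parameter, the toy arrays, and the label index 129 are hypothetical, and the real loss presumably operates on model tensors rather than flat arrays.

```python
import numpy as np


def weighted_l1_sketch(pred: np.ndarray, target: np.ndarray,
                       label_map: np.ndarray,
                       weighted_loss_label: list[int],
                       weight: float) -> float:
    """Mean L1 loss where voxels whose label is listed get a larger weight."""
    weights = np.ones_like(pred, dtype=np.float64)
    roi = np.isin(label_map, weighted_loss_label)  # voxels to emphasize
    weights[roi] = weight
    return float(np.mean(np.abs(pred - target) * weights))


pred = np.array([0.0, 1.0, 2.0, 3.0])
target = np.zeros(4)
labels = np.array([0, 1, 129, 129])  # 129 stands in for e.g. a tumor index

# An empty list leaves every voxel at weight 1 (plain L1).
plain = weighted_l1_sketch(pred, target, labels, [], 1.0)
# Listing an index up-weights only the voxels carrying that label.
boosted = weighted_l1_sketch(pred, target, labels, [129], 10.0)
print(plain, boosted)
```

The index passed in `weighted_loss_label` can be a new or an existing class, which matches the point of this commit: the weighting is independent of adding a class.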

Can-Zhao · 2026-06-11T05:47:03Z

@zephyrie Hi Michael, could you help run the CI/CD test? I was removed from the maintainer... Thank you!

addsouza-nvidia · 2026-06-11T17:23:53Z

This update helps clarify how to create data for fine-tuning the model. The original README omitted critical information on how the embeddings, pseudo labels, and combined labels are created from the original DICOM and label images.


Can-Zhao and others added 2 commits June 10, 2026 15:49

docs: rename skill to controlnet_finetune_data-prep

998d4d6


Can-Zhao marked this pull request as ready for review June 10, 2026 22:56

greptile-apps Bot reviewed Jun 10, 2026


3 comment threads on skills/finetune_image-from-mask_data-prep.md

Can-Zhao and others added 5 commits June 10, 2026 16:06

docs: rename skill to finetune_image-from-mask_data-prep

088e706

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

docs: fix markdownlint MD051/MD028 in finetune data-prep skill

52c3756

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

zephyrie merged commit ec8d146 into NVIDIA-Medtech:main Jun 12, 2026
1 check passed

Can-Zhao deleted the docs/clarify-finetune-data-prep branch June 12, 2026 02:29

zephyrie merged 8 commits into NVIDIA-Medtech:main from Can-Zhao:docs/clarify-finetune-data-prep
