
Add SCAIL-Pose2 / SCAIL-2 workflow support #2035

Open
rookiestar28 wants to merge 32 commits into kijai:main from rookiestar28:main

rookiestar28 commented Jun 21, 2026

Summary

This PR adds native SCAIL-Pose2 / SCAIL-2 conditioning support for WanVideo workflows, including replacement and animation dual-mode routing, context-window handling, replacement mask/sample initialization fixes, and a new example workflow.

It also includes a small NLF bbox formatting fix so that multi-person bbox candidates are preserved instead of being collapsed.

What changed

SCAIL-Pose2 / SCAIL-2 integration

  • Added SCAIL-2 loader detection for native pose/mask embedding weights.
  • Added SCAIL-2 routing helpers for sampler/model integration.
  • Added native SCAIL-2 forward planning for:
    • reference latents and masks
    • additional references
    • pose latents
    • driving masks
    • replacement-mode RoPE/cache planning
    • control strength scaling
    • context-window slicing
  • Added WanVideoAddSCAIL2ConditionEmbeds to materialize SCAIL-Pose2 payloads into WanVideo image embeds.
  • Routed SCAIL-2 conditioning through WanVideoSampler into the Wan model forward pass.
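
The context-window slicing and control-strength scaling listed above are not spelled out in this description. As a hedged illustration only, the sketch below splits a latent-frame range into overlapping fixed-stride windows and slices a per-frame condition (such as pose latents) so each window receives only its own frames, scaled by a strength factor. The function names, the fixed-stride windowing scheme, and the list-of-lists representation are assumptions for the example, not the PR's code; blending of overlapping window outputs and the handling of reference latents are not shown.

```python
from typing import List, Sequence


def context_windows(num_frames: int, window: int, overlap: int) -> List[List[int]]:
    """Split latent frames [0, num_frames) into overlapping windows of indices (hypothetical scheme)."""
    if window >= num_frames:
        return [list(range(num_frames))]
    stride = window - overlap
    if stride <= 0:
        raise ValueError("overlap must be smaller than window")
    windows: List[List[int]] = []
    start = 0
    while True:
        end = start + window
        if end >= num_frames:
            # Shift the last window back so it stays full length and ends on the final frame.
            windows.append(list(range(num_frames - window, num_frames)))
            break
        windows.append(list(range(start, end)))
        start += stride
    return windows


def slice_condition(
    cond: Sequence[Sequence[float]], indices: Sequence[int], strength: float
) -> List[List[float]]:
    """Select the per-frame condition rows for one window and scale them by `strength`."""
    return [[strength * v for v in cond[i]] for i in indices]
```

Because every window indexes the condition with the same frame indices it uses for its own latents, the pose signal stays frame-aligned no matter how the windows overlap.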

Replacement mode stability

  • Preserved replacement condition structure instead of sanitizing away required pose/reference information.
  • Kept raw driving-video condition routing for replacement workflows.
  • Added conservative binary latent-mask resizing for SCAIL-Pose2 replacement masks so thin subject masks survive latent downsampling.
  • Added spatial and temporal mask growth for replacement masks.
  • Added mask-aware samples initialization:
    • preserved/background regions come from encoded driving-video samples
    • replacement subject regions remain random/noised as intended
  • Added context-window alignment for samples/noise masks so long-video windows stay frame-aligned.
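
The replacement-mode mask steps can be sketched with plain Python lists. This is a minimal sketch, assuming that "conservative" binary resizing means any-pooling (a latent cell is kept if any pixel in its block is set), that mask growth is a square spatial dilation plus a temporal OR over neighbouring frames, and that initialization takes noise inside the mask and encoded driving-video values outside it. All names are hypothetical; the real code presumably operates on tensors and may differ in detail.

```python
from typing import List

Mask = List[List[int]]      # one binary frame: rows x cols of 0/1
Frame = List[List[float]]   # one latent frame: rows x cols of values


def downsample_mask_any(mask: Mask, factor: int) -> Mask:
    """Conservative binary resize: a latent cell is 1 if ANY pixel in its block is 1."""
    rows, cols = len(mask), len(mask[0])
    out_rows = (rows + factor - 1) // factor
    out_cols = (cols + factor - 1) // factor
    out = [[0] * out_cols for _ in range(out_rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                out[r // factor][c // factor] = 1
    return out


def grow_spatial(mask: Mask, radius: int) -> Mask:
    """Dilate a binary mask by `radius` cells using a square neighbourhood."""
    rows, cols = len(mask), len(mask[0])
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            if mask[r][c]:
                for rr in range(max(0, r - radius), min(rows, r + radius + 1)):
                    for cc in range(max(0, c - radius), min(cols, c + radius + 1)):
                        out[rr][cc] = 1
    return out


def grow_temporal(frames: List[Mask], radius: int) -> List[Mask]:
    """A cell is set in frame t if it is set in any frame from t - radius to t + radius."""
    n = len(frames)
    rows, cols = len(frames[0]), len(frames[0][0])
    out: List[Mask] = []
    for t in range(n):
        window = frames[max(0, t - radius):min(n, t + radius + 1)]
        out.append([[int(any(f[r][c] for f in window)) for c in range(cols)] for r in range(rows)])
    return out


def mask_aware_init(encoded: Frame, noise: Frame, mask: Mask) -> Frame:
    """Background (mask 0) keeps the encoded driving video; subject (mask 1) takes noise."""
    return [
        [n if m else e for e, n, m in zip(e_row, n_row, m_row)]
        for e_row, n_row, m_row in zip(encoded, noise, mask)
    ]
```

For a 4x8 mask containing a one-pixel-wide subject column, averaging each 4x4 block gives 0.25, which a 0.5 threshold would discard; any-pooling with `factor=4` keeps the cell set, which is the "thin subject masks survive" property described above.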

Dual-mode samples behavior

  • Centralized SCAIL-Pose2 samples-disable metadata and helper logic.
  • WanVideoEncode.samples can remain wired in dual-mode workflows.
  • In replacement mode, samples stay active for replacement/background preservation.
  • In animation and other non-replacement modes, the samples payload is marked disabled and ignored by the sampler before video-to-video samples handling.
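
The samples-disable behavior described above amounts to tagging the payload with the active mode and having the sampler drop disabled payloads before any video-to-video handling. A minimal sketch, assuming a dict-shaped payload and a hypothetical metadata key `scail2_samples_disabled` (the PR's actual key and helper names are not given in this description):

```python
from typing import Any, Dict, Optional

# Hypothetical metadata key, used only for this illustration.
SAMPLES_DISABLED_KEY = "scail2_samples_disabled"


def mark_samples_for_mode(samples: Optional[Dict[str, Any]], mode: str) -> Optional[Dict[str, Any]]:
    """Tag a samples payload as disabled unless the workflow runs in replacement mode."""
    if samples is None:
        return None
    tagged = dict(samples)  # shallow copy so the upstream node's output is not mutated
    tagged[SAMPLES_DISABLED_KEY] = mode != "replacement"
    return tagged


def samples_for_sampler(samples: Optional[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return None for disabled payloads so video-to-video samples handling is skipped."""
    if samples is None or samples.get(SAMPLES_DISABLED_KEY, False):
        return None
    return samples
```

In this sketch the check runs before any samples processing, so options that act on samples (such as add_noise_to_samples) have no effect outside replacement mode, and the same samples wire can stay connected while switching modes.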
  • Documented this behavior in the README, including the scope of add_noise_to_samples.

Workflow / docs

  • Added example_workflows/wanvideo_2_1_14B_SCAIL2_replacement_and_animate_dual_mode_example_01.json.
  • Updated README with SCAIL-Pose2 dual-mode samples guidance.

NLF bbox

  • Added bbox formatting support that preserves multiple NLF person candidates.
  • Added regression coverage for the multi-person bbox output shape.
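
To make the bbox change concrete, the sketch below keeps every person candidate per frame instead of reducing each frame to a single box. It assumes each detection is `[x1, y1, x2, y2]` optionally followed by a score, and that the formatted output has shape frames x persons x 4; the actual NLF output layout and the PR's function names are not shown here, so `format_bboxes` is hypothetical.

```python
from typing import List, Sequence

BBox = List[float]  # [x1, y1, x2, y2]


def format_bboxes(per_frame: Sequence[Sequence[Sequence[float]]]) -> List[List[BBox]]:
    """Keep every person candidate: output shape is frames x persons x 4."""
    formatted: List[List[BBox]] = []
    for frame_boxes in per_frame:
        frame_out: List[BBox] = []
        for box in frame_boxes:
            if len(box) < 4:
                raise ValueError(f"expected at least 4 values per box, got {len(box)}")
            frame_out.append([float(v) for v in box[:4]])  # drop any trailing score
        formatted.append(frame_out)  # empty frames stay as [] so frame indices remain aligned
    return formatted
```

A regression test for the output shape would then assert that a frame with two detections yields two boxes rather than one.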

Tests

Local validation run:

python -m unittest discover -s tests -p "test_*.py"
