Added EverAnimate Support by 0xBeycan · Pull Request #2023 · kijai/ComfyUI-WanVideoWrapper

0xBeycan · 2026-05-28T22:53:57Z

No description provided.

younestft · 2026-06-01T17:12:11Z

I tested the EverAnimate PR locally and found two temporal alignment issues in the long-generation loop, which I had Codex (GPT 5.5high) fix for me. Both occur with the default EverAnimate settings:

num_video_anchor_latents = 4
num_motion_latents = 1
frame_window_size = 81
pose_images connected
face_images connected

The PR adds extra latent anchor slots for EverAnimate, but pose and face conditioning were still built for the original WanAnimate window length.

Issue 1: pose latent mismatch

Error:

RuntimeError: The size of tensor a (24) must match the size of tensor b (21) at non-singleton dimension 2

Location:

wanvideo/modules/model.py
wananimate_pose_embedding()
x_[:, :, 1:].add_(pose_latents_, alpha=strength)

Cause:

With N=4 anchors, noise.shape[1] becomes latent_window_size + 4. The model applies pose embeddings to x_[:, :, 1:], so it expects noise.shape[1] - 1 pose latent positions.

For frame_window_size=81:

pose_input_slice length = 21
expected pose length = 24
missing = 3 = N - 1
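The arithmetic above can be reproduced with a small sketch. It assumes the VAE compresses time 4x and encodes the first frame on its own, which matches the reported 81 frames -> 21 latents; the helper names `latent_frames` and `pose_shortfall` are hypothetical and not part of the PR.

```python
# Assumption: F RGB frames encode to (F - 1) // 4 + 1 latent frames
# (4x temporal compression, first frame encoded alone).
def latent_frames(rgb_frames: int, stride: int = 4) -> int:
    return (rgb_frames - 1) // stride + 1


def pose_shortfall(frame_window_size: int, num_anchors: int):
    """Return (current, expected, missing) pose latent lengths."""
    current = latent_frames(frame_window_size)
    noise_len = current + num_anchors  # anchors are prepended to the window
    expected = noise_len - 1           # pose is added to x_[:, :, 1:]
    return current, expected, expected - current


print(pose_shortfall(81, 4))  # (21, 24, 3)
```

With a single anchor (`num_anchors=1`) the shortfall is zero, which is consistent with the original WanAnimate window length working unpatched.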

Fix used locally:

    pose_input_slice = vae.encode([pose_image_slice], device, tiled=tiled_vae, pbar=False).to(dtype)
    if everanim_mode:
        expected_pose_len = noise.shape[1] - 1
        current_pose_len = pose_input_slice.shape[2]
        if current_pose_len < expected_pose_len:
            pad_len = expected_pose_len - current_pose_len
            pose_pad = torch.zeros(
                pose_input_slice.shape[0],
                pose_input_slice.shape[1],
                pad_len,
                pose_input_slice.shape[3],
                pose_input_slice.shape[4],
                device=pose_input_slice.device,
                dtype=pose_input_slice.dtype,
            )
            pose_input_slice = torch.cat([pose_pad, pose_input_slice], dim=2)
        elif current_pose_len > expected_pose_len:
            pose_input_slice = pose_input_slice[:, :, :expected_pose_len]

Issue 2: face adapter temporal mismatch

After fixing pose, face conditioning failed with:

einops.EinopsError:
Error while processing rearrange-reduction pattern "B (L S) H D -> (B L) S H D".
Input tensor shape: torch.Size([1, 39000, 40, 128]).
Additional info: {'L': 22}.
Shape mismatch, can't divide axis of length 39000 in chunks of 22

Location:

wanvideo/modules/wananimate/face_blocks.py
FaceBlock.forward()
q = rearrange(q, "B (L S) H D -> (B L) S H D", L=T)

Cause:

The latent sequence had 25 temporal groups, but the face encoder produced 22. Again, this is short by N - 1 latent groups. Since the face encoder temporal stride is 4 RGB frames per latent group, N=4 needs 12 blank RGB frames prepended.
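As a rough check of the numbers in this cause analysis, the sketch below counts face-encoder groups and tests whether the token axis from the error message can be split evenly. The group formula (4x compression plus one padded group) is an assumption fitted to the reported 22 groups for 81 frames; `face_groups` and `can_split` are hypothetical helpers.

```python
def face_groups(rgb_frames: int, stride: int = 4) -> int:
    # Assumption: 4x temporal compression plus one padded group, which
    # reproduces the 22 groups reported for an 81-frame window.
    return (rgb_frames - 1) // stride + 1 + 1


def can_split(tokens: int, groups: int) -> bool:
    # "B (L S) H D -> (B L) S H D" with L=groups needs tokens % groups == 0.
    return tokens % groups == 0


N = 4
window = 81
extra_frames = (N - 1) * 4  # 12 blank RGB frames to prepend

print(face_groups(window))                          # 22
print(face_groups(window + extra_frames))           # 25
print(can_split(39000, 22), can_split(39000, 25))   # False True
```

Under these assumptions, prepending 12 frames brings the face encoder to 25 groups, and 39000 tokens then divide into 25 groups of 1560.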

Fix used locally:

    if wananim_face_pixels is None and wananim_ref_masks is not None:
        face_images_in = torch.zeros(1, 3, frame_window_size, 512, 512, device=device, dtype=torch.float32)
    elif wananim_face_pixels is not None:
        face_images_in = face_images[:, :, start:end].to(device, torch.float32) if face_images is not None else None

    if everanim_mode and face_images_in is not None:
        extra_face_frames = max(0, (everanim_N - 1) * 4)
        if extra_face_frames > 0:
            face_pad = torch.full(
                (
                    face_images_in.shape[0],
                    face_images_in.shape[1],
                    extra_face_frames,
                    face_images_in.shape[3],
                    face_images_in.shape[4],
                ),
                -1.0,
                device=face_images_in.device,
                dtype=face_images_in.dtype,
            )
            face_images_in = torch.cat([face_pad, face_images_in], dim=2)

I also found a small callback bug in the same loop:

callback_latent = (latent_model_input.to(device) - noise_pred.to(device) * t.to(device) / 1000)

"t" is not the loop timestep variable in this branch. I changed it to:

callback_latent = (latent_model_input.to(device) - noise_pred.to(device) * timestep.to(device) / 1000)
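To illustrate why the timestep value matters in that line, here is a scalar sketch of the same arithmetic. It assumes a rectified-flow style parameterization (noisy latent as a linear blend of clean latent and noise, model predicting `eps - x0`); the thread does not state this, and the helper names are hypothetical.

```python
def noisy_latent(x0: float, eps: float, sigma: float) -> float:
    # Assumed interpolation between the clean latent and the noise.
    return (1.0 - sigma) * x0 + sigma * eps


def velocity(x0: float, eps: float) -> float:
    # Assumed prediction target: eps - x0.
    return eps - x0


def estimate_x0(x_t: float, v_pred: float, timestep: float,
                num_train_timesteps: float = 1000.0) -> float:
    # Same arithmetic as the callback line: x_t - v * timestep / 1000.
    return x_t - v_pred * timestep / num_train_timesteps


x0, eps, timestep = 2.0, -1.0, 600.0
x_t = noisy_latent(x0, eps, timestep / 1000)
print(round(estimate_x0(x_t, velocity(x0, eps), timestep), 6))  # 2.0
```

Under that assumption, passing the current step's timestep recovers the clean latent, while any other value (a stale or unrelated `t`) would give a wrong preview.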

With these patches, the long EverAnimate loop is temporally consistent for N=4.

Aligns the EverAnimate looping path with the canonical diffsynth reference (wan_video_svi.py) so the N-anchor + M-motion streaming scheme matches what the rank-32 LoRA was trained with.

nodes_sampler.py:
- Pose latents: front-pad by N-1 in everanim mode. Kijai's model adds pose to x_[:, :, 1:] (skip 1), but EverAnimate prepends N anchors, so the model expects noise.shape[1]-1 pose positions. Padding lands the real pose on x_[N:], reproducing the canonical after_patch_embedding (x[:, :, N:] += pose). Fixes RuntimeError on pose embedding.
- Face pixels: prepend (N-1)*4 blank frames in everanim mode so the face encoder yields noise.shape[1] temporal groups (4x compression + the model's +1 pad), matching canonical pad_face=N. Fixes einops mismatch in FaceBlock.
- Window stepping: derive refert_num from the motion carry M, not the anchor count N. The old (N-1)*4+1 caused 4*(N-M) duplicated RGB frames at every window boundary; now step == kept content (contiguous output).
- Motion mask: keep motion mask at 0 (soft context) instead of 1 on continuation, matching canonical (only anchors get mask=1). mask=1 was out-of-distribution for the LoRA.
- Callback: use 'timestep' (this loop's variable) instead of undefined 't', which raised NameError when a preview/callback was attached.

everanimate/nodes.py:
- frame_window_size default 81 -> 77 to match EverAnimate's trained clip length (frames_per_clip=77 -> 20 content latents per window).

Co-Authored-By: Claude Opus 4.8 <noreply@anthropic.com>
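The window-stepping and clip-length figures in the commit message above can be checked with a short sketch. The old overlap `(N-1)*4+1` is quoted from the message; the new form `(M-1)*4+1` is an assumption inferred from the stated `4*(N-M)` duplication, and all function names are hypothetical.

```python
def old_refert_num(num_anchors: int) -> int:
    # Overlap formula quoted in the commit message.
    return (num_anchors - 1) * 4 + 1


def new_refert_num(num_motion: int) -> int:
    # Assumed form: same formula, driven by the motion carry M instead of N.
    return (num_motion - 1) * 4 + 1


def duplicated_frames(num_anchors: int, num_motion: int) -> int:
    # Extra RGB frames repeated at each window boundary under the old formula.
    return old_refert_num(num_anchors) - new_refert_num(num_motion)


def content_latents(frames_per_clip: int, stride: int = 4) -> int:
    # Assumption: 4x temporal compression with the first frame encoded alone.
    return (frames_per_clip - 1) // stride + 1


N, M = 4, 1
print(old_refert_num(N), new_refert_num(M))  # 13 1
print(duplicated_frames(N, M))               # 12
print(content_latents(77))                   # 20
```

For the default N=4, M=1 this gives 12 duplicated frames per boundary, which equals `4 * (N - M)`, and 77 frames map to the 20 content latents mentioned for `frames_per_clip=77`.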

Added EverAnimate Support

1972d87

0xBeycan wants to merge 2 commits into kijai:main from 0xBeycan:everanimate-integration
