I think I've had this error before, but I forget how to fix it. Any hints?
███████╗███████╗███████╗██████╗ ██╗ ██╗██████╗ ██████╗ ███████╗
██╔════╝██╔════╝██╔════╝██╔══██╗██║ ██║██╔══██╗ ╚════██╗ ██╔════╝
███████╗█████╗ █████╗ ██║ ██║██║ ██║██████╔╝ █████╔╝ ███████╗
╚════██║██╔══╝ ██╔══╝ ██║ ██║╚██╗ ██╔╝██╔══██╗ ██╔═══╝ ╚════██║
███████║███████╗███████╗██████╔╝ ╚████╔╝ ██║ ██║ ███████╗ ██╗ ███████║
╚══════╝╚══════╝╚══════╝╚═════╝ ╚═══╝ ╚═╝ ╚═╝ ╚══════╝ ╚═╝ ╚══════╝
v2.5.24 © ByteDance Seed · NumZ · AInVFX
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
[11:16:45.505] ℹ️ OS: Windows (10.0.26200) | GPU: NVIDIA GeForce RTX 5090 (32GB)
[11:16:45.505] ℹ️ Python: 3.13.6 | PyTorch: 2.10.0+cu130 | FlashAttn: v2 ✓ | SageAttn: v2 ✓ | Triton: ✓
[11:16:45.505] ℹ️ CUDA: 13.0 | cuDNN: 91200 | ComfyUI: 0.19.3
[11:16:45.505]
[11:16:45.505] ━━━━━━━━━ Model Preparation ━━━━━━━━━
[11:16:45.505] 📊 Before model preparation:
[11:16:45.506] 📊 [VRAM] 0.01GB allocated / 0.03GB reserved / Peak: 0.01GB / 30.15GB free / 31.84GB total
[11:16:45.506] 📊 [RAM] 2.93GB process / 11.22GB others / 81.60GB free / 95.75GB total
[11:16:45.506] 📊 Resetting VRAM peak memory statistics
[11:16:45.506] 📥 Checking and downloading models if needed...
[11:16:45.506] 🔧 DiT model found: D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\models\SEEDVR2\seedvr2_ema_7b-Q4_K_M.gguf
[11:16:45.506] 🔧 DiT model already validated (cache): D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\models\SEEDVR2\seedvr2_ema_7b-Q4_K_M.gguf
[11:16:45.507] 🔧 VAE model found: D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\models\SEEDVR2\ema_vae_fp16.safetensors
[11:16:45.508] 🔧 VAE model already validated (cache): D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\models\SEEDVR2\ema_vae_fp16.safetensors
[11:16:45.508] 🔧 Generation context initialized: DiT=cuda:0, VAE=cuda:0, Offload=[DiT offload=cpu, VAE offload=cpu, Tensor offload=cpu], LOCAL_RANK=0
[11:16:45.508] 🎯 Unified compute dtype: torch.bfloat16 across entire pipeline for maximum performance
[11:16:45.509] 🏃 Configuring inference runner...
[11:16:45.509] 🏃 Creating new runner: DiT=seedvr2_ema_7b-Q4_K_M.gguf, VAE=ema_vae_fp16.safetensors
[11:16:45.523] 🚀 Creating DiT model structure on meta device
[11:16:45.635] 🎨 Creating VAE model structure on meta device
[11:16:45.672] 🎨 VAE downsample factors configured (spatial: 8x, temporal: 4x)
[11:16:45.675] 🔄 Moving text_pos_embeds from CPU to CUDA:0 (DiT inference)
[11:16:45.676] 🔄 Moving text_neg_embeds from CPU to CUDA:0 (DiT inference)
[11:16:45.676] 🚀 Loaded text embeddings for DiT
[11:16:45.677] 📊 After model preparation:
[11:16:45.677] 📊 [VRAM] 0.01GB allocated / 0.03GB reserved / Peak: 0.01GB / 30.15GB free / 31.84GB total
[11:16:45.677] 📊 [RAM] 2.93GB process / 11.22GB others / 81.60GB free / 95.75GB total
[11:16:45.677] 📊 Resetting VRAM peak memory statistics
[11:16:45.677] ⚡ Model preparation: 0.17s
[11:16:45.677] ⚡ └─ Model structures prepared: 0.15s
[11:16:45.677] ⚡ └─ DiT structure created: 0.10s
[11:16:45.677] ⚡ └─ VAE structure created: 0.04s
[11:16:45.677] ⚡ └─ Config loading: 0.01s
[11:16:45.678] 🔧 Initializing video transformation pipeline for 1920px (shortest edge)
[11:16:45.793] 🔧 Target dimensions: 1920x1920 (no padding needed)
[11:16:45.798]
[11:16:45.798] 🎬 Starting upscaling generation...
[11:16:45.798] 🎬 Input: 65 frames, 1024x1024px → Output: 1920x1920px (shortest edge: 1920px)
[11:16:45.798] 🎬 Batch size: 65 (uniform), Temporal overlap: 2, Seed: 42, Channels: RGB
[11:16:45.798]
[11:16:45.798] ━━━━━━━━ Phase 1: VAE encoding ━━━━━━━━
[11:16:45.799] ♻️ Reusing pre-initialized video transformation pipeline
[11:16:45.799] 🎨 Materializing VAE weights to CPU (offload device): D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\models\SEEDVR2\ema_vae_fp16.safetensors
[11:16:45.801] 🎯 Converting VAE weights to torch.bfloat16 during loading
[11:16:46.261] 🎨 Materializing VAE: 250 parameters, 478.07MB total
[11:16:46.265] 🎨 VAE materialized directly from meta with loaded weights
[11:16:46.265] 🎨 VAE model set to eval mode (gradients disabled)
[11:16:46.266] 🎨 Configuring VAE causal slicing for temporal processing
[11:16:46.266] 🎨 Configuring VAE memory limits for causal convolutions
[11:16:46.267] 🎯 Model precision: VAE=torch.bfloat16, compute=torch.bfloat16
FETCH ComfyRegistry Data: 85/140
[11:16:46.269] 🎨 Using seed: 1000042 (VAE uses seed+1000000 for deterministic sampling)
[11:16:46.270] 🔄 Moving VAE from CPU to CUDA:0 (inference requirement)
[11:16:46.481] 📊 After VAE loading for encoding:
[11:16:46.481] 📊 [VRAM] 0.48GB allocated / 0.50GB reserved / Peak: 0.48GB / 29.68GB free / 31.84GB total
[11:16:46.481] 📊 [RAM] 2.93GB process / 11.22GB others / 81.60GB free / 95.75GB total
[11:16:46.481] 📊 Memory changes: VRAM +0.47GB
[11:16:46.481] 📊 Resetting VRAM peak memory statistics
[11:16:46.481] 🎨 Encoding batch 1/1
[11:16:46.482] 🔄 Moving video_batch_1 from CPU to CUDA:0, torch.float32 → torch.bfloat16 (VAE encoding)
[11:16:46.775] 📹 Sequence of 65 frames
[11:16:46.897] 🎨 Using VAE tiled encoding (Tile: (1024, 1024), Overlap: (128, 128))
[11:16:46.897] 🎨 Encoding 4 tiles (Tile: (1024, 1024), Overlap: (128, 128))
[11:16:46.906] 🎨 Encoding tiles 1-4 / 4
[11:16:47.526] ❌ [ERROR] Error in Phase 1 (Encoding): Tensor type unknown to einops <class 'tuple'>
[11:16:47.527] 🔄 Moving VAE from CUDA:0 to CPU (VAE offload)
[11:16:47.637] ✅ Cleared 21 VAE memory buffers
[11:16:47.637] 🧹 Starting full cleanup
[11:16:47.637] 🧹 Cleaning up DiT components
[11:16:47.639] ✅ Cleared 36 RoPE LRU caches
[11:16:47.639] 🧹 DiT on meta device - keeping structure for cache
[11:16:47.642] 🧹 DiT model deleted
[11:16:47.643] 🧹 Cleaning up VAE components
[11:16:47.644] 🧹 VAE model deleted
[11:16:47.644] 🧹 Clearing memory caches (deep)...
[11:16:48.262] ✅ Completed full cleanup
[11:16:48.262] 🧹 Cleaned up text embeddings: texts_pos, texts_neg
!!! Exception during processing !!! Tensor type unknown to einops <class 'tuple'>
Traceback (most recent call last):
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\execution.py", line 534, in execute
output_data, output_ui, has_subgraph, has_pending_tasks = await get_output_data(prompt_id, unique_id, obj, input_data_all, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\execution.py", line 334, in get_output_data
return_values = await _async_map_node_over_list(prompt_id, unique_id, obj, input_data_all, obj.FUNCTION, allow_interrupt=True, execution_block_cb=execution_block_cb, pre_execute_cb=pre_execute_cb, v3_data=v3_data)
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\execution.py", line 308, in async_map_node_over_list
await process_inputs(input_dict, i)
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\execution.py", line 296, in process_inputs
result = f(**inputs)
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\comfy_api\internal_init.py", line 149, in wrapped_func
return method(locked_class, **inputs)
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\comfy_api\latest_io.py", line 1789, in EXECUTE_NORMALIZED
to_return = cls.execute(*args, **kwargs)
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\custom_nodes\seedvr2_videoupscaler\src\interfaces\video_upscaler.py", line 580, in execute
raise e
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\custom_nodes\seedvr2_videoupscaler\src\interfaces\video_upscaler.py", line 469, in execute
ctx = encode_all_batches(
runner,
...<11 lines>...
color_correction=color_correction
)
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\custom_nodes\seedvr2_videoupscaler\src\core\generation_phases.py", line 489, in encode_all_batches
cond_latents = runner.vae_encode([transformed_video])
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\utils_contextlib.py", line 124, in decorate_context
return func(*args, **kwargs)
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\custom_nodes\seedvr2_videoupscaler\src\core\infer.py", line 179, in vae_encode
latent = self.vae.encode(sample, tiled=self.encode_tiled, tile_size=self.encode_tile_size,
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tile_overlap=self.encode_tile_overlap).latent
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\custom_nodes\seedvr2_videoupscaler\src\models\video_vae_v3\modules\attn_video_vae.py", line 1685, in encode
p = super().encode(x, return_dict=return_dict, tiled=tiled, tile_size=tile_size,
~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
tile_overlap=tile_overlap).latent_dist
^^^^^^^^^^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\python_embeded\Lib\site-packages\diffusers\utils\accelerate_utils.py", line 46, in wrapper
return method(self, *args, **kwargs)
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\custom_nodes\seedvr2_videoupscaler\src\models\video_vae_v3\modules\attn_video_vae.py", line 1186, in encode
h = self.tiled_encode(x, tile_size=tile_size, tile_overlap=tile_overlap)
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\custom_nodes\seedvr2_videoupscaler\src\models\video_vae_v3\modules\attn_video_vae.py", line 1403, in tiled_encode
encoded_tile = self.slicing_encode(tile_sample)
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\custom_nodes\seedvr2_videoupscaler\src\models\video_vae_v3\modules\attn_video_vae.py", line 1259, in slicing_encode
self._encode(
~~~~~~~~~~~~^
torch.cat((x[:, :, :1], x_slices[0]), dim=2),
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
memory_state=MemoryState.INITIALIZING,
^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
)
^
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\custom_nodes\seedvr2_videoupscaler\src\models\video_vae_v3\modules\attn_video_vae.py", line 1218, in _encode
h = self.encoder(_x, memory_state=memory_state)
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\custom_nodes\seedvr2_videoupscaler\src\models\video_vae_v3\modules\attn_video_vae.py", line 849, in forward
sample = self.mid_block(sample, memory_state=memory_state)
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1776, in _wrapped_call_impl
return self._call_impl(*args, **kwargs)
~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\python_embeded\Lib\site-packages\torch\nn\modules\module.py", line 1787, in _call_impl
return forward_call(*args, **kwargs)
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\ComfyUI\custom_nodes\seedvr2_videoupscaler\src\models\video_vae_v3\modules\attn_video_vae.py", line 663, in forward
hidden_states = rearrange(
hidden_states, "(b f) c h w -> b c f h w", f=video_length
)
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\python_embeded\Lib\site-packages\einops\einops.py", line 616, in rearrange
return reduce(tensor, pattern, reduction="rearrange", **axes_lengths)
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\python_embeded\Lib\site-packages\einops\einops.py", line 535, in reduce
backend = get_backend(tensor)
File "D:\ComfyUI_windows_latest\ComfyUI_windows_portable\python_embeded\Lib\site-packages\einops_backends.py", line 62, in get_backend
raise RuntimeError(f"Tensor type unknown to einops {type(tensor)}")
RuntimeError: Tensor type unknown to einops <class 'tuple'>
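
For reference on the error itself: the traceback ends in einops' rearrange inside the VAE mid-block forward, and "Tensor type unknown to einops <class 'tuple'>" simply means rearrange was handed a Python tuple where it expected a tensor, so something upstream in that forward (presumably whatever produced hidden_states, though that's a guess) returned a tuple that was passed along unchanged. A minimal sketch that reproduces the same message with the einops version shown in this log (the shapes and the tuple source are my assumptions, not taken from the log):

import torch
from einops import rearrange

# Hypothetical stand-in for the mid-block activations: (batch*frames, c, h, w)
hidden_states = torch.randn(4, 16, 32, 32)

# The call shown in the traceback works fine on a plain tensor...
ok = rearrange(hidden_states, "(b f) c h w -> b c f h w", f=2)
print(ok.shape)  # torch.Size([2, 16, 2, 32, 32])

# ...but if an upstream op returns an (output, extra) tuple and that tuple is
# forwarded as-is, einops' get_backend() sees a tuple, not a tensor, and raises
# the exact error above.
broken = (hidden_states, None)
rearrange(broken, "(b f) c h w -> b c f h w", f=2)
# RuntimeError: Tensor type unknown to einops <class 'tuple'>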