LTX2TextEncoder silently falls back to wrong tokenizer → every prompt produces identical output

## Summary
  
When generating with `prince-canuma/LTX-2.3-distilled` (model-repo) plus `Lightricks/LTX-2` (text-encoder-repo), `LTX2TextEncoder.load()` sile$
  
Repro is trivial. This affects both `distilled` and `dev` pipelines in T2V and I2V modes — effectively all generation is prompt-agnostic on th$
  
## Version

- `mlx-video` at `9ab4826d20e39286af13a26615c33b403d48be72` (current `main`, installed via `uv pip install git+https://github.com/Blaizzy/mlx-$
- Python 3.11.15, MLX 0.31.1, transformers 5.5.0
- Model repo: `prince-canuma/LTX-2.3-distilled` 
- Text encoder repo: `Lightricks/LTX-2` 
- Platform: macOS 15 (Apple Silicon, MLX/Metal)
  
## Reproduction  
  
Generate two videos with radically different prompts, same seed, T2V mode:

```bash
export HF_HOME=/path/to/hf/cache

python -m mlx_video.models.ltx_2.generate \
  --prompt "a cat sitting on a red couch in a cozy living room, warm lighting" \
  --model-repo prince-canuma/LTX-2.3-distilled \
  --text-encoder-repo Lightricks/LTX-2 \
  --pipeline distilled \
  --num-frames 25 --fps 24 --height 512 --width 512 --seed 1337 \
  --tiling none \
  --output-path /tmp/cat.mp4

python -m mlx_video.models.ltx_2.generate \
  --prompt "a rocket launching from a desert with smoke and fire, bright sunny sky" \
  --model-repo prince-canuma/LTX-2.3-distilled \
  --text-encoder-repo Lightricks/LTX-2 \
  --pipeline distilled \
  --num-frames 25 --fps 24 --height 512 --width 512 --seed 1337 \
  --tiling none \
  --output-path /tmp/rocket.mp4

md5 /tmp/cat.mp4 /tmp/rocket.mp4
```

**Expected:** two visibly different videos with different MD5s.
        
**Actual:** two bit-identical files.
         
```
MD5 (/tmp/cat.mp4)    = 566cc11c069b40656311c6846365484f
MD5 (/tmp/rocket.mp4) = 566cc11c069b40656311c6846365484f   
```

The same behavior occurs with `--pipeline dev` (tested with `--steps 15 --cfg-scale 3.0`) and in I2V mode with `--image /path/to/image.png`.
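
The `md5` command used in the reproduction is macOS-specific. The same comparison can be done portably with the Python standard library. This is a minimal sketch; `file_digest` and `outputs_identical` are illustrative names and not part of `mlx-video`.

```python
import hashlib


def file_digest(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def outputs_identical(paths):
    """Return True if every file in `paths` has the same digest."""
    return len({file_digest(p) for p in paths}) == 1
```

Calling `outputs_identical(["/tmp/cat.mp4", "/tmp/rocket.mp4"])` after the two runs above would return `True` when the bug is present.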
    
## Root cause

In `mlx_video/models/ltx_2/text_encoder.py` around lines 877–897, the tokenizer load has a three-step fallback chain:

```python
tokenizer_path = model_path / "tokenizer"
if tokenizer_path.exists():
    self.processor = AutoTokenizer.from_pretrained(
        str(tokenizer_path), trust_remote_code=True
    )
else:
    try:
        self.processor = AutoTokenizer.from_pretrained(
            text_encoder_path, trust_remote_code=True
        )
    except Exception:
        self.processor = AutoTokenizer.from_pretrained( 
            "google/gemma-3-12b-it", trust_remote_code=True
        )
self.processor.padding_side = "left"
```
    
With `prince-canuma/LTX-2.3-distilled` as `model_path` and `Lightricks/LTX-2` as `text_encoder_path`:

1. **Step 1 fails**: `prince-canuma/LTX-2.3-distilled` does not ship a top-level `tokenizer/` subdirectory.
2. **Step 2 fails**: `text_encoder_path` points at the **root** of the `Lightricks/LTX-2` repo, but the tokenizer files in that repo live in `$
3. **Step 3 succeeds** and silently loads `google/gemma-3-12b-it` — which is a valid Gemma tokenizer, but its vocab does not match the LTX-fin$

Downstream effect, verified by adding prints to `LTX2TextEncoder.encode()`:
    
```
Prompt: 'a cat on a couch'
  input_ids[0]: [0, 0, 0, 0, 0, ..., 0, 3]
  attention_mask sum: 1
        
Prompt: 'a rocket launching'
  input_ids[0]: [0, 0, 0, 0, 0, ..., 0, 3]
  attention_mask sum: 1
```
            
Every prompt → 1023 pad tokens + one token id `3`. `num_valid = 1` at every call site that checks the attention mask. The V2 feature extracto$ 

For comparison, loading the tokenizer from the correct path `Lightricks/LTX-2/tokenizer/` produces:
    
```
input_ids:       [0, 0, ..., 0, 2, 236746, 5866, 580, 496, 29919]
attention_mask:  [0, 0, ..., 0, 1, 1,      1,    1,   1,   1]
```

Real tokens, real attention mask, real embedding differences across prompts. Video output then differs across prompts as expected.
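
A sanity check along these lines could catch the degenerate case before any video is generated. This is a minimal sketch and not part of `mlx-video`: `check_tokenization` is a hypothetical helper, and it accepts any callable that maps one prompt to `(input_ids, attention_mask)`, so it does not assume a particular tokenizer API.

```python
def check_tokenization(encode, prompts, min_valid_tokens=2):
    """Raise if a prompt has too few valid tokens or two prompts collide.

    `encode` is any callable returning (input_ids, attention_mask) as
    sequences of ints for a single prompt. `prompts` should be distinct.
    """
    seen = {}
    for prompt in prompts:
        input_ids, attention_mask = encode(prompt)
        num_valid = sum(attention_mask)
        if num_valid < min_valid_tokens:
            raise ValueError(
                f"{prompt!r} produced only {num_valid} valid token(s)"
            )
        key = tuple(input_ids)
        if key in seen:
            raise ValueError(
                f"{prompt!r} and {seen[key]!r} tokenize to identical ids"
            )
        seen[key] = prompt
```

With the broken tokenizer described above, the first prompt would already fail the `min_valid_tokens` check, because its attention mask sums to 1. When wrapping a real tokenizer, `encode` should use the same padding and length arguments as `LTX2TextEncoder.encode()`, which are not shown here.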

## Suggested fix

The simplest fix is to try `text_encoder_path / "tokenizer"` as an additional candidate before the exception handler, and remove (or at least $
  
```python
# Inside LTX2TextEncoder.load(); requires `from pathlib import Path`.
tokenizer_candidates = [
    model_path / "tokenizer",
    Path(str(text_encoder_path)) / "tokenizer",
    Path(str(text_encoder_path)),
]

# Initialise so the error message below works even when no candidate
# exists and the try block is never entered.
last_err = None
for candidate in tokenizer_candidates:
    if not candidate.exists():
        continue
    try:
        self.processor = AutoTokenizer.from_pretrained(
            str(candidate), trust_remote_code=True
        )
        break
    except Exception as e:
        last_err = e
else:
    # No candidate loaded: fail loudly instead of falling back silently.
    raise RuntimeError(
        f"Could not load a tokenizer from model_path={model_path} or "
        f"text_encoder_path={text_encoder_path}. Last error: {last_err}"
    )
```
    
Removing the `google/gemma-3-12b-it` fallback entirely is probably the right call — it's almost always the wrong tokenizer for the LTX-fine-tu$
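
To keep the fail-loud behavior covered by a unit test that needs no model downloads, the candidate loop could be factored into a small helper that takes the loader as a parameter. This is a sketch under that assumption; `load_first_tokenizer` is a hypothetical name and does not exist in `mlx-video`.

```python
from pathlib import Path


def load_first_tokenizer(candidates, loader):
    """Return loader(path) for the first existing candidate that loads.

    Raises RuntimeError instead of falling back to a default tokenizer.
    """
    errors = []
    for candidate in candidates:
        candidate = Path(candidate)
        if not candidate.exists():
            errors.append(f"{candidate}: does not exist")
            continue
        try:
            return loader(str(candidate))
        except Exception as e:  # record it and try the next candidate
            errors.append(f"{candidate}: {e}")
    raise RuntimeError("No tokenizer could be loaded:\n" + "\n".join(errors))
```

Inside `LTX2TextEncoder.load()`, the loader would presumably be `lambda p: AutoTokenizer.from_pretrained(p, trust_remote_code=True)`, matching the existing calls. In a test it can be a stub that raises for some paths.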
    
## Workaround (for users hitting this now)
            
Monkey-patch `LTX2TextEncoder.load()` at runtime to override `self.processor` with the correct tokenizer from the `text_encoder_path / "tokeni$
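
Such a monkey-patch could look roughly like the sketch below. It assumes `load` is an instance method that sets `self.processor` (as the snippet under "Root cause" suggests) and forwards all arguments unchanged, so the exact signature of `load()` is not assumed. `patch_load` and `make_tokenizer` are illustrative names.

```python
import functools


def patch_load(cls, make_tokenizer):
    """Wrap cls.load so that self.processor is replaced after loading."""
    original_load = cls.load

    @functools.wraps(original_load)
    def patched_load(self, *args, **kwargs):
        result = original_load(self, *args, **kwargs)
        # Override whatever tokenizer the fallback chain picked.
        self.processor = make_tokenizer()
        # Mirror the padding side set by the original code.
        self.processor.padding_side = "left"
        return result

    cls.load = patched_load
    return original_load  # keep a handle so the patch can be undone
```

Applied to the real class, this might be `patch_load(LTX2TextEncoder, lambda: AutoTokenizer.from_pretrained("Lightricks/LTX-2", subfolder="tokenizer"))`, with `LTX2TextEncoder` imported from `mlx_video.models.ltx_2.text_encoder`. The patch has to run in the same process before generation starts.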
    
## Impact
    
Prior to discovering this, I spent several hours running seed sweeps, sigma-schedule patches, and resolution experiments trying to "fix prompt$
            
Happy to submit a PR if that would help.
