
Conversation

DavidBert (Author):

This commit adds support for the Photon image generation model:

  • PhotonTransformer2DModel: Core transformer architecture
  • PhotonPipeline: Text-to-image generation pipeline (see the usage sketch after this list)
  • Attention processor updates for Photon-specific attention mechanism
  • Conversion script for loading Photon checkpoints
  • Documentation and tests
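
For context, a minimal sketch of how the new pipeline would be used once merged. `PhotonPipeline` is the class added by this PR; the checkpoint id, dtype, and call arguments below are placeholders following the usual diffusers conventions, not values taken from this PR:

```python
import torch
from diffusers import PhotonPipeline

# Repo id is a placeholder; the final checkpoint location isn't stated in this PR.
pipe = PhotonPipeline.from_pretrained("<org>/photon", torch_dtype=torch.bfloat16)
pipe.to("cuda")

image = pipe(
    prompt="A photograph of a red fox in a snowy forest",
    num_inference_steps=28,  # assumed values, not from the PR
    guidance_scale=4.5,
).images[0]
image.save("photon_sample.png")
```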

What does this PR do?

Fixes # (issue)

Who can review?

Anyone in the community is free to review the PR once the tests have passed. Feel free to tag
members/contributors who may be interested in your PR.

print("✓ Created scheduler config")


def download_and_save_vae(vae_type: str, output_path: str):

DavidBert (Author):

I'm not sure about this one: I'm saving the VAE weights even though they are already available on the Hub (Flux VAE and DC-AE).
Is there a way to avoid storing them and instead point directly to the originals?

Member:

For now, it's okay to keep this as is. This way, everything is under the same model repo.
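
For reference, a rough sketch of what this helper could look like while keeping everything under one repo: pull the pretrained VAE from its existing Hub location, then re-save it into the combined Photon repo. The repo ids and `vae_type` values below are assumptions for illustration, not taken from the actual script:

```python
from diffusers import AutoencoderDC, AutoencoderKL


def download_and_save_vae(vae_type: str, output_path: str):
    # Load pretrained VAE weights from their existing Hub repos
    # (repo ids are assumptions for illustration).
    if vae_type == "flux":
        vae = AutoencoderKL.from_pretrained("black-forest-labs/FLUX.1-dev", subfolder="vae")
    elif vae_type == "dc-ae":
        vae = AutoencoderDC.from_pretrained("mit-han-lab/dc-ae-f32c32-sana-1.0-diffusers")
    else:
        raise ValueError(f"Unknown vae_type: {vae_type}")

    # Re-save under the combined Photon model repo so everything lives in one place.
    vae_path = f"{output_path}/vae"
    vae.save_pretrained(vae_path)
    print(f"✓ Saved VAE to {vae_path}")
```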

print(f"✓ Saved VAE to {vae_path}")


def download_and_save_text_encoder(output_path: str):

DavidBert (Author):

Same here for the Text Encoder.
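
A matching sketch for the text encoder; which encoder Photon actually uses isn't stated in this thread, so the T5 repo id below is purely a placeholder:

```python
from transformers import AutoTokenizer, T5EncoderModel


def download_and_save_text_encoder(output_path: str):
    # Placeholder repo id: the actual text encoder used by Photon
    # isn't named in this thread.
    repo_id = "google/t5-v1_1-xl"
    text_encoder = T5EncoderModel.from_pretrained(repo_id)
    tokenizer = AutoTokenizer.from_pretrained(repo_id)

    # Re-save into the combined Photon repo, mirroring the VAE helper above.
    text_encoder.save_pretrained(f"{output_path}/text_encoder")
    tokenizer.save_pretrained(f"{output_path}/tokenizer")
    print(f"✓ Saved text encoder to {output_path}/text_encoder")
```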

print("✓ Created scheduler config")


def download_and_save_vae(vae_type: str, output_path: str):
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For now, it's okay to keep this as is. This way, everything is under the same model repo.

Comment on lines 19 to 20
from einops import rearrange
from einops.layers.torch import Rearrange

Member:

We need to get rid of the einops dependency and use native PyTorch ops here.

DavidBert (Author):

I changed it to native PyTorch. Out of curiosity, why do you recommend avoiding einops?

Member:

We try to avoid additional dependencies, especially when things can be done in native PyTorch.
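
For readers wondering what the substitution looks like in practice, a minimal illustrative example (the actual patterns in the Photon code aren't shown in this thread; this uses a typical patchify rearrangement):

```python
import torch

x = torch.randn(2, 16, 32, 32)  # (B, C, H, W)
p = 2  # patch size

# einops version:
#   rearrange(x, "b c (h p1) (w p2) -> b (h w) (c p1 p2)", p1=p, p2=p)
b, c, h, w = x.shape
out = (
    x.view(b, c, h // p, p, w // p, p)  # split H and W into patches
    .permute(0, 2, 4, 1, 3, 5)          # -> (b, h, w, c, p1, p2)
    .reshape(b, (h // p) * (w // p), c * p * p)
)
print(out.shape)  # torch.Size([2, 256, 64])
```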

return xq_out.reshape(*xq.shape).type_as(xq)


class EmbedND(nn.Module):

Member:

Does this share similarity with Flux?

DavidBert (Author):

Yes, it comes from the original BFL implementation.
I tried to adapt the logic from transformer_flux.py, but I couldn't make it work without heavy changes and additional complexity.
I added a comment to explicitly say that it comes from there. Is that OK for you, or do you want me to keep trying to reuse the code from transformer_flux.py?

Member:

Oh okay, then it's fine to keep it here. I would maybe rename it to PhotonEmbedND and leave a note that it's inspired by Flux. WDYT?
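
For context, a sketch of what the BFL-style EmbedND looks like, paraphrased from the public Flux reference implementation and written with native PyTorch ops per the einops discussion above (not copied from this PR's diff):

```python
import torch
from torch import nn


def rope(pos: torch.Tensor, dim: int, theta: int) -> torch.Tensor:
    # Rotary frequencies for a single position axis.
    pos = pos.to(torch.float64)
    scale = torch.arange(0, dim, 2, dtype=torch.float64, device=pos.device) / dim
    omega = 1.0 / (theta**scale)
    out = torch.einsum("...n,d->...nd", pos, omega)
    # Stack into 2x2 rotation matrices [[cos, -sin], [sin, cos]].
    out = torch.stack([torch.cos(out), -torch.sin(out), torch.sin(out), torch.cos(out)], dim=-1)
    return out.unflatten(-1, (2, 2)).float()


class EmbedND(nn.Module):
    """Multi-axis rotary embedding: one rope() per position axis, concatenated."""

    def __init__(self, dim: int, theta: int, axes_dim: list[int]):
        super().__init__()
        self.dim = dim
        self.theta = theta
        self.axes_dim = axes_dim

    def forward(self, ids: torch.Tensor) -> torch.Tensor:
        n_axes = ids.shape[-1]
        emb = torch.cat(
            [rope(ids[..., i], self.axes_dim[i], self.theta) for i in range(n_axes)],
            dim=-3,
        )
        return emb.unsqueeze(1)
```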

- `sample` (`torch.Tensor`): Output latent image of shape `(B, C, H, W)`.
"""
if attention_kwargs is not None:

Member:

Could we unify the structure of this block similar to how it's done in QwenImage, for example (barring the bits related to ControlNet, of course)?

DavidBert (Author):

I moved all the logic into this block.
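
For reference, the QwenImage-style structure being asked for is roughly the standard diffusers entry block for handling a LoRA `scale` passed through `attention_kwargs`. A sketch of that convention, with the model body elided (an illustration, not the exact Photon diff):

```python
import torch
from torch import nn
from diffusers.utils import USE_PEFT_BACKEND, logging, scale_lora_layers, unscale_lora_layers

logger = logging.get_logger(__name__)


class PhotonTransformerSketch(nn.Module):
    """Illustrative skeleton only; shows the attention_kwargs/LoRA-scale convention."""

    def forward(self, hidden_states: torch.Tensor, attention_kwargs: dict | None = None):
        if attention_kwargs is not None:
            attention_kwargs = attention_kwargs.copy()
            lora_scale = attention_kwargs.pop("scale", 1.0)
        else:
            lora_scale = 1.0

        if USE_PEFT_BACKEND:
            # Weight the LoRA layers by setting `lora_scale` on each PEFT layer.
            scale_lora_layers(self, lora_scale)
        else:
            if attention_kwargs is not None and attention_kwargs.get("scale", None) is not None:
                logger.warning(
                    "Passing `scale` via `attention_kwargs` when not using the PEFT backend is ineffective."
                )

        sample = hidden_states  # ... main transformer computation would go here ...

        if USE_PEFT_BACKEND:
            # Remove `lora_scale` from each PEFT layer once done.
            unscale_lora_layers(self, lora_scale)

        return sample
```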

sayakpaul (Member) left a comment:

Thanks for the clean PR! I left some initial feedback for you. LMK if that makes sense.

Also, it would be great to see some samples of Photon!
