Official repository for Lumina-Video, a preliminary tryout of the Lumina series for Video Generation
- [2025-02-10] Technical Report is released!
- [2025-02-09] Lumina-Video is released!
See INSTALL.md for detailed instructions.
T2V models
| resolution | fps | max frames | Huggingface |
| --- | --- | --- | --- |
| 960 | 24 | 96 | Alpha-VLLM/Lumina-Video-f24R960 |
Download the checkpoints before continuing. You can use the following command to download the checkpoints to the ./ckpts
directory:
huggingface-cli download --resume-download Alpha-VLLM/Lumina-Video-f24R960 --local-dir ./ckpts/f24R960
You can quickly run video generation using the command below:
# Example for generating a video with 4s duration, fps=24, resolution=1248x704
python -u generate.py \
--ckpt ./ckpts/f24R960 \
--resolution 1248x704 \
--fps 24 \
--frames 96 \
--prompt "your prompt here" \
--neg_prompt "" \
--sample_config f24F96R960 # set to "f24F96R960-MultiScale" for efficient multi-scale inference
Q1: Why use the 1248x704 resolution?
A1: The resolution was originally intended to be 1280x720. However, to ensure compatibility with the largest patch size (smallest scale), both the width and height must be divisible by 32, so the resolution is adjusted to 1248x704.
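As a quick sanity check (plain Python, nothing repo-specific): 1280 is divisible by 32 but 720 is not, whereas both adjusted dimensions are.

# Python
for w, h in [(1280, 720), (1248, 704)]:
    print(w, h, w % 32 == 0 and h % 32 == 0)  # -> 1280 720 False, then 1248 704 True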
Q2: Does the model support flexible aspect ratios?
A2: Yes. You can use the following code to check all usable resolutions:
# Python
from imgproc import generate_crop_size_list
target_size = 960
patch_size = 32
max_num_patches = (target_size // patch_size) ** 2
crop_size_list = generate_crop_size_list(max_num_patches, patch_size)
print(crop_size_list)
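Any width x height pair printed by this snippet can be passed directly to generate.py via the --resolution flag, as in the example command above.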
Before starting the training process, two preparation steps are required to optimize training efficiency and enable motion conditioning:
- Pre-extract and cache VAE latents for video data: This significantly enhances training speed.
- Compute motion scores for videos: These are used for micro-conditioning input during training.
The code for pre-extracting and caching VAE latents can be found in the ./tools/pre_extract directory. For an example of how to run this, refer to the run.sh script.
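For intuition only, a minimal sketch of the caching step is shown below. It assumes a diffusers-style image VAE (AutoencoderKL) purely as a stand-in for the VAE actually used by Lumina-Video; the checkpoint path, tensor layout, and output format are placeholders, and the real pipeline is the one in ./tools/pre_extract.

# Python -- illustrative sketch only; see ./tools/pre_extract/run.sh for the real pipeline
import torch
from diffusers import AutoencoderKL  # stand-in for the VAE actually used by Lumina-Video

vae = AutoencoderKL.from_pretrained("path/to/vae").eval().cuda()  # placeholder checkpoint path

@torch.no_grad()
def cache_latents(video_frames: torch.Tensor, out_path: str):
    # video_frames: (T, C, H, W) float tensor in [-1, 1], one video's frames
    latents = vae.encode(video_frames.cuda()).latent_dist.sample()
    torch.save(latents.cpu(), out_path)  # training later loads latents instead of raw pixels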
We use UniMatch to estimate optical flow, with the average optical flow serving as the motion score. This code is primarily derived from Open-Sora, and we'd like to thank them for their excellent work!
The code for computing motion scores is available in the ./tools/unimatch directory. To see how to run it, refer to the run.sh script.
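As a rough illustration of "average optical flow as motion score", the sketch below substitutes OpenCV's Farneback flow for UniMatch; the function name and frame format are assumptions, and the actual implementation is the one in ./tools/unimatch.

# Python -- OpenCV Farneback used here only as a stand-in for UniMatch
import cv2
import numpy as np

def motion_score(frames):
    # frames: list of grayscale uint8 arrays (H, W) in temporal order
    scores = []
    for prev, nxt in zip(frames[:-1], frames[1:]):
        flow = cv2.calcOpticalFlowFarneback(prev, nxt, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        scores.append(np.linalg.norm(flow, axis=-1).mean())  # mean flow magnitude for this pair
    return float(np.mean(scores))  # average over all consecutive frame pairs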
Once the data has been prepared, you're ready to start training! For an example, you can refer to the training directory, which demonstrates how to train with:
- FPS: 8
- Duration: 4 seconds
- Resolution: width × height ≈ 256×256
- Training Techniques: Image-text joint training and multi-scale training applied together.
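To preview the multi-scale resolution buckets at this training scale, the helper from the FAQ above can be reused; this assumes the same patch size of 32 also applies at 256 resolution.

# Python
from imgproc import generate_crop_size_list

target_size = 256
patch_size = 32
crop_size_list = generate_crop_size_list((target_size // patch_size) ** 2, patch_size)
print(crop_size_list)  # candidate (width, height) buckets for multi-scale training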
- Inference code
- Training code
@misc{luminavideo,
title={Lumina-Video: Efficient and Flexible Video Generation with Multi-scale Next-DiT},
author={Dongyang Liu and Shicheng Li and Yutong Liu and Zhen Li and Kai Wang and Xinyue Li and Qi Qin and Yufei Liu and Yi Xin and Zhongyu Li and Bin Fu and Chenyang Si and Yuewen Cao and Conghui He and Ziwei Liu and Yu Qiao and Qibin Hou and Hongsheng Li and Peng Gao},
year={2025},
eprint={2502.06782},
archivePrefix={arXiv},
primaryClass={cs.CV},
url={https://arxiv.org/abs/2502.06782},
}