[RFC] v1.2.0 Roadmap #440

## Path to v1.2.0

- [x] Ascend NPU Support: Context parallelism, FLUX/Qwen-Image @DefTruth  @gameofdimension #651 #653 
- [x] Introduce `accumulated_rel_l1_diff` to reduce accumulated cache error, as used in the official TeaCache and EasyCache. #444 
- [x] Introduce a LeMiCa/EasyCache-style custom step compute mask, e.g. "111110100100000100000010000001", where 1 means full compute and 0 means dynamic/static cache (hybrid with an autotune function) @DefTruth #444 
- [x] Context Parallelism for any tokens (any resolution, any prompt tokens) @DefTruth #462 
- [x] Support All Gather for any tokens (any resolution, any prompt tokens), for UAA @DefTruth #465 
- [x] Optimize the performance of UAA when using torch.compile (a graph break was introduced by the `if` branch) #474 
- [x] Parallelize VAE @DefTruth @tingkuanpei  #645 
- [x] Parallelize Text Encoder @gameofdimension @DefTruth #569 
- [x] Manual compute and communication overlap (attention level or model level) for Ulysses and UAA, e.g., AsyncUlyssesQKVProj @tingkuanpei @DefTruth 
- [x] Cache and Parallelism support for HunyuanVideo-1.5, FLUX.2, Z-Image @DefTruth @gameofdimension 
- [x] Fused Per Tensor FP8 All2All via triton/cuda kernel @DefTruth  @triple-Mu #524 
- [x] Any Head num support for Ulysses, e.g., Z-Image @DefTruth 
- [x] More CIs @DefTruth 
- [x] Official documentation on readthedocs.io
- [x] Performance benchmark, NVIDIA A800, L20, NPU, etc. @DefTruth #684 
- [ ] GPU CIs: model tests #688 
- [x] mkdocs CIs: check `mkdocs build --strict` @DefTruth #680 
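
For readers unfamiliar with the step compute mask and `accumulated_rel_l1_diff` items above, here is a rough sketch of how the two ideas could combine. It is an illustration only, not the project's actual API: `parse_step_mask`, `rel_l1_diff`, `plan_steps`, and the `threshold` parameter are hypothetical names, and the "reset after a full compute" rule is an assumption modeled on how TeaCache-style accumulation is commonly described.

```python
from typing import List, Sequence


def parse_step_mask(mask: str) -> List[bool]:
    """Turn a mask string such as "1101" into per-step flags (True = full compute)."""
    if not mask or set(mask) - {"0", "1"}:
        raise ValueError("mask must be a non-empty string of '0' and '1'")
    return [ch == "1" for ch in mask]


def rel_l1_diff(prev: Sequence[float], curr: Sequence[float], eps: float = 1e-8) -> float:
    """Relative L1 difference: mean(|curr - prev|) / mean(|prev|)."""
    num = sum(abs(c - p) for p, c in zip(prev, curr)) / len(prev)
    den = sum(abs(p) for p in prev) / len(prev)
    return num / (den + eps)


def plan_steps(mask: str, diffs: Sequence[float], threshold: float) -> List[str]:
    """Decide, per step, whether to run a full compute or reuse the cache.

    diffs[i] is the relative L1 diff observed at step i (hypothetical input).
    A step is computed if the mask forces it (1) or if the accumulated diff
    since the last full compute reaches the threshold (assumed rule).
    """
    flags = parse_step_mask(mask)
    if len(flags) != len(diffs):
        raise ValueError("mask and diffs must have the same length")
    accumulated = 0.0
    plan = []
    for forced, diff in zip(flags, diffs):
        accumulated += diff
        if forced or accumulated >= threshold:
            plan.append("compute")
            accumulated = 0.0  # reset after a full compute
        else:
            plan.append("cache")
    return plan


if __name__ == "__main__":
    print(plan_steps("1000", [0.5, 0.1, 0.1, 0.1], threshold=0.25))
```

Accumulating the diff, rather than comparing each step in isolation, means several small drifts in a row can still trigger a full compute, which is the error-limiting behavior the roadmap item refers to.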

