Yihan Xie1, Jianxiang An1, Boxiang Yun2, Chenglin Yang1, Jun Xiao1, Guangyu Guo2,1,
Jiawen Yao2, Wei Liu2, Yuan Gao2, Ke Yan2, Weiwei Cao2, Zhilin Zheng2,
Tony C. W. MOK2, Kai Cao4, Yu Shi5, Jiuyu Zhang5, Jian Zhou6,
Beng Chin Ooi1, Yingda Xia†2, Ling Zhang2
1Zhejiang University 2DAMO Academy, Alibaba Group 3Hupan Lab
4Shanghai Institute of Pancreatic Diseases 5Shengjing Hospital of China Medical University 6Sun Yat-sen University Cancer Center
Welcome to TumorChain!
Our goal is to advance clinical tumor analysis through reliable multimodal reasoning at scale. This project presents a cohesive three-part framework (Dataset, Benchmark, and Model) to enable safe, explainable, and reproducible tumor assessment in high-stakes settings.
- Establish a closed-loop multimodal reasoning pipeline that standardizes the path from findings to impressions to pathology.
- Create high-quality benchmarks and reproducible evaluation protocols to enable cross-institution comparison and robust generalization.
- Deliver an interpretable, calibrated, and traceable multimodal framework that reduces hallucinations and supports real-world clinical decision-making.
We introduce TumorCoT-1.5M, a large-scale dataset of 1.5 million Chain-of-Thought (CoT) annotated visual question answering (VQA) samples, each paired with a 3D CT scan and featuring stepwise reasoning and cross-modal alignment along the findings–impression–pathology trajectory.
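For intuition only, a hypothetical record in a CoT-labeled VQA dataset of this kind might look like the following. All field names and values are illustrative assumptions, not the actual TumorCoT-1.5M schema:

```python
# Purely illustrative sketch of one CoT-labeled VQA record.
# Field names and values are assumptions, NOT the released TumorCoT-1.5M schema.
record = {
    "ct_volume": "case_0001.nii.gz",   # placeholder path to the paired 3D CT scan
    "question": "Is there a hypodense lesion in the pancreatic head?",
    "reasoning_steps": [               # stepwise Chain-of-Thought annotation
        "Finding: a 2 cm hypodense mass in the pancreatic head.",
        "Impression: suspicious for pancreatic ductal adenocarcinoma.",
        "Pathology: biopsy-confirmed adenocarcinoma.",
    ],
    "answer": "Yes",
}

# Each reasoning step aligns with one stage of the
# findings-impression-pathology trajectory described above.
assert len(record["reasoning_steps"]) == 3
```

The key property such a layout captures is that the answer is never free-standing: every record carries an explicit reasoning chain that can be checked stage by stage against the CT evidence.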
TumorChain is a multimodal, iterative, interleaved reasoning framework for 3D CT tumor analysis. It fuses a 3D vision encoder, an organ segmentation model, an auxiliary classification model, an MLP projector, and a large language model (LLM) to perform stepwise, evidence-grounded reasoning from findings to impressions to pathology, with traceable evidence and calibrated uncertainty.
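The fusion pattern described above can be sketched in a few lines of numpy. This is a minimal illustration of the general projector-based design (vision features projected into the LLM token space, with auxiliary evidence appended as extra tokens); all component names, shapes, and the pooling/MLP stand-ins are assumptions, not the actual TumorChain implementation:

```python
# Hypothetical sketch of projector-based multimodal fusion.
# All names, shapes, and stand-in components are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def vision_encoder(ct_volume):
    # Stand-in for a 3D vision encoder: pool the volume to per-channel features.
    return ct_volume.mean(axis=(1, 2, 3))  # shape (C,)

def mlp_projector(feat, w1, w2):
    # Two-layer MLP mapping vision features into the LLM embedding space.
    h = np.maximum(feat @ w1, 0.0)         # ReLU hidden layer
    return h @ w2                          # shape (d_llm,)

# Toy shapes: a 4-channel 8x8x8 CT crop, 16-dim hidden, 32-dim LLM embedding.
ct = rng.standard_normal((4, 8, 8, 8))
w1 = rng.standard_normal((4, 16))
w2 = rng.standard_normal((16, 32))

vision_token = mlp_projector(vision_encoder(ct), w1, w2)

# Auxiliary evidence (organ-mask summary, classifier logits) is appended as
# extra tokens so the LLM can ground its stepwise reasoning on them.
organ_token = rng.standard_normal(32)      # stand-in for segmentation evidence
cls_token = rng.standard_normal(32)        # stand-in for classification evidence
llm_input = np.stack([vision_token, organ_token, cls_token])
print(llm_input.shape)  # (3, 32)
```

The point of the sketch is the interface, not the components: each specialist model contributes tokens in a shared embedding space, so the LLM can attend to image features and auxiliary evidence jointly during reasoning.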
😊 We will release our task definitions, benchmarks, and evaluation protocols in the near future to advance safe, explainable, and reproducible multimodal reasoning for high-stakes tumor analysis. 🚀