I am Xiangtai Li. I work on computer vision, multi-modal learning, and related problems.
I am a Senior Research Scientist at TikTok (ByteDance) in Singapore.
Our team works on applications and research for TikTok Live. Topics cover multi-modal large language models, diffusion models, and LLM reasoning.
Previously, I worked as a Research Fellow at MMLab@NTU, S-Lab, advised by Prof. Chen Change Loy.
I obtained my PhD degree from Peking University (PKU) under the supervision of Prof. Yunhai Tong, and my bachelor’s degree from Beijing University of Posts and Telecommunications (BUPT).
My research focuses on three main areas:
- Multi-modal learning with LLMs (MLLMs): unified modeling, benchmarking, dataset pipeline building, and RL-based post-training.
- Image/video generation and editing, including controllable image/video generation.
- Previously, image/video segmentation and detection, and open-vocabulary learning.
The code and models for nearly all of my work (about 98%), including projects I have contributed to substantially, are open-sourced on GitHub.
I serve as a regular reviewer for many conferences and journals, including CVPR, ICCV, ECCV, ICLR, AAAI, NeurIPS, ICML, IJCAI, IEEE-TIP, IEEE-TPAMI, IJCV, IEEE-TSCVT, IEEE-TMM, and IEEE-TGRS.
I also serve as an Area Chair for ICLR-2025/2026, CVPR-2026, ICML-2025, ICCV-2025, NeurIPS-2025, AAAI-2025/2026, WACV-2026, and ECCV-2026.
I also serve as an Associate Editor for IEEE T-PAMI.
I am looking for interns with a background in LLM/diffusion-model infrastructure.
I am also looking for algorithm engineers with vLLM/SGLang experience.
My email addresses are xiangtai94@gmail.com and xiangtai.li@bytedance.com. Feel free to reach out to discuss.