I am Xiangtai Li. I work on computer vision, multi-modal learning, and related problems.
I am a Staff Research Scientist at TikTok (ByteDance), Singapore.
Our team works on application development and research for TikTok Live. Our products and models are used directly by TikTok Live and impact billions of users.
Our topics cover multi-modal large language models, diffusion models, and LLM reasoning.
Previously, I worked as a Research Fellow at MMLab@NTU, S-Lab, advised by Prof. Chen Change Loy.
I obtained my PhD degree from Peking University (PKU) under the supervision of Prof. Yunhai Tong, and my bachelor’s degree from Beijing University of Posts and Telecommunications (BUPT).
My research focuses on two main areas:
- Multi-modal learning with LLMs (MLLMs): unified modeling, benchmarking, dataset pipeline building, RL-based post-training, and diffusion language models.
- Image/video generation and editing, including controllable image/video generation.
The code and models for nearly all of my work (about 98%), including projects to which I have contributed deeply, are open-sourced on GitHub.
I serve as a regular reviewer for many conferences and journals, including CVPR, ICCV, ECCV, ICLR, AAAI, NeurIPS, ICML, IJCAI, IEEE-TIP, IEEE-TPAMI, IJCV, IEEE-TSCVT, IEEE-TMM, and IEEE-TGRS.
I also serve as an Area Chair for ICLR-2025/2026, CVPR-2026, ICML-2025, ICCV-2025, NeurIPS-2025, AAAI-2025/2026, WACV-2026, and ECCV-2026.
In addition, I serve as an Associate Editor for IEEE-TPAMI.
I am looking for strong interns with an LLM/diffusion infrastructure and/or AIGC background, based in Beijing or Singapore. [Urgent!] Candidates with strong infrastructure skills are preferred. (ByteIntern/筋斗云实习生)
My email addresses are xiangtai94@gmail.com and xiangtai.li@bytedance.com. Feel free to contact me directly.