I am Xiangtai Li. I work on computer vision, multi-modal learning, and related problems.
I work as a Research Scientist at TikTok (ByteDance), Singapore.
Our team works on research and applications for TikTok Live, covering multi-modal large language models, diffusion models, and LLM reasoning.
Previously, I worked as a Research Fellow at MMLab@NTU, S-Lab, advised by Prof. Chen Change Loy.
I obtained my PhD from Peking University (PKU) under the supervision of Prof. Yunhai Tong, and my bachelor's degree from Beijing University of Posts and Telecommunications (BUPT).
My research covers three areas:
- Multi-modal learning with LLMs (MLLMs): unified modeling, benchmarking, and dataset pipeline building.
- Image/video generation and editing, including controllable image/video generation.
- Multi-modal agent system design.
Earlier, I worked on image/video segmentation and detection, as well as open-vocabulary learning.
The code and models for nearly all of my work (roughly 98%), including projects I have contributed to substantially, are open-sourced on GitHub.
I serve as a regular reviewer for many conferences and journals, including CVPR, ICCV, ECCV, ICLR, AAAI, NeurIPS, ICML, IJCAI, IEEE-TIP, IEEE-TPAMI, IJCV, IEEE-TCSVT, IEEE-TMM, and IEEE-TGRS.
I also serve as an Area Chair for ICLR-2025, ICML-2025, ICCV-2025, NeurIPS-2025, AAAI-2025/2026, and WACV-2026.
I am looking for several self-motivated research interns with backgrounds in MLLMs and diffusion models.
I am also looking for full-time research engineers working on multi-modal large language models.
(You can reach me at xiangtai94@gmail.com or xiangtai.li@bytedance.com.)