I am Xiangtai Li. I work on computer vision, multi-modal learning, and related problems.
I am working as a Research Scientist at ByteDance Seed (TikTok), Singapore.
Previously, I worked as a Research Fellow at MMLab@NTU, S-Lab, advised by Prof. Chen Change Loy.
I obtained my PhD degree at Peking University (PKU) under the supervision of Prof. Yunhai Tong, and my bachelor's degree at Beijing University of Posts and Telecommunications (BUPT).
Before that, I worked as a research intern or research scientist at DeepMotion (now Xiaomi Car) / JD Exploration Academy / SenseTime Research / Shanghai AI Laboratory / Skywork 2050 Research, with several research outputs published in top conferences and journals.
My research topics are:
Multi-modal learning with LLMs (MLLM): Benchmarking, New Architecture Design, Unified Modeling.
Large Language Models (LLMs) and auto-regressive models.
Image/Video Generation, Editing, and Synthesis (Diffusion Models).
Previously, I worked on image/video segmentation and detection, and open-vocabulary learning.
Moreover, the code and models for nearly all of my works (around 98%), including those I contributed to deeply, are open-sourced on GitHub.
I serve as a regular reviewer for many conferences and journals, including CVPR, ICCV, ECCV, ICLR, AAAI, NeurIPS, ICML, IJCAI, IEEE-TIP, IEEE-TPAMI, IJCV, IEEE-TCSVT, IEEE-TMM, IEEE-TGRS, and Remote Sensing.
I also serve as an Area Chair for ICLR-2025, ICML-2025, and ICCV-2025.
🔥 News
- 2025.01 🔥🔥 Check out our recent work on video MLLM, Sa2VA, which combines SAM-2 and LLaVA in one model.
- 2024.12 🔥🔥 Serving as an Area Chair for both ICML-2025 and ICCV-2025!
- 2024.12 🎉🎉 Several works accepted by AAAI-2025 and 3DV-2025: Point Cloud Mamba, Point RWKV, LDM-Seg, ReasonSeg3D.
- 2024.09 🎉🎉 Several works accepted by NeurIPS-2024: OMG-LLaVA, MotionBooth (spotlight), SemFlow, MambaAD. Thanks to all co-authors for their help!
- 2024.07 🎉🎉 Our Transformer survey is finally accepted by T-PAMI. arXiv.
- 2024.07 🔥🔥 The training code of Edge-SAM and the corresponding iOS app, "Cutcha", are available now: link, code.
- 2024.07 🔥🔥 Check out our recent universal dense MLLM model, OMG-LLaVA: project, code.
- 2024.07 🎉🎉 DVIS-DAQ, Open-Vocabulary SAM, FaceAdapter, and GenView are accepted by ECCV-2024. All code and models are released.
- 2024.06 🔥🔥 Check out our recent works on MLLMs and new architecture design: OMG-LLaVA, RWKV-SAM, MotionBooth, SeTok, and Reason3D.
📝 Publications
* means equal contribution.
Several Works
Code can be found here.
📖 Education
- 2017.09 - 2022.07, PhD, Peking University (PKU).
- 2013.09 - 2017.07, Bachelor, Beijing University of Posts and Telecommunications (BUPT).
💬 Invited Talks
- 2024.03 Invited talk on Open-Vocabulary Segmentation and Segment Anything at VALSE (online). Slides, Video.
- 2023.08 Invited talk on Video Segmentation at VALSE (online). Slides, Video.
- 2022.05 Invited talk on Panoptic Segmentation and Beyond at the Baidu PaddleSeg group.
💻 Internships
- SenseTime, mentored by Dr. Guangliang Cheng and Dr. Jianping Shi.
- JD AI (remote cooperation), mentored by Dr. Yibo Yang and Prof. Dacheng Tao.
- DeepMotion (now Xiaomi Car), mentored by Dr. Kuiyuan Yang.