I am Xiangtai Li. I work on computer vision, multi-modal learning, and related problems.
I am currently a Research Scientist at ByteDance Seed (TikTok), Singapore.
Previously, I was a Research Fellow at MMLab@NTU, S-Lab, advised by Prof. Chen Change Loy.
I obtained my PhD degree at Peking University (PKU) under the supervision of Prof. Yunhai Tong, and my bachelor's degree at Beijing University of Posts and Telecommunications (BUPT).
I have also worked as a research intern or research scientist at DeepMotion (now Xiaomi Car), JD Exploration Academy, SenseTime Research, Shanghai AI Laboratory, and Skywork 2050 Research, with several resulting papers at top conferences and journals.
My research topics are:
- Large Language Models (LLMs) and auto-regressive models.
- Multi-modal learning with LLMs (MLLMs): benchmarking, new architecture design, and unified modeling.
- Diffusion models: image/video generation, editing, and synthesis.
Previously, I worked on image/video segmentation and detection, as well as open-vocabulary learning.
Moreover, the code and models for nearly all of my works (roughly 98%), including those I have contributed to substantially, are open-sourced on GitHub.
I serve as a regular reviewer for many conferences and journals, including CVPR, ICCV, ECCV, ICLR, AAAI, NeurIPS, ICML, IJCAI, IEEE-TIP, IEEE-TPAMI, IJCV, IEEE-TCSVT, IEEE-TMM, IEEE-TGRS, and Remote Sensing.
I also serve as an area chair for ICLR-2025 and ICML-2025.
Remote discussion and cooperation are welcome!
🔥 News
- 2024.09: Several works were accepted to NeurIPS-2024: OMG-LLaVA, MotionBooth (spotlight), SemFlow, and MambaAD. Thanks to all co-authors for their help!
- 2024.07: Our Transformer Survey has finally been accepted by T-PAMI. arXiv.
- 2024.07: 🔥🔥 The training code of Edge-SAM and its corresponding app, "Cutcha" on the iOS App Store, are now available: link. Code.
- 2024.07: 🔥🔥 Check out our recent universal dense MLLM, OMG-LLaVA: project, code.
- 2024.07: DVIS-DAQ, Open-Vocabulary SAM, FaceAdapter, and GenView were accepted to ECCV-2024. All code and models have been released.
- 2024.06: 🔥🔥 Check out our recent work on MLLMs and new architecture design: OMG-LLaVA, RWKV-SAM, MotionBooth, SeTok, and Reason3D.
📝 Publications
* means equal contribution.
Several Works
Code can be found here.
📖 Education
- 2017.09 - 2022.07, PhD at Peking University (PKU).
- 2013.09 - 2017.07, Bachelor's degree at Beijing University of Posts and Telecommunications (BUPT).
💬 Invited Talks
- 2024.03 Invited talk on Open-Vocabulary Segmentation and Segment Anything at VALSE, online. Slides, Video.
- 2023.08 Invited talk on Video Segmentation at VALSE, online. Slides, Video.
- 2022.05 Invited talk on Panoptic Segmentation and Beyond at the Baidu PaddleSeg Group.
💻 Internships
- SenseTime, mentored by Dr. Guangliang Cheng and Dr. Jianping Shi.
- JD AI (remote cooperation), mentored by Dr. Yibo Yang and Prof. Dacheng Tao.
- DeepMotion (now Xiaomi Car), mentored by Dr. Kuiyuan Yang.