I am Xiangtai Li. I work on computer vision, multi-modal learning, and related problems.

I am a Research Scientist at ByteDance Seed (TikTok), Singapore.

Previously, I worked as a Research Fellow at MMLab@NTU, S-Lab, advised by Prof. Chen Change Loy.

I obtained my PhD degree at Peking University (PKU) under the supervision of Prof. Yunhai Tong, and my bachelor's degree at Beijing University of Posts and Telecommunications (BUPT).

Before that, I worked as a research intern or research scientist at DeepMotion (now Xiaomi Car), JD Explore Academy, SenseTime Research, Shanghai AI Laboratory, and Skywork 2050 Research, with several research outputs at top conferences and journals.

My research topics are:

Large Language Models (LLMs) and Auto-regressive Models.

Multi-modal Learning with LLMs (MLLMs): Benchmarking, New Architecture Design, Unified Modeling.

Diffusion Models: Image/Video Generation, Editing, and Synthesis.

Previously, I worked on image/video segmentation and detection, and open-vocabulary learning.

Moreover, the code and models for nearly all of my works (roughly 98%), including those I have contributed to deeply, are open-sourced on GitHub.

I serve as a regular reviewer for many conferences and journals, including CVPR, ICCV, ECCV, ICLR, AAAI, NeurIPS, ICML, IJCAI, IEEE-TIP, IEEE-TPAMI, IJCV, IEEE-TCSVT, IEEE-TMM, IEEE-TGRS, and Remote Sensing.

I also serve as an area chair for ICLR-2025 and ICML-2025.

Remote discussion and collaboration are welcome!

🔥 News

  • 2024.09: 🎉🎉 Several works were accepted by NeurIPS-2024: OMG-LLaVA, MotionBooth (spotlight), SemFlow, and MambaAD. Thanks to all co-authors for their help!
  • 2024.07: 🎉🎉 Our Transformer Survey has been accepted by T-PAMI. arXiv.
  • 2024.07: 🔥🔥 The training code of Edge-SAM and the corresponding iOS app, "Cutcha", are available now: link. Code.
  • 2024.07: 🔥🔥 Check out our recent universal dense MLLM, OMG-LLaVA: project, code.
  • 2024.07: 🎉🎉 DVIS-DAQ, Open-Vocabulary SAM, FaceAdapter, and GenView were accepted by ECCV-2024. All code and models are released.
  • 2024.06: 🔥🔥 Check out our recent works on MLLMs and new architecture design: OMG-LLaVA, RWKV-SAM, MotionBooth, SeTok, and Reason3D.

๐Ÿ“ Publications

* denotes equal contribution.

Selected Works

  • OMG-Seg: Is One Model Good Enough For All Segmentation?, Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy, CVPR 2024. One model to perform image/video/open-vocabulary/multi-dataset/interactive segmentation in one shot. | Project Page
  • Transformer-Based Visual Segmentation: A Survey, Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy, T-PAMI 2024. The first survey that summarizes transformer-based segmentation methods from a technical perspective. | Github
  • Tube-Link: A Flexible Cross Tube Baseline for Universal Video Segmentation, Xiangtai Li, Haobo Yuan, Wenwei Zhang, Guangliang Cheng, Jiangmiao Pang, Chen Change Loy, ICCV 2023. The first unified SOTA universal video segmentation model. | Project
  • Towards Open Vocabulary Learning: A Survey, Jianzong Wu*, Xiangtai Li*, Shilin Xu*, Haobo Yuan*, Henghui Ding, Yibo Yang, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, Dacheng Tao, T-PAMI 2024. The first survey on open-vocabulary learning. | Project
  • Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation, Xiangtai Li*, Wenwei Zhang*, Jiangmiao Pang*, Kai Chen, Guangliang Cheng, Yunhai Tong, Chen Change Loy, CVPR 2022 (Oral, top 2%). The first unified video segmentation model and codebase for VPS, VIS, and VSS. | Code
  • Semantic Flow for Fast and Accurate Scene Parsing, Xiangtai Li, Ansheng You, Zhen Zhu, Houlong Zhao, Maoke Yang, Kuiyuan Yang, Yunhai Tong, ECCV 2020 (Oral, top 2%). The first real-time model to exceed 80% mIoU on the Cityscapes test set. | Code
  • TransVOD: End-to-end Video Object Detection with Spatial-Temporal Transformers, Qianyu Zhou*, Xiangtai Li*, Lu He, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lizhuang Ma, Dacheng Tao, T-PAMI 2023. The first end-to-end vision transformer for video object detection, with SOTA results. | Code
  • Code can be found in this repository.

📖 Education

  • 2017.09 - 2022.07, PhD at Peking University (PKU).

  • 2013.09 - 2017.07, Bachelor's degree at Beijing University of Posts and Telecommunications (BUPT).

💬 Invited Talks

  • 2024.03 Invited talk on Open-Vocabulary Segmentation and Segment Anything at VALSE, online. Slides, Video.
  • 2023.08 Invited talk on Video Segmentation at VALSE, online. Slides, Video.
  • 2022.05 Invited talk on Panoptic Segmentation and Beyond in the Baidu PaddleSeg Group.

💻 Internships

  • SenseTime, mentored by Dr. Guangliang Cheng and Dr. Jianping Shi.

  • JD AI (remote collaboration), mentored by Dr. Yibo Yang and Prof. Dacheng Tao.

  • DeepMotion (now Xiaomi Car), mentored by Dr. Kuiyuan Yang.