I am Xiangtai Li. I work on computer vision, multi-modal learning, and related problems.

I work as a Research Scientist at ByteDance Seed (TikTok) in Singapore.

Previously, I worked as a Research Fellow at MMLab@NTU, S-Lab, advised by Prof. Chen Change Loy.

I obtained my PhD degree at Peking University (PKU) under the supervision of Prof. Yunhai Tong, and my bachelor’s degree at Beijing University of Posts and Telecommunications (BUPT).

Previously, I worked as a research intern or research scientist at DeepMotion (now Xiaomi Car), JD Explore Academy, SenseTime Research, Shanghai AI Laboratory, and Skywork 2050 Research, with several research outputs at top conferences and journals.

My research topics are:

Multi-modal learning with LLMs (MLLMs): benchmarking, new architecture design, and unified modeling.

Large Language Models (LLMs) and auto-regressive models.

Image/video generation, editing, and synthesis (diffusion models).

Previously, I did some work on image/video segmentation and detection, and open-vocabulary learning.

Moreover, the code and models for nearly all of my works (roughly 98%), including those I have contributed to deeply, are open-sourced on GitHub.

I serve as a regular reviewer for many conferences and journals, including CVPR, ICCV, ECCV, ICLR, AAAI, NeurIPS, ICML, IJCAI, IEEE-TIP, IEEE-TPAMI, IJCV, IEEE-TCSVT, IEEE-TMM, IEEE-TGRS, and Remote Sensing.

I also serve as an area chair for ICLR-2025, ICML-2025, and ICCV-2025.

πŸ”₯ News

  • 2025.01 πŸ”₯πŸ”₯ Check out our recent work on video MLLMs, Sa2VA, which combines SAM-2 and LLaVA in one model.
  • 2024.12 πŸ”₯πŸ”₯ Serving as an Area Chair for both ICML-2025 and ICCV-2025!
  • 2024.12 πŸŽ‰πŸŽ‰ Several works accepted to AAAI-2025 and 3DV-2025: Point Cloud Mamba, Point RWKV, LDM-Seg, and ReasonSeg3D.
  • 2024.09 πŸŽ‰πŸŽ‰ Several works accepted to NeurIPS-2024: OMG-LLaVA, MotionBooth (spotlight), SemFlow, and MambaAD. Thanks to all co-authors for their help!
  • 2024.07 πŸŽ‰πŸŽ‰ Our transformer-based segmentation survey is finally accepted by T-PAMI. ArXiv.
  • 2024.07 πŸ”₯πŸ”₯ The training code of Edge-SAM and the corresponding iOS app, “Cutcha”, are available now: link, code.
  • 2024.07 πŸ”₯πŸ”₯ Check out our recent universal dense MLLM model, OMG-LLaVA: project, code.
  • 2024.07 πŸŽ‰πŸŽ‰ DVIS-DAQ, Open-Vocabulary SAM, FaceAdapter, and GenView are accepted by ECCV-2024. All code and models are released.
  • 2024.06 πŸ”₯πŸ”₯ Check out our recent works on MLLMs and new architecture design: OMG-LLaVA, RWKV-SAM, MotionBooth, SeTok, and Reason3D.

πŸ“ Publications

* means equal contribution.

Selected Works

  • OMG-Seg: Is One Model Good Enough For All Segmentation?, Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy, CVPR 2024. One model to perform image/video/open-vocabulary/multi-dataset/interactive segmentation in one shot. | Project Page
  • Transformer-Based Visual Segmentation: A Survey, Xiangtai Li, Henghui Ding, Haobo Yuan, Wenwei Zhang, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy, T-PAMI 2024. The first survey that summarizes transformer-based segmentation methods from a technical view. | GitHub
  • Tube-Link: A Flexible Cross Tube Baseline for Universal Video Segmentation, Xiangtai Li, Haobo Yuan, Wenwei Zhang, Guangliang Cheng, Jiangmiao Pang, Chen Change Loy, ICCV 2023. The first unified SOTA universal video segmentation model. | Project
  • Towards Open Vocabulary Learning: A Survey, Jianzong Wu*, Xiangtai Li*, Shilin Xu*, Haobo Yuan*, Henghui Ding, Yibo Yang, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, Dacheng Tao, T-PAMI 2024. The first survey on open-vocabulary learning. | Project
  • Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation, Xiangtai Li*, Wenwei Zhang*, Jiangmiao Pang*, Kai Chen, Guangliang Cheng, Yunhai Tong, Chen Change Loy, CVPR 2022 (Oral, top 2%). The first unified video segmentation model and codebase for VPS, VIS, and VSS. | Code
  • Semantic Flow for Fast and Accurate Scene Parsing, Xiangtai Li, Ansheng You, Zhen Zhu, Houlong Zhao, Maoke Yang, Kuiyuan Yang, Yunhai Tong, ECCV 2020 (Oral, top 2%). The first real-time model to exceed 80% mIoU on the Cityscapes test set. | Code
  • TransVOD: End-to-end Video Object Detection with Spatial-Temporal Transformers, Qianyu Zhou*, Xiangtai Li*, Lu He, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lizhuang Ma, Dacheng Tao, T-PAMI 2023. The first end-to-end vision transformer for video object detection, with SOTA results. | Code
  • Code for these works can be found via this link.

πŸ“– Education

  • 2017.09 - 2022.07, PhD, Peking University (PKU).

  • 2013.09 - 2017.07, Bachelor's degree, Beijing University of Posts and Telecommunications (BUPT).

πŸ’¬ Invited Talks

  • 2024.03 Invited talk on Open-Vocabulary Segmentation and Segment Anything at VALSE, online. Slide, Video.
  • 2023.08 Invited talk on Video Segmentation at VALSE, online. Slides, Video.
  • 2022.05 Invited talk on Panoptic Segmentation and Beyond at the Baidu PaddleSeg Group.

πŸ’» Internships

  • SenseTime, mentored by Dr. Guangliang Cheng and Dr. Jianping Shi.

  • JD AI (remote cooperation), mentored by Dr. Yibo Yang and Prof. Dacheng Tao.

  • DeepMotion (now Xiaomi Car), mentored by Dr. Kuiyuan Yang.