I am Xiangtai Li (李祥泰), and I work on computer vision and related problems.

Previously, I worked as a Research Fellow at MMLab@NTU, S-Lab, advised by Prof. Chen Change Loy, and as a research scientist intern at JD AI, SenseTime, and Shanghai AI Laboratory.

I obtained my PhD degree from Peking University under the supervision of Prof. Yunhai Tong, and my bachelor’s degree from Beijing University of Posts and Telecommunications.

My research topics are:

1. Scene understanding (image/video/3D detection and segmentation).

2. Multi-modal learning with LLMs (connecting visual perception with large language models).

3. Image/video generation, synthesis, and editing (diffusion models).

Besides, I am also very interested in aerial image analysis, since I am a fan of history and military games.

Moreover, most of my work, including projects I have contributed to substantially, is open-sourced on GitHub.

Find the code and models here.

Feel free to contact me via email at xiangtai94@gmail.com or lxtpku@pku.edu.cn.

🔥 News

  • 2024.04: 🔥🔥 Check out our Mamba works: Point Cloud Mamba, MambaAD, and DGMamba.
  • 2024.04: 🔥🔥 Check out our open-sourced codebase ADer, which covers state-of-the-art anomaly detection (AD) methods.
  • 2024.04: 🔥🔥 Check out our new video segmentation work DVIS-DAQ, which achieves new state-of-the-art results on multiple video segmentation benchmarks.
  • 2024.03: 🔥🔥 The codebase of OMG-Seg is open-sourced! This is the first codebase to support joint image/video/multi-dataset/interactive segmentation co-training and testing!
  • 2024.03: 🎉🎉 Gave a talk on open-world segmentation (Beyond SAM) at VALSE. Slides, Video.
  • 2024.02: 🎉🎉 OMG-Seg is accepted by CVPR-24. Along with OMG-Seg, four more of our works are accepted by CVPR-24: BA-SAM, RTMO, Skeleton-in-Context, and language-driven video inpainting.
  • 2024.02: Check out several recent works on segmentation and recognition: OMG-Seg, Open-Vocabulary SAM, and RAP-SAM.
  • 2024.01: 🎉🎉 Our survey on open-vocabulary learning is accepted by T-PAMI.
  • 2023.12: Check out EdgeSAM, a mobile SAM that can run on iPhone!
  • 2023.10: Check out our recent works on open-vocabulary detection and segmentation: DST-Det, CLIPSelf, and MosaicFusion.

πŸ“ Publications

The full list of publications by year can be found here.

* means equal contribution.

Code can be found here.

Selected arXiv

  • Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively, Haobo Yuan, Xiangtai Li, Chong Zhou, Yining Li, Kai Chen, Chen Change Loy, arXiv. Combines SAM and CLIP in one model. | Code
  • EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM, Chong Zhou, Xiangtai Li, Chen Change Loy, Bo Dai, arXiv. The first mobile SAM model that runs on iPhone. | Code
  • Transformer-Based Visual Segmentation: A Survey, Xiangtai Li, Henghui Ding, Wenwei Zhang, Haobo Yuan, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy, arXiv. The first comprehensive survey on transformer-based segmentation models. | Project

Selected Conference

  • OMG-Seg: Is One Model Good Enough For All Segmentation?, Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy, CVPR 2024. One model to perform image/video/open-vocabulary/multi-dataset/interactive segmentation in one shot. | Project Page
  • Tube-Link: A Flexible Cross Tube Baseline for Universal Video Segmentation, Xiangtai Li, Haobo Yuan, Wenwei Zhang, Guangliang Cheng, Jiangmiao Pang, Chen Change Loy, ICCV 2023. The first unified SOTA universal video segmentation model. | Project
  • Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation, Jianzong Wu*, Xiangtai Li*, Henghui Ding, Xia Li, Guangliang Cheng, Yunhai Tong, Chen Change Loy, ICCV 2023. Query-based open-vocabulary segmentation aided by captions. | Project
  • Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation, Xiangtai Li*, Wenwei Zhang*, Jiangmiao Pang*, Kai Chen, Guangliang Cheng, Yunhai Tong, Chen Change Loy, CVPR 2022 (Oral, top 2%). The first unified video segmentation model and codebase for VPS, VIS, and VSS. | Code
  • Semantic Flow for Fast and Accurate Scene Parsing, Xiangtai Li, Ansheng You, Zhen Zhu, Houlong Zhao, Maoke Yang, Kuiyuan Yang, Yunhai Tong, ECCV 2020 (Oral, top 2%). The first real-time model to exceed 80% mIoU on the Cityscapes test set. | Code
  • GFF: Gated Fully Fusion for Semantic Segmentation, Xiangtai Li, Houlong Zhao, Lei Han, Yunhai Tong, Kuiyuan Yang, AAAI 2020 (Oral, top 3%). | Code

Selected Journal

  • Towards Open Vocabulary Learning: A Survey, Jianzong Wu*, Xiangtai Li*, Shilin Xu*, Haobo Yuan*, Henghui Ding, Yibo Yang, Xia Li, Jiangning Zhang, Yunhai Tong, Xudong Jiang, Bernard Ghanem, Dacheng Tao, T-PAMI 2024. The first comprehensive survey on open-vocabulary learning. | Project Page
  • TransVOD: End-to-end Video Object Detection with Spatial-Temporal Transformers, Qianyu Zhou*, Xiangtai Li*, Lu He, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lizhuang Ma, Dacheng Tao, T-PAMI 2022. End-to-end vision transformer for video object detection. | Code

🎖 Honors and Awards

  • National Scholarship, Ministry of Education of China, at PKU (2019-2020, 2020-2021).
  • President Scholarship of PKU (2020-2021).
  • Beijing Excellent Graduate (2017, 2022).
  • Excellent Graduate of BUPT (2017) and PKU (2022).
  • 2021.11: Winner of Track 2 at Segmenting and Tracking Every Point and Pixel: 6th Workshop, ICCV 2021 (project leader and first author).

📖 Education

  • 2017.09 - 2022.07, PhD at Peking University (PKU).
  • 2013.09 - 2017.07, Bachelor's degree at Beijing University of Posts and Telecommunications (BUPT).

💬 Invited Talks

  • 2024.03 Invited talk on Open-Vocabulary Segmentation and Segment Anything at VALSE, online. Slides, Video.
  • 2023.08 Invited talk on Video Segmentation at VALSE, online. Slides, Video.
  • 2022.05 Invited talk on Panoptic Segmentation and Beyond at the Baidu PaddleSeg Group.
  • 2021.12 Invited talk on Video Segmentation at the DiDi Auto-Driving Group.
  • 2021.10 Invited talk on Aligned Segmentation at the Huawei Noah Auto-Driving Group.

💻 Internships

  • SenseTime, mentored by Dr. Guangliang Cheng and Dr. Jianping Shi.
  • JD AI (remote collaboration), mentored by Dr. Yibo Yang and Prof. Dacheng Tao.
  • DeepMotion (now Xiaomi Car), mentored by Dr. Kuiyuan Yang.
  • Regular conference reviewer for CVPR, ICCV, ECCV, ICLR, AAAI, NeurIPS, ICML, and IJCAI, and journal reviewer for IEEE-TIP, IEEE-TPAMI, and IJCV.