I am Xiangtai Li (李祥泰) and I work as a Research Fellow at MMLab@NTU, S-Lab, advised by Prof. Chen Change Loy.

I obtained my PhD degree at Peking University under the supervision of Prof. Yunhai Tong, and my bachelor's degree at Beijing University of Posts and Telecommunications. I have also worked closely with Prof. Zhouchen Lin and Prof. Dacheng Tao.

My research focuses on several directions: pixel-wise scene understanding for images and videos (such as semantic/instance/panoptic segmentation and object detection) and their zero/few-shot variants; general deep learning methods with applications (such as vision transformers, efficient model design, and neural collapse); vision meets language (including open vocabulary learning, visual prompting, and visual grounding); and foundation model tuning.

I am also very interested in aerial image analysis, since I am a fan of history and military games.

During my PhD, I conducted research on image/video semantic/instance/panoptic segmentation, as well as several related problems.

Currently, I am looking for a Research Scientist position in industry (starting from 2024).

Feel free to contact me at lxtpku@pku.edu.cn or xiangtai.li@ntu.edu.sg.

🔥 News

  • 2023.09: 🎉🎉 Two NeurIPS papers are accepted as Spotlights: PSG4D and Point-In-Context.
  • 2023.08: Gave a talk on video segmentation at VALSE. Slides.
  • 2023.07: 🎉🎉 Three papers in ICCV-23: Tube-Link, Betrayed by Captions, and EMO-Net. One oral paper in an ICCV-23 workshop. See you in Paris! SFNet-Lite is accepted by IJCV.
  • 2023.06: Check out our new paper on point cloud in-context learning and the first survey on Open Vocabulary Learning.
  • 2023.03: Check out our new survey on transformer-based segmentation and detection. Also: Video Talk (Chinese), Link.
  • 2023.03: Please check out our new work, Tube-Link, the first universal video segmentation framework that outperforms specialized video segmentation methods (VIS, VSS, VPS).
  • 2023.03: One paper on Panoptic Video Scene Graph Generation (PVSG) is accepted by CVPR-2023.
  • 2022.11: Two papers on Video Scene Understanding are accepted by T-PAMI.
  • 2022.09: One paper on Neural Collapse is accepted by NeurIPS-2022.
  • 2022.08: 🎉🎉 Joined MMLab@NTU S-Lab! The code for our four works (Video K-Net, PanopticPartFormer, FashionFormer, and PolyphonicFormer in CVPR-22/ECCV-22) has been released. Check out my GitHub homepage.
  • 2022.07: 🎉🎉 Our SFNet-Lite (an extension of SFNet, ECCV-20) achieves the best mIoU and speed trade-off on multiple driving datasets. SFNet-Lite obtains 80.1 mIoU while running at 50 FPS, and 78.8 mIoU while running at 120 FPS. Code.
  • 2022.07: 🎉🎉 Three papers are accepted by ECCV-2022. One paper is accepted by ICIP-2022.
  • 2022.07: 🎉🎉 Graduated from PKU.
  • 2022.03: 🎉🎉 Video K-Net is accepted by CVPR-2022 as an oral presentation.

📝 Publications

The full list of publications by year can be found here.

* means equal contribution.

Code can be found here.

Selected arXiv

  • Transformer-Based Visual Segmentation: A Survey, Xiangtai Li, Henghui Ding, Wenwei Zhang, Haobo Yuan, Jiangmiao Pang, Guangliang Cheng, Kai Chen, Ziwei Liu, Chen Change Loy, arXiv The first comprehensive survey on transformer-based segmentation models. | Project

Selected Conference

  • Tube-Link: A Flexible Cross Tube Baseline for Universal Video Segmentation, Xiangtai Li, Haobo Yuan, Wenwei Zhang, Guangliang Cheng, Jiangmiao Pang, Chen Change Loy, ICCV 2023 The first unified SOTA universal video segmentation model. | Project
  • Explore In-Context Learning for 3D Point Cloud Understanding, Zhongbin Fang, Xiangtai Li, Xia Li, Joachim M. Buhmann, Chen Change Loy, Mengyuan Liu, NeurIPS 2023 (Spotlight) The first work to explore in-context learning on point clouds. | Project
  • Betrayed by Captions: Joint Caption Grounding and Generation for Open Vocabulary Instance Segmentation, Jianzong Wu*, Xiangtai Li*, Henghui Ding, Xia Li, Guangliang Cheng, Yunhai Tong, Chen Change Loy, ICCV 2023 Query-based Open Vocabulary Segmentation aided by Caption. | Project
  • Panoptic-PartFormer: Learning a Unified Model for Panoptic Part Segmentation, Xiangtai Li, Shilin Xu, Yibo Yang, Guangliang Cheng, Yunhai Tong, Dacheng Tao, ECCV 2022 The first unified part-aware panoptic segmentation model | Code
  • Fashionformer: A Simple, Effective and Unified Baseline for Human Fashion Segmentation and Recognition, Shilin Xu*, Xiangtai Li*, Jingbo Wang, Guangliang Cheng, Yunhai Tong, Dacheng Tao, ECCV 2022 | Code
  • PolyphonicFormer: Unified Query Learning for Depth-aware Video Panoptic Segmentation, Haobo Yuan*, Xiangtai Li*, Yibo Yang, Guangliang Cheng, Jing Zhang, Yunhai Tong, Lefei Zhang, Dacheng Tao, ECCV 2022 Winner of the ICCV-2021 BMTT workshop. The first unified depth-aware video panoptic segmentation model | Code
  • Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation, Xiangtai Li*, Wenwei Zhang*, Jiangmiao Pang*, Kai Chen, Guangliang Cheng, Yunhai Tong, Chen Change Loy, CVPR 2022 (Oral, top2%) The first unified video segmentation model and codebase for VPS, VIS, VSS | Code
  • Semantic Flow for Fast and Accurate Scene Parsing, Xiangtai Li, Ansheng You, Zhen Zhu, Houlong Zhao, Maoke Yang, Kuiyuan Yang, Yunhai Tong, ECCV 2020 (Oral, top2%) The first real-time model over 80% mIoU on Cityscapes test set. | Code
  • GFF: Gated Fully Fusion for Semantic Segmentation, Xiangtai Li, Houlong Zhao, Lei Han, Yunhai Tong, Kuiyuan Yang, AAAI 2020 (Oral, top3%) | Code

Selected Journal

  • TransVOD: End-to-end Video Object Detection with Spatial-Temporal Transformers, Qianyu Zhou*, Xiangtai Li*, Lu He, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lizhuang Ma, Dacheng Tao, T-PAMI-2022 End-to-End Vision Transformer for Video Object Detection | Code
  • Improving Video Instance Segmentation via Temporal Pyramid Routing, Xiangtai Li, Hao He, Yibo Yang, Henghui Ding, Kuiyuan Yang, Guangliang Cheng, Yunhai Tong, Dacheng Tao, T-PAMI-2022 The first dynamic network for video scene understanding | Code
  • SFNet: Faster, Accurate, and Domain Agnostic Semantic Segmentation via Semantic Flow, Xiangtai Li, Jiangning Zhang, Yibo Yang, Guangliang Cheng, Yunhai Tong, Kuiyuan Yang, Dacheng Tao, IJCV-2023 | Code

🎖 Honors and Awards

    • National Scholarship, Ministry of Education of China, at PKU (2019-2020 and 2020-2021)
    • President Scholarship of PKU (year 2020-2021)
    • 2017, 2022 Beijing Excellent Graduates
    • 2017, 2022 BUPT/PKU Excellent Graduates
    • 2021.11 Winner of Track 2 of the Segmenting and Tracking Every Point and Pixel challenge (6th Workshop at ICCV-2021) (Project Leader and First Author)

📖 Education

    • 2017.09 - 2022.06, PhD, Peking University (PKU)
    • 2013.09 - 2017.06, Bachelor's degree, Beijing University of Posts and Telecommunications (BUPT)

💬 Invited Talks

    • 2022.05 Invited talk on Panoptic Segmentation and Beyond at Baidu PaddleSeg Group
    • 2021.12 Invited talk on Video Segmentation at DiDi Auto-Driving Group
    • 2021.10 Invited talk on Aligned Segmentation at Huawei Noah Auto-Driving Group

💻 Internships