I am Xiangtai Li (李η₯₯ζ³°), a Research Scientist at ByteDance/TikTok (Singapore), working on computer vision and related problems.

Previously, I worked as a Research Fellow at MMLab@NTU, S-Lab, advised by Prof. Chen Change Loy, and as a research intern or research scientist at JD Exploration Academy, SenseTime, Shanghai AI Laboratory, and 2050 Research.

I obtained my PhD degree from Peking University (PKU) under the supervision of Prof. Yunhai Tong, and my bachelor's degree from Beijing University of Posts and Telecommunications (BUPT).

My research topics are:

Multi-modal learning with LLMs (connecting visual perception with large language models), and image/video generation, synthesis, and editing.

Previously, I worked on image/video/3D detection and segmentation.

Moreover, the code and models for my works (roughly 98% of them), including those I have contributed to substantially, are open-sourced on GitHub.

Find the code and models here.

πŸ”₯ News

  • 2024.07: Β πŸŽ‰πŸŽ‰Our Transformer Survey is finally accepted by T-PAMI. Arxiv.
  • 2024.07: πŸ”₯πŸ”₯ Checkout our recent Universal Dense MLLM Model, OMG-LLaVA, project, code.
  • 2024.07: Β πŸŽ‰πŸŽ‰ DVIS-DAQ, Open-Vocabulary SAM, FaceAdapter, and GenView are accepted by ECCV-2024.
  • 2024.06: πŸ”₯πŸ”₯ Checkout our recent works on diffusion models, MotionBooth, SemFlow.
  • 2024.06: πŸ”₯πŸ”₯ Checkout our recent works on MLLM and new architecture design, OMG-LLaVA, RWKV-SAM, MotionBooth, SeTok and Reason3D.
  • 2024.04: πŸ”₯πŸ”₯ Checkout our new video segmentation work DVIS-DAQ, which achieves the new state-of-the-art results on multiple video segmentation benchmarks.
  • 2024.04: πŸ”₯πŸ”₯ Checkout Point Cloud Mamba, the first SSMs-model that performs better than PointMLP and PointTransformer!
  • 2024.03: πŸ”₯πŸ”₯ The codebase of OMG-Seg is open-sourced! link. This is the first codebase support joint image/video/multi-data/interactive segmentation co-training and testing!
  • 2024.03: Β πŸŽ‰πŸŽ‰ Give a talk of open-world segmentation (Beyond SAM) at VALSE, Slides Video.
  • 2024.02: Β πŸŽ‰πŸŽ‰ OMG-Seg is accepted by CVPR-24. Along with OMG-Seg, five works are accepted by CVPR-24! BA-SAM, RTMO, Skeleton-in-Context, and language-driven video inpainting.
  • 2024.02: Checkout several recent works on segmentation and recognition, OMG-Seg, Open-Vocabulary SAM and RAP-SAM.
  • 2024.01: Β πŸŽ‰πŸŽ‰ Our survey on Open Vocabulary Learning is accepted by T-PAMI.

πŸ“ Publications

* means equal contribution.

My Top-5 Favourite Works

  • OMG-Seg: Is One Model Good Enough For All Segmentation?, Xiangtai Li, Haobo Yuan, Wei Li, Henghui Ding, Size Wu, Wenwei Zhang, Yining Li, Kai Chen, Chen Change Loy, CVPR 2024. One model to perform image/video/open-vocabulary/multi-dataset/interactive segmentation in one shot. | Project Page
  • Video K-Net: A Simple, Strong, and Unified Baseline for Video Segmentation, Xiangtai Li*, Wenwei Zhang*, Jiangmiao Pang*, Kai Chen, Guangliang Cheng, Yunhai Tong, Chen Change Loy, CVPR 2022 (Oral, top 2%). The first unified video segmentation model and codebase for VPS, VIS, and VSS. | Code
  • Tube-Link: A Flexible Cross Tube Baseline for Universal Video Segmentation, Xiangtai Li, Haobo Yuan, Wenwei Zhang, Guangliang Cheng, Jiangmiao Pang, Chen Change Loy, ICCV 2023. The first unified SOTA universal video segmentation model. | Project
  • Semantic Flow for Fast and Accurate Scene Parsing, Xiangtai Li, Ansheng You, Zhen Zhu, Houlong Zhao, Maoke Yang, Kuiyuan Yang, Yunhai Tong, ECCV 2020 (Oral, top 2%). The first real-time model to exceed 80% mIoU on the Cityscapes test set. | Code
  • TransVOD: End-to-End Video Object Detection with Spatial-Temporal Transformers, Qianyu Zhou*, Xiangtai Li*, Lu He, Yibo Yang, Guangliang Cheng, Yunhai Tong, Lizhuang Ma, Dacheng Tao, T-PAMI 2022. The first end-to-end vision Transformer for video object detection, with SOTA results on video object detection. | Code
Recent Works

These are several interesting works that I have been deeply involved in over the past few months.

  • OMG-LLaVA: Bridging Image-level, Object-level, Pixel-level Reasoning and Understanding, Tao Zhang, Xiangtai Li, Hao Fei, Haobo Yuan, Shengqiong Wu, Shunping Ji, Chen Change Loy, Shuicheng Yan, arXiv 2024. Unifies image-level, object-level, and pixel-level instruction tuning in one framework. | Code
  • MotionBooth: Motion-Aware Customized Text-to-Video Generation, Jianzong Wu, Xiangtai Li, Yanhong Zeng, Jiangning Zhang, Qianyu Zhou, Yining Li, Yunhai Tong, Kai Chen, arXiv 2024. A novel motion-aware object customization method for video generation. | Code
  • SemFlow: Binding Semantic Segmentation and Image Synthesis via Rectified Flow, Chaoyang Wang, Xiangtai Li, Lu Qi, Henghui Ding, Yunhai Tong, Ming-Hsuan Yang, arXiv 2024. Binds semantic segmentation and image synthesis using LDMs and rectified flow. | Code
  • Reason3D: Searching and Reasoning 3D Segmentation via Large Language Model, Kuan-Chih Huang, Xiangtai Li, Lu Qi, Shuicheng Yan, Ming-Hsuan Yang, arXiv 2024. LLMs meet 3D reasoning segmentation. | Code
  • Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively, Haobo Yuan, Xiangtai Li, Chong Zhou, Yining Li, Kai Chen, Chen Change Loy, ECCV 2024. Binds SAM and CLIP in one model to achieve open-vocabulary recognition and segmentation. | Code
  • EdgeSAM: Prompt-In-the-Loop Distillation for On-Device Deployment of SAM, Chong Zhou, Xiangtai Li, Chen Change Loy, Bo Dai, arXiv 2024. A mobile SAM model that runs on iPhone. | Code
  • DVIS-DAQ: Improving Video Segmentation via Dynamic Anchor Queries, Yikang Zhou, Tao Zhang, Shunping Ji, Shuicheng Yan, Xiangtai Li, ECCV 2024. A dynamic anchor query design for long and complex video segmentation. | Code
  • Generalizable Entity Grounding via Assistance of Large Language Model, Lu Qi, Yi-Wen Chen, Lehan Yang, Tiancheng Shen, Xiangtai Li, Weidong Guo, Yu Xu, Ming-Hsuan Yang, arXiv 2024. Combines LLMs with entity-level segmentation and grounding. | Code
  • Point Cloud Mamba: Point Cloud Learning via State Space Model, Tao Zhang, Xiangtai Li, Haobo Yuan, Shunping Ji, Shuicheng Yan, arXiv 2024. A Mamba-style point cloud model that outperforms Transformers and MLPs in both efficiency and accuracy. | Code
Code for these works can be found here.

The full publication list can be found on Google Scholar.

πŸŽ– Honors and Awards

  • National Scholarship, Ministry of Education of China, at PKU (2019-2020, 2020-2021).
  • President Scholarship of PKU (2020-2021).
  • Beijing Excellent Graduate (2017, 2022).
  • BUPT Excellent Graduate (2017) and PKU Excellent Graduate (2022).
  • 2021.11: Winner of Track 2 of Segmenting and Tracking Every Point and Pixel, 6th Workshop at ICCV-2021 (project leader and first author).

πŸ“– Education

  • 2017.09 - 2022.07: PhD, Peking University (PKU).
  • 2013.09 - 2017.07: Bachelor's degree, Beijing University of Posts and Telecommunications (BUPT).

πŸ’¬ Invited Talks

  • 2024.03: Invited talk on Open-Vocabulary Segmentation and Segment Anything at VALSE, online. Slides, Video.
  • 2023.08: Invited talk on Video Segmentation at VALSE, online. Slides, Video.
  • 2022.05: Invited talk on Panoptic Segmentation and Beyond at the Baidu PaddleSeg Group.
  • 2021.12: Invited talk on Video Segmentation at the DiDi Auto-Driving Group.
  • 2021.10: Invited talk on Aligned Segmentation at the Huawei Noah's Ark Lab Auto-Driving Group.

πŸ’» Internships

  • SenseTime, mentored by Dr. Guangliang Cheng and Dr. Jianping Shi.
  • JD AI (remote collaboration), mentored by Dr. Yibo Yang and Prof. Dacheng Tao.
  • DeepMotion (now Xiaomi Car), mentored by Dr. Kuiyuan Yang.

πŸ“‘ Academic Services

  • Regular conference reviewer for CVPR, ICCV, ECCV, ICLR, AAAI, NeurIPS, ICML, and IJCAI; journal reviewer for IEEE TIP, IEEE TPAMI, and IJCV.