I am Xiangtai Li (李祥泰), a Research Scientist at ByteDance/TikTok (Singapore), working on computer vision and related problems.
Previously, I worked as a Research Fellow at MMLab@NTU, S-Lab, advised by Prof. Chen Change Loy, and as a research intern or research scientist at JD Explore Academy, SenseTime, Shanghai AI Laboratory, and 2050 Research.
I obtained my PhD degree at Peking University (PKU) under the supervision of Prof. Yunhai Tong, and my bachelor's degree at Beijing University of Posts and Telecommunications (BUPT).
My research topics are multi-modal learning with LLMs (connecting visual perception with large language models) and image/video generation, synthesis, and editing.
Previously, I worked on image/video/3D detection and segmentation.
Moreover, the code and models for nearly all of my works (roughly 98%), including those I have contributed to substantially, are open-sourced on GitHub.
Find the code and models here.
🔥 News
- 2024.07: 🎉🎉 Our Transformer survey is finally accepted by T-PAMI. Arxiv.
- 2024.07: 🔥🔥 Check out our recent universal dense MLLM model, OMG-LLaVA: project, code.
- 2024.07: 🎉🎉 DVIS-DAQ, Open-Vocabulary SAM, FaceAdapter, and GenView are accepted by ECCV-2024.
- 2024.06: 🔥🔥 Check out our recent works on diffusion models: MotionBooth and SemFlow.
- 2024.06: 🔥🔥 Check out our recent works on MLLMs and new architecture design: OMG-LLaVA, RWKV-SAM, MotionBooth, SeTok, and Reason3D.
- 2024.04: 🔥🔥 Check out our new video segmentation work DVIS-DAQ, which achieves new state-of-the-art results on multiple video segmentation benchmarks.
- 2024.04: 🔥🔥 Check out Point Cloud Mamba, the first SSM-based model that performs better than PointMLP and PointTransformer!
- 2024.03: 🔥🔥 The codebase of OMG-Seg is open-sourced! link. This is the first codebase supporting joint image/video/multi-dataset/interactive segmentation co-training and testing!
- 2024.03: 🎉🎉 Gave a talk on open-world segmentation (beyond SAM) at VALSE. Slides, Video.
- 2024.02: 🎉🎉 OMG-Seg is accepted by CVPR-24. In total, five of our works are accepted by CVPR-24: OMG-Seg, BA-SAM, RTMO, Skeleton-in-Context, and language-driven video inpainting.
- 2024.02: Check out several recent works on segmentation and recognition: OMG-Seg, Open-Vocabulary SAM, and RAP-SAM.
- 2024.01: 🎉🎉 Our survey on Open Vocabulary Learning is accepted by T-PAMI.
📝 Publications
* means equal contribution.
My Top-5 Favourite Works
Recent Works
These are several interesting works that I have been deeply involved in over the past months.
Code can be found here.
A full publication list can be found on Google Scholar.
🎖 Honors and Awards
- National Scholarship, Ministry of Education of China, at PKU (2019-2020 and 2020-2021).
- President Scholarship of PKU (2020-2021).
- Beijing Excellent Graduate (2017, 2022).
- BUPT Excellent Graduate (2017) and PKU Excellent Graduate (2022).
- 2021.11: Winner of Track 2 at Segmenting and Tracking Every Point and Pixel: 6th Workshop at ICCV-2021 (project leader and first author).
📖 Education
- 2017.09 - 2022.07, PhD, Peking University (PKU).
- 2013.09 - 2017.07, Bachelor's degree, Beijing University of Posts and Telecommunications (BUPT).
💬 Invited Talks
- 2024.03: Invited talk on Open-Vocabulary Segmentation and Segment Anything at VALSE (online). Slides, Video.
- 2023.08: Invited talk on Video Segmentation at VALSE (online). Slides, Video.
- 2022.05: Invited talk on Panoptic Segmentation and Beyond at the Baidu PaddleSeg group.
- 2021.12: Invited talk on Video Segmentation at the DiDi Autonomous Driving group.
- 2021.10: Invited talk on Aligned Segmentation at the Huawei Noah's Ark Autonomous Driving group.
💻 Internships
- SenseTime, mentored by Dr. Guangliang Cheng and Dr. Jianping Shi.
- JD AI (remote collaboration), mentored by Dr. Yibo Yang and Prof. Dacheng Tao.
- DeepMotion (now Xiaomi Car), mentored by Dr. Kuiyuan Yang.
📄 Services
- Regular conference reviewer for CVPR, ICCV, ECCV, ICLR, AAAI, NeurIPS, ICML, and IJCAI; journal reviewer for IEEE T-IP, IEEE T-PAMI, and IJCV.