Open-Vocabulary SAM & RWKV-SAM

This project introduces two models, Open-Vocabulary SAM (OVSAM) and RWKV-SAM, that push image segmentation in two directions. OVSAM extends the Segment Anything Model (SAM) to simultaneously perform interactive segmentation and recognize over twenty thousand object classes. RWKV-SAM explores a more efficient architecture for high-quality, high-speed segmentation.

Links: arXiv · Project Page · HuggingFace Model


Core Projects

Open-Vocabulary SAM (OVSAM)

OVSAM is a SAM-inspired model designed for simultaneous interactive segmentation and recognition. It builds on two knowledge-transfer modules:

  1. SAM2CLIP: Transfers SAM's segmentation knowledge to CLIP via distillation and learnable transformer adapters.
  2. CLIP2SAM: Transfers CLIP's open-vocabulary recognition knowledge to SAM, enhancing its ability to identify and label segmented objects.

This dual-transfer approach allows OVSAM to interactively segment and recognize more than twenty thousand object classes; a minimal sketch of both modules follows.
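To make the two modules concrete, here is a minimal PyTorch sketch. It is illustrative only: the module names follow the paper's terminology, but the dimensions, layer counts, and the MSE distillation loss are assumptions rather than the released implementation.

import torch
import torch.nn as nn
import torch.nn.functional as F

class SAM2CLIPAdapter(nn.Module):
    """Illustrative SAM2CLIP: a learnable transformer adapter that maps CLIP
    image tokens toward the frozen SAM encoder's feature space."""
    def __init__(self, clip_dim=1024, sam_dim=256, num_layers=3):
        super().__init__()
        layer = nn.TransformerEncoderLayer(d_model=clip_dim, nhead=8,
                                           batch_first=True)
        self.blocks = nn.TransformerEncoder(layer, num_layers=num_layers)
        self.proj = nn.Linear(clip_dim, sam_dim)

    def forward(self, clip_tokens):                  # (B, N, clip_dim)
        return self.proj(self.blocks(clip_tokens))   # (B, N, sam_dim)

def sam2clip_distill_loss(adapter, clip_tokens, sam_tokens):
    # Distill SAM's segmentation knowledge into the CLIP branch by pulling
    # adapted CLIP features toward SAM features (MSE is an assumption here).
    return F.mse_loss(adapter(clip_tokens), sam_tokens)

class CLIP2SAMHead(nn.Module):
    """Illustrative CLIP2SAM: fuses CLIP semantics into SAM's mask query so
    each predicted mask can be classified against CLIP text embeddings."""
    def __init__(self, sam_dim=256, clip_dim=1024):
        super().__init__()
        self.fuse = nn.Linear(sam_dim + clip_dim, clip_dim)

    def forward(self, mask_query, region_feat, text_embeds):
        # mask_query: (B, sam_dim), region_feat: (B, clip_dim),
        # text_embeds: (num_classes, clip_dim) from CLIP's text encoder.
        fused = self.fuse(torch.cat([mask_query, region_feat], dim=-1))
        fused = F.normalize(fused, dim=-1)
        return fused @ F.normalize(text_embeds, dim=-1).t()  # class logits

In this sketch, SAM2CLIP is trained by distillation against frozen SAM features, while CLIP2SAM turns each SAM mask query into class logits via cosine similarity with text embeddings; the actual model couples both modules with SAM's prompt encoder and mask decoder.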

Figure: OVSAM overview.

RWKV-SAM

RWKV-SAM pairs an efficient segmentation backbone with a complete training pipeline for high-quality segmentation at substantially higher speed. Compared with transformer-based models of the same scale, RWKV-SAM runs more than 2x faster while delivering better segmentation quality across benchmark datasets.
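To see where the speedup comes from, here is a toy, unbatched version of an RWKV-style WKV recurrence, which replaces quadratic self-attention with a linear-time scan over the sequence. This is a didactic sketch of the generic RWKV mechanism, not the RWKV-SAM backbone itself; the tensor shapes and stability epsilon are assumptions.

import torch

def wkv_recurrence(k, v, w, u):
    """Toy RWKV-style WKV scan: O(T) in sequence length, per channel.

    k, v: (T, C) keys/values; w: (C,) positive per-channel decay;
    u: (C,) bonus weight applied to the current token.
    """
    T, C = k.shape
    num = torch.zeros(C)           # running sum of exp(k) * v
    den = torch.zeros(C)           # running sum of exp(k)
    decay = torch.exp(-w)          # constant per-channel decay factor
    out = torch.empty(T, C)
    for t in range(T):
        cur = torch.exp(u + k[t])  # extra weight on the current token
        out[t] = (num + cur * v[t]) / (den + cur + 1e-8)
        num = decay * num + torch.exp(k[t]) * v[t]
        den = decay * den + torch.exp(k[t])
    return out

# Each output is a decayed weighted average of past values: there is no
# T x T attention matrix, which is where the speedup over same-scale
# transformer backbones comes from.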

For more details, please see the RWKV-SAM documentation.


📰 News

  • Jul. 2, 2024: Open-Vocabulary SAM has been accepted to ECCV 2024.
  • Jun. 27, 2024: The RWKV-SAM code and model have been released. Read the paper.

📚 Citation

If you find this work useful for your research, please consider citing our papers:

@inproceedings{yuan2024ovsam,
  title={Open-Vocabulary SAM: Segment and Recognize Twenty-thousand Classes Interactively},
  author={Yuan, Haobo and Li, Xiangtai and Zhou, Chong and Li, Yining and Chen, Kai and Loy, Chen Change},
  booktitle={ECCV},
  year={2024}
}
@article{yuan2024mamba,
  title={Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model},
  author={Yuan, Haobo and Li, Xiangtai and Qi, Lu and Zhang, Tao and Yang, Ming-Hsuan and Yan, Shuicheng and Loy, Chen Change},
  journal={arXiv preprint},
  year={2024}
}