RWKV-SAM: High-Quality and High-Efficiency Segmentation

RWKV-SAM is an innovative approach within the OVSAM project that focuses on improving the efficiency of the Segment Anything Model. It introduces a novel segmentation backbone based on the RWKV (Receptance Weighted Key Value) architecture and a comprehensive training pipeline to achieve high-quality results with significantly reduced computational cost.

Overview

Traditional transformer-based models like the one in the original SAM can be computationally intensive, especially for high-resolution images. RWKV-SAM addresses this by replacing the standard Vision Transformer (ViT) backbone with an RWKV-based architecture. This change leads to:

High Efficiency: Achieves more than a 2x speedup compared to the same-scale transformer model.
High Quality: Maintains or even surpasses the segmentation performance of traditional models on various benchmark datasets.

RWKV-SAM overview

Demo

RWKV-SAM excels at promptable segmentation on high-resolution images, demonstrating both quality and speed.

RWKV-SAM demo

Citation

If you use RWKV-SAM in your research, please cite the following paper:

@article{yuan2024rwkvsam,
    title={Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model},
    author={Yuan, Haobo and Li, Xiangtai and Qi, Lu and Zhang, Tao and Yang, Ming-Hsuan and Yan, Shuicheng and Loy, Chen Change},
    journal={arXiv preprint},
    year={2024}
}