RWKV-SAM: High-Quality and High-Efficiency Segmentation
RWKV-SAM is an innovative approach within the OVSAM project that focuses on improving the efficiency of the Segment Anything Model. It introduces a novel segmentation backbone based on the RWKV (Receptance Weighted Key Value) architecture and a comprehensive training pipeline to achieve high-quality results with significantly reduced computational cost.
Overview
Traditional transformer-based models like the one in the original SAM can be computationally intensive, especially for high-resolution images. RWKV-SAM addresses this by replacing the standard Vision Transformer (ViT) backbone with an RWKV-based architecture. This change leads to:
- High Efficiency: Achieves more than a 2x speedup compared to the same-scale transformer model.
- High Quality: Maintains or even surpasses the segmentation performance of traditional models on various benchmark datasets.
Demo
RWKV-SAM excels at promptable segmentation on high-resolution images, demonstrating both quality and speed.
Citation
If you use RWKV-SAM in your research, please cite the following paper:
@article{yuan2024rwkvsam,
title={Mamba or RWKV: Exploring High-Quality and High-Efficiency Segment Anything Model},
author={Yuan, Haobo and Li, Xiangtai and Qi, Lu and Zhang, Tao and Yang, Ming-Hsuan and Yan, Shuicheng and Loy, Chen Change},
journal={arXiv preprint},
year={2024}
}