# Backbone Configurations
This project utilizes several different backbone architectures for the image encoder component, each configured for a specific purpose within the training pipeline.
## 1. SAM Backbone (`SAMBackbone`)
The original Vision Transformer (ViT) from the Segment Anything Model serves as the primary segmentation feature extractor and the "teacher" model in the distillation process.
- **Class:** `seg.models.backbones.SAMBackbone` or `projects.rwkvsam.models.backbones.SAMBackbone`
- **Variants:** `vit_h` (Huge), `vit_l` (Large), `vit_b` (Base)
- **Usage:** Primarily used for its powerful, fine-grained spatial features, which are ideal for segmentation. In OVSAM, it acts as the teacher model during SAM2CLIP training.
**Example Config (`seg/configs/sam2clip/sam_vith_dump.py`):**

```python
model = dict(
    ...
    backbone=dict(
        type='SAMBackbone',
        model_name='vit_h',
        fix=True,  # Usually frozen during training
        init_cfg=dict(
            type='sam_pretrain',
            checkpoint='vit_h',
        ),
    ),
    ...
)
```
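These config dicts follow the mmengine convention in which the `type` key names a registered class and the remaining keys become constructor arguments. A minimal sketch of that pattern (the registry, class body, and helper names here are illustrative stand-ins, not the project's actual code):

```python
# Minimal sketch of config-driven construction in the style of
# mmengine-based codebases. Illustrative only.
BACKBONES = {}

def register(cls):
    """Register a backbone class under its own class name."""
    BACKBONES[cls.__name__] = cls
    return cls

@register
class SAMBackbone:
    def __init__(self, model_name, fix=False, init_cfg=None):
        self.model_name = model_name
        self.fix = fix            # when True, parameters stay frozen
        self.init_cfg = init_cfg  # describes how weights are loaded

def build_backbone(cfg):
    """Pop `type` from the config and instantiate the matching class."""
    cfg = dict(cfg)  # copy so the caller's dict is left untouched
    cls = BACKBONES[cfg.pop('type')]
    return cls(**cfg)

backbone = build_backbone(dict(
    type='SAMBackbone',
    model_name='vit_h',
    fix=True,
    init_cfg=dict(type='sam_pretrain', checkpoint='vit_h'),
))
print(backbone.model_name)  # vit_h
```

The real codebase resolves `type` through its own registry and handles `init_cfg` via weight-initialization hooks, but the flow of "config dict in, constructed module out" is the same.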
## 2. OpenCLIP Backbone (`OpenCLIPBackbone`)
The vision encoder from the OpenCLIP library is used for its strong image-text alignment and recognition capabilities. It serves as the "student" model in distillation.
- **Class:** `seg.models.backbones.OpenCLIPBackbone`
- **Variants:** `RN50x16`, `ViT-L-14`, etc. (see the OpenCLIP documentation for the full list)
- **Usage:** Used for its powerful semantic features. It is enhanced with SAM's spatial knowledge via SAM2CLIP distillation.
**Example Config (`seg/configs/clip2sam/clip2sam_coco_rn50x16.py`):**

```python
model = dict(
    ...
    backbone=dict(
        type='OpenCLIPBackbone',
        model_name='RN50x16',
        fix=True,  # Frozen during CLIP2SAM training
        init_cfg=dict(
            type='clip_pretrain',
            checkpoint='openai',
        ),
    ),
    ...
)
```
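During SAM2CLIP training, the frozen SAM teacher supervises the student through feature alignment. A toy sketch of such a feature-distillation objective, assuming a simple mean-squared error between per-location features (the project's actual loss and adapter design may differ):

```python
def feature_distill_loss(student_feats, teacher_feats):
    """Mean squared error between two equal-length feature vectors.

    In practice the features are dense spatial maps and the student
    output usually passes through an adapter first; plain Python
    floats are used here only to keep the sketch self-contained.
    """
    assert len(student_feats) == len(teacher_feats)
    return sum((s - t) ** 2
               for s, t in zip(student_feats, teacher_feats)) / len(student_feats)

loss = feature_distill_loss([0.5, 1.0], [1.5, 1.0])
print(loss)  # squared errors 1.0 and 0.0 average to 0.5
```

Because the teacher is frozen (`fix=True` in its config), only the student and any adapter layers receive gradients from this loss.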
## 3. VITAMIN / RWKV Backbone (`VITAMINBackbone`)
This backbone is the core of the RWKV-SAM project, designed for high-efficiency segmentation. It combines convolutional stages with an RWKV-based attention mechanism.
- **Class:** `projects.rwkvsam.models.backbones.VITAMINBackbone`
- **Variants:** `small`, `base`, etc.
- **Usage:** Serves as a direct, more efficient replacement for the SAM ViT backbone.
**Example Config (`projects/rwkvsam/configs/backbone_dist/rwkvsam1001_000_vith_vitamin_rwkv_small_mlp2.py`):**

```python
model = dict(
    ...
    backbone_student=dict(
        type='VITAMINBackbone',
        img_size=(224, 224),
        model_variant='small',
        attn_type='rwkv',  # Specifies the RWKV attention mechanism
        attn_cfg=dict(
            mlp_ratio=2,
        ),
        ...
    ),
    ...
)
```
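The RWKV mechanism selected by `attn_type='rwkv'` replaces quadratic self-attention with a linear-time recurrence over the token sequence, which is what makes the backbone efficient at high resolution. A heavily simplified one-dimensional sketch of that idea (the formula, parameter names, and defaults are illustrative, not the project's implementation):

```python
import math

def rwkv_linear_attention(keys, values, decay=0.9, bonus=1.0):
    """Toy RWKV-style recurrence over a 1-D sequence.

    Instead of attending to all previous tokens at O(T^2) cost, each
    step mixes the current key/value pair with an exponentially
    decayed running state, giving O(T) cost overall. Illustrative
    simplification of the WKV recurrence, not the real kernel.
    """
    num = 0.0   # running decayed sum of weighted values
    den = 0.0   # running decayed sum of weights
    outputs = []
    for k, v in zip(keys, values):
        w = math.exp(k)
        # the current token gets an extra `bonus` weight for its own output
        outputs.append((num + bonus * w * v) / (den + bonus * w))
        num = decay * num + w * v
        den = decay * den + w
    return outputs

outs = rwkv_linear_attention([0.0, 0.0], [1.0, 3.0])
print(outs)  # first output equals the first value; later ones blend history
```

In the actual backbone this recurrence runs over 2-D feature maps inside the later stages, while the early convolutional stages handle local detail.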