Backbone Configurations

This project uses several backbone architectures for the image encoder, each configured for a specific role in the training pipeline.

1. SAM Backbone (SAMBackbone)

The original Vision Transformer (ViT) from the Segment Anything Model serves as the primary segmentation feature extractor and the "teacher" model in the distillation process.

  • Class: seg.models.backbones.SAMBackbone or projects.rwkvsam.models.backbones.SAMBackbone
  • Variants: vit_h (Huge), vit_l (Large), vit_b (Base)
  • Usage: Provides fine-grained spatial features suited to segmentation. In OVSAM, it is the teacher model during SAM2CLIP training.

Example Config (seg/configs/sam2clip/sam_vith_dump.py):

model = dict(
    ...
    backbone=dict(
        type='SAMBackbone',
        model_name='vit_h',
        fix=True,  # Usually frozen during training
        init_cfg=dict(
            type='sam_pretrain',
            checkpoint='vit_h'
        )
    ),
    ...
)
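The backbone entry above can also be generated per variant. The following is a minimal sketch, not part of the repository: `make_sam_backbone_cfg` and `SAM_VARIANTS` are hypothetical names, and it assumes the `checkpoint` key matches `model_name` for `vit_l` and `vit_b` the same way it does for `vit_h` in the example config.

```python
# Variant names come from the list above; the helper itself is hypothetical.
SAM_VARIANTS = ('vit_h', 'vit_l', 'vit_b')


def make_sam_backbone_cfg(variant='vit_h', frozen=True):
    """Build a SAMBackbone config dict for one of the listed variants."""
    if variant not in SAM_VARIANTS:
        raise ValueError(
            f'Unknown SAM variant {variant!r}; expected one of {SAM_VARIANTS}')
    return dict(
        type='SAMBackbone',
        model_name=variant,
        fix=frozen,  # mirrors fix=True in the example config
        init_cfg=dict(
            type='sam_pretrain',
            # Assumption: the checkpoint key equals the variant name.
            checkpoint=variant,
        ),
    )


# Example: a frozen ViT-B teacher instead of ViT-H.
teacher_cfg = make_sam_backbone_cfg('vit_b')
```

Building the dict in one place keeps `model_name` and `checkpoint` from drifting apart when the variant changes, and rejects names outside the listed variants early.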

2. OpenCLIP Backbone (OpenCLIPBackbone)

The vision encoder from the OpenCLIP library is used for its strong image-text alignment and recognition capabilities. It serves as the "student" model in distillation.

  • Class: seg.models.backbones.OpenCLIPBackbone
  • Variants: RN50x16, ViT-L-14, etc. (See OpenCLIP documentation for full list)
  • Usage: Provides semantic features for recognition. SAM2CLIP distillation augments it with SAM's spatial knowledge.

Example Config (seg/configs/clip2sam/clip2sam_coco_rn50x16.py):

model = dict(
    ...
    backbone=dict(
        type='OpenCLIPBackbone',
        model_name='RN50x16',
        fix=True,  # Frozen during CLIP2SAM training
        init_cfg=dict(
            type='clip_pretrain',
            checkpoint='openai'
        )
    ),
    ...
)

3. VITAMIN / RWKV Backbone (VITAMINBackbone)

This backbone is the core of the RWKV-SAM project, designed for high-efficiency segmentation. It combines convolutional stages with an RWKV-based attention mechanism.

  • Class: projects.rwkvsam.models.backbones.VITAMINBackbone
  • Variants: small, base, etc.
  • Usage: Serves as a direct, more efficient replacement for the SAM ViT backbone.

Example Config (projects/rwkvsam/configs/backbone_dist/rwkvsam1001_000_vith_vitamin_rwkv_small_mlp2.py):

model = dict(
    ...
    backbone_student=dict(
        type='VITAMINBackbone',
        img_size=(224, 224),
        model_variant='small',
        attn_type='rwkv',  # Specifies the RWKV attention mechanism
        attn_cfg=dict(
            mlp_ratio=2,
        ),
        ...
    ),
    ...
)
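To try a different student setting without rewriting the whole entry, the nested dict can be overridden key by key. The sketch below is illustrative only: `override_cfg` is a hypothetical helper, the fields elided with `...` in the example are simply left out, and the `'base'` variant with `mlp_ratio=4` is an assumed combination rather than a configuration shipped with the project.

```python
import copy


def override_cfg(base, overrides):
    """Return a deep copy of `base` with `overrides` merged in recursively."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Merge nested dicts so sibling keys in `base` are preserved.
            merged[key] = override_cfg(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Student settings copied from the example config (elided fields omitted).
backbone_student = dict(
    type='VITAMINBackbone',
    img_size=(224, 224),
    model_variant='small',
    attn_type='rwkv',
    attn_cfg=dict(mlp_ratio=2),
)

# Hypothetical variation: the 'base' variant with a wider MLP.
student_base = override_cfg(
    backbone_student,
    dict(model_variant='base', attn_cfg=dict(mlp_ratio=4)),
)
```

The original `backbone_student` dict is left unchanged, so both settings can be kept side by side for comparison.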