Backbone Configurations

This project uses several backbone architectures for the image encoder, each configured for a specific role in the training pipeline.

1. SAM Backbone (`SAMBackbone`)

The original Vision Transformer (ViT) from the Segment Anything Model serves as the primary segmentation feature extractor and the "teacher" model in the distillation process.

Class: seg.models.backbones.SAMBackbone or projects.rwkvsam.models.backbones.SAMBackbone
Variants: vit_h (Huge), vit_l (Large), vit_b (Base)
Usage: Provides fine-grained spatial features suited to segmentation. In OVSAM, it acts as the teacher model during SAM2CLIP training.

Example Config (seg/configs/sam2clip/sam_vith_dump.py):

model = dict(
    ...
    backbone=dict(
        type='SAMBackbone',
        model_name='vit_h',
        fix=True,  # Usually frozen during training
        init_cfg=dict(
            type='sam_pretrain',
            checkpoint='vit_h'
        )
    ),
    ...
)
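
The three variants listed above differ only in the variant string, so the backbone entry can be generated by a small helper. The following is a minimal sketch, not code from the repository: the name `sam_backbone_cfg` is hypothetical, and it assumes the `checkpoint` key always matches `model_name`, as it does in the example config.

```python
# Hypothetical helper, not part of the repository.
# Variant names are taken from the "Variants" line above.
SAM_VARIANTS = ("vit_h", "vit_l", "vit_b")


def sam_backbone_cfg(model_name="vit_h", fix=True):
    """Return a SAMBackbone config dict for one of the listed variants."""
    if model_name not in SAM_VARIANTS:
        raise ValueError(
            f"Unknown SAM variant {model_name!r}; expected one of {SAM_VARIANTS}"
        )
    return dict(
        type="SAMBackbone",
        model_name=model_name,
        fix=fix,  # True keeps the backbone frozen, as in the example config
        init_cfg=dict(
            type="sam_pretrain",
            # Assumption: the checkpoint key mirrors the variant name.
            checkpoint=model_name,
        ),
    )


teacher_cfg = sam_backbone_cfg("vit_h")
```

Rejecting unknown variant names up front surfaces a typo when the config is written rather than when the checkpoint is loaded.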

2. OpenCLIP Backbone (`OpenCLIPBackbone`)

The vision encoder from the OpenCLIP library is used for its strong image-text alignment and recognition capabilities. It serves as the "student" model in distillation.

Class: seg.models.backbones.OpenCLIPBackbone
Variants: RN50x16, ViT-L-14, etc. (see the OpenCLIP documentation for the full list)
Usage: Provides semantic features for recognition. SAM2CLIP distillation adds SAM's spatial knowledge to it.

Example Config (seg/configs/clip2sam/clip2sam_coco_rn50x16.py):

model = dict(
    ...
    backbone=dict(
        type='OpenCLIPBackbone',
        model_name='RN50x16',
        fix=True,  # Frozen during CLIP2SAM training
        init_cfg=dict(
            type='clip_pretrain',
            checkpoint='openai'
        )
    ),
    ...
)
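
Configs written in this style are usually resolved through a registry: the `type` string selects a class, and the remaining keys are passed to its constructor. The sketch below illustrates that idea in plain Python, assuming the project follows this OpenMMLab-style convention. `BACKBONES`, `register`, `build_backbone`, and the stub class are illustrative stand-ins, not the project's actual implementation.

```python
# Illustrative registry; all names here are stand-ins for the real machinery.
BACKBONES = {}


def register(cls):
    """Record a class under its own name so configs can refer to it by string."""
    BACKBONES[cls.__name__] = cls
    return cls


@register
class OpenCLIPBackbone:
    """Stub standing in for the real backbone; it only stores its arguments."""

    def __init__(self, model_name, fix=True, init_cfg=None):
        self.model_name = model_name
        self.fix = fix
        self.init_cfg = init_cfg


def build_backbone(cfg):
    """Instantiate the class named by cfg['type'] with the remaining keys."""
    args = dict(cfg)  # shallow copy so the caller's config is not mutated
    cls_name = args.pop("type")
    if cls_name not in BACKBONES:
        raise KeyError(f"{cls_name!r} is not a registered backbone")
    return BACKBONES[cls_name](**args)


backbone = build_backbone(
    dict(
        type="OpenCLIPBackbone",
        model_name="RN50x16",
        fix=True,
        init_cfg=dict(type="clip_pretrain", checkpoint="openai"),
    )
)
```

Under this pattern, switching the backbone is a config-only change: a different `type` string selects a different registered class.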

3. VITAMIN / RWKV Backbone (`VITAMINBackbone`)

This backbone is the core of the RWKV-SAM project, designed for high-efficiency segmentation. It combines convolutional stages with an RWKV-based attention mechanism.

Class: projects.rwkvsam.models.backbones.VITAMINBackbone
Variants: small, base, etc.
Usage: Serves as a direct, more efficient replacement for the SAM ViT backbone.

Example Config (projects/rwkvsam/configs/backbone_dist/rwkvsam1001_000_vith_vitamin_rwkv_small_mlp2.py):

model = dict(
    ...
    backbone_student=dict(
        type='VITAMINBackbone',
        img_size=(224, 224),
        model_variant='small',
        attn_type='rwkv',  # Specifies the RWKV attention mechanism
        attn_cfg=dict(
            mlp_ratio=2,
        ),
        ...
    ),
    ...
)
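
To experiment with another variant, only a few nested keys need to change. The sketch below derives a second student config from the one above with a recursive merge. It is a hedged illustration: `override` is a hypothetical helper, the fields elided with `...` in the example are left out, and the `mlp_ratio=4` value is an arbitrary placeholder rather than a setting taken from the repository.

```python
import copy


def override(base, updates):
    """Return a copy of `base` with `updates` merged in, recursing into dicts."""
    merged = copy.deepcopy(base)
    for key, value in updates.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = override(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Fields shown in the example config; elided fields are omitted here.
student_small = dict(
    type="VITAMINBackbone",
    img_size=(224, 224),
    model_variant="small",
    attn_type="rwkv",
    attn_cfg=dict(mlp_ratio=2),
)

# Placeholder override values, chosen only for illustration.
student_base = override(
    student_small,
    dict(model_variant="base", attn_cfg=dict(mlp_ratio=4)),
)
```

Because the merge works on a deep copy, `student_small` is left unchanged and both configs can be used side by side.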
