Code Structure Overview

This project is organized into several key directories, each serving a distinct purpose. Knowing this layout makes it easier to navigate the codebase and to extend it.

ext/ - External Libraries

This directory contains third-party or foundational libraries that are vendored directly into the project. Many of them are core components on which the new models are built.

  • open_clip/: A copy of the OpenCLIP library, providing the CLIP models used for open-vocabulary recognition.
  • sam/: Contains the core building blocks from the original Segment Anything Model, such as the image encoder, prompt encoder, and mask decoder architectures.
  • rwkv/: Code related to the RWKV (Receptance Weighted Key Value) architecture, which is used as an efficient backbone in the RWKV-SAM project.
  • class_names/: Utility files defining class IDs and names for various datasets like COCO and LVIS.
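
As a rough illustration of what a class-name utility provides, the sketch below builds ID-to-name and name-to-ID lookups from an ordered tuple of names. The tuple COCO_SAMPLE_CLASSES and the helper build_id_maps are hypothetical names chosen for this example; the actual files in class_names/ may be laid out differently.

```python
from typing import Dict, Sequence, Tuple

# Hypothetical excerpt: only the first three COCO category names, for illustration.
# The real files in ext/class_names/ may use a different layout.
COCO_SAMPLE_CLASSES: Tuple[str, ...] = ("person", "bicycle", "car")


def build_id_maps(names: Sequence[str]) -> Tuple[Dict[int, str], Dict[str, int]]:
    """Assign contiguous IDs by position and return both lookup directions."""
    id_to_name = dict(enumerate(names))
    name_to_id = {name: idx for idx, name in id_to_name.items()}
    return id_to_name, name_to_id


id_to_name, name_to_id = build_id_maps(COCO_SAMPLE_CLASSES)
```

Assigning IDs by position keeps the mapping stable as long as the order of names does not change, which is why such lists are usually stored as ordered tuples rather than sets.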

seg/ - Core OVSAM Implementation

This directory holds the primary source code for the Open-Vocabulary SAM (OVSAM) model.

  • configs/: Configuration files for training and evaluating OVSAM components, including sam2clip, clip2sam, and the final ovsam model.
  • models/: Implementation of custom models, including backbones (OpenCLIPBackbone, SAMBackbone), necks (MultiLayerTransformerNeck), heads (OVSAMHead), and detectors that tie everything together (OVSAM, CLIP2SAM).
  • datasets/: Custom dataset loaders and data processing pipelines.
  • evaluation/: Custom evaluation metrics.
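
The backbone, neck, head, and detector split described for models/ can be pictured with the toy sketch below. It is plain Python with no framework dependency, and every name in it (ToyDetector, toy_backbone, toy_neck, toy_head) is a hypothetical stand-in rather than a class from this repository; it only shows how a detector might chain the three stages.

```python
from dataclasses import dataclass
from typing import Callable, List

Features = List[float]


@dataclass
class ToyDetector:
    """Hypothetical detector that ties a backbone, a neck, and a head together."""
    backbone: Callable[[List[float]], Features]
    neck: Callable[[Features], Features]
    head: Callable[[Features], str]

    def predict(self, image: List[float]) -> str:
        feats = self.backbone(image)   # extract raw features from the input
        feats = self.neck(feats)       # refine or aggregate the features
        return self.head(feats)        # turn features into a prediction


def toy_backbone(image: List[float]) -> Features:
    # Stand-in for a feature extractor: scale every "pixel".
    return [2.0 * p for p in image]


def toy_neck(feats: Features) -> Features:
    # Stand-in for feature aggregation: average into a single value.
    return [sum(feats) / len(feats)]


def toy_head(feats: Features) -> str:
    # Stand-in for a classifier: threshold the aggregated feature.
    return "object" if feats[0] > 1.0 else "background"


detector = ToyDetector(toy_backbone, toy_neck, toy_head)
label = detector.predict([0.2, 0.9, 0.7])
```

Keeping the three stages as separate, swappable pieces is what lets a configuration file pick, for example, a different backbone without touching the head.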

projects/rwkvsam/ - RWKV-SAM Project

This is a self-contained sub-project focused on a high-efficiency segmentation model. It mirrors the structure of the seg/ directory but contains code specific to the RWKV-based architecture.

  • README.md: An introduction to the RWKV-SAM model.
  • configs/: Configuration files for training and evaluating RWKV-SAM.
  • models/: Implementations of RWKV-specific models like the VITAMINBackbone.
  • datasets/ & evaluation/: Dataset and evaluation code specific to experiments run for this project.

tools/ - Scripts and Utilities

This directory contains the main executable scripts for interacting with the models.

  • train.py: The main script for training models.
  • test.py: The main script for testing and evaluating models.
  • gen_cls.py: A utility to pre-compute and cache language embeddings for class names.
  • dist.sh: A wrapper script to launch the Python scripts in a distributed (multi-GPU) environment.
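
The idea of pre-computing and caching language embeddings, as gen_cls.py is described as doing, can be sketched as follows. This is not the script's actual code: toy_embed is a hypothetical stand-in for a real text encoder (it derives numbers from a hash), and the JSON cache format and the load_or_compute helper are assumptions made for this example only.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import List, Sequence, Tuple


def toy_embed(name: str, dim: int = 4) -> List[float]:
    """Hypothetical encoder stand-in: a deterministic pseudo-embedding from a hash."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def load_or_compute(names: Sequence[str], cache_path: Path) -> Tuple[List[List[float]], bool]:
    """Return (embeddings, cache_hit); recompute when the cache is missing or stale."""
    if cache_path.exists():
        cached = json.loads(cache_path.read_text())
        if cached["names"] == list(names):
            return cached["embeddings"], True
    embeddings = [toy_embed(name) for name in names]
    payload = {"names": list(names), "embeddings": embeddings}
    cache_path.write_text(json.dumps(payload))
    return embeddings, False


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "toy_classes.json"
    first, hit_first = load_or_compute(["person", "car"], path)    # computes and writes
    second, hit_second = load_or_compute(["person", "car"], path)  # served from the cache
```

Storing the class names next to the embeddings lets the loader detect a stale cache when the class list changes, so the encoder only runs when it has to.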