Code Structure Overview
This project is organized into several key directories, each serving a distinct purpose. Understanding this structure is helpful for navigating the codebase and extending it.
ext/ - External Libraries
This directory contains third-party or foundational libraries that are integrated directly into the project. These are often core components upon which the new models are built.
open_clip/: A copy of the OpenCLIP library, providing the powerful CLIP models used for open-vocabulary recognition.
sam/: Contains the core building blocks from the original Segment Anything Model, such as the image encoder, prompt encoder, and mask decoder architectures.
rwkv/: Code related to the RWKV (Receptance Weighted Key Value) architecture, which is used as an efficient backbone in the RWKV-SAM project.
class_names/: Utility files defining class IDs and names for various datasets like COCO and LVIS.
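As a rough illustration of what a class-name utility of this kind can provide, the sketch below maps dataset category IDs to contiguous label indices and names. The `COCO_EXCERPT` table and the `build_label_maps` helper are hypothetical names chosen for this example, not taken from class_names/; only the five COCO ID/name pairs shown are real, and they are a small excerpt of the full category list.

```python
from typing import Dict, List, Tuple

# A small excerpt of COCO category IDs and names (hypothetical table for
# illustration; the real files in class_names/ may be organized differently).
# Note the gap: COCO defines no category with ID 12.
COCO_EXCERPT: Dict[int, str] = {
    1: "person",
    2: "bicycle",
    3: "car",
    11: "fire hydrant",
    13: "stop sign",
}


def build_label_maps(categories: Dict[int, str]) -> Tuple[Dict[int, int], List[str]]:
    """Map possibly non-contiguous dataset IDs to contiguous indices.

    Returns (id_to_index, index_to_name), where index_to_name[i] is the
    class name for contiguous label index i.
    """
    ordered_ids = sorted(categories)
    id_to_index = {cat_id: index for index, cat_id in enumerate(ordered_ids)}
    index_to_name = [categories[cat_id] for cat_id in ordered_ids]
    return id_to_index, index_to_name


id_to_index, index_to_name = build_label_maps(COCO_EXCERPT)
print(id_to_index[13], index_to_name[id_to_index[13]])
```

Keeping the ID-to-index mapping separate from the name list lets a classifier work with dense label indices while still reporting results under the dataset's own category IDs.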
seg/ - Core OVSAM Implementation
This directory holds the primary source code for the Open-Vocabulary SAM (OVSAM) model.
configs/: Configuration files for training and evaluating OVSAM components, including sam2clip, clip2sam, and the final ovsam model.
models/: Implementation of custom models, including backbones (OpenCLIPBackbone, SAMBackbone), necks (MultiLayerTransformerNeck), heads (OVSAMHead), and detectors that tie everything together (OVSAM, CLIP2SAM).
datasets/: Custom dataset loaders and data processing pipelines.
evaluation/: Custom evaluation metrics.
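The backbone, neck, head, and detector roles listed for models/ can be pictured as a simple pipeline. The following is a toy sketch of that composition pattern only: `ToyDetector` and the three `toy_*` stage functions are hypothetical placeholders operating on plain lists, and they do not reproduce the interfaces of OVSAM, OpenCLIPBackbone, MultiLayerTransformerNeck, or OVSAMHead.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class ToyDetector:
    """Hypothetical detector that chains backbone -> neck -> head."""

    backbone: Callable[[List[float]], List[float]]
    neck: Callable[[List[float]], List[float]]
    head: Callable[[List[float]], Dict[str, float]]

    def predict(self, image: List[float]) -> Dict[str, float]:
        features = self.backbone(image)  # extract features from the input
        fused = self.neck(features)      # aggregate or transform the features
        return self.head(fused)          # turn features into a prediction


def toy_backbone(image: List[float]) -> List[float]:
    # Placeholder "feature extractor": scale every input value.
    return [2.0 * pixel for pixel in image]


def toy_neck(features: List[float]) -> List[float]:
    # Placeholder "neck": pool all features into a single value.
    return [sum(features) / len(features)]


def toy_head(fused: List[float]) -> Dict[str, float]:
    # Placeholder "head": report the pooled value as a score.
    return {"score": fused[0]}


detector = ToyDetector(toy_backbone, toy_neck, toy_head)
result = detector.predict([0.0, 1.0, 2.0])
print(result)
```

Because the detector only depends on the call signature of each stage, any one of them can be swapped without touching the others, which is the practical benefit of splitting a model into these roles.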
projects/rwkvsam/ - RWKV-SAM Project
This is a self-contained sub-project focusing on a high-efficiency segmentation model. It follows a similar structure to the seg/ directory but contains code specific to the RWKV-based architecture.
README.md: An introduction to the RWKV-SAM model.
configs/: Configuration files for training and evaluating RWKV-SAM.
models/: Implementations of RWKV-specific models like the VITAMINBackbone.
datasets/ & evaluation/: Dataset and evaluation code specific to experiments run for this project.
tools/ - Scripts and Utilities
This directory contains the main executable scripts for interacting with the models.
train.py: The main script for training models.
test.py: The main script for testing and evaluating models.
gen_cls.py: A utility to pre-compute and cache language embeddings for class names.
dist.sh: A wrapper script to launch the Python scripts in a distributed (multi-GPU) environment.
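The idea behind pre-computing and caching embeddings for class names can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic of gen_cls.py: `toy_embed` is a hash-based stand-in for a real text encoder, and `load_or_compute_embeddings`, the JSON cache layout, and the file name are all hypothetical.

```python
import hashlib
import json
import tempfile
from pathlib import Path
from typing import Callable, Dict, List, Sequence


def toy_embed(name: str, dim: int = 4) -> List[float]:
    """Stand-in for a text encoder: a deterministic vector derived from a hash."""
    digest = hashlib.sha256(name.encode("utf-8")).digest()
    return [byte / 255.0 for byte in digest[:dim]]


def load_or_compute_embeddings(
    class_names: Sequence[str],
    cache_path: Path,
    embed: Callable[[str], List[float]] = toy_embed,
) -> Dict[str, List[float]]:
    """Return one embedding per class name, reusing the cache file when it matches."""
    names = list(class_names)
    if cache_path.exists():
        cached = json.loads(cache_path.read_text(encoding="utf-8"))
        # Reuse the cache only if it was built for exactly this class list.
        if cached.get("class_names") == names:
            return cached["embeddings"]
    embeddings = {name: embed(name) for name in names}
    payload = {"class_names": names, "embeddings": embeddings}
    cache_path.write_text(json.dumps(payload), encoding="utf-8")
    return embeddings


with tempfile.TemporaryDirectory() as tmp:
    cache_file = Path(tmp) / "toy_classes.json"  # hypothetical cache file name
    first = load_or_compute_embeddings(["person", "car"], cache_file)   # computes and writes
    second = load_or_compute_embeddings(["person", "car"], cache_file)  # served from the cache
    print(first == second)
```

Storing the class list next to the vectors lets the loader detect a stale cache when the vocabulary changes, so the encoder is only invoked when the class names actually differ.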