Dataset Configurations

Dataset configurations define how data is loaded, transformed, and fed to the model during training and evaluation. Each config file specifies them in three dictionaries: train_dataloader, val_dataloader, and test_dataloader.

Key Components of a Dataloader Config

  • batch_size: Number of samples per GPU, so the effective batch size is batch_size multiplied by the number of GPUs.
  • num_workers: Number of subprocesses to use for data loading.
  • dataset: A dictionary that defines the dataset itself, including:
    • type: The dataset class name (e.g., CocoDataset, SAMDataset).
    • data_root: The root directory where the dataset is stored.
    • ann_file: Path to the annotation file (relative to data_root).
    • data_prefix: A dictionary of path prefixes, such as img for the image directory (relative to data_root).
    • pipeline: A list of data transformation and augmentation steps.
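
To make the "relative to data_root" wording concrete, here is a minimal sketch of how such paths could be resolved. The helper resolve_paths is hypothetical and written only for illustration; the actual joining is done inside the dataset class and may differ in detail. The values are the COCO paths used elsewhere in this section.

```python
import posixpath


def resolve_paths(data_root, ann_file, data_prefix):
    """Hypothetical helper: root relative paths at data_root."""
    def join(path):
        # Absolute paths are kept as-is; relative ones are joined onto data_root.
        return path if posixpath.isabs(path) else posixpath.join(data_root, path)

    return join(ann_file), {key: join(value) for key, value in data_prefix.items()}


ann_path, prefixes = resolve_paths(
    data_root='data/coco/',
    ann_file='annotations/instances_train2017.json',
    data_prefix=dict(img='train2017/'),
)
print(ann_path)         # data/coco/annotations/instances_train2017.json
print(prefixes['img'])  # data/coco/train2017/
```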

Example: COCO Instance Segmentation Dataset

Here is an example from projects/rwkvsam/configs/_base_/datasets/coco/coco_instance.py:

# dataset settings
data_root = 'data/coco/'
dataset_type = 'CocoDataset'

train_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='Resize', scale=(1333, 800), keep_ratio=True),
    dict(type='RandomFlip', prob=0.5),
    dict(type='PackDetInputs')
]

train_dataloader = dict(
    batch_size=2,
    num_workers=2,
    sampler=dict(type='DefaultSampler', shuffle=True),
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='annotations/instances_train2017.json',
        data_prefix=dict(img='train2017/'),
        filter_cfg=dict(filter_empty_gt=True, min_size=32),
        pipeline=train_pipeline
    )
)
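
For comparison, a validation dataloader usually mirrors the training one with shuffling and random augmentation turned off. The following is a sketch under assumptions, not a copy of the project's config: the val2017 paths, batch_size=1, the test_pipeline name, and test_mode=True follow common MMDetection 3.x conventions and should be checked against the actual coco_instance.py.

```python
# Assumed values that mirror the training example above.
data_root = 'data/coco/'
dataset_type = 'CocoDataset'

# Same steps as train_pipeline but without RandomFlip,
# so that evaluation is deterministic.
test_pipeline = [
    dict(type='LoadImageFromFile'),
    dict(type='LoadAnnotations', with_bbox=True, with_mask=True),
    dict(type='Resize', scale=(1333, 800), keep_ratio=True),
    dict(type='PackDetInputs'),
]

val_dataloader = dict(
    batch_size=1,   # assumed; evaluation is often run one image per GPU
    num_workers=2,
    sampler=dict(type='DefaultSampler', shuffle=False),  # fixed order
    dataset=dict(
        type=dataset_type,
        data_root=data_root,
        ann_file='annotations/instances_val2017.json',  # assumed COCO val split
        data_prefix=dict(img='val2017/'),
        test_mode=True,  # assumed; disables training-only sample filtering
        pipeline=test_pipeline,
    ),
)

# A common convention is to reuse the validation settings for testing.
test_dataloader = val_dataloader
```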

The Data Pipeline

The pipeline is an ordered list of transforms applied to each data sample, where each step receives the output of the previous one. The steps in the example above are:

  1. LoadImageFromFile: Loads the image from the file path.
  2. LoadAnnotations: Loads annotations (bounding boxes, masks) from the annotation file.
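  3. Resize: Resizes the image to fit within scale=(1333, 800) while preserving its aspect ratio (keep_ratio=True), and rescales the annotations to match.
  4. RandomFlip: Flips the image and its annotations horizontally with probability 0.5 for data augmentation.
  5. PackDetInputs: Packs the image and annotations into the standardized format the model expects, with the annotations stored in a DetDataSample.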
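
The chaining idea behind these steps can be illustrated with a toy version in plain Python. This is only a sketch: load_image, random_flip, pack, and compose are hypothetical stand-ins, not the real transform classes, and the image path is made up.

```python
import random


def load_image(sample):
    # Stand-in for LoadImageFromFile: fabricate a tiny 2x3 "image".
    sample['img'] = [[1, 2, 3], [4, 5, 6]]
    return sample


def random_flip(sample, prob=0.5):
    # Stand-in for RandomFlip: mirror each row with probability `prob`.
    flipped = random.random() < prob
    if flipped:
        sample['img'] = [row[::-1] for row in sample['img']]
    sample['flip'] = flipped
    return sample


def pack(sample):
    # Stand-in for PackDetInputs: split model inputs from metadata.
    return dict(
        inputs=sample['img'],
        meta=dict(img_path=sample['img_path'], flip=sample['flip']),
    )


def compose(transforms):
    # Build a callable that applies the transforms in order.
    def run(sample):
        for transform in transforms:
            sample = transform(sample)
        return sample
    return run


pipeline = compose([load_image, random_flip, pack])
result = pipeline(dict(img_path='train2017/example.jpg'))  # hypothetical path
print(sorted(result))  # ['inputs', 'meta']
```

Because every step takes and returns the same kind of sample dictionary, steps can be added, removed, or reordered in the config without changing the others.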

Available Dataset Configurations

This project provides base dataset configurations in seg/configs/_base_/datasets/ and projects/rwkvsam/configs/_base_/datasets/. They cover:

  • COCO: For instance segmentation and open-vocabulary tasks.
  • LVIS: For large-vocabulary instance segmentation.
  • SAM: For the large-scale SAM dataset used in distillation.
  • ADE20K: For semantic segmentation.
  • Specialized Datasets: DIS5K, ThinObject5K, and EntitySeg for more specific segmentation tasks.