Evaluation Pipeline

Glue Factory separates evaluation into two steps to ensure reproducibility and efficiency:

  1. Export: The model processes the dataset, and its predictions (keypoints, descriptors, matches) are saved to disk in HDF5 format.
  2. Evaluate: Metrics are computed on the cached predictions.

This separation allows you to re-run evaluation with different robust estimators or RANSAC settings without re-running the neural network.
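
Because the predictions are cached, they can also be inspected directly, for example with h5py. The sketch below simply lists the datasets stored in a cache file; the file path is an illustrative assumption, and the actual layout of keys depends on the model and benchmark.

import h5py

# The path below is an illustrative assumption, not a path guaranteed by Glue Factory.
with h5py.File("outputs/results/hpatches/predictions.h5", "r") as f:
    # Print the name, shape, and dtype of every dataset in the cache.
    def show(name, obj):
        if isinstance(obj, h5py.Dataset):
            print(name, obj.shape, obj.dtype)
    f.visititems(show)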

Supported Benchmarks

  • HPatches: Homography estimation accuracy.
  • MegaDepth-1500: Relative pose estimation (outdoor).
  • ScanNet-1500: Relative pose estimation (indoor).
  • ETH3D: Low-texture and high-resolution matching.

Usage

Run the evaluation script corresponding to the benchmark:

python -m gluefactory.eval.<benchmark> --conf <config_name>

Benchmarks: hpatches, megadepth1500, scannet1500, eth3d.
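
For example, to evaluate a SuperPoint+LightGlue model on MegaDepth-1500 (the config name below is illustrative; use one of the configs shipped with Glue Factory or your own):

python -m gluefactory.eval.megadepth1500 --conf superpoint+lightglue-official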

Common Arguments

  • --overwrite: Force re-export of predictions.
  • --overwrite_eval: Force re-calculation of metrics (skips export if predictions exist).
  • --tag <name>: Custom tag for the results (defaults to the config name).
  • --plot: Generate and save recall plots.
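
These flags can be combined with any benchmark command, for example (the config name is again illustrative):

python -m gluefactory.eval.hpatches --conf superpoint+lightglue-official --overwrite --plot --tag my_run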

Robust Estimators

You can switch the robust estimator used for pose/homography estimation in the config:

eval:
    estimator: poselib  # or opencv, pycolmap
    ransac_th: 1.0      # Inlier threshold (pixels)

Setting ransac_th: -1 allows the script to search for the optimal threshold during evaluation.
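
To illustrate what such a search amounts to, the sketch below sweeps a few inlier thresholds with OpenCV's RANSAC-based homography estimator and keeps the one whose estimate is closest to a known ground-truth homography. The candidate thresholds, image size, and error metric are assumptions for illustration, not Glue Factory's internal procedure.

# Illustrative RANSAC-threshold sweep (not Glue Factory's implementation).
import cv2
import numpy as np

def sweep_thresholds(pts0, pts1, H_gt, thresholds=(0.5, 1.0, 1.5, 2.0)):
    """Return the threshold whose estimated homography best matches H_gt.

    pts0, pts1: float32 arrays of matched keypoints with shape (N, 2).
    H_gt: 3x3 ground-truth homography.
    """
    # Corners of an assumed 640x480 image, used to measure reprojection error.
    corners = np.array([[0, 0], [640, 0], [640, 480], [0, 480]], dtype=np.float32)
    best_th, best_err = None, np.inf
    for th in thresholds:
        H, _ = cv2.findHomography(pts0, pts1, cv2.RANSAC, th)
        if H is None:
            continue
        # Mean corner reprojection error between the estimate and the ground truth.
        proj_est = cv2.perspectiveTransform(corners[None], H)[0]
        proj_gt = cv2.perspectiveTransform(corners[None], H_gt)[0]
        err = np.linalg.norm(proj_est - proj_gt, axis=1).mean()
        if err < best_err:
            best_th, best_err = th, err
    return best_th, best_err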