Example: Evaluating on ImageNet

TensorNets includes scripts to evaluate the performance of its classification models on the ImageNet dataset. This is crucial for verifying model accuracy and reproducing the results reported in academic papers.

The primary script for this task is examples/evaluate_imagenet.py.

Overview of the Evaluation Script

The evaluate_imagenet.py script is designed to:

  1. Load an ImageNet dataset in TFRecords format.
  2. Instantiate a specified model from TensorNets.
  3. Load the model's pre-trained weights.
  4. Run the evaluation loop to calculate Top-1 and Top-5 accuracy.
  5. Report the final metrics, including accuracy, MACs (Multiply-Accumulate operations), and model size.
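The Top-1/Top-5 accuracy computation in step 4 can be sketched in plain NumPy. This is an illustrative helper, not the script's actual TensorFlow code; the function name `topk_accuracy` and the sample logits are hypothetical:

```python
import numpy as np

def topk_accuracy(logits, labels, k):
    # For each row, check whether the true label is among the
    # k highest-scoring classes.
    topk = np.argsort(logits, axis=1)[:, -k:]
    hits = np.any(topk == labels[:, None], axis=1)
    return hits.mean()

# Tiny illustrative batch: 3 images, 5 classes (made-up values).
logits = np.array([[0.10, 0.90, 0.00, 0.00, 0.00],
                   [0.20, 0.10, 0.60, 0.05, 0.05],
                   [0.30, 0.20, 0.10, 0.15, 0.25]])
labels = np.array([1, 2, 4])

top1 = topk_accuracy(logits, labels, 1)  # 2/3: the third image's argmax is class 0
top5 = topk_accuracy(logits, labels, 5)  # 1.0: with k=5 every class is in the top-k
```

In the real script the same idea runs over the full validation set in batches, accumulating hits across all 50,000 images.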

How to Run an Evaluation

To run the evaluation, you need to provide several command-line arguments:

  • --dataset_dir: The path to your ImageNet dataset in TFRecords format.
  • --model_name: The name of the TensorNets model you want to evaluate (e.g., ResNet50).
  • --eval_image_size: The input image size for the model (e.g., 224 for ResNet50).
  • --normalize: The normalization type to use. This is an integer that maps to a specific preprocessing function (see examples/imagenet_preprocessing.py for details).
  • --batch_size: The number of images to process in each batch.
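As a rough illustration of what the --normalize choices control, here are two normalization schemes commonly used for ImageNet models, written in NumPy. The actual integer-to-function mapping is defined in examples/imagenet_preprocessing.py; the helpers below are hypothetical sketches, not the script's code:

```python
import numpy as np

def normalize_tf_style(img):
    # Scale uint8 pixels to [-1, 1], the convention used by
    # Inception-family models.
    return img.astype(np.float32) / 127.5 - 1.0

def normalize_caffe_style(img):
    # Convert RGB to BGR and subtract the per-channel ImageNet means,
    # the convention used by Caffe-ported models such as VGG.
    mean = np.array([103.939, 116.779, 123.68], dtype=np.float32)
    return img[..., ::-1].astype(np.float32) - mean
```

Picking the wrong scheme for a model typically drops accuracy by tens of percentage points, which is why the flag must match the model being evaluated.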

Example Command:

python examples/evaluate_imagenet.py \
    --model_name=ResNet50 \
    --dataset_dir=/path/to/imagenet/tfrecords \
    --eval_image_size=224 \
    --normalize=1 \
    --batch_size=200

Batch Evaluation

The repository includes a shell script, examples/evaluate_imagenet_all.sh, which demonstrates how to run evaluations for all models in parallel across multiple GPUs. It is also a convenient reference for the correct per-model settings (--eval_image_size, --normalize).

Here is a snippet from the script showing the command for Inception3:

# From evaluate_imagenet_all.sh
python evaluate_imagenet.py --model_name=Inception3 --eval_image_size=299 --normalize=2

By examining this script, you can easily find the right settings to reproduce the performance metrics reported in the performance documentation.
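The parallel pattern in evaluate_imagenet_all.sh can also be driven from Python. Below is a hypothetical launcher sketch: the (model, size, normalize) triples and the helper names are illustrative, and the authoritative per-model settings remain in the shell script:

```python
import os
import subprocess

# Hypothetical (model_name, eval_image_size, normalize) triples;
# see examples/evaluate_imagenet_all.sh for the real per-model values.
CONFIGS = [
    ("ResNet50", 224, 1),
    ("Inception3", 299, 2),
]

def build_command(name, size, norm, dataset_dir):
    # Assemble the same command line shown above for a single model.
    return ["python", "examples/evaluate_imagenet.py",
            "--model_name=%s" % name,
            "--dataset_dir=%s" % dataset_dir,
            "--eval_image_size=%d" % size,
            "--normalize=%d" % norm]

def launch_all(configs, dataset_dir, gpu_ids):
    # Pin each evaluation to one GPU via CUDA_VISIBLE_DEVICES and
    # run them concurrently.
    procs = []
    for i, (name, size, norm) in enumerate(configs):
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_ids[i % len(gpu_ids)]))
        cmd = build_command(name, size, norm, dataset_dir)
        procs.append(subprocess.Popen(cmd, env=env))
    return procs

# launch_all(CONFIGS, "/path/to/imagenet/tfrecords", [0, 1]) would start
# one evaluation process per config, round-robining over GPUs 0 and 1.
```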