Frame Extraction Algorithm

The goal of frame extraction is to select a concise set of keyframes that best represents the video's visual narrative. Processing every frame would be computationally expensive and redundant, so the Video Analyzer uses a difference-based algorithm to identify them.

1. Target Frame Calculation

First, the system determines how many frames it should aim to extract; a sketch of this calculation follows the list below.

  • The target is calculated based on the video's duration and the configured frames_per_minute setting (default is 60).
  • This number is capped by the --max-frames command-line argument if provided.
  • The system ensures it extracts at least one frame and no more than the total number of frames in the video.
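
A minimal sketch of this calculation, using illustrative names (duration_seconds, total_frames, max_frames) that may not match the project's actual identifiers:

    def calculate_target_frames(duration_seconds, total_frames,
                                frames_per_minute=60, max_frames=None):
        """Estimate how many keyframes to aim for (illustrative sketch)."""
        # Scale the target with the video's length.
        target = int(duration_seconds / 60 * frames_per_minute)

        # Respect the --max-frames cap when the user supplied one.
        if max_frames is not None:
            target = min(target, max_frames)

        # Extract at least one frame and never more than the video contains.
        return max(1, min(target, total_frames))

For example, a 3-minute video at the default 60 frames per minute yields a target of 180 frames, reduced to 100 if --max-frames 100 is passed.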

2. Adaptive Sampling

To avoid comparing every single frame with its predecessor, the processor uses an adaptive sampling interval: it steps through the video at an interval sized to produce more candidate frames than the final target, creating a pool to choose from. This significantly reduces the processing load while still providing broad coverage of the video's timeline.
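
One plausible way to derive that interval, shown here as a sketch rather than the project's actual heuristic, is to oversample relative to the target and divide the frame count accordingly (the oversample_factor name is illustrative):

    def sampling_interval(total_frames, target_frames, oversample_factor=4):
        """Step size between sampled frames (illustrative sketch).

        Sampling a few times more candidates than the final target keeps the
        number of frame comparisons far below the total frame count while
        still covering the whole timeline.
        """
        candidate_count = target_frames * oversample_factor
        return max(1, total_frames // candidate_count)

The processor then reads and compares only every Nth frame, where N is this interval.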

3. Frame Difference Analysis

For each sampled frame, a difference score is calculated against the previously sampled frame, as sketched in code after this list.

  1. Grayscale Conversion: Frames are converted to grayscale for a more efficient and color-agnostic comparison.
  2. Absolute Difference: OpenCV's absdiff function calculates the pixel-wise difference between the two frames.
  3. Mean Score: The mean of these differences is taken as the final score. A high score indicates a significant change between the frames.
  4. Thresholding: Only frames with a score above a predefined FRAME_DIFFERENCE_THRESHOLD (default 10.0) are considered significant and stored as candidates.
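
The scoring steps map directly onto OpenCV calls. The sketch below shows one way to compute the score for a pair of frames; the function names are illustrative, and the threshold uses the default quoted above:

    import cv2

    FRAME_DIFFERENCE_THRESHOLD = 10.0  # default value quoted above

    def difference_score(prev_frame, curr_frame):
        """Mean absolute pixel difference between two BGR frames."""
        prev_gray = cv2.cvtColor(prev_frame, cv2.COLOR_BGR2GRAY)
        curr_gray = cv2.cvtColor(curr_frame, cv2.COLOR_BGR2GRAY)
        diff = cv2.absdiff(prev_gray, curr_gray)
        return float(diff.mean())

    def is_significant(prev_frame, curr_frame):
        """True when the change between two frames crosses the threshold."""
        return difference_score(prev_frame, curr_frame) > FRAME_DIFFERENCE_THRESHOLD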

4. Final Selection Process

From the pool of high-scoring candidate frames, the final set is selected; both selection modes are sketched after this list.

  • Top N Selection: The candidates are sorted by their difference scores in descending order, and the top N frames are selected, where N is the target number of frames calculated in step 1.
  • Even Sampling (with --max-frames): If the --max-frames argument is used, the selection logic changes. Instead of simply taking the N highest-scoring frames (which might cluster around a few moments), the system selects N frames evenly spaced throughout the list of all significant candidates. This yields a summary that covers the entire video, even if some high-action sequences contribute fewer frames.
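
A sketch of both selection modes, assuming each candidate is stored as an (index, score) pair in chronological order; this illustrates the behaviour described above rather than reproducing the project's code:

    def select_frames(candidates, target, evenly_spaced=False):
        """Choose `target` frames from a chronological list of (index, score) pairs."""
        if target >= len(candidates):
            return candidates

        if evenly_spaced:
            # --max-frames mode: take candidates at evenly spaced positions
            # across the whole list to preserve chronological coverage.
            step = len(candidates) / target
            return [candidates[int(i * step)] for i in range(target)]

        # Default mode: keep the highest-scoring candidates, then restore
        # chronological order for output.
        top = sorted(candidates, key=lambda c: c[1], reverse=True)[:target]
        return sorted(top, key=lambda c: c[0])

The chronological re-sort in the default mode keeps the selected frames in playback order even though they were ranked by score.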

Limitations

  • Missed Changes: Significant visual changes that occur between sampling intervals may be missed.
  • Rapid Sequences: A quick succession of major changes may yield only one selected frame from that sequence.
  • Even Sampling Trade-offs: When using --max-frames, the even sampling might skip a very high-scoring frame in favor of a lower-scoring one to maintain chronological spacing.