Frame Extraction Algorithm
The goal of frame extraction is to select a concise set of images that best represent the visual narrative of the video. Processing every frame would be computationally expensive and redundant. The Video Analyzer uses a difference-based algorithm to identify these keyframes.
1. Target Frame Calculation
First, the system determines how many frames it should aim to extract.
- The target is calculated from the video's duration and the configured `frames_per_minute` setting (default is 60).
- This number is capped by the `--max-frames` command-line argument, if provided.
- The system ensures it extracts at least one frame and no more than the total number of frames in the video.
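The calculation above can be sketched as follows. This is an illustrative reconstruction, not the analyzer's actual code; the function name and parameter names are assumptions.

```python
from typing import Optional

def calculate_target_frames(duration_seconds: float,
                            total_frames: int,
                            frames_per_minute: int = 60,
                            max_frames: Optional[int] = None) -> int:
    """Return how many keyframes to aim for."""
    target = int(duration_seconds / 60 * frames_per_minute)
    if max_frames is not None:
        target = min(target, max_frames)       # cap via --max-frames
    return max(1, min(target, total_frames))   # clamp to [1, total_frames]
```

For example, a two-minute video at the default rate yields a target of 120 frames, while passing `--max-frames 30` caps it at 30.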
2. Adaptive Sampling
To avoid comparing every single frame with its predecessor, the processor uses an adaptive sampling interval. It steps through the video at a pace faster than the final target, creating a pool of candidate frames. This approach significantly reduces the processing load while still providing broad coverage of the video's timeline.
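A minimal sketch of the sampling step, assuming the processor oversamples by a fixed factor relative to the final target (the factor of 4 and the function names here are assumptions for illustration):

```python
def sampling_interval(total_frames: int, target_frames: int,
                      oversample: int = 4) -> int:
    """Step size that yields roughly oversample * target_frames candidates."""
    candidates_wanted = target_frames * oversample
    return max(1, total_frames // candidates_wanted)

def sampled_indices(total_frames: int, target_frames: int) -> list:
    """Frame indices to read from the video, skipping the rest."""
    step = sampling_interval(total_frames, target_frames)
    return list(range(0, total_frames, step))
```

Stepping by this interval means only a fraction of frames are ever decoded and compared, while the candidate pool still spans the full timeline.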
3. Frame Difference Analysis
For each sampled frame, a difference score is calculated against the previous sampled frame.
- Grayscale Conversion: Frames are converted to grayscale for a more efficient and color-agnostic comparison.
- Absolute Difference: OpenCV's `absdiff` function calculates the pixel-wise difference between the two frames.
- Mean Score: The mean of these differences is taken as the final score. A high score indicates a significant change between the frames.
- Thresholding: Only frames with a score above a predefined `FRAME_DIFFERENCE_THRESHOLD` (default 10.0) are considered significant and stored as candidates.
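The scoring step can be expressed in a few lines. This sketch uses NumPy as a dependency-free stand-in: for grayscale `uint8` frames, the mean of `cv2.absdiff(prev, curr)` equals the mean absolute difference computed below. The helper names are assumptions.

```python
import numpy as np

FRAME_DIFFERENCE_THRESHOLD = 10.0  # default from the text

def difference_score(prev_gray: np.ndarray, curr_gray: np.ndarray) -> float:
    """Mean absolute pixel difference between two grayscale frames.
    Equivalent to cv2.absdiff(prev, curr).mean() for uint8 inputs."""
    # Widen to int16 first so the subtraction cannot wrap around.
    diff = np.abs(prev_gray.astype(np.int16) - curr_gray.astype(np.int16))
    return float(diff.mean())

def is_significant(score: float) -> bool:
    return score > FRAME_DIFFERENCE_THRESHOLD
```

Note the cast to a signed type before subtracting: subtracting `uint8` arrays directly would wrap around and corrupt the score, which is also why OpenCV provides `absdiff` rather than plain subtraction.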
4. Final Selection Process
From the pool of high-scoring candidate frames, the final set is selected.
- Top N Selection: The candidates are sorted by their difference scores in descending order, and the top `N` frames are selected, where `N` is the target number of frames calculated in step 1.
- Even Sampling (with `--max-frames`): If the `--max-frames` argument is used, the selection logic changes. Instead of just taking the `N` highest-scoring frames (which might be clustered together), the system selects `N` frames that are evenly spaced throughout the list of all significant candidates. This ensures a more representative summary of the entire video, even if some high-action sequences are less represented.
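Both selection modes can be sketched in one function. This is a hypothetical reconstruction: the candidate representation (chronologically ordered `(frame_index, score)` tuples) and the function signature are assumptions.

```python
def select_frames(candidates, n, even_sampling=False):
    """candidates: list of (frame_index, score) tuples in chronological order.
    Returns n selected candidates, also in chronological order."""
    if even_sampling:
        # --max-frames mode: pick n candidates evenly spaced through
        # the chronological list of significant frames.
        if len(candidates) <= n:
            return candidates
        step = len(candidates) / n
        return [candidates[int(i * step)] for i in range(n)]
    # Default mode: take the n highest-scoring candidates.
    top = sorted(candidates, key=lambda c: c[1], reverse=True)[:n]
    return sorted(top)  # restore chronological order
```

The trade-off is visible in the two branches: the default branch maximizes total difference score but may cluster picks around one high-action sequence, while the even-sampling branch trades score for chronological coverage.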
Limitations
- Missed Changes: Significant visual changes that occur between sampling intervals may be missed.
- Rapid Sequences: A quick succession of major changes might only result in one frame being selected from that sequence.
- Even Sampling Trade-offs: When using `--max-frames`, the even sampling might skip a very high-scoring frame in favor of a lower-scoring one to maintain chronological spacing.