CLI Usage Guide

This guide covers all configuration options and command-line arguments for the video-analyzer tool, along with practical examples for different use cases.

Basic Usage

Local Analysis with Ollama (Default)

This is the simplest way to run the analyzer, assuming Ollama is installed and running.

video-analyzer path/to/video.mp4

Using an OpenAI-Compatible API (e.g., OpenRouter)

To use a remote service, you must specify the client, API key, and API URL.

video-analyzer path/to/video.mp4 --client openai_api --api-key your-key --api-url https://openrouter.ai/api/v1

Command-Line Arguments

The video-analyzer command accepts the following arguments to override default and file-based configurations.

| Argument | Description | Default | Example |
| --- | --- | --- | --- |
| video_path | Path to the input video file. | (Required) | my_video.mp4 |
| --config | Path to your custom configuration directory. | config/ | --config /path/to/my_config/ |
| --output | Output directory for analysis results. | output/ | --output ./results/ |
| --client | Client to use for LLM analysis. | ollama | --client openai_api |
| --ollama-url | URL for the Ollama service. | http://localhost:11434 | --ollama-url http://192.168.1.10:11434 |
| --api-key | API key for an OpenAI-compatible service. | None | --api-key sk-xxx... |
| --api-url | Base API URL for an OpenAI-compatible service. | None | --api-url https://openrouter.ai/api/v1 |
| --model | Name of the vision model to use. | llama3.2-vision | --model gpt-4o |
| --duration | Duration (in seconds) to process from the start of the video. | Full video | --duration 60 |
| --keep-frames | If set, extracted frames are not deleted after analysis. | False | --keep-frames |
| --whisper-model | Whisper model size or path. | medium | --whisper-model large |
| --start-stage | Stage to start processing from (1, 2, or 3). | 1 | --start-stage 2 |
| --max-frames | Maximum number of frames to process; frames are sampled evenly across the video. | No limit | --max-frames 10 |
| --log-level | Set the logging verbosity. | INFO | --log-level DEBUG |
| --prompt | A specific question to ask about the video. | "" | --prompt "What is the primary subject?" |
| --language | Language for audio transcription (e.g., 'en', 'es'). | Auto-detect | --language en |
| --device | Device for the Whisper model. | cpu | --device cuda |
| --temperature | Temperature for LLM generation (0.0-1.0). | 0.2 | --temperature 0.7 |
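
These flags can be combined in a single invocation. As a quick illustration (the file name and specific values below are placeholders), the following command processes only the first two minutes of the video, forces Spanish transcription, keeps the extracted frames, and raises the logging verbosity:

video-analyzer interview.mp4 \
    --duration 120 \
    --language es \
    --keep-frames \
    --log-level DEBUG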

Processing Stages

The --start-stage argument allows you to resume a failed analysis or re-run parts of the process. The stages are:

  1. Frame and Audio Processing: Extracts keyframes and transcribes audio.
  2. Frame Analysis: Sends each keyframe to the LLM for description.
  3. Video Reconstruction: Combines frame analyses and the transcript into a final summary.

For resuming to work correctly, the initial run must include --keep-frames so that the extracted frames remain available to the later stages.
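
For example, if the per-frame analysis has already completed and only the final summary needs to be regenerated, you could resume at stage 3 (this assumes the frames and intermediate results from the initial --keep-frames run are still present):

# Re-run only the final video reconstruction (stage 3)
video-analyzer video.mp4 --start-stage 3 --keep-frames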

Common Use Cases

High-Quality Analysis with a Custom Prompt

Use a larger Whisper model and a powerful cloud-based LLM for the best results.

video-analyzer video.mp4 \
    --client openai_api \
    --api-key your-key \
    --api-url https://openrouter.ai/api/v1 \
    --model anthropic/claude-3.5-sonnet:free \
    --whisper-model large \
    --prompt "Focus on the interactions between people."

Resume from Frame Analysis Stage

If the initial frame extraction succeeded but the LLM analysis failed, you can resume from stage 2.

# First run (failed during analysis)
video-analyzer video.mp4 --keep-frames

# Second run (resume from stage 2)
video-analyzer video.mp4 --start-stage 2 --keep-frames

Analyze a Long Video with Evenly Sampled Frames

Use --max-frames to ensure you get a representative sample from across the entire video, rather than just the most active scenes at the beginning.

video-analyzer long_video.mp4 --max-frames 10

This will extract 10 frames evenly spaced across the video's duration.
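
If you also want to inspect which moments were sampled, you can combine this with --keep-frames and a dedicated output directory (assuming, as a sketch, that the extracted frames are written under the --output location):

video-analyzer long_video.mp4 --max-frames 10 --keep-frames --output ./sampled/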

GPU-Accelerated Transcription

If you have a compatible NVIDIA GPU, you can accelerate the Whisper transcription process.

video-analyzer video.mp4 --device cuda
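
GPU acceleration can also be paired with a larger Whisper model for higher-quality transcripts, provided your GPU has enough memory for it (an assumption that depends on your hardware):

video-analyzer video.mp4 --device cuda --whisper-model large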