Configuration

The Video Analyzer uses a flexible, cascading configuration system that allows you to customize its behavior at different levels.

Configuration Priority

The settings are applied in the following order of precedence, with later settings overriding earlier ones:

  1. Default Configuration: The base settings defined in video_analyzer/config/default_config.json.
  2. User Configuration File: A config.json file you create. By default, the tool looks for it in a config/ directory under your current working directory, but you can specify a different location with the --config argument.
  3. Command-Line Arguments: Any arguments you provide when running video-analyzer have the highest priority, as shown in the example below.
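
For example, suppose your config.json overrides the default client (the snippet shows only the relevant key, assuming settings merge per key):

{
    "clients": {
        "default": "openai_api"
    }
}

Running video-analyzer input.mp4 --client ollama would still use Ollama, because the command-line flag outranks both configuration files (the input file name here is illustrative).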

Configuration File Structure

Your user config.json file should follow the same structure as the default configuration. Here is the default config file for reference:

{
    "clients": {
        "default": "ollama",
        "temperature": 0.0,
        "ollama": {
            "url": "http://localhost:11434",
            "model": "llama3.2-vision"
        },
        "openai_api": {
            "api_key": "",
            "model": "meta-llama/llama-3.2-11b-vision-instruct",
            "api_url": "https://openrouter.ai/api/v1"
        }
    },
    "prompt_dir": "prompts",
    "prompts": [
        {
            "name": "Frame Analysis",
            "path": "frame_analysis/frame_analysis.txt"
        },
        {
            "name": "Video Reconstruction",
            "path": "frame_analysis/describe.txt"
        }
    ],
    "output_dir": "output",
    "frames": {
        "per_minute": 60,
        "analysis_threshold": 10.0,
        "min_difference": 5.0,
        "max_count": 30,
        "start_stage": 1,
        "max_frames": 2147483647
    },
    "response_length": {
        "frame": 300,
        "reconstruction": 1000,
        "narrative": 500
    },
    "audio": {
        "whisper_model": "medium",
        "sample_rate": 16000,
        "channels": 1,
        "quality_threshold": 0.2,
        "chunk_length": 30,
        "language_confidence_threshold": 0.8,
        "language": "en",
        "device": "cpu"
    },
    "keep_frames": false,
    "prompt": ""
}
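
In practice you usually only include the keys you want to change. Assuming the user file is merged over the defaults key by key (so omitted keys keep their default values), a minimal config.json that switches the Ollama model and keeps extracted frames might look like this (the model tag is only an example):

{
    "clients": {
        "ollama": {
            "model": "llama3.2-vision:11b"
        }
    },
    "keep_frames": true
}

If the tool instead expects a complete file, copy the default configuration above and edit only the values you need.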

Configuration Options Explained

clients

  • default: The default LLM client to use (ollama or openai_api). Can be overridden by --client.
  • temperature: Controls the randomness of the LLM's output. Higher values (e.g., 0.8) are more creative, while lower values (e.g., 0.0) are more deterministic.
  • ollama: Settings for the local Ollama client.
    • url: The URL of your Ollama service.
    • model: The default Ollama model to use.
  • openai_api: Settings for any OpenAI-compatible API client.
    • api_key: Your API key.
    • model: The default model name for the service.
    • api_url: The base URL of the API endpoint.
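
For instance, to make an OpenAI-compatible provider the default, you could set default to openai_api and fill in the credentials (the API key below is a placeholder; the model and URL repeat the defaults):

{
    "clients": {
        "default": "openai_api",
        "temperature": 0.2,
        "openai_api": {
            "api_key": "sk-your-key-here",
            "model": "meta-llama/llama-3.2-11b-vision-instruct",
            "api_url": "https://openrouter.ai/api/v1"
        }
    }
}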

prompt_dir & prompts

  • prompt_dir: Path to a directory containing custom prompts. See the Custom Prompts guide for more details.
  • prompts: A list defining the prompts used by the application.
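
For example, to load prompts from your own directory, you might point prompt_dir at it and register your file under the existing prompt name (the directory and file names here are hypothetical; see the Custom Prompts guide for the exact lookup rules):

{
    "prompt_dir": "/home/user/my_prompts",
    "prompts": [
        {
            "name": "Frame Analysis",
            "path": "custom/my_frame_analysis.txt"
        }
    ]
}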

output_dir

  • The directory where analysis.json and other artifacts are saved.

frames

  • per_minute: The target number of keyframes to extract for every minute of video.
  • max_count: An older setting for the maximum number of frames to extract. It's recommended to use the --max-frames argument instead for more predictable behavior.
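
For example, to sample more sparsely you could lower per_minute in your user config (again assuming per-key merging):

{
    "frames": {
        "per_minute": 10
    }
}

To cap the total number of frames, prefer the command line instead, e.g. video-analyzer input.mp4 --max-frames 30.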

audio

  • whisper_model: The size of the Whisper model to use (tiny, base, small, medium, large).
  • language: The language code (e.g., 'en', 'fr') to use for transcription. If null or not set, the language is auto-detected.
  • device: The hardware to use for transcription (cpu, cuda, mps).
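
For example, to run a larger Whisper model on a GPU and let the language be auto-detected:

{
    "audio": {
        "whisper_model": "large",
        "language": null,
        "device": "cuda"
    }
}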

General Settings

  • keep_frames: If true, the extracted frame images will not be deleted after the analysis is complete.
  • prompt: A default question or prompt to apply to all analyses. Can be overridden with the --prompt argument.
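
For example, to ask the same question of every analysis and keep the extracted frames for inspection (the question is just an illustration):

{
    "keep_frames": true,
    "prompt": "What product is shown in this video?"
}

A one-off question can be passed with --prompt instead, which overrides this value for that run.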