# CLI Usage Guide
This guide covers all configuration options and command-line arguments for the `video-analyzer` tool, along with practical examples for different use cases.
## Basic Usage
### Local Analysis with Ollama (Default)

This is the simplest way to run the analyzer, assuming Ollama is installed and running.

```bash
video-analyzer path/to/video.mp4
```
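Since the default vision model is `llama3.2-vision` (see the argument table below), Ollama needs a local copy of it before the first run. A minimal setup sketch, assuming a stock Ollama install on its default port:

```bash
# Pull the default vision model (one-time setup).
ollama pull llama3.2-vision

# Optional: confirm the Ollama server is reachable and lists the model.
curl http://localhost:11434/api/tags
```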
### Using an OpenAI-Compatible API (e.g., OpenRouter)

To use a remote service, you must specify the client, API key, and API URL.

```bash
video-analyzer path/to/video.mp4 --client openai_api --api-key your-key --api-url https://openrouter.ai/api/v1
```
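Passing the key directly on the command line stores it in your shell history. One way around that is an environment variable; a sketch where the variable name `OPENROUTER_API_KEY` is arbitrary:

```bash
# Export the key once per session; the variable name is your choice.
export OPENROUTER_API_KEY="your-key"

video-analyzer path/to/video.mp4 \
  --client openai_api \
  --api-key "$OPENROUTER_API_KEY" \
  --api-url https://openrouter.ai/api/v1
```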
## Command-Line Arguments

The `video-analyzer` command accepts the following arguments, which override defaults and file-based configuration.
| Argument | Description | Default | Example |
|---|---|---|---|
| `video_path` | Path to the input video file. | (Required) | `my_video.mp4` |
| `--config` | Path to your custom configuration directory. | `config/` | `--config /path/to/my_config/` |
| `--output` | Output directory for analysis results. | `output/` | `--output ./results/` |
| `--client` | Client to use for LLM analysis. | `ollama` | `--client openai_api` |
| `--ollama-url` | URL for the Ollama service. | `http://localhost:11434` | `--ollama-url http://192.168.1.10:11434` |
| `--api-key` | API key for an OpenAI-compatible service. | None | `--api-key sk-xxx...` |
| `--api-url` | Base API URL for an OpenAI-compatible service. | None | `--api-url https://openrouter.ai/api/v1` |
| `--model` | Name of the vision model to use. | `llama3.2-vision` | `--model gpt-4o` |
| `--duration` | Duration of the video, in seconds, to process from the start. | Full video | `--duration 60` |
| `--keep-frames` | If set, extracted frames are not deleted after analysis. | `False` | `--keep-frames` |
| `--whisper-model` | Whisper model size or path. | `medium` | `--whisper-model large` |
| `--start-stage` | Stage to start processing from (1, 2, or 3). | `1` | `--start-stage 2` |
| `--max-frames` | Maximum number of frames to process, sampled evenly across the video. | No limit | `--max-frames 10` |
| `--log-level` | Logging verbosity. | `INFO` | `--log-level DEBUG` |
| `--prompt` | A specific question to ask about the video. | `""` | `--prompt "What is the primary subject?"` |
| `--language` | Language for audio transcription (e.g., `en`, `es`). | Auto-detect | `--language en` |
| `--device` | Device for the Whisper model. | `cpu` | `--device cuda` |
| `--temperature` | Temperature for LLM generation (0.0-1.0). | `0.2` | `--temperature 0.7` |
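As a combined example, a run that layers several of these overrides together (all flags come from the table above):

```bash
# Analyze only the first two minutes, keep the extracted frames,
# and log verbosely while debugging.
video-analyzer my_video.mp4 \
  --output ./results/ \
  --duration 120 \
  --keep-frames \
  --log-level DEBUG
```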
## Processing Stages

The `--start-stage` argument allows you to resume a failed analysis or re-run parts of the process. The stages are:

1. **Frame and Audio Processing**: extracts keyframes and transcribes audio.
2. **Frame Analysis**: sends each keyframe to the LLM for description.
3. **Video Reconstruction**: combines the frame analyses and the transcript into a final summary.

For resuming to work correctly, you must run with `--keep-frames` on the initial run so that the frames are available for later stages (see the sketch below).
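For example, if stages 1 and 2 completed but the final summary step failed, you could redo only stage 3 (the stage numbers map to the list above):

```bash
# Frames and per-frame analyses already exist from a prior --keep-frames run;
# redo only the final reconstruction step.
video-analyzer video.mp4 --start-stage 3 --keep-frames
```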
## Common Use Cases
### High-Quality Analysis with a Custom Prompt

Use a larger Whisper model and a powerful cloud-based LLM for the best results.

```bash
video-analyzer video.mp4 \
  --client openai_api \
  --api-key your-key \
  --api-url https://openrouter.ai/api/v1 \
  --model anthropic/claude-3.5-sonnet:free \
  --whisper-model large \
  --prompt "Focus on the interactions between people."
```
### Resume from the Frame Analysis Stage

If the initial frame extraction succeeded but the LLM analysis failed, you can resume from stage 2.

```bash
# First run (failed during analysis)
video-analyzer video.mp4 --keep-frames

# Second run (resume from stage 2)
video-analyzer video.mp4 --start-stage 2 --keep-frames
```
### Analyze a Long Video with Evenly Sampled Frames

Use `--max-frames` to get a representative sample from across the entire video, rather than just the most active scenes at the beginning.

```bash
video-analyzer long_video.mp4 --max-frames 10
```

This extracts 10 frames evenly spaced across the video's duration.
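If only the opening portion of a long recording matters, `--max-frames` can be combined with `--duration`; a sketch using the documented flags (that sampling is then restricted to the processed window is an assumption about how the two flags interact):

```bash
# Sample 10 frames, processing only the first 5 minutes (300 seconds).
video-analyzer long_video.mp4 --duration 300 --max-frames 10
```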
### GPU-Accelerated Transcription

If you have a compatible NVIDIA GPU, you can accelerate the Whisper transcription step.

```bash
video-analyzer video.mp4 --device cuda
```
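Before switching devices, it can help to confirm that CUDA is actually visible to your Python environment. A quick check, assuming the Whisper backend is PyTorch-based:

```bash
# Prints True when a CUDA-capable GPU and a matching PyTorch build are available.
python -c "import torch; print(torch.cuda.is_available())"
```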