Video Analyzer: Overview
Video Analyzer is a powerful command-line tool that leverages large language vision models (LLMs) and audio transcription to create detailed, narrative descriptions of video content. By intelligently extracting key frames, transcribing audio with high accuracy, and feeding this information to models like Llama 3.2 Vision, it can generate comprehensive summaries of what's happening in a video.
This tool is designed for developers, researchers, and content creators who need to programmatically understand and summarize video files. It can run entirely on local hardware for privacy and cost-effectiveness or connect to cloud-based, OpenAI-compatible APIs for enhanced speed and scale.
Key Features
- 💻 Local-First Operation: Can run completely locally using Ollama, requiring no cloud services or API keys for core functionality.
- ☁️ Cloud Flexibility: Seamlessly integrates with any OpenAI API-compatible service (like OpenRouter or OpenAI) for access to a wide range of models.
- 🎬 Intelligent Frame Extraction: Automatically identifies and extracts the most significant frames from a video, focusing on moments of change and action.
- 🔊 High-Quality Transcription: Utilizes OpenAI's Whisper (via
faster-whisper
) for accurate audio transcription, with automatic handling of poor-quality audio. - 👁️ Contextual Frame Analysis: Each key frame is analyzed with the context of previous frames, building a coherent narrative throughout the video.
- 📝 Comprehensive Descriptions: Combines visual analysis and audio transcripts to generate a detailed, natural language description of the video's content.
- 📊 Structured JSON Output: Produces a detailed JSON file containing analysis metadata, the full audio transcript, frame-by-frame descriptions, and the final summary.
- ⚙️ Highly Configurable: Offers extensive customization through a cascading configuration system, allowing control via command-line arguments or a
config.json
file.
The Problem It Solves
Manually reviewing and summarizing video content is a time-consuming and subjective process. Video Analyzer automates this task by applying the analytical power of modern AI models to both the visual and auditory components of a video. This enables scalable video content analysis, automated metadata generation, and new ways to search and interact with video libraries.
License
This project is licensed under the Apache License 2.0.