LLM Integration

To support both local and cloud-based models, the Video Analyzer uses a flexible client-based architecture. This design makes it easy to switch between different LLM providers or even add new ones in the future.

The Base Client

All clients inherit from a base abstract class, LLMClient, defined in video_analyzer/clients/llm_client.py. This class enforces a common interface for all client implementations.

import base64
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional

class LLMClient(ABC):
    def encode_image(self, image_path: str) -> str:
        # Common base64 encoding shared by all clients
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')

    @abstractmethod
    def generate(self,
        prompt: str,
        image_path: Optional[str] = None,
        stream: bool = False,
        model: str = "llama3.2-vision",
        temperature: float = 0.2,
        num_predict: int = 256) -> Dict[Any, Any]:
        # Each concrete client sends the prompt (and optional image) to its backend
        pass

This ensures that the core application logic can interact with any client in the same way, simply by calling the generate method.
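As a minimal sketch of that polymorphism, any subclass only needs to implement generate to be usable by the core logic. The EchoClient below is a hypothetical stub for illustration, not part of the project:

```python
import base64
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional


class LLMClient(ABC):
    """Abridged copy of the base class so the example is self-contained."""

    def encode_image(self, image_path: str) -> str:
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode("utf-8")

    @abstractmethod
    def generate(self, prompt: str, image_path: Optional[str] = None,
                 stream: bool = False, model: str = "llama3.2-vision",
                 temperature: float = 0.2,
                 num_predict: int = 256) -> Dict[Any, Any]: ...


class EchoClient(LLMClient):
    """Hypothetical stub: returns the prompt instead of calling a real model."""

    def generate(self, prompt: str, image_path: Optional[str] = None,
                 stream: bool = False, model: str = "llama3.2-vision",
                 temperature: float = 0.2,
                 num_predict: int = 256) -> Dict[Any, Any]:
        return {"model": model, "response": f"echo: {prompt}"}


# The caller only sees the LLMClient interface.
client: LLMClient = EchoClient()
result = client.generate("Describe this frame.")
```

Swapping EchoClient for a real client changes nothing in the calling code, which is the point of the shared abstract interface.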

Client Implementations

The project includes two primary client implementations.

1. Ollama Client (ollama.py)

This client is used for running vision models locally via the Ollama service. It is the default client.

  • Endpoint: It communicates with the local Ollama API, typically at http://localhost:11434.
  • Image Handling: It encodes the frame image into a Base64 string and sends it within the images array in the JSON payload, which is the format Ollama expects.
  • Configuration: Requires the Ollama service URL. This is configured in config.json under the clients.ollama.url key or with the --ollama-url command-line argument.
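The request body can be sketched as follows. Ollama's /api/generate endpoint accepts a payload of this shape; build_ollama_payload is an illustrative helper, not project code, and the project's exact field choices may differ:

```python
import base64
from typing import Any, Dict, Optional


def build_ollama_payload(prompt: str,
                         image_b64: Optional[str] = None,
                         model: str = "llama3.2-vision",
                         temperature: float = 0.2,
                         num_predict: int = 256,
                         stream: bool = False) -> Dict[str, Any]:
    """Assemble the JSON body for POST {ollama_url}/api/generate."""
    payload: Dict[str, Any] = {
        "model": model,
        "prompt": prompt,
        "stream": stream,
        "options": {"temperature": temperature, "num_predict": num_predict},
    }
    if image_b64 is not None:
        # Ollama expects raw base64 strings (no data: URI prefix) in "images".
        payload["images"] = [image_b64]
    return payload


fake_b64 = base64.b64encode(b"\xff\xd8\xff").decode("utf-8")  # stand-in JPEG bytes
payload = build_ollama_payload("What is happening in this frame?", fake_b64)
# The body would then be posted with e.g.
# requests.post(f"{ollama_url}/api/generate", json=payload)
```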

2. Generic OpenAI API Client (generic_openai_api.py)

This client is designed to be compatible with any service that adheres to the OpenAI API specification. This includes OpenAI itself, as well as proxy services like OpenRouter.

  • Endpoint: The API URL is configurable, allowing it to point to different services (e.g., https://api.openai.com/v1 or https://openrouter.ai/api/v1).
  • Image Handling: It sends the image as an image_url content block with a data:image/jpeg;base64,... URI, as specified by the OpenAI API for vision models.
  • Authentication: It uses a Bearer token for authentication, requiring an API key.
  • Configuration: Requires an api_key and api_url. These are configured in config.json under the clients.openai_api section or with the --api-key and --api-url arguments.
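A sketch of the request this client would assemble, following the OpenAI chat-completions format for vision input. The helper name and the model default (gpt-4o-mini) are illustrative assumptions, not the project's actual code:

```python
import base64
from typing import Any, Dict


def build_openai_request(prompt: str, image_b64: str, api_key: str,
                         model: str = "gpt-4o-mini") -> Dict[str, Any]:
    """Assemble headers and body for POST {api_url}/chat/completions."""
    headers = {
        "Authorization": f"Bearer {api_key}",  # Bearer-token authentication
        "Content-Type": "application/json",
    }
    body = {
        "model": model,
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                # Image travels as a data: URI inside an image_url block.
                {"type": "image_url",
                 "image_url": {"url": f"data:image/jpeg;base64,{image_b64}"}},
            ],
        }],
    }
    return {"headers": headers, "json": body}


img = base64.b64encode(b"\xff\xd8\xff").decode("utf-8")  # stand-in JPEG bytes
req = build_openai_request("Describe this frame.", img, api_key="sk-example")
```

Because only the base URL and key change, the same request shape works against OpenAI, OpenRouter, or any other OpenAI-compatible endpoint.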