# LLM Integration

To support both local and cloud-based models, the Video Analyzer uses a flexible client-based architecture. This design makes it easy to switch between LLM providers or to add new ones in the future.
## The Base Client

All clients inherit from an abstract base class, `LLMClient`, defined in `video_analyzer/clients/llm_client.py`. This class enforces a common interface for all client implementations.
```python
from abc import ABC, abstractmethod
from typing import Any, Dict, Optional
import base64

class LLMClient(ABC):
    def encode_image(self, image_path: str) -> str:
        # Common base64 encoding for all clients
        with open(image_path, "rb") as image_file:
            return base64.b64encode(image_file.read()).decode('utf-8')

    @abstractmethod
    def generate(self,
                 prompt: str,
                 image_path: Optional[str] = None,
                 stream: bool = False,
                 model: str = "llama3.2-vision",
                 temperature: float = 0.2,
                 num_predict: int = 256) -> Dict[Any, Any]:
        pass
```
This ensures that the core application logic can interact with any client in the same way, simply by calling the `generate` method.
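The interchangeability this buys can be sketched as follows. The stub client and helper function below are illustrative only, not project code; they show that the core logic needs nothing beyond the `generate` contract.

```python
from typing import Any, Dict, Optional

class StubClient:
    # Illustrative stand-in (not project code) that satisfies the
    # LLMClient interface; a real client would call Ollama or an
    # OpenAI-compatible API here.
    def generate(self, prompt: str, image_path: Optional[str] = None,
                 stream: bool = False, model: str = "llama3.2-vision",
                 temperature: float = 0.2,
                 num_predict: int = 256) -> Dict[Any, Any]:
        return {"response": f"[stub] {prompt}"}

def describe_frame(client, prompt: str,
                   frame_path: Optional[str] = None) -> str:
    # Core logic depends only on the generate() contract,
    # so any client implementation can be swapped in.
    return client.generate(prompt=prompt, image_path=frame_path)["response"]

print(describe_frame(StubClient(), "Describe this frame."))
# → [stub] Describe this frame.
```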
## Client Implementations

The project includes two primary client implementations.
### 1. Ollama Client (`ollama.py`)

This client is used for running vision models locally via the Ollama service. It is the default client.

- Endpoint: It communicates with the local Ollama API, typically at `http://localhost:11434`.
- Image Handling: It encodes the frame image into a Base64 string and sends it within the `images` array in the JSON payload, which is the format Ollama expects.
- Configuration: Requires the Ollama service URL. This is configured in `config.json` under the `clients.ollama.url` key or with the `--ollama-url` command-line argument.
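A request built this way might look like the sketch below. The field names follow the public Ollama `/api/generate` API; the image string is a placeholder rather than a real encoded frame.

```python
import json

# Illustrative payload for a vision prompt against a local Ollama server.
payload = {
    "model": "llama3.2-vision",
    "prompt": "Describe this frame.",
    "stream": False,
    # Ollama expects bare base64 strings here, with no "data:" URI prefix.
    "images": ["<base64-encoded frame>"],
    "options": {"temperature": 0.2, "num_predict": 256},
}
print(json.dumps(payload, indent=2))
# POSTed to http://localhost:11434/api/generate
```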
### 2. Generic OpenAI API Client (`generic_openai_api.py`)

This client is designed to be compatible with any service that adheres to the OpenAI API specification. This includes OpenAI itself, as well as proxy services like OpenRouter.

- Endpoint: The API URL is configurable, allowing it to point to different services (e.g., `https://api.openai.com/v1` or `https://openrouter.ai/api/v1`).
- Image Handling: It sends the image as an `image_url` content block with a `data:image/jpeg;base64,...` URI, as specified by the OpenAI API for vision models.
- Authentication: It uses a Bearer token for authentication, requiring an API key.
- Configuration: Requires an `api_key` and `api_url`. These are configured in `config.json` under the `clients.openai_api` section or with the `--api-key` and `--api-url` arguments.
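For comparison with the Ollama payload, an OpenAI-style vision request can be sketched as below. The message structure follows the OpenAI chat completions API for vision models; the model name, key, and image data are placeholders, not values from the project.

```python
import json

api_url = "https://api.openai.com/v1"  # or https://openrouter.ai/api/v1
headers = {"Authorization": "Bearer <api_key>"}  # Bearer-token auth

# Illustrative request body: the frame travels as an image_url content
# block carrying a data: URI, alongside the text prompt.
body = {
    "model": "gpt-4o-mini",  # placeholder model name
    "messages": [{
        "role": "user",
        "content": [
            {"type": "text", "text": "Describe this frame."},
            {"type": "image_url",
             "image_url": {"url": "data:image/jpeg;base64,<base64 frame>"}},
        ],
    }],
    "temperature": 0.2,
}
print(json.dumps(body, indent=2))
# POSTed to f"{api_url}/chat/completions" with the headers above
```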