Customizing Prompts

The Video Analyzer's behavior is driven by a set of prompt templates. You can override the default prompts with your own to control how frames are analyzed and how the final summary is generated.

How Prompts are Loaded

The PromptLoader class loads prompts. It searches for each prompt file in the following order of priority:

  1. User-Specified Directory: If a prompt_dir is defined in your config.json, the loader will look for prompts there first. This is the recommended way to use custom prompts.
  2. Package Resources: If a custom directory is not provided or the file isn't found there, the loader falls back to the default prompts included with the package.
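
The lookup order above can be sketched roughly as follows. This is an illustration of the documented priority only, not the actual PromptLoader code: the function name resolve_prompt_path and the package_prompt_dir argument are hypothetical stand-ins, and the real loader may locate its packaged defaults differently.

```python
from pathlib import Path
from typing import Optional


def resolve_prompt_path(relative_path: str,
                        package_prompt_dir: Path,
                        user_prompt_dir: Optional[Path] = None) -> Path:
    """Return the first existing prompt file, checking the user directory first.

    Hypothetical helper: it mirrors the documented priority order only.
    """
    # 1. User-specified directory (prompt_dir in config.json), if configured.
    if user_prompt_dir is not None:
        candidate = user_prompt_dir / relative_path
        if candidate.is_file():
            return candidate

    # 2. Fall back to the default prompts shipped with the package.
    fallback = package_prompt_dir / relative_path
    if fallback.is_file():
        return fallback

    raise FileNotFoundError(f"Prompt file not found: {relative_path}")
```

With this shape, a custom directory only needs to contain the prompts you want to change; any file missing from it is served from the defaults.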

Configuring a Custom Prompt Directory

To use your own prompts, create a directory that mirrors the layout of the default prompts (for example, a frame_analysis/ subdirectory containing frame_analysis.txt and describe.txt). Then update your config.json to point to it:

{
  "prompt_dir": "/path/to/my_prompts",
  "prompts": [
    {
      "name": "Frame Analysis",
      "path": "frame_analysis/frame_analysis.txt"
    },
    {
      "name": "Video Reconstruction",
      "path": "frame_analysis/describe.txt"
    }
  ]
}

Now, when the analyzer looks for the "Frame Analysis" prompt, it will load /path/to/my_prompts/frame_analysis/frame_analysis.txt.
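
To make the path resolution concrete, the sketch below parses the configuration shown above and joins prompt_dir with each entry's path. The helper name prompt_paths is hypothetical and exists only for illustration; the analyzer's own config handling may differ.

```python
import json
from pathlib import Path
from typing import Dict

# Same configuration as in the example above, inlined so the sketch is runnable.
config_text = """
{
  "prompt_dir": "/path/to/my_prompts",
  "prompts": [
    {"name": "Frame Analysis", "path": "frame_analysis/frame_analysis.txt"},
    {"name": "Video Reconstruction", "path": "frame_analysis/describe.txt"}
  ]
}
"""


def prompt_paths(config: dict) -> Dict[str, Path]:
    """Map each prompt name to prompt_dir joined with its relative path."""
    base = Path(config["prompt_dir"])
    return {entry["name"]: base / entry["path"] for entry in config["prompts"]}


paths = prompt_paths(json.loads(config_text))
for name, path in paths.items():
    print(f"{name}: {path.as_posix()}")
```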

The {prompt} Token

Both default prompts include a {prompt} placeholder. This token is replaced with the text provided via the --prompt command-line argument. Keep this token in your custom prompts if you want to retain the ability to ask specific questions about the video.
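
A minimal sketch of the substitution, assuming it is a plain text replacement (the analyzer's actual mechanism is not shown here, and apply_user_prompt is a hypothetical name):

```python
def apply_user_prompt(template: str, user_prompt: str = "") -> str:
    """Replace the {prompt} token with the user's question.

    str.replace is used rather than str.format so that other braces in the
    template (for example {PREVIOUS_FRAMES}) are left untouched.
    """
    return template.replace("{prompt}", user_prompt)


template = "Describe the frame.\n\n{prompt}"
print(apply_user_prompt(template, "Is anyone wearing a hat?"))
```

Under this assumption, a template without the token is returned unchanged, so the question passed on the command line would simply never reach the model.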

Default Prompts

Below are the contents of the default prompts used by the application.

1. Frame Analysis Prompt

File: video_analyzer/prompts/frame_analysis/frame_analysis.txt

This prompt guides the LLM in analyzing a single frame. It instructs the model to consider the context of previous frames ({PREVIOUS_FRAMES}) and to structure its output consistently.

Frame Description Instructions
Previous Notes Section
[Previous frame descriptions will appear here in chronological order]

{PREVIOUS_FRAMES}

Your Tasks
You are viewing Frame [X] of this video sequence. Your goal is to document what you observe in a way that contributes to a coherent narrative of the entire video.
Step 1: Quick Scan

Watch for key changes from the previous descriptions
Note any new elements or developments
Identify if this is a transition moment or continuation

Step 2: Document Your Frame
Follow this structure for your notes:

Setting/Scene (if changed from previous)

Only describe if there's a notable change from previous frames
Include any new environmental details


Action/Movement

What is happening in this specific moment?
Focus on motion and changes
Note the direction of movement
Describe gestures or expressions


New Information

Document any new objects, people, or text that appears
Note any changes in audio described in previous frames
Record any new dialogue or text shown


Continuity Points

Connect your observations to previous notes
Highlight how this frame advances the narrative
Note if something mentioned in previous frames is no longer visible


Writing Guidelines

Use present tense
Be specific and concise
Avoid interpretation - stick to what you can see
Use clear transitional phrases to connect to previous descriptions
Include timestamp if available

{prompt}

Format Your Notes As:

Frame [X] If there are no existing frames you are Frame 0, otherwise you're the next frame
[Your observations following the structure above]

Key continuation points:
- [List 2-3 elements that the next viewer should particularly watch for]
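
How the {PREVIOUS_FRAMES} and {prompt} tokens in the prompt above get filled is not specified on this page. One plausible approach, sketched here under that caveat, joins the earlier frame notes in chronological order and substitutes both tokens in a single pass. The function name build_frame_prompt and the blank-line separator between notes are assumptions.

```python
import re
from typing import Mapping, Sequence

# Tokens that appear in the default frame analysis prompt.
FRAME_TOKENS = re.compile(r"\{(PREVIOUS_FRAMES|prompt)\}")


def build_frame_prompt(template: str,
                       previous_notes: Sequence[str],
                       user_prompt: str = "") -> str:
    """Fill the frame analysis template in a single pass.

    A single regex pass means text inserted for one token is never
    rescanned, so a literal "{prompt}" inside earlier notes stays as-is.
    """
    values: Mapping[str, str] = {
        # Assumed separator: one blank line between consecutive frame notes.
        "PREVIOUS_FRAMES": "\n\n".join(previous_notes),
        "prompt": user_prompt,
    }
    return FRAME_TOKENS.sub(lambda match: values[match.group(1)], template)
```

For the first frame, previous_notes is empty and the token is replaced with an empty string, which matches the prompt's instruction that a viewer with no existing frames is Frame 0.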

2. Video Reconstruction Prompt

File: video_analyzer/prompts/frame_analysis/describe.txt

This prompt is used in the final stage to synthesize all frame analyses ({FRAME_NOTES}) and the audio transcript ({TRANSCRIPT}) into a single video summary.

Video Summary Instructions
Available Materials
Frame 1 of the video (viewable)

Available Materials
Complete set of chronological frame descriptions from all previous viewers

{FRAME_NOTES}

Video Transcript

{TRANSCRIPT}


Your Task
You are synthesizing multiple frame descriptions into a cohesive video summary. You have access to the first frame and detailed notes about all subsequent frames.
Step 1: Review Process

First Frame Analysis

Study your available frame in detail
Note opening composition, characters, and setting
Identify the initial tone and context


Notes Review

Read through all frame descriptions chronologically
Mark key transitions and major developments
Identify narrative patterns and themes
Note any inconsistencies or gaps in descriptions


Step 2: Synthesis Guidelines
Create your summary following this structure:

Opening Description

Begin with what you can directly verify from your frame
Establish the initial setting, characters, and situation


Narrative Development

Build the story chronologically
Connect scenes and transitions naturally
Maintain consistent character descriptions
Track significant object movements and changes
Include relevant audio elements mentioned in notes


Technical Elements

Note camera movements described
Include editing transitions
Reference significant visual effects
Mention notable lighting or composition changes


Writing Style Guidelines

Write in present tense
Use clear, active voice
Maintain objective descriptions
Avoid speculation beyond provided notes
Include specific details that build credibility
Connect scenes with smooth transitions
Maintain consistent tone throughout

Quality Check
Before submitting, verify:

The summary flows naturally
No contradictions exist between sections
All major elements from notes are included
The narrative is coherent for someone who hasn't seen the video
Technical terms are used correctly
The opening matches your viewed frame
Transitions between described frames feel natural

{prompt}

Format Your Summary As:

VIDEO SUMMARY
Duration: [if provided in notes]

[Opening paragraph - based on your viewed frame]

[Main body - chronological progression from notes]

[Closing observations - final state/resolution]

Note: This summary is based on direct observation of the first frame combined with detailed notes from subsequent frames.
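
Before pointing prompt_dir at a custom file, it can help to check that the file still contains the placeholders used by the default it replaces. The sketch below is a hypothetical pre-flight check, not part of the tool; the token sets are taken from the two default prompts shown above.

```python
import re
from typing import Dict, Set

# Placeholders used by the two default prompts, as documented above.
EXPECTED_TOKENS: Dict[str, Set[str]] = {
    "Frame Analysis": {"PREVIOUS_FRAMES", "prompt"},
    "Video Reconstruction": {"FRAME_NOTES", "TRANSCRIPT", "prompt"},
}


def missing_tokens(prompt_name: str, template: str) -> Set[str]:
    """Return the expected placeholders that the template does not contain."""
    found = set(re.findall(r"\{(\w+)\}", template))
    return EXPECTED_TOKENS[prompt_name] - found


custom = "Summarize these notes:\n\n{FRAME_NOTES}\n"
print(sorted(missing_tokens("Video Reconstruction", custom)))
```

An empty result means every placeholder from the default prompt is still present. A missing token is not necessarily an error, since you may have dropped it on purpose.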