Advanced Model Options (Ollama)

Hollama lets you fine-tune the behavior of Ollama models by adjusting their parameters. These controls are available from the Controls tab within a session's prompt editor.

Note: These advanced options are currently only available for Ollama models.

To access these options, click the Settings icon next to the model selector.

Below is a list of the available parameters, based on the OllamaOptions interface defined in the source code.

Model Options

These parameters control the generation process and sampling.

interface OllamaOptions {
  // Generation
  mirostat: number;        // Enable Mirostat sampling.
  mirostat_eta: number;    // Mirostat learning rate.
  mirostat_tau: number;    // Mirostat target surprise value.
  num_ctx: number;         // Context window size.
  num_predict: number;     // Max number of tokens to predict.
  repeat_last_n: number;   // How far back to look for repetitions.
  repeat_penalty: number;  // Penalty for repetition.
  temperature: number;     // Controls randomness. Higher is more creative.
  seed: number;            // Random seed for reproducibility.
  stop: string[];          // Sequences where the model will stop generating.
  tfs_z: number;           // Tail-free sampling.
  top_k: number;           // Top-K sampling.
  top_p: number;           // Top-P (nucleus) sampling.
  min_p: number;           // Min-P sampling.

  // Penalties
  penalize_newline: boolean;   // Penalize newline tokens in the output.
  presence_penalty: number;    // Penalize tokens that have already appeared.
  frequency_penalty: number;   // Penalize tokens in proportion to how often they appear.

  // Sampling (continued)
  typical_p: number;           // Typical-P (locally typical) sampling.
}
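These fields map onto the options object accepted by Ollama's REST API. As a minimal sketch (the subset of fields, the helper name buildGenerateRequest, and the example values are illustrative assumptions, not Hollama's actual implementation), a generation request carrying some of these options might be assembled like this:

```typescript
// Illustrative subset of the generation options documented above.
interface OllamaOptions {
  num_ctx: number;       // Context window size.
  num_predict: number;   // Max number of tokens to predict.
  temperature: number;   // Controls randomness. Higher is more creative.
  seed: number;          // Random seed for reproducibility.
  stop: string[];        // Sequences where the model will stop generating.
  top_k: number;         // Top-K sampling.
  top_p: number;         // Top-P (nucleus) sampling.
}

// Shape of an Ollama /api/generate request body: options are optional
// and any field left out falls back to the model's own default.
interface GenerateRequest {
  model: string;
  prompt: string;
  options: Partial<OllamaOptions>;
}

// Hypothetical helper that bundles a prompt with a partial set of options.
function buildGenerateRequest(
  model: string,
  prompt: string,
  options: Partial<OllamaOptions> = {}
): GenerateRequest {
  return { model, prompt, options };
}

// Example: a low-temperature, reproducible, length-capped completion.
const request = buildGenerateRequest("llama3.1", "Why is the sky blue?", {
  temperature: 0.2,
  seed: 42,
  num_predict: 128,
  stop: ["\n\n"],
});
```

Because every field is optional, a session only needs to send the parameters the user has actually changed; everything else keeps the model's defaults.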

Runtime Options

These parameters affect how the model is loaded and run on the hardware.

interface OllamaOptions {
  // Hardware & Performance
  num_gpu: number;         // Number of GPU layers to use.
  main_gpu: number;        // Main GPU to use.
  low_vram: boolean;       // Use for systems with low VRAM.
  f16_kv: boolean;         // Use 16-bit floats for KV cache.
  numa: boolean;           // Enable NUMA support.
  num_batch: number;       // Batch size for prompt processing.
  num_thread: number;      // Number of threads to use.

  // Memory Management
  use_mmap: boolean;       // Use memory-mapped files.
  use_mlock: boolean;      // Force the model to be kept in RAM.

  // Other
  num_keep: number;        // Number of tokens to keep from the start of the context.
  vocab_only: boolean;     // Load only the vocabulary, not the model weights.
}
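A common pattern is to merge the user's runtime overrides onto a set of defaults before the request is sent. The sketch below illustrates this with the spread operator; the default values and the withOverrides helper are assumptions for the example, not Ollama's or Hollama's actual defaults:

```typescript
// Illustrative subset of the runtime options documented above.
interface RuntimeOptions {
  num_gpu: number;      // Number of GPU layers to use.
  low_vram: boolean;    // Use for systems with low VRAM.
  num_thread: number;   // Number of threads to use.
  use_mmap: boolean;    // Use memory-mapped files.
  use_mlock: boolean;   // Force the model to be kept in RAM.
}

// Assumed defaults for the sake of the example.
const defaults: RuntimeOptions = {
  num_gpu: 1,
  low_vram: false,
  num_thread: 8,
  use_mmap: true,
  use_mlock: false,
};

// Merge user overrides onto the defaults; later properties win.
function withOverrides(overrides: Partial<RuntimeOptions>): RuntimeOptions {
  return { ...defaults, ...overrides };
}

// Example: a constrained machine with little VRAM and few cores.
const runtime = withOverrides({ low_vram: true, num_thread: 4 });
```

Untouched fields (here num_gpu, use_mmap, use_mlock) pass through unchanged, so the UI only has to track the values the user has edited.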

For a detailed explanation of each parameter, please refer to the official Ollama documentation on parameters.