Ollama Buddy 0.12.0: User Prompt Library, File Attachments, Vision and Context Tracking

There have been quite a few updates recently. The main highlights include support for attachments, so you can push a file to the chat directly from dired for potential inclusion in your next query.

Vision support has been added for models that can handle it. If you supply the path to an image file in the chat, it will be processed. This means you can now, for example, extract text from images using models like o:gemma3:4b.

I’ve also introduced the ability to save user system prompts. If you have a favorite prompt, or have crafted one that works especially well for you, you can now save it by category and title in a simple Org format for later recall. Prompt recall now works the same way as Fabric patterns and Awesome ChatGPT prompts. This makes it much easier to display the currently used system prompt concisely in the status bar, as it will be based on the prompt title (and thus likely the role).

What else? Oh yes, I received a request for better context tracking. Now, when context is nearing full capacity, or has exceeded it, it will be indicated in the status bar!

That’s probably it for the major changes. There was also some refactoring, but you probably don’t care about that. Anyway, here is the full list of changes:

<2025-05-22 Thu> 0.12.0

Full system prompt in the status bar replaced with a simpler, more meaningful role title

Previously, the header status bar would show truncated system prompt text like [You are a helpful assistant wh...], making it difficult to quickly identify which prompt was active. Now, the display shows meaningful role titles with source indicators.

The system now intelligently extracts titles from prompt content by recognizing common patterns like “You are a…”, “Act as…”, or “I want you to act as…”. When these patterns aren’t found, it generates a concise title from the first few words.
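
As a rough sketch of the kind of extraction involved (the function name and regexps here are illustrative, not the package’s actual implementation):

(require 'seq)
(require 'subr-x)

(defun my/ollama-prompt-title (prompt)
  "Derive a short role title for PROMPT from common role-style openings."
  (if (string-match
       "\\`\\(?:You are\\|Act as\\|I want you to act as\\)\\s-+\\(?:an?\\s-+\\)?\\([^.,\n]+\\)"
       prompt)
      (capitalize (string-trim (match-string 1 prompt)))
    ;; No recognised pattern: fall back to the first few words.
    (string-join (seq-take (split-string prompt) 4) " ")))

;; (my/ollama-prompt-title "You are a helpful assistant who is concise.")
;; => "Helpful Assistant Who Is Concise"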

Behind the scenes, Ollama Buddy now maintains a registry of all system prompts with their titles, sources, and timestamps. This enables new features like system prompt history viewing and better organization across Fabric patterns, Awesome ChatGPT prompts, and user-defined prompts.

The result is a cleaner interface that makes it immediately clear which role your AI assistant is currently embodying, without cluttering the status bar with long, truncated text.

<2025-05-21 Wed> 0.11.1

Quite a bit of refactoring to make this project more maintainable, and I have added a starter kit of user prompts.

<2025-05-19 Mon> 0.11.0

Added user system prompts management

This feature makes it easier to save, organize, and reuse your favorite system prompts when working with Ollama language models.

System prompts are special instructions that guide the behavior of language models, and setting them effectively makes a real difference to the responses you get back.

The new ollama-buddy-user-prompts module organizes your system prompts in a clean, category-based system.

The new functionality is accessible through the updated key binding C-c s, which opens a dedicated transient menu for working with your saved prompts.

If you work frequently with Ollama models, you’ve likely discovered the power of well-crafted system prompts: they can dramatically improve the quality and consistency of responses. With this new management system, your favorites are now easy to save, organize, and reuse.

<2025-05-14 Wed> 0.10.0

Added file attachment system for including documents in conversations

You can now seamlessly include text files, code, documentation, and more directly in your conversations with local AI models!

Simply use C-c C-a from the chat buffer to attach any file to your current conversation.

The attached files become part of your conversation context, allowing the AI to reference, analyze, or work with their contents directly.

The transient menu has also been updated with a new Attachment Menu:

*File Attachments*
  a Attach file
  w Show attachments
  d Detach file
  0 Clear all attachments

Your attachments aren’t just dumped into the conversation; they’re intelligently integrated into the context that gets sent to the model.

Managing attached files is intuitive with the dedicated commands shown in the menu above.

Working in Dired? No problem! You can attach files directly from your file browser using the following configuration:

;; In Dired, C-c C-a will attach the marked files to the current chat
(eval-after-load 'dired
  '(progn
     (define-key dired-mode-map (kbd "C-c C-a") #'ollama-buddy-dired-attach-marked-files)))

<2025-05-12 Mon> 0.9.50

Added context size management and monitoring

I’ve added context window management and monitoring capabilities to Ollama Buddy!

This update helps you better understand and manage your model’s context usage, preventing errors and optimizing your conversations.

Enable it with the following:

(setq ollama-buddy-show-context-percentage t)

Usage

Once enabled:

  1. Text mode: Shows 1024/4096 style display
  2. Bar mode (default): Shows ███████░░░░ 2048 style display
  3. Use C-c 8 to toggle between modes
  4. Text mode changes fontification based on your thresholds:
    • Normal: regular fontification
    • 85%+: underlined and bold
    • 100%+: inverse video and bold
  5. Bar mode simply fills up as normal

The progress bar will visually represent how much of the context window you’re using, making it easier to see at a glance when you’re approaching the limit.
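
For the curious, a bar along those lines can be rendered with just a few lines of Elisp; this is purely illustrative and not the package’s actual display code:

(defun my/context-bar (used max &optional width)
  "Return a WIDTH-character bar showing USED of MAX context tokens."
  (let* ((width (or width 12))
         (filled (min width (round (* width (/ (float used) max))))))
    (concat (make-string filled ?█)
            (make-string (- width filled) ?░)
            (format " %d" used))))

;; (my/context-bar 2048 4096) ;; => "██████░░░░░░ 2048"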

Implementation Details

Context Size Detection

Determining a model’s context size proved more complex than expected. While experimenting with parsing model info JSON, I discovered that context size information can be scattered across different fields. Rather than implementing a complex JSON parser (which may come later), I chose a pragmatic approach:

I created a new defcustom variable ollama-buddy-fallback-context-sizes that includes hard-coded values for popular Ollama models. The fallback mechanism is deliberately simple: substring matching followed by a sensible default of 4096 tokens.

(defcustom ollama-buddy-fallback-context-sizes
  '(("llama3.2:1b" . 2048)
    ("llama3:8b" . 4096)
    ("tinyllama" . 2048)
    ("phi3:3.8b" . 4096)
    ("gemma3:1b" . 4096)
    ("gemma3:4b" . 8192)
    ("llama3.2:3b" . 8192)
    ("llama3.2:8b" . 8192)
    ("llama3.2:70b" . 8192)
    ("starcoder2:3b" . 8192)
    ("starcoder2:7b" . 8192)
    ("starcoder2:15b" . 8192)
    ("mistral:7b" . 8192)
    ("mistral:8x7b" . 32768)
    ("codellama:7b" . 8192)
    ("codellama:13b" . 8192)
    ("codellama:34b" . 8192)
    ("qwen2.5-coder:7b" . 8192)
    ("qwen2.5-coder:3b" . 8192)
    ("qwen3:0.6b" . 4096)
    ("qwen3:1.7b" . 8192)
    ("qwen3:4b" . 8192)
    ("qwen3:8b" . 8192)
    ("deepseek-r1:7b" . 8192)
    ("deepseek-r1:1.5b" . 4096))
  "Mapping of model names to their default context sizes.
Used as a fallback when context size can't be determined from the API."
  :type '(alist :key-type string :value-type integer)
  :group 'ollama-buddy)
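
For illustration, the substring-matching fallback could look something like this; the helper name is hypothetical, not the package’s actual function:

(require 'seq)

(defun my/ollama-context-size (model)
  "Return a context size for MODEL via substring matching, else 4096."
  (or (cdr (seq-find (lambda (entry)
                       (string-match-p (regexp-quote (car entry)) model))
                     ollama-buddy-fallback-context-sizes))
      4096))

;; (my/ollama-context-size "mistral:7b-instruct") ;; => 8192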

This approach may not be perfectly accurate for all models, but it’s sufficient for getting the core functionality working. More importantly, as a defcustom, users can easily customize these values for complete accuracy with their specific models. Users can also set context values within the chat buffer through C-c C (Show Context Information) for each individual model if desired.
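
For example, to add or override an entry yourself (the model name here is just a placeholder):

(with-eval-after-load 'ollama-buddy
  (add-to-list 'ollama-buddy-fallback-context-sizes
               '("my-finetune:13b" . 16384)))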

This design choice allowed me to focus on the essential features without getting stuck on complex context retrieval logic.

One final thing: if the num_ctx parameter (context window size in tokens) is set, then that number will also be taken into consideration. The assumption is that the model is honouring the requested context size, and the value will be incorporated into the context calculations accordingly.

Token Estimation

For token counting, I’ve implemented a simple heuristic: the word count (obtained with string-split) is multiplied by 1.3. This follows commonly recommended approximations and works well enough in practice. While the multiplier isn’t currently configurable, I may add it as a customization option in the future.
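
A minimal sketch of that heuristic (the function name is illustrative, not the package’s actual API):

(defun my/ollama-estimate-tokens (text)
  "Estimate the token count of TEXT as roughly 1.3 tokens per word."
  (ceiling (* 1.3 (length (split-string text)))))

;; (my/ollama-estimate-tokens "How many tokens is this prompt?") ;; => 8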

How to Use Context Management in Practice

The C-c C (Show Context Information) command is central to this feature. Rather than continuously monitoring context size while you type (which would be computationally expensive and potentially distracting), I’ve designed the system to calculate context on-demand when you choose.

Typical Workflows

Scenario 1: Paste-and-Send Approach

Let’s say you want to paste a large block of text into the chat buffer. You can simply:

  1. Paste your content
  2. Press the send keybinding
  3. If the context limit is exceeded, you’ll get a warning dialog asking whether to proceed anyway

Scenario 2: Preemptive Checking

For more control, you can check context usage before sending:

  1. Paste your content
  2. Run C-c C to see the current context breakdown
  3. If the context looks too high, you have several options:
    • Trim your current prompt
    • Remove or simplify your system prompt
    • Edit conversation history using Ollama Buddy’s history modification features
    • Switch to a model with a larger context window

Scenario 3: Manage the Max History Length

Want tight control over context size without constantly monitoring the real-time display? Since conversation history is part of the context, you can simply limit ollama-buddy-max-history-length to control the total context size.

For example, when working with small context windows, set ollama-buddy-max-history-length to 1. This keeps only the last exchange (your prompt + model response), ensuring your context remains small and predictable, perfect for maintaining control without manual monitoring.
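
In your init, that is simply:

;; Keep only the most recent prompt/response pair in the context
(setq ollama-buddy-max-history-length 1)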

Scenario 4: Set the num_ctx Parameter (Context Window Size in Tokens)

Simply set this parameter and off you go!

Current Status: Experimental

Given the potentially limiting nature of context management, I’ve disabled this feature by default.

To enable it, set the following:

(setq ollama-buddy-show-context-percentage t)

For now, consider it an experimental addition that users can opt into; as the feature matures and proves its value, I may enable it by default.

More Details

The status bar now displays your current context usage in real-time. You’ll see a fraction showing used tokens versus the model’s maximum context size (e.g., “2048/8192”). The display automatically updates as your conversation grows.

Context usage changes fontification to help you stay within limits, using the thresholds described above.

Before sending prompts that exceed the context limit, Ollama Buddy now warns you and asks for confirmation. This prevents unexpected errors and helps you manage long conversations more effectively.

There are now three new interactive commands:

C-c $ - Set Model Context Size. Manually configure context sizes for custom or fine-tuned models.

C-c % - Toggle Context Display. Show or hide the context percentage in the status bar.

C-c C - Show Context Information. View a detailed breakdown of your current context usage.

The system estimates token counts for your system prompt, the conversation history, any attachments, and the prompt you are about to send. This gives you a complete picture of your context usage before hitting send.

The context monitoring is not enabled by default.

<2025-05-05 Mon> 0.9.44

For some reason, when I moved the .ollama folder to an external disk, the order of models returned by api/tags became inconsistent, which broke consistent letter assignment. I’m not sure why this happened, but it is probably sensible to sort the models alphabetically anyway, as this has the benefit of naturally grouping model families together.

I also removed the multishot feature of writing responses to the associated model letter register. Now that I have to accommodate more than 26 models, incorporating them into Emacs’s single-letter register system is all but impossible. I suspect this feature wasn’t much used, and if you think about it, it wouldn’t have worked well with multiple model shots anyway, as the register associated with a model would only ever hold the most recent response. Due to these factors, I decided to remove it. If someone wants it back, I will probably have to design a bespoke version fully incorporated into the ollama-buddy system, as I can’t think of any other Emacs mechanism that could accommodate this.

<2025-05-05 Mon> 0.9.43

Fix model reference error exceeding 26 models #15

Update ollama-buddy to handle more than 26 models by using prefixed combinations for model references beyond ‘z’. This prevents errors in create-intro-message when the local server hosts a large number of models.
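
For illustration, generating labels past ‘z’ could look something like this; a sketch of the idea only, the package’s actual scheme may differ:

(require 'seq)

(defun my/model-reference-labels (n)
  "Return N reference labels: a..z, then two-letter combinations."
  (let* ((alphabet (mapcar #'char-to-string (number-sequence ?a ?z)))
         (pairs (mapcan (lambda (prefix)
                          (mapcar (lambda (letter) (concat prefix letter))
                                  alphabet))
                        alphabet)))
    (seq-take (append alphabet pairs) n)))

;; (my/model-reference-labels 28) ;; => ("a" "b" ... "z" "aa" "ab")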

<2025-05-03 Sat> 0.9.42

Added several new entries to the recommended models list, and fixed pulling models.

<2025-05-02 Fri> 0.9.41

Refactored model prefixing again so that no prefix is applied when using only Ollama models; a prefix is only applied when online LLMs are selected (for example Claude, ChatGPT, etc.).

I think this makes more sense and is cleaner: I suspect the majority of people using this package are mainly interested in Ollama models, and for them the prefix would probably just be confusing.

I’m afraid this could once again be a bit of a breaking change for those Ollama users who have switched and are now familiar with the “o:” prefix, sorry!

<2025-05-02 Fri> 0.9.40

Added vision support for those Ollama models that can handle it!

Image files are now detected within a prompt and then processed if a model can support vision processing. Here’s a quick overview of how it works:

  1. Configuration: Users can configure the application to enable vision support and specify which models and image formats are supported. Vision support is enabled by default.

  2. Image Detection: When a prompt is submitted, the system automatically detects any image files referenced in the prompt.

  3. Vision Processing: If the model supports vision, the detected images are processed in relation to the defined prompt. Note that which models are treated as vision capable is defined in ollama-buddy-vision-models, which can be adjusted as required (a small example appears at the end of this entry).

  4. In addition, a menu item has been added to the custom Ollama Buddy menu:

          [I] Analyze an Image

When selected, it will allow you to describe a chosen image. At some stage, I may allow integration into dired, which would be pretty neat. :)
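
And if one of your models isn’t being picked up as vision capable, adjusting the list is straightforward; a sketch, assuming ollama-buddy-vision-models holds model-name strings and is available once the package is loaded:

(with-eval-after-load 'ollama-buddy
  ;; Illustrative only: treat llava:7b as vision capable
  (add-to-list 'ollama-buddy-vision-models "llava:7b"))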
