Vision Models with Ollama
Use Maid to send images to a vision-capable AI model running on your desktop or home server via Ollama. Vision support is automatic — no extra configuration needed beyond choosing the right model.
How it works
When you select a model in the Ollama provider, Maid queries that model's capabilities from your Ollama server. If the model reports vision support, the image attachment button is automatically enabled in the prompt input bar — no projector files, no extra settings.
All image processing happens on your Ollama host machine, not on your phone. This means you can use large, high-quality vision models (7B, 13B, 34B parameters) that would never fit in a phone's memory.
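The capability check described above can be sketched against Ollama's REST API. Recent Ollama servers include a capabilities list in the /api/show response; this is a minimal sketch assuming that field is present (the host URL and model name are placeholders, and older servers that omit the field are treated as text-only):

```python
import json
import urllib.request

def supports_vision(show_response: dict) -> bool:
    """Check whether an /api/show response reports the 'vision' capability."""
    # Older Ollama versions may omit the field entirely; treat that as no vision.
    return "vision" in show_response.get("capabilities", [])

def model_supports_vision(host: str, model: str) -> bool:
    """Ask an Ollama server whether the given model can accept images."""
    req = urllib.request.Request(
        f"{host}/api/show",
        data=json.dumps({"model": model}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return supports_vision(json.load(resp))
```

For example, `model_supports_vision("http://192.168.1.10:11434", "llava")` would return True for a vision model and False for a text-only one.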
Compatible vision models
Ollama supports several vision-capable models. Pull any of the following on your host machine to use image chat in Maid:
| Model | Pull command | Notes |
|---|---|---|
| LLaVA 1.6 (7B) | ollama pull llava | Good general vision quality |
| BakLLaVA | ollama pull bakllava | Mistral base, good instruction following |
| Moondream 2 | ollama pull moondream | Lightweight, fast, good for simple tasks |
| Llama 3.2 Vision (11B) | ollama pull llama3.2-vision | Meta's latest; strong reasoning and vision |
| LLaVA-Llama3 (8B) | ollama pull llava-llama3 | Llama 3 base with vision adapter |
For the best results on a capable desktop, llama3.2-vision is recommended. For faster responses on lower-end hardware, moondream is a lightweight option.
Setup
- Pull a vision-capable model on your Ollama host (for example, ollama pull llama3.2-vision).
- In Maid, select that model from the Ollama provider's model dropdown. Maid automatically queries the model's capabilities.
- If the model reports vision support, the image attachment button becomes active in the prompt input bar.
Sending images
Once vision is active for the selected model:
- Tap the image icon to the left of the action button. Maid will request photo library access on first use.
- Select one or more images from your gallery.
- Selected images appear as thumbnails above the input field. Tap the × on any thumbnail to remove it before sending.
- Type your message (optional) and tap Send.
Images are encoded as base64 and sent to your Ollama server alongside the text prompt. They are processed entirely on the Ollama host and are not sent to any third-party service.
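The request format can be sketched as follows. Ollama's /api/chat endpoint accepts an images array of base64-encoded strings on each message; this is a minimal sketch of building such a payload (the model name, prompt, and image bytes are placeholders):

```python
import base64

def build_chat_payload(model: str, prompt: str, image_bytes_list: list[bytes]) -> dict:
    """Build an Ollama /api/chat request body with base64-encoded images."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt,
                # Ollama expects plain base64 strings, not data: URLs.
                "images": [base64.b64encode(b).decode("ascii") for b in image_bytes_list],
            }
        ],
        "stream": False,
    }
```

Posting this JSON body to http://your-host:11434/api/chat is all the client needs to do; decoding and vision preprocessing happen server-side.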
Troubleshooting
Image button is not active after selecting a model
- Confirm the selected model is a vision-capable model (e.g. llava, moondream, llama3.2-vision).
- Text-only models do not expose vision capability — the button will remain inactive.
- Try re-selecting the model from the dropdown to re-trigger the capability check.
Slow image responses
- Image encoding happens before generation. A brief delay after sending is normal.
- Larger images take longer to encode and consume more context tokens. Use moondream for faster responses on limited hardware.
- Ensure the model fits fully in your GPU VRAM for maximum throughput.
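The encoding cost above is easy to quantify: base64 inflates the payload by roughly a third before the model ever sees it, which is one reason downscaling images client-side pays off. A quick illustration:

```python
import base64

raw = bytes(range(256)) * 40  # a stand-in for ~10 KB of image data
encoded = base64.b64encode(raw)

# Base64 maps every 3 input bytes to 4 output characters: ~33% overhead.
ratio = len(encoded) / len(raw)
print(f"raw: {len(raw)} bytes, encoded: {len(encoded)} bytes, ratio: {ratio:.2f}")
# ratio: 1.33
```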