Vision Models with Ollama
Use Maid to send images to a vision-capable AI model running on your desktop or home server via Ollama. Vision support is automatic — no extra configuration needed beyond choosing the right model.
How it works
When you select a model in the Ollama provider, Maid queries that model's capabilities from your Ollama server. If the model reports vision support, the image attachment button is automatically enabled in the prompt input bar — no projector files, no extra settings.
All image processing happens on your Ollama host machine, not on your phone. This means you can use large, high-quality vision models (7B, 13B, 34B parameters) that would never fit in a phone's memory.
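The capability check described above can be sketched against Ollama's REST API. Recent Ollama servers include a capabilities list in the /api/show response; this is a minimal sketch assuming that field is present (the host URL and model name are placeholders, and older servers that omit the field are treated as text-only):

```python
import json
import urllib.request

def supports_vision(show_response: dict) -> bool:
    """Check whether an /api/show response reports the 'vision' capability."""
    # Older Ollama versions may omit the field entirely; treat that as no vision.
    return "vision" in show_response.get("capabilities", [])

def model_supports_vision(host: str, model: str) -> bool:
    """Ask an Ollama server whether the given model can accept images."""
    req = urllib.request.Request(
        f"{host}/api/show",
        data=json.dumps({"model": model}).encode(),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return supports_vision(json.load(resp))
```

For example, `model_supports_vision("http://192.168.1.10:11434", "llava")` would return True for a vision model and False for a text-only one.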
Compatible vision models
Ollama supports several vision-capable models. Pull any of the following on your host machine to use image chat in Maid:
| Model | Pull command | Notes |
|---|---|---|
| LLaVA 1.6 (7B) | ollama pull llava | Good general vision quality |
| BakLLaVA | ollama pull bakllava | Mistral base, good instruction following |
| Moondream 2 | ollama pull moondream | Lightweight, fast, good for simple tasks |
| Llama 3.2 Vision (11B) | ollama pull llama3.2-vision | Meta's latest; strong reasoning and vision |
| LLaVA-Llama3 (8B) | ollama pull llava-llama3 | Llama 3 base with vision adapter |
For the best results on a capable desktop, llama3.2-vision is recommended. For faster responses on lower-end hardware, moondream is a lightweight option.
Setup
- Pull a vision-capable model on your Ollama host (for example, ollama pull llama3.2-vision).
- In Maid, select that model from the Ollama provider's model dropdown. Maid automatically queries the model's capabilities.
- If the model reports vision support, the image attachment button becomes active in the prompt input bar.
Sending images
Once vision is active for the selected model:
- Tap the image icon to the left of the action button. Maid will request photo library access on first use.
- Select one or more images from your gallery.
- Selected images appear as thumbnails above the input field. Tap the × on any thumbnail to remove it before sending.
- Type your message (optional) and tap Send.
Images are encoded as base64 and sent to your Ollama server alongside the text prompt. They are processed entirely on the Ollama host and are not sent to any third-party service.
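The request format can be sketched as follows. Ollama's /api/chat endpoint accepts an images array of base64-encoded strings on each message; this is a minimal sketch of building such a payload (the model name, prompt, and image bytes are placeholders):

```python
import base64

def build_chat_payload(model: str, prompt: str, image_bytes_list: list[bytes]) -> dict:
    """Build an Ollama /api/chat request body with base64-encoded images."""
    return {
        "model": model,
        "messages": [
            {
                "role": "user",
                "content": prompt,
                # Ollama expects plain base64 strings, not data: URLs.
                "images": [base64.b64encode(b).decode("ascii") for b in image_bytes_list],
            }
        ],
        "stream": False,
    }
```

Posting this JSON body to http://your-host:11434/api/chat is all the client needs to do; decoding and vision preprocessing happen server-side.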
Troubleshooting
Image button is not active after selecting a model
- Confirm the selected model is a vision-capable model (e.g. llava, moondream, llama3.2-vision).
- Text-only models do not expose vision capability — the button will remain inactive.
- Try re-selecting the model from the dropdown to re-trigger the capability check.
Slow image responses
- Image encoding happens before generation. A brief delay after sending is normal.
- Larger images take longer to encode and consume more context tokens. Use moondream for faster responses on limited hardware.
- Ensure the model fits fully in your GPU VRAM for maximum throughput.
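The encoding cost above is easy to quantify: base64 inflates the payload by roughly a third before the model ever sees it, which is one reason downscaling images client-side pays off. A quick illustration:

```python
import base64

raw = bytes(range(256)) * 40  # a stand-in for ~10 KB of image data
encoded = base64.b64encode(raw)

# Base64 maps every 3 input bytes to 4 output characters: ~33% overhead.
ratio = len(encoded) / len(raw)
print(f"raw: {len(raw)} bytes, encoded: {len(encoded)} bytes, ratio: {ratio:.2f}")
# ratio: 1.33
```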