Use Maise with Maid
Combine Maise's on-device TTS and ASR with Maid's local AI inference for a fully offline AI assistant. Local model, local voice, local transcription — no internet required at any step.
What this gives you
Maid runs AI language models directly on your Android device using llama.cpp. Maise handles voice — converting the AI's text responses into speech, and optionally transcribing your voice back to text. Together, they form a complete AI assistant pipeline where nothing leaves your device:
- You speak — Maise transcribes your voice using distil-Whisper (on-device).
- Maid thinks — The local LLM generates a response using llama.cpp (on-device).
- Maise speaks — The response is read aloud using a Kokoro voice (on-device).
No API keys, no subscriptions, no cloud. The entire loop runs on your phone.
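For developers curious how this pipeline fits together, the loop above can be sketched with Android's standard speech APIs. This is an illustrative sketch, not Maid's actual code: `LocalLlm` is a hypothetical stand-in for Maid's internal llama.cpp inference, and both `SpeechRecognizer` and `TextToSpeech` simply bind to whatever the system defaults are — which is Maise, if it's configured as in the steps below.

```kotlin
import android.content.Context
import android.content.Intent
import android.os.Bundle
import android.speech.RecognitionListener
import android.speech.RecognizerIntent
import android.speech.SpeechRecognizer
import android.speech.tts.TextToSpeech

// Hypothetical stand-in for Maid's llama.cpp inference; Maid runs the
// GGUF model internally — this interface exists only to show the loop.
interface LocalLlm {
    fun generate(prompt: String): String
}

class OfflineAssistant(context: Context, private val llm: LocalLlm) {

    // No engine package requested, so Android uses the system default
    // TTS engine — Maise, if selected in the TTS settings.
    private val tts = TextToSpeech(context) { /* onInit status ignored in sketch */ }

    // Bound to the default recognition service — Maise's Whisper-based
    // recognizer, if configured as the system speech recognizer.
    private val recognizer = SpeechRecognizer.createSpeechRecognizer(context)

    fun listenAndReply() {
        recognizer.setRecognitionListener(object : RecognitionListener {
            override fun onResults(results: Bundle) {
                val heard = results
                    .getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION)
                    ?.firstOrNull() ?: return
                val reply = llm.generate(heard)  // Maid thinks (on-device)
                // Maise speaks (on-device)
                tts.speak(reply, TextToSpeech.QUEUE_FLUSH, null, "reply")
            }
            // Remaining callbacks left empty for brevity.
            override fun onReadyForSpeech(params: Bundle?) {}
            override fun onBeginningOfSpeech() {}
            override fun onRmsChanged(rmsdB: Float) {}
            override fun onBufferReceived(buffer: ByteArray?) {}
            override fun onEndOfSpeech() {}
            override fun onError(error: Int) {}
            override fun onPartialResults(partialResults: Bundle?) {}
            override fun onEvent(eventType: Int, params: Bundle?) {}
        })
        recognizer.startListening(Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH))
    }
}
```

Because both APIs resolve to system defaults at runtime, no special integration between the two apps is needed — setting the defaults once wires the whole loop.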
Requirements
- Maise installed and configured as the system TTS engine (see TTS Setup).
- Maid installed with a local GGUF model loaded via the Llama provider (see llama.cpp guide).
- Both apps installed on the same device.
Step 1 — Set up Maise as the system TTS engine
If you haven't already, follow the TTS Setup guide to make Maise the preferred engine in Android's TTS settings. Select a voice you want to hear for AI responses — a clear, natural-sounding voice like en-US-heart-kokoro or en-US-nova-kokoro works well for conversational text.
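If you want to verify the switch took effect programmatically, Android's `TextToSpeech` API can report which engine is the current system default. A minimal sketch (the `checkTtsEngine` helper name is ours; run it from any Activity or Service context):

```kotlin
import android.content.Context
import android.speech.tts.TextToSpeech
import android.util.Log

// Logs the package name of the system default TTS engine and speaks a
// short test phrase through it. If Maise is set as the preferred engine,
// its package name appears here and the test phrase uses your chosen voice.
fun checkTtsEngine(context: Context) {
    lateinit var tts: TextToSpeech
    tts = TextToSpeech(context) { status ->
        if (status == TextToSpeech.SUCCESS) {
            // defaultEngine is the engine Android uses when an app
            // does not request a specific one.
            Log.d("TTS", "Default engine: ${tts.defaultEngine}")
            tts.speak("Maise is ready.", TextToSpeech.QUEUE_FLUSH, null, "check")
        }
    }
}
```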
Step 2 — Enable voice in Maid
In Maid's settings, turn on spoken responses (the exact label varies by version — look for a text-to-speech or "read aloud" option). Maid uses the Android system TTS engine for speech output, so with Maise set as the default in Step 1, responses are read aloud in the Kokoro voice you selected.
Step 3 — Use voice input in Maid (optional)
Maid has a built-in microphone button in the chat input bar. Tapping it starts voice dictation using whatever speech recognition service is currently set as the Android default. If you've configured Maise as the system speech recognizer (see ASR Setup), Maid will use Maise's Whisper-based transcription automatically.
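Under the hood, a mic button like Maid's typically fires Android's generic recognition intent and lets the system route it to the default recognizer. A rough sketch of that pattern (Maid's actual implementation may differ; `REQUEST_SPEECH` and `startDictation` are illustrative names):

```kotlin
import android.app.Activity
import android.content.Intent
import android.speech.RecognizerIntent

// Arbitrary request code for matching the result in onActivityResult.
const val REQUEST_SPEECH = 1001

// Launches the system default speech recognizer — Maise's Whisper-based
// transcription, if Maise is configured as the default recognition service.
fun startDictation(activity: Activity) {
    val intent = Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply {
        putExtra(
            RecognizerIntent.EXTRA_LANGUAGE_MODEL,
            RecognizerIntent.LANGUAGE_MODEL_FREE_FORM
        )
        putExtra(RecognizerIntent.EXTRA_PROMPT, "Speak your prompt")
    }
    activity.startActivityForResult(intent, REQUEST_SPEECH)
}

// In onActivityResult, the transcript arrives as:
// data?.getStringArrayListExtra(RecognizerIntent.EXTRA_RESULTS)?.firstOrNull()
```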
The full offline loop
With both apps configured, here is what the full interaction looks like — entirely on-device, with zero network traffic:
1. You tap the microphone button in Maid and speak your prompt.
2. Maise's distil-Whisper model transcribes the audio to text on-device.
3. Maid runs the prompt through the local GGUF model via llama.cpp.
4. Maid hands the response to Maise, the system TTS engine, which reads it aloud with your selected Kokoro voice.
Choosing a model for voice interaction
For voice-based conversations, response latency matters more than it might for text-only use. Shorter responses feel more natural when spoken aloud, and faster models mean less waiting between speaking and hearing the reply.
- Best balance — Gemma 2 2B or Qwen2.5 1.5B at Q4_K_M. Fast enough for natural back-and-forth.
- Lower-end devices — TinyLlama 1.1B or Gemma 3 1B. Very fast, shorter responses.
- Flagship phones — Qwen3 4B or Llama 3.2 3B for better quality while still being usable in conversation.
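To gauge whether a model fits your phone's memory, a rough rule of thumb is that Q4_K_M averages about 4.8 bits per weight (an approximation — actual GGUF files vary, and the KV cache and runtime add overhead on top). A quick back-of-the-envelope calculator:

```kotlin
// Approximate GGUF file size for a Q4_K_M quantized model.
// bitsPerWeight = 4.8 is a rough average for Q4_K_M, not an exact figure.
fun approxGgufGiB(billionParams: Double, bitsPerWeight: Double = 4.8): Double =
    billionParams * 1e9 * bitsPerWeight / 8 / (1024.0 * 1024 * 1024)

fun main() {
    // Sizes for the parameter counts mentioned above (1.1B, 2B, 4B).
    for (b in listOf(1.1, 2.0, 4.0)) {
        println("${b}B params at Q4_K_M ~ %.1f GiB".format(approxGgufGiB(b)))
    }
}
```

As a sanity check, a 2B-parameter model lands near 1.1 GiB before runtime overhead — comfortable on most mid-range phones, while 4B-class models are better suited to devices with 8 GB of RAM or more.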