
Voices & Languages Reference

Maise ships with 68 Kokoro voices across 9 languages — all bundled in the app, no downloads required. Every voice runs entirely on-device.

How to preview and change voices

1. Open the Maise app and go to the TTS tab.

2. Tap the Voice dropdown. Voices are listed with their full ID (e.g. en-US-heart-kokoro).

3. Select a voice, type some text into the text field, and tap Speak to preview it.

Your selection is saved automatically and becomes the system-wide voice for all apps that use Maise as the TTS engine.

The default voice is en-US-heart-kokoro. Voice IDs follow the pattern <lang>-<name>-kokoro.
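The ID pattern above can be sketched in a few lines of Python. This is only an illustration, not code from Maise: the function names build_voice_id and parse_voice_id are hypothetical, and the parser assumes every language code has exactly two hyphen-separated parts (as in en-US), while the voice name itself may contain hyphens (as in alpha-f).

```python
SUFFIX = "kokoro"  # every voice ID in this guide ends with this suffix


def build_voice_id(lang: str, name: str) -> str:
    """Join a language code and a short voice name into a full voice ID."""
    return f"{lang}-{name}-{SUFFIX}"


def parse_voice_id(voice_id: str) -> tuple[str, str]:
    """Split a full voice ID into (language code, short voice name).

    Assumption: the language code is always two parts, e.g. "en-US".
    Everything between the language code and the suffix is the name.
    """
    parts = voice_id.split("-")
    # Need at least: language, region, one name part, and the suffix.
    if len(parts) < 4 or parts[-1] != SUFFIX:
        raise ValueError(f"not a <lang>-<name>-{SUFFIX} voice ID: {voice_id!r}")
    lang = "-".join(parts[:2])
    name = "-".join(parts[2:-1])
    return lang, name


print(build_voice_id("en-US", "heart"))        # en-US-heart-kokoro
print(parse_voice_id("el-GR-alpha-f-kokoro"))  # ('el-GR', 'alpha-f')
```

Splitting off the first two parts and the last part, rather than splitting on every hyphen, keeps hyphenated voice names intact.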

Voice catalogue

All 68 voices are listed below by language. Voice names are the short identifiers: to get the full ID, prefix the name with the language code and append -kokoro (e.g. en-US-nova-kokoro).

English (US) (en-US)

alloy, aoede, bella, heart, jessica, kore, nicole, nova, river, sarah, sky, adam, echo, eric, fenrir, liam, michael, onyx, puck, santa

English (UK) (en-GB)

alice, emma, isabella, lily, daniel, fable, george, lewis

German (de-DE)

dora, alex, santa

French (fr-FR)

siwis

Greek (el-GR)

alpha-f, beta-f, omega-m, psi-m

Italian (it-IT)

sara, nicola

Japanese (ja-JP)

alpha-f, gongitsune, nezumi, tebukuro, kumo

Portuguese (BR) (pt-BR)

dora, alex, santa

Chinese (Simplified) (zh-CN)

xiaobei, xiaoni, xiaoxiao, xiaoyi, yunjian, yunxi, yunxia, yunyang

Voice quality & characteristics

All voices are generated by the Kokoro neural TTS model, which produces natural-sounding speech at 24 kHz. Voice quality is consistent across the catalogue — the differences between voices are in character, accent, and speaking style rather than fidelity.
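To give a feel for what 24 kHz means in practice, here is a rough worked calculation of how much raw audio a clip of a given length would occupy. The sample rate comes from this guide; the 16-bit depth and single (mono) channel are assumptions made purely for illustration, since the guide does not state the output format, and pcm_size_bytes is a hypothetical helper name.

```python
SAMPLE_RATE_HZ = 24_000  # stated in this guide
BYTES_PER_SAMPLE = 2     # assumption: 16-bit PCM (not stated in the guide)
CHANNELS = 1             # assumption: mono output (not stated in the guide)


def pcm_size_bytes(seconds: float) -> int:
    """Estimate the uncompressed PCM size of a clip of the given duration."""
    samples = round(seconds * SAMPLE_RATE_HZ)
    return samples * BYTES_PER_SAMPLE * CHANNELS


print(pcm_size_bytes(10))  # 480000
```

Under these assumptions, ten seconds of speech is 240,000 samples, or about 480 KB of uncompressed audio.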

English (US) has the largest selection with 20 voices covering a range of tones — from warm and conversational (heart, bella) to clear and neutral (nova, alloy). If you're using Maise primarily for AI responses in Maid, voices like heart, nova, or echo tend to work well for conversational text.

For other languages, the number of available voices is smaller but all are production-quality. Japanese voices in particular are well-suited for both conversational and narrative text.

Reporting mispronunciations

Neural TTS can occasionally mispronounce uncommon words, proper nouns, or technical terms. If you encounter a word that is spoken incorrectly, tap Report mispronunciation in the Maise app to open a GitHub issue. Providing the exact text and the voice you were using helps the maintainers reproduce and fix the issue.
