CLI Usage

The babylon binary provides three subcommands — phonemize, tts, and serve — plus global flags for model configuration.

Configuration

On startup, babylon automatically loads config.json from the same directory as the executable. You can override this with --config, and any individual flag overrides its corresponding config key.

    {
      "phonemizer_model": "models/open-phonemizer.onnx",
      "dictionary": "models/dictionary.json",
      "kokoro_model": "models/kokoro-quantized.onnx",
      "kokoro_voice": "en-US-heart",
      "kokoro_voices": "models/voices",
      "vits_model": "models/curie.onnx",
      "host": "127.0.0.1",
      "port": 8775
    }
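The precedence rule described above (values from config.json first, then any individual flag on top) can be sketched in a few lines of Python. This illustrates the documented behaviour and is not babylon's actual implementation: the helper names `flag_to_key` and `resolve_settings` are hypothetical, and the flag-to-key mapping (drop the leading dashes, turn `-` into `_`) is an assumption inferred from the flag and key names.

```python
import json


def flag_to_key(flag: str) -> str:
    """Map a flag such as '--kokoro-voice' to a config key such as 'kokoro_voice'.

    Assumption: flags and config keys differ only in the leading dashes
    and in using '-' versus '_'.
    """
    return flag.lstrip("-").replace("-", "_")


def resolve_settings(config_text: str, flags: dict) -> dict:
    """Return the effective settings: config file values, overridden by flags."""
    settings = json.loads(config_text)   # values loaded from the config file
    for flag, value in flags.items():    # a flag given on the command line wins
        settings[flag_to_key(flag)] = value
    return settings


# A shortened config plus two overriding flags.
config_text = '{"kokoro_voice": "en-US-heart", "host": "127.0.0.1", "port": 8775}'
effective = resolve_settings(
    config_text,
    {"--kokoro-voice": "en-US-nova", "--port": 9000},
)
print(effective)
```

Keys that no flag overrides (here `host`) keep their config file value.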

Global flags

These flags apply to all subcommands and are processed before dispatch.

| Flag | Argument | Description |
| --- | --- | --- |
| `--config` | `<path>` | Load a JSON config file |
| `--phonemizer-model` | `<path>` | Phonemizer ONNX model |
| `--dictionary` | `<path>` | Pronunciation dictionary JSON |
| `--kokoro-model` | `<path>` | Kokoro ONNX model |
| `--kokoro-voice` | `<name>` | Default Kokoro voice name |
| `--kokoro-voices` | `<dir>` | Directory of voice .bin files |
| `--vits-model` | `<path>` | VITS ONNX model |
| `-h`, `--help` | | Show help |

phonemize

Convert text to IPA phonemes using the Open Phonemizer G2P pipeline. Words found in the pronunciation dictionary are looked up directly; unknown words are handled by the neural model.

    # IPA output (default)
    babylon phonemize "Hello world"
    # → hɛloʊ wɜːld

    # Kokoro token IDs instead of IPA
    babylon phonemize --tokens "Hello world"
    # → [31, 29, 42, 0, 51, 17, 32, 42]

| Flag | Description |
| --- | --- |
| `--tokens` | Print Kokoro token IDs instead of the IPA string |
| `-h`, `--help` | Show help |
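The dictionary-first lookup described in this section can be pictured as follows. This is a conceptual sketch only, not the Open Phonemizer code: the function `phonemize_word`, the word-to-IPA dictionary layout, the case-insensitive lookup, and the placeholder standing in for the neural model are all assumptions made for illustration.

```python
from typing import Callable, Dict


def phonemize_word(
    word: str,
    dictionary: Dict[str, str],
    neural_g2p: Callable[[str], str],
) -> str:
    """Use the dictionary entry if there is one, otherwise ask the model."""
    key = word.lower()  # assumption: lookups ignore case
    if key in dictionary:
        return dictionary[key]  # known word: direct lookup
    return neural_g2p(word)     # unknown word: neural fallback


# Toy dictionary (assumed format: word -> IPA string).
toy_dictionary = {"hello": "hɛloʊ"}


def placeholder_model(word: str) -> str:
    """Placeholder for the neural G2P model; it only tags the word."""
    return f"<model:{word}>"


result = [
    phonemize_word(w, toy_dictionary, placeholder_model)
    for w in "Hello world".split()
]
print(" ".join(result))
```

Here "Hello" is resolved from the toy dictionary, while "world" is passed to the placeholder because it has no entry.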

tts

Synthesise speech from text and write a WAV file. The Kokoro engine is the default. Switch to VITS with --vits.

    # Kokoro synthesis (default voice, writes output.wav)
    babylon tts "Hello world"

    # Specify output path
    babylon tts "Hello world" -o hello.wav

    # Choose a voice and adjust speed
    babylon tts --voice en-US-nova --speed 1.2 "Hello world" -o nova.wav

    # Use the VITS engine
    babylon tts --vits "Hello world" -o vits-out.wav

| Flag | Argument | Description |
| --- | --- | --- |
| `--kokoro` | | Use the Kokoro engine (default) |
| `--vits` | | Use the VITS engine |
| `--engine` | `<name>` | Select kokoro or vits explicitly |
| `-v`, `--voice` | `<name>` | Kokoro voice name (filename in voices dir without .bin) |
| `--speed` | `<float>` | Speech speed multiplier (default: 1.0) |
| `-o` | `<path>` | Output WAV file (default: output.wav) |
| `-h`, `--help` | | Show help |

Voice names are the filenames in the kokoro_voices directory without the .bin extension. For example, --voice en-US-heart maps to models/voices/en-US-heart.bin.
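For scripted use, the voice-name rule and the tts flags above can be wrapped in small helpers. The sketch below is one possible approach, using only the flags documented in this section. The helper names `voice_file`, `tts_command`, and `synthesise_all` are hypothetical, as is the `line-001.wav` output naming, and it assumes the babylon binary is on the PATH.

```python
import shlex
import subprocess
from pathlib import Path
from typing import List, Optional


def voice_file(voices_dir: str, voice: str) -> Path:
    """Path of the .bin file that a voice name refers to."""
    return Path(voices_dir) / f"{voice}.bin"


def tts_command(
    text: str,
    output: str,
    voice: Optional[str] = None,
    speed: Optional[float] = None,
    vits: bool = False,
) -> List[str]:
    """Build the argument list for one `babylon tts` call."""
    cmd = ["babylon", "tts"]  # assumption: babylon is on the PATH
    if vits:
        cmd.append("--vits")
    if voice is not None:
        cmd += ["--voice", voice]
    if speed is not None:
        cmd += ["--speed", str(speed)]
    cmd += [text, "-o", output]
    return cmd


def synthesise_all(lines: List[str], voice: str) -> None:
    """Run one `babylon tts` call per line: line-001.wav, line-002.wav, ..."""
    for index, text in enumerate(lines, start=1):
        output = f"line-{index:03d}.wav"
        subprocess.run(tts_command(text, output, voice=voice), check=True)


# Show the command without running it.
command = tts_command("Hello world", "nova.wav", voice="en-US-nova", speed=1.2)
print(shlex.join(command))
print(voice_file("models/voices", "en-US-heart"))
```

Passing the arguments as a list rather than a single shell string avoids quoting problems when the text contains spaces or quotes.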

serve

Start a local HTTP server with the built-in web frontend. All configured models are pre-loaded on startup. The web UI is served at GET / from index.html in the same directory as the executable.

    # Start on default address (127.0.0.1:8775)
    babylon serve

    # Expose on all interfaces, custom port
    babylon serve --host 0.0.0.0 --port 9000

| Flag | Argument | Description |
| --- | --- | --- |
| `--host` | `<addr>` | Bind address (default: 127.0.0.1) |
| `--port` | `<port>` | Port number (default: 8775) |
| `-h`, `--help` | | Show help |
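Because the models are pre-loaded on startup, a script that launches babylon serve may want to wait until the port accepts connections before opening the web UI. The helper below is a minimal sketch of that idea, using only the Python standard library. The function name `wait_for_server` is hypothetical, and whether the server needs any noticeable time to become ready is an assumption.

```python
import socket
import time


def wait_for_server(
    host: str = "127.0.0.1",
    port: int = 8775,
    timeout: float = 30.0,
) -> bool:
    """Poll until a TCP connection to host:port succeeds or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=1.0):
                return True
        except OSError:
            time.sleep(0.2)  # not ready yet: wait briefly and retry
    return False
```

A launcher script could call `wait_for_server()` with the defaults after starting babylon serve, and request the web UI at http://127.0.0.1:8775/ only once it returns True.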