
CLI Usage

The babylon binary provides three subcommands (phonemize, tts, and serve) plus global flags for model configuration.

Configuration

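On startup, babylon loads config.json from the directory that contains the executable, not from the current working directory. Pass --config <path> to load a different file instead. Any individual flag takes precedence over the corresponding key in the loaded config; for example, --kokoro-voice overrides kokoro_voice.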

{
  "phonemizer_model": "models/open-phonemizer.onnx",
  "dictionary":       "models/dictionary.json",
  "kokoro_model":     "models/kokoro-quantized.onnx",
  "kokoro_voice":     "en-US-heart",
  "kokoro_voices":    "models/voices",
  "vits_model":       "models/curie.onnx",
  "host":             "127.0.0.1",
  "port":             8775
}
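Because the default config.json is resolved relative to the executable rather than the working directory, it can help to check which file babylon will pick up. The helper below is a hypothetical convenience function, not part of babylon: it prints the config.json path next to the babylon found on PATH. It does not resolve symlinks, so it assumes babylon is not invoked through one.

```shell
# Hypothetical helper (not part of babylon): print the path of the config.json
# that babylon loads by default, i.e. the file next to the executable on PATH.
# Symlinks are not resolved, so this assumes babylon is not a symlink.
babylon_config_path() {
  local exe
  exe="$(command -v babylon)" || return 1   # babylon is not on PATH
  printf '%s/config.json\n' "$(dirname "$exe")"
}
```

For example, cat "$(babylon_config_path)" shows the defaults in effect before any --config file or individual flag overrides them.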

Global flags

These flags apply to all subcommands and are processed before dispatch.

Flag	Argument	Description
--config	<path>	Load a JSON config file
--phonemizer-model	<path>	Phonemizer ONNX model
--dictionary	<path>	Pronunciation dictionary JSON
--kokoro-model	<path>	Kokoro ONNX model
--kokoro-voice	<name>	Default Kokoro voice name
--kokoro-voices	<dir>	Directory of voice .bin files
--vits-model	<path>	VITS ONNX model
-h, --help	—	Show help

phonemize

Convert text to IPA phonemes using the Open Phonemizer grapheme-to-phoneme (G2P) pipeline. Words found in the pronunciation dictionary (--dictionary) are looked up directly; any other word is transcribed by the neural phonemizer model (--phonemizer-model).

# IPA output (default)
babylon phonemize "Hello world"
# → hɛloʊ wɜːld

# Kokoro token IDs instead of IPA
babylon phonemize --tokens "Hello world"
# → [31, 29, 42, 0, 51, 17, 32, 42]

Flag	Description
--tokens	Print Kokoro token IDs instead of the IPA string
-h, --help	Show help
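When a word comes out wrong, it can help to see both representations side by side. The function below is a hypothetical helper built only on the two documented phonemize forms: for every non-empty line on standard input it prints the line, its IPA string, and its Kokoro token IDs. The function name and output layout are illustrative, and it runs babylon twice per line.

```shell
# Hypothetical helper (not part of babylon): for each non-empty line read from
# stdin, print the line, its IPA string and its Kokoro token IDs.
phonemize_report() {
  local line
  # "|| [ -n "$line" ]" keeps a final line that has no trailing newline.
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    # </dev/null keeps babylon from consuming the loop's remaining input.
    printf '%s\n  ipa:    %s\n  tokens: %s\n' \
      "$line" \
      "$(babylon phonemize "$line" </dev/null)" \
      "$(babylon phonemize --tokens "$line" </dev/null)"
  done
}
```

For example: printf 'Hello world\nBabylon\n' | phonemize_report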

tts

Synthesise speech from text and write it to a WAV file. Kokoro is the default engine; select VITS with --vits or --engine vits.

# Kokoro synthesis (default voice, writes output.wav)
babylon tts "Hello world"

# Specify output path
babylon tts "Hello world" -o hello.wav

# Choose a voice and adjust speed
babylon tts --voice en-US-nova --speed 1.2 "Hello world" -o nova.wav

# Use the VITS engine
babylon tts --vits "Hello world" -o vits-out.wav

Flag	Argument	Description
--kokoro	—	Use the Kokoro engine (default)
--vits	—	Use the VITS engine
--engine	<name>	Select kokoro or vits explicitly
-v, --voice	<name>	Kokoro voice name (filename in voices dir without .bin)
--speed	<float>	Speech speed multiplier (default: 1.0)
-o	<path>	Output WAV file (default: output.wav)
-h, --help	—	Show help
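To render a longer script, tts can be called once per line. The function below is a hypothetical sketch that uses only the documented tts flags; the output directory and the line-N.wav naming are illustrative choices. When no voice is given it omits --voice, so the default Kokoro voice from the configuration applies.

```shell
# Hypothetical helper (not part of babylon): synthesise one WAV per non-empty
# line of a text file with the default Kokoro engine.
# Usage: synth_lines <text-file> <out-dir> [voice] [speed]
synth_lines() {
  local input="$1" outdir="$2" voice="${3:-}" speed="${4:-1.0}"
  local line n=0
  local -a opts=()
  # Omit --voice when none is given so the configured default voice is used.
  [ -n "$voice" ] && opts+=(--voice "$voice")
  opts+=(--speed "$speed")
  mkdir -p "$outdir" || return 1
  while IFS= read -r line || [ -n "$line" ]; do
    [ -z "$line" ] && continue
    n=$((n + 1))
    # </dev/null stops babylon from reading the rest of the input file.
    babylon tts "${opts[@]}" "$line" -o "$outdir/line-$n.wav" </dev/null || return 1
  done < "$input"
}
```

For example, synth_lines script.txt out en-US-nova 1.2 writes out/line-1.wav, out/line-2.wav, and so on. A line that begins with "-" may be parsed as a flag, so such lines need rewording first.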

Voice names are the filenames in the kokoro_voices directory without the .bin extension. For example, --voice en-US-heart maps to models/voices/en-US-heart.bin.
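Since a voice name is just a file name, the available voices can be listed from the directory itself. The helper below is hypothetical, not a babylon subcommand. It defaults to models/voices, the kokoro_voices value in the sample config, and accepts another directory as its first argument. It resolves that default relative to the current directory, which may differ from how babylon resolves relative paths in config.json.

```shell
# Hypothetical helper (not part of babylon): print every name accepted by
# --voice, i.e. each .bin file in the voices directory without its extension.
list_voices() {
  local dir="${1:-models/voices}" f
  for f in "$dir"/*.bin; do
    [ -e "$f" ] || continue   # the glob matched nothing
    basename "$f" .bin
  done
}
```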

serve

Start a local HTTP server with the built-in web frontend. All configured models are pre-loaded on startup. The web UI is served at GET / from index.html in the same directory as the executable.

# Start on default address (127.0.0.1:8775)
babylon serve

# Expose on all interfaces, custom port
babylon serve --host 0.0.0.0 --port 9000

Flag	Argument	Description
--host	<addr>	Bind address (default: 127.0.0.1)
--port	<port>	Port number (default: 8775)
-h, --help	—	Show help
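When scripting against the server (for example in CI), it helps to wait until it answers before sending requests. This hypothetical helper polls GET /, the web UI route described above, with curl. The host and port defaults mirror the serve defaults; the attempt count, the one-second interval, and the per-request timeout are arbitrary choices.

```shell
# Hypothetical helper (not part of babylon): poll GET / until the server
# responds successfully or the attempt budget runs out.
# Usage: wait_for_babylon [host] [port] [attempts]
wait_for_babylon() {
  local host="${1:-127.0.0.1}" port="${2:-8775}" attempts="${3:-30}"
  local i
  for ((i = 1; i <= attempts; i++)); do
    # -f fails on HTTP errors, -s silences progress, the HTML body is discarded.
    if curl -fs --max-time 2 -o /dev/null "http://$host:$port/"; then
      return 0
    fi
    sleep 1
  done
  return 1
}
```

For example, start babylon serve --port 9000 & and then run wait_for_babylon 127.0.0.1 9000 && echo ready.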
