CLI Usage
The babylon binary provides three subcommands — phonemize, tts, and serve — plus global flags for model configuration.
Configuration
On startup, babylon automatically loads config.json from the same directory as the executable. Pass --config <path> to load a different file instead. Any individual flag then overrides the corresponding key from that file.
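The lookup order described above can be sketched in Python. This illustrates the documented precedence only and is not babylon's implementation. The function name `resolve_config`, the convention that an unset flag is passed as `None`, and tolerating a missing default config.json are all assumptions, because babylon's behaviour when that file is absent is not documented here.

```python
import json
import os
from typing import Any, Dict, Optional


def resolve_config(exe_dir: str,
                   config_path: Optional[str] = None,
                   flag_overrides: Optional[Dict[str, Any]] = None) -> Dict[str, Any]:
    """Hypothetical sketch of the documented precedence (not babylon's code).

    1. Load the file given by --config, else config.json next to the executable.
    2. Let every flag that was actually set override its config key.
    """
    if config_path is not None:
        path = config_path  # explicit --config: a missing file raises FileNotFoundError
    else:
        path = os.path.join(exe_dir, "config.json")
        if not os.path.exists(path):
            path = None  # assumption: a missing default config.json is tolerated

    config: Dict[str, Any] = {}
    if path is not None:
        with open(path, encoding="utf-8") as f:
            config = json.load(f)

    for key, value in (flag_overrides or {}).items():
        if value is not None:  # None means "flag not given": keep the file's value
            config[key] = value
    return config
```

For example, with the config below, `resolve_config(exe_dir, flag_overrides={"kokoro_voice": "en-US-nova"})` would keep every model path from the file and replace only the default voice.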
```json
{
  "phonemizer_model": "models/open-phonemizer.onnx",
  "dictionary": "models/dictionary.json",
  "kokoro_model": "models/kokoro-quantized.onnx",
  "kokoro_voice": "en-US-heart",
  "kokoro_voices": "models/voices",
  "vits_model": "models/curie.onnx",
  "host": "127.0.0.1",
  "port": 8775
}
```

Global flags
These flags apply to all subcommands and are processed before dispatch.
| Flag | Argument | Description |
|---|---|---|
| --config | <path> | Load a JSON config file |
| --phonemizer-model | <path> | Phonemizer ONNX model |
| --dictionary | <path> | Pronunciation dictionary JSON |
| --kokoro-model | <path> | Kokoro ONNX model |
| --kokoro-voice | <name> | Default Kokoro voice name |
| --kokoro-voices | <dir> | Directory of voice .bin files |
| --vits-model | <path> | VITS ONNX model |
| -h, --help | — | Show help |
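To make "processed before dispatch" concrete, here is a hedged Python sketch of one way a front end could peel the global flags off the argument list before handing the rest to a subcommand. The flag-to-key mapping (hyphens become underscores) is inferred from the flag names and the config keys shown earlier. The function `split_global_flags` and its error handling are hypothetical. `-h`/`--help` is left in place for whatever parses the remaining arguments.

```python
from typing import Dict, List, Optional, Tuple

# Global flags that take one argument, mapped to their config.json keys.
# Key names are inferred from the config example; --config is handled separately.
GLOBAL_FLAGS = {
    "--phonemizer-model": "phonemizer_model",
    "--dictionary": "dictionary",
    "--kokoro-model": "kokoro_model",
    "--kokoro-voice": "kokoro_voice",
    "--kokoro-voices": "kokoro_voices",
    "--vits-model": "vits_model",
}


def split_global_flags(argv: List[str]) -> Tuple[Optional[str], Dict[str, str], List[str]]:
    """Hypothetical pre-dispatch pass: pull global flags out of argv.

    Returns (config_path, overrides, rest), where `rest` keeps the
    subcommand and its own arguments in their original order.
    """
    config_path: Optional[str] = None
    overrides: Dict[str, str] = {}
    rest: List[str] = []
    i = 0
    while i < len(argv):
        arg = argv[i]
        if arg == "--config" or arg in GLOBAL_FLAGS:
            if i + 1 >= len(argv):
                raise ValueError(f"{arg} requires an argument")
            value = argv[i + 1]
            if arg == "--config":
                config_path = value
            else:
                overrides[GLOBAL_FLAGS[arg]] = value
            i += 2  # consume the flag and its value
        else:
            rest.append(arg)  # belongs to the subcommand
            i += 1
    return config_path, overrides, rest
```

Under this sketch, `babylon --kokoro-voice en-US-nova tts "Hello" -o hi.wav` would yield the override `{"kokoro_voice": "en-US-nova"}` and pass `["tts", "Hello", "-o", "hi.wav"]` on to the `tts` handler.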
phonemize
Convert text to IPA phonemes using the Open Phonemizer grapheme-to-phoneme (G2P) pipeline. Words found in the pronunciation dictionary are looked up directly; words not in the dictionary are transcribed by the neural model.
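The dictionary-first strategy can be sketched as follows. This is an illustration, not the Open Phonemizer code. It assumes the dictionary JSON maps lowercase words to IPA strings, uses a naive tokenizer that drops punctuation, and replaces the neural ONNX model with a caller-supplied function. `phonemize_text` and `fake_neural_g2p` are hypothetical names.

```python
import re
from typing import Callable, Dict


def phonemize_text(text: str, dictionary: Dict[str, str],
                   neural_g2p: Callable[[str], str]) -> str:
    """Look each word up in the dictionary; fall back to the model otherwise."""
    words = re.findall(r"[A-Za-z']+", text)  # naive tokenizer: drops punctuation
    phonemes = []
    for word in words:
        key = word.lower()  # assumption: dictionary keys are lowercase
        if key in dictionary:
            phonemes.append(dictionary[key])  # known word: direct lookup
        else:
            phonemes.append(neural_g2p(key))  # unknown word: neural model
    return " ".join(phonemes)


lexicon = {"hello": "hɛloʊ", "world": "wɜːld"}


def fake_neural_g2p(word: str) -> str:
    """Placeholder for the ONNX model: just marks which words fell through."""
    return "<" + word + ">"


print(phonemize_text("Hello world, babylon!", lexicon, fake_neural_g2p))
# → hɛloʊ wɜːld <babylon>
```

The lookup-first order means dictionary entries always win, so a mispronounced common word can be fixed by editing the dictionary rather than retraining the model.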
```sh
# IPA output (default)
babylon phonemize "Hello world"
# → hɛloʊ wɜːld

# Kokoro token IDs instead of IPA
babylon phonemize --tokens "Hello world"
# → [31, 29, 42, 0, 51, 17, 32, 42]
```

| Flag | Description |
|---|---|
| --tokens | Print Kokoro token IDs instead of the IPA string |
| -h, --help | Show help |
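One common way to turn an IPA string into model token IDs is a character-level lookup, where each IPA symbol maps to an integer in a fixed vocabulary. The sketch below assumes that scheme and shows only the mapping step. The vocabulary is a tiny made-up table, so its IDs do not match Kokoro's real vocabulary or the example output above. Skipping unknown symbols and the name `ipa_to_token_ids` are also assumptions.

```python
from typing import Dict, List

# Made-up toy vocabulary for illustration; Kokoro's real vocabulary differs.
TOY_VOCAB: Dict[str, int] = {" ": 1, "h": 2, "ɛ": 3, "l": 4, "o": 5, "ʊ": 6,
                             "w": 7, "ɜ": 8, "ː": 9, "d": 10}


def ipa_to_token_ids(ipa: str, vocab: Dict[str, int]) -> List[int]:
    """Map each IPA symbol (one Unicode code point) to its vocabulary ID."""
    return [vocab[ch] for ch in ipa if ch in vocab]  # unknown symbols are skipped


print(ipa_to_token_ids("hɛloʊ wɜːld", TOY_VOCAB))
# → [2, 3, 4, 5, 6, 1, 7, 8, 9, 4, 10]
```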
tts
Synthesise speech from text and write it to a WAV file. Kokoro is the default engine; switch to VITS with --vits (or --engine vits).
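The three engine flags (--kokoro, --vits and --engine <name>) all reduce to a single choice, with Kokoro as the fallback. A hedged Python sketch of that reduction follows. The documentation does not say how conflicting flags are resolved, so letting the last engine flag win is an assumption. The function name `select_engine` is hypothetical.

```python
from typing import List


def select_engine(args: List[str]) -> str:
    """Hypothetical: return "kokoro" or "vits" from tts-style arguments."""
    engine = "kokoro"  # documented default
    i = 0
    while i < len(args):
        arg = args[i]
        if arg == "--kokoro":
            engine = "kokoro"
        elif arg == "--vits":
            engine = "vits"
        elif arg == "--engine":
            if i + 1 >= len(args) or args[i + 1] not in ("kokoro", "vits"):
                raise ValueError("--engine expects 'kokoro' or 'vits'")
            engine = args[i + 1]
            i += 1  # skip the engine name
        i += 1
    return engine  # assumption: the last engine flag wins
```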
```sh
# Kokoro synthesis (default voice, writes output.wav)
babylon tts "Hello world"

# Specify output path
babylon tts "Hello world" -o hello.wav

# Choose a voice and adjust speed
babylon tts --voice en-US-nova --speed 1.2 "Hello world" -o nova.wav

# Use the VITS engine
babylon tts --vits "Hello world" -o vits-out.wav
```

| Flag | Argument | Description |
|---|---|---|
| --kokoro | — | Use the Kokoro engine (default) |
| --vits | — | Use the VITS engine |
| --engine | <name> | Select kokoro or vits explicitly |
| -v, --voice | <name> | Kokoro voice name: the .bin file name in the --kokoro-voices directory, without the extension |
| --speed | <float> | Speech speed multiplier (default: 1.0) |
| -o | <path> | Output WAV file (default: output.wav) |
| -h, --help | — | Show help |
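Because a voice name is just a .bin file name without its extension, the valid values for --voice can be read straight from the voices directory. The Python sketch below shows that mapping. `list_voices` and `resolve_voice_file` are hypothetical helpers. Falling back to the configured default voice (--kokoro-voice / kokoro_voice) when --voice is omitted is an assumption based on that flag's description.

```python
import os
from typing import List, Optional


def list_voices(voices_dir: str) -> List[str]:
    """Voice names available in a voices directory (file names minus .bin)."""
    return sorted(name[:-len(".bin")] for name in os.listdir(voices_dir)
                  if name.endswith(".bin"))


def resolve_voice_file(voices_dir: str, voice: Optional[str], default_voice: str) -> str:
    """Hypothetical: map a --voice name (or the configured default) to its .bin path."""
    name = voice or default_voice  # assumption: fall back to kokoro_voice
    path = os.path.join(voices_dir, name + ".bin")
    if not os.path.isfile(path):
        raise FileNotFoundError(
            f"voice {name!r} not found; available: {', '.join(list_voices(voices_dir))}")
    return path
```

With the example config, `resolve_voice_file("models/voices", "en-US-nova", "en-US-heart")` would point at `models/voices/en-US-nova.bin`.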
serve
Start a local HTTP server with the built-in web frontend. All configured models are pre-loaded on startup. The web UI is served at GET / from index.html in the same directory as the executable.
```sh
# Start on default address (127.0.0.1:8775)
babylon serve

# Expose on all interfaces, custom port
babylon serve --host 0.0.0.0 --port 9000
```

| Flag | Argument | Description |
|---|---|---|
| --host | <addr> | Bind address (default: 127.0.0.1) |
| --port | <port> | Port number (default: 8775) |
| -h, --help | — | Show help |
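Once the server is running, a quick way to confirm it is reachable is to request the web UI at GET /, the one route documented here. The Python sketch below uses only the standard library. The helper name `server_is_up` is hypothetical, and it defaults to the documented address of 127.0.0.1:8775.

```python
import urllib.request


def server_is_up(host: str = "127.0.0.1", port: int = 8775, timeout: float = 2.0) -> bool:
    """Return True if GET / answers with HTTP 200 (the web UI page)."""
    url = f"http://{host}:{port}/"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return resp.status == 200
    except OSError:  # URLError, HTTPError and timeouts are all OSError subclasses
        return False


if __name__ == "__main__":
    print("babylon serve reachable:", server_is_up())
```

If the server was started with `--host 0.0.0.0 --port 9000`, call `server_is_up("127.0.0.1", 9000)` from the same machine, or pass the machine's address from another host.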