REST API
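Running babylon serve starts a local HTTP server that exposes four API endpoints for G2P and TTS, alongside a web frontend at GET /. All responses include the header Access-Control-Allow-Origin: *, so browser clients on other origins can call the API.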
Starting the server
Launch the server with the serve subcommand. All configured models are pre-loaded before the server accepts connections.
babylon serve
# Listening on http://127.0.0.1:8775
# Custom host / port
babylon serve --host 0.0.0.0 --port 9000
The web frontend is served at GET /. Open it in a browser to test phonemization and synthesis interactively.
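Because models are pre-loaded before the server accepts connections, a script that launches babylon serve may need to wait before sending requests. The following Python sketch polls the TCP port until a connection succeeds. It assumes the port is not reachable until loading has finished; wait_for_server is an illustrative helper name, not part of babylon.

```python
import socket
import time


def wait_for_server(host="127.0.0.1", port=8775, timeout=60.0, interval=0.5) -> bool:
    """Poll host:port until a TCP connection succeeds or `timeout` seconds pass.

    Returns True once a connection is accepted, False if the deadline expires.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly and retry.
            time.sleep(interval)
    return False
```

Calling wait_for_server() with no arguments targets the default address shown above; pass host and port explicitly when the server was started with --host or --port.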
Endpoints overview
| Method | Path | Description |
|---|---|---|
| GET | / | Web frontend (HTML) |
| GET | /status | Engine availability and voice count |
| GET | /voices | Sorted list of available Kokoro voice names |
| POST | /phonemize | Convert text to IPA or Kokoro token IDs |
| POST | /tts | Synthesise speech, returns audio/wav |
GET /status
Returns the availability of each engine and the number of loaded voices. Model availability is determined by whether the configured file path exists.
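A client can use this endpoint to decide which engine to request. The Python sketch below assumes the response carries boolean kokoro and vits fields, as in the sample response in this section; fetch_status and ready_engines are illustrative helper names, not part of babylon.

```python
import json
import urllib.request

# Default address from this document; adjust if the server uses --host / --port.
BASE_URL = "http://127.0.0.1:8775"


def fetch_status(base_url: str = BASE_URL) -> dict:
    """GET /status and decode the JSON body into a dict."""
    with urllib.request.urlopen(f"{base_url}/status", timeout=5) as resp:
        return json.load(resp)


def ready_engines(status: dict) -> list:
    """Return the names of the TTS engines that the status reports as available."""
    return [name for name in ("kokoro", "vits") if status.get(name) is True]
```

For example, ready_engines(fetch_status()) yields the engines that can be passed as the engine field of POST /tts.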
curl http://127.0.0.1:8775/status
Response (application/json):
{
"phonemizer": true,
"kokoro": true,
"vits": false,
"voices": 54
}
GET /voices
Returns a sorted JSON array of available Kokoro voice names — the filenames in the kokoro_voices directory without the .bin extension.
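The rule described above (filenames without the .bin extension, sorted) can be sketched in Python as follows. This is an illustration of the described behaviour, not the server's actual implementation; list_voice_names and voices_with_prefix are hypothetical helpers, and the prefix filter assumes voice names start with a locale tag, as the sample names in this section do.

```python
from pathlib import Path


def list_voice_names(voices_dir) -> list:
    """Return the sorted stems of all .bin files directly inside voices_dir."""
    return sorted(p.stem for p in Path(voices_dir).glob("*.bin") if p.is_file())


def voices_with_prefix(voices, prefix: str) -> list:
    """Filter a voice list (e.g. the GET /voices response) by name prefix."""
    return [name for name in voices if name.startswith(prefix)]
```

A client would typically apply voices_with_prefix to the decoded GET /voices response, for example to offer only the "en-US-" voices.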
curl http://127.0.0.1:8775/voices
Response:
["en-GB-alice", "en-US-bella", "en-US-heart", "en-US-nova", ...]
POST /phonemize
Converts text to IPA phonemes or Kokoro token IDs using the Open Phonemizer G2P pipeline.
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Input text to phonemize |
| tokens | boolean | No | Return Kokoro token IDs instead of IPA (default: false) |
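The request body is a small JSON object built from the two fields above. A minimal Python client sketch, using only the standard library, might look like this; build_phonemize_request and phonemize are illustrative names, and the phonemes and tokens response keys are taken from the sample responses in this section.

```python
import json
import urllib.request

# Default address from this document; adjust if the server uses --host / --port.
BASE_URL = "http://127.0.0.1:8775"


def build_phonemize_request(text: str, tokens: bool = False,
                            base_url: str = BASE_URL) -> urllib.request.Request:
    """Build (but do not send) a POST /phonemize request."""
    payload = {"text": text}
    if tokens:
        # Only send the optional field when token IDs are wanted.
        payload["tokens"] = True
    return urllib.request.Request(
        f"{base_url}/phonemize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def phonemize(text: str, tokens: bool = False, base_url: str = BASE_URL):
    """Send the request and return the IPA string or the list of token IDs."""
    req = build_phonemize_request(text, tokens, base_url)
    with urllib.request.urlopen(req, timeout=30) as resp:
        body = json.load(resp)
    return body["tokens"] if tokens else body["phonemes"]
```

Separating request construction from sending keeps the payload logic easy to inspect without a running server.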
# IPA mode
curl -X POST http://127.0.0.1:8775/phonemize \
-H "Content-Type: application/json" \
-d '{"text": "Hello world"}'
# Response
{"phonemes": "hɛloʊ wɜːld"}
# Token mode
curl -X POST http://127.0.0.1:8775/phonemize \
-H "Content-Type: application/json" \
-d '{"text": "Hello world", "tokens": true}'
# Response
{"tokens": [31, 29, 42, 0, 51, 17, 32, 42]}
POST /tts
Synthesises speech from text. On success, returns a WAV audio binary. On error, returns a JSON object with an error field.
| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Input text |
| engine | string | No | kokoro (default) or vits |
| voice | string | No | Kokoro voice name; defaults to the config value |
| speed | number | No | Speech speed multiplier (default: 1.0) |
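Since the endpoint returns either WAV bytes or a JSON error object, a client has to tell the two apart before writing a file. The Python sketch below does so by checking for the standard RIFF/WAVE header. It assumes the server emits a standard RIFF WAV and makes no assumption about which HTTP status accompanies an error; TTSError, build_tts_payload, parse_tts_response, and synthesize are illustrative names, not part of babylon.

```python
import json
import urllib.error
import urllib.request

# Default address from this document; adjust if the server uses --host / --port.
BASE_URL = "http://127.0.0.1:8775"


class TTSError(Exception):
    """Raised when /tts answers with something other than WAV audio."""


def build_tts_payload(text, engine=None, voice=None, speed=None) -> dict:
    """Build the JSON body, omitting optional fields left as None."""
    payload = {"text": text}
    for key, value in (("engine", engine), ("voice", voice), ("speed", speed)):
        if value is not None:
            payload[key] = value
    return payload


def parse_tts_response(body: bytes) -> bytes:
    """Return the body if it looks like WAV audio, otherwise raise TTSError."""
    if body[:4] == b"RIFF" and body[8:12] == b"WAVE":
        return body
    try:
        message = json.loads(body)["error"]
    except (ValueError, KeyError, TypeError):
        message = "unexpected non-WAV response"
    raise TTSError(message)


def synthesize(text, base_url: str = BASE_URL, **options) -> bytes:
    """POST /tts and return WAV bytes; options are engine, voice, speed."""
    req = urllib.request.Request(
        f"{base_url}/tts",
        data=json.dumps(build_tts_payload(text, **options)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=120) as resp:
            return parse_tts_response(resp.read())
    except urllib.error.HTTPError as err:
        # Non-2xx status: the body may still carry the JSON error object.
        return parse_tts_response(err.read())
```

The returned bytes can be written directly to a .wav file, for example with open("output.wav", "wb").write(synthesize("Hello world", voice="en-US-nova", speed=1.2)).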
# Synthesise with default voice
curl -X POST http://127.0.0.1:8775/tts \
-H "Content-Type: application/json" \
-d '{"text": "Hello world"}' \
--output output.wav
# Choose a voice and speed
curl -X POST http://127.0.0.1:8775/tts \
-H "Content-Type: application/json" \
-d '{"text": "Hello world", "voice": "en-US-nova", "speed": 1.2}' \
--output nova.wav
# VITS engine
curl -X POST http://127.0.0.1:8775/tts \
-H "Content-Type: application/json" \
-d '{"text": "Hello world", "engine": "vits"}' \
--output vits.wav
Web frontend
The built-in web UI at GET / communicates entirely with the local REST API and requires no external dependencies. It includes:
- Status indicator — Cyan dot when at least one engine is ready; red when none are configured.
- Engine selector — Switch between Kokoro and VITS; options are disabled when the model is unavailable.
- Voice selector — Populated from GET /voices.
- Speed slider — Kokoro speech speed from 0.5× to 2.0×.
- Phonemize — Calls POST /phonemize and displays IPA.
- Speak — Calls POST /tts and plays the WAV inline.
- Keyboard shortcut — Ctrl+Enter / Cmd+Enter triggers synthesis.