REST API

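Running babylon serve starts a local HTTP server that exposes four API endpoints for G2P (grapheme-to-phoneme conversion) and TTS (text-to-speech), plus a web frontend at GET /. All responses include the header Access-Control-Allow-Origin: *, so pages served from any origin can call the API from a browser.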

Starting the server

Launch the server with the serve subcommand. All configured models are pre-loaded before the server accepts connections.

babylon serve
# Listening on http://127.0.0.1:8775

# Custom host / port
babylon serve --host 0.0.0.0 --port 9000

The web frontend is served at GET /. Open it in a browser to test phonemization and synthesis interactively.

Endpoints overview

Method	Path	Description
GET	/	Web frontend (HTML)
GET	/status	Engine availability and voice count
GET	/voices	Sorted list of available Kokoro voice names
POST	/phonemize	Convert text to IPA or Kokoro token IDs
POST	/tts	Synthesise speech, returns audio/wav

GET /status

Returns the availability of each engine and the number of loaded voices. Model availability is determined by whether the configured file path exists.

curl http://127.0.0.1:8775/status

Response (application/json):

{
  "phonemizer": true,
  "kokoro":     true,
  "vits":       false,
  "voices":     54
}
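A client can use this response to decide which engine to request from POST /tts. The sketch below, in Python, works on the sample body shown above. pick_engine is a hypothetical helper, and the preference order is an assumption for illustration.

```python
import json


def pick_engine(status: dict, preferred=("kokoro", "vits")):
    """Return the first engine in `preferred` that /status reports as available."""
    for name in preferred:
        if status.get(name) is True:
            return name
    return None  # no engine is available


# Sample body copied from the /status response above.
sample = json.loads(
    '{"phonemizer": true, "kokoro": true, "vits": false, "voices": 54}'
)
print(pick_engine(sample))                        # kokoro
print(pick_engine(sample, preferred=("vits",)))   # None
```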

GET /voices

Returns a sorted JSON array of available Kokoro voice names. Each name is a filename from the kokoro_voices directory with the .bin extension removed.

curl http://127.0.0.1:8775/voices

Response:

["en-GB-alice", "en-US-bella", "en-US-heart", "en-US-nova", ...]
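For a voice picker it can help to group the names by locale. The Python sketch below assumes every name follows the language-REGION-name pattern seen in the sample above, with no hyphen in the final part. That pattern is inferred from the example, not documented. group_by_locale is a hypothetical helper.

```python
from collections import defaultdict


def group_by_locale(voices):
    """Group names like 'en-US-bella' under their 'en-US' prefix."""
    groups = defaultdict(list)
    for name in voices:
        # rpartition splits on the last hyphen:
        # 'en-US-bella' -> ('en-US', '-', 'bella')
        locale, sep, _ = name.rpartition("-")
        # Names without any hyphen do not fit the assumed pattern.
        groups[locale if sep else "unknown"].append(name)
    return dict(groups)


# The four names visible in the sample response above.
voices = ["en-GB-alice", "en-US-bella", "en-US-heart", "en-US-nova"]
for locale, names in group_by_locale(voices).items():
    print(locale, names)
```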

POST /phonemize

Converts text to IPA phonemes or Kokoro token IDs using the Open Phonemizer G2P pipeline.

Field	Type	Required	Description
text	string	Yes	Input text to phonemize
tokens	boolean	No	Return Kokoro token IDs instead of IPA (default: false)

# IPA mode
curl -X POST http://127.0.0.1:8775/phonemize \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world"}'

# Response
{"phonemes": "hɛloʊ wɜːld"}

# Token mode
curl -X POST http://127.0.0.1:8775/phonemize \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "tokens": true}'

# Response
{"tokens": [31, 29, 42, 0, 51, 17, 32, 42]}
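The same two calls can be made from Python with only the standard library. This is a sketch, assuming the default address and the request and response shapes shown above. build_phonemize_request and phonemize are hypothetical helper names.

```python
import json
import urllib.request


def build_phonemize_request(text: str, tokens: bool = False,
                            base_url: str = "http://127.0.0.1:8775"):
    """Build the POST /phonemize request without sending it."""
    payload = {"text": text}
    if tokens:
        payload["tokens"] = True  # omitted otherwise; the server default is false
    return urllib.request.Request(
        f"{base_url}/phonemize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def phonemize(text: str, tokens: bool = False,
              base_url: str = "http://127.0.0.1:8775"):
    """Send the request; return the IPA string, or the token ID list if tokens=True."""
    req = build_phonemize_request(text, tokens, base_url)
    with urllib.request.urlopen(req) as resp:
        body = json.load(resp)
    return body["tokens"] if tokens else body["phonemes"]
```

With the server running, phonemize("Hello world") returns the value of the phonemes field, and phonemize("Hello world", tokens=True) returns the tokens list.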

POST /tts

Synthesises speech from text. On success, returns a WAV audio binary. On error, returns a JSON object with an error field.

Field	Type	Required	Description
text	string	Yes	Input text
engine	string	No	kokoro (default) or vits
voice	string	No	Kokoro voice name; defaults to the config value
speed	number	No	Speech speed multiplier (default: 1.0)

# Synthesise with default voice
curl -X POST http://127.0.0.1:8775/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world"}' \
  --output output.wav

# Choose a voice and speed
curl -X POST http://127.0.0.1:8775/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "en-US-nova", "speed": 1.2}' \
  --output nova.wav

# VITS engine
curl -X POST http://127.0.0.1:8775/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "engine": "vits"}' \
  --output vits.wav

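A successful response is a raw audio/wav binary. On error, the body is a JSON object of the form { "error": "..." } instead. Because curl --output saves whatever body it receives, the output file then contains that JSON rather than audio.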
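A programmatic client should therefore check the response type before saving the body as audio. The Python sketch below uses only the standard library and assumes the default address. TTSError, interpret_tts_response, and synthesize are hypothetical names. This page does not say which HTTP status code accompanies an error body, so the sketch handles both 2xx and non-2xx responses.

```python
import json
import urllib.error
import urllib.request


class TTSError(RuntimeError):
    """Raised when POST /tts does not return WAV audio."""


def interpret_tts_response(content_type: str, body: bytes) -> bytes:
    """Return the WAV bytes, or raise TTSError with the server's error message."""
    if content_type.split(";")[0].strip().lower() == "audio/wav":
        return body
    try:
        message = json.loads(body)["error"]
    except (ValueError, KeyError, TypeError):
        message = f"unexpected response with Content-Type {content_type!r}"
    raise TTSError(message)


def synthesize(text, engine=None, voice=None, speed=None,
               base_url="http://127.0.0.1:8775") -> bytes:
    """POST /tts and return WAV bytes; optional fields are sent only when set."""
    fields = {"text": text, "engine": engine, "voice": voice, "speed": speed}
    payload = {k: v for k, v in fields.items() if v is not None}
    req = urllib.request.Request(
        f"{base_url}/tts",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urllib.request.urlopen(req) as resp:
            return interpret_tts_response(
                resp.headers.get("Content-Type", ""), resp.read())
    except urllib.error.HTTPError as err:
        # urllib raises for non-2xx statuses; the error body is still readable.
        return interpret_tts_response(
            err.headers.get("Content-Type", ""), err.read())
```

With the server running, the result of synthesize("Hello world", voice="en-US-nova", speed=1.2) can be written to a file opened in "wb" mode. A TTSError carries the text of the error field.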

Web frontend

The built-in web UI at GET / communicates entirely with the local REST API and requires no external dependencies. It includes:

Status indicator — Cyan dot when at least one engine is ready; red when none are configured.
Engine selector — Switch between Kokoro and VITS; options are disabled when the model is unavailable.
Voice selector — Populated from GET /voices.
Speed slider — Kokoro speech speed from 0.5× to 2.0×.
Phonemize — Calls POST /phonemize and displays IPA.
Speak — Calls POST /tts and plays the WAV inline.
Keyboard shortcut — Ctrl+Enter / Cmd+Enter triggers synthesis.
