REST API

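Running babylon serve starts a local HTTP server that exposes four API endpoints for grapheme-to-phoneme (G2P) conversion and text-to-speech (TTS), plus a built-in web frontend. All responses include the header Access-Control-Allow-Origin: *, so pages served from any origin can call the API from a browser.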

Starting the server

Launch the server with the serve subcommand. All configured models are pre-loaded before the server accepts connections.

babylon serve
# Listening on http://127.0.0.1:8775

# Custom host / port
babylon serve --host 0.0.0.0 --port 9000
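Because models are pre-loaded before the server accepts connections, a script that launches babylon serve may want to wait until the port is reachable before sending requests. The following is a minimal sketch, assuming the default host and port shown above; the helper name wait_for_server and its timeout values are illustrative and not part of Babylon.cpp.

```python
import socket
import time


def wait_for_server(host: str = "127.0.0.1", port: int = 8775,
                    timeout: float = 30.0, interval: float = 0.25) -> bool:
    """Poll until a TCP connection to host:port succeeds or timeout elapses.

    Hypothetical helper: it only checks that the port accepts connections,
    which (per the text above) happens after model pre-loading.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the server is accepting connections.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A caller would typically start the server process first, then call wait_for_server() and abort if it returns False.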

The web frontend is served at GET /. Open it in a browser to test phonemization and synthesis interactively.

Endpoints overview

| Method | Path | Description |
|---|---|---|
| GET | / | Web frontend (HTML) |
| GET | /status | Engine availability and voice count |
| GET | /voices | Sorted list of available Kokoro voice names |
| POST | /phonemize | Convert text to IPA or Kokoro token IDs |
| POST | /tts | Synthesise speech, returns audio/wav |

GET /status

Returns the availability of each engine and the number of loaded voices. Model availability is determined by whether the configured file path exists.

curl http://127.0.0.1:8775/status

Response (application/json):

{ "phonemizer": true, "kokoro": true, "vits": false, "voices": 54 }
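A client can use this response to decide which engine to request. The sketch below uses only the Python standard library and assumes the default address and the field names shown in the example response; fetch_status and ready_engines are hypothetical helper names.

```python
import json
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8775"  # assumption: default address from the serve example


def fetch_status(base_url: str = BASE_URL, timeout: float = 5.0) -> dict:
    """GET /status and decode the JSON body."""
    with urlopen(f"{base_url}/status", timeout=timeout) as resp:
        return json.load(resp)


def ready_engines(status: dict) -> list[str]:
    """Return the TTS engines that the status object reports as available."""
    return [name for name in ("kokoro", "vits") if status.get(name) is True]
```

With the example response above, ready_engines would return only kokoro, since vits is reported as false.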

GET /voices

Returns a sorted JSON array of available Kokoro voice names — the filenames in the kokoro_voices directory without the .bin extension.

curl http://127.0.0.1:8775/voices

Response:

["en-GB-alice", "en-US-bella", "en-US-heart", "en-US-nova", ...]
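One use of this list is to validate a voice name before synthesis. The sketch below is illustrative only: fetch_voices and voice_or_default are hypothetical helpers, and the fallback to None relies on the documented behaviour that POST /tts uses the configured voice when the voice field is omitted.

```python
import json
from typing import Optional
from urllib.request import urlopen

BASE_URL = "http://127.0.0.1:8775"  # assumption: default address from the serve example


def fetch_voices(base_url: str = BASE_URL, timeout: float = 5.0) -> list[str]:
    """GET /voices and decode the JSON array of voice names."""
    with urlopen(f"{base_url}/voices", timeout=timeout) as resp:
        return json.load(resp)


def voice_or_default(voices: list[str], preferred: str) -> Optional[str]:
    """Return preferred if the server lists it, else None.

    None signals that the caller should omit the voice field so the
    server falls back to its configured default.
    """
    return preferred if preferred in voices else None
```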

POST /phonemize

Converts text to IPA phonemes or Kokoro token IDs using the Open Phonemizer G2P pipeline.

| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Input text to phonemize |
| tokens | boolean | No | Return Kokoro token IDs instead of IPA (default: false) |

# IPA mode
curl -X POST http://127.0.0.1:8775/phonemize \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world"}'

# Response
{"phonemes": "hɛloʊ wɜːld"}

# Token mode
curl -X POST http://127.0.0.1:8775/phonemize \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "tokens": true}'

# Response
{"tokens": [31, 29, 42, 0, 51, 17, 32, 42]}
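The same two modes can be called from Python. This is a sketch using only the standard library, assuming the default address and the request and response fields documented above; the function names are hypothetical.

```python
import json
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8775"  # assumption: default address from the serve example


def build_phonemize_request(text: str, tokens: bool = False,
                            base_url: str = BASE_URL) -> Request:
    """Build the POST /phonemize request without sending it."""
    payload = {"text": text}
    if tokens:
        payload["tokens"] = True  # omitted otherwise; the server default is false
    return Request(
        f"{base_url}/phonemize",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def parse_phonemize_response(body: dict):
    """Return the token ID list in token mode, else the IPA string."""
    if "tokens" in body:
        return body["tokens"]
    return body["phonemes"]


def phonemize(text: str, tokens: bool = False, base_url: str = BASE_URL):
    """Send the request and return IPA text or Kokoro token IDs."""
    request = build_phonemize_request(text, tokens, base_url)
    with urlopen(request, timeout=10) as resp:
        return parse_phonemize_response(json.load(resp))
```

Splitting request building from sending keeps the payload logic easy to check without a running server.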

POST /tts

Synthesises speech from text. On success, returns a WAV audio binary. On error, returns a JSON object with an error field.

| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Input text |
| engine | string | No | kokoro (default) or vits |
| voice | string | No | Kokoro voice name; defaults to the config value |
| speed | number | No | Speech speed multiplier (default: 1.0) |

# Synthesise with default voice
curl -X POST http://127.0.0.1:8775/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world"}' \
  --output output.wav

# Choose a voice and speed
curl -X POST http://127.0.0.1:8775/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "voice": "en-US-nova", "speed": 1.2}' \
  --output nova.wav

# VITS engine
curl -X POST http://127.0.0.1:8775/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello world", "engine": "vits"}' \
  --output vits.wav

The success response is a raw audio/wav binary. On error, a { "error": "..." } JSON body is returned instead.
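Because the body is either WAV audio or a JSON error, a client should check which one it received before writing a file. The sketch below assumes the response Content-Type is audio/wav on success, as the endpoint table states. The text does not say which HTTP status code accompanies an error, so the code handles both a normal response and an HTTPError. TTSError and the function names are hypothetical.

```python
import json
from urllib.error import HTTPError
from urllib.request import Request, urlopen

BASE_URL = "http://127.0.0.1:8775"  # assumption: default address from the serve example


class TTSError(RuntimeError):
    """Raised when /tts does not return WAV audio."""


def build_tts_payload(text, engine=None, voice=None, speed=None) -> dict:
    """Include only the optional fields the caller set; the server supplies defaults."""
    payload = {"text": text}
    for key, value in (("engine", engine), ("voice", voice), ("speed", speed)):
        if value is not None:
            payload[key] = value
    return payload


def interpret_tts_response(content_type: str, body: bytes) -> bytes:
    """Return WAV bytes, or raise TTSError carrying the server's error message."""
    if content_type.split(";")[0].strip().lower() == "audio/wav":
        return body
    try:
        message = json.loads(body)["error"]
    except (ValueError, KeyError, TypeError):
        message = f"unexpected response with Content-Type {content_type!r}"
    raise TTSError(message)


def synthesise(text: str, base_url: str = BASE_URL, **options) -> bytes:
    """POST /tts and return the WAV bytes."""
    request = Request(
        f"{base_url}/tts",
        data=json.dumps(build_tts_payload(text, **options)).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
    try:
        with urlopen(request, timeout=60) as resp:
            return interpret_tts_response(resp.headers.get("Content-Type", ""), resp.read())
    except HTTPError as err:
        # Covers the case where errors arrive with a 4xx/5xx status (unspecified in the text).
        return interpret_tts_response(err.headers.get("Content-Type", ""), err.read())
```

A typical call would be synthesise("Hello world", voice="en-US-nova", speed=1.2), with the returned bytes written to a .wav file opened in binary mode.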

Web frontend

The built-in web UI at GET / communicates entirely with the local REST API and requires no external dependencies. It includes:

  • Status indicator: Cyan dot when at least one engine is ready; red when none are configured.
  • Engine selector: Switch between Kokoro and VITS; options are disabled when the model is unavailable.
  • Voice selector: Populated from GET /voices.
  • Speed slider: Kokoro speech speed from 0.5× to 2.0×.
  • Phonemize: Calls POST /phonemize and displays IPA.
  • Speak: Calls POST /tts and plays the WAV inline.
  • Keyboard shortcut: Ctrl+Enter / Cmd+Enter triggers synthesis.