TTS Arena Docs

Provider API

The contract your TTS endpoint needs to join the arena.

The arena talks to your model through a small HTTP contract. Our router calls your endpoint server-side, downloads the audio, and proxies it to the client, so your response never reaches the browser directly and your keys stay private.

The request

We send a line of text and (optionally) a voice id. Your endpoint synthesizes it and returns audio. A typical shape:

POST https://your-api.example.com/tts
Authorization: Bearer <key>
Content-Type: application/json

{
  "text": "The quick brown fox jumps over the lazy dog.",
  "voice_id": "one-of-your-voice-ids"
}

The exact field names are flexible - we write a small provider adapter for each model. What matters is that, given text, you return speech.
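On the provider side, validating a request of the shape shown above might look like the following minimal Python sketch. The function name `parse_tts_request` and the `expected_key` parameter are hypothetical, and the field names simply mirror the example request; your own endpoint may use different ones.

```python
import hmac
import json


def parse_tts_request(headers: dict, body: bytes, expected_key: str):
    """Check the bearer token and pull (text, voice_id) out of a JSON body.

    Hypothetical helper: field names follow the example request above.
    """
    presented = headers.get("Authorization", "")
    expected = f"Bearer {expected_key}"
    # Constant-time comparison so the check does not leak key contents.
    if not hmac.compare_digest(presented.encode(), expected.encode()):
        raise PermissionError("missing or invalid bearer token")

    payload = json.loads(body)
    text = payload.get("text")
    if not isinstance(text, str) or not text.strip():
        raise ValueError("'text' must be a non-empty string")

    # The voice id is optional, so a missing field becomes None.
    voice_id = payload.get("voice_id")
    return text, voice_id
```

A web framework handler would typically call this first, map `PermissionError` to a 401 and `ValueError` to a 400, and only then run synthesis.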

The response

Either is fine:

  • Raw audio bytes (audio/mpeg, audio/wav, …) returned directly, or
  • JSON containing base64 audio or a public URL we can download.

Common formats (mp3, wav, ogg, flac, opus) all work; we normalize on our side.
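To illustrate how an adapter could tell these response styles apart, here is a rough Python sketch. The JSON keys `audio_base64` and `audio_url` are assumptions made for the example, not names the arena requires, and `extract_audio` is a hypothetical helper.

```python
import base64
import json


def extract_audio(content_type: str, body: bytes):
    """Classify a provider response as raw bytes or a URL to download.

    Returns ("bytes", data) or ("url", url). The JSON keys used here are
    illustrative assumptions; a real adapter would match the provider's own.
    """
    # Drop parameters such as "; charset=utf-8" before comparing.
    media_type = content_type.split(";")[0].strip().lower()

    if media_type.startswith("audio/"):
        return ("bytes", body)

    if media_type == "application/json":
        payload = json.loads(body)
        if "audio_base64" in payload:
            return ("bytes", base64.b64decode(payload["audio_base64"]))
        if "audio_url" in payload:
            return ("url", payload["audio_url"])

    raise ValueError(f"unrecognized response type: {content_type!r}")
```

In the URL case the caller would still need to fetch the audio in a separate step.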

Voices

The arena cycles through a fixed pool of voices rather than cloning voices. Provide a list of voice ids and we rotate through them across battles, so a model is judged across its range rather than on a single voice.
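As a rough illustration of rotating through a fixed pool, the sketch below uses simple round-robin. This is an assumption about one way rotation could work, not a description of the arena's actual scheduling, and `make_voice_rotation` is a hypothetical name.

```python
from itertools import cycle


def make_voice_rotation(voice_ids):
    """Return a function that yields the next voice id, round-robin.

    Round-robin is an assumed policy chosen for illustration only.
    """
    if not voice_ids:
        raise ValueError("at least one voice id is required")
    pool = cycle(list(voice_ids))  # copy so later edits to the input are ignored
    return lambda: next(pool)


next_voice = make_voice_rotation(["voice-a", "voice-b", "voice-c"])
first_four = [next_voice() for _ in range(4)]
# first_four == ["voice-a", "voice-b", "voice-c", "voice-a"]
```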

Reliability

  • Aim to respond within ~15-30s for a sentence-length prompt.
  • Return a non-2xx status (or an error payload) on failure. We record it, retry with another model, and surface failing models in our admin tooling.
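The record-and-retry behavior in the list above can be pictured with a small Python sketch. The function `synthesize_with_fallback` and the shape of the `providers` list are hypothetical, and the real router may select and order fallback models differently.

```python
def synthesize_with_fallback(providers, text):
    """Try each (name, synthesize) pair in order until one returns audio.

    Returns (name, audio, failures), where failures lists the models that
    errored before the successful one. Purely illustrative.
    """
    failures = []
    for name, synthesize in providers:
        try:
            audio = synthesize(text)
        except Exception as exc:  # a real router would catch narrower errors
            failures.append((name, str(exc)))  # record the failure, move on
            continue
        return name, audio, failures
    raise RuntimeError(f"all providers failed: {failures}")
```

Here each `synthesize` callable stands in for an adapter that raises when the endpoint times out or returns a non-2xx status.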

Latency and success rate are tracked per model. A model that fails frequently can be temporarily timed out so it doesn't disrupt battles.
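One way to picture per-model tracking is a sliding window of recent outcomes, as in the Python sketch below. The class `ModelHealth`, the window size, and the thresholds are all assumptions made for illustration; the arena's real criteria for timing out a model are not specified here.

```python
from collections import deque


class ModelHealth:
    """Track recent outcomes for one model (illustrative thresholds only)."""

    def __init__(self, window=20, min_samples=5, max_failure_rate=0.5):
        self.min_samples = min_samples
        self.max_failure_rate = max_failure_rate
        # Each entry is (ok, latency_s); old entries fall off automatically.
        self.outcomes = deque(maxlen=window)

    def record(self, ok: bool, latency_s: float) -> None:
        self.outcomes.append((ok, latency_s))

    def failure_rate(self) -> float:
        if not self.outcomes:
            return 0.0
        failures = sum(1 for ok, _ in self.outcomes if not ok)
        return failures / len(self.outcomes)

    def mean_latency(self):
        """Mean latency of successful calls, or None if there are none."""
        latencies = [lat for ok, lat in self.outcomes if ok]
        return sum(latencies) / len(latencies) if latencies else None

    def should_time_out(self) -> bool:
        # Require a minimum sample count so one early failure is not decisive.
        enough = len(self.outcomes) >= self.min_samples
        return enough and self.failure_rate() > self.max_failure_rate
```

Because the window is bounded, a model that recovers would eventually push its old failures out of the window and stop tripping the threshold.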
