Overview
The WebSocket endpoint streams audio in real time with low latency. Unlike the REST API, it delivers audio chunks as they are generated, making it ideal for:
- Live voice assistants
- Real-time call center applications
- Interactive voice response (IVR) systems
- Any application requiring immediate audio feedback
Authentication
Pass your API key in the Authorization header as a Bearer token when establishing the WebSocket connection.
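As a minimal sketch of the header described above, the helper below builds the header mapping in Python. The function name `build_auth_headers` is hypothetical, and how the mapping is passed to the connection depends on the WebSocket client library you use.

```python
def build_auth_headers(api_key: str) -> dict[str, str]:
    """Build the Authorization header carrying the API key as a Bearer token.

    Hypothetical helper for illustration; it is not part of any SDK.
    """
    if not api_key or not api_key.strip():
        raise ValueError("API key must be a non-empty string")
    return {"Authorization": f"Bearer {api_key.strip()}"}


if __name__ == "__main__":
    # Pass the resulting mapping as the handshake headers of your client.
    print(build_auth_headers("YOUR_API_KEY"))
```

Keeping header construction in one place makes it easier to rotate keys or load them from the environment without touching the connection code.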
Concurrency Model
You can open multiple connections. The concurrency limit applies only to connections that are actively processing speech requests.
- You can maintain 10 open connections
- Only 5 can generate speech simultaneously
- Additional requests are rejected with a 429 rate limit error
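One way to stay under the generation limit on the client side is to gate requests with a semaphore. The sketch below assumes the limit of 5 from the figures above; `ActiveLimiter` and `fake_request` are hypothetical names, and the fake request only sleeps instead of talking to the server.

```python
import asyncio

MAX_ACTIVE = 5  # simultaneous generation limit, taken from the figures above


class ActiveLimiter:
    """Allow at most `limit` jobs to run at once; the rest wait their turn."""

    def __init__(self, limit: int = MAX_ACTIVE):
        self._sem = asyncio.Semaphore(limit)
        self.active = 0  # jobs currently running
        self.peak = 0    # highest number of jobs seen running together

    async def run(self, job):
        async with self._sem:
            self.active += 1
            self.peak = max(self.peak, self.active)
            try:
                return await job()
            finally:
                self.active -= 1


async def demo(n_jobs: int = 12):
    limiter = ActiveLimiter()

    async def fake_request():
        # Stand-in for sending a speech request and awaiting its completion.
        await asyncio.sleep(0.01)
        return "done"

    results = await asyncio.gather(
        *(limiter.run(fake_request) for _ in range(n_jobs))
    )
    return results, limiter.peak


if __name__ == "__main__":
    results, peak = asyncio.run(demo())
    print(len(results), "requests finished; peak concurrency:", peak)
```

Waiting locally avoids spending a round trip on a request the server would reject anyway.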
Request Format
- Must be "speech" for TTS requests.
- The text to convert to speech.
- The specific voice ID to use for synthesis. See available voices below.
- ISO language code for the speech output. Supported languages: en, hi, mr, ta, te, gu, kn, ml, bn, pa, od, as
- If true, adds a WAV header to the audio stream for immediate playback.
- Audio sample rate in Hz. Supported values: 8000, 16000, 24000
- Playback speed multiplier. Range: 0.5 to 2.0, where 1.0 is normal speed.
- Optional custom identifier for tracking requests. Auto-generated if not provided.
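The documented constraints can be checked on the client before a request is sent. The sketch below covers only the values listed above (language codes, sample rates, speed range). The function and argument names are illustrative Python names, not the wire-format field names of the API.

```python
SUPPORTED_LANGUAGES = {
    "en", "hi", "mr", "ta", "te", "gu", "kn", "ml", "bn", "pa", "od", "as",
}
SUPPORTED_SAMPLE_RATES = {8000, 16000, 24000}
MIN_SPEED, MAX_SPEED = 0.5, 2.0


def validate_tts_options(language: str, sample_rate: int, speed: float) -> list[str]:
    """Return a list of human-readable problems; an empty list means valid."""
    problems = []
    if language not in SUPPORTED_LANGUAGES:
        problems.append(f"unsupported language code: {language!r}")
    if sample_rate not in SUPPORTED_SAMPLE_RATES:
        problems.append(f"unsupported sample rate: {sample_rate} Hz")
    if not (MIN_SPEED <= speed <= MAX_SPEED):
        problems.append(f"speed {speed} outside {MIN_SPEED} to {MAX_SPEED}")
    return problems


if __name__ == "__main__":
    print(validate_tts_options("hi", 24000, 1.0))
    print(validate_tts_options("fr", 44100, 3.0))
```

Catching these locally gives faster feedback than waiting for an error message from the server.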
Available Voices
| Voice Name | Voice ID |
|---|---|
| Kartik | Ogbs15oBevLzXsUuTtA1 |
| Rahul | Owbs15oBevLzXsUurdA_ |
| Nisha | PAbs15oBevLzXsUu4dCi |
| Tulsi | PQbt15oBevLzXsUuNtD3 |
| Seema | Pgbt15oBevLzXsUubdA6 |
Example Request
Minimal Request
Response Messages
The server sends multiple message types during a speech generation request:
Connection Established
Sent immediately after successful authentication.
- Always "connected"
- Unique identifier for this WebSocket connection.
Audio Chunks
Streamed audio data. Multiple chunks are sent per request.
- Always "audio_chunk"
- The request identifier.
- Zero-indexed position in the audio stream.
- Base64-encoded audio data (PCM 16-bit, 24kHz mono by default).
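Because each chunk carries a zero-indexed position and a Base64 payload, a client can reassemble the raw PCM stream as sketched below. The `(index, payload)` tuple shape is an assumption made for illustration; extract those two values from the message using the field names in the API's actual schema.

```python
import base64


def assemble_pcm(chunks) -> bytes:
    """Join Base64 audio chunks into one PCM byte string, ordered by index.

    `chunks` is an iterable of (index, base64_payload) pairs in any order.
    Raises ValueError if an index is missing or duplicated.
    """
    ordered = sorted(chunks, key=lambda pair: pair[0])
    indices = [index for index, _ in ordered]
    if indices != list(range(len(ordered))):
        raise ValueError(f"chunk indices are not contiguous from 0: {indices}")
    return b"".join(base64.b64decode(payload) for _, payload in ordered)


if __name__ == "__main__":
    # Two fake chunks delivered out of order.
    first = base64.b64encode(b"\x01\x00\x02\x00").decode("ascii")
    second = base64.b64encode(b"\x03\x00\x04\x00").decode("ascii")
    pcm = assemble_pcm([(1, second), (0, first)])
    print(len(pcm), "bytes of PCM")
```

For live playback you would normally feed chunks to the player as they arrive. Full reassembly like this suits saving the result to a file.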
Request Complete
Signals that all audio has been sent.
- Always "complete"
- The request identifier.
- Total number of audio chunks sent.
- Characters processed (for billing verification).
Error
Indicates a problem with the request.
- Always "error"
- The request identifier (if available).
- Error code for programmatic handling.
- Human-readable error description.
Error Codes
| Code | Description |
|---|---|
| invalid_request | Malformed JSON or missing required fields |
| invalid_voice | Specified voice not found |
| invalid_language | Unsupported language code |
| text_too_long | Text exceeds maximum character limit |
| concurrency_limit_exceeded | Too many simultaneous requests |
| insufficient_credits | Account balance too low |
| internal_error | Server-side processing error |
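The error codes lend themselves to a lookup table for programmatic handling. In the sketch below the descriptions come from the table above. The split into retryable and non-retryable codes is an assumption made for illustration, not documented server behavior.

```python
ERROR_DESCRIPTIONS = {
    "invalid_request": "Malformed JSON or missing required fields",
    "invalid_voice": "Specified voice not found",
    "invalid_language": "Unsupported language code",
    "text_too_long": "Text exceeds maximum character limit",
    "concurrency_limit_exceeded": "Too many simultaneous requests",
    "insufficient_credits": "Account balance too low",
    "internal_error": "Server-side processing error",
}

# ASSUMPTION: only transient conditions are worth retrying; the other codes
# describe problems that the same request would hit again unchanged.
RETRYABLE_CODES = {"concurrency_limit_exceeded", "internal_error"}


def describe_error(code: str) -> str:
    """Return the documented description, or a fallback for unknown codes."""
    return ERROR_DESCRIPTIONS.get(code, f"Unknown error code: {code}")


def should_retry(code: str) -> bool:
    """True if this sketch treats the code as transient."""
    return code in RETRYABLE_CODES


if __name__ == "__main__":
    for code in ("invalid_voice", "concurrency_limit_exceeded"):
        print(code, "|", describe_error(code), "| retry:", should_retry(code))
```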
Complete Example
Best Practices
Connection Management
- Reuse WebSocket connections for multiple requests
- Implement automatic reconnection with exponential backoff
- Send periodic pings to keep connections alive (every 30 seconds)
- Close connections gracefully when no longer needed
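Reconnection with exponential backoff can be sketched as a delay schedule. The base delay, the 30-second cap, and the use of full jitter are illustrative choices, not values prescribed by the API.

```python
import random


def backoff_delay(attempt: int, base: float = 1.0, cap: float = 30.0,
                  jitter: bool = True, rng=random.random) -> float:
    """Seconds to wait before reconnect attempt number `attempt` (0-based).

    The delay doubles with each attempt and is capped at `cap`. With jitter
    enabled, a random fraction of that delay is used so that many clients do
    not reconnect in lockstep.
    """
    if attempt < 0:
        raise ValueError("attempt must be non-negative")
    delay = min(cap, base * (2 ** attempt))
    if jitter:
        delay *= rng()  # full jitter: uniform in [0, delay)
    return delay


if __name__ == "__main__":
    for attempt in range(7):
        print(attempt, backoff_delay(attempt, jitter=False))
```

In a reconnect loop you would sleep for `backoff_delay(attempt)` after each failed connection and reset `attempt` to zero once a connection succeeds.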
Audio Handling
- Buffer audio chunks before playback for smoother experience
- Default audio format: PCM 16-bit, 24kHz, mono
- Use Web Audio API for browser playback
- Consider using a streaming audio player for real-time playback
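When the stream arrives as raw PCM, it can be wrapped in a WAV container on the client for players that expect one. The sketch below uses Python's standard `wave` module and assumes the default format stated above (16-bit, 24 kHz, mono). The function name `pcm_to_wav` is hypothetical.

```python
import io
import wave


def pcm_to_wav(pcm: bytes, sample_rate: int = 24000,
               channels: int = 1, sample_width: int = 2) -> bytes:
    """Wrap raw PCM samples in a WAV container and return the file bytes.

    Defaults match the documented default format: 16-bit (2 bytes per
    sample), 24 kHz, mono.
    """
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as writer:
        writer.setnchannels(channels)
        writer.setsampwidth(sample_width)
        writer.setframerate(sample_rate)
        writer.writeframes(pcm)
    return buffer.getvalue()


if __name__ == "__main__":
    silence = b"\x00\x00" * 24000  # one second of silence at 24 kHz
    with open("silence.wav", "wb") as f:
        f.write(pcm_to_wav(silence))
```

If you requested a different sample rate, pass the same value here so playback speed and pitch stay correct.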
Error Handling
- Always handle the error message type
- Implement request timeouts (recommended: 30 seconds)
- Queue requests when concurrency limit is reached
- Log request_id for debugging and support
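A request timeout can be sketched with `asyncio.wait_for`, using the recommended 30 seconds as the default. The wrapper name `with_timeout` and the `slow_request` stand-in are hypothetical. In a real client the awaited coroutine would be whatever waits for the completion message of a request.

```python
import asyncio

REQUEST_TIMEOUT_S = 30.0  # recommended value from the list above


async def with_timeout(awaitable, timeout: float = REQUEST_TIMEOUT_S):
    """Await `awaitable`, returning ("ok", result) or ("timeout", None)."""
    try:
        result = await asyncio.wait_for(awaitable, timeout)
    except asyncio.TimeoutError:
        # wait_for cancels the pending awaitable before raising.
        return ("timeout", None)
    return ("ok", result)


async def slow_request(delay: float) -> str:
    # Stand-in for a speech request that takes `delay` seconds to finish.
    await asyncio.sleep(delay)
    return "audio-complete"


if __name__ == "__main__":
    print(asyncio.run(with_timeout(slow_request(0.01), timeout=1.0)))
    print(asyncio.run(with_timeout(slow_request(1.0), timeout=0.05)))
```

Returning a status tuple instead of raising keeps the calling loop simple when it needs to decide whether to requeue the request.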

