Transcription
Transcription converts user speech to text.
- Provider and model: match your language and latency needs.
- Turn taking: automatic or push-to-talk. See Turn taking.
- Interrupts (automatic mode): let users speak over the agent.
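The options above can be pictured as one small settings object. The sketch below is illustrative only: `TranscriptionSettings`, its field names, and the string values are hypothetical and are not taken from any real SDK or from this product's configuration format. It assumes that interrupts are meaningful only in automatic mode, as the list above suggests, and rejects the other combination.

```python
from dataclasses import dataclass

# Hypothetical names throughout: this is not a real SDK type.
VALID_TURN_TAKING = ("automatic", "push_to_talk")


@dataclass(frozen=True)
class TranscriptionSettings:
    provider: str                    # placeholder provider identifier
    model: str                       # placeholder model identifier
    turn_taking: str = "automatic"   # "automatic" or "push_to_talk"
    interrupts: bool = False         # assumed to apply in automatic mode only

    def __post_init__(self) -> None:
        if self.turn_taking not in VALID_TURN_TAKING:
            raise ValueError(f"unknown turn-taking mode: {self.turn_taking!r}")
        if self.interrupts and self.turn_taking != "automatic":
            raise ValueError("interrupts apply only in automatic mode")


# Automatic turn taking with interrupts enabled.
settings = TranscriptionSettings(
    provider="example-provider",
    model="example-model",
    interrupts=True,
)
```

Validating at construction time surfaces an inconsistent combination before a conversation starts, rather than partway through a call.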
Text-to-Speech (TTS)
TTS converts the agent’s text response to audio.
- Provider and model: balance speed and quality.
- Voice: choose one that fits your brand and language.
Practical tips
- Start with defaults and optimize after you have an end-to-end demo.
- Prefer low-latency models for real-time conversations.
- If using your own backend, test locally with a tunnel. See Tunnelling.
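One way to act on the low-latency tip is to time a few requests per candidate model and compare medians, which are less sensitive to a single slow request than averages. The helper below is a minimal sketch under that assumption: `pick_lowest_latency` is a hypothetical function, and the model names and millisecond values are placeholders, not measurements of any real provider.

```python
from statistics import median


def pick_lowest_latency(samples_ms: dict[str, list[float]]) -> str:
    """Return the candidate whose median latency is lowest.

    samples_ms maps a model name to latency samples in milliseconds
    that you collected yourself.
    """
    if not samples_ms:
        raise ValueError("no candidates to compare")
    return min(samples_ms, key=lambda name: median(samples_ms[name]))


# Placeholder numbers for illustration only.
samples = {
    "model-a": [320.0, 310.0, 900.0],  # one slow outlier
    "model-b": [450.0, 440.0, 460.0],
}
best = pick_lowest_latency(samples)
print(best)
```

Collect the samples from the same network location your users will connect from, since measurements taken elsewhere may not reflect real conversation latency.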