This page explains the main agent settings conceptually. Use it as a reference when deciding how to configure transcription, text-to-speech (TTS), and the backend that generates your agent’s responses.

Transcription

Transcription converts user speech to text. Configure the provider, the model, and the turn-taking behavior.
  • Provider and model: Choose based on the languages you need to support and your latency requirements.
  • Turn taking: Automatic or Push to talk. See Turn taking for details.
  • Can interrupt (automatic mode): Allow users to speak over the agent.
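
To make the relationship between these options concrete, here is a minimal sketch of how the transcription settings could be modeled in code. This is not the product's API: the class, field names, and enum values are all hypothetical, and the check that "Can interrupt" is rejected outside automatic mode is an assumption drawn from the bullet above.

```python
from dataclasses import dataclass
from enum import Enum


class TurnTaking(Enum):
    # Hypothetical identifiers for the two modes named above.
    AUTOMATIC = "automatic"
    PUSH_TO_TALK = "push_to_talk"


@dataclass(frozen=True)
class TranscriptionSettings:
    # All field names are illustrative, not the dashboard's real schema.
    provider: str
    model: str
    turn_taking: TurnTaking = TurnTaking.AUTOMATIC
    can_interrupt: bool = False

    def __post_init__(self) -> None:
        # Assumption: "Can interrupt" is meaningful only in automatic mode,
        # so this sketch rejects it when combined with push to talk.
        if self.can_interrupt and self.turn_taking is not TurnTaking.AUTOMATIC:
            raise ValueError("can_interrupt only applies in automatic mode")


settings = TranscriptionSettings(
    provider="example-provider",  # placeholder value
    model="example-model",        # placeholder value
    can_interrupt=True,
)
```

Modeling the rule as a validation step keeps an inconsistent combination from being saved silently.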

Text-to-Speech (TTS)

TTS converts the agent’s text response to audio.
  • Provider and model: Balance speed and quality for your use case.
  • Voice: Pick a voice that matches your brand and language.

Backend

The backend generates the agent’s text response each turn.
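
The sketch below shows where the backend sits in a single turn: transcription produces text, the backend produces a text reply, and TTS turns that reply into audio. The function names, the message format, and the stand-in implementations are all hypothetical and serve only to illustrate the flow.

```python
from typing import Callable, List, Tuple

# Hypothetical message shape: (role, text).
Message = Tuple[str, str]


def run_turn(
    audio_in: bytes,
    history: List[Message],
    transcribe: Callable[[bytes], str],
    generate: Callable[[List[Message]], str],
    synthesize: Callable[[str], bytes],
) -> bytes:
    """Run one conversational turn and return the agent's audio."""
    user_text = transcribe(audio_in)   # Transcription: speech to text
    history.append(("user", user_text))
    reply = generate(history)          # Backend: text response for this turn
    history.append(("agent", reply))
    return synthesize(reply)           # TTS: text to audio


# Stand-in components so the sketch runs without any external service.
def fake_transcribe(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_backend(history: List[Message]) -> str:
    return "You said: " + history[-1][1]


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")


history: List[Message] = []
audio_out = run_turn(b"hello", history, fake_transcribe, fake_backend, fake_tts)
```

Because each stage is passed in as a function, any one of them can be swapped without touching the others, which mirrors how the three settings are configured independently.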

Practical tips

  • Start with the defaults, and optimize once you have an end-to-end demo working.
  • Prefer low-latency models for real-time conversations.
  • If using your own backend, test locally with a tunnel. See Tunnelling.
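
For the last tip, a local backend can be as small as the following sketch, which uses only the Python standard library. The request and response shapes (a JSON object with a "text" field in both directions), the port, and the helper names are assumptions made for illustration; check the actual backend contract before relying on them.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def build_reply(payload: dict) -> dict:
    # Hypothetical contract: read the user's text, return the agent's text.
    user_text = payload.get("text", "")
    return {"text": f"You said: {user_text}"}


class BackendHandler(BaseHTTPRequestHandler):
    def do_POST(self) -> None:
        # Read and decode the JSON request body.
        length = int(self.headers.get("Content-Length", 0))
        payload = json.loads(self.rfile.read(length) or b"{}")
        body = json.dumps(build_reply(payload)).encode("utf-8")
        # Send the JSON response.
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8000) -> HTTPServer:
    # Bind to localhost only; a tunnel then exposes this port externally.
    return HTTPServer(("127.0.0.1", port), BackendHandler)
```

Calling make_server().serve_forever() starts the server on the assumed port 8000, which is the port you would then point your tunnel at.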

Where to change settings

In the dashboard, open your agent and click Edit on Transcription, Text-to-Speech, or Backend. Changes apply immediately to new turns.