Cartesia (bring your own key)
- Model: sonic-2, the model we configure in the Layercode pipeline.
- Voices: Starts with the “Mia” preset (1d3ba41a-96e6-44ad-aabb-9817c56caa68), with support for any Cartesia voice ID.
- Audio formats: Streams 16 kHz PCM by default and can downshift to 8 kHz μ-law for phone use.
- Timestamps: Word-level timestamps are enabled automatically, making Cartesia ideal when you need precise interruption handling.
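To make the trade-off between the two audio formats concrete, the sketch below compares their raw byte rates. It assumes that "PCM" here means 16-bit mono samples, which this page does not state, and that μ-law uses one byte per sample. The AudioFormat type and bytesPerSecond helper are hypothetical names used only for illustration.

```typescript
// Hypothetical description of a mono audio stream (not a Layercode or Cartesia type).
type AudioFormat = { sampleRateHz: number; bytesPerSample: number };

// Assumption: PCM is 16-bit (2 bytes per sample), mono.
const PCM_16K: AudioFormat = { sampleRateHz: 16_000, bytesPerSample: 2 };
// μ-law encodes each sample in a single byte.
const MULAW_8K: AudioFormat = { sampleRateHz: 8_000, bytesPerSample: 1 };

// Raw payload size per second of audio, before any transport framing.
function bytesPerSecond(format: AudioFormat): number {
  return format.sampleRateHz * format.bytesPerSample;
}

console.log(bytesPerSecond(PCM_16K));  // 32000
console.log(bytesPerSecond(MULAW_8K)); // 8000
```

Under these assumptions the telephony format carries a quarter of the data per second, which is why it suits phone use at the cost of fidelity.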
ElevenLabs (bring your own key)
- Model: eleven_v2_5_flash, the streaming model Layercode enables by default.
- Voices: Defaults to the “Alloy” voice but accepts any ElevenLabs voice ID plus optional stability/similarity controls.
- Audio formats: Streams 16 kHz PCM for the web and 8 kHz μ-law for telephony scenarios.
- Timestamps: Character-level alignment is requested (sync_alignment=true) so you receive live timestamps for captions and interruptions.
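Character-level alignment usually has to be grouped into words before it is useful for captions. The following is a minimal sketch of that grouping. The payload shape (chars, charStartTimesMs, charDurationsMs) is an assumption modeled on ElevenLabs' streaming alignment data and is not confirmed by this page, and WordTiming and toWordTimings are hypothetical names.

```typescript
// Assumed alignment payload shape; verify field names against the provider's docs.
interface CharAlignment {
  chars: string[];
  charStartTimesMs: number[];
  charDurationsMs: number[];
}

// Hypothetical output type for caption rendering.
interface WordTiming {
  word: string;
  startMs: number;
  endMs: number;
}

// Group consecutive non-whitespace characters into words with start/end times.
function toWordTimings(a: CharAlignment): WordTiming[] {
  const words: WordTiming[] = [];
  let current: WordTiming | null = null;

  for (let i = 0; i < a.chars.length; i++) {
    const ch = a.chars[i];
    const start = a.charStartTimesMs[i];
    const end = start + a.charDurationsMs[i];

    if (/\s/.test(ch)) {
      // Whitespace closes the word in progress, if any.
      if (current) {
        words.push(current);
        current = null;
      }
      continue;
    }

    if (current) {
      current.word += ch;
      current.endMs = end;
    } else {
      current = { word: ch, startMs: start, endMs: end };
    }
  }

  // Flush the last word when the text does not end in whitespace.
  if (current) words.push(current);
  return words;
}

const sample: CharAlignment = {
  chars: ["H", "i", " ", "y", "o"],
  charStartTimesMs: [0, 50, 100, 150, 200],
  charDurationsMs: [50, 50, 50, 50, 50],
};
const wordTimings = toWordTimings(sample);
console.log(wordTimings);
```

For interruption handling, the same word timings can be compared against the playback position to estimate which words the listener actually heard.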
Rime (managed by Layercode)
- Model: mistv2, the default managed voice inside Layercode. Mist v2 is built for accuracy, speed, and customization at scale, which suits high-volume, business-critical conversations.
- Voices: Ships with “Ana” out of the box, and we frequently use “Courtney” for fallbacks; any Rime speaker ID is supported to match the tone you need.
- Audio formats: Streams PCM, MP3, or μ-law depending on your transport, so it works for the web and PSTN without extra conversion.
- Timestamps: Provides streaming timestamps for accurate barge-in and captioning, helping you maintain fast turn taking.
Picking the right provider
- Start with Rime if you want instant setup with managed billing.
- Switch to Cartesia when you own a Cartesia account and need high-fidelity voices with detailed timestamps.
- Use ElevenLabs when you need ElevenLabs’ cloned voices or multilingual catalog and can provide your own key.
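The guidance above can be written as a small decision function. This is only a sketch: the Needs fields, the Provider names, and the pickProvider helper are hypothetical and not part of any Layercode API, and the order in which the checks run is an assumption, since the list does not rank ElevenLabs against Cartesia when both keys are available.

```typescript
// Hypothetical provider labels for illustration only.
type Provider = "rime" | "cartesia" | "elevenlabs";

// Hypothetical summary of what a project has and needs.
interface Needs {
  hasCartesiaKey: boolean;
  hasElevenLabsKey: boolean;
  needsClonedOrMultilingualVoices: boolean;
}

function pickProvider(n: Needs): Provider {
  // ElevenLabs: cloned voices or multilingual catalog, and you can supply a key.
  if (n.needsClonedOrMultilingualVoices && n.hasElevenLabsKey) return "elevenlabs";
  // Cartesia: you own an account and want detailed timestamps.
  if (n.hasCartesiaKey) return "cartesia";
  // Rime: managed by Layercode, so no key is required.
  return "rime";
}

console.log(
  pickProvider({
    hasCartesiaKey: false,
    hasElevenLabsKey: false,
    needsClonedOrMultilingualVoices: false,
  }),
); // rime
```

Falling through to Rime keeps the default path free of any key management, which matches the "instant setup" recommendation.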