Text to speech providers

Layercode supports three real-time text-to-speech (TTS) integrations. Each runs inside the same low-latency pipeline, but the configuration, pricing, and recommended use cases differ. Rime is the only managed (non-BYOK) option; Cartesia and ElevenLabs require your own credentials.

Cartesia (bring your own key)

Model: sonic-2, the model we configure in the Layercode pipeline.
Voices: Starts with the “Mia” preset (1d3ba41a-96e6-44ad-aabb-9817c56caa68), with support for any Cartesia voice ID.
Audio formats: Streams 16 kHz PCM by default and can switch to 8 kHz μ-law for telephony.
Timestamps: Word-level timestamps are enabled automatically, making Cartesia ideal when you need precise interruption handling.
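To illustrate why word-level timestamps help with interruption handling, here is a minimal sketch that works out which words the listener had fully heard when they interrupted. The `WordTimestamp` shape and the `spoken_before` helper are hypothetical and used only for illustration; they are not the format Layercode or Cartesia actually emits.

```python
from dataclasses import dataclass


@dataclass
class WordTimestamp:
    """Hypothetical shape: one word with start/end offsets in seconds."""
    word: str
    start: float
    end: float


def spoken_before(words: list[WordTimestamp], interrupted_at: float) -> str:
    """Return the text whose audio finished playing before the interruption."""
    heard = [w.word for w in words if w.end <= interrupted_at]
    return " ".join(heard)


timestamps = [
    WordTimestamp("Your", 0.00, 0.18),
    WordTimestamp("order", 0.18, 0.52),
    WordTimestamp("ships", 0.52, 0.90),
    WordTimestamp("tomorrow", 0.90, 1.45),
]

print(spoken_before(timestamps, interrupted_at=0.95))  # Your order ships
```

Keeping only the words that finished playing lets the conversation history reflect what the user actually heard rather than the full generated response.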

Use Cartesia when you already manage a Cartesia account and want detailed timestamps with full access to Cartesia’s voice library. Add your Cartesia API key on the Settings → Providers page to activate streaming; without a key we fall back to the managed Rime voice.
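As a rough guide to what the two audio formats listed above mean for bandwidth, the arithmetic can be sketched as follows. This assumes 16-bit mono PCM, which the section does not state explicitly (only the 16 kHz sample rate is given); μ-law stores each sample in 8 bits.

```python
def bytes_per_second(sample_rate_hz: int, bits_per_sample: int, channels: int = 1) -> int:
    """Raw audio payload rate, ignoring any transport framing overhead."""
    return sample_rate_hz * bits_per_sample * channels // 8


# Assumption: PCM is 16-bit mono; only the 16 kHz sample rate comes from the text.
pcm_16k = bytes_per_second(16_000, 16)
# mu-law (G.711) encodes each sample in 8 bits.
mulaw_8k = bytes_per_second(8_000, 8)

print(pcm_16k, mulaw_8k)    # 32000 8000
print(pcm_16k // mulaw_8k)  # 4
```

Under these assumptions the telephony format carries a quarter of the raw payload of the default web format.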

ElevenLabs (bring your own key)

Model: eleven_v2_5_flash, the streaming model Layercode enables by default.
Voices: Defaults to the “Alloy” voice but accepts any ElevenLabs voice ID plus optional stability/similarity controls.
Audio formats: Streams 16 kHz PCM for the web and 8 kHz μ-law for telephony scenarios.
Speed: Configure speech speed between 0.7 and 1.2 (defaults to 1.0) for slower or faster delivery.
Timestamps: Character-level alignment is requested (sync_alignment=true) so you receive live timestamps for captions and interruptions.
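Character-level alignment usually needs to be grouped into words before it is useful for captions. The sketch below assumes the alignment arrives as two parallel lists, one of characters and one of start times in milliseconds; the exact field names and payload shape are assumptions and should be checked against the data you actually receive.

```python
def chars_to_words(chars: list[str], start_ms: list[int]) -> list[tuple[str, int]]:
    """Group per-character timings into (word, start_ms) pairs, splitting on whitespace."""
    words: list[tuple[str, int]] = []
    current = ""
    current_start = 0
    for ch, t in zip(chars, start_ms):
        if ch.isspace():
            # Whitespace closes the word in progress, if any.
            if current:
                words.append((current, current_start))
                current = ""
            continue
        if not current:
            # The first character of a word sets the word's start time.
            current_start = t
        current += ch
    if current:
        words.append((current, current_start))
    return words


chars = list("Hi there")
start_ms = [0, 60, 120, 150, 210, 260, 310, 370]
print(chars_to_words(chars, start_ms))  # [('Hi', 0), ('there', 150)]
```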

Choose ElevenLabs when you want to leverage your existing ElevenLabs voices or studio cloning features. Provide your ElevenLabs API key in Settings → Providers; pipelines without a key automatically move to the managed Rime voice.
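If you set the speech speed programmatically, it can help to keep values inside the documented 0.7 to 1.2 range before saving them. The following is a minimal sketch; `resolve_speed` is a hypothetical helper, and clamping is a choice made here for illustration, since the section does not say how Layercode treats out-of-range values.

```python
from typing import Optional

MIN_SPEED = 0.7
MAX_SPEED = 1.2
DEFAULT_SPEED = 1.0


def resolve_speed(requested: Optional[float] = None) -> float:
    """Return the default when unset, otherwise clamp into the documented range."""
    if requested is None:
        return DEFAULT_SPEED
    return max(MIN_SPEED, min(MAX_SPEED, requested))


print(resolve_speed())     # 1.0
print(resolve_speed(1.5))  # 1.2
print(resolve_speed(0.5))  # 0.7
```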

Rime (managed by Layercode)

Model: mistv2, the default managed voice inside Layercode. Rime positions Mist v2 for high-volume, business-critical conversations, with an emphasis on accuracy, speed, and customization.
Voices: Ships with “Ana” out of the box, and we frequently use “Courtney” for fallbacks; any Rime speaker ID is supported to match the tone you need.
Audio formats: Streams PCM, MP3, or μ-law depending on your transport, so it works for the web and PSTN without extra conversion.
Timestamps: Provides streaming timestamps for accurate barge-in and captioning, helping you maintain fast turn-taking.

Rime is the easiest way to get started: Layercode manages the credentials, so it works immediately even if you have not supplied any third-party keys. Rime markets Mist v2 for sales and customer-retention conversations, and because billing is managed by Layercode, it is a strong default when you prefer consolidated billing.

Picking the right provider

Start with Rime if you want instant setup with managed billing.
Switch to Cartesia when you own a Cartesia account and need high-fidelity voices with detailed timestamps.
Use ElevenLabs when you need ElevenLabs’ cloned voices or multilingual catalog and can provide your own key.

You can mix and match providers per pipeline, so experiment with different voices and formats to find the best fit for your experience.
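The fallback behavior described in the provider sections can be summarized in a few lines. This is only a sketch of the documented rule (a BYOK provider without a stored key falls back to the managed Rime voice); `resolve_provider` and the `api_keys` mapping are hypothetical names, not part of any Layercode API.

```python
from typing import Optional

BYOK_PROVIDERS = {"cartesia", "elevenlabs"}
MANAGED_PROVIDER = "rime"


def resolve_provider(requested: str, api_keys: dict[str, Optional[str]]) -> str:
    """Return the provider that would serve audio for a pipeline."""
    if requested == MANAGED_PROVIDER:
        return MANAGED_PROVIDER
    if requested not in BYOK_PROVIDERS:
        raise ValueError(f"unknown provider: {requested!r}")
    # A bring-your-own-key provider is used only when a key has been supplied.
    return requested if api_keys.get(requested) else MANAGED_PROVIDER


keys = {"cartesia": "example-key"}
print(resolve_provider("cartesia", keys))    # cartesia
print(resolve_provider("elevenlabs", keys))  # rime
print(resolve_provider("rime", keys))        # rime
```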
