Pricing
Layercode pricing is transparent and simple. You pay only for what you use, in per-second increments. Silence (when neither the user nor the assistant is speaking) is free. The cost per second of conversation is determined by the providers and models you choose for the transcription and text-to-speech stages of your voice pipeline. You pay the transcription provider cost only for each second of user speech that is transcribed, and the text-to-speech provider cost only for each second of generated speech. The specific provider costs are listed below. All costs are quoted per minute for ease of comparison, but are charged in per-second increments at 1/60th of the per-minute rate.
In addition to the provider costs, the Layercode Platform Fee is charged per second of conversation (the duration of the conversation minus any silence when neither the user nor the assistant is speaking).
When you use your own backend, Layercode charges no additional fee, because your backend makes its own requests to whichever LLM you use to generate responses. When you use our Hosted Backend, there is an additional fee per second of conversation, which covers the LLM calls we make on your behalf.
The estimated per-minute cost for a specific voice pipeline is displayed on the pipeline’s page in the Dashboard. This is based on the average conversation cost for that voice pipeline over the past 24 hours.
The cost of a conversation session is deducted from your account credits at the end of each user session. You can top up your account with credits in the Dashboard, where a history of all charges can be viewed. New user conversation sessions will be rejected if your account balance is zero or negative. Credits do not expire and there is no minimum credit purchase.
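The credit rules above can be modelled in a few lines. This is a minimal sketch for reasoning about balances, not Layercode's implementation or API: the `CreditAccount` class and its method names are hypothetical. It assumes only what the paragraph states: charges are applied when a session ends, and a new session is accepted only while the balance is above zero.

```python
from decimal import Decimal


class CreditAccount:
    """Hypothetical model of an account's credit balance (not a Layercode API)."""

    def __init__(self, balance: str) -> None:
        # Decimal avoids floating-point drift when summing many small charges.
        self.balance = Decimal(balance)

    def can_start_session(self) -> bool:
        # New sessions are rejected when the balance is zero or negative.
        return self.balance > 0

    def settle_session(self, cost: str) -> None:
        # The charge is applied once, at the end of the session,
        # so a session that is already running can push the balance below zero.
        self.balance -= Decimal(cost)

    def top_up(self, amount: str) -> None:
        # Credits do not expire and there is no minimum purchase.
        self.balance += Decimal(amount)


account = CreditAccount("0.05")
print(account.can_start_session())  # balance is positive, session accepted
account.settle_session("0.069")     # session cost exceeds the remaining credits
print(account.balance, account.can_start_session())
account.top_up("5.00")
print(account.balance, account.can_start_session())
```

Because charges settle at session end, a sensible practice under this model is to top up before the balance approaches zero rather than after new sessions start being rejected.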
Layercode Platform Fees
Charged per-second of conversation (when user or assistant is speaking) at 1/60th of the per-minute rate.
Fee | Price per minute |
---|---|
Platform Fee | $0.06 |
Hosted Backend Fee | $0.01 |
Transcription
Charged per-second of user speech at 1/60th of the per-minute rate.
Provider | Model | Languages | Price per minute |
---|---|---|---|
Deepgram | nova-3 (English) | English | $0.0078 |
Text-to-Speech
Charged per-second of generated speech at 1/60th of the per-minute rate.
Provider | Model | Languages | Price per minute |
---|---|---|---|
Cartesia | sonic-2 | English (American/British/Australian/Southern), Spanish (Latin/Peninsula), French, Portuguese (Brazilian/European), Hindi, Chinese, Russian, Dutch, Japanese, Turkish, Korean, German, Swedish, Italian, Polish | $0.06 |
Cartesia | sonic-turbo | English (American/British/Australian/Southern), Spanish (Latin/Peninsula), French, Portuguese (Brazilian/European), Hindi, Chinese, Russian, Dutch, Japanese, Turkish, Korean, German, Swedish, Italian, Polish | $0.06 |
ElevenLabs | eleven_v2_5_flash | English, Hindi, Portuguese, Chinese, Spanish, French, German, Japanese, Arabic, Russian, Korean, Indonesian, Italian, Dutch, Turkish, Polish, Swedish, Norwegian, Filipino, Malay, Romanian, Hungarian, Ukrainian, Greek, Czech, Danish, Finnish, Bulgarian, Croatian, Slovak, Tamil, Vietnamese | $0.15 |
Example Costing
Suppose you use the Deepgram nova-3 (English) transcription model at $0.0078 per minute, the Cartesia sonic-2 text-to-speech model at $0.06 per minute, and the Hosted Backend at $0.01 per minute, along with the Platform Fee of $0.06 per minute.
In any given second you are charged for either transcription (when the user is speaking) or text-to-speech (when the assistant is speaking), not both at the same time. Silence (when neither is speaking) is not charged.
For each second:
- If the user is speaking, you are charged for transcription, platform fee, and (if using Hosted Backend) the hosted backend fee.
- If the assistant is speaking, you are charged for text-to-speech, platform fee, and (if using Hosted Backend) the hosted backend fee.
- If there is silence, you are not charged.
Example: If a 1-minute conversation contains 20 seconds of user speech, 20 seconds of generated speech, and 20 seconds of silence, your cost would be:
- User speech (20s): (20/60) x ($0.0078 [transcription] + $0.06 [platform] + $0.01 [hosted backend]) = $0.026
- Assistant speech (20s): (20/60) x ($0.06 [text-to-speech] + $0.06 [platform] + $0.01 [hosted backend]) = $0.043
- Silence (20s): $0
Total cost for the one-minute session: $0.026 + $0.043 = $0.069
This means you are only charged for the actual seconds of speech, and never for silence. The more silence in a conversation, the lower your total cost per minute.
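The per-second rules in this section can be sketched as a small calculator. This is an illustrative estimate only, not an official billing tool: the function name `estimate_session_cost`, the `SessionCost` type, and the `provider/model` dictionary keys are assumptions made for this example, while the rates are copied from the tables on this page. Silence is not a parameter because it is never charged.

```python
from dataclasses import dataclass

# Per-minute rates in USD, copied from the tables on this page.
PLATFORM_FEE = 0.06
HOSTED_BACKEND_FEE = 0.01
TRANSCRIPTION_RATES = {"deepgram/nova-3": 0.0078}
TTS_RATES = {
    "cartesia/sonic-2": 0.06,
    "cartesia/sonic-turbo": 0.06,
    "elevenlabs/eleven_v2_5_flash": 0.15,
}


@dataclass
class SessionCost:
    user_speech: float       # cost of the seconds in which the user spoke
    assistant_speech: float  # cost of the seconds in which the assistant spoke

    @property
    def total(self) -> float:
        return self.user_speech + self.assistant_speech


def estimate_session_cost(
    user_seconds: float,
    assistant_seconds: float,
    stt: str = "deepgram/nova-3",
    tts: str = "cartesia/sonic-2",
    hosted_backend: bool = True,
) -> SessionCost:
    # Fees that apply to every second of speech, whoever is speaking.
    shared = PLATFORM_FEE + (HOSTED_BACKEND_FEE if hosted_backend else 0.0)
    # Each second is billed at 1/60th of the per-minute rate.
    user = user_seconds / 60 * (TRANSCRIPTION_RATES[stt] + shared)
    assistant = assistant_seconds / 60 * (TTS_RATES[tts] + shared)
    return SessionCost(user_speech=user, assistant_speech=assistant)


# The worked example: 20s of user speech, 20s of generated speech, 20s of silence.
cost = estimate_session_cost(user_seconds=20, assistant_seconds=20)
print(f"user: ${cost.user_speech:.3f}, assistant: ${cost.assistant_speech:.3f}")
print(f"total: ${cost.total:.3f}")
```

With the example's inputs, the two line items round to $0.026 and $0.043. Passing `hosted_backend=False` drops the Hosted Backend Fee, which models the case where you connect your own backend.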
Platform Features
Feature | Description |
---|---|
Low-latency voice pipelines | Production-ready, real-time voice processing with minimal delay |
Global infrastructure | 330+ locations worldwide for reliable, fast connections |
Multi-platform support | Web, mobile, and phone (coming soon) voice agents |
Speech-to-text transcription | Convert user speech to text using leading providers |
Text-to-speech synthesis | Convert AI responses to natural speech |
Real-time audio streaming | Continuous audio capture, processing, and playback |
Smart turn-taking | Automatic conversation flow with interrupt capability |
Hosted Backend | Managed backend option |
Custom backend support | Connect your own backend with a simple webhook |
Any framework support | Works with Next.js, Express, FastAPI, and more |
32+ languages supported | Multi-language transcription and speech synthesis |
100+ voices available | Wide selection across multiple TTS providers |
Provider flexibility | Easy switching between voice model providers |
No vendor lock-in | Switch providers and models without code changes |
Per-second billing | Pay only for actual speech time, not silence |
Transparent pricing | Usage-based costs with consolidated billing |
Limits
- No concurrency limits - Run unlimited simultaneous conversations. Layercode is built for scale.
- Metrics data retention period - Dashboard metrics data is retained for 90 days by default, but can be extended upon request.
- No maximum session duration - Sessions can run indefinitely without interruption.
- Session idle timeout - If a session has no activity for 10 minutes, it will disconnect. You can seamlessly reconnect the user to the same session if desired.