Layercode pricing is transparent and simple. You only pay for what you use, in per-second increments. Silence (where the user or assistant isn’t speaking) is free. The cost per second of conversation is determined by the providers and models you choose for the transcription and text-to-speech stages of your voice pipeline. For example, you will only pay the transcription provider cost for every second of user speech which is transcribed. You only pay the text-to-speech provider cost for every second of generated speech. The specific provider costs are listed below. All costs are quoted in minutes for ease of comparison, but are charged in per-second increments at 1/60th of the per-minute rate.
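As a minimal sketch of the per-second conversion described above, the following Python snippet turns a quoted per-minute rate into a charge for a number of billed seconds. The function name `charge_for_seconds` is hypothetical and used only for illustration; it is not part of any Layercode API. `Decimal` is used so that small currency amounts are not distorted by floating-point rounding.

```python
from decimal import Decimal

SECONDS_PER_MINUTE = Decimal(60)


def charge_for_seconds(per_minute_rate: str, billed_seconds: int) -> Decimal:
    """Return the charge in dollars for `billed_seconds` at a per-minute rate.

    Hypothetical helper: the per-second rate is 1/60th of the per-minute rate.
    """
    per_second_rate = Decimal(per_minute_rate) / SECONDS_PER_MINUTE
    return per_second_rate * billed_seconds


# 90 seconds of transcribed user speech at $0.0078 per minute
transcription_charge = charge_for_seconds("0.0078", 90)
print(transcription_charge)
```

Only seconds of actual speech would be passed as `billed_seconds`; silent seconds are never counted.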

In addition to the provider costs, the Layercode Platform Fee is charged per second of conversation, where conversation time is the session duration minus any silence in which neither the user nor the assistant is speaking.

When you use your own backend, Layercode charges no additional fee, because your backend makes its own requests to whichever LLM you use to generate responses. When you use our Hosted Backend, there is an additional fee per second of conversation, which covers the LLM calls we make on your behalf.

The estimated per-minute cost for a specific voice pipeline is displayed on the pipeline’s page in the Dashboard. It is based on the average conversation cost for that voice pipeline over the past 24 hours.
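The exact averaging method behind the Dashboard estimate is not specified here. One plausible reading, sketched below in Python, is total session cost divided by total session duration over the trailing 24 hours. The function name, the tuple layout, and the choice to weight by duration are all assumptions made for illustration.

```python
from datetime import datetime, timedelta, timezone
from decimal import Decimal


def estimated_cost_per_minute(sessions, now):
    """Estimate cost per minute from sessions that ended in the last 24 hours.

    `sessions` is an iterable of (ended_at, duration_seconds, cost) tuples.
    This layout is hypothetical, not a Layercode data format.
    Returns None when there is no recent session time to average over.
    """
    cutoff = now - timedelta(hours=24)
    recent = [s for s in sessions if s[0] >= cutoff]
    total_seconds = sum(duration for _, duration, _ in recent)
    if total_seconds == 0:
        return None
    total_cost = sum((cost for _, _, cost in recent), Decimal(0))
    # Scale the per-second average up to a per-minute figure.
    return total_cost * 60 / total_seconds


now = datetime(2025, 1, 2, 12, 0, tzinfo=timezone.utc)
sessions = [
    (now - timedelta(hours=1), 60, Decimal("0.06")),
    (now - timedelta(hours=2), 120, Decimal("0.12")),
    (now - timedelta(hours=30), 60, Decimal("5.00")),  # older than 24h, ignored
]
print(estimated_cost_per_minute(sessions, now))
```

Here the duration is the full session time including silence, so a pipeline whose conversations contain more silence would show a lower estimate.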

The cost of a conversation session is deducted from your account credits at the end of each user session. You can top up your account with credits in the Dashboard, where a history of all charges can be viewed. New user conversation sessions will be rejected if your account balance is zero or negative. Credits do not expire and there is no minimum credit purchase.
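The credit rules above can be modelled with a small Python sketch. The `CreditAccount` class and its method names are hypothetical and only illustrate the described behaviour: the charge is applied once, when the session ends, and new sessions are refused when the balance is zero or negative.

```python
from dataclasses import dataclass, field
from decimal import Decimal


@dataclass
class CreditAccount:
    """Illustrative model of an account's credit balance (not a real API)."""

    balance: Decimal
    charges: list = field(default_factory=list)

    def can_start_session(self) -> bool:
        # New sessions are rejected when the balance is zero or negative.
        return self.balance > 0

    def settle_session(self, session_cost: Decimal) -> None:
        # The session cost is deducted once, at the end of the session,
        # and recorded in the charge history.
        self.balance -= session_cost
        self.charges.append(session_cost)


account = CreditAccount(balance=Decimal("0.05"))
print(account.can_start_session())   # balance is positive
account.settle_session(Decimal("0.07"))
print(account.balance, account.can_start_session())
```

Because the deduction happens after the session, a session that costs more than the remaining credits leaves the balance negative, which is why the rejection rule covers both zero and negative balances.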

Layercode Platform Fees

Charged per-second of conversation (when user or assistant is speaking) at 1/60th of the per-minute rate.

| Provider | Price per minute |
| --- | --- |
| Platform Fee | $0.06 |
| Hosted Backend Fee | $0.01 |

Transcription

Charged per-second of user speech at 1/60th of the per-minute rate.

| Provider | Model | Languages | Price per minute |
| --- | --- | --- | --- |
| Deepgram | nova-3 (English) | English | $0.0078 |

Text-to-Speech

Charged per-second of generated speech at 1/60th of the per-minute rate.

| Provider | Model | Languages | Price per minute |
| --- | --- | --- | --- |
| Cartesia | sonic-2 | English (American/British/Australian/Southern), Spanish (Latin/Peninsula), French, Portuguese (Brazilian/European), Hindi, Chinese, Russian, Dutch, Japanese, Turkish, Korean, German, Swedish, Italian, Polish | $0.06 |
| Cartesia | sonic-turbo | English (American/British/Australian/Southern), Spanish (Latin/Peninsula), French, Portuguese (Brazilian/European), Hindi, Chinese, Russian, Dutch, Japanese, Turkish, Korean, German, Swedish, Italian, Polish | $0.06 |
| ElevenLabs | eleven_v2_5_flash | English, Hindi, Portuguese, Chinese, Spanish, French, German, Japanese, Arabic, Russian, Korean, Indonesian, Italian, Dutch, Turkish, Polish, Swedish, Norwegian, Filipino, Malay, Romanian, Hungarian, Ukrainian, Greek, Czech, Danish, Finnish, Bulgarian, Croatian, Slovak, Tamil, Vietnamese | $0.15 |

Example Costing

Suppose you use the Deepgram nova-3 (English) transcription model at $0.0078 per minute, the Cartesia sonic-2 text-to-speech model at $0.06 per minute, and the Hosted Backend at $0.01 per minute, along with the Platform Fee of $0.06 per minute.

In any given second you are charged for either transcription (when the user is speaking) or text-to-speech (when the assistant is speaking), never both at the same time. Silence (when neither is speaking) is not charged.

For each second:

  • If the user is speaking, you are charged for transcription, platform fee, and (if using Hosted Backend) the hosted backend fee.
  • If the assistant is speaking, you are charged for text-to-speech, platform fee, and (if using Hosted Backend) the hosted backend fee.
  • If there is silence, you are not charged.

Example: If a 1-minute conversation contains 20 seconds of user speech, 20 seconds of generated speech, and 20 seconds of silence, your cost would be:

  • User speech (20s): (20/60) x ($0.0078 [transcription] + $0.06 [platform] + $0.01 [hosted backend]) = $0.026
  • Assistant speech (20s): (20/60) x ($0.06 [text-to-speech] + $0.06 [platform] + $0.01 [hosted backend]) = $0.043
  • Silence (20s): $0

Total cost for the one-minute session: $0.026 + $0.043 = $0.069

This means you are only charged for the actual seconds of speech, and never for silence. The more silence in a conversation, the lower your total cost per minute.
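The example costing can be reproduced with a short Python sketch. The rates are the ones quoted in this section; the function name `session_cost` and the dictionary keys are hypothetical and chosen only for illustration. Silent seconds are not passed in at all because they are free.

```python
from decimal import Decimal

# Per-minute rates quoted in the example above.
RATES_PER_MINUTE = {
    "transcription": Decimal("0.0078"),   # Deepgram nova-3 (English)
    "text_to_speech": Decimal("0.06"),    # Cartesia sonic-2
    "platform": Decimal("0.06"),
    "hosted_backend": Decimal("0.01"),
}


def session_cost(user_seconds: int, assistant_seconds: int,
                 hosted_backend: bool = True) -> Decimal:
    """Return the session cost in dollars for the given seconds of speech."""
    # Fees that apply to every second of speech, whoever is speaking.
    shared = RATES_PER_MINUTE["platform"]
    if hosted_backend:
        shared += RATES_PER_MINUTE["hosted_backend"]

    user_rate = RATES_PER_MINUTE["transcription"] + shared
    assistant_rate = RATES_PER_MINUTE["text_to_speech"] + shared

    # Each second is billed at 1/60th of the per-minute rate.
    return (user_rate * user_seconds + assistant_rate * assistant_seconds) / 60


total = session_cost(user_seconds=20, assistant_seconds=20)
print(total.quantize(Decimal("0.0001")))  # 0.0693
```

Calling `session_cost(20, 20, hosted_backend=False)` gives the cost of the same conversation when you run your own backend, in which case the Hosted Backend Fee does not apply.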

Platform Features

  • Low-latency voice pipelines - Production-ready, real-time voice processing with minimal delay
  • Global infrastructure - 330+ locations worldwide for reliable, fast connections
  • Multi-platform support - Web, mobile, and phone (coming soon) voice agents
  • Speech-to-text transcription - Convert user speech to text using leading providers
  • Text-to-speech synthesis - Convert AI responses to natural speech
  • Real-time audio streaming - Continuous audio capture, processing, and playback
  • Smart turn-taking - Automatic conversation flow with interrupt capability
  • Hosted Backend - Managed backend option
  • Custom backend support - Connect your own backend with a simple webhook
  • Any framework support - Works with Next.js, Express, FastAPI, and more
  • 32+ languages supported - Multi-language transcription and speech synthesis
  • 100+ voices available - Wide selection across multiple TTS providers
  • Provider flexibility - Easy switching between voice model providers
  • No vendor lock-in - Switch providers and models without code changes
  • Per-second billing - Pay only for actual speech time, not silence
  • Transparent pricing - Usage-based costs with consolidated billing

Limits

  • No concurrency limits - Run unlimited simultaneous conversations. Layercode is built for scale.
  • Metrics data retention period - Dashboard metrics data is retained for 90 days by default, but can be extended upon request.
  • No maximum session duration - Sessions can run indefinitely without interruption.
  • Session idle timeout - If a session has no activity for 10 minutes, it will disconnect. You can seamlessly reconnect the user to the same session if desired.