Config Structure
A voice pipeline config has the following structure. The plugins and clients fields are required; all other fields are optional.
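The sketch below illustrates the overall shape. It assumes the config is written as JSON and that each plugin entry is an object with a use field and an options object, as described under Plugins; all values are placeholders.

```jsonc
{
  // Required: which client transports are enabled.
  "clients": { "browser": true, "twilio": false },

  // Optional root-level settings (see the sections below).
  "metadata": { "app": "demo" },
  "session_duration_timeout_minutes": 30,

  // Required: ordered processing steps (see Plugins below for the full list).
  "plugins": [
    { "use": "stt.deepgram", "options": { "model_id": "flux" } },
    { "use": "turn_manager" },
    { "use": "agent.webhook", "options": { "url": "https://example.com/agent" } },
    { "use": "tts.rime", "options": { "model_id": "mistv2" } }
  ]
}
```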
Root-Level Options
clients
Required. Enable or disable specific client transports.
Configuration:
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| browser | boolean | No | true | Enable browser WebSocket connections. |
| twilio | boolean | No | false | Enable Twilio Media Streams connections. |
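Example (an illustrative sketch assuming JSON syntax; values are placeholders):

```jsonc
{
  "clients": {
    "browser": true,   // allow browser WebSocket connections
    "twilio": false    // leave Twilio Media Streams disabled
  }
}
```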
metadata
Custom key-value data attached to every session. This metadata is included in webhook payloads and can be used for tracking, analytics, or passing context to your agent.
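Example (an illustrative sketch assuming JSON syntax; the keys shown are arbitrary placeholders):

```jsonc
{
  "metadata": {
    "customer_id": "cus_123",   // hypothetical key
    "plan": "pro"               // hypothetical key
  }
}
```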
session_webhook
Configure webhooks for session lifecycle events. Useful for logging, analytics, or triggering external workflows when sessions start, end, or update.
Configuration:
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Webhook endpoint URL. Must be HTTPS. |
| custom_headers | Record<string, string> | No | - | Additional headers to send with webhook requests. |
| custom_metadata | Record<string, any> | No | - | Extra metadata to include in webhook payloads. |
| events | array<"session.start" \| "session.end" \| "session.update"> | No | All events | Which events to send to the webhook. |
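Example (an illustrative sketch assuming JSON syntax; the URL and header values are placeholders):

```jsonc
{
  "session_webhook": {
    "url": "https://example.com/webhooks/layercode",           // must be HTTPS
    "custom_headers": { "Authorization": "Bearer YOUR_TOKEN" },
    "custom_metadata": { "environment": "production" },
    "events": ["session.start", "session.end"]                 // omit to receive all events
  }
}
```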
session_duration_timeout_minutes
Maximum session duration in minutes. Sessions automatically end after this timeout.
Configuration:
| Type | Required | Default | Min | Max |
|---|---|---|---|---|
| number | No | 30 | 1 | 1440 (24 hours) |
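Example (an illustrative sketch assuming JSON syntax):

```jsonc
{
  "session_duration_timeout_minutes": 60   // end sessions after one hour (default 30, max 1440)
}
```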
vad
Voice Activity Detection (VAD) configuration. VAD detects when users start and stop speaking, enabling natural turn-taking. It is enabled by default; in most cases you do not need to include the vad config or edit these advanced settings, but you can disable VAD or tune it here.
Configuration:
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| enabled | boolean | No | true | Enable voice activity detection. |
| gate_audio | boolean | No | true | Only send audio to STT when speech is detected. |
| buffer_frames | number | No | 10 | Number of audio frames to buffer (0-20). |
| model | "v5" | No | "v5" | VAD model version. |
| positive_speech_threshold | number | No | - | Confidence threshold for detecting speech (0-1). |
| negative_speech_threshold | number | No | - | Confidence threshold for detecting silence (0-1). |
| redemption_frames | number | No | - | Frames of silence before ending speech detection (0-10). |
| min_speech_frames | number | No | - | Minimum frames required to count as speech (0-10). |
| pre_speech_pad_frames | number | No | - | Frames to include before detected speech (0-10). |
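Example (an illustrative sketch assuming JSON syntax; the threshold values are placeholders, and omitted options fall back to their defaults):

```jsonc
{
  "vad": {
    "enabled": true,
    "gate_audio": true,                 // only forward audio to STT while speech is detected
    "buffer_frames": 10,
    "model": "v5",
    "positive_speech_threshold": 0.8,   // placeholder value
    "redemption_frames": 4              // placeholder value
  }
}
```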
Plugins
Plugins are the processing steps in your voice pipeline and must be specified in order. Each plugin is an object with a use field (the plugin type) and an optional options object.
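A sketch of the plugins array under that assumption (plugin names are taken from the sections below; the options shown are placeholders):

```jsonc
{
  "plugins": [
    { "use": "stt.deepgram", "options": { "model_id": "flux" } },
    { "use": "turn_manager", "options": { "base_timeout_ms": 2000 } },
    { "use": "agent.webhook", "options": { "url": "https://example.com/agent" } },
    { "use": "tts.rime", "options": { "model_id": "mistv2", "voice_id": "courtney" } }
  ]
}
```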
STT Plugins (Speech-to-Text)
Convert incoming audio to text transcripts. LayerCode supports two STT providers:
| Provider | Key Required | Models |
|---|---|---|
| Deepgram | No (managed) | Flux (English, ultra-low latency), Nova-3 (multilingual) |
| AssemblyAI | No (managed) | Universal Streaming (English or multilingual) |
stt.deepgram
Deepgram speech-to-text with Nova-3 or Flux models.
Configuration:
model_id: "flux"
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
model_id | "flux" | Yes | - | Deepgram Flux STT model. |
language | English (en) | No | "en" | Language. Flux only supports English currently. |
keyterms | array<string> | No | - | Array of key terms to boost transcription accuracy for. |
model_id: "nova-3"
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
model_id | "nova-3" | Yes | - | Deepgram Nova STT model. |
language | Multilingual (English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch) (multi), Bulgarian (bg), Catalan (ca), Czech (cs), Danish (da), Danish (Denmark) (da-DK), Dutch (nl), English (en), English (US) (en-US), English (Australia) (en-AU), English (UK) (en-GB), English (India) (en-IN), English (New Zealand) (en-NZ), Estonian (et), Finnish (fi), Flemish (nl-BE), French (fr), French (Canada) (fr-CA), German (de), German (Switzerland) (de-CH), Greek (el), Hindi (hi), Hungarian (hu), Indonesian (id), Italian (it), Japanese (ja), Korean (ko), Korean (Korea) (ko-KR), Latvian (lv), Lithuanian (lt), Malay (ms), Norwegian (no), Polish (pl), Portuguese (pt), Portuguese (Brazil) (pt-BR), Portuguese (Portugal) (pt-PT), Romanian (ro), Russian (ru), Slovak (sk), Spanish (es), Spanish (Latin America) (es-419), Swedish (sv), Swedish (Sweden) (sv-SE), Turkish (tr), Ukrainian (uk), Vietnamese (vi) | No | "multi" | Language. |
keyterms | array<string> | No | - | Array of key terms to boost transcription accuracy for. |
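Example (an illustrative sketch assuming JSON syntax and the use/options plugin shape; the key terms are placeholders):

```jsonc
{
  "use": "stt.deepgram",
  "options": {
    "model_id": "nova-3",
    "language": "multi",
    "keyterms": ["LayerCode", "Deepgram"]   // placeholder key terms
  }
}
```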
stt.assemblyai
AssemblyAI Universal Streaming speech-to-text. Supports English and multilingual (English, Spanish, French, German, Italian, Portuguese). Managed by LayerCode; no API key required.
Configuration:
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| speech_model | "universal-streaming-english" \| "universal-streaming-multilingual" | No | "universal-streaming-english" | Speech model. Multilingual supports English, Spanish, French, German, Italian, Portuguese. |
| word_boost | array<string> | No | - | Array of custom vocabulary words to boost recognition accuracy. |
| end_of_turn_confidence_threshold | number (min: 0, max: 1) | No | 0.4 | Confidence threshold (0.0-1.0) for detecting end of turn. |
| min_end_of_turn_silence_when_confident | number (min: 0) | No | 400 | Minimum silence in milliseconds when confident about end of turn. |
| max_turn_silence | number (min: 0) | No | 1280 | Maximum silence in milliseconds before end of turn is triggered. |
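Example (an illustrative sketch assuming JSON syntax and the use/options plugin shape; the word_boost entries are placeholders and the timing values are the documented defaults):

```jsonc
{
  "use": "stt.assemblyai",
  "options": {
    "speech_model": "universal-streaming-multilingual",
    "word_boost": ["LayerCode"],                     // placeholder vocabulary
    "end_of_turn_confidence_threshold": 0.4,
    "min_end_of_turn_silence_when_confident": 400,
    "max_turn_silence": 1280
  }
}
```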
Turn Manager
Manages conversation turn-taking between user and assistant. Handles interruptions (barge-in) and determines when the user has finished speaking.
turn_manager
VAD-based turn management with configurable timeout.
Configuration:
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
mode | "automatic" | No | "automatic" | Turn-taking mode. Only automatic (VAD-based interruption) is supported. |
base_timeout_ms | number (min: 500, max: 5000) | No | 2000 | Base VAD timeout in milliseconds (e.g., 500-5000). Required. |
user_silence_timeout_minutes | unknown | No | - | User silence timeout in minutes (e.g., 1-60). Null/undefined disables the timeout. |
disable_interruptions_during_welcome | boolean | No | false | Disable user interruptions during the first assistant response (welcome message). |
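Example (an illustrative sketch assuming JSON syntax and the use/options plugin shape; the silence timeout value is a placeholder):

```jsonc
{
  "use": "turn_manager",
  "options": {
    "mode": "automatic",
    "base_timeout_ms": 2000,
    "user_silence_timeout_minutes": 5,             // placeholder; null/omitted disables the timeout
    "disable_interruptions_during_welcome": true
  }
}
```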
Agent Plugins
Generate AI responses from user messages. Choose one based on your use case:
- agent.llm - Hosted LLM for simple conversational agents
- agent.webhook - Your own HTTPS endpoint for custom logic
- agent.ws - Your own WebSocket server for real-time bidirectional communication
agent.llm
Hosted LLM agent using Google Gemini or OpenAI models. Best for simple conversational agents without custom business logic.
Configuration:
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
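Example (Google; an illustrative sketch assuming JSON syntax and the use/options plugin shape — the option keys shown are hypothetical placeholders):

```jsonc
{
  "use": "agent.llm",
  "options": {
    // Hypothetical option names, for illustration only.
    "model": "gemini-2.0-flash",
    "system_prompt": "You are a helpful voice assistant."
  }
}
```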
agent.webhook
Send user messages to your HTTPS endpoint and receive streaming responses. Best for integrating with existing backends or AI orchestration frameworks.
Configuration:
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| url | string | Yes | - | Webhook endpoint URL. |
| headers | Record<string, string> | No | - | HTTP headers to send with requests. |
| events | array<"message" \| "data" \| "session.start"> | No | ["message"] | Events to forward to the webhook. "message" is required; "session.start" and "data" are optional. |
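Example (an illustrative sketch assuming JSON syntax and the use/options plugin shape; the URL and header values are placeholders):

```jsonc
{
  "use": "agent.webhook",
  "options": {
    "url": "https://example.com/agent",
    "headers": { "Authorization": "Bearer YOUR_TOKEN" },
    "events": ["message", "session.start"]   // "message" must always be included
  }
}
```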
TTS Plugins (Text-to-Speech)
Convert agent text responses to audio. LayerCode supports the following TTS providers:
| Provider | Key Required | Best For |
|---|---|---|
| Inworld | No (managed) | High quality, low cost expressive voices |
| Rime | No (managed) | Expressive voices |
| Cartesia | Yes (BYOK) | Customers with a Cartesia account |
| ElevenLabs | Yes (BYOK) | Customers with an ElevenLabs account |
tts.rime
Rime TTS with ultra-low latency streaming. Managed by LayerCode; no API key required.
Configuration:
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
model_id | "mistv2" | Yes | - | Rime TTS model. |
voice_id | string | No | "courtney" | Rime voice id. |
language | "eng", "spa" | No | "eng" | Language. |
tts.inworld
Inworld TTS for gaming and interactive characters with voice tuning controls. Requires your own Inworld API credentials.
Configuration:
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
model_id | "inworld-tts-1" | "inworld-tts-1.5-max" | "inworld-tts-1.5-mini" | No | "inworld-tts-1" | Inworld TTS model. |
voice_id | string | No | "Clive" | Inworld voice id. |
voice_config | object | No | - | - |
voice_config options:
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| pitch | number (min: -10, max: 10) | No | 1 | Voice pitch adjustment. Range: -10 to 10. |
| speaking_rate | number (min: 0, max: 5) | No | 0 | Speaking rate/speed. Range: 0 to 5. |
| robotic_filter | number (min: 0, max: 5) | No | 0 | Robotic voice filter level. Range: 0 to 5. |
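Example (an illustrative sketch assuming JSON syntax and the use/options plugin shape; the voice_config values are the documented defaults):

```jsonc
{
  "use": "tts.inworld",
  "options": {
    "model_id": "inworld-tts-1",
    "voice_id": "Clive",
    "voice_config": {
      "pitch": 1,
      "speaking_rate": 0,
      "robotic_filter": 0
    }
  }
}
```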
tts.elevenlabs
ElevenLabs TTS with high-quality voices and extensive voice customization. Requires your own ElevenLabs API key.
Configuration:
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
model_id | "eleven_v2_5_flash" | Yes | - | ElevenLabs TTS model. |
voice_id | string | Yes | - | ElevenLabs voice id. |
voice_settings | object | No | - | - |
language | English (en), Japanese (ja), Chinese (zh), German (de), Hindi (hi), French (fr), Korean (ko), Portuguese (pt), Italian (it), Spanish (es), Indonesian (id), Dutch (nl), Turkish (tr), Filipino (fil), Polish (pl), Swedish (sv), Bulgarian (bg), Romanian (ro), Arabic (ar), Czech (cs), Greek (el), Finnish (fi), Croatian (hr), Malay (ms), Slovak (sk), Danish (da), Tamil (ta), Ukrainian (uk), Russian (ru), Hungarian (hu), Norwegian (no), Vietnamese (vi) | No | "en" | Language. |
voice_settings options:
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| stability | number (min: 0, max: 1) | No | 0.5 | Defines the stability for voice settings. |
| similarity_boost | number (min: 0, max: 1) | No | 0.75 | Defines the similarity boost for voice settings. |
| style | number (min: 0, max: 1) | No | 0 | Defines the style for voice settings. Available on V2+ models. |
| use_speaker_boost | boolean | No | true | Defines the use speaker boost for voice settings. Available on V2+ models. |
| speed | number (min: 0.7, max: 1.2) | No | 1.0 | Controls the speed of the generated speech. Values range from 0.7 to 1.2. |
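Example (an illustrative sketch assuming JSON syntax and the use/options plugin shape; the voice id is a placeholder and the voice_settings values are the documented defaults):

```jsonc
{
  "use": "tts.elevenlabs",
  "options": {
    "model_id": "eleven_v2_5_flash",
    "voice_id": "YOUR_ELEVENLABS_VOICE_ID",   // placeholder voice id
    "language": "en",
    "voice_settings": {
      "stability": 0.5,
      "similarity_boost": 0.75,
      "speed": 1.0
    }
  }
}
```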
tts.cartesia
Cartesia Sonic TTS with emotion controls and word-level timestamps. Requires your own Cartesia API key.
Configuration:
model_id: "sonic-2"
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
model_id | "sonic-2" | Yes | - | Cartesia Sonic 2 TTS model. |
voice_id | string | Yes | - | Cartesia voice id. |
language | English (en), French (fr), German (de), Spanish (es), Portuguese (pt), Chinese (zh), Japanese (ja), Hindi (hi), Italian (it), Korean (ko), Dutch (nl), Polish (pl), Russian (ru), Swedish (sv), Turkish (tr) | No | "en" | Language. |
model_id: "sonic-3"
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
model_id | "sonic-3", "sonic-3-2025-10-27" | Yes | - | Cartesia Sonic 3 TTS model with expanded language support. |
voice_id | string | Yes | - | Cartesia voice id. |
voice_settings | object | No | - | - |
language | English (en), French (fr), German (de), Spanish (es), Portuguese (pt), Chinese (zh), Japanese (ja), Hindi (hi), Italian (it), Korean (ko), Dutch (nl), Polish (pl), Russian (ru), Swedish (sv), Turkish (tr), Tagalog (tl), Bulgarian (bg), Romanian (ro), Arabic (ar), Czech (cs), Greek (el), Finnish (fi), Croatian (hr), Malay (ms), Slovak (sk), Danish (da), Tamil (ta), Ukrainian (uk), Hungarian (hu), Norwegian (no), Vietnamese (vi), Bengali (bn), Thai (th), Hebrew (he), Georgian (ka), Indonesian (id), Telugu (te), Gujarati (gu), Kannada (kn), Malayalam (ml), Marathi (mr), Punjabi (pa) | No | "en" | Language. |
voice_settings options:
| Option | Type | Required | Default | Description |
|---|---|---|---|---|
| volume | number (min: 0.5, max: 2) | No | 1.0 | Adjusts the volume of the generated speech. Values range from 0.5 to 2.0. |
| speed | number (min: 0.6, max: 1.5) | No | 1.0 | Controls the speed of the generated speech. Values range from 0.6 to 1.5. |
| emotion | string | No | - | Controls the emotion of the generated speech. Primary emotions are neutral, calm, angry, content, sad, scared. See docs for more options. |
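Example (an illustrative sketch assuming JSON syntax and the use/options plugin shape; the voice id is a placeholder):

```jsonc
{
  "use": "tts.cartesia",
  "options": {
    "model_id": "sonic-3",
    "voice_id": "YOUR_CARTESIA_VOICE_ID",   // placeholder voice id
    "language": "en",
    "voice_settings": {
      "speed": 1.0,
      "emotion": "calm"
    }
  }
}
```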
Complete Examples
- Simple Hosted LLM
- Custom Webhook Agent
- Twilio Phone Integration
A minimal configuration using LayerCode’s hosted LLM agent:
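An illustrative sketch assuming JSON syntax and the use/options plugin shape; the agent.llm options are omitted, and the remaining values are placeholders or documented defaults:

```jsonc
{
  "clients": { "browser": true, "twilio": false },
  "plugins": [
    { "use": "stt.deepgram", "options": { "model_id": "flux" } },
    { "use": "turn_manager", "options": { "base_timeout_ms": 2000 } },
    { "use": "agent.llm" },                                         // hosted LLM agent
    { "use": "tts.rime", "options": { "model_id": "mistv2", "voice_id": "courtney" } }
  ]
}
```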
Audio Format
The pipeline automatically handles audio format conversion based on the client type:
| Client | Input Format | Output Format |
|---|---|---|
| Browser | PCM16 | PCM16 |
| Twilio | mulaw @ 8kHz | mulaw @ 8kHz |