# CLI Reference

Source: https://docs.layercode.com/api-reference/cli

Layercode CLI command reference and usage guide.

## Installation

You’ll need **npm** installed to use the CLI. We recommend running commands with `npx` instead of installing globally.

```bash
npx @layercode/cli
```

***

## Commands

### `login`

```bash
npx @layercode/cli login
```

Opens a browser window to log in and link your terminal to your Layercode account.

During login you’ll be prompted to choose which organization to act on behalf of. The CLI uses this selection for all follow-on commands, so make sure you pick the org that owns the agents you intend to work with.

***

### `init`

```bash
npx @layercode/cli init [--agent-id <agent-id>]
```

Initializes Layercode locally, creating an example project and linking an agent.

**Flags**

* `--agent-id <agent-id>` (optional): Link an existing agent. If not provided, a new agent is created.

***

### `tunnel`

```bash
npx @layercode/cli tunnel [--agent-id <agent-id>] [--path <path>] [--port <port>] [--tail]
```

Runs your local project behind a cloudflared tunnel and updates your agent’s webhook URL in the Layercode dashboard.

* `--agent-id=<agent-id>`: The unique identifier of the agent. If omitted, the CLI checks the `LAYERCODE_AGENT_ID` environment variable, then any entry ending with `LAYERCODE_AGENT_ID` in your project's `.env` file. If nothing is found, the command fails.
* `--path=<path>` (default: `/api/agent`): The API path to append for the agent endpoint.
* `--port=<port>` (default: `3000`): The local port to run the tunnel on.
* `--tail`: Continuously stream logs, including CLI messages.

**Equivalent to:**

```bash
cloudflared tunnel --url http://localhost:<port>
```

***

## Example Usage

```bash
# Log in
npx @layercode/cli login

# Initialize a new local setup
npx @layercode/cli init

# Start a tunnel for agent dtv3x3d2 on port 5173, with the webhook endpoint at /api/voice-agent
npx @layercode/cli tunnel --agent-id=dtv3x3d2 --port=5173 --path=/api/voice-agent --tail
```

***

## Troubleshooting

If you encounter issues:

* Ensure **npm** and **Node.js** are installed and up to date.
* Try logging out and back in with `npx @layercode/cli login`.
* If you see `Failed to retrieve Agent information` when starting a tunnel, verify that you are logged into the organization that owns that agent. Log out, log back in, select the correct organization when prompted, and then re-run the command.
* By default, the tunnel sets your webhook URL path to `/api/agent`. Use the `--path` flag to match where the webhook endpoint lives in your application, e.g. `/api/agent`, the root `/`, or `/voice-agent`. See our guide on [webhooks](/explanations/webhooks) for more details.

# Frontend WebSocket API

Source: https://docs.layercode.com/api-reference/frontend-ws-api

Layercode WebSocket API for browser and mobile based voice agent experiences.

The Layercode Frontend WebSocket API is used to create browser- and mobile-based voice agent experiences. The client browser streams chunks of base64-encoded microphone audio over the WebSocket. In response, the server returns audio chunks of the assistant's response to be played to the user. Additional trigger and data event types allow control of turns and UI updates.

For most use cases, we recommend using our SDKs for React ([React Guide](/tutorials/react)) or Vanilla JS ([Vanilla JS Guide](/tutorials/vanilla-js)). This API reference is intended for advanced users who need to implement the WebSocket protocol directly.
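If you do implement the protocol directly, the fiddliest client-side step is producing `client.audio` messages in the format given under Audio Streaming below (base64-encoded, 16-bit PCM, 8000 Hz, mono). The following is a minimal TypeScript sketch, not the SDK's implementation. It assumes mono Float32 samples in [-1, 1] as delivered by the Web Audio API, assumes little-endian byte order, and uses naive linear interpolation for resampling. A production resampler would low-pass filter first to avoid aliasing. The helper names are hypothetical.

```typescript
// Target format for client.audio (see Audio Streaming): base64, 16-bit PCM, 8000 Hz, mono.
const TARGET_SAMPLE_RATE = 8000;

// Naive linear-interpolation resampler (no anti-aliasing filter).
export function resampleLinear(input: Float32Array, fromRate: number, toRate: number): Float32Array {
  if (fromRate === toRate) return input;
  const outLength = Math.round((input.length * toRate) / fromRate);
  const out = new Float32Array(outLength);
  const step = fromRate / toRate;
  for (let i = 0; i < outLength; i++) {
    const pos = i * step;
    const i0 = Math.floor(pos);
    const i1 = Math.min(i0 + 1, input.length - 1);
    const frac = pos - i0;
    out[i] = input[i0] * (1 - frac) + input[i1] * frac;
  }
  return out;
}

// Convert Float32 samples in [-1, 1] to signed 16-bit PCM bytes (assumed little-endian).
export function floatTo16BitPCM(samples: Float32Array): Uint8Array {
  const view = new DataView(new ArrayBuffer(samples.length * 2));
  for (let i = 0; i < samples.length; i++) {
    const s = Math.max(-1, Math.min(1, samples[i])); // clamp out-of-range samples
    view.setInt16(i * 2, s < 0 ? s * 0x8000 : s * 0x7fff, true);
  }
  return new Uint8Array(view.buffer);
}

// Base64-encode raw bytes with btoa (available in browsers and Node 16+).
export function bytesToBase64(bytes: Uint8Array): string {
  let binary = "";
  for (let i = 0; i < bytes.length; i++) {
    binary += String.fromCharCode(bytes[i]);
  }
  return btoa(binary);
}

// Build the JSON text of one client.audio message from a mono microphone chunk.
export function buildClientAudioMessage(samples: Float32Array, inputSampleRate: number): string {
  const resampled = resampleLinear(samples, inputSampleRate, TARGET_SAMPLE_RATE);
  return JSON.stringify({ type: "client.audio", content: bytesToBase64(floatTo16BitPCM(resampled)) });
}
```

In a browser, you would call `buildClientAudioMessage(chunk, audioContext.sampleRate)` for each captured chunk and pass the result to `ws.send(...)`.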
# Connecting to the WebSocket

The client browser connects to the Layercode WebSocket API at the following URL:

```
wss://api.layercode.com/v1/agents/web/websocket
```

## Authorizing the WebSocket Connection

When establishing the WebSocket connection, the following query parameter must be included in the request URL:

* `client_session_key`: A unique session key obtained from the Layercode REST API `authorize_session` endpoint.

Example full connection URL:

```
wss://api.layercode.com/v1/agents/web/websocket?client_session_key=your_client_session_key
```

To obtain a `client_session_key`, first create a new session for the user by calling the [Layercode REST API `authorize_session`](/api-reference/rest-api#authorize) endpoint. This endpoint returns a `client_session_key`, which must be included in the WebSocket connection parameters.

This API call should be made from your backend server, not the client browser. This ensures your `LAYERCODE_API_KEY` is never exposed to the client, and it lets you perform any additional user authorization checks your application requires.

# WebSocket Events

## Client → Server Messages

### Client Ready

When the client has established the WebSocket connection and is ready to begin streaming audio, it should send a ready message:

```json
{
  "type": "client.ready"
}
```

### Audio Streaming

Once connected, the client should continuously send chunks of the user's microphone audio using the message below. The content must be in the following format:

* Base64 encoded
* 16-bit PCM audio data
* 8000 Hz sample rate
* Mono channel

See the [Vanilla JS SDK code](https://github.com/layercodedev/packages-and-docs/tree/main/packages/layercode-js-sdk/src) for an example of how to correctly encode browser microphone audio to base64.

```json
{
  "type": "client.audio",
  "content": "base64audio"
}
```

### Voice Activity Detection Events

The client can send Voice Activity Detection (VAD) events to inform the server about speech detection.
This improves the speed and accuracy of automatic turn taking.

Note: The client is responsible for stopping any in-progress assistant audio playback when the user interrupts.

VAD detects voice activity:

```json
{
  "type": "vad_events",
  "event": "vad_start"
}
```

Detected voice activity ends:

```json
{
  "type": "vad_events",
  "event": "vad_end"
}
```

The client could not load the VAD model, so VAD events won't be sent:

```json
{
  "type": "vad_events",
  "event": "vad_model_failed"
}
```

### Response Audio Replay Finished

The client will receive audio chunks of the assistant's response (see [Audio Response](#audio-response)). When the client has finished replaying all assistant audio chunks in its buffer, it must reply with `trigger.response.audio.replay_finished`.

Note that the assistant webhook can return `response.tts` events (which are converted to speech and received by the client as `response.audio` events) at any point during a long response, in between other text or JSON events. The client must therefore handle the case where it has played all the audio in its buffer and then receives more to play. As a result, the client may send multiple `trigger.response.audio.replay_finished` events with reason `completed` during a single turn.

```json
{
  "type": "trigger.response.audio.replay_finished",
  "reason": "completed",
  "turn_id": "UUID of assistant response"
}
```

### Push-to-Talk Control (Optional)

In push-to-talk mode (read more about [Turn Taking](/explanations/turn-taking)), the client must send the following events to start and end a user turn. These are typically connected to a button that the user holds down while speaking.

In this mode, the client can also preemptively halt the assistant's audio playback when the user interrupts. Instead of waiting to receive a `turn.start` event (which indicates a turn change), send a `trigger.response.audio.replay_finished` event when the user interrupts the assistant.
Start user turn (user has pressed the button):

```json
{
  "type": "trigger.turn.start",
  "role": "user"
}
```

End user turn (user has released the button):

```json
{
  "type": "trigger.turn.end",
  "role": "user"
}
```

### Send Text Messages (Optional)

To let your users send text messages (as an alternative to voice), send the user's text from your frontend in a `client.response.text` event. Layercode forwards the text to your agent backend in the same format as a regular user transcript message.

```json
{
  "type": "client.response.text",
  "content": "Text input from the user"
}
```

* `content`: The full user message. Empty or whitespace-only payloads are ignored.

### Send Structured Data (Optional)

Use `client.response.data` to forward JSON data from the browser to your agent backend without interrupting or switching the current turn. The payload is relayed as a `data` webhook event. See [Send JSON data from the client](/how-tos/send-json-data).

```json
{
  "type": "client.response.data",
  "data": {
    "action": "select_option",
    "optionId": "support"
  }
}
```

* `data`: Any JSON-serializable value that your webhook endpoint expects.

Enable the `data` webhook event in your agent settings to receive these payloads.

## Server → Client Messages

The client will receive the following events from Layercode:

### Turn Management

When the server detects the start of the user's turn:

```json
{
  "type": "turn.start",
  "role": "user",
  "turn_id": "UUID of user turn"
}
```

When it's the assistant's turn:

```json
{
  "type": "turn.start",
  "role": "assistant",
  "turn_id": "UUID of assistant turn"
}
```

### Audio Response

The client will receive audio chunks of the assistant's response, which should be buffered and played immediately.
The content will be audio in the following format:

* Base64 encoded
* 16-bit PCM audio data
* 16000 Hz sample rate
* Mono channel

See the [Vanilla JS SDK code](https://github.com/layercodedev/packages-and-docs/tree/main/packages/layercode-js-sdk/src) for an example of how to play the audio chunks.

```json
{
  "type": "response.audio",
  "content": "base64audio",
  "delta_id": "UUID unique to each delta msg",
  "turn_id": "UUID of assistant response turn"
}
```

### Text Response

The client will receive the text of the assistant's response for display or processing. There are two event types:

#### Streaming Text Delta

As the assistant generates text, you'll receive incremental deltas:

```json
{
  "type": "response.text.delta",
  "content": "Text delta from assistant",
  "turn_id": "UUID of assistant response turn"
}
```

#### Complete Text

Once the full text is available, you'll receive the complete message:

```json
{
  "type": "response.text",
  "content": "Complete text content from assistant",
  "turn_id": "UUID of assistant response turn"
}
```

### User Transcript Updates

Layercode streams back transcription updates for the user's speech so you can render a live transcript in your UI.

#### Interim Transcript Delta

Interim updates refine the current transcript in place as the speech recognizer gains confidence. Each `user.transcript.interim_delta` replaces the previous one with the same `delta_counter`, until a `user.transcript.delta` with that same `delta_counter` arrives. Subsequent `user.transcript.interim_delta` events carry an incremented `delta_counter` and should be appended after the previously finalized `user.transcript.delta` text.

```json
{
  "type": "user.transcript.interim_delta",
  "content": "Partial user text",
  "turn_id": "user-UUID of the speaking turn",
  "delta_counter": 6
}
```

* `content`: Latest partial text heard for the in-progress user utterance.
* `turn_id`: The user turn identifier (prefixed with the role for clarity).
* `delta_counter`: Monotonic counter forwarded from the underlying transcription `delta.counter`, to help you discard out-of-order updates.

#### Transcript Delta

Once the recognizer finalizes a span of text, it is emitted as a `user.transcript.delta`. Any subsequent `user.transcript.interim_delta` starts a new span until the next finalized delta arrives.

```json
{
  "type": "user.transcript.delta",
  "content": "Stabilized transcript segment",
  "turn_id": "user-UUID of the speaking turn",
  "delta_counter": 6
}
```

* `content`: Stabilized transcript segment that should replace the previous interim text.
* `turn_id`: The user turn identifier (prefixed with the role for clarity).
* `delta_counter`: Monotonic counter forwarded from the underlying transcription `delta.counter`, so you can detect missed or out-of-order deltas.

#### Final Transcript

Once the user's turn has been deemed complete, a final transcript is emitted. It contains the full text of the user's turn.

```json
{
  "type": "user.transcript",
  "content": "Complete transcript of user turn",
  "turn_id": "user-UUID of the speaking turn"
}
```

### Data and State Updates

Your webhook can return `response.data` SSE events, which are forwarded to the browser client. This is ideal for updating UI and state in the browser. If you want to pass text or JSON deltas instead of full objects, send a JSON object like `{ "delta": "text delta..." }`, then accumulate and render the deltas in the client browser.

```json
{
  "type": "response.data",
  "content": { "json": "object" },
  "turn_id": "UUID of assistant response"
}
```

# Layercode API reference

Source: https://docs.layercode.com/api-reference/introduction

Choose the right Layercode API surface for your workflow.

* [Frontend WebSocket API](/api-reference/frontend-ws-api): Stream microphone audio from web or mobile clients and receive live agent responses over a low-latency WebSocket channel.
* [Webhook SSE API](/api-reference/webhook-sse-api): Implement a server-side webhook that ingests transcripts and replies with SSE messages containing prompts for Layercode to speak.
* [REST API](/api-reference/rest-api): Manage pipelines, sessions, and analytics data programmatically with standard HTTP endpoints.
* [CLI](/api-reference/cli): Automate workflows and debugging from your shell using the Layercode command-line interface.

# REST API

Source: https://docs.layercode.com/api-reference/rest-api

API reference for the Layercode REST API.

## Authorize Client Session

To connect a client (browser or mobile app) to a Layercode voice agent, you must first authorize the session. This is done by calling the Layercode REST API endpoint below from your backend.

**How the authorization flow works:**

When using a Layercode frontend SDK (such as `@layercode/react-sdk` or `@layercode/js-sdk`), the SDK automatically makes a POST request to the `authorizeSessionEndpoint` URL that you specify in your frontend code.

This `authorizeSessionEndpoint` should be an endpoint on **your own backend** (not Layercode's). Your backend receives this request from the frontend, then securely calls the Layercode REST API (`https://api.layercode.com/v1/agents/web/authorize_session`) using your `LAYERCODE_API_KEY`. Your backend then returns the `client_session_key` to the frontend.

Your Layercode API key should never be exposed to the frontend. Always call this endpoint from your backend, then return the `client_session_key` to your frontend.

### Endpoint

```http
POST https://api.layercode.com/v1/agents/web/authorize_session
```

### Headers

* `Authorization`: Bearer token using your `LAYERCODE_API_KEY`.
* `Content-Type`: Must be `application/json`.

### Request Body

* `agent_id` (string): The ID of the Layercode agent the client should connect to.
* `conversation_id` (string, optional): The conversation ID of an existing conversation to resume. If not provided, a new conversation is created.
* `config` (object, optional): Per-session pipeline configuration override. When provided, it is stored on the session and takes precedence over the agent's saved config for that session. Use this to customize config options (e.g. the TTS voice).

### Response

* `client_session_key` (string): The key your frontend uses to connect to the Layercode WebSocket API.
* `conversation_id` (string): The unique conversation ID.
* Optional configuration for this session, used by the frontend SDK. When present, it can include `transcription.trigger` and VAD settings such as `vad.enabled`, `vad.gate_audio`, `vad.buffer_frames`, `vad.model`, `vad.positive_speech_threshold`, `vad.negative_speech_threshold`, `vad.redemption_frames`, `vad.min_speech_frames`, `vad.pre_speech_pad_frames`, and `vad.frame_samples`.
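On the client side, this flow boils down to three steps: POST to your own backend, receive the `client_session_key`, then open the WebSocket with that key as a query parameter and send `client.ready`. The following is a minimal TypeScript sketch of what the SDKs handle for you. It assumes a hypothetical backend route (for example `/api/authorize`, like the Next.js example on this page) that forwards `{ agent_id }` to `authorize_session` and returns Layercode's JSON unchanged. The function names are illustrative, not SDK APIs.

```typescript
// Response shape of authorize_session, as documented above.
interface AuthorizeSessionResponse {
  client_session_key: string;
  conversation_id: string;
}

const LAYERCODE_WS_URL = "wss://api.layercode.com/v1/agents/web/websocket";

// Append the session key as the client_session_key query parameter (URL-encoded).
export function buildWebSocketUrl(clientSessionKey: string): string {
  const url = new URL(LAYERCODE_WS_URL);
  url.searchParams.set("client_session_key", clientSessionKey);
  return url.toString();
}

// Call YOUR backend route, never Layercode directly from the browser.
export async function authorizeSession(
  authorizeEndpoint: string,
  agentId: string,
): Promise<AuthorizeSessionResponse> {
  const res = await fetch(authorizeEndpoint, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ agent_id: agentId }),
  });
  if (!res.ok) {
    throw new Error(`authorize failed: ${res.status} ${await res.text()}`);
  }
  return (await res.json()) as AuthorizeSessionResponse;
}

// Authorize, open the socket, and send client.ready once it is open.
export async function connectToAgent(authorizeEndpoint: string, agentId: string): Promise<WebSocket> {
  const { client_session_key } = await authorizeSession(authorizeEndpoint, agentId);
  const ws = new WebSocket(buildWebSocketUrl(client_session_key));
  ws.addEventListener("open", () => {
    ws.send(JSON.stringify({ type: "client.ready" }));
  });
  return ws;
}
```

In a browser, `await connectToAgent("/api/authorize", "ag-123456")` would return the open-pending socket. Relative URLs like this only resolve in a browser context.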
### Example Request

```bash
# Example with only agent_id (creates a new session)
curl -X POST https://api.layercode.com/v1/agents/web/authorize_session \
  -H "Authorization: Bearer $LAYERCODE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "ag-123456"}'

# Example with agent_id and conversation_id (resumes an existing conversation)
curl -X POST https://api.layercode.com/v1/agents/web/authorize_session \
  -H "Authorization: Bearer $LAYERCODE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"agent_id": "ag-123456", "conversation_id": "lc_conv_abc123..."}'

# Example with per-session config
curl -X POST https://api.layercode.com/v1/agents/web/authorize_session \
  -H "Authorization: Bearer $LAYERCODE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "ag-123456",
    "config": {
      "type": "voice",
      "clients": { "browser": { "enabled": true } },
      "plugins": [
        { "use": "stt.deepgram", "options": { "model_id": "flux" } },
        { "use": "turn_manager", "options": { "mode": "automatic" } },
        { "use": "agent.llm", "options": { "provider": "google", "model_id": "gemini-2.5-flash-lite" } },
        { "use": "tts.rime", "options": { "model_id": "mistv2", "voice_id": "courtney" } }
      ],
      "session_webhook": {
        "url": "https://example.com/session-webhook",
        "events": ["session.start", "session.end", "session.update"],
        "custom_metadata": { "tenant_id": "t_42", "plan": "enterprise" },
        "custom_headers": { "x-tenant-id": "t_42", "x-session-origin": "mobile" }
      }
    }
  }'
```

### Example Response

```json
{
  "client_session_key": "lc_sesskey_abc123...",
  "conversation_id": "lc_conv_abc123..."
}
```

### Error Responses

* `error` (string): Error message describing the problem.

**Possible error cases:**

* `400` – Invalid or missing bearer token, invalid agent ID, missing or invalid conversation ID.
* `402` – Insufficient balance for the organization.
**Example error response:**

```json
{
  "error": "insufficient balance"
}
```

### Example: Backend Endpoint (Next.js)

Here's how you might implement an authorization endpoint in your backend (Next.js example, `app/api/authorize/route.ts`):

```ts
import { NextResponse } from 'next/server';

export const dynamic = 'force-dynamic';

export const POST = async (request: Request) => {
  // Here you could do any user authorization checks you need for your app
  const endpoint = 'https://api.layercode.com/v1/agents/web/authorize_session';
  const apiKey = process.env.LAYERCODE_API_KEY;
  if (!apiKey) {
    console.error('LAYERCODE_API_KEY is not set.');
    return NextResponse.json({ error: 'LAYERCODE_API_KEY is not set.' }, { status: 500 });
  }

  // Reject malformed or incomplete requests from the frontend with a 400
  const requestBody = await request.json().catch(() => null);
  if (!requestBody || !requestBody.agent_id) {
    return NextResponse.json({ error: 'Missing agent_id in request body.' }, { status: 400 });
  }

  try {
    // Forward agent_id (and any optional conversation_id/config) to Layercode
    const response = await fetch(endpoint, {
      method: 'POST',
      headers: {
        'Content-Type': 'application/json',
        Authorization: `Bearer ${apiKey}`
      },
      body: JSON.stringify(requestBody)
    });
    if (!response.ok) {
      const text = await response.text();
      throw new Error(text || response.statusText);
    }
    // Returns { client_session_key, conversation_id } to the frontend
    return NextResponse.json(await response.json());
  } catch (error: unknown) {
    const message = error instanceof Error ? error.message : String(error);
    console.error('Layercode authorize session response error:', message);
    return NextResponse.json({ error: message }, { status: 500 });
  }
};
```

For other backend frameworks (Express, FastAPI, etc.), the logic is the same: receive a request from your frontend, call the Layercode `authorize_session` endpoint with your API key, and return the `client_session_key` to your frontend.

## Agents

### List Agents

```http
GET https://api.layercode.com/v1/agents
```

* `Authorization` (header): Bearer token using your `LAYERCODE_API_KEY`.

#### Response

Returns all agents in an `agents` array. Each agent object includes `id`, `name`, `type`, `agent_template_id`, `created_at`, `updated_at`, and `assigned_phone_numbers` (an array of phone number assignments with `phone_number`, `twilio_sid`, `friendly_name`, and `assigned_at`).
#### Example

```bash
curl -H "Authorization: Bearer $LAYERCODE_API_KEY" \
  https://api.layercode.com/v1/agents
```

```json
{
  "agents": [
    {
      "id": "ag-123456",
      "name": "My Agent ag-123456",
      "type": "voice",
      "agent_template_id": "tmpl_default",
      "created_at": "2024-04-01T12:00:00.000Z",
      "updated_at": "2024-04-08T16:30:16.000Z",
      "assigned_phone_numbers": [
        {
          "phone_number": "+15551234567",
          "twilio_sid": "PNxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
          "friendly_name": "Support Line",
          "assigned_at": "2024-04-02T09:21:00.000Z"
        }
      ]
    }
  ]
}
```

### Create Agent From Template

```http
POST https://api.layercode.com/v1/agents
```

* `Authorization` (header): Bearer token using your `LAYERCODE_API_KEY`.
* `Content-Type` (header): Must be `application/json`.
* `template_id` (body, optional): Template ID used to initialize the agent configuration. If omitted, the default recommended template is used.
* `name` (body, optional): Display name for the new agent. If omitted, Layercode assigns a default name (e.g., `My Agent ag-123456`).

#### Response

Returns the newly created agent record, including its configuration and webhook secret:

* `id`: Unique identifier for the agent.
* `name`: Display name (the one you provided, or a default assigned by Layercode).
* `type`: Agent type (currently `voice`).
* Full pipeline configuration cloned from the template.
* Secret used to validate incoming webhooks.
* `agent_template_id`: ID of the template used to create the agent.

```bash
curl -X POST https://api.layercode.com/v1/agents \
  -H "Authorization: Bearer $LAYERCODE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "template_id": "tmpl_sales",
    "name": "Sales Assistant"
  }'
```

### Get Agent Details

```http
GET https://api.layercode.com/v1/agents/{agent_id}
```

* `Authorization` (header): Bearer token using your `LAYERCODE_API_KEY`.
* `agent_id` (path): The ID of the agent.

#### Response

Returns the agent:

* `id`: Agent ID.
* `name`: Agent display name.
* Current pipeline configuration.
* `assigned_phone_numbers`: Array of phone number assignments for this agent.
```bash
curl -H "Authorization: Bearer $LAYERCODE_API_KEY" \
  https://api.layercode.com/v1/agents/ag-123456
```

### Update Agent Configuration

```http
POST https://api.layercode.com/v1/agents/{agent_id}
```

* `Authorization` (header): Bearer token using your `LAYERCODE_API_KEY`.
* `Content-Type` (header): Must be `application/json`.
* `agent_id` (path): The ID of the agent to update.
* `webhook_url` (body): URL for production webhooks. When provided, `agent.llm` is swapped for `agent.webhook`, disabling `demo_mode` and routing traffic to your backend.
* Optional display name (body) to set for this agent. If omitted, the name remains unchanged.

#### Response

Returns the updated agent record with the new configuration.

```bash
curl -X POST https://api.layercode.com/v1/agents/ag-123456 \
  -H "Authorization: Bearer $LAYERCODE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "webhook_url": "https://example.com/layercode-webhook"
  }'
```

## Sessions

### Get Session Details

```http
GET https://api.layercode.com/v1/agents/{agent_id}/sessions/{session_id}
```

* `Authorization` (header): Bearer token using your `LAYERCODE_API_KEY`.
* `agent_id` (path): The ID of the agent.
* `session_id` (path): The connection ID for the session, i.e. the unique connection identifier for a given session.

#### Response

Returns JSON with details about the session, its transcript, and its recording status:

* Connection ID for the session.
* ID of the agent.
* ISO timestamp when the connection started.
* ISO timestamp when the connection ended (if ended).
* Total connection duration in milliseconds.
* Custom metadata associated with the session.
* Caller phone number (Twilio), if applicable.
* Caller country code (Twilio), if applicable.
* Agent phone number (Twilio), if applicable.
* Agent phone number country code (Twilio), if applicable.
* IP address of the connection.
* Country code derived from the IP address, when available.
* Total seconds of user speech.
* Total seconds of generated speech.
* Processing latency in milliseconds.
* Array of transcript entries.
  Each entry includes `timestamp`, `user_message`, `assistant_message`, and `latency_ms`.
* `recording_status`: One of `not_available`, `in_progress`, or `completed`.
* If `recording_status` is `completed`, a URL to download the WAV recording for this session connection.

#### Example

```bash
curl -H "Authorization: Bearer $LAYERCODE_API_KEY" \
  https://api.layercode.com/v1/agents/ag-123456/sessions/lc_conn_abc123
```

### Download Session Recording

```http
GET https://api.layercode.com/v1/agents/{agent_id}/sessions/{session_id}/recording
```

* `Authorization` (header): Bearer token using your `LAYERCODE_API_KEY`.
* `agent_id` (path): The ID of the agent.
* `session_id` (path): The connection ID for the session.

Returns a WAV audio file if available.

```bash
curl -L -H "Authorization: Bearer $LAYERCODE_API_KEY" \
  -o session.wav \
  https://api.layercode.com/v1/agents/ag-123456/sessions/lc_conn_abc123/recording
```

Recordings are generated after a session completes. If a recording is still processing, the details endpoint returns `recording_status: "in_progress"`.

Once your frontend receives the `client_session_key`, it can connect to the Layercode WebSocket API to start streaming audio.

## Calls

### Initiate Outbound Call

```http
POST https://api.layercode.com/v1/agents/{agent_id}/calls/initiate_outbound
```

* `Authorization` (header): Bearer token using your `LAYERCODE_API_KEY`.
* `from_phone_number` (body): The phone number assigned to your Layercode agent that will make the call. It must be a number already assigned to your Layercode agent in the dashboard.
* `to_phone_number` (body): The phone number to call (e.g., your mobile number for testing).

#### Response

* `conversation_id`: The unique conversation ID.
A session (associated with the returned `conversation_id`) is created shortly afterwards, once Twilio initiates the call.

#### Example Request

```bash
curl -X POST https://api.layercode.com/v1/agents/ag-123456/calls/initiate_outbound \
  -H "Authorization: Bearer $LAYERCODE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "from_phone_number": "NUMBER_ASSIGNED_TO_YOUR_AGENT",
    "to_phone_number": "PHONE_NUMBER_TO_CALL"
  }'
```

#### Example Response

```json
{
  "conversation_id": "lc_conv_abc123..."
}
```

#### Error Responses

* `error` (string): Error message describing the problem.

**Possible error cases:**

* `400` – Invalid or missing bearer token, missing or invalid request body, or invalid `from_phone_number` (i.e. not assigned to the agent specified in the URL).
* `402` – Insufficient balance for the organization.
* `429` – Account session concurrency limit reached.

## Twilio Voice

### TwiML Webhook

Use this endpoint as the Voice webhook in your Twilio phone number configuration. Layercode validates the incoming request, authorizes a session, and returns TwiML that connects the call to your agent's WebSocket stream.

```http
POST https://api.layercode.com/v1/agents/twilio/twiml
```

* `X-Twilio-Signature` (header): Signature supplied by Twilio for request verification. Required when you have stored Twilio credentials in Layercode.
* `Direction`: Call direction reported by Twilio (e.g., `inbound` or `outbound-api`).
* `From`: Caller phone number.
* `FromCountry`: Caller country code supplied by Twilio.
* `To`: Phone number assigned to your agent.
* `ToCountry`: Destination country code supplied by Twilio.

#### Response

Returns TwiML that streams the call to the Layercode Twilio WebSocket endpoint.

The streaming URL in the response is generated dynamically for each request. Do not cache or reuse the client session key.

# Webhook SSE API

Source: https://docs.layercode.com/api-reference/webhook-sse-api

Webhook SSE API

## Webhook Request Payload

Layercode sends different webhook event types to your backend. Each request body is JSON.
All requests include:

* `type` (string): One of `message`, `data`, `session.start`, `session.end`, `session.update`.
* `session_id` (string): Connection identifier for this session. Changes on each reconnect.
* `conversation_id` (string): Stable conversation identifier.
* `custom_metadata` (object, optional): Custom metadata supplied via the pipeline configuration (or a per-session override). See the [custom metadata and headers how-to](/how-tos/custom-webhook-metadata-and-headers).

In addition, when you set `session_webhook.custom_headers` in the pipeline config (or a per-session override), Layercode appends those headers to every webhook request. See the [custom metadata and headers how-to](/how-tos/custom-webhook-metadata-and-headers).

Additional fields vary by event type, as described below.

***

### **message**

* `text` (string): Transcribed user text.
* `session_id` (string): A unique identifier for the current session.
* `conversation_id` (string): A unique identifier for the conversation.
* `turn_id` (string): Unique ID for this turn.
* `from_phone_number` (string, optional): Caller phone number if Twilio is used.
* `to_phone_number` (string, optional): Agent phone number if Twilio is used.

**Example:**

```json
{
  "type": "message",
  "session_id": "sess_abc123",
  "conversation_id": "conv_xyz789",
  "turn_id": "turn_xyz123",
  "text": "Hello, how are you?",
  "from_phone_number": "+14155550123",
  "to_phone_number": "+14155559876"
}
```

***

### **data**

Sent when your client emits `client.response.data` to pass structured JSON without interrupting speech.

* `data` (object): The arbitrary JSON payload sent by the client.
* `session_id` (string): A unique identifier for the current session.
* `conversation_id` (string): A unique identifier for the conversation.
* `turn_id` (string): Unique ID for the current turn.
* `from_phone_number` (string, optional): Caller phone number if Twilio is used.
* `to_phone_number` (string, optional): Agent phone number if Twilio is used.
The response to this event type should be a **regular JSON HTTP response** (not an SSE stream). See also: [Send JSON data from the client](/how-tos/send-json-data).

**Example:**

```json
{
  "type": "data",
  "session_id": "sess_abc123",
  "conversation_id": "conv_xyz789",
  "turn_id": "turn_xyz123",
  "data": {
    "action": "confirm_order",
    "orderId": "ORD-12345",
    "timestamp": 1724848800000
  }
}
```

***

### **session.start**

Sent when a new session begins, giving your agent the option to speak first.

* `session_id` (string): A unique identifier for the current session.
* `conversation_id` (string): A unique identifier for the conversation.
* `turn_id` (string): Unique ID for the assistant welcome turn.
* `from_phone_number` (string, optional): Caller phone number if Twilio is used.
* `to_phone_number` (string, optional): Agent phone number if Twilio is used.

**Example:**

```json
{
  "type": "session.start",
  "session_id": "sess_abc123",
  "conversation_id": "conv_xyz789",
  "turn_id": "turn_welcome_123",
  "from_phone_number": "+14155550123",
  "to_phone_number": "+14155559876"
}
```

***

### **session.update**

Sent when asynchronous session data becomes available (e.g., after a recording completes).

* `session_id` (string): A unique identifier for the current session.
* `conversation_id` (string): A unique identifier for the conversation.
* `recording_status` (string): `completed` or `failed`.
* `recording_url` (string, optional): API URL to download the WAV file when `completed`.
* `recording_duration` (number, optional): Duration in seconds.
* `error_message` (string, optional): Error details when `failed`.
* `metadata` (object): Session metadata originally provided during authorization (if any).
* `from_phone_number` (string, optional): Caller phone number if Twilio is used.
* `to_phone_number` (string, optional): Agent phone number if Twilio is used.
**Example:**

```json
{
  "type": "session.update",
  "session_id": "sess_abc123",
  "conversation_id": "conv_xyz789",
  "from_phone_number": "+14155550123",
  "to_phone_number": "+14155559876",
  "recording_status": "completed",
  "recording_url": "https://api.layercode.com/v1/agents/ag_123/sessions/sess_abc123/recording",
  "recording_duration": 42.3,
  "metadata": { "userId": "u_123" }
}
```

***

### **session.end**

Sent when the session finishes. Includes the transcript and usage metrics.

* `session_id` (string): A unique identifier for the current session.
* `conversation_id` (string): A unique identifier for the conversation.
* `agent_id` (string): Agent ID.
* `started_at` / `ended_at` (string): ISO timestamps.
* `duration` (number|null): Total milliseconds (if available).
* `transcription_duration_seconds` (number|null)
* `tts_duration_seconds` (number|null)
* `latency` (number|null)
* `ip_address` (string|null)
* `country_code` (string|null)
* `recording_status` (string): `enabled` or `disabled` (the org setting for session recording).
* `transcript` (array): Items of `{ role: 'user' | 'assistant', text: string, timestamp: number }`.
* `from_phone_number` (string, optional): Caller phone number if Twilio is used.
* `to_phone_number` (string, optional): Agent phone number if Twilio is used.

**Example:**

```json theme={null}
{
  "type": "session.end",
  "session_id": "sess_abc123",
  "conversation_id": "conv_xyz789",
  "agent_id": "ag_123",
  "from_phone_number": "+14155550123",
  "to_phone_number": "+14155559876",
  "started_at": "2025-08-28T10:00:00.000Z",
  "ended_at": "2025-08-28T10:03:00.000Z",
  "duration": 180000,
  "transcription_duration_seconds": 20.1,
  "tts_duration_seconds": 19.8,
  "latency": 120,
  "ip_address": "203.0.113.10",
  "country_code": "US",
  "recording_status": "enabled",
  "transcript": [
    { "role": "user", "text": "Hello", "timestamp": 1724848800000 },
    { "role": "assistant", "text": "Hi there!", "timestamp": 1724848805000 }
  ]
}
```

***

## Webhook Response Events

When Layercode calls your webhook, your handler typically streams back Server-Sent Events (SSE) so the assistant can speak or update the UI. The following event types are recognized.

Note: `response.data` can be sent inside an SSE stream or, when you are replying to an incoming `data` webhook event, returned as a regular JSON HTTP response (non-SSE). All other webhook events should be answered with SSE events.

### **response.tts**

Send spoken content for the assistant turn. Layercode converts the provided text to speech and streams it to the user.

```json theme={null}
{
  "type": "response.tts",
  "content": "Let me check that for you.",
  "turn_id": "turn_xyz123"
}
```

### **response.data**

Deliver JSON payloads to the frontend without speaking. Useful for updating dashboards or sending structured data.

```json theme={null}
{
  "type": "response.data",
  "content": { "status": "looking-up-order" },
  "turn_id": "turn_xyz123"
}
```

Example of a JSON (non-SSE) response to an incoming `data` webhook event (see [Send JSON data from the client](/how-tos/send-json-data)):

```json theme={null}
{
  "type": "response.data",
  "content": {
    "status": "received",
    "echo": { "orderId": "ORD-12345" }
  },
  "turn_id": "turn_xyz123"
}
```

### **response.end**

Signal that the assistant has finished generating its reply for the current turn. Always emit this once the turn is complete unless you are issuing a `response.hangup`.

```json theme={null}
{
  "type": "response.end",
  "turn_id": "turn_xyz123"
}
```

### **response.hangup**

Request that Layercode end the session once playback of the current assistant audio finishes. Provide a farewell in the **required** `content` field. You do **not** need to send a separate `response.end`; Layercode will flush remaining audio and end the session automatically.

```json theme={null}
{
  "type": "response.hangup",
  "content": "Thank you for calling. Goodbye!",
  "turn_id": "turn_xyz123"
}
```

# Setting up AGENTS.md and CLAUDE.md (and llms.txt)
Source: https://docs.layercode.com/explanations/agents-md

How to set up an AGENTS.md and CLAUDE.md for working with Layercode and how to find llms.txt

When you use LLM coding agents to build with Layercode, we recommend creating an [AGENTS.md](https://agents.md/) (read by most coding agents) and/or a [CLAUDE.md](https://www.anthropic.com/engineering/claude-code-best-practices) for Claude Code.

The easiest way to give your coding agent Layercode context is to copy our full docs into the file. You can find [every line of our docs in markdown here](https://docs.layercode.com/llms-full.txt), [links to our pages here](https://docs.layercode.com/llms.txt), or grab the lightweight summary at [llm.txt](https://docs.layercode.com/llm.txt).

# Connect Your Backend
Source: https://docs.layercode.com/explanations/connect-backend

How to connect your own agent backend to a Layercode agent.

Layercode is designed for maximum flexibility: you can connect any backend that can receive an HTTP request and return a Server-Sent Events (SSE) stream.
This allows you to use your own LLM-powered agent, business logic, or orchestration, while Layercode handles all the real-time voice infrastructure.

## How it works

To use your own backend, click the "Connect Your Backend" button on your agent, then set the **Webhook URL** to point to your backend's endpoint.

When a user interacts with your voice agent, Layercode will:

1. Transcribe the user's speech to text.
2. Send an HTTP POST request to your backend at the Webhook URL you provide.
3. Your backend responds with a Server-Sent Events (SSE) stream containing the agent's reply (text to be spoken, and optional data).
4. Layercode converts the text in your response to speech and streams it back to the user in real time.
5. Your backend can also return JSON data, which lets you pass state back to your UI.

## Configuring Your Agent

1. In the Layercode dashboard, open your agent and click **Connect Your Backend** (or click the edit button in the Your Backend box if you've already connected your backend).
2. Enter your backend's **Webhook URL** in the configuration modal.
3. Optionally, configure which agent events you want to receive and (if needed) set a session webhook for lifecycle events (see below).
4. Save your changes.

## Agent Webhook Events

* **message** (required):\
  Sent when the user finishes speaking. Contains the transcribed message and metadata. Your backend should respond with an SSE stream containing the agent's reply.
* **data** (optional):\
  Fired when your client sends structured JSON data, e.g. form submissions or function invocations (see [Send JSON data from the client](/how-tos/send-json-data)). Useful for reacting to structured payloads without speech. Reply with a regular JSON HTTP response rather than an SSE stream.
* **welcome** (optional):\
  Sent as soon as a session opens so your agent can greet the user proactively. If disabled, the agent waits for the user to speak first.
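
As a rough illustration of how a backend might route these three events, here is a minimal, framework-free sketch. It assumes the payload fields shown in the examples on this page (`type`, `session_id`, `turn_id`, `text`, `data`). It also assumes that `welcome` events carry a `turn_id`, and that each SSE frame is a standard `data: <json>` line followed by a blank line. Check the [Webhook SSE API documentation](/api-reference/webhook-sse-api) for the exact framing. `planResponse` and `toSseFrame` are hypothetical helper names, not part of any Layercode SDK.

```ts theme={null}
// Hypothetical payload shapes for the three agent webhook events, based on the
// example payloads in these docs. Verify against the Webhook SSE API reference.
type AgentWebhookEvent =
  | { type: "message"; session_id: string; turn_id: string; text: string }
  | { type: "welcome"; session_id: string; turn_id: string }
  | { type: "data"; session_id: string; turn_id: string; data: unknown };

type ResponseEvent =
  | { type: "response.tts"; content: string; turn_id: string }
  | { type: "response.data"; content: unknown; turn_id: string }
  | { type: "response.end"; turn_id: string };

// Encode one response event as a standard SSE frame: "data: <json>" plus a blank line.
function toSseFrame(event: ResponseEvent): string {
  return `data: ${JSON.stringify(event)}\n\n`;
}

// Decide which response events to send for an incoming event.
// Every response event echoes the incoming turn_id.
function planResponse(event: AgentWebhookEvent): ResponseEvent[] {
  switch (event.type) {
    case "welcome":
      return [
        { type: "response.tts", content: "Hi, how can I help you today?", turn_id: event.turn_id },
        { type: "response.end", turn_id: event.turn_id },
      ];
    case "message":
      // Placeholder echo; call your LLM here instead.
      return [
        { type: "response.tts", content: `You said: ${event.text}`, turn_id: event.turn_id },
        { type: "response.end", turn_id: event.turn_id },
      ];
    case "data":
      // Answered with a single JSON body, not an SSE stream.
      return [{ type: "response.data", content: { status: "received" }, turn_id: event.turn_id }];
    default:
      throw new Error("Unhandled webhook event type");
  }
}
```

For `message` and `welcome`, you would write each frame from `toSseFrame` to a `text/event-stream` response. For `data`, you would return the single `response.data` object as a plain JSON body. If you use `@layercode/node-server-sdk`, its `streamResponse` helper takes care of the streaming side for you.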

## Session Webhook Events

Configure the optional session webhook to receive lifecycle callbacks separate from agent responses:

* **session.start** – Fired when a session is authorized. Includes metadata, phone numbers, and IDs.
* **session.end** – Delivered after the session's connection closes, with duration, transcript history, and latency metrics.
* **session.update** – Fired when recordings finish processing (e.g. transcription uploads). Requires session recording to be enabled for the org.

## Webhook Verification

To keep your backend secure, verify that incoming requests really come from Layercode. You do this by checking the `layercode-signature` header, which contains a timestamp and an HMAC-SHA256 signature computed over that timestamp and the raw request body.

Here's how you can verify the signature in your backend:

1. Retrieve the `layercode-signature` header from the request. It will be in the format: `t=timestamp,v1=signature`.
2. Get your Layercode webhook secret from the Layercode dashboard (open the appropriate agent and click the edit button in the Your Backend box, where you'll find the Webhook Secret).
3. Reconstruct the signed payload by concatenating the timestamp, a period (`.`), and the exact raw webhook request body: `signed_payload = timestamp + "." + request_body`.
4. Compute the HMAC-SHA256 signature of this signed payload using your webhook secret.
5. Compare the computed signature with the `v1` value from the `layercode-signature` header, ideally using a constant-time comparison. If they match, the request is valid.
6. (Recommended) Check that the timestamp is recent (for example, within 5 minutes) to prevent replay attacks.

## Example: Webhook Request

When a user finishes speaking, Layercode will send a POST request to your webhook with the following JSON payload body:

```json theme={null}
{
  "type": "message",
  "session_id": "uuid",
  "turn_id": "uuid",
  "text": "What's the weather today?"
}
```

* `type`: The agent event type (`message`, `data`, or `welcome`).
* `session_id`: Unique per session. Use it to tell which session, and therefore which conversation, a webhook belongs to.
* `turn_id`: Unique per turn of the conversation. This ID must be returned in all SSE events.
* `text`: The user's transcribed message.

See the [Webhook SSE API documentation](/api-reference/webhook-sse-api) for details.

## Example: SSE Response

Your backend should respond with an SSE stream. Each SSE message contains a JSON payload with the following fields: `type`, `content` (when required), and `turn_id`. See the [Webhook SSE API documentation](/api-reference/webhook-sse-api) for details.

# Keeping track of conversation history
Source: https://docs.layercode.com/explanations/conversation-history

How to persist turn-by-turn context when webhook requests can abort

Tracking conversation history seems easy. But there is one big gotcha: webhook requests can abort, and in voice that happens often because users interrupt. So we need to adjust our approach.

Let's start naively. A user sends a message, so we add it to an array.

```json theme={null}
[
  {
    "role": "user",
    "turn_id": "turn-1",
    "content": "Hey, how do I make a hot dog?"
  }
]
```

Then, when the assistant replies, we simply append the reply under the same `turn_id`:

```json theme={null}
[
  {
    "role": "user",
    "turn_id": "turn-1",
    "content": "Hey, how do I make a hot dog?"
  },
  {
    "role": "assistant",
    "turn_id": "turn-1",
    "content": "You put the frankfurter in the bun and add some mustard."
  }
]
```

### But what if the user interrupts?

When the user interrupts mid-response, the **webhook request that was generating the assistant's reply is abruptly terminated**.\
Unless we've already written something to memory, the assistant's partial message could be lost.

In practice, this happens a lot with voice agents: users cut off the model to ask something new before the previous response finishes.\
If we don't handle this carefully, our in-memory state drifts out of sync with what actually happened in the conversation.
And you might not even notice; it will just look like the LLM is being a silly billy.

***

## So what do I need to do?

When a new user webhook arrives, persist in this order:

1. **Store the user message** right away so the turn is anchored in history.
2. **Insert the assistant placeholder** before you start streaming tokens back.

```ts theme={null}
// Minimal in-memory store keyed by conversation_id (use Redis or a database in production).
// conversation_id, turn_id, and userInput come from the incoming webhook payload.
type StoredMessage = { role: "system" | "user" | "assistant"; turn_id?: string; content: string };
const conversationMessages: Record<string, StoredMessage[]> = {};

// 1. Anchor the turn with the user's message
(conversationMessages[conversation_id] ??= []).push({ role: "user", turn_id, content: userInput });
```

```ts theme={null}
// 2. Add an empty assistant placeholder before streaming any tokens
conversationMessages[conversation_id].push({
  role: "assistant",
  turn_id,
  content: "", // placeholder
});
```

If the webhook completes successfully:

* Remove the placeholder and append the final messages with the same `turn_id`.

If the webhook is aborted:

* The placeholder remains, capturing the interrupted turn. You can reconcile by marking that entry as interrupted.

***

### Why doesn't the assistant finish the turn?

When a user interrupts, Layercode immediately cancels the webhook request that was streaming the assistant response.\
Because the request terminates, your worker never has a chance to finalize the response or append it to history.\
There is currently no back-channel for Layercode to notify your backend gracefully; cancelling the request is the only interruption signal we can provide.

This is why persisting the placeholder before you stream tokens is essential.

### Do I get an `AbortSignal`?

Layercode does not propagate a custom `AbortSignal` into your AI SDK calls.\
Instead, the framework relies on the platform aborting the request (Cloudflare Workers receive the native `ExecutionContext` cancellation). Make sure any long-running model or fetch calls can tolerate the request being torn down mid-stream; the placeholder you stored lets you recover once the next webhook arrives.

### What about multiple interruptions in a row?

Even if a user interrupts several turns back-to-back, the placeholder pattern above keeps your transcript accurate.
Persist placeholders as soon as the new webhook starts (before any expensive work) so they survive if another interruption happens quickly afterward.

***

## Stored Message Shape and `turn_id`

Every stored message (user and assistant) includes a `turn_id` corresponding to the webhook event that created it:

```ts theme={null}
{ role: "user", turn_id: "<turn_id>", content: "..." }
{ role: "assistant", turn_id: "<turn_id>", content: "..." }
```

The initial system message does **not** have a `turn_id`.

***

## Persistence Notes

* Layercode does not yet provide deduplication or idempotency handling, so you will need to write your own logic to filter out duplicates.

***

## TL;DR

✅ Always store user messages immediately.\
✅ Add a placeholder assistant message before streaming.\
✅ Replace or mark the placeholder when the turn finishes or is interrupted.\
✅ Never rely on the webhook completing; it might abort at any time.\
✅ Keep `turn_id` and `conversation_id` consistent for reconciliation.

# How connecting to Layercode works
Source: https://docs.layercode.com/explanations/how-connect-works

Visual diagram of how your app connects to Layercode

## Fresh Page Load (New Conversation)

```mermaid theme={null}
sequenceDiagram
    participant UI as Browser UI
    participant SDK as LayercodeClient (JS SDK)
    participant Auth as POST /v1/agents/web/authorize_session
    participant DB as DB (sessions + conversations)
    participant WS as GET /v1/agents/web/websocket
    participant Pipeline as Voice Pipeline Worker

    UI->>SDK: instantiate client.connect()
    SDK->>Auth: POST { agent_id, metadata, sdk_version }
    Auth->>DB: validate pipeline/org and insert conversation + session
    DB-->>Auth: client_session_key + conversation_id
    Auth-->>SDK: { client_session_key, conversation_id, config }
    SDK->>WS: WebSocket upgrade ?client_session_key=...
    WS->>DB: lookup session via client_session_key
    WS->>Pipeline: start voicePipeline(session)
    Pipeline-->>SDK: streaming audio + events
    SDK-->>UI: onConnect({ conversationId, config })
```

* `authorizeSession` creates the conversation record when no `conversation_id` exists, allocates a session row, and returns a `client_session_key` that is valid for one hour.
* Your backend must include a valid bearer token (your API key) when it proxies the request to the authorize endpoint; the API key stays on the server, not in the browser.

***

## Page Load With Stored Conversation

```mermaid theme={null}
sequenceDiagram
    participant UI as Browser UI (resuming)
    participant SDK as LayercodeClient
    participant Auth as POST /v1/agents/web/authorize_session
    participant DB as DB (sessions + conversations)
    participant WS as GET /v1/agents/web/websocket
    participant Pipeline as Voice Pipeline Worker

    UI->>SDK: client.connect()
    SDK->>Auth: POST { agent_id, conversation_id }
    Auth->>DB: fetch conversation + pipeline, create new session key
    DB-->>Auth: verify ownership, persist session
    Auth-->>SDK: { client_session_key, conversation_id, config }
    SDK->>WS: WebSocket upgrade using new client_session_key
    WS->>DB: validate session + pipeline balance
    WS->>Pipeline: resume conversation context
    Pipeline-->>SDK: stream resumes with existing turn state
```

* The SDK automatically reconnects to an existing conversation if a `conversationId` is cached.
* To start fresh, create a new client with `conversationId = null`.
* Re-authorizing rotates the `client_session_key`, so old WebSocket URLs stop working once a resume happens.
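
Both flows above depend on a single server-side call to the authorize endpoint. Below is a sketch of building that request in TypeScript, under several assumptions. It assumes the API is served from `https://api.layercode.com`, which is the host the recording URLs in these docs use. It also assumes the API key goes in an `Authorization: Bearer` header, and that the body accepts `agent_id`, an optional `conversation_id`, and optional `metadata`, as the diagrams show. `buildAuthorizeRequest` is a hypothetical helper. Confirm field names against the [REST API reference](/api-reference/rest-api).

```ts theme={null}
// Assumed API host: the recording URLs in these docs use https://api.layercode.com.
const LAYERCODE_API_BASE = "https://api.layercode.com";

interface AuthorizeRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

// Build (but do not send) the server-side authorize_session request.
// Pass a stored conversationId to resume, or null to start a new conversation.
function buildAuthorizeRequest(
  apiKey: string,
  agentId: string,
  conversationId: string | null,
  metadata?: Record<string, unknown>,
): AuthorizeRequest {
  const body: Record<string, unknown> = { agent_id: agentId };
  if (conversationId !== null) body["conversation_id"] = conversationId;
  if (metadata !== undefined) body["metadata"] = metadata;

  return {
    url: `${LAYERCODE_API_BASE}/v1/agents/web/authorize_session`,
    init: {
      method: "POST",
      // The API key stays on your server; the browser only receives the session key.
      headers: { Authorization: `Bearer ${apiKey}`, "Content-Type": "application/json" },
      body: JSON.stringify(body),
    },
  };
}
```

On your server you would send this with `fetch(url, init)` and pass the returned `client_session_key` and `conversation_id` to the browser. Passing `null` for `conversationId` starts a new conversation, which matches the "start fresh" bullet above.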
***

## Network Drop and Manual Reconnect

```mermaid theme={null}
sequenceDiagram
    participant UI as Browser UI
    participant SDK as LayercodeClient
    participant WS as WebSocket Connection
    participant Auth as POST /v1/agents/web/authorize_session
    participant DB as DB (sessions + conversations)
    participant Pipeline as Voice Pipeline Worker

    WS-xSDK: network drop / close event
    SDK->>SDK: _performDisconnectCleanup() (status=disconnected)
    SDK-->>UI: onDisconnect() (show reconnect)
    UI->>SDK: user clicks reconnect
    SDK->>Auth: POST { agent_id, conversation_id }
    Auth->>DB: create fresh session + ensure balance
    Auth-->>SDK: { client_session_key, conversation_id, config }
    SDK->>WS: establish new WebSocket ?client_session_key=...
    WS->>Pipeline: restart transport against same conversation
    Pipeline-->>SDK: continue streaming and emit onConnect({ conversationId, config })
```

* Device listeners, VAD, and amplitude monitors are rebuilt on reconnect.
* The cached `conversationId` persists, so the next `authorize` call resumes seamlessly.
* To force a fresh run after a drop, instantiate a new client with `conversationId = null` before reconnecting.

# How Layercode works
Source: https://docs.layercode.com/explanations/how-layercode-works

What Layercode does and how you can get set up with Layercode

Layercode is a real-time voice agent orchestration layer built on Cloudflare Workers. It handles the entire audio transport so you can ship production-grade voice AI agents without managing WebRTC, browser audio, or speech infrastructure yourself.

From your perspective as a developer, Layercode is essentially **text in / text out**:

1. Layercode captures the caller's audio, runs speech-to-text (STT), and sends the transcribed text to your backend webhook.
2. Your backend decides what to do (calling an LLM, tools, or business logic) and responds with the text you want the user to hear.
3. Layercode turns that text into speech (TTS) and streams it back to the user in real time.

## Authentication and Session Model

Layercode routes every client through an authorize → WebSocket handshake so you can govern sessions centrally.

### Client Authentication Flow

1. Your frontend calls your backend (e.g., `/api/authorize`) with user context.
2. The backend requests `POST /v1/agents/web/authorize_session` with `agent_id` and the org-scoped API key.
3. Layercode returns a time-bounded `client_session_key` plus the `conversation_id`.
4. The frontend connects to `/v1/agents/web/websocket?client_session_key=...` using the Layercode SDK.

See the [REST API reference](/api-reference/rest-api) and [Frontend WebSocket docs](/api-reference/frontend-ws-api) for field-level details.

### Agent Webhook Flow

1. Layercode sends signed POST requests (HMAC via `layercode-signature`) to your webhook.
2. Verify requests with `verifySignature` from `@layercode/node-server-sdk` using `LAYERCODE_WEBHOOK_SECRET`.
3. Handle events such as `session.start`, `message`, `session.update`, and `session.end`. The `message` event includes the transcription and conversation identifiers.
4. Respond by calling `streamResponse(payload, handler)` and emitting `stream.tts()`, `stream.data()`, or tool call results. Always call `stream.end()`, even for silent turns.

Minimal example of sending a welcome message to users:

```ts theme={null}
import express from "express";
import { streamResponse } from "@layercode/node-server-sdk";

const app = express();
app.use(express.json());

app.post("/agent", async (req, res) => {
  return streamResponse(req.body, async ({ stream }) => {
    stream.tts("Hi, how can I help you today?");
    stream.end();
  });
});
```

### Receiving messages from the client (user)

Every Layercode webhook request includes the transcribed user utterance, so your backend never has to handle raw audio.
A typical payload contains:

```json theme={null}
{
  "type": "message",
  "session_id": "sess_123",
  "conversation_id": "conv_456",
  "text": "What is our return policy?"
}
```

### Generating LLM responses and replying

Once you have a response string (or stream) from your model, send it back through the `stream` helper. You can optionally stream interim data to the UI while you wait on the final text.

```ts theme={null}
import { streamText } from "ai";
import { google } from "@ai-sdk/google";
import { streamResponse } from "@layercode/node-server-sdk";

// Uses the Express `app` from the previous example
app.post("/agent", async (req, res) => {
  const { type, text } = req.body;

  return streamResponse(req.body, async ({ stream }) => {
    // Only `message` events carry user text; close other turns immediately
    if (type !== "message") {
      stream.end();
      return;
    }

    const { textStream } = await streamText({
      model: google("gemini-2.0-flash-001"),
      messages: [
        { role: "system", content: "You are a helpful assistant." },
        { role: "user", content: text },
      ],
    });

    // Forward tokens to Layercode's TTS as they arrive, then close the turn
    await stream.ttsTextStream(textStream);
    stream.end();
  });
});
```

That's the full loop: Layercode gives you user text, and you return assistant text. Layercode handles buffering, chunking, and converting that text back into speech for the caller.

## Summary: what Layercode does and doesn't do

### What Layercode does

* Connects browsers, mobile apps, or telephony clients to a single real-time voice pipeline.
* Streams user audio, performs STT (Deepgram or AssemblyAI), and delivers plain text to your webhook with low latency.
* Accepts your text responses and converts them into low-latency speech using ElevenLabs, Cartesia, or Rime; bring your own keys or use Layercode-managed ones.
* Manages turn taking (auto VAD or push-to-talk), jitter buffering, and session lifecycle so conversations feel natural.
* Provides dashboards for observability, session recording, latency analytics, and agent configuration without redeploys.

### What Layercode doesn't do

* Host your web app or backend logic: you run your own servers and own your customer state.
* Provide the LLM or agent brain: you choose the model, prompts, and tool integrations. Layercode only transports text to and from your system.
* Guarantee tool execution or business workflows: that remains inside your infrastructure; Layercode just keeps the audio loop in sync.
* Run real-time speech-to-speech models; these are not currently supported.

# Reducing latency with Layercode
Source: https://docs.layercode.com/explanations/latency

How to reduce latency with your voice AI agents.

Reducing latency, especially time-to-first-token (TTFT), is important for natural-sounding conversations.

Some latency we will always work hard to reduce (e.g. transporting your audio across the internet). Other latency depends on choices and trade-offs you make. And some changes won't reduce latency directly but can reduce how slow the agent feels. All of this matters even more if you make tool calls or let agents run in loops, which can take a long time to complete.

Here are some tips that can help you reduce latency (or perceived latency) with your voice agents:

1. **Pick a low-TTFT model.** We currently recommend Gemini 2.5 Flash Lite because it delivers quick time-to-first-token. Avoid “thinking” or reasoning-extended variants unless you explicitly need them; they trade large amounts of latency for marginal quality gains in spoken conversations.
2. **Prime the user with speech before long work.** Inside a tool call, send a `response.tts` event such as “Let me look that up for you” before you start heavy processing. The SDK will surface it to the client as audio immediately, buying you time without leaving silence. See [the tool calling how-to](/how-tos/tool-calling-js#sending-speech-to-the-user-to-tell-them-a-call-is-happening) for an example.
3. **Keep users informed during long tool calls.** Emit a `response.data` message as soon as the work starts so the UI can surface a loader or status update. See [Sending data to the client](/how-tos/sending-data-to-client) and the API reference for [Data and state updates](/api-reference/frontend-ws-api#data-and-state-updates). You can also play a short “thinking” audio clip in the browser so the user hears that the agent is still busy.
4. **Be deliberate with RAG.** Running retrieval on every turn (especially in loops) adds network hops and can stall a conversation. Fetch external data through tool calls only when it’s needed, and narrate what the agent is doing so the user understands the delay.
5. **Reduce infrastructure round trips.** Store conversations in a fast, nearby database (Redis is a good default), and keep ancillary services in the same region as your Layercode deployment to avoid cross-region latency spikes.

# Tool calling
Source: https://docs.layercode.com/explanations/tool-calling

How to set up tool calling with Layercode. Also known as function calling.

Function calling is usually one of the first things you will want to add after setting up your agent. Because Layercode lets you work directly with text, you can use existing tool-calling frameworks. There are many frameworks that can help you with function calling.
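
Whatever framework you pick, the dispatch step after the model requests a tool looks roughly the same. The framework-agnostic sketch below looks up a tool by name and speaks a short filler line. It then posts a status update before awaiting the tool (as in the latency tips above) and returns the result for the model's next step. The `lookup_order` tool, `runToolCall`, and the `emit` callback are hypothetical. With `@layercode/node-server-sdk` you would likely call `stream.tts()` and `stream.data()` where this sketch calls `emit`.

```ts theme={null}
// A tool receives parsed JSON arguments and returns a result the model can read.
type ToolFn = (args: Record<string, unknown>) => Promise<unknown>;

// The tool call your LLM framework hands back: a name plus parsed arguments.
interface ToolCall {
  name: string;
  args: Record<string, unknown>;
}

// The two response event types emitted while a tool runs.
type ResponseEvent =
  | { type: "response.tts"; content: string; turn_id: string }
  | { type: "response.data"; content: unknown; turn_id: string };

// Hypothetical tool: replace with a real database or API lookup.
const tools: Record<string, ToolFn> = {
  lookup_order: async (args) => ({ orderId: args.orderId, status: "shipped" }),
};

// Run one tool call. Speech and a UI status update are emitted *before*
// awaiting the tool, so the user is not left in silence during slow work.
async function runToolCall(
  call: ToolCall,
  turnId: string,
  emit: (event: ResponseEvent) => void,
): Promise<unknown> {
  const tool = tools[call.name];
  if (!tool) {
    throw new Error(`Unknown tool: ${call.name}`);
  }
  emit({ type: "response.tts", content: "Let me look that up for you.", turn_id: turnId });
  emit({ type: "response.data", content: { status: "running-tool", tool: call.name }, turn_id: turnId });
  return await tool(call.args);
}
```

Keeping tools in a plain name-to-function map makes it easy to expose the same functions to whichever framework generates the tool calls.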

## TypeScript:

* [AI SDK](https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling)
* [Mastra](https://mastra.ai/en/examples/tools/calling-tools#from-an-agent) - see [example here](https://github.com/jackbridger?tab=repositories)

## Python:

* [LlamaIndex](https://docs.llamaindex.ai/en/stable/module_guides/deploying/agents/tools/)
* [LangChain](https://python.langchain.com/docs/concepts/tool_calling/)
* [CrewAI](https://docs.crewai.com/en/concepts/tools)

We have written a guide on [tool calling in Next.js with Layercode](/how-tos/tool-calling-js).

# Voice Pipeline Config
Source: https://docs.layercode.com/explanations/voice-pipeline

Learn how to configure Layercode's voice pipeline for real-time conversations

A voice pipeline defines how audio flows through your voice application. It connects speech-to-text, your agent logic, and text-to-speech into a seamless real-time conversation.

Each Layercode agent has a config that defines how it works. The config is currently edited in the dashboard using the pipeline editor UI. In the near future we will allow a custom config JSON to be set on a per-session basis.

***

## Config Structure

A voice pipeline config has the following structure:

```json theme={null}
{
  "clients": { ... },
  "metadata": { ... },
  "session_webhook": { ... },
  "session_duration_timeout_minutes": 30,
  "vad": { ... },
  "plugins": [ ... ]
}
```

`plugins` and `clients` are required. All other fields are optional.

***

## Root-Level Options

**`clients`** (required): Enable or disable specific client transports.

**Configuration:**

| Option | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `browser` | boolean | No | `true` | Enable browser WebSocket connections. |
| `twilio` | boolean | No | `false` | Enable Twilio Media Streams connections. |

**Example:**

```json theme={null}
{
  "clients": {
    "browser": true,
    "twilio": true
  }
}
```

**`metadata`**: Custom key-value data attached to every session.
This metadata is included in webhook payloads and can be used for tracking, analytics, or passing context to your agent.

**Example:**

```json theme={null}
{
  "metadata": {
    "environment": "production",
    "version": "1.2.0",
    "customer_tier": "enterprise"
  }
}
```

**`session_webhook`**: Configure webhooks for session lifecycle events. Useful for logging, analytics, or triggering external workflows when sessions start, end, or update.

**Configuration:**

| Option | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | string | Yes | - | Webhook endpoint URL. Must be HTTPS. |
| `custom_headers` | `Record<string, string>` | No | - | Additional headers to send with webhook requests. |
| `custom_metadata` | `Record` | No | - | Extra metadata to include in webhook payloads. |
| `events` | `array<"session.start" \| "session.end" \| "session.update">` | No | All events | Which events to send to the webhook. |

**Example:**

```json theme={null}
{
  "session_webhook": {
    "url": "https://your-server.com/webhooks/voice",
    "custom_headers": {
      "X-Custom-Header": "value"
    },
    "events": ["session.start", "session.end"]
  }
}
```

**`session_duration_timeout_minutes`**: Maximum session duration in minutes. Sessions automatically end after this timeout.

**Configuration:**

| Type | Required | Default | Min | Max |
| --- | --- | --- | --- | --- |
| number | No | 30 | 1 | 1440 (24 hours) |

**Example:**

```json theme={null}
{
  "session_duration_timeout_minutes": 60
}
```

**`vad`**: Voice Activity Detection (VAD) configuration. VAD detects when users start and stop speaking, enabling natural turn-taking. It is enabled by default, but in some cases you may want to disable it or edit the advanced settings. In most cases you do not need to include the `vad` config or change these settings.

**Configuration:**

| Option | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `enabled` | boolean | No | `true` | Enable voice activity detection. |
| `gate_audio` | boolean | No | `true` | Only send audio to STT when speech is detected. |
| `buffer_frames` | number | No | `10` | Number of audio frames to buffer (0-20). |
| `model` | `"v5"` | No | `"v5"` | VAD model version. |
| `positive_speech_threshold` | number | No | - | Confidence threshold for detecting speech (0-1). |
| `negative_speech_threshold` | number | No | - | Confidence threshold for detecting silence (0-1). |
| `redemption_frames` | number | No | - | Frames of silence before ending speech detection (0-10). |
| `min_speech_frames` | number | No | - | Minimum frames required to count as speech (0-10). |
| `pre_speech_pad_frames` | number | No | - | Frames to include before detected speech (0-10). |

**Example:**

```json theme={null}
{
  "vad": {
    "enabled": true,
    "gate_audio": true,
    "buffer_frames": 10
  }
}
```

***

## Plugins

Plugins are the processing steps in your voice pipeline. They must be specified in this order:

```
stt.* → turn_manager → agent.* → tts.*
```

Each plugin is configured with a `use` field (the plugin type) and an optional `options` object.

### STT Plugins (Speech-to-Text)

Convert incoming audio to text transcripts. Layercode supports two STT providers:

| Provider | API key required | Models |
| --- | --- | --- |
| **Deepgram** | No (managed) | Flux (English, ultra-low latency), Nova-3 (multilingual) |
| **AssemblyAI** | No (managed) | Universal Streaming (English or multilingual) |

Both providers are managed by Layercode, so no API keys are required.

**`stt.deepgram`**: Deepgram speech-to-text with Nova-3 or Flux models.

**Configuration:**

#### `model_id: "flux"`

| Option | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `model_id` | `"flux"` | Yes | - | Deepgram Flux STT model. |
| `language` | English (`en`) | No | `"en"` | Language. Flux currently supports English only. |
| `keyterms` | `array<string>` | No | - | Array of key terms to boost transcription accuracy for. |

#### `model_id: "nova-3"`

| Option | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `model_id` | `"nova-3"` | Yes | - | Deepgram Nova STT model. |
| `language` | string (language code, see below) | No | `"multi"` | Language. |
| `keyterms` | `array<string>` | No | - | Array of key terms to boost transcription accuracy for. |

Supported Nova-3 `language` values: Multilingual (English, Spanish, French, German, Hindi, Russian, Portuguese, Japanese, Italian, and Dutch) (`multi`), Bulgarian (`bg`), Catalan (`ca`), Czech (`cs`), Danish (`da`), Danish (Denmark) (`da-DK`), Dutch (`nl`), English (`en`), English (US) (`en-US`), English (Australia) (`en-AU`), English (UK) (`en-GB`), English (India) (`en-IN`), English (New Zealand) (`en-NZ`), Estonian (`et`), Finnish (`fi`), Flemish (`nl-BE`), French (`fr`), French (Canada) (`fr-CA`), German (`de`), German (Switzerland) (`de-CH`), Greek (`el`), Hindi (`hi`), Hungarian (`hu`), Indonesian (`id`), Italian (`it`), Japanese (`ja`), Korean (`ko`), Korean (Korea) (`ko-KR`), Latvian (`lv`), Lithuanian (`lt`), Malay (`ms`), Norwegian (`no`), Polish (`pl`), Portuguese (`pt`), Portuguese (Brazil) (`pt-BR`), Portuguese (Portugal) (`pt-PT`), Romanian (`ro`), Russian (`ru`), Slovak (`sk`), Spanish (`es`), Spanish (Latin America) (`es-419`), Swedish (`sv`), Swedish (Sweden) (`sv-SE`), Turkish (`tr`), Ukrainian (`uk`), Vietnamese (`vi`).

**Example:**

```json theme={null}
{
  "use": "stt.deepgram",
  "options": {
    "model_id": "nova-3",
    "language": "en-US",
    "keyterms": ["LayerCode", "Realpipe"]
  }
}
```

**`stt.assemblyai`**: AssemblyAI Universal Streaming speech-to-text. Supports English and multilingual (English, Spanish, French, German, Italian, Portuguese). Managed by Layercode, so no API key is required.

**Configuration:**

| Option | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `speech_model` | `"universal-streaming-english"` \| `"universal-streaming-multilingual"` | No | `"universal-streaming-english"` | Speech model. Multilingual supports English, Spanish, French, German, Italian, Portuguese. |
| `word_boost` | `array<string>` | No | - | Array of custom vocabulary words to boost recognition accuracy. |
| `end_of_turn_confidence_threshold` | number (0-1) | No | `0.4` | Confidence threshold for detecting end of turn. |
| `min_end_of_turn_silence_when_confident` | number (ms, min: 0) | No | `400` | Minimum silence in milliseconds when confident about end of turn. |
| `max_turn_silence` | number (ms, min: 0) | No | `1280` | Maximum silence in milliseconds before end of turn is triggered. |

**Example:**

```json theme={null}
{
  "use": "stt.assemblyai",
  "options": {
    "speech_model": "universal-streaming-english",
    "word_boost": ["LayerCode", "Realpipe"]
  }
}
```

### Turn Manager

Manages conversation turn-taking between user and assistant. Handles interruptions (barge-in) and determines when the user has finished speaking.

**`turn_manager`**: VAD-based turn management with a configurable timeout.

**Configuration:**

| Option | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `mode` | `"automatic"` | No | `"automatic"` | Turn-taking mode. Only automatic (VAD-based interruption) is supported. |
| `base_timeout_ms` | number (500-5000) | No | `2000` | Base VAD timeout in milliseconds. |
| `user_silence_timeout_minutes` | number \| null | No | - | User silence timeout in minutes (e.g., 1-60). Null/undefined disables the timeout. |
| `disable_interruptions_during_welcome` | boolean | No | `false` | Disable user interruptions during the first assistant response (welcome message). |

**Example:**

```json theme={null}
{
  "use": "turn_manager",
  "options": {
    "base_timeout_ms": 2000,
    "disable_interruptions_during_welcome": true
  }
}
```

### Agent Plugins

Generate AI responses from user messages. Choose one based on your use case:

* **`agent.llm`** - Hosted LLM for simple conversational agents
* **`agent.webhook`** - Your own HTTPS endpoint for custom logic
* **`agent.ws`** - Your own WebSocket server for real-time bidirectional communication

**`agent.llm`**: Hosted LLM agent using Google Gemini or OpenAI models. Best for simple conversational agents without custom business logic.

**Configuration:** the examples below use the options `provider` (`"google"` or `"openai"`), `model_id`, `system_prompt`, and `welcome_message`.

**Example (Google):**

```json theme={null}
{
  "use": "agent.llm",
  "options": {
    "provider": "google",
    "model_id": "gemini-2.5-flash-lite",
    "system_prompt": "You are a helpful customer service agent for Acme Corp.",
    "welcome_message": "Hi! Welcome to Acme Corp. How can I help you today?"
  }
}
```

**Example (OpenAI):**

```json theme={null}
{
  "use": "agent.llm",
  "options": {
    "provider": "openai",
    "model_id": "gpt-4o-mini",
    "system_prompt": "You are a friendly assistant.",
    "welcome_message": "Hello! What can I help you with?"
  }
}
```

**`agent.webhook`**: Send user messages to your HTTPS endpoint and receive streaming responses. Best for integrating with existing backends or AI orchestration frameworks.

**Configuration:**

| Option | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `url` | string | Yes | - | Webhook endpoint URL |
| `headers` | `Record<string, string>` | No | - | HTTP headers to send with requests |
| `events` | `array<"message" \| "data" \| "session.start">` | No | `["message"]` | Events to forward to webhook. `message` is required; `session.start` and `data` are optional.
| **Example:** ```json theme={null} { "use": "agent.webhook", "options": { "url": "https://your-agent.example.com/voice", "headers": { "Authorization": "Bearer your-token" }, "events": ["message", "session.start"] } } ``` ### TTS Plugins (Text-to-Speech) Convert agent text responses to audio. LayerCode supports three TTS providers: | Provider | Key Required | Best For | | -------------- | ------------ | ---------------------------------------- | | **Inworld** | No (managed) | High quality, low cost expressive voices | | **Rime** | No (managed) | Expressive voices | | **Cartesia** | Yes (BYOK) | Customers with a Cartesia account | | **ElevenLabs** | Yes (BYOK) | Customers with an Elevenlabs account | **Inworld** or **Rime** is the easiest way to get started — LayerCode manages the credentials, so it works immediately. For **Cartesia** or **ElevenLabs**\*\*, add your API key in **Settings → Providers**. Rime TTS with ultra-low latency streaming. Managed by LayerCode—no API key required. **Configuration:** | Option | Type | Required | Default | Description | | ---------- | ---------------- | -------- | ------------ | --------------- | | `model_id` | `"mistv2"` | Yes | - | Rime TTS model. | | `voice_id` | string | No | `"courtney"` | Rime voice id. | | `language` | `"eng"`, `"spa"` | No | `"eng"` | Language. | **Example:** ```json theme={null} { "use": "tts.rime", "options": { "model_id": "mistv2", "voice_id": "courtney" } } ``` Inworld TTS for gaming and interactive characters with voice tuning controls. Requires your own Inworld API credentials. **Configuration:** | Option | Type | Required | Default | Description | | -------------- | ------------------------------------------------------------------------ | -------- | ----------------- | ------------------ | | `model_id` | `"inworld-tts-1"` \| `"inworld-tts-1.5-max"` \| `"inworld-tts-1.5-mini"` | No | `"inworld-tts-1"` | Inworld TTS model. | | `voice_id` | string | No | `"Clive"` | Inworld voice id. 
| | `voice_config` | object | No | - | - | **`voice_config` options:** | Option | Type | Required | Default | Description | | ---------------- | -------------------------- | -------- | ------- | ------------------------------------------------------ | | `pitch` | number (min: -10, max: 10) | No | `1` | Voice pitch adjustment. Range: -10 to 10. Default: 1. | | `speaking_rate` | number (min: 0, max: 5) | No | `0` | Speaking rate/speed. Range: 0 to 5. Default: 0. | | `robotic_filter` | number (min: 0, max: 5) | No | `0` | Robotic voice filter level. Range: 0 to 5. Default: 0. | **Example:** ```json theme={null} { "use": "tts.inworld", "options": { "model_id": "inworld-tts-1.5-max", "voice_id": "Clive", "voice_config": { "pitch": 1, "speaking_rate": 0, "robotic_filter": 0 } } } ``` ElevenLabs TTS with high-quality voices and extensive voice customization. Requires your own ElevenLabs API key. **Configuration:** | Option | Type | Required | Default | Description | | ---------------- | --------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------- | -------- | ------- | --------------------- | | `model_id` | `"eleven_v2_5_flash"` | Yes | - | ElevenLabs TTS model. | | `voice_id` | string | Yes | - | ElevenLabs voice id. 
| | `voice_settings` | object | No | - | - | | `language` | English (`en`), Japanese (`ja`), Chinese (`zh`), German (`de`), Hindi (`hi`), French (`fr`), Korean (`ko`), Portuguese (`pt`), Italian (`it`), Spanish (`es`), Indonesian (`id`), Dutch (`nl`), Turkish (`tr`), Filipino (`fil`), Polish (`pl`), Swedish (`sv`), Bulgarian (`bg`), Romanian (`ro`), Arabic (`ar`), Czech (`cs`), Greek (`el`), Finnish (`fi`), Croatian (`hr`), Malay (`ms`), Slovak (`sk`), Danish (`da`), Tamil (`ta`), Ukrainian (`uk`), Russian (`ru`), Hungarian (`hu`), Norwegian (`no`), Vietnamese (`vi`) | No | `"en"` | Language. | **`voice_settings` options:** | Option | Type | Required | Default | Description | | ------------------- | --------------------------- | -------- | ------- | ---------------------------------------------------------------------------------------------------------- | | `stability` | number (min: 0, max: 1) | No | - | Defines the stability for voice settings. Default is 0.5. | | `similarity_boost` | number (min: 0, max: 1) | No | - | Defines the similarity boost for voice settings. Default is 0.75. | | `style` | number (min: 0, max: 1) | No | - | Defines the style for voice settings. This parameter is available on V2+ models. Default 0. | | `use_speaker_boost` | boolean | No | - | Defines the use speaker boost for voice settings. This parameter is available on V2+ models. Default true. | | `speed` | number (min: 0.7, max: 1.2) | No | - | Controls the speed of the generated speech. Values range from 0.7 to 1.2. Default is 1.0. | **Example:** ```json theme={null} { "use": "tts.elevenlabs", "options": { "model_id": "eleven_v2_5_flash", "voice_id": "EiNlNiXeDU1pqqOPrYMO", "voice_settings": { "stability": 0.5, "speed": 1.0 } } } ``` Cartesia Sonic TTS with emotion controls and word-level timestamps. Requires your own Cartesia API key. 

**Configuration:**

### `model_id: "sonic-2"`

| Option | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `model_id` | `"sonic-2"` | Yes | - | Cartesia Sonic 2 TTS model. |
| `voice_id` | string | Yes | - | Cartesia voice id. |
| `language` | English (`en`), French (`fr`), German (`de`), Spanish (`es`), Portuguese (`pt`), Chinese (`zh`), Japanese (`ja`), Hindi (`hi`), Italian (`it`), Korean (`ko`), Dutch (`nl`), Polish (`pl`), Russian (`ru`), Swedish (`sv`), Turkish (`tr`) | No | `"en"` | Language. |

### `model_id: "sonic-3"`

| Option | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `model_id` | `"sonic-3"`, `"sonic-3-2025-10-27"` | Yes | - | Cartesia Sonic 3 TTS model with expanded language support. |
| `voice_id` | string | Yes | - | Cartesia voice id. |
| `voice_settings` | object | No | - | Voice settings (see `voice_settings` options). |
| `language` | English (`en`), French (`fr`), German (`de`), Spanish (`es`), Portuguese (`pt`), Chinese (`zh`), Japanese (`ja`), Hindi (`hi`), Italian (`it`), Korean (`ko`), Dutch (`nl`), Polish (`pl`), Russian (`ru`), Swedish (`sv`), Turkish (`tr`), Tagalog (`tl`), Bulgarian (`bg`), Romanian (`ro`), Arabic (`ar`), Czech (`cs`), Greek (`el`), Finnish (`fi`), Croatian (`hr`), Malay (`ms`), Slovak (`sk`), Danish (`da`), Tamil (`ta`), Ukrainian (`uk`), Hungarian (`hu`), Norwegian (`no`), Vietnamese (`vi`), Bengali (`bn`), Thai (`th`), Hebrew (`he`), Georgian (`ka`), Indonesian (`id`), Telugu (`te`), Gujarati (`gu`), Kannada (`kn`), Malayalam (`ml`), Marathi (`mr`), Punjabi (`pa`) | No | `"en"` | Language. |

**`voice_settings` options:**

| Option | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| `volume` | number (min: 0.5, max: 2) | No | - | Adjusts the volume of the generated speech. Values range from 0.5 to 2.0. Default: 1.0. |
| `speed` | number (min: 0.6, max: 1.5) | No | - | Controls the speed of the generated speech. Values range from 0.6 to 1.5. Default: 1.0. |
| `emotion` | string | No | - | Controls the emotion of the generated speech. Primary emotions are neutral, calm, angry, content, sad, scared. See the Cartesia docs for more options. |

**Example:**

```json theme={null}
{
  "use": "tts.cartesia",
  "options": {
    "model_id": "sonic-3",
    "voice_id": "your-voice-id",
    "voice_settings": {
      "speed": 1.0,
      "emotion": "neutral"
    }
  }
}
```

***

## Complete Examples

A minimal configuration using LayerCode's hosted LLM agent:

```json theme={null}
{
  "plugins": [
    {
      "use": "stt.deepgram",
      "options": { "model_id": "nova-3", "language": "en-US" }
    },
    {
      "use": "turn_manager",
      "options": { "base_timeout_ms": 2000 }
    },
    {
      "use": "agent.llm",
      "options": {
        "provider": "google",
        "model_id": "gemini-2.5-flash-lite",
        "system_prompt": "You are a helpful assistant.",
        "welcome_message": "Hi! How can I help you today?"
      }
    },
    { "use": "sentence_buffer" },
    {
      "use": "tts.rime",
      "options": { "model_id": "mistv2", "voice_id": "courtney" }
    }
  ]
}
```

A configuration that sends user messages to your own server:

```json theme={null}
{
  "session_webhook": {
    "url": "https://your-server.com/webhooks/session",
    "events": ["session.start", "session.end"]
  },
  "session_duration_timeout_minutes": 60,
  "plugins": [
    {
      "use": "stt.deepgram",
      "options": { "model_id": "flux", "keyterms": ["Acme", "ProductX"] }
    },
    {
      "use": "turn_manager",
      "options": { "base_timeout_ms": 1500, "disable_interruptions_during_welcome": true }
    },
    {
      "use": "agent.webhook",
      "options": {
        "url": "https://your-server.com/agent",
        "events": ["message", "session.start"]
      }
    },
    { "use": "sentence_buffer" },
    {
      "use": "tts.elevenlabs",
      "options": { "model_id": "eleven_v2_5_flash", "voice_id": "EiNlNiXeDU1pqqOPrYMO" }
    }
  ]
}
```

A configuration for Twilio phone calls with both browser and phone support:

```json theme={null}
{
  "clients": {
    "browser": true,
    "twilio": true
  },
  "session_duration_timeout_minutes": 30,
  "plugins": [
    {
      "use": "stt.deepgram",
      "options": { "model_id": "nova-3", "language": "multi" }
    },
    {
      "use": "turn_manager",
      "options": { "base_timeout_ms": 2500 }
    },
    {
      "use": "agent.llm",
      "options": {
        "provider": "google",
        "model_id": "gemini-2.5-flash-lite",
        "welcome_message": "Thank you for calling. How can I assist you?"
      }
    },
    { "use": "sentence_buffer" },
    {
      "use": "tts.cartesia",
      "options": { "model_id": "sonic-3", "voice_id": "your-voice-id" }
    }
  ]
}
```

***

## Audio Format

The pipeline automatically handles audio format conversion based on the client type:

| Client | Input Format | Output Format |
| --- | --- | --- |
| Browser | PCM16 | PCM16 |
| Twilio | mulaw @ 8kHz | mulaw @ 8kHz |

You don't need to configure audio formats manually; the pipeline negotiates the correct format with each plugin automatically.

# How webhooks work with Layercode

Source: https://docs.layercode.com/explanations/webhooks

How to receive events from Layercode

Layercode delivers conversation updates to your backend through HTTPS webhooks. Each time a user joins, speaks, or finishes a session, the voice pipeline posts JSON to the webhook URL configured on your agent.

We are rolling out a new low-latency agent integration that connects to your backend over WebSocket. Existing webhook + SSE agents continue to work, but new real-time features (like barge-in and token streaming) use the WebSocket channel instead. Until the migration is complete you can keep this webhook reference handy, and review the WebSocket agent guide (preview) inside the dashboard for the latest schema.

In reply to each webhook, your backend can stream text back with Server-Sent Events (SSE), and Layercode uses a text-to-speech model to turn that text into voice for your user. In short, Layercode tells your backend, in text, what the user said, and your backend tells Layercode, in text, what to say back.

## Receiving requests from Layercode

In order to receive and process messages from your users, you need a backend endpoint that Layercode can communicate with.
For example, in Next.js it might look something like this:

```ts theme={null}
import { streamResponse } from '@layercode/node-server-sdk';

export const dynamic = 'force-dynamic';

export const POST = async (request: Request) => {
  const requestBody = await request.json();

  // Authorization goes here! (explained below)

  const { text: userText } = requestBody;
  console.log('user said: ', userText);

  return streamResponse(requestBody, async ({ stream }) => {
    // This is where your LLM logic goes to generate a response
    const aiResponse = 'thank you for your message'; // this would be dynamic in your application
    await stream.ttsTextStream(aiResponse);
    stream.end();
  });
};
```

*Note: authorization is covered below.*

## Tell Layercode where your endpoint is

Now that you have an endpoint to receive messages from Layercode, you need to tell Layercode where to send your events. In the Layercode dashboard, create a new agent or open an existing one. Go to manual setup and enter the API endpoint that Layercode should send requests to.

The webhook URL is your host followed by the path of your endpoint. If your endpoint is at the root, use your host's URL; if it's at `/voice-agent`, use your host followed by `/voice-agent`. If you're using one of our [Next.js examples](https://github.com/layercodedev/fullstack-nextjs-cloudflare/blob/main/app/api/agent/route.ts), the path that receives requests from Layercode is `/api/agent`.

### Expose your local endpoint with a tunnel

If you're developing locally, you will need to run a tunnel such as cloudflared or ngrok and paste the tunnel URL into the dashboard, with the path of your endpoint appended (for example *tunnel-url*/api/agent). Our [tunnelling guide](/how-tos/tunnelling) walks through the setup.

## Verify incoming requests

You should make sure that only authorized requests are sent to this endpoint. To do this, we expose a secret that you can find in the same location you used above.
Save this secret with the other secrets in your backend and use it to verify each incoming request:

```ts theme={null}
import { streamResponse, verifySignature } from '@layercode/node-server-sdk';

export const dynamic = 'force-dynamic';

export const POST = async (request: Request) => {
  const requestBody = await request.json();

  // Verify this webhook request is from Layercode
  const signature = request.headers.get('layercode-signature') || '';
  const secret = process.env.LAYERCODE_WEBHOOK_SECRET || '';
  const isValid = verifySignature({ payload: JSON.stringify(requestBody), signature, secret });
  if (!isValid) return new Response('Invalid layercode-signature', { status: 401 });

  const { text: userText } = requestBody;
  console.log('user said: ', userText);

  return streamResponse(requestBody, async ({ stream }) => {
    // This is where your LLM logic goes to generate a response
    const aiResponse = 'thank you for your message'; // this would be dynamic in your application
    await stream.ttsTextStream(aiResponse);
    stream.end();
  });
};
```

## Customize which events you receive

You can see details on the data that Layercode [sends to this endpoint here](/api-reference/webhook-sse-api).

**Agent webhook events** (configure inside the Your Backend modal):

* `message` – (required) Fired after speech-to-text transcription completes for the user’s turn.
* `data` – Delivered when the client sends a structured payload via `response.data`.
* `session.start` – Sent as soon as a session opens so you can greet the user proactively.

**Session webhook events** (configure via the optional Session Webhooks section):

* `session.end` – Delivered when a session closes, including timing metrics and the full transcript.
* `session.update` – Sent asynchronously once a session recording finishes processing (requires session recording to be enabled for the org).

## Attach custom metadata and headers to webhooks

Add static metadata in the pipeline builder (the value is forwarded to every agent + session webhook request).
To attach per-session metadata or headers, send a `config.session_webhook` override in the Layercode REST API `/v1/agents/web/authorize_session` request. See the [custom metadata and headers how-to](/how-tos/custom-webhook-metadata-and-headers).

## Respond to webhook events

Receiving messages is only half of the conversation; you also want to reply. Use the `ttsTextStream` method on Layercode's stream object to send a reply: `await stream.ttsTextStream("this is my reply");`

# Attach custom metadata and headers to webhooks

Source: https://docs.layercode.com/how-tos/custom-webhook-metadata-and-headers

Pass tenant-specific context and outbound headers when authorizing sessions so every webhook contains the data your backend needs.

When you create a new conversation and authorize a client session, you can set **custom metadata** and **custom headers** on the session webhook configuration (either on the agent itself or per session via a config override).

## 1) Set metadata and headers via session\_webhook config

Add them to your agent’s `session_webhook` config, or send a one-off override in the `config` field when calling `POST /v1/agents/web/authorize_session` (see [REST API docs](https://docs.layercode.com/api-reference/rest-api#authorize-client-session)). Both fields must be plain JSON objects. For `agent.webhook` plugin requests, set `custom_headers` / `custom_metadata` in the plugin options to have them included on every agent webhook call.
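Because both fields must be plain JSON objects, you may want to check them in your backend before calling `authorize_session`. The sketch below is illustrative only: `isPlainObject` and `buildSessionWebhook` are hypothetical helpers, not part of any Layercode SDK, and requiring string header values is an assumption based on how HTTP headers work. It builds the same `session_webhook` object used in the request that follows.

```typescript
// Hypothetical helpers (not part of any Layercode SDK) for checking a
// session_webhook override before sending it to authorize_session.

type JsonValue = string | number | boolean | null | JsonValue[] | { [key: string]: JsonValue };

interface SessionWebhookOverride {
  url: string;
  events: string[];
  custom_metadata?: Record<string, JsonValue>;
  custom_headers?: Record<string, string>;
}

// True for `{...}` literals and Object.create(null) objects;
// false for arrays, null, Date instances, and other class instances.
function isPlainObject(value: unknown): value is Record<string, unknown> {
  if (typeof value !== 'object' || value === null || Array.isArray(value)) return false;
  const proto = Object.getPrototypeOf(value);
  return proto === Object.prototype || proto === null;
}

function buildSessionWebhook(
  url: string,
  events: string[],
  metadata: unknown,
  headers: unknown,
): SessionWebhookOverride {
  if (!isPlainObject(metadata)) {
    throw new Error('custom_metadata must be a plain JSON object');
  }
  if (!isPlainObject(headers)) {
    throw new Error('custom_headers must be a plain JSON object');
  }
  // Assumption: header values must be strings, since they become HTTP header values.
  for (const [name, value] of Object.entries(headers)) {
    if (typeof value !== 'string') {
      throw new Error(`custom_headers["${name}"] must be a string`);
    }
  }
  return {
    url,
    events,
    custom_metadata: metadata as Record<string, JsonValue>,
    custom_headers: headers as Record<string, string>,
  };
}

// Usage with the same values as the request below.
const sessionWebhook = buildSessionWebhook(
  'https://example.com/session-webhook',
  ['session.start', 'session.end', 'session.update'],
  { tenant_id: 't_42', crm_contact_id: 'abc-123' },
  { 'x-tenant-id': 't_42' },
);
console.log(JSON.stringify(sessionWebhook));
```

Checking the prototype, rather than relying on `typeof` alone, matters because arrays, `Date` instances, and class instances all report `typeof value === 'object'` without being plain JSON objects.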
```bash theme={null}
curl -X POST https://api.layercode.com/v1/agents/web/authorize_session \
  -H "Authorization: Bearer $LAYERCODE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "agent_id": "dvv052qk",
    "config": {
      "type": "voice",
      "clients": { "browser": { "enabled": true } },
      "plugins": [
        { "use": "stt.deepgram", "options": { "model_id": "flux" } },
        { "use": "turn_manager", "options": { "mode": "automatic" } },
        { "use": "agent.llm", "options": { "provider": "google", "model_id": "gemini-2.5-flash-lite" } },
        { "use": "tts.rime", "options": { "model_id": "mistv2", "voice_id": "courtney" } }
      ],
      "session_webhook": {
        "url": "https://example.com/session-webhook",
        "events": ["session.start", "session.end", "session.update"],
        "custom_metadata": { "tenant_id": "t_42", "crm_contact_id": "abc-123" },
        "custom_headers": { "x-tenant-id": "t_42" }
      }
    }
  }'
```

```ts Next.js app/api/authorize/route.ts [expandable] theme={null}
export const dynamic = "force-dynamic";
import { NextResponse } from "next/server";

export const POST = async (request: Request) => {
  // Here you could do any user authorization checks you need for your app
  const endpoint = "https://api.layercode.com/v1/agents/web/authorize_session";
  const apiKey = process.env.LAYERCODE_API_KEY;
  if (!apiKey) {
    throw new Error("LAYERCODE_API_KEY is not set.");
  }
  const requestBody = await request.json();
  if (!requestBody || !requestBody.agent_id) {
    throw new Error("Missing agent_id in request body.");
  }
  requestBody.config = {
    type: "voice",
    clients: { browser: { enabled: true } },
    plugins: [
      { use: "stt.deepgram", options: { model_id: "flux" } },
      { use: "turn_manager", options: { mode: "automatic" } },
      { use: "agent.llm", options: { provider: "google", model_id: "gemini-2.5-flash-lite" } },
      { use: "tts.rime", options: { model_id: "mistv2", voice_id: "courtney" } }
    ],
    session_webhook: {
      url: "https://example.com/session-webhook",
      events: ["session.start", "session.end", "session.update"],
      custom_metadata: { tenant_id: "t_42", crm_contact_id: "abc-123" },
      custom_headers: { "x-tenant-id": "t_42", "x-layercode-flow": "concierge" }
    }
  };
  try {
    const response = await fetch(endpoint, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(requestBody),
    });
    if (!response.ok) {
      const text = await response.text();
      throw new Error(text || response.statusText);
    }
    return NextResponse.json(await response.json());
  } catch (error: any) {
    console.log("Layercode authorize session response error:", error.message);
    return NextResponse.json({ error: error.message }, { status: 500 });
  }
};
```

## 2) Read the metadata in your webhook handler

Every webhook invocation now includes the metadata you provided:

```json theme={null}
{
  "type": "message",
  "session_id": "a0ad4pv43sdakgh99lxeik8x",
  "conversation_id": "ary4c07o15n5l43iu7dkhukt",
  "custom_metadata": {
    "tenant_id": "t_42",
    "crm_contact_id": "abc-123"
  },
  // ...
}
```

## 3) Inspect custom headers on receipt

Layercode prepends two headers to every webhook call: `Content-Type: application/json` and `layercode-signature`. Your custom headers are then appended.

```ts webhook-handler.ts theme={null}
export async function POST(request: Request) {
  const xTenantId = request.headers.get('x-tenant-id');
  // ...
  const payload = await request.json();
  // ...
}
```

# Deploy Next.js to Cloudflare

Source: https://docs.layercode.com/how-tos/deploy-nextjs-to-cloudflare

Some tips when deploying a Next.js voice agent to Cloudflare

Layercode runs in our cloud, but you will need to deploy your Next.js application to provide your APIs and agent functionality (LLMs and tool calling). Plus if you are building for web, your Next.js acts as the client.

This guide assumes you already have your Next.js application running locally with Layercode.
If not, please follow our [getting started guide](/tutorials/getting-started).

If you are using our Cloudflare getting-started project, you can simply run `npm run deploy`.

Otherwise, install the OpenNext Cloudflare adapter:

```bash theme={null}
npm i @opennextjs/cloudflare
```

If it doesn't exist already, add a deploy script to your `package.json`:

```json theme={null}
"deploy": "opennextjs-cloudflare build && opennextjs-cloudflare deploy"
```

Then run:

```bash theme={null}
npm run deploy
```

You will be asked to create or connect a Cloudflare account if you don't already have one connected.

Note: you need to use npm to deploy to Cloudflare, because it expects a `package-lock.json` file.

You should see output like this:

```
Total Upload: 5867.42 KiB / gzip: 1177.82 KiB
Worker Startup Time: 25 ms
Your Worker has access to the following bindings:
Binding            Resource
env.ASSETS         Assets
Uploaded jolly-queen-84e7 (16.45 sec)
Deployed jolly-queen-84e7 triggers (4.70 sec)
  https://jolly-queen-84e7.jacksbridger.workers.dev
Current Version ID: 047446f6-055e-46b0-b67a-b45cb14fa8e8
```

Take the URL of your backend (e.g. [https://jolly-queen-84e7.jacksbridger.workers.dev](https://jolly-queen-84e7.jacksbridger.workers.dev)) and save it in the Layercode agent backend settings as the webhook URL, appending the appropriate path for your API (e.g. [https://jolly-queen-84e7.jacksbridger.workers.dev/api/agent](https://jolly-queen-84e7.jacksbridger.workers.dev/api/agent)).

Your application should then run. Please reach out if you run into any issues.

## Setting up automated Cloudflare deployments

You can use [Cloudflare Workers Builds](https://developers.cloudflare.com/workers/ci-cd/builds/) to deploy your application on GitHub commits. Connect your GitHub repository to your Worker by following [these steps](https://developers.cloudflare.com/workers/ci-cd/builds/git-integration/).

In the Build settings:

* The "Build command" should be set to `npx opennextjs-cloudflare build`.
* The "Deploy command" should be set to `npx opennextjs-cloudflare deploy`.
* The environment variables you previously set in `.env` **must** be copied and set in the "Build variables and secrets" section. This is so that the `next build` executed by Workers Builds has access to the environment variables. It needs that access to inline the `NEXT_PUBLIC_...` variables and to access the non-`NEXT_PUBLIC_...` variables needed for SSG pages. If you don't do this, you'll find the `NEXT_PUBLIC_LAYERCODE_AGENT_ID` env variable is missing and your voice agent won't work.

Note: do not change your `package.json` build command. It should stay as `next build`.

# Deploy Next.js to Vercel

Source: https://docs.layercode.com/how-tos/deploy-nextjs-to-vercel

Some tips when deploying a voice agent to Vercel

Layercode runs in our cloud, but you will need to deploy your Next.js application to provide your APIs and agent functionality (LLMs and tool calling). Plus if you are building for web, your Next.js acts as the client.

This guide assumes you already have your application running locally with Layercode. If not, please follow our [getting started guide](/tutorials/getting-started).

To deploy to Vercel:

1. Push your changes to a remote repo (e.g. GitHub or GitLab).
2. Sign up at Vercel and click Add New project.
3. Import your Git repository.
4. Paste in your environment variables from `.env`.
5. Deploy.
6. Take the URL of your deployed backend (e.g. [https://fullstack-nextjs-vercel-five.vercel.app/](https://fullstack-nextjs-vercel-five.vercel.app/)) and save it in the Layercode agent backend settings as the webhook URL, appending the appropriate path for your API (e.g. [https://fullstack-nextjs-vercel-five.vercel.app/api/agent](https://fullstack-nextjs-vercel-five.vercel.app/api/agent)).

### Troubleshooting authentication issues

When deploying to Vercel, you MUST disable Vercel Authentication to allow Layercode webhooks to be received.
By default on Pro plans, Vercel blocks external requests to your application's /api routes. This means that Layercode webhooks will not be received by your application, and your voice agent will not work.

Disable Vercel Authentication by going to your project settings in the Vercel dashboard, then "Deployment Protection" in the left sidebar menu. Turn off "Vercel Authentication" and save. You do not need to redeploy.

You can check your Webhook Logs in the Layercode dashboard to ensure that webhooks are being received successfully. If you receive a 405 error response to webhooks, this indicates that Vercel Authentication is still enabled.

Note: if you're on a free tier, you may not need to make this change.

# Deploying to production

Source: https://docs.layercode.com/how-tos/deploying

Point Layercode to your production backend and manage environments

Use this guide when moving from local development (tunnel Webhook URL) to a stable production deployment.

## Set your production Webhook URL

In the Layercode dashboard:

1. Open the agent you want to be your production agent and click **Connect Your Backend**
2. Set your Webhook URL to your production endpoint, e.g. `https://your-domain.com/api/agent`
3. Save changes

Use separate Layercode agents for production and for development or staging. Point each to its own backend URL.

Keep your production Webhook URL stable and use staging agents for preview builds.

## Verify webhook signature in production

Keep signature verification enabled in your `/api/agent` route. This protects your app from spoofed requests.
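The signature verification example in the webhooks guide falls back to an empty string when `LAYERCODE_WEBHOOK_SECRET` is unset, which can leave a misconfigured deployment rejecting webhooks with no obvious cause. A minimal sketch of failing fast instead, assuming you read the secret at request time; `requireEnv` is a hypothetical helper, not part of the Layercode SDK.

```typescript
// Hypothetical helper (not part of the Layercode SDK): read a required
// environment variable and fail fast if it is missing or blank, instead of
// silently falling back to an empty string.
type Env = Record<string, string | undefined>;

function requireEnv(env: Env, name: string): string {
  const value = env[name];
  if (value === undefined || value.trim() === '') {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}

// In a Node.js route handler you would pass process.env, for example:
//   const secret = requireEnv(process.env, 'LAYERCODE_WEBHOOK_SECRET');

// Demonstration with a plain object standing in for process.env:
try {
  requireEnv({}, 'LAYERCODE_WEBHOOK_SECRET');
} catch (err) {
  // Logs "Missing required environment variable: LAYERCODE_WEBHOOK_SECRET"
  console.error((err as Error).message);
}
```

Taking the environment as a parameter keeps the helper easy to test. A thrown error shows up in your hosting platform's logs, which is easier to diagnose than a stream of 401 responses. On Cloudflare Workers Builds and Vercel, the variable must also be set in the platform's environment settings.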
## Platform-specific deployment guides

Follow one of our hosting guides for detailed steps on shipping your Next.js app:

* [Deploy Next.js to Cloudflare](/how-tos/deploy-nextjs-to-cloudflare)
* [Deploy Next.js to Vercel](/how-tos/deploy-nextjs-to-vercel)

# Connect to MCP servers with AI SDK

Source: https://docs.layercode.com/how-tos/mcp-ai-sdk

How to get your voice agents to use Model Context Protocol (MCP) tools with AI SDK and Layercode

It can be useful for your voice agents to use [Model Context Protocol (MCP)](https://modelcontextprotocol.io) to fetch live data or perform external actions, for example retrieving docs, querying databases, or running custom APIs.

This guide shows you how to connect your **AI SDK** app to an **MCP server** and expose those tools to your **Layercode voice agent**.

***

## Prerequisites

This guide assumes you already have **tool calling** set up and working with Layercode. If not, start here first:\
👉 [Tool calling in Next.js with Layercode](https://docs.layercode.com/how-tos/tool-calling-js)

Once that’s working, you can extend your agent with **MCP-based tools**.

***

## Example Setup

> **Note:** The MCP URL `https://docs.layercode.com/mcp` below is just an example endpoint that connects to the **Layercode Docs MCP server**.\
> Replace this with your **own MCP server URL**, for example one that connects to your company’s data, APIs, or private knowledge.
```ts theme={null}
import { createGoogleGenerativeAI } from '@ai-sdk/google';
import { streamText, stepCountIs, experimental_createMCPClient, tool } from 'ai';
import { streamResponse } from '@layercode/node-server-sdk';
import { StreamableHTTPClientTransport } from '@modelcontextprotocol/sdk/client/streamableHttp.js';
import z from 'zod';

export const POST = async (request: Request) => {
  const requestBody = await request.json();
  const { conversation_id, text, turn_id } = requestBody;

  return streamResponse(requestBody, async ({ stream }) => {
    // ✅ Create a fresh MCP transport per request
    const transport = new StreamableHTTPClientTransport(new URL('https://docs.layercode.com/mcp'));
    const docsMCP = await experimental_createMCPClient({ transport });

    try {
      const docsTools = await docsMCP.tools();

      const weather = tool({
        description: 'Get the weather in a location',
        inputSchema: z.object({
          location: z.string().describe('The location to get the weather for')
        }),
        execute: async ({ location }) => ({
          location,
          temperature: 72 + Math.floor(Math.random() * 21) - 10
        })
      });

      const { textStream } = streamText({
        model: createGoogleGenerativeAI({ apiKey: process.env.GOOGLE_GENERATIVE_AI_API_KEY })('gemini-2.5-flash-lite'),
        system: 'You are a helpful assistant.',
        messages: [{ role: 'user', content: text }],
        tools: { weather, ...docsTools },
        toolChoice: 'auto',
        stopWhen: stepCountIs(10),
        onFinish: async ({ response }) => {
          console.log('MCP Response Complete', response);
          stream.end();
        }
      });

      await stream.ttsTextStream(textStream);
    } finally {
      // ✅ Clean up the MCP connection
      await docsMCP.close();
    }
  });
};
```

# Multi-agents and agent transfers

Source: https://docs.layercode.com/how-tos/multi-agents-agent-transfers

How to orchestrate multiple, task-specific AI SDK agents while keeping one consistent Layercode voice

Some voice projects need **focused sub-agents** (quotes, policy expertise, escalations, etc.) so you can narrow prompts, tools, and guardrails per task, but callers should still feel like they are speaking to a single person.\
This guide shows how to build an **AI SDK orchestrator** that loops over specialized sub-agents, then pipes the final response into **Layercode** for voice delivery.

***

## Prerequisites

* A Next.js (or plain Fetch) endpoint that already handles [Layercode server webhooks](https://docs.layercode.com/intro).
* Familiarity with [AI SDK streamText](https://sdk.vercel.ai/docs/reference/ai) and JavaScript tool calling.
* (Optional) [Custom webhook metadata](https://docs.layercode.com/how-tos/custom-webhook-metadata-and-headers) if you want to preload caller info.

***

## 1. Define a shared persona + sub-agent config

Each sub-agent reuses a common **voice persona** so the caller hears a consistent tone, then narrows to a specific responsibility and tool set.

```tsx theme={null}
import { streamText, stepCountIs, tool, type ModelMessage, type Tool } from 'ai';
import { createOpenAI } from '@ai-sdk/openai';
import z from 'zod';

const openai = createOpenAI({ apiKey: process.env.OPENAI_API_KEY! });
const model = openai('gpt-4o-mini');

const VOICE_PERSONA = `
You are Agent Smith from Matrix Car Insurance.
Your output will be read aloud in a phone call.
Be concise, friendly, and never mention routing, tools, or sub-agents.
`.trim();

type SubAgentId = 'quote' | 'policy' | 'escalations';

type SubAgent = {
  id: SubAgentId;
  description: string;
  system: string;
  getTools: (conversationId: string) => Record<string, Tool>;
  canTransferTo?: 'any' | readonly SubAgentId[];
};
```

Example "quote" sub-agent:

```tsx theme={null}
const QuoteInfoSchema = z.object({
  car_registration: z.string().optional(),
  driver_age: z.number().optional()
});

const SUB_AGENTS: Record<SubAgentId, SubAgent> = {
  quote: {
    id: 'quote',
    description: 'Introductions + general policy questions.',
    canTransferTo: 'any',
    system: `
${VOICE_PERSONA}
Handle introductions and general questions about the insurance process.
If a user asks about coverage details, transfer to "policy".
If they complain or say "stop calling", transfer to "escalations".
`.trim(),
    getTools: () => ({
      recordQuoteProgress: tool({
        description: 'Record basic quote details (stub).',
        inputSchema: QuoteInfoSchema,
        async execute({ car_registration, driver_age }) {
          // TODO: persist in your CRM or DB
          return { ok: true, car_registration, driver_age };
        }
      })
    })
  },
  // policy, escalations...
};
```

Repeat for `policy` and `escalations`, each with its own Zod schema + tooling.

***

## 2. Build an internal transfer tool

Transfers are just **tool calls** that tell the orchestrator to switch sub-agents. They never surface to the caller.

```tsx theme={null}
function makeTransferTool(args: {
  from: SubAgentId;
  allowed: SubAgentId[];
  messages: ModelMessage[];
  setNext: (to: SubAgentId, handoff?: string) => void;
}) {
  const { from, allowed, messages, setNext } = args;

  return tool({
    description: `INTERNAL: route to another sub-agent. Allowed from "${from}": ${allowed.join(', ')}`,
    inputSchema: z.object({
      to: z.enum(['quote', 'policy', 'escalations']),
      handoff: z.string().optional()
    }),
    async execute({ to, handoff }) {
      if (!allowed.includes(to)) return 'NOT_ALLOWED';
      if (handoff?.trim()) {
        messages.push({ role: 'system', content: `HANDOFF (${from} -> ${to}): ${handoff}` });
      }
      setNext(to, handoff);
      return 'OK';
    }
  });
}
```

Notice the **handoff notes**: they become new `system` messages so the next agent instantly knows why the call switched.

***

## 3. Orchestrator loop

The orchestrator is a loop that keeps calling the same LLM with **different system prompts + tools** until no transfer is requested.
```tsx theme={null}
export async function runOrchestrator(conversationId: string, userText: string) {
  const messages: ModelMessage[] = [{ role: 'user', content: userText }];
  let active: SubAgentId = 'quote';
  let transfers = 0;
  const maxTransfers = 3;

  while (true) {
    const sub = SUB_AGENTS[active];
    const allowed =
      sub.canTransferTo === 'any'
        ? (Object.keys(SUB_AGENTS) as SubAgentId[]).filter((id) => id !== active)
        : [...(sub.canTransferTo ?? [])];

    let next: SubAgentId | undefined;
    const transferToSubAgent = makeTransferTool({
      from: active,
      allowed,
      messages,
      setNext: (to) => {
        next = to;
      }
    });

    const tools = { ...sub.getTools(conversationId), transferToSubAgent };

    const result = streamText({
      model,
      system: sub.system,
      messages,
      tools,
      toolChoice: 'auto',
      stopWhen: stepCountIs(10)
    });

    // Awaiting the full text consumes the stream, so any transfer tool call has already run
    const text = await result.text;
    const { messages: responseMessages } = await result.response;
    messages.push(...responseMessages);

    if (!next) return text; // final answer for this Layercode turn

    transfers++;
    if (transfers >= maxTransfers) return 'Sorry, something went wrong with routing.';
    active = next;
  }
}
```

Persist `messages` per `conversationId` (database, KV, Durable Object, etc.) to maintain full history across turns.

***

## 4. Connect to Layercode

Wrap the orchestrator inside your Layercode webhook handler. Each webhook turn can stream TTS back to the caller.

```tsx theme={null}
import { streamResponse } from '@layercode/node-server-sdk';

// `WebhookRequest` and `isSignatureValid` stand in for your own payload type and signature check
export async function POST(request: Request) {
  const payload = (await request.json()) as WebhookRequest;

  if (!isSignatureValid(request, payload)) {
    return new Response('Invalid layercode-signature', { status: 401 });
  }

  return streamResponse(payload, async ({ stream }) => {
    if (payload.type === 'session.start') {
      stream.tts(
        "Hi, this is Agent Smith from Matrix Car Insurance. I'm calling about your quote. Who am I speaking with today?"
      );
      stream.end();
      return;
    }

    if (payload.type !== 'message') {
      stream.end();
      return;
    }

    const replyText = await runOrchestrator(payload.conversation_id, payload.text);
    stream.tts(replyText);
    stream.end();
  });
}
```

Layercode will invoke this endpoint again for every new user utterance. Store the conversation state by `conversation_id` so the orchestrator can pick up where it left off.

***

## 5. Pass caller context with metadata

Before the call even starts, you often know the **lead's name, quote ID, or campaign**.\
Attach that information via [custom webhook metadata](https://docs.layercode.com/how-tos/custom-webhook-metadata-and-headers) so it arrives inside every `session.start` and `message` payload.\
You can then preload the orchestrator's history with system messages like `Lead name: Jordan Carter` or seed tool inputs with the quote identifier.

***

## Next steps

* Replace stub tool logic with real CRM / policy lookups.
* Persist orchestrator state (messages, active sub-agent, outstanding tasks) in a durable store.
* Tune prompts for stricter routing rules (e.g., escalate only when specific keywords appear).
* Add interrupt handling + barge-in support from the [voice quick start](https://docs.layercode.com/how-tos/outbound-calls).

With these pieces in place, callers experience one consistent persona while you retain the control and safety of dedicated task-specific agents.

# Outbound calls with Twilio

Source: https://docs.layercode.com/how-tos/outbound-calls

Using your Layercode Agent to make outbound phone calls

You will need:

* A Layercode Agent with an assigned Twilio phone number (see [Inbound calls with Twilio](/how-tos/setting-up-twilio))

This guide walks you through triggering an outbound phone call from your Layercode Agent.

To trigger an outbound call, use the [`https://api.layercode.com/v1/agents/AGENT_ID/calls/initiate_outbound` endpoint](/api-reference/rest-api#initiate-outbound-call).
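If your backend is written in TypeScript, a minimal sketch of this request using the built-in `fetch` (Node.js 18+) might look like the following. The endpoint, the `Bearer` API key header, and the `from_phone_number`/`to_phone_number` body fields mirror the example request in this guide; the function name `initiateOutboundCall`, the injectable `fetchImpl` parameter (handy for tests), and the error handling are our own illustrative choices, not part of any Layercode SDK.

```typescript
// Hypothetical helper: POST to Layercode's initiate_outbound endpoint.
export async function initiateOutboundCall(
  agentId: string,
  fromPhoneNumber: string, // must be a Twilio number assigned to the agent
  toPhoneNumber: string,
  apiKey: string,
  fetchImpl: typeof fetch = fetch,
): Promise<unknown> {
  const url = `https://api.layercode.com/v1/agents/${encodeURIComponent(agentId)}/calls/initiate_outbound`;

  const res = await fetchImpl(url, {
    method: 'POST',
    headers: {
      Authorization: `Bearer ${apiKey}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({
      from_phone_number: fromPhoneNumber,
      to_phone_number: toPhoneNumber,
    }),
  });

  // Surface non-2xx responses instead of silently continuing
  if (!res.ok) {
    throw new Error(`initiate_outbound failed: ${res.status} ${await res.text()}`);
  }

  // The response shape is described in the REST API reference; returned as-is here
  return res.json();
}
```

Keep the API key on the server (for example in `process.env.LAYERCODE_API_KEY`); never call this from browser code.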
You can call this endpoint from your backend whenever you want to initiate a call.

You must have already set up your Layercode Agent to work with Twilio. If you haven't done that yet, see [Inbound calls with Twilio](/how-tos/setting-up-twilio).

See the REST API docs for **[more details about calling initiate\_outbound](/api-reference/rest-api#initiate-outbound-call)**.

### Example Request

```bash theme={null}
curl -X POST https://api.layercode.com/v1/agents/ag-123456/calls/initiate_outbound \
  -H "Authorization: Bearer $LAYERCODE_API_KEY" \
  -H 'Content-Type: application/json' \
  -d '{
    "from_phone_number": "NUMBER_ASSIGNED_TO_YOUR_AGENT",
    "to_phone_number": "PHONE_NUMBER_TO_CALL"
  }'
```

# Post-call analysis of transcripts and saving recordings

Source: https://docs.layercode.com/how-tos/post-call-analysis

Capture transcripts and recordings once a call finishes and run analysis with LLMs

Layercode gives you everything you need to run post-call workflows: the webhook events that tell you when a call has ended, REST endpoints to fetch transcript data, and download URLs for finished recordings.

## 1. Subscribe to the right webhook events

Enable the **`session.end`** and **`session.update`** events on your agent webhook configuration. Layercode sends `session.end` immediately after the call finishes with usage metrics and the full transcript, and follows up later with `session.update` when the recording file is ready (if session recording is enabled for your org).

Your webhook handler should capture both payloads. A minimal example in Next.js:

```ts theme={null}
import type { NextRequest } from 'next/server'

export async function POST(req: NextRequest) {
  const payload = await req.json()

  if (payload.type === 'session.end') {
    // Persist transcript + metrics for analytics or QA queues.
    // For example, insert payload.transcript rows into your database along with
    // the latency + duration stats so dashboards and QA reviewers can query the
    // conversation later.
  }

  if (payload.type === 'session.update' && payload.recording_status === 'completed') {
    // Recording is ready: download the file and kick off downstream processing.
    // For example, stream payload.recording_url into your storage bucket (S3,
    // GCS, etc.) and enqueue summarization or compliance jobs that work off the
    // stored WAV file.
  }

  return new Response('ok')
}
```

## 2. Fetch full session details on demand

While the `session.end` payload already includes the transcript, you can always fetch the authoritative record later through the REST API:

```bash theme={null}
curl -H "Authorization: Bearer $LAYERCODE_API_KEY" \
  https://api.layercode.com/v1/agents/AGENT_ID/sessions/SESSION_ID
```

The response returns connection timing, phone metadata, transcript entries, and the current recording status. Once the status is `completed`, the payload includes a `recording_url` that points to the downloadable WAV file.

## 3. Download the call recording when it finishes

When you receive a `session.update` webhook indicating `recording_status: "completed"`, download the audio file directly from the recording endpoint:

```bash theme={null}
curl -L -H "Authorization: Bearer $LAYERCODE_API_KEY" \
  -o session.wav \
  https://api.layercode.com/v1/agents/AGENT_ID/sessions/SESSION_ID/recording
```

Layercode returns a WAV file for completed sessions and reports `recording_status: "in_progress"` while processing is still happening.

## 4. Kick off your analytics pipeline

With transcripts saved and recordings queued, you can start whatever analysis you need: summaries, compliance checks, quality scoring, or AI-powered tagging. A common pattern is:

1. Store the transcript rows in your database when `session.end` arrives.
2. Trigger asynchronous jobs from `session.update` that download the recording and push it to transcription review, summarization, or storage.
3. Merge results (e.g., LLM summaries, compliance flags, sentiment) back into your customer dashboard once processing completes.

### Example: summarize transcripts with the Vercel AI SDK

Once the transcript is stored, you can enrich it with an LLM call that produces business-ready insights. The snippet below sends the transcript text to a Google Gemini model using the Vercel AI SDK and extracts a structured summary, caller name, call intent, follow-ups, and sentiment flag:

```ts theme={null}
import { generateObject } from 'ai'
import { createGoogleGenerativeAI } from '@ai-sdk/google'
import { z } from 'zod'

// Zod schema for the structured insights we want back from the model
const postCallInsightsSchema = z.object({
  summary: z.string(),
  intent: z.string(),
  follow_ups: z.string().nullable(),
  customerName: z.string().nullable(),
  sentiment: z.enum(['happy', 'frustrated', 'neutral']),
})

export type PostCallInsights = z.infer<typeof postCallInsightsSchema>

export async function analyzeSessionTranscript(transcript: string): Promise<PostCallInsights> {
  const { object } = await generateObject({
    model: createGoogleGenerativeAI({ apiKey: process.env.GOOGLE_GENERATIVE_AI_API_KEY })('gemini-2.5-flash-lite'),
    prompt: `You are analyzing an AI voice recording for an out of hours plumbing service.\n\nTranscript:\n${transcript}\n\nSummarize the call in 2 sentences, capture the caller's first name if stated, their call intent, follow ups (if applicable) and classify whether the caller felt happy, frustrated, or neutral by the end of the call.`,
    // generateObject validates the model output against this schema
    schema: postCallInsightsSchema,
  })

  return object
}
```

Call `analyzeSessionTranscript` after you persist the transcript rows so downstream dashboards and QA tools can display the summary alongside the original conversation.

# How to write prompts for voice agents

Source: https://docs.layercode.com/how-tos/prompting

Some quick examples and tips for writing prompts for voice AI.

Using the right system prompt is especially important when building Voice AI Agents.
LLMs are primarily trained on written text, so they tend to produce output that is more formal and structured than natural speech. By carefully crafting your prompt, you can guide the model to generate responses that sound more conversational and human-like.

# Base System Prompt for Voice AI

```text Minimal base prompt for Voice AI theme={null}
You are a helpful conversational voice AI assistant.
You are having a spoken conversation.
Your responses will be read aloud by a text-to-speech system.
You should respond to the user's message in a conversational manner that matches spoken word. Punctuation should still always be included.
Never output markdown, emojis or special characters.
Use contractions naturally.
```

# Pronunciation of numbers, dates & times

Pronunciation of numbers, dates, times, and special characters is also crucial for voice applications. TTS (text-to-speech) providers handle pronunciation in different ways. A good base prompt that tells the LLM to spell out numbers, dates, addresses, etc. in words will work for common cases.

```text Numbers & data rules theme={null}
Convert the output text into a format suitable for text-to-speech. Ensure that numbers, symbols, and abbreviations are expanded for clarity when read aloud. Expand all abbreviations to their full spoken forms.

Example input and output:

"$42.50" → "forty-two dollars and fifty cents"
"£1,001.32" → "one thousand and one pounds and thirty-two pence"
"1234" → "one thousand two hundred thirty-four"
"3.14" → "three point one four"
"555-555-5555" → "five five five, five five five, five five five five"
"2nd" → "second"
"XIV" → "fourteen" - unless it's a title, then it's "the fourteenth"
"3.5" → "three point five"
"⅔" → "two-thirds"
"Dr." → "Doctor"
"Ave." → "Avenue"
"St." → "Street" (but saints like "St. Patrick" should remain)
"Ctrl + Z" → "control z"
"100km" → "one hundred kilometers"
"100%" → "one hundred percent"
"elevenlabs.io/docs" → "eleven labs dot io slash docs"
"2024-01-01" → "January first, two-thousand twenty-four"
"123 Main St, Anytown, USA" → "one two three Main Street, Anytown, United States of America"
"14:30" → "two thirty PM"
"01/02/2023" → "January second, two-thousand twenty-three" or "the first of February, two-thousand twenty-three", depending on locale of the user
```

# Keep long paragraphs sounding natural

Most text-to-speech systems change prosody when they receive each sentence individually. If your voice agent needs to speak a large amount of text (e.g. long legal disclosures or policy statements), follow this guidance to keep paragraphs sounding natural:

* **Send multiple sentences together.** When you already have the full copy (for example, a static disclosure), pass the entire paragraph in a single `stream.tts` message so the speech engine can maintain the correct intonation.
* **Wait for the model to finish generating.** Fast models such as Gemini 2.5 Flash Lite can produce a paragraph quickly. Instead of streaming each partial sentence as soon as it appears, collect the complete paragraph and then forward the whole string to the TTS provider.

This approach avoids sentence-by-sentence delivery that can make disclosures sound choppy.

# Enable push-to-talk in React/Next.js

Source: https://docs.layercode.com/how-tos/push-to-talk

Configure push-to-talk turn taking with the Layercode React SDK.

By default, Layercode agents use automatic turn taking. If you prefer explicit control (press and hold to speak), enable push-to-talk in your agent and wire up the callbacks in your UI.

## 1) Enable push-to-talk in the dashboard

In your agent panel on [https://dash.layercode.com/](https://dash.layercode.com/) → Transcriber → Settings → set Turn Taking to Push to Talk → Save your changes.
## 2) Use the React SDK callbacks

When using push-to-talk, call `triggerUserTurnStarted()` when the user begins speaking (pressing the button), and `triggerUserTurnFinished()` when they stop (releasing the button).

```tsx app/ui/VoiceAgentPushToTalk.tsx theme={null}
'use client';
import { useLayercodeAgent } from '@layercode/react-sdk';

export default function VoiceAgentPushToTalk() {
  const { status, triggerUserTurnStarted, triggerUserTurnFinished } = useLayercodeAgent({
    agentId: process.env.NEXT_PUBLIC_LAYERCODE_AGENT_ID!,
    authorizeSessionEndpoint: '/api/authorize',
  });

  // Pressing the button starts the user turn; releasing it ends the turn
  return (
    <div>
      <p>Status: {status}</p>
      <button onPointerDown={() => triggerUserTurnStarted()} onPointerUp={() => triggerUserTurnFinished()}>
        Hold to talk
      </button>
    </div>
  );
}
```

Turn taking is explained conceptually in our [Turn taking guide](/explanations/turn-taking).

## What gets sent when you press the button

The React SDK continuously captures microphone audio into a short rolling buffer even while the button is idle. When you call `triggerUserTurnStarted()`, the SDK immediately flushes roughly one second of pre-roll audio along with anything you say while the button stays down. This keeps the start of the utterance intact, so agents hear the full word instead of a clipped syllable.

You can fine-tune the pre-roll length with the `vad.buffer_frames` agent setting. Each frame represents about 100 ms of audio, so lowering the value shortens the buffer and raising it adds more context before the press.

# Send JSON data from the client

Source: https://docs.layercode.com/how-tos/send-json-data

Forward events from your frontend to your agent backend without interrupting the live conversation

Layercode supports more than voice or text replies. When you need to pass button clicks, form submissions, or other UI state to your agent backend mid-turn, emit a `client.response.data` event from the client SDK. The payload arrives at your webhook as a `data` event without interrupting the current turn.
The webhook can respond with a regular JSON response (not an SSE stream like other webhook event types), which is then delivered immediately back to the client as a `response.data` event. This is in addition to [other webhook event types (like `message` or `session.start`) returning `response.data` SSE events](/api-reference/webhook-sse-api#response-data), which are also delivered to the client as `response.data` events.

## Vanilla JS example

```html send-data.html theme={null}
<button id="confirm-order">Confirm order</button>
<script type="module">
  // Assumes `client` is a LayercodeClient instance you have already created and connected
  document.getElementById('confirm-order').addEventListener('click', () => {
    client.sendClientResponseData({ action: 'confirm_order', orderId: 'ORD-12345', timestamp: Date.now() });
  });
</script>
```

What happens:

* The current turn continues uninterrupted.
* The payload is forwarded as a `data` webhook event with the session/turn identifiers.

## React example

```tsx app/components/OrderActions.tsx theme={null}
'use client';
import { useLayercodeAgent } from '@layercode/react-sdk';

export function OrderActions() {
  const { status, sendClientResponseData } = useLayercodeAgent({
    agentId: process.env.NEXT_PUBLIC_LAYERCODE_AGENT_ID!,
    authorizeSessionEndpoint: '/api/authorize'
  });

  const handleConfirm = () => {
    sendClientResponseData({
      action: 'confirm_order',
      orderId: 'ORD-12345',
      timestamp: Date.now()
    });
  };

  return (
    <button onClick={handleConfirm} disabled={status !== 'connected'}>
      Confirm order
    </button>
  );
}
```

## Backend example

When the client emits `client.response.data`, your webhook receives a `data` event. Respond with **plain JSON** (not SSE messages) using `type: "response.data"`. The payload is delivered to the client and surfaced via the `onDataMessage` callback. The snippet below only handles the `data` event for clarity:

```ts Express theme={null}
import express from 'express';

const app = express();
app.use(express.json());

app.post('/agent', async (req, res) => {
  const { type, turn_id, data } = req.body;

  if (type === 'data') {
    // Return normal JSON with type "response.data"
    // This is received on the client in onDataMessage(...)
    return res.json({
      type: 'response.data',
      turn_id,
      content: { status: 'received', echo: data }
    });
  }

  // ...handle other event types (e.g. 'message', 'session.start') with SSE responses
});
```

On the client, read this via `onDataMessage` (the same applies to the [React SDK](/sdk-reference/react-sdk)):

```ts theme={null}
new LayercodeClient({
  /* ... */
  onDataMessage: (data) => {
    console.log('Received from backend:', data);
  }
});
```

# Send text messages from the client

Source: https://docs.layercode.com/how-tos/send-text-messages

Capture text input in your UI and hand it to a Layercode agent without streaming audio.

Layercode agents normally consume live microphone audio, but some experiences need a text fallback, such as chat bubbles, accessibility flows, or quick corrections while the mic is muted. The WebSocket API and SDKs expose `sendClientResponseText` for exactly that: send the full utterance as text, let the server close the user turn, and have the agent reply immediately.

This guide shows how to wire up text messages in both Vanilla JS and React. If you're working directly with the WebSocket API (for example, outside a Node environment), check the [Send Text Messages (optional)](https://docs.layercode.com/api-reference/frontend-ws-api#send-text-messages-optional) reference for payload details.

> Need to push structured data (form values, button clicks) without ending the turn? See [Send JSON data from the client](/how-tos/send-json-data).

Starting in text-only mode? Pass `audioInput: false` (and optionally `enableAmplitudeMonitoring: false`) when you instantiate the SDK. The browser skips the microphone permission prompt until you later call `setAudioInput(true)`.

## 1) Vanilla JS example

The `LayercodeClient` instance exposes `sendClientResponseText`. Add a simple form that forwards the entered text and clears the field when submitted.

```html send-text.html theme={null}
<form id="text-reply">
  <input name="message" type="text" placeholder="Type a message" autocomplete="off" />
  <button type="submit">Send</button>
</form>
<script type="module">
  // Assumes `client` is a LayercodeClient instance you have already created and connected
  const form = document.getElementById('text-reply');
  form.addEventListener('submit', (event) => {
    event.preventDefault();
    const message = String(new FormData(form).get('message') ?? '').trim();
    if (!message) return;
    client.sendClientResponseText(message);
    form.reset();
  });
</script>
```

What happens when you call `sendClientResponseText`:

* The client sends a `client.response.text` message (the SDK does not send `trigger.turn.end`).
* The server interrupts any active agent audio, emits a `user.transcript` event, closes the current user turn, and queues the agent response.
* The agent receives the text message through the regular webhook path and responds immediately.

## 2) React example

The React SDK exposes the same capability via the `useLayercodeAgent` hook. Grab the `sendClientResponseText` method from the hook and call it from your form handler.

```tsx app/components/TextReplyForm.tsx theme={null}
'use client';
import { FormEvent, useEffect } from 'react';
import { useLayercodeAgent } from '@layercode/react-sdk';

export function TextReplyForm() {
  const { status, connect, disconnect, sendClientResponseText } = useLayercodeAgent({
    agentId: process.env.NEXT_PUBLIC_LAYERCODE_AGENT_ID!,
    authorizeSessionEndpoint: '/api/authorize',
  });

  useEffect(() => {
    connect();
    return () => {
      disconnect();
    };
  }, [connect, disconnect]);

  const handleSubmit = (event: FormEvent<HTMLFormElement>) => {
    event.preventDefault();
    const form = event.currentTarget;
    const data = new FormData(form);
    const message = ((data.get('message') as string | null) ?? '').trim();
    if (!message) return;
    sendClientResponseText(message);
    form.reset();
  };

  return (
    <form onSubmit={handleSubmit}>
      <input name="message" type="text" placeholder="Type a message" disabled={status !== 'connected'} />
      <button type="submit" disabled={status !== 'connected'}>
        Send
      </button>
    </form>
  );
}
```

Disable the form while the client is still connecting so you do not queue messages before a session exists.

When you want to escalate from text to voice mode, grab `setAudioInput` from the hook:

```tsx theme={null}
const { setAudioInput } = useLayercodeAgent({
  audioInput: false,
  enableAmplitudeMonitoring: false,
  // ...other options
});

return <button onClick={() => setAudioInput(true)}>Switch to voice</button>;
```

# Sending data to your client from Layercode stream

Source: https://docs.layercode.com/how-tos/sending-data-to-client

How to send data to your client via the Layercode stream

Sometimes you will want your Layercode stream to include additional data. For example, you might want to tell the user that the LLM is thinking or looking something up. To do this, use the `stream.data` method. For example:

```ts theme={null}
stream.data({
  status: 'thinking'
})
```

And on the client side, you can receive the data you send:

```tsx theme={null}
useLayercodeAgent({
  agentId: "your-agent-id",
  authorizeSessionEndpoint: "/api/authorize",
  onDataMessage: (data) => console.log("Received data:", data), // {status: 'thinking'}
});
```

# Inbound calls with Twilio

Source: https://docs.layercode.com/how-tos/setting-up-twilio

Setting up a voice agent to receive phone calls for you

You will need:

* A Twilio account
* A Twilio phone number (can be a trial number)
* Your Twilio Account SID and Auth Token

This guide walks you through configuring Layercode to answer calls to your Twilio phone number. If you'd like to trigger outbound calls from your Layercode Agent, see [Outbound calls with Twilio](/how-tos/outbound-calls).

1. Go to the Layercode dashboard at [https://dash.layercode.com](https://dash.layercode.com) and select your agent.
2. Open the client settings, enable Twilio phone calls, and then save changes.
3. Go to the Layercode settings at [https://dash.layercode.com/settings](https://dash.layercode.com/settings).
4. Add your Twilio Account SID and Auth Token, then save.
   Twilio recently changed where the Auth Token and Account SID are displayed. In the Twilio Console, use the search bar to find "Account SID" and "Auth Token".
5. Return to your agent's client settings. You should now be able to select a Twilio phone number.

   If you don't see your number, refresh the page. Ensure the number is in the same Twilio account as the credentials you added.

   You can assign multiple Twilio phone numbers to a single agent.

   For each call, Layercode stores the from/to phone numbers (and country codes) on the session. See the [REST API](/api-reference/rest-api#sessions) for retrieving these details along with transcripts and recordings.
6. Test by calling the number. For a quick check, set a short welcome message in Layercode (for example, "Hello from Layercode").
7. To run Twilio in production, you will need a backend where you can run your LLM flow. Review one of our backend tutorials, for example our [Next.js quick start](/tutorials/getting-started.mdx). You can also consult the [reference on webhooks](/api-reference/webhook-sse-api#webhook-request-payload) to see how to receive the `from_phone_number` and `to_phone_number`.

# Tool calling in Next.js with Layercode

Source: https://docs.layercode.com/how-tos/tool-calling-js

How to set up tool calling in Next.js with Layercode and AI SDK.

Here's how to set up tool calling in Next.js. Make sure you have `ai` and `zod` installed.

### Install ai sdk and zod

```bash npm theme={null}
npm install ai zod
```

```bash pnpm theme={null}
pnpm add ai zod
```

```bash yarn theme={null}
yarn add ai zod
```

```bash bun theme={null}
bun add ai zod
```

In your backend, where your agent is running, import `tool` and `stepCountIs` from `ai` and import `zod`.
Note: you probably already imported `streamText` and `ModelMessage`.

```ts theme={null}
import { streamText, type ModelMessage, tool, stepCountIs } from 'ai';
import z from 'zod';
```

Inside the callback of your Layercode `streamResponse`, in the case of a message received, initialize a tool. For instance, `weather`:

```ts theme={null}
const weather = tool({
  description: 'Get the weather in a location',
  inputSchema: z.object({
    location: z.string().describe('The location to get the weather for')
  }),
  execute: async ({ location }) => ({
    location,
    temperature: 72 + Math.floor(Math.random() * 21)
  })
});
```

Then set

```ts theme={null}
tools: {
  weather
}
```

as a property inside `streamText`. You should also set these properties:

```ts theme={null}
toolChoice: 'auto',
stopWhen: stepCountIs(10),
```

You can find more info in the [ai sdk docs](https://ai-sdk.dev/docs/ai-sdk-core/tools-and-tool-calling).

Once you have this, make sure your prompt mentions that the tool is available. For example, add "You can use the weather tool to find the weather for a given location."

Now you can ask for the weather, and you'll see a different temperature (between 72 and 92) each time because the function includes some randomness.

## Next steps: telling the user that tool calling is happening

Many developers want to let the user know that a tool call is in progress so they don't expect an immediate response. To do this, your tools can notify the client that a tool call is happening. This guide shows you [how you can do that](/how-tos/sending-data-to-client).

## Sending speech to the user to tell them a call is happening

If you anticipate a long tool call, you may want to send the user a spoken message, such as "Just a moment, let me grab that for you." With AI SDK, you can do that by calling Layercode's `stream.tts` at the start of your `execute` function.
Note that the tool must be defined inside your Layercode `streamResponse` callback function so that it has access to `stream`.

```ts theme={null}
const weather = tool({
  description: 'Get the weather in a location',
  inputSchema: z.object({
    location: z.string().describe('The location to get the weather for')
  }),
  execute: async ({ location }) => {
    stream.tts("Just a moment, let me grab that for you.");
    // do something to get the weather
    return {
      location,
      temperature: 72 + Math.floor(Math.random() * 21) - 10
    };
  }
});
```

# Troubleshooting Next.js

Source: https://docs.layercode.com/how-tos/troubleshooting-nextjs

Some relevant tips and gotchas when building with Next.js and Layercode

### Use dynamic imports for Layercode hooks

For instance:

```tsx theme={null}
'use client';
import dynamic from 'next/dynamic';

// Dynamically import the VoiceAgent component with SSR disabled
const VoiceAgent = dynamic(() => import('./ui/VoiceAgent'), { ssr: false });

export default function Home() {
  return <VoiceAgent />;
}
```

You can see [an example here](https://github.com/layercodedev/fullstack-nextjs-cloudflare/blob/faa51f42b21be71cf488961d0df2f9a3a8e88ed8/app/page.tsx#L4).

# Create a Cloudflare tunnel for webhooks

Source: https://docs.layercode.com/how-tos/tunnelling

Expose your local backend to Layercode using Cloudflare Tunnel.

Layercode needs to send a webhook to your backend to generate agent responses. If you're running your backend locally, you'll need to expose it to the internet using a tunnel service.

## Setting up a tunnel with cloudflared

We recommend using [cloudflared](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/do-more-with-tunnels/trycloudflare/), which is free for development.
Install cloudflared:

* **macOS:** `brew install cloudflared`
* **Windows:** `winget install --id Cloudflare.cloudflared`
* [Other platforms](https://developers.cloudflare.com/cloudflare-one/connections/connect-networks/downloads/)

Run the following command to expose your local server:

```bash theme={null}
cloudflared tunnel --url http://localhost:YOUR_PORT
```

After starting, cloudflared will print a public URL in your terminal, e.g.:

```
https://my-tunnel-name.trycloudflare.com
```

Add the path of your backend's webhook endpoint to the URL, e.g.:

```
https://my-tunnel-name.trycloudflare.com/api/agent
```

`/api/agent` is just an example. Your actual endpoint may be different depending on your backend configuration.

Then set the webhook URL for your agent:

1. Go to the [Layercode dashboard](https://dash.layercode.com).
2. Click on your agent.
3. Click the **Edit** button in the **Your Backend** box.
4. Enter your webhook URL (from the previous step). Make sure the agent's webhook secret matches the `LAYERCODE_WEBHOOK_SECRET` environment variable in your backend.

Open the agent's Playground tab and start speaking to your voice agent!

If you're having trouble, make sure your backend server is running and listening on the port you passed to cloudflared (e.g., 3000). You can also open the agent's Webhook Logs tab to see the webhook requests being sent and any errors returned.

Every time you restart the cloudflared tunnel, the assigned public URL changes. Be sure to update the webhook URL in the Layercode dashboard each time you restart the tunnel.

## Alternative Tunneling Solutions

Besides cloudflared, you can also use other tunneling solutions such as [ngrok](https://ngrok.com/) to expose your local backend.

## If using Vite

By default, Vite rejects requests addressed to hostnames it doesn't recognize. You therefore need to add your cloudflared (or ngrok, etc.) hostname to `server.allowedHosts` in `vite.config.ts`.
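The trycloudflare hostname changes every time the tunnel restarts. Listing one exact host therefore means editing the config after each restart. Vite's `server.allowedHosts` also accepts entries that start with a `.`, which allow that domain and all of its subdomains. Below is a minimal `vite.config.ts` sketch that uses that form. It assumes a development-only setup with no other Vite options. If you already have a config, merge the `server` block into it instead.

```ts theme={null}
import { defineConfig } from 'vite';

export default defineConfig({
  server: {
    // A leading "." allows trycloudflare.com and every subdomain of it,
    // so a new quick-tunnel URL after a restart is accepted without edits.
    allowedHosts: ['.trycloudflare.com'],
  },
});
```

If you use ngrok or another tunneling service, replace the entry with that service's hostname or domain.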
For example, to allow a single tunnel hostname inside the `server` block of your Vite config:

```ts theme={null}
allowedHosts: ["suggesting-sri-pair-hugh.trycloudflare.com"]
```

# Introduction
Source: https://docs.layercode.com/intro

Layercode makes it easy for developers to build low-latency, production-ready voice AI agents.

[Stay up to date on availability via the Layercode status page.](https://statuspage.incident.io/layercode)

* Follow the step-by-step guide to launch your first Layercode-powered voice agent.
* Understand the pipeline architecture, transports, and real-time audio flow.
* Browse REST, WebSocket, and CLI endpoints with request/response examples.
* Integrate with Layercode using our Node.js, Python, React, and Vanilla JS SDKs.

## What is Layercode?