The Layercode Frontend WebSocket API is used to create browser- and mobile-based voice agent experiences. The client streams base64-encoded chunks of microphone audio to the server over the WebSocket. In response, the server returns audio chunks of the assistant’s response to be played to the user. Additional trigger and data event types allow control of turns and UI updates.

For most use cases, we recommend using our SDKs for React (React Guide) or Vanilla JS (Vanilla JS Guide). This API reference is intended for advanced users who need to implement the WebSocket protocol directly.

Connecting to the WebSocket

The client browser connects to the Layercode WebSocket API at the following URL:

wss://api.layercode.com/v1/pipelines/websocket

Authorizing the WebSocket Connection

When establishing the WebSocket connection, the following query parameter must be included in the request URL:

  • client_session_key: A unique session key obtained from the Layercode REST API /authorize endpoint.

Example full connection URL:

wss://api.layercode.com/v1/pipelines/websocket?client_session_key=your_client_session_key

To obtain a client_session_key, you must first create a new session for the user by calling the Layercode REST API /authorize endpoint. This endpoint returns a client_session_key which must be included in the WebSocket connection parameters. This API call should be made from your backend server, not the client browser. This ensures your LAYERCODE_API_KEY is never exposed to the client, and allows you to do any additional user authorization checks required by your application.
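
Below is a minimal TypeScript sketch of this two-step flow: a backend route exchanges the secret LAYERCODE_API_KEY for a client_session_key, and the browser then opens the WebSocket with that key. The /authorize request path, its body fields, and the /api/voice-session route are illustrative assumptions only; consult the REST API reference for the authoritative schema.

// --- Backend (Node 18+, Express): exchange the secret API key for a session key.
// The /authorize path and request body fields below are illustrative assumptions.
import express from "express";

const app = express();

app.post("/api/voice-session", async (_req, res) => {
  const authorize = await fetch("https://api.layercode.com/v1/pipelines/authorize", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${process.env.LAYERCODE_API_KEY}`, // secret stays server-side
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ pipeline_id: "your-pipeline-id" }), // illustrative field
  });
  const { client_session_key } = await authorize.json();
  res.json({ client_session_key });
});

app.listen(3000);

// --- Browser: fetch the session key from *your* backend, then connect.
const { client_session_key } = await fetch("/api/voice-session", { method: "POST" })
  .then((r) => r.json());

const ws = new WebSocket(
  `wss://api.layercode.com/v1/pipelines/websocket?client_session_key=${client_session_key}`
);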

WebSocket Events

Client → Server Messages

Audio Streaming

Once the WebSocket connection is established, the client should continuously send chunks of the user’s microphone audio in the message format below. The audio content must be:

  • Base64 encoded
  • 16-bit PCM audio data
  • 8000 Hz sample rate
  • Mono channel

See the Vanilla JS SDK code for an example of how browser microphone audio is correctly encoded to base64.

{ "type": "client.audio", "content": "base64audio" }

Response Audio Replay Finished

The client will receive audio chunks of the assistant’s response (see Audio Response). When the client has finished replaying all assistant audio chunks in its buffer, it must reply with a trigger.response.audio.replay_finished event with the reason ‘completed’ or ‘interrupted’. Note that the assistant webhook can return response.tts events (which are turned into speech and received by the client as response.audio events) at any point during a long response (in between other text or JSON events), so the client must handle the situation where it has played all the audio in its buffer but then receives more to play. This means the client may send multiple trigger.response.audio.replay_finished ‘completed’ events over a single turn.

{
  "type": "trigger.response.audio.replay_finished",
  "reason": "completed OR interrupted",
  "last_delta_id_played": "UUID of the last audio delta played", // Optional
  "turn_id": "UUID of assistant response"
}
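
A minimal sketch of sending this trigger when the local playback queue drains or is interrupted. The turn and delta IDs are assumed to be tracked from incoming response.audio messages by your playback code.

// Report that the local playback buffer has drained (or was interrupted).
function notifyReplayFinished(
  ws: WebSocket,
  reason: "completed" | "interrupted",
  currentTurnId: string,
  lastDeltaId?: string
): void {
  ws.send(
    JSON.stringify({
      type: "trigger.response.audio.replay_finished",
      reason,
      last_delta_id_played: lastDeltaId, // optional
      turn_id: currentTurnId,
    })
  );
}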

Push-to-Talk Control (Optional)

In push-to-talk mode (read more about Turn Taking), the client must send the following events to start and end a user turn. This is typically wired to a button the user holds down while speaking. In this mode, the client can also pre-emptively halt assistant response audio playback when the user interrupts (instead of waiting to receive a turn.end event); when this happens, a trigger.response.audio.replay_finished event with reason ‘interrupted’ should also be sent.

Start user turn (user has pressed the button):

{ "type": "trigger.turn.start", "role": "user" }

End user turn (user has released the button):

{ "type": "trigger.turn.end", "role": "user" }

Server → Client Messages

The client will receive the following events from Layercode:

Turn Management

When the server detects the start of the user’s turn:

{ "type": "turn.start", "role": "user", "turn_id": "UUID of user turn" }

When the end of the user turn is detected:

{ "type": "turn.end", "role": "user", "turn_id": "UUID of user turn" }

When it’s the assistant’s turn:

{ "type": "turn.start", "role": "assistant", "turn_id": "UUID of assistant turn" }

Or end of assistant turn:

{ "type": "turn.end", "role": "assistant", "turn_id": "UUID of assistant turn" }

Audio Response

The client will receive audio chunks of the assistant’s response, which should be buffered and played back in order as they arrive.

The content will be audio in the following format:

  • Base64 encoded
  • 16-bit PCM audio data
  • 16000 Hz sample rate
  • Mono channel

See the Vanilla JS SDK code for an example of how to play the audio chunks.

{
  "type": "response.audio",
  "content": "base64audio",
  "delta_id": "UUID unique to each delta msg",
  "turn_id": "UUID of assistant response turn"
}
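
A rough sketch of decoding and scheduling these chunks with the Web Audio API (16 kHz, mono, 16-bit PCM); the Vanilla JS SDK handles this more robustly.

// Decode a base64 16-bit PCM chunk and schedule it to play immediately after
// the previous chunk. When the queue drains, send the replay-finished trigger
// described earlier.
const playbackCtx = new AudioContext({ sampleRate: 16000 }); // assistant audio is 16 kHz mono
let nextStartTime = 0;

function playAudioChunk(base64: string): void {
  // base64 -> raw bytes -> Int16 samples -> Float32 in [-1, 1].
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const int16 = new Int16Array(bytes.buffer);
  const float32 = Float32Array.from(int16, (s) => s / 0x8000);

  // Copy samples into an AudioBuffer and queue it behind any pending audio.
  const buffer = playbackCtx.createBuffer(1, float32.length, 16000);
  buffer.copyToChannel(float32, 0);
  const source = playbackCtx.createBufferSource();
  source.buffer = buffer;
  source.connect(playbackCtx.destination);

  nextStartTime = Math.max(nextStartTime, playbackCtx.currentTime);
  source.start(nextStartTime);
  nextStartTime += buffer.duration;
}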

Data and State Updates

Your webhook can return response.data SSE events, which will be forwarded to the browser client. This is ideal for updating UI and state in the browser. If you want to pass text or JSON deltas instead of full objects, you can simply pass a JSON object like { "delta": "text delta..." } and accumulate and render the deltas in the client browser.

{
  "type": "response.data",
  "content": { "json": "object" },
  "turn_id": "UUID of assistant response"
}
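
A minimal sketch of handling response.data on the client, accumulating { "delta": ... } payloads per turn. The "delta" key is a convention of your own webhook (not a fixed part of the protocol), and renderAssistantText is a hypothetical UI helper.

// Accumulate delta payloads into per-turn text; pass full objects straight to UI state.
declare function renderAssistantText(turnId: string, text: string): void; // hypothetical helper

const transcripts = new Map<string, string>();

function handleResponseData(msg: { content: Record<string, unknown>; turn_id: string }): void {
  if (typeof msg.content.delta === "string") {
    const soFar = (transcripts.get(msg.turn_id) ?? "") + msg.content.delta;
    transcripts.set(msg.turn_id, soFar);
    renderAssistantText(msg.turn_id, soFar);
  } else {
    // Full objects can be applied to UI state directly.
    console.log("data update for turn", msg.turn_id, msg.content);
  }
}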