Frontend WebSocket API
Layercode WebSocket API for browser and mobile based voice agent experiences.
The Layercode Frontend WebSocket API is used to create browser and mobile based voice agent experiences. The client streams base64-encoded chunks of microphone audio up the WebSocket. In response, the server returns audio chunks of the assistant’s response to be played to the user. Additional trigger and data event types allow control of turns and UI updates.
For most use cases, we recommend using our SDKs for React (React Guide) or Vanilla JS (Vanilla JS Guide). This API reference is intended for advanced users who need to implement the WebSocket protocol directly.
Connecting to the WebSocket
The client browser connects to the Layercode WebSocket API at the following URL:
Authorizing the WebSocket Connection
When establishing the WebSocket connection, the following query parameter must be included in the request URL:
- client_session_key: A unique session key obtained from the Layercode REST API /authorize endpoint.
Example full connection URL:
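As a sketch, the connection URL can be assembled like this. The base URL shown is a placeholder, not taken from this page; substitute the actual Layercode WebSocket endpoint:

```javascript
// Builds the full connection URL with the required query parameter.
// The base URL below is a placeholder -- use the real Layercode
// WebSocket endpoint from the documentation above.
function buildConnectionUrl(baseUrl, clientSessionKey) {
  const url = new URL(baseUrl);
  url.searchParams.set("client_session_key", clientSessionKey);
  return url.toString();
}

// Placeholder values for illustration only:
const wsUrl = buildConnectionUrl(
  "wss://example.layercode.endpoint/ws", // placeholder, not the real endpoint
  "your-client-session-key"
);
// In the browser: const ws = new WebSocket(wsUrl);
```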
To obtain a client_session_key, you must first create a new session for the user by calling the Layercode REST API /authorize endpoint. This endpoint returns a client_session_key which must be included in the WebSocket connection parameters. This API call should be made from your backend server, not the client browser. This ensures your LAYERCODE_API_KEY is never exposed to the client, and allows you to do any additional user authorization checks required by your application.
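A minimal sketch of this server-side step (Node 18+, built-in fetch). The endpoint URL, request body, and response field are assumptions based on the description above; consult the Layercode REST API reference for the exact contract. The key point is that the LAYERCODE_API_KEY never leaves your server:

```javascript
// Pure helper: builds the authorize request on the backend, so the
// API key is never exposed to the client browser.
function buildAuthorizeRequest(apiKey, sessionParams) {
  return {
    url: "https://example.layercode.endpoint/authorize", // placeholder path
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${apiKey}`, // assumed auth scheme
        "Content-Type": "application/json",
      },
      body: JSON.stringify(sessionParams),
    },
  };
}

// Calls the authorize endpoint and returns the client_session_key,
// which your backend then hands to the browser client.
async function createClientSession(apiKey, sessionParams) {
  const { url, init } = buildAuthorizeRequest(apiKey, sessionParams);
  const res = await fetch(url, init);
  if (!res.ok) throw new Error(`authorize failed: ${res.status}`);
  const data = await res.json();
  return data.client_session_key;
}
```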
WebSocket Events
Client → Server Messages
Audio Streaming
Once the WebSocket connection is established, the client should continuously send chunks of the user’s microphone audio. The content must be in the following format:
- Base64 encoded
- 16-bit PCM audio data
- 8000 Hz sample rate
- Mono channel
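The encoding step can be sketched as below, assuming the microphone samples have already been resampled to 8 kHz mono Float32 in the range -1..1. The event name and field names are assumptions, not taken from this page; see the Vanilla JS SDK source for the exact message shape:

```javascript
// Converts a chunk of Float32 samples (-1..1) to base64-encoded
// 16-bit little-endian PCM. Buffer is used for Node; in the browser
// you would base64-encode the bytes with btoa instead.
function encodePcm16Base64(float32Samples) {
  const pcm = new Int16Array(float32Samples.length);
  for (let i = 0; i < float32Samples.length; i++) {
    // Clamp, then scale to the signed 16-bit range.
    const s = Math.max(-1, Math.min(1, float32Samples[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  return Buffer.from(pcm.buffer).toString("base64");
}

// Wrapping a chunk in a message (the "client.audio" type string is an
// illustrative assumption):
function audioEvent(float32Samples) {
  return JSON.stringify({
    type: "client.audio",
    content: encodePcm16Base64(float32Samples),
  });
}
// ws.send(audioEvent(chunk));
```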
See the Vanilla JS SDK code for an example of how browser microphone audio is correctly encoded to base64.
Response Audio Replay Finished
The client will receive audio chunks of the assistant’s response (see Audio Response). When the client has finished replaying all assistant audio chunks in its buffer, it must reply with a client.response_audio_replay_finished event with the reason completed or interrupted. Note that the assistant webhook can return response.tts events (which are turned into speech and received by the client as response.audio events) at any point during a long response, in between other text or JSON events. The client must therefore handle the case where it has played all the audio in its buffer but then receives more to play. This can result in the client sending multiple completed replay-finished events over a single turn.
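A sketch of the replay-finished message, using the event name and reason values described above. Any field beyond type and reason is an assumption:

```javascript
// Builds the replay-finished message. reason is "completed" when the
// buffer drained naturally, or "interrupted" when the user cut in.
function replayFinishedEvent(reason) {
  return JSON.stringify({
    type: "client.response_audio_replay_finished",
    reason,
  });
}

// When the playback buffer drains: ws.send(replayFinishedEvent("completed"));
// More response.audio chunks may still arrive afterwards, so this event
// can legitimately be sent more than once per turn.
```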
Push-to-Talk Control (Optional)
In push-to-talk mode (read more about Turn Taking), the client must send the following events to start and end the user’s turn to speak. These are typically wired to a button that the user holds down while speaking. In this mode the client can also pre-emptively halt assistant audio playback when the user interrupts, instead of waiting to receive a turn.end event; a client.response_audio_replay_finished event with reason interrupted should also be sent when the user interrupts the assistant response.
Start user turn (user has pressed the button):
End user turn (user has released the button):
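The two messages can be sketched as below. The exact event type strings are assumptions, not taken from this page; confirm them against the Turn Taking documentation:

```javascript
// Illustrative push-to-talk messages (type strings are assumed names):
const startUserTurn = JSON.stringify({ type: "trigger.turn.start", role: "user" }); // button pressed
const endUserTurn = JSON.stringify({ type: "trigger.turn.end", role: "user" });     // button released

// Typical wiring in the browser:
// button.addEventListener("pointerdown", () => ws.send(startUserTurn));
// button.addEventListener("pointerup", () => ws.send(endUserTurn));
```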
Server → Client Messages
The client will receive the following events from Layercode:
Turn Management
When the server detects the start of the user’s turn:
When the end of the user turn is detected:
When it’s the assistant’s turn:
Or end of assistant turn:
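A client-side dispatcher for these turn events might look like the sketch below. The type strings and the role field are assumptions about the wire format, not taken from this page:

```javascript
// Routes an incoming server message to turn handlers. Assumed shape:
// { type: "turn.start" | "turn.end", role: "user" | "assistant" }.
function handleServerEvent(raw, handlers) {
  const event = JSON.parse(raw);
  switch (event.type) {
    case "turn.start":
      // role distinguishes the user's turn from the assistant's.
      handlers.onTurnStart?.(event.role);
      break;
    case "turn.end":
      handlers.onTurnEnd?.(event.role);
      break;
    default:
      // Other events (response.audio, response.data, ...) go elsewhere.
      handlers.onOther?.(event);
  }
  return event;
}
```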
Audio Response
The client will receive audio chunks of the assistant’s response, which should be appended to a playback buffer and played as they arrive.
The content will be audio in the following format:
- Base64 encoded
- 16-bit PCM audio data
- 16000 Hz sample rate
- Mono channel
See the Vanilla JS SDK code for an example of how to play the audio chunks.
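A minimal decoding sketch, assuming the format above (base64, 16-bit PCM, 16 kHz, mono). Buffer is used for decoding in Node; in the browser use atob. The playback step is left as a comment because it needs a live AudioContext:

```javascript
// Decodes a base64 16-bit PCM chunk back to Float32 samples (-1..1),
// the format the Web Audio API expects.
function decodePcm16Base64(base64Audio) {
  // Copy into a fresh buffer so the Int16Array view starts at offset 0.
  const bytes = Uint8Array.from(Buffer.from(base64Audio, "base64"));
  const pcm = new Int16Array(bytes.buffer);
  const samples = new Float32Array(pcm.length);
  for (let i = 0; i < pcm.length; i++) {
    samples[i] = pcm[i] / 0x8000; // scale back to -1..1
  }
  return samples;
}

// Browser playback sketch (16000 Hz matches the format above):
//   const buf = audioCtx.createBuffer(1, samples.length, 16000);
//   buf.copyToChannel(samples, 0);
//   const src = audioCtx.createBufferSource();
//   src.buffer = buf;
//   src.connect(audioCtx.destination);
//   src.start(nextScheduledTime); // schedule chunks back to back
```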
Data and State Updates
Your webhook can return response.data SSE events, which are forwarded to the browser client. This is ideal for updating UI and state in the browser. If you want to send text or JSON deltas instead of full objects, you can pass a JSON object like { "delta": "text delta..." } and accumulate and render the deltas in the client browser.
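The delta-accumulation pattern can be sketched as below. The { delta } shape comes from the text above; the placement of the payload under a content field is an assumption about the wire format:

```javascript
// Returns a message handler that accumulates response.data deltas and
// re-renders the full text on each update.
function makeDeltaAccumulator(render) {
  let text = "";
  return function onServerMessage(raw) {
    const event = JSON.parse(raw);
    // Payload location (event.content) is an assumed field name.
    if (event.type === "response.data" && event.content?.delta !== undefined) {
      text += event.content.delta;
      render(text); // update the UI with the accumulated text
    }
    return text;
  };
}

// Usage sketch: ws.onmessage = (e) => onServerMessage(e.data);
```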