For most use cases, we recommend using our SDKs for React (React Guide) or Vanilla JS (Vanilla JS Guide). This API reference is
intended for advanced users who need to implement the WebSocket protocol
directly.
Connecting to the WebSocket
The client browser connects to the Layercode WebSocket API at its WebSocket URL (a connection sketch follows the authorization notes below).
Authorizing the WebSocket Connection
When establishing the WebSocket connection, the following query parameter must be included in the request URL:
- client_session_key: A unique session key obtained from the /authorize endpoint of the Layercode REST API.
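The sketch below shows one way to wire this up from the browser, assuming your own backend proxies the /authorize call so your API key never reaches the client. The WebSocket URL and the /api/authorize backend path are placeholders, not documented values.

```typescript
// Hypothetical sketch: replace LAYERCODE_WS_URL and the backend path with the
// real values; both are placeholders here.
const LAYERCODE_WS_URL = "wss://example.invalid/layercode"; // placeholder URL

export async function connectToLayercode(): Promise<WebSocket> {
  // Your backend calls the Layercode REST API /authorize endpoint (server-side,
  // with your API key) and returns the client_session_key to the browser.
  const res = await fetch("/api/authorize", { method: "POST" }); // placeholder path
  const { client_session_key } = await res.json();

  // Include the session key as a query parameter on the WebSocket URL.
  const url = `${LAYERCODE_WS_URL}?client_session_key=${encodeURIComponent(client_session_key)}`;
  const ws = new WebSocket(url);

  return new Promise((resolve, reject) => {
    ws.onopen = () => resolve(ws);
    ws.onerror = (err) => reject(err);
  });
}
```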
WebSocket Events
Client → Server Messages
Client Ready
When the client has established the WebSocket connection and is ready to begin streaming audio, it should send a ready message:
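A minimal sketch of the ready message, assuming JSON messages with a type field; the exact event name is not reproduced on this page, so client.ready below is an assumption.

```typescript
declare const ws: WebSocket; // the connection from the sketch above

// Assumption: the exact "type" string for the ready message is defined by the
// Layercode protocol; "client.ready" is a placeholder here.
ws.send(JSON.stringify({ type: "client.ready" }));
```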
Audio Streaming
Once the WebSocket connection is established, the client should continuously send chunks of the user’s microphone audio in the following format:
- Base64 encoded
- 16-bit PCM audio data
- 8000 Hz sample rate
- Mono channel
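A sketch of packaging one microphone frame, assuming you already have mono Float32Array samples at 8 kHz from the Web Audio API; the { type, content } envelope and the client.audio type string are assumptions.

```typescript
declare const ws: WebSocket;

// Convert a mono, 8 kHz Float32Array frame to 16-bit PCM, base64-encode it,
// and send it as one audio chunk. The message envelope and the "client.audio"
// type string are placeholders; use the documented shape.
function sendMicChunk(frame: Float32Array): void {
  const pcm = new Int16Array(frame.length);
  for (let i = 0; i < frame.length; i++) {
    const s = Math.max(-1, Math.min(1, frame[i]));
    pcm[i] = s < 0 ? s * 0x8000 : s * 0x7fff;
  }
  const bytes = new Uint8Array(pcm.buffer);
  let binary = "";
  for (let i = 0; i < bytes.length; i++) binary += String.fromCharCode(bytes[i]);
  ws.send(JSON.stringify({ type: "client.audio", content: btoa(binary) }));
}
```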
Voice Activity Detection Events
The client can send Voice Activity Detection (VAD) events to inform the server when voice activity is detected. This improves the speed and accuracy of automatic turn taking. Note: the client is responsible for stopping any in-progress assistant audio playback when the user interrupts.
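A sketch of forwarding a "voice activity detected" signal from a client-side VAD; the event type and field names below are assumptions, not documented names.

```typescript
declare const ws: WebSocket;
declare function stopAssistantPlayback(): void; // your playback teardown

// Call this when your client-side VAD detects voice activity.
// Assumption: the event name and shape below are placeholders.
function onVoiceActivityDetected(): void {
  stopAssistantPlayback(); // halt any in-progress assistant audio
  ws.send(JSON.stringify({ type: "vad_events", event: "vad_start" }));
}
```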
Response Audio Replay Finished
The client will receive audio chunks of the assistant’s response (see Audio Response). When the client has finished replaying all of the assistant audio chunks in its buffer, it must reply with a trigger.response.audio.replay_finished event. Note that the assistant webhook can return response.tts events (which are turned into speech and received by the client as response.audio events) at any point during a long response, in between other text or JSON events, so the client must handle the situation where it has played all the audio in its buffer but then receives more to play. This means the client may send multiple trigger.response.audio.replay_finished events over a single turn.
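A sketch of signalling the end of buffered playback; the event name is the one used in this reference, and the single-field payload is an assumption.

```typescript
declare const ws: WebSocket;

// Call this whenever the queue of assistant audio chunks runs empty. Because
// more response.audio chunks can still arrive for the same turn, this may be
// sent more than once per turn.
function onPlaybackQueueDrained(): void {
  ws.send(JSON.stringify({ type: "trigger.response.audio.replay_finished" }));
}
```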
Push-to-Talk Control (Optional)
In push-to-talk mode (read more about Turn Taking), the client must send the following events to start and end the user’s turn to speak. These are typically wired to a button that is held down while the user speaks. In this mode, the client can also preemptively halt the assistant’s audio playback when the user interrupts: instead of waiting to receive a turn.start event (which indicates a turn change), send a trigger.response.audio.replay_finished event as soon as the user interrupts the assistant.
Start user turn (user has pressed the button):
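A sketch of the push-to-talk wiring for both the start and the end of the user turn; the trigger.turn.start and trigger.turn.end type strings are placeholders (assumptions), while the replay-finished event name is the one used above.

```typescript
declare const ws: WebSocket;
declare function stopAssistantPlayback(): void; // your playback teardown

// Held-button push-to-talk. Assumption: the turn event type strings below are
// placeholders; use the documented names.
function onTalkButtonDown(): void {
  // Preemptively cut off the assistant instead of waiting for turn.start.
  stopAssistantPlayback();
  ws.send(JSON.stringify({ type: "trigger.response.audio.replay_finished" }));
  ws.send(JSON.stringify({ type: "trigger.turn.start" }));
}

function onTalkButtonUp(): void {
  ws.send(JSON.stringify({ type: "trigger.turn.end" }));
}
```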
Server → Client Messages
The client will receive the following events from Layercode:
Turn Management
When the server detects the start of the user’s turn, the client receives a turn.start event:
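A sketch of reacting to the turn event on the client; only the turn.start name comes from this reference, the rest of the handling is an assumption.

```typescript
declare const ws: WebSocket;
declare function stopAssistantPlayback(): void; // your playback teardown

ws.addEventListener("message", (event: MessageEvent) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "turn.start") {
    // A new user turn has started, so stop replaying assistant audio.
    stopAssistantPlayback();
  }
});
```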
Audio Response
The client will receive audio chunks of the assistant’s response, which should be buffered and played immediately. The audio will be in the following format:
- Base64 encoded
- 16-bit PCM audio data
- 16000 Hz sample rate
- Mono channel
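A sketch of decoding and scheduling the chunks with the Web Audio API; the response.audio type name appears earlier in this reference, while the content field name and the scheduling approach are assumptions.

```typescript
declare const ws: WebSocket;

const audioCtx = new AudioContext();
let playhead = 0; // AudioContext time at which the next chunk should start

// Decode one base64 chunk of 16-bit PCM (16 kHz, mono) and schedule it so
// consecutive chunks play back to back without gaps.
function playAssistantChunk(base64: string): void {
  const binary = atob(base64);
  const bytes = new Uint8Array(binary.length);
  for (let i = 0; i < binary.length; i++) bytes[i] = binary.charCodeAt(i);
  const pcm = new Int16Array(bytes.buffer);

  const buffer = audioCtx.createBuffer(1, pcm.length, 16000);
  const channel = buffer.getChannelData(0);
  for (let i = 0; i < pcm.length; i++) channel[i] = pcm[i] / 0x8000;

  const source = audioCtx.createBufferSource();
  source.buffer = buffer;
  source.connect(audioCtx.destination);
  playhead = Math.max(playhead, audioCtx.currentTime);
  source.start(playhead);
  playhead += buffer.duration;
}

// Assumption: the "content" field name is a placeholder for the base64 audio.
ws.addEventListener("message", (event: MessageEvent) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "response.audio") playAssistantChunk(msg.content);
});
```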
Text Response
The client will receive text chunks of the assistant’s response for display or processing:
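A sketch of accumulating the text chunks for display; the response.text type string and the content field are assumptions, since this page does not reproduce the event payload.

```typescript
declare const ws: WebSocket;

let assistantText = "";

// Assumption: "response.text" and the "content" field are placeholders for the
// documented text event shape.
ws.addEventListener("message", (event: MessageEvent) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "response.text") {
    assistantText += msg.content;
    // render assistantText in your UI, or feed it to your own processing
  }
});
```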
Data and State Updates
Your webhook can return response.data SSE events, which will be forwarded to the browser client. This is ideal for updating UI and state in the browser. If you want to pass text or JSON deltas instead of full objects, you can pass a JSON object like { "delta": "text delta..." } and accumulate and render the deltas in the client browser.
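A sketch of accumulating forwarded deltas in the browser, assuming the forwarded response.data message wraps your webhook’s { "delta": ... } object in a content field; that envelope is an assumption.

```typescript
declare const ws: WebSocket;

let accumulated = "";

// Assumption: the forwarded response.data message carries your webhook payload
// in a "content" field; adjust to the documented envelope.
ws.addEventListener("message", (event: MessageEvent) => {
  const msg = JSON.parse(event.data);
  if (msg.type === "response.data" && typeof msg.content?.delta === "string") {
    accumulated += msg.content.delta;
    // re-render the accumulated text or patch your UI state here
  }
});
```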