Overview
WebSocket connections deliver transcript updates with lower latency and less overhead than polling REST endpoints. REST transcript retrieval is not suited to frequent polling because repeated requests load the API server; a WebSocket subscription pushes updates in real time without the cost of repeated HTTP requests.
This document describes how to connect to Vexa’s WebSocket API for real-time meeting transcription. The protocol supports subscribing to active meetings and receiving live transcript updates with proper deduplication and speaker grouping.
Implementation Reference: The testing/ws_realtime_transcription.py script is a complete Python implementation of real-time transcript rendering using this WebSocket protocol. It demonstrates the full algorithm, from REST bootstrap through WebSocket updates, with deduplication, speaker grouping, and live terminal rendering.
Prerequisites: The meeting bot must already be running and active for the target meeting.
Starting a Bot (if not already running)
To start a transcription bot for a meeting:
Connection Details
WebSocket URL
Derive the WebSocket URL from your API base URL:
- https://api.example.com → wss://api.example.com/ws
- http://localhost:8056 → ws://localhost:8056/ws
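The mapping above can be sketched as a small helper. This is a minimal illustration, not part of the Vexa client: the function name derive_ws_url is hypothetical, and it assumes that any path prefix on the base URL should be kept in front of /ws.

```python
from urllib.parse import urlsplit, urlunsplit


def derive_ws_url(api_base_url: str) -> str:
    """Map an http(s) API base URL to the matching ws(s) URL ending in /ws."""
    parts = urlsplit(api_base_url)
    # https -> wss, http -> ws; anything else is rejected.
    scheme = {"https": "wss", "http": "ws"}.get(parts.scheme)
    if scheme is None:
        raise ValueError(f"unsupported scheme: {parts.scheme!r}")
    # Drop any trailing slash before appending the /ws path.
    path = parts.path.rstrip("/") + "/ws"
    return urlunsplit((scheme, parts.netloc, path, "", ""))
```

For example, derive_ws_url("http://localhost:8056") yields ws://localhost:8056/ws.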
Authentication
Authentication is performed using the X-API-Key header.
Meeting Identity
Meetings are identified by platform and native meeting ID. Platform values: google_meet, teams, zoom
REST API Bootstrap
Before connecting to the WebSocket, fetch the last full transcript via the REST API.
WebSocket Protocol
Subscription
Send a subscription message after connecting:
- action: Always "subscribe"
- meetings: Array of meeting objects with platform and native_id
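A subscription frame with these two fields could be built as follows. This is a sketch only: build_subscribe_message is a hypothetical helper name, the meeting ID in the usage note is a placeholder, and the message is assumed to be sent as a JSON text frame.

```python
import json


def build_subscribe_message(meetings: list[tuple[str, str]]) -> str:
    """Serialize a subscribe request for (platform, native_id) pairs."""
    return json.dumps({
        "action": "subscribe",
        "meetings": [
            {"platform": platform, "native_id": native_id}
            for platform, native_id in meetings
        ],
    })
```

Calling build_subscribe_message([("google_meet", "abc-defg-hij")]) produces a single JSON string that can be passed to the WebSocket client's send method.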
Message Types
transcript.mutable
Live transcript segments that may be updated.
session_uid, speaker_mapping_status, and relative timing (start, end_time) may be present but are not required for basic transcript processing.
transcript.finalized
DEPRECATED: No longer emitted. transcript.finalized messages are not used by clients. Only transcript.mutable messages are processed for live transcript updates. Use the REST API endpoint to fetch the complete, stable transcript.
meeting.status
Meeting status updates.
Status values: requested, joining, awaiting_admission, connecting, active, stopping, completed, failed
subscribed
Confirmation of successful subscription.
pong
Response to ping messages.
error
Error messages.
Interactive Bot Events
When a bot has interactive capabilities enabled (default), the following additional events are published on the WebSocket connection:

| Event Type | Payload | Description |
|---|---|---|
| speak.started | {"text": "..."} | Bot started speaking |
| speak.completed | — | Speech playback finished |
| speak.interrupted | — | Speech interrupted via API |
| chat.received | {"sender": "John", "text": "...", "timestamp": 1234} | Chat message captured from a participant |
| chat.sent | {"text": "..."} | Bot sent a chat message |
| screen.sharing_started | {"content_type": "image"} | Screen sharing started |
| screen.sharing_stopped | — | Screen sharing stopped |

va:meeting:{meeting_id}:events. See the Interactive Bots guide for full details on controlling the bot’s voice, chat, and screen share capabilities.
Segment Schema
Minimum fields to consume:

| Field | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Transcript text content |
| speaker | string | No | Speaker identifier |
| language | string | No | Language code (e.g., “en”, “es”) |
| absolute_start_time | string | Yes | UTC timestamp (ISO 8601) |
| absolute_end_time | string | Yes | UTC timestamp (ISO 8601) |
| updated_at | string | No | Last update timestamp |
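
A consumer may want to check the required fields before using a segment. The sketch below assumes segments arrive as plain dictionaries decoded from JSON; the helper name missing_required_fields is illustrative and not part of the API.

```python
from typing import Any

# Fields marked "Yes" in the Required column of the schema table.
REQUIRED_FIELDS = ("text", "absolute_start_time", "absolute_end_time")


def missing_required_fields(segment: dict[str, Any]) -> list[str]:
    """Return the required fields that are absent or not strings."""
    return [f for f in REQUIRED_FIELDS if not isinstance(segment.get(f), str)]
```

An empty result means the segment carries every field needed for basic transcript processing.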
Algorithm
Implemented in testing/ws_realtime_transcription.py
1. Bootstrap
- Fetch initial transcript via REST API
- Seed an in-memory map keyed by absolute_start_time
- Ignore segments missing absolute_start_time for ordering
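
The seeding step could look like the sketch below. It assumes the REST response has already been decoded and reduced to a list of segment dictionaries; the response envelope is not described here, so extracting that list is left to the caller, and seed_segments is a hypothetical name.

```python
def seed_segments(rest_segments: list[dict]) -> dict[str, dict]:
    """Build the in-memory map from the REST bootstrap segments."""
    by_start: dict[str, dict] = {}
    for seg in rest_segments:
        key = seg.get("absolute_start_time")
        if not key:
            # Segments without a start time cannot be ordered, so skip them.
            continue
        by_start[key] = seg
    return by_start
```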
2. WebSocket Updates
For each transcript.mutable message:
- For every segment with absolute_start_time:
  - Upsert into the map by key
  - If updated_at exists on both existing and incoming, keep the newer (updated_at max)
- Discard segments with empty/whitespace-only text
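
One way to read the rules above is sketched here. Two assumptions are baked in: updated_at values share a single ISO 8601 UTC format, so string comparison matches chronological order, and "discard" is taken to mean that an incoming segment with blank text is skipped. The name upsert_segments is illustrative.

```python
def upsert_segments(by_start: dict[str, dict], incoming: list[dict]) -> None:
    """Merge segments from one transcript.mutable message into the map."""
    for seg in incoming:
        key = seg.get("absolute_start_time")
        if not key:
            continue  # cannot be keyed or ordered
        if not (seg.get("text") or "").strip():
            continue  # empty or whitespace-only text
        existing = by_start.get(key)
        if existing and existing.get("updated_at") and seg.get("updated_at"):
            # Both sides carry updated_at: keep whichever is newer.
            if seg["updated_at"] < existing["updated_at"]:
                continue
        by_start[key] = seg
```

Because the merge is keyed and order-independent for a given key, replaying the same message after a reconnect leaves the map unchanged.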
3. Rendering Order
Sort by absolute_start_time ascending.
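
A minimal sketch of the ordering step, assuming all timestamps use the same ISO 8601 UTC format so that lexicographic order equals chronological order. The name ordered_segments is hypothetical.

```python
def ordered_segments(by_start: dict[str, dict]) -> list[dict]:
    """Return the stored segments sorted by start time, earliest first."""
    return sorted(by_start.values(), key=lambda seg: seg["absolute_start_time"])
```

If timestamps can arrive in mixed formats or offsets, parsing them into datetime objects before sorting would be the safer choice.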
4. Speaker Merging
Group consecutive segments by the same speaker.
5. Rendering Strategy
For maximum readability, re-render the entire transcript on every update:
- \033[H: Move cursor to home position (top-left)
- \033[J: Clear screen from cursor to end
- end='': Suppress newline for immediate effect
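
Speaker grouping and the full re-render could be sketched together as below. The function names, the "Unknown" label for segments without a speaker, and the "Speaker: text" line layout are assumptions for illustration; the reference script may format its output differently.

```python
import sys


def group_by_speaker(segments: list[dict]) -> list[tuple[str, str]]:
    """Merge runs of consecutive segments that share a speaker."""
    groups: list[tuple[str, str]] = []
    for seg in segments:
        speaker = seg.get("speaker") or "Unknown"  # placeholder label
        text = seg["text"].strip()
        if groups and groups[-1][0] == speaker:
            # Same speaker as the previous group: append to it.
            groups[-1] = (speaker, groups[-1][1] + " " + text)
        else:
            groups.append((speaker, text))
    return groups


def render(groups: list[tuple[str, str]], out=sys.stdout) -> None:
    """Clear the terminal and redraw the whole transcript."""
    # Home the cursor, then clear from the cursor to the end of the screen.
    print("\033[H\033[J", end="", file=out)
    for speaker, text in groups:
        print(f"{speaker}: {text}", file=out)
    out.flush()
```

Redrawing everything on each update is simpler than patching individual lines, at the cost of rewriting the full transcript each time.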
Keepalive
The client may send ping messages; the server responds with pong. Recommended ping interval: 25 seconds.
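
A keepalive task could be sketched like this. The ping payload {"action": "ping"} is an assumption modeled on the subscribe message and should be checked against the reference script; send stands for any async callable that writes a text frame, such as a WebSocket client's send method.

```python
import asyncio
import json


async def keepalive(send, interval: float = 25.0) -> None:
    """Send a ping frame every `interval` seconds until cancelled."""
    while True:
        await asyncio.sleep(interval)
        # Assumed payload shape; not confirmed by this document.
        await send(json.dumps({"action": "ping"}))
```

Run it with asyncio.create_task alongside the receive loop and cancel the task when the connection closes.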
Error Handling
- Log error messages but continue processing
- Handle connection drops gracefully
- Reconnect and resubscribe as needed
- Idempotent merging preserves order on reconnection
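
Reconnection could be wrapped around the session as sketched below. Exponential backoff is an addition chosen for illustration, not something this document prescribes. The session argument is a hypothetical coroutine function that connects, resubscribes, and reads messages until the connection ends, and the exception type caught is a placeholder, since WebSocket libraries raise their own connection-closed errors.

```python
import asyncio


async def run_with_reconnect(session, max_delay: float = 30.0, sleep=asyncio.sleep) -> None:
    """Run `session` repeatedly, backing off between failed attempts."""
    delay = 1.0
    while True:
        try:
            await session()  # connect, subscribe, consume until close
            return  # clean exit: stop reconnecting
        except OSError as exc:  # includes ConnectionError
            print(f"connection lost: {exc}; retrying in {delay:.0f}s")
            await sleep(delay)
            delay = min(delay * 2, max_delay)
```

Because segment merging is idempotent, the session can safely re-run the REST bootstrap and resubscribe on every attempt.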
Environment Variables
Example Usage
See the real-time transcription script for a complete implementation.
Complete Implementation
The real-time transcription script (testing/ws_realtime_transcription.py) serves as a complete reference implementation of this WebSocket protocol. It demonstrates:
- REST API Bootstrap: Fetching initial transcript data
- WebSocket Connection: Proper authentication and subscription
- Message Processing: Handling all WebSocket event types
- Data Deduplication: Merging segments by absolute_start_time with updated_at precedence
- Speaker Grouping: Combining consecutive segments by speaker
- Live Rendering: Full re-render strategy with ANSI escape codes
- Error Handling: Graceful handling of connection issues
Raw Debug Mode
Use the --raw flag to debug WebSocket message flow:
- Display raw JSON frames in terminal with RAW: prefix
- Log all messages to testing/logs/ws_raw.log (single file, appends all runs)
Example log file line: