The Interactive Bots feature transforms the Vexa bot from a passive transcription observer into a fully interactive meeting participant. An external agent or application controls the bot via the REST API to speak, read and write chat, and share visual content during a live meeting.
Capabilities
| Capability | Description | Status |
|---|---|---|
| Speak | Text-to-speech or raw audio playback into the meeting | Working |
| Chat write | Send messages to the meeting chat | Working |
| Chat read | Capture messages from the meeting chat | Working |
| Screen share | Display images, URLs, or video via screen share | Working |
| Virtual camera | Show avatar/content via the bot’s camera feed | Experimental |
Quick Start
Send a bot to a meeting
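Here is a minimal sketch of building the request with Python's standard library. The base URL, the `X-API-Key` header, and the `platform` / `native_meeting_id` body fields are assumptions for illustration, so check the Bots API reference for the exact contract. `voice_agent_enabled` is included only to make the default visible.

```python
import json
import urllib.request

# Assumed values for this sketch: replace with your deployment's URL and key.
BASE_URL = "https://api.cloud.vexa.ai"
API_KEY = "YOUR_API_KEY"


def build_bot_request(platform: str, native_meeting_id: str,
                      voice_agent_enabled: bool = True) -> urllib.request.Request:
    """Build (but do not send) a POST /bots request.

    Field names other than voice_agent_enabled are assumptions, not confirmed API.
    """
    body = {
        "platform": platform,
        "native_meeting_id": native_meeting_id,
        # Defaults to true on the server; set here only to make it explicit.
        "voice_agent_enabled": voice_agent_enabled,
    }
    return urllib.request.Request(
        url=f"{BASE_URL}/bots",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
        method="POST",
    )


req = build_bot_request("google_meet", "abc-defg-hij")  # hypothetical meeting code
print(req.get_method(), req.full_url)
# To actually send it: urllib.request.urlopen(req)
```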
Interactive bot capabilities are enabled by default (`voice_agent_enabled: true`). Just send a regular bot.

Output Formats
For interactive command endpoints (speak, chat write, screen, avatar), the API returns a response with a `message` field. The message text varies by endpoint (`Speak stop command sent`, `Chat message sent`, `Screen content command sent`, `Avatar set command sent`, etc.).
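Chat read returns the captured `messages` together with the `meeting_id`. The message fields are described under Chat Read & Write.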
How It Works
When `voice_agent_enabled` is set, the bot’s audio pipeline changes: instead of feeding silence as mic input, the bot reads from a PulseAudio virtual microphone that receives TTS audio. The bot starts muted and auto-unmutes only when speaking.
Audio Pipeline (TTS)
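The mute behavior described above can be pictured as a small state machine. The class below is a conceptual model only, not Vexa's implementation. `MicStateModel` is a hypothetical name, and mapping its transitions onto the `speak.*` WebSocket events is an illustrative assumption.

```python
class MicStateModel:
    """Conceptual model of the bot's mute state (hypothetical, not Vexa code).

    Transitions are keyed on the speak.* event names documented for the
    WebSocket; any other event leaves the state unchanged.
    """

    def __init__(self) -> None:
        self.muted = True  # the bot joins the meeting muted

    def handle(self, event_type: str) -> bool:
        if event_type == "speak.started":
            self.muted = False  # auto-unmute while TTS audio plays
        elif event_type in ("speak.completed", "speak.interrupted"):
            self.muted = True  # back to muted once playback ends
        return self.muted


mic = MicStateModel()
print([mic.handle(e) for e in ("speak.started", "speak.completed")])
```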
Screen Content Pipeline
Chat Pipeline
The bot interacts with the meeting’s native chat UI via DOM automation:

- Write: opens the chat panel, types the message, and sends it
- Read: captures messages from the chat panel (sender, text, timestamp)
Chat Read & Write
Chat enables two-way text communication in the meeting alongside voice: the bot can post messages to the meeting chat and read back what participants have written. The chat read response contains `messages` and `meeting_id`. Each message has `sender`, `text`, `timestamp` (Unix milliseconds), and `is_from_bot` fields. Real-time chat events are also available via WebSocket (`chat.received`, `chat.sent`).
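This minimal sketch consumes the chat read response. It assumes the response arrives as JSON with the fields listed above. The sample payload and the `participant_messages` helper are made up for illustration. The function skips the bot's own messages and converts the millisecond timestamps to UTC datetimes.

```python
import json
from datetime import datetime, timezone

# Made-up sample; the shape follows the documented fields.
SAMPLE = """{
  "meeting_id": 42,
  "messages": [
    {"sender": "John", "text": "Can you share the deck?",
     "timestamp": 1700000000000, "is_from_bot": false},
    {"sender": "Vexa Bot", "text": "Sharing now.",
     "timestamp": 1700000005000, "is_from_bot": true}
  ]
}"""


def participant_messages(raw: str) -> list[tuple[datetime, str, str]]:
    """Return (time, sender, text) for every message not sent by the bot."""
    payload = json.loads(raw)
    result = []
    for msg in payload["messages"]:
        if msg["is_from_bot"]:
            continue  # skip the bot's own messages
        # timestamp is Unix milliseconds: divide by 1000 for seconds
        when = datetime.fromtimestamp(msg["timestamp"] / 1000, tz=timezone.utc)
        result.append((when, msg["sender"], msg["text"]))
    return result


for when, sender, text in participant_messages(SAMPLE):
    print(when.isoformat(), sender, text)
```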
Screen Share (Showing Images & Content)
Display visual content to meeting participants via the bot’s screen share. Three content types are supported:

| Type | Description |
|---|---|
| `image` | Renders an image fullscreen on a black background |
| `url` | Opens a URL in a browser window (e.g., Google Slides, dashboards) |
| `video` | Plays video fullscreen with autoplay |
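Here is a hedged sketch of a screen share request. Several details are assumptions: the `/bots/{platform}/{meeting_id}/screen` path, the `type` and `url` body fields, the base URL, and the `X-API-Key` header. Only the three content types come from the table above. The helper validates the type locally before building the request.

```python
import json
import urllib.request

# Assumptions for this sketch: URL, header, path, and body field names are
# hypothetical; check the Interactive Bots API Reference for the real ones.
BASE_URL = "https://api.cloud.vexa.ai"
API_KEY = "YOUR_API_KEY"
SCREEN_TYPES = {"image", "url", "video"}  # the three documented content types


def build_screen_request(platform: str, meeting_id: str,
                         content_type: str, content_url: str) -> urllib.request.Request:
    """Build (but do not send) a request to show content via screen share."""
    if content_type not in SCREEN_TYPES:
        raise ValueError(f"unsupported content type: {content_type!r}")
    body = {"type": content_type, "url": content_url}
    return urllib.request.Request(
        url=f"{BASE_URL}/bots/{platform}/{meeting_id}/screen",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-API-Key": API_KEY},
        method="POST",
    )


req = build_screen_request("google_meet", "abc-defg-hij", "image",
                           "https://example.com/chart.png")
print(req.get_method(), req.full_url)
```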
Avatar (Virtual Camera)
The virtual camera uses a canvas-based approach to replace the bot’s camera feed with custom content (e.g., an avatar image or animation). When it works, participants see the avatar in the bot’s video tile instead of a blank camera. You can set or reset the avatar at any time via the API.

The virtual camera is experimental:

- Only tested on Google Meet
- `replaceTrack` into WebRTC works intermittently
- Screen share is the recommended alternative for displaying images and content
WebSocket Events
When interactive bot mode is enabled, additional events are published on the WebSocket connection:

| Event | Payload | Description |
|---|---|---|
| `speak.started` | `{"text": "..."}` | Bot started speaking |
| `speak.completed` | (none) | Speech playback finished |
| `speak.interrupted` | (none) | Speech was interrupted via API |
| `chat.received` | `{"sender": "John", "text": "...", "timestamp": 1234}` | Chat message captured from a participant |
| `chat.sent` | `{"text": "..."}` | Bot sent a chat message |
| `screen.sharing_started` | `{"content_type": "image"}` | Screen sharing started |
| `screen.sharing_stopped` | (none) | Screen sharing stopped |
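If you consume these events from Python, a small router can keep handlers per event name. This sketch assumes a frame shape of `{"type": ..., "payload": {...}}`, which may differ from the actual envelope described on the WebSocket page. Opening the socket connection itself is left out. `EventRouter` is a hypothetical name.

```python
import json
from typing import Callable

Handler = Callable[[dict], None]


class EventRouter:
    """Route interactive-bot WebSocket events to handlers by event name.

    Assumption: each frame is JSON shaped like {"type": <event>, "payload": {...}}.
    """

    def __init__(self) -> None:
        self._handlers: dict[str, list[Handler]] = {}

    def on(self, event: str, handler: Handler) -> None:
        self._handlers.setdefault(event, []).append(handler)

    def dispatch(self, frame: str) -> bool:
        """Call every handler registered for the frame's event; report if any ran."""
        msg = json.loads(frame)
        handlers = self._handlers.get(msg.get("type"), [])
        for handler in handlers:
            # Events such as speak.completed carry no payload, so default to {}
            handler(msg.get("payload") or {})
        return bool(handlers)


router = EventRouter()
received = []
router.on("chat.received", lambda p: received.append((p["sender"], p["text"])))
router.dispatch('{"type": "chat.received", '
                '"payload": {"sender": "John", "text": "hi", "timestamp": 1234}}')
print(received)
```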
Platform Support
| Feature | Google Meet | Teams | Zoom |
|---|---|---|---|
| Speak (TTS) | Supported | Beta (requires M365 Business Basic) | Requires Zoom SDK setup |
| Chat write | Supported | Beta (requires M365 Business Basic) | Requires Zoom SDK setup |
| Chat read | Supported | Beta (requires M365 Business Basic) | Requires Zoom SDK setup |
| Screen share | Supported | Beta (requires M365 Business Basic) | Requires Zoom SDK setup |
| Virtual camera | Experimental | — | — |
Prerequisites
- OpenAI API key for text-to-speech synthesis, passed through `deploy/compose/docker-compose.yml` to the bot container.
- The rest of the bot container's audio setup is handled automatically at startup (`entrypoint.sh`). No manual setup is needed.
Known Limitations
- Virtual camera is experimental: the canvas-based virtual camera works intermittently on Google Meet. Screen share is more reliable for displaying visual content.
- Single TTS provider: currently only OpenAI TTS is implemented. The architecture supports adding other providers.
- Zoom requires native SDK artifacts: without Zoom Meeting SDK binaries, Zoom joins fail during startup.
- No speech queue: rapid speak commands may overlap. Wait for the `speak.completed` WebSocket event before sending the next command, or use `DELETE /speak` to interrupt.
- Teams avatar not visible: the Teams SFU returns `a=inactive` for video from anonymous guests, so the bot’s virtual camera/avatar is never visible to other Teams participants. Use screen share instead. (#124)
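Until a server-side queue exists, one workaround is to queue speech on the client. The client releases the next line only after `speak.completed` (or `speak.interrupted`) arrives. The `SpeechQueue` class below is a hypothetical sketch. `send_speak` stands in for whatever function issues your speak request, and `on_event` expects the event names from the WebSocket table.

```python
from collections import deque
from typing import Callable


class SpeechQueue:
    """Client-side queue that keeps at most one speak command in flight.

    send_speak is a placeholder for your own speak-request function
    (hypothetical); feed incoming WebSocket event names to on_event.
    """

    def __init__(self, send_speak: Callable[[str], None]) -> None:
        self._send = send_speak
        self._pending: deque[str] = deque()
        self._busy = False

    def say(self, text: str) -> None:
        self._pending.append(text)
        self._pump()

    def on_event(self, event: str) -> None:
        # Either event means the bot has stopped speaking.
        if event in ("speak.completed", "speak.interrupted"):
            self._busy = False
            self._pump()

    def _pump(self) -> None:
        # Send the next line only when nothing is currently playing.
        if not self._busy and self._pending:
            self._busy = True
            self._send(self._pending.popleft())


sent = []
q = SpeechQueue(sent.append)
q.say("Hello everyone.")
q.say("Here is today's agenda.")
print(sent)  # only the first line is sent until speak.completed arrives
q.on_event("speak.completed")
print(sent)
```

An interrupt also releases the queue in this sketch. A caller that uses `DELETE /speak` to silence the bot entirely would probably want to drop the pending lines as well.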
Related
- Interactive Bots API Reference: full endpoint documentation
- WebSocket: real-time event streaming
- Bots API: requesting bots with `voice_agent_enabled`
- Concepts: meeting/bot/session model