The Interactive Bots feature transforms the Vexa bot from a passive transcription observer into a fully interactive meeting participant. An external agent or application controls the bot via REST API to speak, read/write chat, and share visual content during a live meeting.
## Capabilities

| Capability | Description | Status |
|---|---|---|
| Speak | Text-to-speech or raw audio playback into the meeting | Working |
| Chat write | Send messages to the meeting chat | Working |
| Chat read | Capture messages from the meeting chat | Working |
| Screen share | Display images, URLs, or video via screen share | Working |
| Virtual camera | Show avatar/content via the bot’s camera feed | Experimental |
## Quick Start

### Send a bot to a meeting

Interactive bot capabilities are enabled by default (`voice_agent_enabled: true`). Just send a regular bot:

```bash
curl -X POST "$API_BASE/bots" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{
    "platform": "google_meet",
    "native_meeting_id": "abc-defg-hij",
    "bot_name": "AI Assistant"
  }'
```
### Make the bot speak

```bash
curl -X POST "$API_BASE/bots/google_meet/abc-defg-hij/speak" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{"text": "Hello everyone, I am the meeting assistant.", "voice": "nova"}'
```
### Send a chat message

```bash
curl -X POST "$API_BASE/bots/google_meet/abc-defg-hij/chat" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{"text": "Meeting summary: 3 action items identified."}'
```
### Share visual content

```bash
curl -X POST "$API_BASE/bots/google_meet/abc-defg-hij/screen" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{"type": "image", "url": "https://example.com/quarterly-chart.png"}'
```
For the full endpoint reference, see Interactive Bots API.
### Response format

For interactive command endpoints (speak, chat write, screen, avatar), the API returns:

```json
{
  "message": "Speak command sent",
  "meeting_id": 227
}
```

The `message` value varies by endpoint (`Speak stop command sent`, `Chat message sent`, `Screen content command sent`, `Avatar set command sent`, etc.).
Chat read returns:

```json
{
  "messages": [],
  "meeting_id": 227
}
```
When the meeting is not active, interactive endpoints return:

```json
{
  "detail": "No active meeting found for google_meet/abc-defg-hij"
}
```
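The same contract works from any HTTP client. Below is a minimal Python sketch of a command wrapper; the helper name `send_command` is illustrative, and it assumes the inactive-meeting error arrives with a non-2xx status code (adjust if your deployment differs):

```python
# Minimal sketch of a client wrapper around the interactive command endpoints.
# Assumes the "No active meeting" error comes with a non-2xx status code.
import os
import requests

API_BASE = os.environ["API_BASE"]
HEADERS = {"X-API-Key": os.environ["API_KEY"]}

def send_command(platform: str, meeting_id: str, action: str, payload: dict) -> dict:
    """POST an interactive command and return the acknowledgement JSON."""
    url = f"{API_BASE}/bots/{platform}/{meeting_id}/{action}"
    resp = requests.post(url, headers=HEADERS, json=payload, timeout=30)
    if not resp.ok:
        # e.g. {"detail": "No active meeting found for google_meet/abc-defg-hij"}
        raise RuntimeError(resp.json().get("detail", resp.text))
    return resp.json()  # e.g. {"message": "Speak command sent", "meeting_id": 227}

ack = send_command("google_meet", "abc-defg-hij", "speak",
                   {"text": "Hello everyone.", "voice": "nova"})
print(ack["message"])
```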
## How It Works

When `voice_agent_enabled` is set, the bot’s audio pipeline changes: instead of feeding silence as mic input, the bot reads from a PulseAudio virtual microphone that receives TTS audio. The bot starts muted and auto-unmutes only when speaking.
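If a passive, transcription-only bot is needed, the flag can be flipped off at creation time. A sketch, assuming `POST /bots` accepts `voice_agent_enabled` in its payload (the documented default suggests it does; verify against the bot API reference):

```python
# Sketch: opt out of the interactive pipeline at bot creation.
# The voice_agent_enabled field in the payload is an assumption here.
import os
import requests

resp = requests.post(
    f"{os.environ['API_BASE']}/bots",
    headers={"X-API-Key": os.environ["API_KEY"]},
    json={
        "platform": "google_meet",
        "native_meeting_id": "abc-defg-hij",
        "bot_name": "Silent Scribe",
        "voice_agent_enabled": False,  # passive, transcription-only bot
    },
    timeout=30,
)
resp.raise_for_status()
```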
### Audio Pipeline (TTS)

```
OpenAI TTS API
  -> PCM stream (24 kHz, mono)
  -> PulseAudio virtual sink
  -> Chromium default audio source
  -> WebRTC -> meeting participants hear speech
```
The bot unmutes before playback and re-mutes once audio finishes or is interrupted.
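Playback can be cut short at any point with the `DELETE` variant of the speak endpoint, after which the bot re-mutes and a `speak.interrupted` event is published. For example, from Python:

```python
# Interrupt in-progress TTS playback; the bot re-mutes right away
# and a speak.interrupted event is published on the WebSocket.
import os
import requests

resp = requests.delete(
    f"{os.environ['API_BASE']}/bots/google_meet/abc-defg-hij/speak",
    headers={"X-API-Key": os.environ["API_KEY"]},
    timeout=30,
)
resp.raise_for_status()  # ack: {"message": "Speak stop command sent", ...}
```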
### Screen Content Pipeline

```
API request (image/url/video)
  -> Playwright renders content on Xvfb (1920x1080)
  -> Content displayed fullscreen
  -> Bot clicks "Present now" in meeting UI
  -> Participants see shared screen with content
```
### Chat Pipeline
The bot interacts with the meeting’s native chat UI via DOM automation:
- Write: opens the chat panel, types the message, and sends it
- Read: captures messages from the chat panel (sender, text, timestamp)
## Chat Read & Write
Chat enables two-way text communication in the meeting alongside voice.
Write a message:

```bash
curl -X POST "$API_BASE/bots/google_meet/abc-defg-hij/chat" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{"text": "Here is the meeting summary so far."}'
```
Read all messages:

```bash
curl "$API_BASE/bots/google_meet/abc-defg-hij/chat" \
  -H "X-API-Key: $API_KEY"
```
Returns an object with `messages` and `meeting_id`. Each message has `sender`, `text`, `timestamp` (Unix milliseconds), and `is_from_bot` fields. Real-time chat events are also available via WebSocket (`chat.received`, `chat.sent`).
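For example, a minimal Python poller that surfaces new participant messages, skipping the bot's own via `is_from_bot` (the 2-second interval and timestamp cursor are illustrative; subscribing to `chat.received` avoids polling entirely):

```python
# Sketch: poll the chat endpoint and print new participant messages.
import os
import time
import requests

API_BASE = os.environ["API_BASE"]
HEADERS = {"X-API-Key": os.environ["API_KEY"]}
URL = f"{API_BASE}/bots/google_meet/abc-defg-hij/chat"

seen_up_to = 0  # Unix-ms timestamp of the newest message already handled
while True:
    data = requests.get(URL, headers=HEADERS, timeout=30).json()
    for msg in data["messages"]:
        if msg["is_from_bot"] or msg["timestamp"] <= seen_up_to:
            continue
        print(f'{msg["sender"]}: {msg["text"]}')
        seen_up_to = msg["timestamp"]
    time.sleep(2)
```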
## Screen Share (Showing Images & Content)

Display visual content to meeting participants via the bot’s screen share. Three content types are supported:

| Type | Description |
|---|---|
| `image` | Renders an image fullscreen on a black background |
| `url` | Opens a URL in a browser window (e.g., Google Slides, dashboards) |
| `video` | Plays video fullscreen with autoplay |
```bash
# Share an image
curl -X POST "$API_BASE/bots/google_meet/abc-defg-hij/screen" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{"type": "image", "url": "https://example.com/chart.png"}'

# Share a Google Slides presentation
curl -X POST "$API_BASE/bots/google_meet/abc-defg-hij/screen" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{"type": "url", "url": "https://docs.google.com/presentation/d/..."}'

# Stop sharing
curl -X DELETE "$API_BASE/bots/google_meet/abc-defg-hij/screen" \
  -H "X-API-Key: $API_KEY"
```
## Avatar (Virtual Camera)

The virtual camera feature is experimental. It works intermittently on Google Meet due to WebRTC `replaceTrack` reliability. For displaying visual content to participants, screen share is recommended as the more reliable approach.

The virtual camera uses a canvas-based approach to replace the bot’s camera feed with custom content (e.g., an avatar image or animation). When working, participants see the avatar in the bot’s video tile instead of a blank camera.
You can set or reset the avatar at any time via the API:
```bash
# Set a custom avatar
curl -X PUT "$API_BASE/bots/google_meet/abc-defg-hij/avatar" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{"url": "https://example.com/avatar.png"}'

# Reset to default Vexa logo
curl -X DELETE "$API_BASE/bots/google_meet/abc-defg-hij/avatar" \
  -H "X-API-Key: $API_KEY"
```
See the Avatar API reference for full details.
Current limitations:
- Only tested on Google Meet
- `replaceTrack` into WebRTC works intermittently
- Screen share is the recommended alternative for displaying images and content
## WebSocket Events

When interactive bot mode is enabled, additional events are published on the WebSocket connection:

| Event | Payload | Description |
|---|---|---|
| `speak.started` | `{"text": "..."}` | Bot started speaking |
| `speak.completed` | — | Speech playback finished |
| `speak.interrupted` | — | Speech was interrupted via API |
| `chat.received` | `{"sender": "John", "text": "...", "timestamp": 1234}` | Chat message captured from a participant |
| `chat.sent` | `{"text": "..."}` | Bot sent a chat message |
| `screen.sharing_started` | `{"content_type": "image"}` | Screen sharing started |
| `screen.sharing_stopped` | — | Screen sharing stopped |
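A minimal consumer sketch using the Python `websockets` package. The connection URL, auth query parameter, and the `{"type": ..., "payload": ...}` event envelope below are assumptions; consult the WebSocket documentation for the actual endpoint and message shape:

```python
# Sketch: consume interactive-bot events. The URL and event envelope
# are placeholders; substitute the real WebSocket endpoint and schema.
import asyncio
import json
import websockets  # pip install websockets

async def listen(ws_url: str) -> None:
    async with websockets.connect(ws_url) as ws:
        async for raw in ws:
            event = json.loads(raw)
            if event.get("type") == "chat.received":
                print("participant said:", event["payload"]["text"])
            elif event.get("type") == "speak.completed":
                print("bot finished speaking")

asyncio.run(listen("wss://example.invalid/ws?api_key=..."))
```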
## Platform Support

| Feature | Google Meet | Teams | Zoom |
|---|---|---|---|
| Speak (TTS) | Supported | Supported (validated API flow) | Requires Zoom SDK setup |
| Chat write | Supported | Supported (validated API flow) | Requires Zoom SDK setup |
| Chat read | Supported | Supported (validated API flow) | Requires Zoom SDK setup |
| Screen share | Supported | Supported (validated API flow) | Requires Zoom SDK setup |
| Virtual camera | Experimental | — | — |
## Prerequisites

- OpenAI API key for text-to-speech synthesis, passed through `docker-compose.yml` to the bot container.
- PulseAudio is already configured in the bot container (`entrypoint.sh`). No manual setup is needed.
## Known Limitations

- Virtual camera is experimental — the canvas-based virtual camera works intermittently on Google Meet. Screen share is more reliable for displaying visual content.
- Single TTS provider — currently only OpenAI TTS is implemented. The architecture supports adding other providers.
- Zoom requires native SDK artifacts — without Zoom Meeting SDK binaries, Zoom joins fail during startup.
- No speech queue — rapid speak commands may overlap. Wait for the `speak.completed` WebSocket event before sending the next command (see the sequencing sketch after this list), or use `DELETE /speak` to interrupt.
- Teams avatar not visible — Teams SFU returns `a=inactive` for video from anonymous guests, so the bot’s virtual camera/avatar is never visible to other Teams participants. Use screen share instead. (#124)
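For the speech-queue limitation, here is a sequencing sketch that sends the next utterance only after the previous one finishes. Same caveats as the listener example above: the WebSocket URL, auth, and event envelope are placeholders.

```python
# Sketch: speak several lines in order by waiting for speak.completed
# (or speak.interrupted) between commands.
import asyncio
import json
import os

import requests
import websockets  # pip install websockets

API_BASE = os.environ["API_BASE"]
HEADERS = {"X-API-Key": os.environ["API_KEY"]}
SPEAK = f"{API_BASE}/bots/google_meet/abc-defg-hij/speak"

async def speak_in_order(ws_url: str, lines: list[str]) -> None:
    async with websockets.connect(ws_url) as ws:
        for line in lines:
            requests.post(SPEAK, headers=HEADERS, timeout=30,
                          json={"text": line, "voice": "nova"}).raise_for_status()
            # Block until this utterance finishes (or is interrupted).
            while True:
                event = json.loads(await ws.recv())
                if event.get("type") in ("speak.completed", "speak.interrupted"):
                    break

asyncio.run(speak_in_order("wss://example.invalid/ws?api_key=...",
                           ["First point.", "Second point."]))
```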