The Interactive Bots feature transforms the Vexa bot from a passive transcription observer into a fully interactive meeting participant. An external agent or application controls the bot via REST API to speak, read/write chat, and share visual content during a live meeting.

## Capabilities

| Capability | Description | Status |
| --- | --- | --- |
| Speak | Text-to-speech or raw audio playback into the meeting | Working |
| Chat write | Send messages to the meeting chat | Working |
| Chat read | Capture messages from the meeting chat | Working |
| Screen share | Display images, URLs, or video via screen share | Working |
| Virtual camera | Show avatar/content via the bot’s camera feed | Experimental |

## Quick Start

### 1. Send a bot to a meeting

Interactive bot capabilities are enabled by default (`voice_agent_enabled: true`). Just send a regular bot:

```bash
curl -X POST "$API_BASE/bots" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{
    "platform": "google_meet",
    "native_meeting_id": "abc-defg-hij",
    "bot_name": "AI Assistant"
  }'
```
### 2. Make the bot speak

```bash
curl -X POST "$API_BASE/bots/google_meet/abc-defg-hij/speak" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{"text": "Hello everyone, I am the meeting assistant.", "voice": "nova"}'
```
### 3. Send a chat message

```bash
curl -X POST "$API_BASE/bots/google_meet/abc-defg-hij/chat" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{"text": "Meeting summary: 3 action items identified."}'
```
### 4. Share visual content

```bash
curl -X POST "$API_BASE/bots/google_meet/abc-defg-hij/screen" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{"type": "image", "url": "https://example.com/quarterly-chart.png"}'
```

For the full endpoint reference, see Interactive Bots API.

## Output Formats

For interactive command endpoints (speak, chat write, screen, avatar), the API returns:

```json
{
  "message": "Speak command sent",
  "meeting_id": 227
}
```

The `message` value varies by endpoint ("Speak stop command sent", "Chat message sent", "Screen content command sent", "Avatar set command sent", etc.). Chat read returns:

```json
{
  "messages": [],
  "meeting_id": 227
}
```

When the meeting is not active, interactive endpoints return:

```json
{
  "detail": "No active meeting found for google_meet/abc-defg-hij"
}
```
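
For scripted agents it is convenient to wrap these calls in a thin client that surfaces the error body. A minimal sketch in Python, assuming the `requests` library and the same `API_BASE` / `API_KEY` values as the curl examples (the `send_command` helper is illustrative, not part of any SDK):

```python
import os
import requests

API_BASE = os.environ["API_BASE"]
HEADERS = {"X-API-Key": os.environ["API_KEY"]}

def send_command(platform: str, meeting_id: str, endpoint: str, payload: dict) -> dict:
    """POST an interactive command and raise if there is no active meeting."""
    resp = requests.post(
        f"{API_BASE}/bots/{platform}/{meeting_id}/{endpoint}",
        headers=HEADERS,
        json=payload,
    )
    body = resp.json()
    if "detail" in body:  # e.g. "No active meeting found for google_meet/..."
        raise RuntimeError(body["detail"])
    return body  # e.g. {"message": "Speak command sent", "meeting_id": 227}

# Usage: make the bot speak, then post a chat message
send_command("google_meet", "abc-defg-hij", "speak",
             {"text": "Hello everyone.", "voice": "nova"})
send_command("google_meet", "abc-defg-hij", "chat",
             {"text": "Summary posted."})
```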

## How It Works

When `voice_agent_enabled` is set, the bot’s audio pipeline changes: instead of feeding silence as mic input, the bot reads from a PulseAudio virtual microphone that receives TTS audio. The bot starts muted and auto-unmutes only when speaking.

### Audio Pipeline (TTS)

```text
OpenAI TTS API
  -> PCM stream (24 kHz, mono)
  -> PulseAudio virtual sink
  -> Chromium default audio source
  -> WebRTC -> meeting participants hear speech
```

The bot unmutes before playback and re-mutes once audio finishes or is interrupted.
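
Playback can be cut short from the API side: deleting the speak resource stops the current utterance (the same `DELETE /speak` interrupt is mentioned under Known Limitations, and it fires the `speak.interrupted` event listed below). A one-call sketch in Python, under the same `requests` assumptions as above:

```python
import os
import requests

API_BASE = os.environ["API_BASE"]
HEADERS = {"X-API-Key": os.environ["API_KEY"]}

# Stop the bot mid-utterance; the API responds with
# {"message": "Speak stop command sent", ...} and the bot re-mutes.
requests.delete(
    f"{API_BASE}/bots/google_meet/abc-defg-hij/speak",
    headers=HEADERS,
)
```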

### Screen Content Pipeline

```text
API request (image/url/video)
  -> Playwright renders content on Xvfb (1920x1080)
  -> Content displayed fullscreen
  -> Bot clicks "Present now" in meeting UI
  -> Participants see shared screen with content
```
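
On the bot side, the rendering step boils down to driving a browser page on the virtual display. A heavily simplified Playwright sketch of the idea, not the actual bot code; the inline HTML and timing are illustrative:

```python
# Simplified illustration of the rendering step, not the actual bot implementation.
from playwright.sync_api import sync_playwright

def render_image_fullscreen(url: str) -> None:
    with sync_playwright() as p:
        # In the bot container this browser runs on the Xvfb display (1920x1080).
        browser = p.chromium.launch(headless=False)
        page = browser.new_page(viewport={"width": 1920, "height": 1080})
        # Render the image centered on a black background, as described above.
        page.set_content(
            f'<body style="margin:0;background:#000;display:flex;'
            f'align-items:center;justify-content:center;height:100vh">'
            f'<img src="{url}" style="max-width:100%;max-height:100%"></body>'
        )
        page.wait_for_timeout(60_000)  # keep the content on screen
        browser.close()

render_image_fullscreen("https://example.com/chart.png")
```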

### Chat Pipeline

The bot interacts with the meeting’s native chat UI via DOM automation:

- Write: opens the chat panel, types the message, and sends it
- Read: captures messages from the chat panel (sender, text, timestamp)

## Chat Read & Write

Chat enables two-way text communication in the meeting alongside voice. Write a message:

```bash
curl -X POST "$API_BASE/bots/google_meet/abc-defg-hij/chat" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{"text": "Here is the meeting summary so far."}'
```

Read all messages:

```bash
curl "$API_BASE/bots/google_meet/abc-defg-hij/chat" \
  -H "X-API-Key: $API_KEY"
```

Returns an object with `messages` and `meeting_id`. Each message has `sender`, `text`, `timestamp` (Unix milliseconds), and `is_from_bot` fields. Real-time chat events are also available via WebSocket (`chat.received`, `chat.sent`).
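
Together, the two endpoints support a simple poll-and-respond loop like the sketch below; the two-second interval and timestamp-based deduplication are illustrative assumptions, and the WebSocket events above avoid polling entirely:

```python
import os
import time
import requests

API_BASE = os.environ["API_BASE"]
HEADERS = {"X-API-Key": os.environ["API_KEY"]}
CHAT = f"{API_BASE}/bots/google_meet/abc-defg-hij/chat"

seen_after = 0  # last handled timestamp (Unix ms); ordering is an assumption

while True:
    messages = requests.get(CHAT, headers=HEADERS).json().get("messages", [])
    for msg in messages:
        # Skip the bot's own messages and anything already handled
        if msg["is_from_bot"] or msg["timestamp"] <= seen_after:
            continue
        seen_after = msg["timestamp"]
        requests.post(CHAT, headers=HEADERS,
                      json={"text": f'Noted, {msg["sender"]}: "{msg["text"]}"'})
    time.sleep(2)  # illustrative polling interval
```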

## Screen Share (Showing Images & Content)

Display visual content to meeting participants via the bot’s screen share. Three content types are supported:

| Type | Description |
| --- | --- |
| `image` | Renders an image fullscreen on a black background |
| `url` | Opens a URL in a browser window (e.g., Google Slides, dashboards) |
| `video` | Plays video fullscreen with autoplay |
```bash
# Share an image
curl -X POST "$API_BASE/bots/google_meet/abc-defg-hij/screen" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{"type": "image", "url": "https://example.com/chart.png"}'

# Share a Google Slides presentation
curl -X POST "$API_BASE/bots/google_meet/abc-defg-hij/screen" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{"type": "url", "url": "https://docs.google.com/presentation/d/..."}'

# Stop sharing
curl -X DELETE "$API_BASE/bots/google_meet/abc-defg-hij/screen" \
  -H "X-API-Key: $API_KEY"
```

## Avatar (Virtual Camera)

The virtual camera feature is experimental. It works intermittently on Google Meet due to WebRTC `replaceTrack` reliability. For displaying visual content to participants, screen share is the recommended, more reliable approach.

The virtual camera uses a canvas-based approach to replace the bot’s camera feed with custom content (e.g., an avatar image or animation). When it works, participants see the avatar in the bot’s video tile instead of a blank camera. You can set or reset the avatar at any time via the API:
```bash
# Set a custom avatar
curl -X PUT "$API_BASE/bots/google_meet/abc-defg-hij/avatar" \
  -H "Content-Type: application/json" \
  -H "X-API-Key: $API_KEY" \
  -d '{"url": "https://example.com/avatar.png"}'

# Reset to default Vexa logo
curl -X DELETE "$API_BASE/bots/google_meet/abc-defg-hij/avatar" \
  -H "X-API-Key: $API_KEY"
```
See the Avatar API reference for full details. Current limitations:

- Only tested on Google Meet
- `replaceTrack` into WebRTC works intermittently
- Screen share is the recommended alternative for displaying images and content

## WebSocket Events

When interactive bot mode is enabled, additional events are published on the WebSocket connection:

| Event | Payload | Description |
| --- | --- | --- |
| `speak.started` | `{"text": "..."}` | Bot started speaking |
| `speak.completed` | | Speech playback finished |
| `speak.interrupted` | | Speech was interrupted via API |
| `chat.received` | `{"sender": "John", "text": "...", "timestamp": 1234}` | Chat message captured from a participant |
| `chat.sent` | `{"text": "..."}` | Bot sent a chat message |
| `screen.sharing_started` | `{"content_type": "image"}` | Screen sharing started |
| `screen.sharing_stopped` | | Screen sharing stopped |
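
An agent can consume these events to sequence its actions, e.g. blocking until `speak.completed` before issuing the next command. A sketch using the Python `websockets` library; the WebSocket URL and the event envelope (a JSON object with a `type` field) are assumptions here, so check the WebSocket API reference for the actual shapes:

```python
import asyncio
import json
import os
import websockets  # pip install websockets

# Hypothetical URL; see the WebSocket API reference for the real endpoint.
WS_URL = os.environ.get("WS_URL", "wss://example.invalid/ws")

async def wait_for(event_type: str) -> dict:
    """Block until an event of the given type arrives, then return it."""
    async with websockets.connect(WS_URL) as ws:
        async for raw in ws:
            event = json.loads(raw)
            # Assumed envelope: {"type": "speak.completed", "payload": {...}}
            if event.get("type") == event_type:
                return event

asyncio.run(wait_for("speak.completed"))
```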

## Platform Support

| Feature | Google Meet | Teams | Zoom |
| --- | --- | --- | --- |
| Speak (TTS) | Supported | Supported (validated API flow) | Requires Zoom SDK setup |
| Chat write | Supported | Supported (validated API flow) | Requires Zoom SDK setup |
| Chat read | Supported | Supported (validated API flow) | Requires Zoom SDK setup |
| Screen share | Supported | Supported (validated API flow) | Requires Zoom SDK setup |
| Virtual camera | Experimental | Not visible (see Known Limitations) | |

## Prerequisites

- `OPENAI_API_KEY` (string, required): OpenAI API key for text-to-speech synthesis. Passed through `docker-compose.yml` to the bot container.

PulseAudio is already configured in the bot container (`entrypoint.sh`). No manual setup is needed.

## Known Limitations

1. **Virtual camera is experimental**: the canvas-based virtual camera works intermittently on Google Meet. Screen share is more reliable for displaying visual content.
2. **Single TTS provider**: currently only OpenAI TTS is implemented. The architecture supports adding other providers.
3. **Zoom requires native SDK artifacts**: without Zoom Meeting SDK binaries, Zoom joins fail during startup.
4. **No speech queue**: rapid speak commands may overlap. Wait for the `speak.completed` WebSocket event before sending the next command, or use `DELETE /speak` to interrupt (see the sketch after this list).
5. **Teams avatar not visible**: Teams SFU returns `a=inactive` for video from anonymous guests, so the bot’s virtual camera/avatar is never visible to other Teams participants. Use screen share instead. (#124)
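
A minimal way to serialize speech, combining the REST and WebSocket pieces above (same caveats: the WebSocket URL and event envelope are assumed, not documented here):

```python
import asyncio
import json
import os
import requests
import websockets

API_BASE = os.environ["API_BASE"]
HEADERS = {"X-API-Key": os.environ["API_KEY"]}
WS_URL = os.environ.get("WS_URL", "wss://example.invalid/ws")  # hypothetical
SPEAK = f"{API_BASE}/bots/google_meet/abc-defg-hij/speak"

async def speak_in_order(lines: list[str]) -> None:
    """Send one speak command at a time, waiting for speak.completed in between."""
    async with websockets.connect(WS_URL) as ws:
        for line in lines:
            requests.post(SPEAK, headers=HEADERS, json={"text": line, "voice": "nova"})
            async for raw in ws:
                # Assumed envelope: {"type": "speak.completed", ...}
                if json.loads(raw).get("type") == "speak.completed":
                    break

asyncio.run(speak_in_order(["First point.", "Second point."]))
```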