This page covers Vexa’s architecture, resource requirements per bot, and how to scale for concurrent meetings.

Architecture Overview

Vexa follows a one-browser-per-bot model. Each meeting bot runs as an isolated container with its own Chromium instance:
 Meeting Bot (per meeting)       Shared Services
┌───────────────────────┐    ┌─────────────────────────────┐
│ Chromium (Playwright) │    │ API Gateway (port 8056)     │
│ Audio capture         │───>│ Meeting API (port 8080)     │
│ Speaker detection     │    │ Runtime API (port 8090)     │
│ Transcription client  │    │ Transcription Service (GPU) │
└───────────────────────┘    │ Redis, PostgreSQL           │
                             └─────────────────────────────┘
Bot containers are ephemeral — they are created when you request a bot and destroyed after the meeting ends (or after an idle timeout).
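For example, requesting a bot is a single API call; a new container spins up for that meeting and is torn down afterward. The endpoint path, header name, and payload fields below are assumptions to illustrate the flow — check your Vexa API reference for the exact shapes.

```shell
# Hypothetical request shape: ask the API Gateway for a bot in a meeting.
# $API_BASE and $API_KEY are placeholders for your deployment's values.
curl -X POST "$API_BASE/bots" \
  -H "X-API-Key: $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"platform": "google_meet", "native_meeting_id": "abc-defg-hij"}'
```

The container created by this call is the left-hand box in the diagram above; it exists only for the duration of that one meeting.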

Resource Requirements Per Bot

| Resource | Request (steady-state) | Limit (peak) |
| --- | --- | --- |
| CPU | 250m | 1000m |
| Memory | 600 Mi | 1 Gi |
| Shared memory | 2 GB (/dev/shm) | 2 GB |
These numbers were measured on production workloads (March 2026). The 2 GB shared memory is required by Chromium for canvas and media operations.
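If you ever need to run a bot container by hand (for debugging), these numbers map directly onto Docker's resource flags. The image name below is hypothetical — in normal operation the Runtime API creates bot containers for you.

```shell
# --shm-size=2g : Chromium needs 2 GB /dev/shm for canvas and media operations
# --memory=1g   : matches the 1 Gi peak memory limit
# --cpus=1      : matches the 1000m CPU limit
docker run --rm --shm-size=2g --memory=1g --cpus=1 vexa/bot:latest
```

Without --shm-size, Docker defaults /dev/shm to 64 MB, which is far below what Chromium needs and a common cause of tab crashes.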

Estimating capacity

On a node with 4 CPU cores and 8 GB RAM:
| Constraint | Concurrent bots |
| --- | --- |
| CPU (by limit, worst case) | 4 |
| CPU (by request, typical) | 16 |
| Memory (by limit) | 8 |
| Practical recommendation | 4-8 |
Scale horizontally by adding more nodes rather than increasing node size.
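The capacity figures above follow from simple division: node capacity divided by the per-bot figure, taking the tighter of the CPU and memory bounds. A quick sketch of the arithmetic:

```shell
#!/usr/bin/env bash
# Capacity estimate for a 4-core / 8 GB node, using the per-bot numbers above.
cores=4; mem_gi=8
cpu_limit_m=1000    # per-bot CPU limit (1000m)
cpu_request_m=250   # per-bot CPU request (250m)
mem_limit_gi=1      # per-bot memory limit (1 Gi)

by_cpu_limit=$(( cores * 1000 / cpu_limit_m ))      # worst case: 4
by_cpu_request=$(( cores * 1000 / cpu_request_m ))  # typical: 16
by_mem=$(( mem_gi / mem_limit_gi ))                 # 8

# Worst-case concurrency is the tighter of the two hard limits.
echo $(( by_cpu_limit < by_mem ? by_cpu_limit : by_mem ))
```

Running this prints 4, matching the worst-case row; the practical 4-8 range sits between the worst-case and memory-bound figures.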

Orchestration Backends

Vexa’s Runtime API supports three container backends, configured via ORCHESTRATOR_BACKEND:
| Backend | Value | CPU limits | Memory limits | Best for |
| --- | --- | --- | --- | --- |
| Kubernetes | kubernetes | Enforced (pod limits) | Enforced (OOMKill) | Production |
| Docker | docker | Not enforced | Enforced (cgroups) | Single-host, dev |
| Process | process | Not enforced | Best-effort | Vexa Lite, lightweight dev |
The Docker backend silently ignores CPU limits, so bot containers get unrestricted CPU access. Use the Kubernetes backend for production workloads where resource isolation matters.
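The backend is selected with a single environment variable on the Runtime API. Where exactly you set it depends on your deployment (values.yaml, .env, or compose file); the values come from the table above.

```shell
# Select the orchestration backend for the Runtime API.
export ORCHESTRATOR_BACKEND=kubernetes   # production: CPU and memory enforced
# export ORCHESTRATOR_BACKEND=docker     # single-host dev: memory limits only
# export ORCHESTRATOR_BACKEND=process    # Vexa Lite: best-effort isolation
```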

Kubernetes deployment

Vexa provides Helm charts at deploy/helm/:
# Install
helm install vexa deploy/helm/charts/vexa \
  -f your-values.yaml \
  --namespace vexa --create-namespace

# Upgrade
helm upgrade vexa deploy/helm/charts/vexa \
  -f your-values.yaml \
  --namespace vexa
Service resource allocations in the Helm chart:
| Service | CPU request | Memory limit |
| --- | --- | --- |
| api-gateway | 100m | 512 Mi |
| meeting-api | 200m | 1 Gi |
| runtime-api | 100m | 512 Mi |
| redis | 100m | 1 Gi |
| postgres | 200m | 4 Gi |
Bot containers are dynamically created as Kubernetes pods by the Runtime API — they are not part of the Helm release.
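Because bot pods live outside the Helm release, helm status will not show them; inspect them directly with kubectl instead. The label selector below is hypothetical — the actual labels depend on the Runtime API's pod template.

```shell
# Long-lived services managed by the Helm release:
kubectl get pods -n vexa

# Bot pods created dynamically by the Runtime API (label is an assumption):
kubectl get pods -n vexa -l app=vexa-bot
```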

Docker Compose deployment

For single-host development and testing:
cd vexa
make all
Memory limits per service are defined in deploy/compose/docker-compose.yml. See Docker Compose Deployment for the full guide.

Bot Lifecycle and Cleanup

Bot containers have automatic timeouts:
| Timeout | Default | Description |
| --- | --- | --- |
| Waiting room | 15 min | Bot leaves if not admitted within 15 minutes |
| Everyone left | 15 min | Bot leaves 15 minutes after the last participant leaves |
| No one joined | 2 min | Bot leaves if no participant joins within 2 minutes |
| Idle TTL | 5-60 min | Container removed after idle timeout (configurable per profile) |
Containers are automatically cleaned up after meetings end. The Runtime API uses Redis-backed heartbeats to track liveness.
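You can observe the heartbeat mechanism from the Redis side. The key pattern below is purely an assumption for illustration — the actual key names depend on the Runtime API implementation.

```shell
# Inspect liveness state in Redis (key pattern is hypothetical):
redis-cli --scan --pattern 'bot:*:heartbeat'

# A bot whose heartbeat key expires is considered dead and its
# container becomes eligible for cleanup.
```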

Per-User Concurrency Limits

The Admin API supports per-user bot limits:
curl -X POST "$API_BASE/admin/users" \
  -H "X-Admin-API-Key: $ADMIN_TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"email": "user@example.com", "max_concurrent_bots": 5}'
Set max_concurrent_bots to limit how many simultaneous bots a user can run.

Transcription Service Scaling

If self-hosting transcription, a single GPU handles approximately 2 concurrent meetings with large-v3-turbo. The service returns 503 when the queue is full. For higher concurrency:
  • Run multiple transcription service replicas behind a load balancer
  • Use smaller models (small, base) for higher throughput at lower quality
  • Use INT8 compute type (default) for 50-60% VRAM reduction
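Clients should treat the 503 as backpressure rather than a hard failure. A minimal retry sketch, assuming a $TRANSCRIBE_URL endpoint and a /health path (both placeholders — adapt to your actual client):

```shell
# Back off and retry while the transcription service sheds load with 503.
for attempt in 1 2 3 4 5; do
  code=$(curl -s -o /dev/null -w '%{http_code}' "$TRANSCRIBE_URL/health")
  if [ "$code" != "503" ]; then
    break
  fi
  sleep $(( attempt * 2 ))   # linear backoff: 2s, 4s, 6s, ...
done
```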
See Transcription Quality for model selection details.

Deployment Options Summary

| Option | Bots | Scaling | Complexity |
| --- | --- | --- | --- |
| Vexa Lite | Process-based (in-container) | Vertical only | Lowest |
| Docker Compose | Docker containers | Single-host | Low |
| Helm / Kubernetes | K8s pods | Horizontal | Medium |