┌─────────────────┐    ┌──────────────────┐    ┌─────────────────┐
│    Frontend     │    │     Backend      │    │     Memory      │
│    (Next.js)    │◄──►│    (FastAPI)     │◄──►│    (SQLite)     │
│   Port: 3000    │    │    Port: 8001    │    │    Database     │
└─────────────────┘    └──────────────────┘    └─────────────────┘
         │                      │                       │
         │             ┌──────────────────┐             │
         └────────────►│    Playground    │◄────────────┘
                       │   (WebSocket)    │
                       │    Port: 8765    │
                       └──────────────────┘
                                │
                   ┌─────────────────────────┐
                   │    Voice Processing     │
                   │  • STT (Whisper)        │
                   │  • TTS (Multi-Engine)   │
                   │  • LLM (LM Studio)      │
                   └─────────────────────────┘
- Framework: Next.js (React 18) with TypeScript
- Styling: Tailwind CSS with custom cyberpunk theme
- Animations: Framer Motion for 3D effects
- Audio: Web Audio API for interactive sounds
- State: React hooks with WebSocket integration
Key Pages:
- / - Homepage with feature cards
- /memory-playground - Live voice chat interface
- /voice-clone - Voice cloning studio
- /conversation - Multi-speaker conversations
- Framework: FastAPI with async/await
- Database: SQLite with custom memory service
- Voice Processing: Multi-engine TTS integration
- GPU Acceleration: ONNX Runtime with CUDA
- API Documentation: Auto-generated OpenAPI/Swagger
Core Services:
- STTService - Whisper-based speech recognition
- TTSService - Multi-engine text-to-speech
- LLMService - LM Studio integration
- VibeVoiceService - Voice cloning and synthesis
- ConversationEngine - Multi-speaker dialogues
- Protocol: WebSocket for real-time communication
- Audio: PyAudio for recording/playback
- Recognition: SpeechRecognition with Google API
- Memory: Integration with SQLite memory service
- Multi-Engine: Support for VibeVoice, KaniTTS, IndexTTS2
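The Playground's wire format is not specified here. As a minimal sketch, assuming JSON text frames with base64-encoded audio, a message envelope could look like the following. The `PlaygroundMessage` class, the message type names, and all field names are hypothetical and chosen only for illustration.

```python
import base64
import json
from dataclasses import dataclass

# Hypothetical message types; the real Playground protocol may differ.
MESSAGE_TYPES = {"audio_chunk", "transcript", "response_audio", "error"}


@dataclass
class PlaygroundMessage:
    type: str
    session_id: str
    text: str = ""
    audio: bytes = b""

    def to_json(self) -> str:
        """Serialize to a JSON text frame; raw audio is base64-encoded."""
        if self.type not in MESSAGE_TYPES:
            raise ValueError(f"unknown message type: {self.type}")
        return json.dumps({
            "type": self.type,
            "session_id": self.session_id,
            "text": self.text,
            "audio": base64.b64encode(self.audio).decode("ascii"),
        })

    @classmethod
    def from_json(cls, frame: str) -> "PlaygroundMessage":
        """Parse a JSON text frame, rejecting unknown message types."""
        data = json.loads(frame)
        if data.get("type") not in MESSAGE_TYPES:
            raise ValueError(f"unknown message type: {data.get('type')}")
        return cls(
            type=data["type"],
            session_id=data["session_id"],
            text=data.get("text", ""),
            audio=base64.b64decode(data.get("audio", "")),
        )
```

Validating the `type` field on both ends keeps malformed frames from reaching the audio pipeline. For large audio payloads, binary WebSocket frames would avoid the base64 overhead.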
- Database: SQLite with conversation tables
- Sessions: User session management
- Context: Conversation history with search
- Persistence: Cross-session memory retention
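The memory features listed above (sessions, searchable history, cross-session retention) can be sketched with the standard `sqlite3` module. The schema, the `MemoryStore` class, and its method names are assumptions for illustration, not the project's actual memory service.

```python
import sqlite3

# Hypothetical schema: table and column names are illustrative only.
SCHEMA = """
CREATE TABLE IF NOT EXISTS sessions (
    id TEXT PRIMARY KEY,
    user_id TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS messages (
    id INTEGER PRIMARY KEY AUTOINCREMENT,
    session_id TEXT NOT NULL REFERENCES sessions(id),
    role TEXT NOT NULL,
    content TEXT NOT NULL,
    created_at TEXT DEFAULT CURRENT_TIMESTAMP
);
"""


class MemoryStore:
    def __init__(self, path: str = ":memory:") -> None:
        # Pass a file path instead of ":memory:" to persist across restarts.
        self.conn = sqlite3.connect(path)
        self.conn.executescript(SCHEMA)

    def start_session(self, session_id: str, user_id: str) -> None:
        self.conn.execute(
            "INSERT OR IGNORE INTO sessions (id, user_id) VALUES (?, ?)",
            (session_id, user_id),
        )
        self.conn.commit()

    def add_message(self, session_id: str, role: str, content: str) -> None:
        self.conn.execute(
            "INSERT INTO messages (session_id, role, content) VALUES (?, ?, ?)",
            (session_id, role, content),
        )
        self.conn.commit()

    def search(self, user_id: str, term: str, limit: int = 5) -> list:
        """Return (role, content) rows matching `term` across all of a user's sessions."""
        rows = self.conn.execute(
            """
            SELECT m.role, m.content
            FROM messages AS m
            JOIN sessions AS s ON s.id = m.session_id
            WHERE s.user_id = ? AND m.content LIKE ?
            ORDER BY m.id DESC
            LIMIT ?
            """,
            (user_id, f"%{term}%", limit),
        )
        return rows.fetchall()
```

Because the search joins on `user_id` rather than `session_id`, a new session can retrieve context from earlier ones, which is one way to get cross-session retention.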
User Voice Input
↓
WebSocket → Playground
↓
STT (Whisper) → Text
↓
Memory Service → Context
↓
LLM (LM Studio) → Response
↓
TTS (Multi-Engine) → Audio
↓
WebSocket → Frontend → User
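The voice chat flow above can be read as a chain of four stages. The sketch below shows that chaining with each stage injected as a plain callable; the `VoiceChatPipeline` class, the callable signatures, and the stub stages are assumptions made for illustration, and an in-memory list stands in for the memory service.

```python
from dataclasses import dataclass, field
from typing import Callable, List

# Hypothetical stage signatures; real engines can be swapped in behind them.
Transcribe = Callable[[bytes], str]
Generate = Callable[[str, List[str]], str]
Synthesize = Callable[[str], bytes]


@dataclass
class VoiceChatPipeline:
    transcribe: Transcribe
    generate: Generate
    synthesize: Synthesize
    history: List[str] = field(default_factory=list)  # stand-in for the memory service

    def handle_turn(self, audio_in: bytes) -> bytes:
        text = self.transcribe(audio_in)        # STT -> text
        context = list(self.history)            # memory service -> context
        reply = self.generate(text, context)    # LLM -> response
        self.history += [f"user: {text}", f"assistant: {reply}"]
        return self.synthesize(reply)           # TTS -> audio


# Stub stages so the sketch runs without any models installed.
def fake_stt(audio: bytes) -> str:
    return audio.decode("utf-8")


def fake_llm(text: str, context: List[str]) -> str:
    return f"echo({len(context)}): {text}"


def fake_tts(text: str) -> bytes:
    return text.encode("utf-8")
```

Calling `VoiceChatPipeline(fake_stt, fake_llm, fake_tts).handle_turn(b"hello")` runs one full turn. Injecting the stages keeps the orchestration testable without a GPU or a running LM Studio instance.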
Audio Recording
↓
WebSocket → Playground
↓
Backend API → VibeVoice
↓
Model Training → Voice Clone
↓
Database Storage → Voice Profile
↓
TTS Integration → Available Voice
- Python 3.9+ - Backend services
- Node.js 18+ - Frontend build system
- TypeScript - Type-safe development
- SQLite - Lightweight database
- WebSocket - Real-time communication
- Whisper - Speech-to-text
- VibeVoice - Voice cloning
- LM Studio - Local LLM inference
- ONNX Runtime - GPU acceleration
- PyTorch - Deep learning framework
- PyAudio - Audio I/O
- SpeechRecognition - STT wrapper
- Pydub - Audio manipulation
- SoundFile - Audio file handling
- librosa - Audio analysis
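- Local Processing - STT (Whisper), TTS, and LLM inference run locally; the Playground's Google-API speech recognition is the one cloud-backed path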
- Encrypted Storage - SQLite with encryption options
- Session Management - Secure WebSocket connections
- File Validation - Audio file type checking
- CORS Configuration - Controlled cross-origin requests
- Input Validation - Pydantic models for data validation
- Error Handling - Secure error messages
- Rate Limiting - Request throttling (configurable)
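Two of the items above, file validation and rate limiting, are easy to illustrate with the standard library. This is a sketch under stated assumptions: it checks only WAV uploads by their RIFF/WAVE magic bytes, and it uses a token bucket for throttling. The function and class names are hypothetical, and the project's actual checks and limiter may work differently.

```python
import time


def looks_like_wav(data: bytes) -> bool:
    """Check the RIFF/WAVE magic bytes instead of trusting the file extension."""
    return len(data) >= 12 and data[0:4] == b"RIFF" and data[8:12] == b"WAVE"


class TokenBucket:
    """Allow `rate` requests per second on average, with bursts up to `capacity`."""

    def __init__(self, rate: float, capacity: int, clock=time.monotonic) -> None:
        self.rate = rate
        self.capacity = capacity
        self.clock = clock          # injectable so tests can control time
        self.tokens = float(capacity)
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill in proportion to elapsed time, capped at the bucket size.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Keeping one bucket per client (for example, keyed by session ID) makes the `rate` and `capacity` values the configurable part of the throttle.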
- CUDA Support - RTX 5090 optimization
- ONNX Models - Optimized inference
- Model Caching - Prevent reloading
- Batch Processing - Efficient GPU utilization
- Connection Pooling - Database connections
- Audio Streaming - Chunked processing
- Cache Strategy - Voice sample caching
- Cleanup Routines - Temporary file management
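Model caching and chunked audio processing from the list above can be sketched as follows. `load_model` is a stand-in that returns a dictionary instead of a real model, and the chunk size is an arbitrary example value; both are assumptions for illustration only.

```python
from functools import lru_cache
from typing import Iterator


@lru_cache(maxsize=4)
def load_model(name: str) -> dict:
    """Stand-in for an expensive model load; repeated calls reuse the cached object."""
    # A real loader would read weights from disk and move them to the GPU here.
    return {"name": name}


def iter_chunks(audio: bytes, chunk_size: int = 4096) -> Iterator[bytes]:
    """Yield fixed-size chunks so audio can be processed or sent incrementally."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    for start in range(0, len(audio), chunk_size):
        yield audio[start:start + chunk_size]
```

With `lru_cache`, the second call to `load_model` with the same name returns the same object without reloading. Bounding `maxsize` matters on a GPU, since every cached model keeps its memory allocated.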
- Microservices - Independent service scaling
- Load Balancing - Multiple backend instances
- Database Sharding - User-based partitioning
- CDN Integration - Static asset delivery
- GPU Scaling - Multi-GPU support
- Memory Optimization - Efficient data structures
- CPU Utilization - Async processing
- Storage Optimization - Compressed audio formats
- Health Checks - Service availability
- Performance Metrics - Response times
- Error Tracking - Exception monitoring
- Usage Analytics - Feature utilization
- GPU Utilization - CUDA metrics
- Memory Usage - RAM and VRAM tracking
- Disk I/O - Storage performance
- Network Latency - WebSocket performance
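As a rough sketch of the health-check and response-time items above, the snippet below probes the three service ports from the architecture diagram at the TCP level and records wall-clock timings. The `localhost` host, the helper names, and the `LatencyRecorder` class are assumptions; a production setup would more likely expose dedicated health endpoints and export metrics to a monitoring system.

```python
import socket
import time
from contextlib import contextmanager
from typing import Dict, List

# Ports taken from the architecture diagram; the host is an assumption.
SERVICES = {"frontend": 3000, "backend": 8001, "playground": 8765}


def port_is_open(host: str, port: int, timeout: float = 0.5) -> bool:
    """TCP-level liveness probe: True if something accepts connections on the port."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_services(host: str = "localhost") -> Dict[str, bool]:
    """Probe every known service and report which ones are reachable."""
    return {name: port_is_open(host, port) for name, port in SERVICES.items()}


class LatencyRecorder:
    """Collect per-operation wall-clock timings in seconds."""

    def __init__(self) -> None:
        self.samples: Dict[str, List[float]] = {}

    @contextmanager
    def measure(self, name: str):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record the duration even if the measured block raises.
            self.samples.setdefault(name, []).append(time.perf_counter() - start)
```

Wrapping a stage in `with recorder.measure("stt"):` appends one timing sample per call. A TCP probe only shows that a port accepts connections, not that the service behind it is healthy.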
This architecture is designed for enterprise-scale voice AI applications that need real-time performance and GPU acceleration.