A comprehensive knowledge base system that supports multilingual document processing and intelligent query answering using Azure AI services and agentic retrieval capabilities.
This system enables users to upload documents in multiple formats (PDF, Word, Excel, PowerPoint, JPEG) and query them in both English and Korean. It leverages Azure's AI services and an agentic architecture for intelligent document processing and retrieval.
- ✅ Multilingual Support: English and Korean language processing
- ✅ Multiple Document Formats: PDF, DOCX, XLSX, PPTX, JPEG
- ✅ Agentic Retrieval: Intelligent multi-agent system for query processing
- ✅ Hybrid Search: Combination of semantic and keyword search
- ✅ Real-time Processing: Asynchronous document processing pipeline
- ✅ Bilingual Responses: Generate answers in both languages simultaneously
- ✅ Chat Interface: Interactive conversational UI with document upload
- ✅ Scalable Architecture: Built on Azure cloud services
This repository contains comprehensive documentation to guide you through the entire project:
-
- Business requirements and objectives
- Technical capabilities and specifications
- Technology stack requirements
-
- High-level system architecture
- Detailed component diagrams
- Agentic retrieval system design
- Data flow and processing pipelines
- Azure services integration
- Security and scalability architecture
- Technology stack mapping
-
- Complete REST API documentation
- WebSocket API specifications
- Request/response schemas
- Authentication and authorization
- Error handling and rate limiting
-
- Development environment setup
- Step-by-step implementation instructions
- Code examples and samples
- Testing strategies
- Best practices
-
DEPLOYMENT_OPERATIONS_GUIDE.md
- Azure resource provisioning
- Deployment procedures
- Monitoring and logging
- Backup and disaster recovery
- Scaling strategies
- Security hardening
- Troubleshooting guide
┌─────────────────────────────────────────────────────────────┐
│ React Frontend │
│ (Chat Interface + Document Management) │
└───────────────────────┬─────────────────────────────────────┘
│
↓
┌─────────────────────────────────────────────────────────────┐
│ Azure API Management / Gateway │
└───────────────────────┬─────────────────────────────────────┘
│
┌───────────────┼───────────────┐
↓ ↓ ↓
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Node.js API │ │Python Service│ │Azure Functions│
│ (Express) │ │ (FastAPI) │ │ (Serverless) │
└──────┬───────┘ └──────┬───────┘ └──────┬───────┘
│ │ │
└────────────────┼─────────────────┘
↓
┌──────────────────────────────┐
│ Agentic Orchestrator │
│ - Query Analyzer │
│ - Retrieval Coordinator │
│ - Context Builder │
│ - Response Generator │
└──────────────┬───────────────┘
│
┌───────────────┼───────────────┐
↓ ↓ ↓
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│Azure OpenAI │ │ AI Search │ │Cognitive Svcs│
│ (GPT-4) │ │(Hybrid Index)│ │ (OCR, Vision)│
└──────────────┘ └──────────────┘ └──────────────┘
│
┌───────────────┼───────────────┐
↓ ↓ ↓
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│Blob Storage │ │Cosmos DB │ │PostgreSQL │
│ (Documents) │ │ (Metadata) │ │(pgvector) │
└──────────────┘ └──────────────┘ └──────────────┘
- Azure Subscription with required permissions
- Node.js 20+ and npm/yarn
- Python 3.11+
- Docker Desktop
- Git
- Azure CLI
- VS Code (recommended)
-
Clone the repository
git clone <repository-url> cd az-ai-agentic-retrieval
-
Set up environment variables
cp .env.example .env # Edit .env with your Azure credentials -
Install dependencies
# Backend API cd backend/api npm install # Python Processing Service cd ../processing python -m venv venv source venv/bin/activate # On Windows: venv\Scripts\activate pip install -r requirements.txt # Frontend cd ../../frontend npm install
-
Start development environment
# Using Docker Compose (recommended) docker-compose up -d # Or start services individually cd backend/api && npm run dev cd backend/processing && uvicorn main:app --reload cd frontend && npm start
-
Access the application
- Frontend: http://localhost:3001
- API: http://localhost:3000
- Processing Service: http://localhost:8000
- React 18+ with TypeScript
- Redux Toolkit for state management
- Material-UI / Ant Design for UI components
- Socket.io for real-time updates
- Node.js 20+ with Express
- Python 3.11+ with FastAPI
- Azure Functions for serverless compute
- LangChain for LLM orchestration
- Azure OpenAI (GPT-4, Embeddings)
- Azure Cognitive Services (Vision, Language)
- Azure AI Search (Hybrid search)
- Azure Blob Storage (Document storage)
- Azure Cosmos DB (NoSQL database)
- Azure PostgreSQL with pgvector (Vector storage)
- Azure Functions (Event-driven processing)
- Azure API Management (API gateway)
- Application Insights (Monitoring)
- Docker for containerization
- GitHub Actions for CI/CD
- Jest for unit testing
- Playwright for E2E testing
- Bicep/Terraform for infrastructure as code
az-ai-agentic-retrieval/
├── backend/
│ ├── api/ # Node.js Express API
│ ├── processing/ # Python processing service
│ └── functions/ # Azure Functions
├── frontend/ # React application
├── infrastructure/ # IaC templates (Bicep/Terraform)
├── tests/ # Test suites
├── docs/ # Additional documentation
├── .github/workflows/ # CI/CD pipelines
├── docker-compose.yml # Local development setup
├── REQUIREMENTS.md # Business requirements
├── ARCHITECTURE.md # System architecture
├── API_SPECIFICATION.md # API documentation
├── IMPLEMENTATION_GUIDE.md # Implementation details
└── DEPLOYMENT_OPERATIONS_GUIDE.md # Deployment procedures
# Backend API
cd backend/api
npm test
# Python Service
cd backend/processing
pytest
# Frontend
cd frontend
npm testnpm run test:integrationnpm run test:e2e-
Provision Azure Resources
# Using Bicep az deployment sub create \ --location eastus \ --template-file infrastructure/bicep/main.bicep \ --parameters environment=production # Or using Terraform cd infrastructure/terraform terraform init terraform apply
-
Deploy Applications
# Deploy API cd backend/api npm run build az webapp deployment source config-zip \ --resource-group multilingual-kb-prd-rg \ --name multilingual-prd-api \ --src api.zip # Deploy Functions cd ../functions func azure functionapp publish multilingual-prd-func # Deploy Frontend cd ../../frontend npm run build az staticwebapp deploy --name multilingual-prd-frontend
-
Verify Deployment
# Check health endpoints curl https://api.multilingual-kb.azure.com/health
For detailed deployment instructions, see DEPLOYMENT_OPERATIONS_GUIDE.md.
The system includes comprehensive monitoring:
- Application Insights: Application performance monitoring
- Log Analytics: Centralized logging
- Azure Monitor: Infrastructure monitoring
- Custom Dashboards: Business metrics and KPIs
- Alerts: Proactive issue detection
Access dashboards:
- Azure Portal → Application Insights → Dashboard
- Grafana dashboards (if configured)
Security features include:
- Authentication: Azure AD B2C / OAuth 2.0
- Authorization: Role-based access control (RBAC)
- Data Encryption: At rest and in transit
- Network Security: VNet integration, private endpoints
- Secrets Management: Azure Key Vault
- API Security: Rate limiting, request validation
- Compliance: GDPR, SOC 2 ready
Automated pipelines using GitHub Actions:
┌──────────────┐ ┌──────────────┐ ┌──────────────┐
│ Code Push │────▶│ Build & Test│────▶│ Deploy │
│ (main) │ │ - Lint │ │ - Staging │
│ │ │ - Unit │ │ - Prod │
└──────────────┘ │ - Integration └──────────────┘
└──────────────┘
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
Complete API documentation is available at:
- Local: http://localhost:3000/api-docs
- Production: https://api.multilingual-kb.azure.com/api-docs
See API_SPECIFICATION.md for detailed specifications.
Common issues and solutions:
Solution: Check Azure Function logs and verify file format support
Solution: Review cache configuration and database query optimization
Solution: Verify Azure AD configuration and token validity
For more troubleshooting tips, see DEPLOYMENT_OPERATIONS_GUIDE.md.
| Metric | Target | Notes |
|---|---|---|
| Query Response | < 3 seconds | Cached/simple queries |
| Document Upload | < 10 seconds | Files up to 10MB |
| Document Processing | < 2 minutes | Complex multi-page docs |
| Concurrent Users | 1000+ | With auto-scaling |
| Availability | 99.9% | SLA target |
| Search Accuracy | > 90% | Top-5 relevance |
- Basic document upload and processing
- English and Korean language support
- Simple query interface
- Azure infrastructure setup
- Additional language support (Japanese, Chinese)
- Voice interface (speech-to-text)
- Mobile applications
- Advanced analytics dashboard
- Multi-tenant support
- Advanced collaboration features
- Custom model fine-tuning
- Enhanced security features
- Real-time collaborative querying
- AI-powered content recommendations
- Integration with external knowledge bases
- Advanced visualization tools
This project is licensed under the MIT License - see the LICENSE file for details.
- Project Lead: [Your Name]
- Development Team: [Team Names]
- Support: support@multilingual-kb.com
- Documentation: https://docs.multilingual-kb.com
- Azure OpenAI Documentation
- LangChain Documentation
- Azure AI Search Documentation
- React Documentation
- Documentation: Start with this README and linked guides
- Issues: Create a GitHub issue for bugs
- Discussions: Use GitHub Discussions for questions
- Email: support@multilingual-kb.com
- Microsoft Azure team for cloud services
- OpenAI for GPT models
- LangChain community for LLM orchestration tools
- All contributors to this project
Built with ❤️ using Azure AI Services
Last Updated: February 13, 2026