API Gateway Architecture

Overview

The API Gateway is a Node.js/Express-based microservice that serves as a proxy and management layer for multiple LLM (Large Language Model) providers. It provides unified API endpoints, token-based authentication, usage billing, and automatic failover between those providers.

Version: 1.0.0
License: Unlicense (public domain)
Runtime: Node.js 18+
Module System: ES Modules (type: "module")

System Architecture

High-Level Architecture

graph TB
    Clients[Clients]

    subgraph Gateway[API Gateway - Express]
        subgraph Controllers[Controllers Layer]
            CC[completions]
            TC[tokens]
            DC[dialogs]
            TRC[transcriptions]
            SC[speech]
            RC[referral]
        end

        subgraph Services[Services Layer]
            CS[CompletionsService]
            TS[TokensService]
            DS[DialogsService]
            RS[ReferralService]
            SMS[SystemMessageService]
        end

        subgraph Repositories[Repository Layer]
            TR[TokensRepository]
            DR[DialogsRepository]
            RR[ReferralRepository]
        end

        subgraph Database[Database Layer - LowDB/JSON]
            TDB[(tokens.json)]
            DDB[(dialogs.json)]
            REFA[(referrals.json)]
            UTB[(user_tokens.json)]
        end
    end

    subgraph Providers[External LLM Providers]
        OAI[OpenAI Original]
        DS_P[DeepSeek]
        OR[OpenRouter]
        DI[DeepInfra]
    end

    Clients --> Controllers
    Controllers --> Services
    Services --> Repositories
    Repositories --> Database
    Services --> Providers

    style Gateway fill:#0d47a1,stroke:#1976d2,stroke-width:3px,color:#fff
    style Controllers fill:#e65100,stroke:#ff6f00,stroke-width:3px,color:#fff
    style Services fill:#1b5e20,stroke:#4caf50,stroke-width:3px,color:#fff
    style Repositories fill:#4a148c,stroke:#9c27b0,stroke-width:3px,color:#fff
    style Database fill:#b71c1c,stroke:#f44336,stroke-width:3px,color:#fff
    style Providers fill:#006064,stroke:#00bcd4,stroke-width:3px,color:#fff

Directory Structure

api-gateway/
├── src/
│   ├── controllers/          # HTTP request handlers
│   │   ├── completionsController.js
│   │   ├── tokensController.js
│   │   ├── dialogsController.js
│   │   ├── transcriptionsController.js
│   │   ├── speechController.js
│   │   ├── systemMessagesController.js
│   │   └── referralController.js
│   ├── services/            # Business logic layer
│   │   ├── CompletionsService.js
│   │   ├── TokensService.js
│   │   ├── DialogsService.js
│   │   ├── ReferralService.js
│   │   ├── SystemMessageService.js
│   │   └── index.js
│   ├── repositories/        # Data access layer
│   │   ├── TokensRepository.js
│   │   ├── DialogsRepository.js
│   │   ├── ReferralRepository.js
│   │   ├── index.js
│   │   └── tests/
│   │       └── TokensRepository.test.js
│   ├── rest/               # HTTP response utilities
│   │   ├── rest.js         # Request wrapper
│   │   ├── HttpResponse.js # Standard HTTP response
│   │   ├── SSEResponse.js  # Server-Sent Events response
│   │   └── HttpException.js
│   ├── utils/              # Utility modules
│   │   ├── llmsConfig.js   # LLM provider configurations
│   │   └── dbManager.js    # Database management utilities
│   ├── db/                 # JSON database files
│   │   ├── tokens.json
│   │   ├── dialogs.json
│   │   ├── user_tokens.json
│   │   └── referrals.json
│   ├── logs/               # Application logs
│   │   └── server.*.log
│   ├── logger.js           # Pino logger configuration
│   └── server.js           # Application entry point
├── scripts/
│   ├── token-gen.js        # Token generation CLI
│   └── pull_update.js      # Update script
├── Dockerfile              # Container definition
├── package.json            # Dependencies & scripts
├── .prettierrc.js          # Code formatting config
└── README.md               # Documentation

Core Components

1. Server Entry Point (`server.js`)

Location: src/server.js

Responsibilities:

Express application initialization
Middleware configuration (CORS, body-parser with 300MB limit)
Controller routing
Global error handling
Server startup

Key Features:

Large payload support (300MB) for audio/video processing
CORS enabled for cross-origin requests
Uncaught exception and unhandled rejection handlers

2. Controllers Layer

Controllers handle HTTP requests and delegate business logic to services. All controllers use the rest() wrapper for consistent error handling.

completionsController.js

Endpoints:
- POST /v1/chat/completions - OpenAI-compatible chat completions
- POST /completions - Custom completions endpoint
Features:
- Streaming and non-streaming responses
- Token validation and balance checking
- Special handling for o1 and Claude models (convert streaming to non-streaming)
- SSE (Server-Sent Events) support
- Automatic failover between providers

tokensController.js

Endpoints:
- GET /token - Get user token
- GET /token/has - Check if user has token
- PUT /token - Update token balance
- POST /token - Regenerate token
Authentication: All endpoints require master token

dialogsController.js

Endpoints:
- DELETE /dialog - Clear dialog history
- GET /dialog - Get dialog history
Features: Maintains conversation context per user

transcriptionsController.js

Endpoints:
- POST /v1/audio/transcriptions - Audio-to-text transcription
Provider: DeepInfra (Whisper model)
Features:
- Multer for file uploads
- Token cost based on audio duration (15 tokens per second)
- Model mapping (whisper-1 → openai/whisper-large-v3-turbo)

speechController.js

Endpoints:
- POST /v1/audio/speech - Text-to-speech synthesis
Provider: OpenAI GoAPI
Features:
- Token cost calculation (0.5 tokens per character)
- Audio file streaming
- MP3 format output

referralController.js

Endpoints:
- POST /referral - Create referral relationship
- GET /referral - Get referral count
Features: Referral system for user acquisition tracking

3. Services Layer

Services contain business logic and orchestrate data access.

CompletionsService

Key Methods:

completions(params) - Main completion orchestration
tryEndpoints(params, endpoints) - Failover logic across providers
updateCompletionTokensByModel() - Token accounting
processDialogHistory() - Message processing for specific models
ensureCorrectMessageOrder() - Validates message alternation
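
The source only says that ensureCorrectMessageOrder() validates message alternation. As a rough illustration of what such a step could look like, the sketch below merges consecutive messages that share a role so that user and assistant turns alternate. The function name ensureAlternatingRoles and the merge strategy are assumptions made for this example, not the project's actual code.

```javascript
// Hypothetical sketch: merge consecutive messages that share a role so that
// user and assistant turns alternate. The real method may behave differently.
function ensureAlternatingRoles(messages) {
  const result = [];
  for (const message of messages) {
    const previous = result[result.length - 1];
    if (previous && previous.role === message.role && message.role !== "system") {
      // Same speaker twice in a row: fold the text into the previous turn.
      previous.content = `${previous.content}\n${message.content}`;
    } else {
      // Copy so the caller's objects are never mutated.
      result.push({ ...message });
    }
  }
  return result;
}

const ordered = ensureAlternatingRoles([
  { role: "system", content: "Be brief." },
  { role: "user", content: "Hi" },
  { role: "user", content: "Are you there?" },
  { role: "assistant", content: "Yes." },
]);
console.log(ordered.map((m) => m.role).join(",")); // system,user,assistant
```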

Core Features:

Automatic Failover: Tries multiple endpoints until one succeeds
Provider Chain: Configurable fallback sequences per model
Token Conversion: Maps provider tokens to internal "energy" units
Profit Margin: Applies 50% markup on token costs
Special Processing: DeepSeek reasoner requires dialog history processing
Detailed Logging: Request tracking with unique IDs

Provider Retry Logic:

const tryCompletionsConfig = {
  "gpt-4o": [
    "gpt-4o_go",      // Try GoAPI first
    "gpt-4o",         // Then official OpenAI
    "gpt-4o_guo",     // Then AiGuoGuo
    "deepseek-chat_openrouter",  // Then DeepSeek via OpenRouter
    "gpt-4o-mini_go", // Fallback to mini models
    // ... more fallbacks
  ],
};
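
To make the failover behavior concrete, here is a minimal sketch of walking a provider chain until one entry succeeds. It is not the real tryEndpoints(): the names tryEndpointsSketch, callProvider and fakeProvider are hypothetical, and the aggregated error text only mirrors the "All providers unavailable" format described under Error Handling.

```javascript
// Sketch of chain-based failover. `callProvider` is a hypothetical function
// that performs the request for one chain entry and throws on failure.
async function tryEndpointsSketch(params, chain, callProvider) {
  const errors = [];
  for (const [index, endpointKey] of chain.entries()) {
    try {
      console.log(`Attempt ${index + 1}/${chain.length}: ${endpointKey}`);
      return await callProvider(endpointKey, params);
    } catch (err) {
      // Remember why this provider failed, then fall through to the next.
      errors.push(`${endpointKey}: ${err.message}`);
    }
  }
  throw new Error(`All providers unavailable: ${errors.join(", ")}`);
}

// Usage with a fake provider where only the second chain entry succeeds.
const fakeProvider = async (endpointKey) => {
  if (endpointKey !== "gpt-4o") throw new Error("Rate limit exceeded");
  return { servedBy: endpointKey };
};

const failoverDemo = tryEndpointsSketch(
  { messages: [] },
  ["gpt-4o_go", "gpt-4o", "gpt-4o_guo"],
  fakeProvider,
);
failoverDemo.then((result) => console.log(result.servedBy)); // gpt-4o
```

Collecting one message per failed provider keeps the final error useful for debugging without stopping the chain early.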

TokensService

Key Methods:

isValidMasterToken() - Validates admin access
isAdminToken() - Validates the Bearer token against tokens.json (the per-user API tokens that the code calls "admin tokens")
isHasBalanceToken() - Checks token balance
getTokenById(), getTokenByUserId() - Token retrieval
regenerateToken() - Token rotation

Security Features:

Bearer token extraction
Master token validation (from ADMIN_FIRST env var)
Balance enforcement (throws 429 on insufficient balance)
Auto-initialization to 10,000 tokens for new users
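
The two request-time checks above (Bearer extraction and balance enforcement) might look roughly like the following. This is a sketch under assumptions: HttpExceptionSketch stands in for the project's HttpException, the function names are invented for the example, and "insufficient" is taken to mean a balance of zero or less.

```javascript
// Minimal stand-in for the project's HttpException (status + message).
class HttpExceptionSketch extends Error {
  constructor(status, message) {
    super(message);
    this.status = status;
  }
}

// Pull the token out of an "Authorization: Bearer <token>" header value.
function extractBearerToken(authorizationHeader) {
  const match = /^Bearer\s+(\S+)$/i.exec(authorizationHeader ?? "");
  if (!match) throw new HttpExceptionSketch(401, "Invalid token");
  return match[1];
}

// Reject the request when the stored balance is exhausted (assumed: <= 0).
function assertHasBalance(tokenRecord) {
  if (!tokenRecord || tokenRecord.tokens_gpt <= 0) {
    throw new HttpExceptionSketch(429, "Insufficient balance");
  }
  return tokenRecord;
}

const bearer = extractBearerToken("Bearer abc123");
assertHasBalance({ id: bearer, user_id: "user_123", tokens_gpt: 8500 });
console.log(bearer); // abc123
```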

DialogsService

Key Methods:

addMessageToDialog() - Append user/assistant messages
getDialogWithSystem() - Build message array with system prompt
clearDialog() - Reset conversation history
findDialogById() - Retrieve dialog by user ID

Features:

System message injection (except for o1 models)
Per-user conversation tracking
Message history persistence
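
As an illustration of system message injection, the sketch below builds the message array and skips the system prompt for o1 models. The function name buildMessagesSketch and the prefix check used to detect o1 models are assumptions; the actual getDialogWithSystem() may decide differently.

```javascript
// Sketch: prepend the system prompt unless the model is an o1 variant.
// Detecting o1 models by name prefix is an assumption made for illustration.
function buildMessagesSketch(model, dialogHistory, systemMessage) {
  const isO1Model = model.startsWith("o1");
  if (isO1Model || !systemMessage) return [...dialogHistory];
  return [{ role: "system", content: systemMessage }, ...dialogHistory];
}

const dialogHistory = [{ role: "user", content: "Hello" }];
const withSystem = buildMessagesSketch("gpt-4o", dialogHistory, "You are helpful.");
const withoutSystem = buildMessagesSketch("o1-preview", dialogHistory, "You are helpful.");
console.log(withSystem.length);    // 2
console.log(withoutSystem.length); // 1
```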

ReferralService

Key Methods:

createReferral() - Establish referral relationship
getReferralCount() - Count referrals per user

Features:

Prevents self-referrals
Token bonus distribution
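
The self-referral rule and the per-user count can be sketched in memory as follows. The real service persists through ReferralRepository and also distributes a token bonus whose size is not stated here, so the bonus is left out. The function names are hypothetical.

```javascript
// In-memory sketch of the referral rules described above.
function createReferralSketch(referrals, referrerId, referredId) {
  if (referrerId === referredId) {
    throw new Error("Self-referral is not allowed");
  }
  referrals.push({ referrer_id: referrerId, referred_id: referredId });
  return referrals;
}

// Count how many users a given user has referred.
function getReferralCountSketch(referrals, userId) {
  return referrals.filter((r) => r.referrer_id === userId).length;
}

const referralStore = [];
createReferralSketch(referralStore, "user_123", "user_456");
createReferralSketch(referralStore, "user_123", "user_789");
console.log(getReferralCountSketch(referralStore, "user_123")); // 2
```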

4. Repository Layer

Repositories abstract data access using LowDB (JSON file storage).

TokensRepository

Data Structure:

{
  "tokens": [
    {
      "id": "abc123...",        // 32-char hex token
      "user_id": "user_123",     // User identifier
      "tokens_gpt": 10000        // Energy balance
    }
  ]
}

Methods:

generateToken() - Create new token with initial balance
getTokenById(), getTokenByUserId() - Retrieval methods
updateTokenByUserId() - Balance updates
hasUserToken() - Existence check

DialogsRepository

Data Structure:

{
  "dialogs": [
    {
      "name": "user_123",       // User ID
      "messages": [
        { "role": "system", "content": "..." },
        { "role": "user", "content": "..." },
        { "role": "assistant", "content": "..." }
      ]
    }
  ]
}

Methods:

addMessageToDialog() - Append message
clearDialog() - Reset conversation
findDialogById() - Retrieve dialog

ReferralRepository

Data Structure:

{
  "referrals": [
    {
      "referrer_id": "user_123",
      "referred_id": "user_456"
    }
  ]
}

5. REST Utilities

rest.js

Request wrapper that:

Catches exceptions and converts to HTTP errors
Handles HttpResponse objects
Handles SSEResponse objects for streaming
Provides consistent error responses
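
A wrapper with these responsibilities could be sketched as below. This is not the contents of rest.js: the classes HttpResponseSketch and HttpErrorSketch are stand-ins for the project's HttpResponse and HttpException, SSE handling is omitted, and the error body is a simplified version of the format shown under Error Handling.

```javascript
// Minimal stand-ins for the classes in src/rest/.
class HttpResponseSketch {
  constructor(status, body) {
    this.status = status;
    this.body = body;
  }
}
class HttpErrorSketch extends Error {
  constructor(status, message) {
    super(message);
    this.status = status;
  }
}

// Wrap an async handler so every outcome becomes a JSON response.
function restSketch(handler) {
  return async (req, res) => {
    try {
      const result = await handler({ req, res });
      if (result instanceof HttpResponseSketch) {
        res.status(result.status).json(result.body);
      }
    } catch (err) {
      // Known errors keep their status; anything else becomes a 500.
      const status = err instanceof HttpErrorSketch ? err.status : 500;
      res.status(status).json({ error: err.message, status: "error" });
    }
  };
}

// Usage with a fake `res` object instead of a running Express app.
function createFakeRes() {
  const res = { statusCode: null, payload: null };
  res.status = (code) => { res.statusCode = code; return res; };
  res.json = (body) => { res.payload = body; return res; };
  return res;
}

const pingRes = createFakeRes();
const pingHandler = restSketch(async () => new HttpResponseSketch(200, { data: "pong" }));
const pingDone = pingHandler({}, pingRes);
pingDone.then(() => console.log(pingRes.statusCode, pingRes.payload));
```

Centralizing the try/catch in one wrapper is what lets individual controllers simply return a response object or throw.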

HttpResponse

Simple wrapper for standard HTTP responses:

new HttpResponse(200, { data: "..." })

SSEResponse

Handles Server-Sent Events for streaming:

new SSEResponse(async () => {
  res.write(SSEResponse.sendJSONEvent(chunk));
  res.write(SSEResponse.sendSSEEvent("[DONE]"));
  res.end();
});
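
For readers unfamiliar with the wire format, a Server-Sent Events frame is a "data:" line followed by a blank line. The helpers below are a sketch of what sendSSEEvent and sendJSONEvent plausibly produce. The function names formatSSEEvent and formatJSONEvent are hypothetical and the real static methods may differ in detail.

```javascript
// Format one Server-Sent Events frame: a "data:" line plus a blank line.
function formatSSEEvent(data) {
  return `data: ${data}\n\n`;
}

// Serialize an object as a single-line JSON event.
function formatJSONEvent(payload) {
  return formatSSEEvent(JSON.stringify(payload));
}

const frames = [
  formatJSONEvent({ choices: [{ delta: { content: "Hello" }, index: 0 }] }),
  formatSSEEvent("[DONE]"),
];
process.stdout.write(frames.join(""));
```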

HttpException

Custom error class with HTTP status codes:

throw new HttpException(401, "Invalid token");

6. LLM Configuration (`llmsConfig.js`)

Defines all supported LLM providers and models.

Supported Providers:

OpenAI Original: Official OpenAI API
OpenAI GoAPI: Alternative OpenAI gateway
OpenAI Opensource: Open-source models via custom endpoint
AiGuoGuo: Third-party OpenAI gateway
DeepSeek: DeepSeek official API
OpenRouter: Multi-provider aggregator

Model Configuration:

{
  "gpt-4o": {
    modelName: "gpt-4o",
    endpoint: openai_original,
    convertationEnergy: 1.2  // 1.2 provider tokens = 1 energy unit (before markup)
  }
}

Supported Models (46 total):

GPT-4o variants (official, go, guo)
GPT-4o-mini variants
GPT-3.5-turbo variants
o1-preview, o1-mini, o3-mini
Claude 3.5 Sonnet, Haiku, Opus
Claude 3.7 Sonnet, Sonnet 4
GPT-4.1, 4.1-mini, 4.1-nano (custom)
DeepSeek Chat, DeepSeek Reasoner
Meta Llama 3.1 (8B, 70B, 405B)
WizardLM-2 variants
Uncensored models

Energy Conversion Rates:

High-tier reasoning (o1-preview, Claude Opus): 0.12-0.2
Mid-tier (GPT-4o, Claude Sonnet): 0.8-1.2
Efficient (Claude Haiku, GPT-4o-mini): 2.4-20
Open-source (Llama, WizardLM): 1.7-40

7. Database Manager (`dbManager.js`)

Legacy utility for file-based database operations (mostly superseded by repositories).

Functions:

initializeFiles() - Create initial JSON files
generateUserToken(), generateAdminToken() - Token creation
addNewMessage(), addNewDialogs() - Dialog management
deleteFirstMessage() - Context window management
clearDialog() - Reset conversation
isValidAdminToken(), isValidUserToken() - Authentication

Database Files:

src/db/tokens.json - Admin tokens
src/db/user_tokens.json - User tokens
src/db/dialogs.json - Conversation history
src/db/referrals.json - Referral relationships

8. Logger (`logger.js`)

Library: Pino + pino-roll

Configuration:

Dual output: console (STDOUT) and file
Daily rotation with 7-day retention
20MB size limit per file
Format: server.YYYY-MM-DD.N.log
Directory: src/logs/

Features:

High-performance JSON logging
Automatic log rotation
Structured logging support

API Endpoints

Completions

`POST /v1/chat/completions`

OpenAI-compatible chat completions endpoint.

Authentication: Bearer token (admin token)

Request Body:

{
  "model": "gpt-4o",
  "messages": [
    { "role": "system", "content": "You are a helpful assistant." },
    { "role": "user", "content": "Hello!" }
  ],
  "stream": false,
  "temperature": 0.7
}

Response (non-streaming):

{
  "id": "chatcmpl-123",
  "object": "chat.completion",
  "created": 1677652288,
  "model": "gpt-4o",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": "Hello! How can I help you today?"
    },
    "finish_reason": "stop"
  }],
  "usage": {
    "prompt_tokens": 12,
    "completion_tokens": 9,
    "total_tokens": 21,
    "energy": 26.25  // Custom field: (21 / 1.2) * 1.5
  }
}

Response (streaming): Server-Sent Events stream:

data: {"id":"chatcmpl-123","choices":[{"delta":{"role":"assistant"},"index":0}],...}

data: {"id":"chatcmpl-123","choices":[{"delta":{"content":"Hello"},"index":0}],...}

data: {"id":"chatcmpl-123","choices":[{"delta":{},"finish_reason":"stop","index":0}],...}

data: {"id":"chatcmpl-123","usage":{"total_tokens":21,"energy":26.25},...}

data: [DONE]
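
On the client side, a stream like the one above can be reduced to plain text by reading each "data:" line until the [DONE] sentinel. The following is a minimal sketch that assumes every event fits on one "data:" line, as in the samples here. collectStreamText is a name chosen for this example.

```javascript
// Collect the text deltas from a raw SSE body like the stream above.
// Assumes each event is a single "data:" line, as in this document's samples.
function collectStreamText(rawBody) {
  let text = "";
  for (const line of rawBody.split("\n")) {
    if (!line.startsWith("data: ")) continue; // skip blank separators
    const data = line.slice("data: ".length);
    if (data === "[DONE]") break; // end-of-stream sentinel
    const event = JSON.parse(data);
    const delta = event.choices?.[0]?.delta;
    if (delta?.content) text += delta.content;
  }
  return text;
}

const sampleBody = [
  'data: {"choices":[{"delta":{"role":"assistant"},"index":0}]}',
  'data: {"choices":[{"delta":{"content":"Hel"},"index":0}]}',
  'data: {"choices":[{"delta":{"content":"lo"},"index":0}]}',
  'data: {"usage":{"total_tokens":21}}',
  "data: [DONE]",
].join("\n\n");

console.log(collectStreamText(sampleBody)); // Hello
```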

Special Behavior:

o1/Claude models: Always converted to non-streaming internally
Automatic provider failover on errors
Token balance deducted after successful completion

`POST /completions`

Custom completions endpoint with dialog management.

Authentication: Query parameter ?masterToken=...

Request Body:

{
  "model": "gpt-4o",
  "content": "What is the weather?",
  "systemMessage": "You are a weather assistant.",
  "userId": "user_123"
}

Features:

Automatic dialog history management
System message injection
User token lookup
Response added to dialog automatically

Tokens

`GET /token?masterToken=...&userId=...`

Retrieve user token information.

Response:

{
  "id": "abc123...",
  "user_id": "user_123",
  "tokens_gpt": 8500
}

`GET /token/has?masterToken=...&userId=...`

Check if user has a token.

Response:

{
  "hasUser": true
}

`PUT /token?masterToken=...&userId=...`

Update token balance.

Request Body:

{
  "operation": "add",  // or "subtract"
  "amount": 1000
}
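
The effect of the two operations on a token record can be sketched as a small pure function. The validation rules (rejecting unknown operations and negative or non-numeric amounts) are assumptions of this example; the source does not describe how the endpoint validates its input.

```javascript
// Apply a PUT /token style operation to a token record and return a copy.
// Rejecting unknown operations and invalid amounts is an assumption here.
function applyBalanceOperation(tokenRecord, { operation, amount }) {
  if (!Number.isFinite(amount) || amount < 0) {
    throw new Error("amount must be a non-negative number");
  }
  switch (operation) {
    case "add":
      return { ...tokenRecord, tokens_gpt: tokenRecord.tokens_gpt + amount };
    case "subtract":
      return { ...tokenRecord, tokens_gpt: tokenRecord.tokens_gpt - amount };
    default:
      throw new Error(`Unknown operation: ${operation}`);
  }
}

const balanceBefore = { id: "abc123", user_id: "user_123", tokens_gpt: 8500 };
const balanceAfter = applyBalanceOperation(balanceBefore, { operation: "add", amount: 1000 });
console.log(balanceAfter.tokens_gpt); // 9500
```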

`POST /token?masterToken=...&userId=...`

Regenerate user token (security measure).

Dialogs

`DELETE /dialog?masterToken=...&userId=...`

Clear user's conversation history.

Response:

{
  "cleared": true
}

`GET /dialog?masterToken=...&userId=...`

Retrieve user's dialog history.

Response:

{
  "name": "user_123",
  "messages": [
    { "role": "user", "content": "Hello" },
    { "role": "assistant", "content": "Hi there!" }
  ]
}

Audio

`POST /v1/audio/transcriptions`

Transcribe audio to text (OpenAI Whisper-compatible).

Authentication: Bearer token

Request: multipart/form-data

file: Audio file (mp3, wav, etc.)
model: "whisper-1"
language: "en" (optional)

Response:

{
  "text": "Hello, this is a transcription.",
  "duration": 3.5,
  "language": "en"
}

Pricing: 15 tokens per second of audio (rounded up)

`POST /v1/audio/speech`

Generate speech from text (OpenAI TTS-compatible).

Authentication: Bearer token

Request Body:

{
  "model": "tts-1",
  "input": "Hello world",
  "voice": "alloy"
}

Response: Binary audio stream (audio/mpeg)

Headers:

Content-Type: audio/mpeg
Content-Disposition: attachment; filename="speech.mp3"
X-Token-Cost: 6 (cost in tokens)

Pricing: 0.5 tokens per character

Referrals

`POST /referral?masterToken=...`

Create referral relationship.

Request Body:

{
  "referrerId": "user_123",
  "referredId": "user_456"
}

`GET /referral?masterToken=...&userId=...`

Get referral count for user.

Response:

{
  "count": 5
}

Data Flow

1. Chat Completion Flow

sequenceDiagram
    participant Client
    participant Controller as completionsController
    participant TokenService as TokensService
    participant CompService as CompletionsService
    participant Provider1 as Provider 1<br/>(gpt-4o_go)
    participant Provider2 as Provider 2<br/>(gpt-4o)
    participant Repository as TokensRepository

    Client->>Controller: POST /v1/chat/completions
    Controller->>TokenService: isAdminToken()
    Controller->>TokenService: isHasBalanceToken()
    Controller->>CompService: completions(params)
    CompService->>CompService: tryEndpoints(params, modelChain)

    CompService->>Provider1: Try provider 1
    alt Provider 1 Success
        Provider1-->>CompService: Response
    else Provider 1 Failure
        Provider1-->>CompService: Error
        CompService->>Provider2: Try provider 2
        alt Provider 2 Success
            Provider2-->>CompService: Response
        else Provider 2 Failure
            Provider2-->>CompService: Error
            Note over CompService: Continue chain...<br/>or throw if all fail
        end
    end

    CompService->>CompService: updateCompletionTokensByModel()
    Note over CompService: Calculate energy:<br/>tokens / rate * (1 + margin)
    CompService->>Repository: updateTokenByUserId()<br/>(Deduct balance)
    CompService-->>Controller: Response
    Controller-->>Client: Return response

2. Dialog-based Completion Flow

sequenceDiagram
    participant Client
    participant Controller as completionsController
    participant TokenService as TokensService
    participant DialogService as DialogsService
    participant DialogRepo as DialogsRepository
    participant CompService as CompletionsService

    Client->>Controller: POST /completions
    Controller->>TokenService: isValidMasterToken()
    Controller->>TokenService: getTokenByUserId(userId)
    TokenService-->>Controller: token object

    Controller->>DialogService: addMessageToDialog(userId, content)
    DialogService->>DialogRepo: addMessageToDialog()
    DialogRepo-->>DialogService: saved

    Controller->>DialogService: getDialogWithSystem(userId, systemMessage)
    Note over DialogService: Build messages array:<br/>[system, ...history]
    DialogService-->>Controller: messages[]

    Controller->>CompService: completions({model, messages})
    Note over CompService: Same failover flow<br/>as previous diagram
    CompService-->>Controller: completion

    Controller->>DialogService: addMessageToDialog(userId, aiResponse)
    DialogService->>DialogRepo: addMessageToDialog()

    Controller-->>Client: Return response

3. Token Management Flow

flowchart TD
    A[New User Request] --> B[TokensService.getTokenByUserId]
    B --> C[TokensRepository.getTokenByUserId]
    C --> D{Token Found?}
    D -->|Yes| E[Return existing token]
    D -->|No| F[TokensRepository.generateToken]
    F --> G["Generate crypto-random
    32-char hex ID"]
    G --> H["Create token object:
    id, user_id, tokens_gpt: 10000"]
    H --> I[Save to tokens.json]
    I --> J[Return token object]

    style D fill:#e65100,stroke:#ff6f00,stroke-width:2px,color:#fff
    style F fill:#1b5e20,stroke:#4caf50,stroke-width:2px,color:#fff
    style G fill:#01579b,stroke:#0277bd,stroke-width:2px,color:#fff

Configuration

Environment Variables

Required:

PORT - Server port (default: 8088)
ADMIN_FIRST - Master token for admin operations

OpenAI Original:

OPENAI_ORIGINAL_API_KEY
OPENAI_ORIGINAL_BASE_URL

OpenAI GoAPI:

OPENAI_API_KEY
OPENAI_BASE_URL

OpenAI Opensource:

FREE_OPENAI_KEY
FREE_OPENAI_BASE_URL

AiGuoGuo:

AIGUOGUO_API_KEY
AIGUOGUO_BASE_URL

DeepSeek:

DEEPSEEK_API_KEY
DEEPSEEK_BASE_URL

OpenRouter:

OPENROUTER_API_KEY
OPENROUTER_BASE_URL

GraphQL (Legacy):

GQL_URN
GQL_SSL
GQL_TOKEN
SPACE_ID_ARGUMENT

Deployment

Docker

Dockerfile (Node 18-based):

FROM node:18
WORKDIR /usr/src/app
COPY package*.json ./
RUN npm install
COPY . .
EXPOSE 8088
CMD ["npm", "start"]

Build:

docker build -t api-gateway .

Run:

docker run -d \
  -p 8088:8088 \
  -v "$(pwd)/src/db:/usr/src/app/src/db" \
  -e PORT=8088 \
  -e ADMIN_FIRST=your_master_token \
  -e OPENAI_API_KEY=sk-... \
  --name api-gateway \
  api-gateway

Docker Compose

version: '3.8'
services:
  api-gateway:
    build: .
    container_name: api-gateway
    ports:
      - "8088:8088"
    volumes:
      - ./src/db:/usr/src/app/src/db
    environment:
      - PORT=8088
      - ADMIN_FIRST=${ADMIN_FIRST}
      - OPENAI_API_KEY=${OPENAI_API_KEY}
      - OPENAI_BASE_URL=${OPENAI_BASE_URL}
      # ... other env vars
    restart: unless-stopped

Token Economy

Energy Conversion System

The gateway uses an internal "energy" currency to normalize costs across providers.

Formula:

energy = (provider_tokens / conversion_rate) * (1 + profit_margin)

Example (GPT-4o):

Provider tokens: 1000
Conversion rate: 1.2
Profit margin: 50%
Energy cost: (1000 / 1.2) * 1.5 = 1250 energy units
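
The formula and the worked example translate directly into code. In this sketch the names energyCost and PROFIT_MARGIN are chosen for illustration; in the gateway the margin is reportedly hardcoded, and the rate comes from each model's convertationEnergy setting.

```javascript
const PROFIT_MARGIN = 0.5; // 50% markup, as stated in this document

// energy = (provider_tokens / conversion_rate) * (1 + profit_margin)
function energyCost(providerTokens, conversionRate, profitMargin = PROFIT_MARGIN) {
  if (conversionRate <= 0) throw new RangeError("conversionRate must be positive");
  return (providerTokens / conversionRate) * (1 + profitMargin);
}

console.log(energyCost(1000, 1.2).toFixed(2)); // 1250.00 (GPT-4o example)
console.log(energyCost(1000, 0.2).toFixed(2)); // 7500.00 (a premium-tier rate)
```

Because the token count is divided by the rate, a lower conversion rate means a more expensive model: the same 1000 tokens cost six times as much at 0.2 as at 1.2.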

Conversion Rates by Tier:

Premium Reasoning (0.12-0.2): o1-preview, Claude Opus
High-Performance (0.8-1.2): GPT-4o, Claude Sonnet, DeepSeek Reasoner
Efficient (2.4-20): GPT-4o-mini, Claude Haiku, o1-mini
Open Source (1.7-40): Llama 3.1, WizardLM-2

Special Cases:

Audio transcription: 15 energy/second
Text-to-speech: 0.5 energy/character
Bonus account (user_id "666"): Free tier if balance > 100k
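
The two flat audio rates can be sketched as follows. The rounding is an assumption: the transcription pricing is described as "rounded up" without saying whether the duration or the total is rounded, and this sketch rounds the total. For text-to-speech, rounding up is consistent with the X-Token-Cost: 6 header shown for the 11-character input "Hello world".

```javascript
const TRANSCRIPTION_ENERGY_PER_SECOND = 15;
const TTS_ENERGY_PER_CHARACTER = 0.5;

// Assumption: the total cost is rounded up, not the duration.
function transcriptionCost(durationSeconds) {
  return Math.ceil(durationSeconds * TRANSCRIPTION_ENERGY_PER_SECOND);
}

// Assumption: rounded up, which matches the X-Token-Cost: 6 example.
function speechCost(inputText) {
  return Math.ceil(inputText.length * TTS_ENERGY_PER_CHARACTER);
}

console.log(transcriptionCost(3.5));    // 53
console.log(speechCost("Hello world")); // 6
```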

Error Handling

HTTP Status Codes

200 - Success
400 - Bad request (missing parameters)
401 - Unauthorized (invalid token)
429 - Insufficient balance
500 - Server error / All providers unavailable

Error Response Format

{
  "error": "Error message",
  "message": "Detailed description",
  "status": "error"
}

Failover Behavior

When a provider fails:

Log error with details (provider, model, error type)
Try next provider in chain
If all providers fail, return aggregated error:

{
  "error": "All providers unavailable: Rate limit exceeded, API key expired, ...",
  "message": "...",
  "status": "error"
}

Global Error Handlers

process.on('uncaughtException', (err) => {
  console.error('[UNCAUGHT EXCEPTION]', err);
});

process.on('unhandledRejection', (reason, promise) => {
  console.error('[UNHANDLED REJECTION]', reason);
});

Logging

Log Levels

Pino supports standard log levels, but the codebase primarily uses console.log() for compatibility.

Log Format

Request Tracking:

[abc123] 🚀 Starting connection attempts to providers for model gpt-4o
[abc123] 📋 Available endpoints: [gpt-4o_go, gpt-4o, ...]
[abc123] 🔄 Attempt 1/5: gpt-4o_go → gpt-4o (api.goapi.ai)
[abc123] 💬 User message: "Hello world..."
[abc123] ✅ Successful response from api.goapi.ai in 1234ms

Token Operations:

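[ запрос на токен abc123... ]  (token request abc123...)
[ проверка админ токена abc123... пройдена ]  (admin token check abc123... passed)
[ проверка баланса у пользователя abc123... проверка пройдена. баланс: 8500]  (balance check for user abc123... check passed. balance: 8500)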

Dialog Operations:

[ добавление нового сообщения в диалог user_123 ]  (adding new message to dialog user_123)
[ новое сообщение добавлено в диалог user_123 ]  (new message added to dialog user_123)

Log Rotation

Daily rotation at midnight
7-day retention (automatic deletion of older logs)
20MB size limit per file
Location: src/logs/server.YYYY-MM-DD.N.log

Security Considerations

Authentication

Master Token (ADMIN_FIRST env var):
- Required for all admin operations
- Token management endpoints
- Dialog management endpoints
- Referral operations
User Tokens:
- 32-character hex strings (crypto-random)
- Used as Bearer tokens
- Validated against tokens.json
- Can be regenerated for security

Token Rotation

# Using CLI
node scripts/token-gen.js --type admin --token 15000

# Via API
POST /token?masterToken=...&userId=user_123

Rate Limiting

Balance-based throttling only: a request is served as long as the token balance is positive
429 status when the balance is insufficient
No request-per-second limits (usage is bounded only by balance depletion)

Data Storage

JSON files are stored in src/db/
No encryption at rest (consider encrypting volume in production)
No PII validation (consider adding GDPR compliance)
Writes are serialized within the single Node.js process (LowDB provides no cross-process locking)

CORS

Wide open: cors() middleware with default settings
Allows all origins (consider restricting in production)
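
One way to restrict origins is to pass the cors middleware an origin callback instead of relying on its defaults. The sketch below builds such a callback in the (origin, callback) shape the cors package accepts. The helper name makeOriginCheck and the example domain are hypothetical, and allowing requests with no Origin header is a choice made for this example.

```javascript
// Build an `origin` callback in the shape the cors package accepts:
// (origin, callback) => callback(error, allow).
function makeOriginCheck(allowedOrigins) {
  const allowed = new Set(allowedOrigins);
  return (origin, callback) => {
    // Requests without an Origin header (curl, server-to-server) pass here.
    if (!origin || allowed.has(origin)) return callback(null, true);
    return callback(new Error(`Origin not allowed: ${origin}`));
  };
}

// Hypothetical wiring in src/server.js:
//   app.use(cors({ origin: makeOriginCheck(["https://app.example.com"]) }));
const originCheck = makeOriginCheck(["https://app.example.com"]);
originCheck("https://app.example.com", (err, allow) => console.log(err, allow)); // null true
originCheck("https://evil.example", (err) => console.log(err.message));
```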

Performance Characteristics

Concurrency

Single process (no clustering by default)
Async I/O (non-blocking via async/await)
Streaming support for long completions
LowDB write serialization (JSON file writes are queued within the process; there is no cross-process file locking)

Timeouts

Completions: 50-180 seconds (varies by provider)
TTS: 30 seconds
Transcription: Default fetch timeout
Body parser: No timeout (300MB limit)

Resource Limits

Body size: 300MB (for large audio files)
Concurrent connections: Limited by Node.js/OS
Database: In-memory + file-persisted (scales to ~10k users)

Optimization Opportunities

Database: Migrate to links-notation (file-based human-readable) or link-cli (binary file database with high efficiency) for > 10k users
Caching: Add Redis for token lookups
Clustering: Use PM2 or k8s for horizontal scaling
Rate Limiting: Add Redis-backed rate limiter
Queue System: Offload long completions to queue (Bull/BullMQ)

Testing

Test Files

src/repositories/tests/TokensRepository.test.js

Test Framework

Vitest (configured in package.json)
Babel + Jest (legacy support)

Running Tests

npm run vitest

Test Coverage

Currently minimal - only TokensRepository has tests.

Recommended additions:

Unit tests for all services
Integration tests for controllers
E2E tests for critical flows
Load testing for failover logic

Development

Scripts

# Start server
npm start

# Run tests
npm run vitest

# Format code
npm run format

# Generate admin token
npm run generate-token -- --type admin --token 15000

# Generate user token
npm run generate-token -- --type user --userName john --token 5000

# Pull updates (custom script)
npm run pull_update

Code Style

Prettier configured (.prettierrc.js)
ES Modules (import/export syntax)
Semicolons (Prettier default)
2-space indentation

Adding a New Model

Add provider client in llmsConfig.js:

const newProvider = new OpenAI({
  apiKey: process.env.NEW_PROVIDER_KEY,
  baseURL: process.env.NEW_PROVIDER_URL,
  timeout: 60000
});

Add model config:

"new-model": {
  modelName: "actual-model-name",
  endpoint: newProvider,
  convertationEnergy: 2.0
}

Add to retry chain in tryCompletionsConfig:

"new-model": [
  "new-model",
  "gpt-4o",  // Fallback
  // ...
]

Adding a New Endpoint

Create controller in src/controllers/:

import express from "express";
import { rest } from "../rest/rest.js";
import { HttpResponse } from "../rest/HttpResponse.js";

const myController = express.Router();

myController.get("/my-endpoint", rest(async ({ req }) => {
  // Logic here
  return new HttpResponse(200, { data: "..." });
}));

export default myController;

Register the controller in src/server.js:

import myController from "./controllers/myController.js";
app.use("/", myController);

Dependencies

Production

express (4.18.1) - Web framework
body-parser (1.20.0) - Request parsing
cors (2.8.5) - CORS middleware
openai (4.63.0) - OpenAI SDK (also works for compatible APIs)
axios (0.27.2) - HTTP client
lowdb (7.0.1) - JSON database
pino (9.6.0) - Logger
pino-roll (3.0.0) - Log rotation
multer (1.4.5-lts.1) - File upload handling
form-data (4.0.0) - Multipart form construction
node-fetch (3.3.2) - Fetch API
uuid (10.0.0) - UUID generation
yargs (17.5.1) - CLI argument parsing
cron (3.1.7) - Scheduled tasks
dotenv (16.0.1) - Environment variables
deepinfra (2.0.2) - DeepInfra SDK
@google/generative-ai (0.20.0) - Google AI SDK (unused?)
stream-mime-type (2.0.0) - MIME detection

Development

prettier (3.3.3) - Code formatter
vitest (2.1.3) - Test runner
jest (29.7.0) - Test framework
babel-jest (29.7.0) - Babel integration
@babel/preset-env (7.25.8) - Babel presets
babel-plugin-transform-import-meta (2.2.1) - Import meta transform

Known Issues & TODOs

TODOs in Code

completionsController.js:11 - "рефакторинг" (refactoring needed)
Token cost calculation - Hardcoded values for TTS/transcription
Error messages - Mix of Russian and English
GraphQL integration - Appears unused (GQL env vars present)
dbManager.js - Legacy code still in use, should migrate fully to repositories

Potential Issues

Concurrency: JSON file writes may conflict under high load
Error Leakage: Some provider errors exposed to clients
Partial Request IDs: Request IDs are only partially implemented, so requests are difficult to trace across logs
Hardcoded Profit Margin: 50% markup is not configurable
No Health Checks: No /health endpoint for monitoring
Large Payload Risk: 300MB limit may cause memory issues

Recommended Improvements

Add proper health/readiness endpoints
Implement request ID middleware
Migrate to PostgreSQL for production
Add Prometheus metrics
Implement circuit breaker pattern for providers
Add comprehensive test coverage
Standardize logging (remove console.log, use Pino everywhere)
Add API versioning strategy
Implement proper API documentation (OpenAPI/Swagger)
Add monitoring/alerting for provider failures

Troubleshooting

Common Issues

1. "Невалидный мастер токен" (Invalid master token)

Check ADMIN_FIRST environment variable
Ensure masterToken query parameter matches

2. "Не хватает баланса" (Insufficient balance)

Check user token balance: GET /token?masterToken=...&userId=...
Top up balance: PUT /token with operation="add"

3. All providers unavailable

Check provider API keys in environment variables
Check network connectivity to provider endpoints
Review logs for specific provider errors
Verify provider account status/quotas

4. Logs not rotating

Check src/logs/ directory exists and is writable
Verify pino-roll configuration
Check disk space

5. Audio transcription failing

Verify file format (mp3, wav, m4a supported)
Check file size (under 300MB)
Ensure DeepInfra API key is valid

Glossary

Energy: Internal currency unit (normalized from provider tokens)
Conversion Rate: Ratio of provider tokens to energy units
Profit Margin: Markup applied to energy costs (hardcoded at 50%)
Master Token: Admin authentication token
User Token: Bearer token for API access
Dialog: Conversation history per user
Completion: LLM response generation
SSE: Server-Sent Events (streaming protocol)
Failover: Automatic provider switching on errors
Provider Chain: Ordered list of providers to try

License

This software is released into the public domain under the Unlicense.

Important: The Unlicense is NOT the same as having no license or being "unlicensed". The Unlicense is a specific public domain dedication that explicitly grants everyone the freedom to use, modify, and distribute this software without restrictions.

Anyone is free to copy, modify, publish, use, compile, sell, or distribute this software, either in source code form or as a compiled binary, for any purpose, commercial or non-commercial, and by any means.

For the full license text, see the LICENSE file or visit unlicense.org.

Contributors

Deep.Assistant Team

Architecture Diagram (Detailed)

graph TB
    subgraph ClientLayer[Client Layer]
        WebApp[Web App]
        MobileApp[Mobile App]
        CLI[CLI Tool]
        ExternalAPI[External API]
    end

    subgraph Middleware[Express.js Middleware]
        CORS[CORS - allow all]
        BodyParser[body-parser - 300MB limit]
        Logger[Logger - Pino]
    end

    subgraph Controllers[Controllers Layer]
        CompController[Completions Controller]
        TokenController[Tokens Controller]
        DialogController[Dialogs Controller]
    end

    subgraph RestWrapper[rest - Wrapper]
        ExceptionHandler[Exception handling]
        ResponseRouter[HttpResponse/SSEResponse routing]
        ErrorFormatter[Consistent error formatting]
    end

    subgraph Services[Services Layer]
        CompService["CompletionsService
        • tryEndpoints
        • completions
        • updateTokens"]
        TokenService["TokensService
        • isAdminToken
        • hasBalance
        • getToken"]
        DialogService["DialogsService
        • getDialog
        • addMessage
        • clearDialog"]
    end

    subgraph Repositories[Repository Layer]
        TokenRepo[Tokens Repository]
        DialogRepo[Dialogs Repository]
    end

    subgraph Database[LowDB - JSON Database]
        TokensDB[("tokens.json
        tokens: []")]
        DialogsDB[("dialogs.json
        dialogs: []")]
        FileSystem[File System: src/db/]
    end

    subgraph FailoverChain[LLM Provider Failover Chain]
        P1[Provider 1: gpt-4o_go]
        P2[Provider 2: gpt-4o]
        P3[Provider 3: gpt-4o_guo]
        P4[Provider 4: deepseek]
    end

    subgraph ExternalProviders[External API Providers]
        OpenAI[OpenAI Official]
        DeepSeek[DeepSeek]
        OpenRouter[OpenRouter]
        DeepInfra[DeepInfra]
    end

    ClientLayer -->|HTTP/HTTPS REST| Middleware
    Middleware --> Controllers
    Controllers --> RestWrapper
    RestWrapper --> Services
    Services --> Repositories
    Repositories --> Database
    Database -.->|File I/O| FileSystem

    Services --> FailoverChain
    P1 -->|Fail| P2
    P2 -->|Fail| P3
    P3 -->|Fail| P4
    P4 -->|Success| Services

    FailoverChain --> ExternalProviders
    TokenService -.->|uses| CompService

    style ClientLayer fill:#01579b,stroke:#0277bd,stroke-width:3px,color:#fff
    style Middleware fill:#e65100,stroke:#ff6f00,stroke-width:3px,color:#fff
    style Controllers fill:#1b5e20,stroke:#4caf50,stroke-width:3px,color:#fff
    style RestWrapper fill:#880e4f,stroke:#c2185b,stroke-width:3px,color:#fff
    style Services fill:#4a148c,stroke:#9c27b0,stroke-width:3px,color:#fff
    style Repositories fill:#004d40,stroke:#00897b,stroke-width:3px,color:#fff
    style Database fill:#b71c1c,stroke:#f44336,stroke-width:3px,color:#fff
    style FailoverChain fill:#e65100,stroke:#ff9800,stroke-width:3px,color:#fff
    style ExternalProviders fill:#33691e,stroke:#8bc34a,stroke-width:3px,color:#fff

Document generated: 2025-10-25 Version: 1.0.0
