Lexora AI is a FastAPI-based document question-answering platform. Users upload PDF, TXT, Markdown, or DOCX files; Lexora extracts and chunks their text, stores embeddings in a per-user FAISS index, and answers natural-language questions using retrieval-augmented generation (RAG) with source attribution.
- A user registers and authenticates with JWT access and refresh tokens.
- The user uploads a supported document.
- The backend validates and stores the file, extracts text, splits it into chunks, embeds those chunks, and writes vectors plus metadata to FAISS.
- The user asks a question through the chat API.
- The retrieval layer embeds the query, searches the user's vector store, optionally filters by selected documents, and builds an LLM context.
- The LLM service generates either a normal response or a Server-Sent Events streaming response.
- Conversations, messages, documents, and users are persisted in the database.
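
The chunking step in the flow above can be illustrated with a minimal sketch. It assumes fixed-size character windows with overlap; the function name `chunk_text` and the `chunk_size` and `overlap` parameters are hypothetical and are not taken from Lexora's own utilities, which may split on different boundaries.

```python
def chunk_text(text: str, chunk_size: int = 500, overlap: int = 50) -> list[str]:
    """Split text into overlapping character windows (illustrative only)."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be in [0, chunk_size)")

    step = chunk_size - overlap  # how far each window advances
    chunks: list[str] = []
    for start in range(0, len(text), step):
        chunk = text[start:start + chunk_size]
        if chunk.strip():  # skip whitespace-only windows
            chunks.append(chunk)
        if start + chunk_size >= len(text):
            break  # this window already reached the end of the text
    return chunks
```

Overlap keeps a sentence that straddles a window boundary visible in both neighbouring chunks, at the cost of embedding some text twice.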
| Area | Technology |
|---|---|
| API | FastAPI, Uvicorn |
| Validation/config | Pydantic v2, pydantic-settings |
| Database | SQLAlchemy 2 async ORM, PostgreSQL in production, SQLite for tests/local checks |
| Auth | JWT via python-jose, password hashing via passlib/bcrypt |
| Vector search | FAISS |
| AI orchestration | LangChain, OpenAI chat and embedding models |
| Cache | Redis async client |
| Background jobs | Celery |
| Observability | Prometheus metrics, structlog JSON logging |
| Tests | pytest, pytest-asyncio, httpx ASGI transport |
- User registration, login, refresh token flow, and authenticated `/me` endpoint
- Document upload and validation for `pdf`, `txt`, `md`, and `docx`
- Text extraction and chunking utilities
- Per-user vector store isolation with FAISS persistence
- Retrieval-augmented chat with source metadata
- Streaming chat responses over SSE
- Conversation and message history
- Redis-backed retrieval cache
- Prometheus metrics endpoint at `/metrics`
- Health endpoint at `/health` and readiness endpoint at `/ready`
- Docker Compose stack for PostgreSQL, Redis, app, Celery worker, and Nginx
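
A client consuming the streaming chat feature needs to split the Server-Sent Events stream into individual events. The sketch below follows the generic SSE framing (events end at a blank line, `data:` lines carry the payload, lines starting with `:` are comments). The helper name `iter_sse_data` is hypothetical, and the content of each payload is not specified in this README, so the sketch returns it as a raw string.

```python
from collections.abc import Iterable, Iterator


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield the data payload of each complete Server-Sent Event."""
    buffer: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if buffer:
                yield "\n".join(buffer)
                buffer = []
            continue
        if line.startswith(":"):
            continue  # comment or keep-alive line
        field, _, value = line.partition(":")
        if field == "data":
            # One leading space after the colon is not part of the value.
            buffer.append(value[1:] if value.startswith(" ") else value)
    # An event without a terminating blank line is dropped, as in the SSE spec.
```

With httpx, the input lines could come from `response.iter_lines()` inside a `client.stream(...)` context.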
lexoraai/
├── app/
│ ├── api/v1/ FastAPI route modules for auth, documents, and chat
│ ├── core/ Exceptions, logging, and security helpers
│ ├── models/ Pydantic request/response models
│ ├── schemas/ SQLAlchemy ORM models and DB session setup
│ ├── services/ Business logic for documents, chat, embeddings, vectors, cache, retrieval, and LLMs
│ ├── tasks/ Celery worker entry points
│ ├── utils/ Document parsing and text chunking utilities
│ ├── config.py Environment-driven settings
│ ├── deps.py FastAPI dependencies
│ └── main.py Application factory and global routes
├── alembic/ Database migration assets
├── docker/ Dockerfile, Compose stack, and Nginx config
├── scripts/ Operational helper scripts
├── tests/ Unit and integration tests
├── requirements.txt Runtime dependencies
├── requirements-dev.txt Test/lint/dev dependencies
└── pyproject.toml Tooling configuration
- Python 3.11 is recommended for the pinned dependency set.
- PostgreSQL 15+ for full application use.
- Redis 7+ for cache and Celery broker/result backend.
- OpenAI API key for real embedding and chat generation.
- Docker and Docker Compose if you want the packaged local stack.
Note: the current pinned dependencies were validated in this workspace with the existing virtual environment, but a full reinstall under Python 3.13 may require dependency upgrades because packages such as `psycopg2-binary==2.9.9` may attempt a source build.
Create a local .env file from the provided template and set at least these values:
| Variable | Purpose | Example |
|---|---|---|
| `DATABASE_URL` | Async SQLAlchemy database URL | `postgresql+asyncpg://postgres:postgres@localhost:5432/lexora` |
| `REDIS_URL` | Redis cache URL | `redis://localhost:6379/0` |
| `SECRET_KEY` | JWT signing key; use a strong secret of 32 or more characters | `change-this-to-a-real-secret-value` |
| `OPENAI_API_KEY` | OpenAI API key for embeddings and chat | `sk-...` |
| `OPENAI_MODEL` | Chat model | `gpt-4-turbo-preview` |
| `OPENAI_EMBEDDING_MODEL` | Embedding model | `text-embedding-3-small` |
| `OPENAI_EMBEDDING_DIMENSIONS` | Embedding vector dimension | `1536` |
| `UPLOAD_DIR` | Uploaded file storage path | `./uploads` |
| `FAISS_INDEX_PATH` | FAISS index storage path | `./data/faiss` |
| `CELERY_BROKER_URL` | Celery broker URL | `redis://localhost:6379/1` |
| `CELERY_RESULT_BACKEND` | Celery result backend URL | `redis://localhost:6379/2` |
| `DOCUMENT_PROCESSING_MODE` | `inline` for local/dev processing or `background` for Celery-based processing | `inline` |
| `CORS_ORIGINS` | Comma-separated allowed origins | `http://localhost:3000,http://localhost:8000` |
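
The constraints implied by the table (a 32+ character secret, two processing modes, comma-separated origins) can be checked at startup. The project lists pydantic-settings for this; the sketch below is a stdlib-only stand-in rather than the project's `config.py`, and the `Settings` fields, the `load_settings` name, and the fallback values are assumptions made for illustration.

```python
import os
from collections.abc import Mapping
from dataclasses import dataclass

VALID_MODES = {"inline", "background"}


@dataclass(frozen=True)
class Settings:
    database_url: str
    secret_key: str
    document_processing_mode: str
    cors_origins: list[str]
    embedding_dimensions: int


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read and sanity-check a subset of the variables from the table."""
    secret = env["SECRET_KEY"]
    if len(secret) < 32:
        raise ValueError("SECRET_KEY should be at least 32 characters")

    mode = env.get("DOCUMENT_PROCESSING_MODE", "inline")
    if mode not in VALID_MODES:
        raise ValueError(f"DOCUMENT_PROCESSING_MODE must be one of {sorted(VALID_MODES)}")

    # Split the comma-separated list and drop empty entries.
    origins = [o.strip() for o in env.get("CORS_ORIGINS", "").split(",") if o.strip()]

    return Settings(
        database_url=env["DATABASE_URL"],
        secret_key=secret,
        document_processing_mode=mode,
        cors_origins=origins,
        # 1536 is the example value from the table, used here as an assumed fallback.
        embedding_dimensions=int(env.get("OPENAI_EMBEDDING_DIMENSIONS", "1536")),
    )
```

Failing fast on a short secret or an unknown processing mode surfaces configuration mistakes before the first request is served.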
Linux/macOS:

    python3.11 -m venv venv
    source venv/bin/activate

Windows PowerShell:

    py -3.11 -m venv venv
    .\venv\Scripts\Activate.ps1

Install dependencies:

    pip install -r requirements.txt -r requirements-dev.txt

Start PostgreSQL and Redis:

    docker compose -f docker/docker-compose.yml up -d postgres redis

Create the environment file:

    cp .env.example .env

Then edit `.env` with your database, Redis, secret, and OpenAI values.

Start the API:

    uvicorn app.main:app --reload

The API will be available at:
- API root: http://localhost:8000
- Interactive docs: http://localhost:8000/docs
- Health check: http://localhost:8000/health
- Readiness check: http://localhost:8000/ready
- Metrics: http://localhost:8000/metrics
To run the full containerized stack:

    docker compose -f docker/docker-compose.yml up --build

The Compose stack includes:
- `postgres`: PostgreSQL database
- `redis`: Redis cache/broker
- `app`: FastAPI application
- `celery-worker`: Celery worker process
- `nginx`: reverse proxy
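
The curl examples that follow can also be scripted. As a rough sketch, the helper below builds (but does not send) the chat request with the standard library; `build_chat_request` and `BASE_URL` are hypothetical names, and the response format is not documented in this README, so no parsing is shown.

```python
import json
import urllib.request

BASE_URL = "http://localhost:8000/api/v1"


def build_chat_request(token: str, message: str, stream: bool = False) -> urllib.request.Request:
    """Build a POST request for the chat endpoint without sending it."""
    path = "/chat/stream" if stream else "/chat/message"
    body = json.dumps({"message": message}).encode("utf-8")
    return urllib.request.Request(
        BASE_URL + path,
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )
```

Passing the result to `urllib.request.urlopen()` would send it to a running server.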
Register a user:

    curl -X POST http://localhost:8000/api/v1/auth/register \
      -H "Content-Type: application/json" \
      -d '{"email":"user@example.com","password":"password123","full_name":"Demo User"}'

Login:

    curl -X POST http://localhost:8000/api/v1/auth/login \
      -H "Content-Type: application/x-www-form-urlencoded" \
      -d "username=user@example.com&password=password123"

Upload a document:

    curl -X POST http://localhost:8000/api/v1/documents \
      -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
      -F "file=@path/to/document.pdf"

List documents:

    curl http://localhost:8000/api/v1/documents \
      -H "Authorization: Bearer YOUR_ACCESS_TOKEN"

Ask a non-streaming chat question:

    curl -X POST http://localhost:8000/api/v1/chat/message \
      -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"message":"What is this document about?"}'

Ask a streaming chat question:

    curl -N -X POST http://localhost:8000/api/v1/chat/stream \
      -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"message":"Summarize the key points."}'

Create a conversation:

    curl -X POST http://localhost:8000/api/v1/chat/conversations \
      -H "Authorization: Bearer YOUR_ACCESS_TOKEN" \
      -H "Content-Type: application/json" \
      -d '{"title":"Research notes"}'

Run the test suite:

    python -m pytest

In this workspace, the test suite currently passes:
- `31 passed`
- Coverage: `46%`

Important test notes:
- Tests use an in-memory SQLite database through dependency overrides.
- Tests set local environment variables in `tests/conftest.py`.
- Tests do not call the real OpenAI API.
The app was started successfully with a SQLite run-check database and test settings. Startup completed, the database initialized, Redis connected, and Uvicorn served the API on 127.0.0.1:8010 before the bounded run command was stopped.
A fresh screenshot of the running app health endpoint was captured from a headless Chromium browser.
To regenerate this screenshot locally, run:

    venv/Scripts/python.exe scripts/capture_readme_screenshot.py

- Retrieval filtering happens inside the retrieval/vector-search path instead of filtering only source metadata after context construction. This avoids sending context from documents outside a requested document filter.
- Document processing can run inline for local development or be queued to Celery with `DOCUMENT_PROCESSING_MODE=background`.
- JWTs include `jti` identifiers, logout revokes the current access token in Redis until token expiry, and refresh-token rotation blacklists the used refresh token.
- Database and token timestamps use timezone-aware UTC values instead of deprecated `datetime.utcnow()` calls.
- Chat history is now passed to the LLM as real user/assistant turns instead of user-only messages with blank assistant responses.
- FAISS metadata stores embeddings alongside chunk metadata so index rebuilds after deletion avoid unnecessary re-embedding when possible.
- A module-level `get_file_type()` wrapper exists for compatibility with the test suite and public utility-style imports.
- Test settings cache clearing uses `get_settings.cache_clear()`, which is the actual cached settings function.
- FAISS indexes are isolated per user under `FAISS_INDEX_PATH/<user_id>/`.
- Redis retrieval cache keys include user ID, query hash, and document filter to prevent cross-user or cross-filter leakage.
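
The cache-key scheme in the last note can be sketched as follows. This is an illustration under assumptions, not the project's cache service: the `retrieval_cache_key` name, the `retrieval:` prefix, the use of SHA-256, and the key layout are all hypothetical.

```python
import hashlib
from collections.abc import Iterable
from typing import Optional


def retrieval_cache_key(
    user_id: str, query: str, document_ids: Optional[Iterable[str]] = None
) -> str:
    """Build a cache key scoped to one user, one query, and one document filter."""
    query_hash = hashlib.sha256(query.encode("utf-8")).hexdigest()
    if document_ids is None:
        filter_part = "all"  # no filter: search every document the user owns
    else:
        # Sort so the same selection in a different order maps to the same key.
        joined = ",".join(sorted(set(document_ids)))
        filter_part = hashlib.sha256(joined.encode("utf-8")).hexdigest()[:16]
    return f"retrieval:{user_id}:{query_hash}:{filter_part}"
```

Because the user ID and the filter are both part of the key, a cached result for one user or one document selection can never be served for another.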
- FAISS still rebuilds the user index on deletion. Stored embeddings reduce rebuild cost, but high-churn production deployments should still consider a deletion-friendly vector database or FAISS strategy.
- Document processing defaults to `inline` for safer local development. Set `DOCUMENT_PROCESSING_MODE=background` in production when the Celery worker is running.
- Test coverage is improved for chat-history behavior, but document ingestion, retrieval, vector storage, cache, and full chat orchestration should still receive more unit/integration tests.
- `SECRET_KEY` defaults are development-only and must be overridden in production.
- Never commit real `.env` secrets or OpenAI API keys.
- Use a strong production `SECRET_KEY`.
- Restrict `CORS_ORIGINS` to trusted frontend origins.
- Token revocation uses Redis, so production deployments should keep Redis highly available.
- Add rate limiting enforcement before public deployment.
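
To make the dependency on Redis concrete, the sketch below models token revocation with an in-memory stand-in: a token's `jti` is blacklisted only until the token itself would expire. The class and method names are hypothetical; in Redis the same idea maps onto a key written with an expiry (for example `SET` with `EX`).

```python
from datetime import datetime, timedelta, timezone
from typing import Optional


class InMemoryRevocationStore:
    """Stand-in for Redis: maps a token's jti to the time its entry expires."""

    def __init__(self) -> None:
        self._expires_at: dict[str, datetime] = {}

    def revoke(self, jti: str, token_exp: datetime, now: Optional[datetime] = None) -> int:
        """Blacklist jti until the token expires; return the TTL in seconds."""
        now = now or datetime.now(timezone.utc)  # timezone-aware UTC
        ttl = int((token_exp - now).total_seconds())
        if ttl > 0:
            self._expires_at[jti] = token_exp
        return max(ttl, 0)  # an already-expired token needs no entry

    def is_revoked(self, jti: str, now: Optional[datetime] = None) -> bool:
        now = now or datetime.now(timezone.utc)
        expires = self._expires_at.get(jti)
        if expires is None:
            return False
        if expires <= now:
            del self._expires_at[jti]  # mimic key expiry
            return False
        return True
```

If the store is lost, revoked tokens become valid again until they expire, which is why availability of the backing store matters.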
MIT License
