AI-powered PDF search with hybrid semantic capabilities using Elasticsearch 9.3 vector search and Ollama embeddings.
- AI Hybrid Search - Combines keyword matching with semantic understanding (RRF algorithm)
- Page-level PDF search with Elasticsearch 9.3 vector search
- Multiple search modes: Hybrid AI, Exact match, Prefix match
- AI-powered PDF translation (Ollama qwen2.5:3b, ~52s/page on CPU)
- Async job processing with Symfony Messenger
- Analytics Dashboard - Real-time search metrics: trends, click position distribution, CTR, CSV/JSON export
- OCR for scanned PDFs - Automatic text layer via `ocrmypdf` (enables search & highlighting)
- Responsive Vue.js frontend with in-PDF highlighting
- Full test suite: PHPUnit (93% PHP coverage) + Vitest (89% JS/Vue coverage, 172 tests)
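
For readers unfamiliar with RRF (Reciprocal Rank Fusion), the sketch below shows the general idea behind merging a keyword ranking with a semantic ranking. It is an illustration only, not this project's code: in the application the fusion is performed by Elasticsearch. The function name `rrf_fuse`, the sample page IDs, and the constant `k=60` (the usual default for the rank constant) are assumptions made for the example.

```python
from collections import defaultdict


def rrf_fuse(rankings, k=60):
    """Fuse several ranked lists of IDs with Reciprocal Rank Fusion.

    Each item scores 1 / (k + rank) in every list that contains it
    (rank starts at 1); the per-list scores are summed.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by ID for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


# Hypothetical page-level hits from the two retrieval modes.
keyword_hits = ["report.pdf#3", "manual.pdf#12", "invoice.pdf#1"]
semantic_hits = ["manual.pdf#12", "guide.pdf#7", "report.pdf#3"]

fused = rrf_fuse([keyword_hits, semantic_hits])
for doc_id, score in fused:
    print(f"{doc_id}\t{score:.5f}")
```

Pages that appear in both lists (here `manual.pdf#12` and `report.pdf#3`) rise above pages found by only one mode, which is why hybrid search tends to surface results that match both the wording and the meaning of a query.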
# 1. Clone and start
git clone https://github.com/josego85/pdf-content-search.git
cd pdf-content-search
make dev
# 2. Add PDFs and index them
cp your-pdfs/*.pdf public/pdfs/
docker compose -p pdf-content-search exec php php bin/console app:index-pdfs
# 3. Open: http://localhost
What `make dev` does automatically:
- Installs dependencies (Composer + NPM)
- Runs database migrations
- Creates Elasticsearch index structure
- Builds frontend assets
Prerequisites: Ollama must be installed and running natively on the host before running `make dev`. See Getting Started for setup.
Note: `.env` is committed with safe defaults (Symfony standard).
make help # Show all available commands
make dev # Start development (http://localhost)
make prod # Start production (http://localhost:8080)
make down # Stop environment
make logs # View logs (add SERVICE=php for specific service)
make shell # Open shell in PHP container
make test # Run PHPUnit tests (93% PHP coverage)
make status # Show all environments status
# Translation monitoring (helper scripts)
./bin/monitor-jobs.sh --watch # Real-time job tracking
./bin/worker-logs.sh -f # Worker logs
# Translation monitoring (full commands)
docker compose -p pdf-content-search exec php php bin/console app:translation:monitor # Active jobs
docker compose -p pdf-content-search exec php php bin/console app:translation:monitor --all # All jobs
docker compose -p pdf-content-search exec php php bin/console app:translation:monitor --watch # Watch mode
- Backend: PHP 8.4, Symfony 7.4, PostgreSQL 16
- Search: Elasticsearch 9.3 (vector search, HNSW)
- Frontend: Vue.js 3.5, Tailwind CSS 3.4, PDF.js 5.4, ApexCharts
- AI: Ollama native (qwen2.5:3b translations, nomic-embed-text embeddings)
- Queue: Symfony Messenger (3 workers)
- Analytics: PostgreSQL 16 (metrics storage), Vue.js dashboard
- Testing: PHPUnit (PHP), Vitest + happy-dom (JS/Vue)
- Getting Started - Complete setup in 5 minutes
- Configuration - Environment variables & advanced settings
- Production - Deploy, optimization & security
- Testing - PHPUnit (93% PHP) + Vitest (89% JS/Vue, 172 tests)
- Troubleshooting - Common issues & solutions
- Analytics Dashboard - Search metrics & KPIs (http://localhost/analytics)
- REST API - API reference & endpoints
- PDF Translation - Ollama translation & job tracking
- Frontend Architecture - Webpack, Vue.js, Tailwind
Licensed under GNU General Public License v3.0.