josego85/pdf-content-search

PDF Content Search

AI-powered PDF search with hybrid semantic capabilities using Elasticsearch 9.3 vector search and Ollama embeddings.

Features

  • 🧠 AI Hybrid Search - Combines keyword matching with semantic understanding, merged via Reciprocal Rank Fusion (RRF)
  • 📄 Page-level PDF search with Elasticsearch 9.3 vector search
  • 🔍 Multiple search modes: Hybrid AI, Exact match, Prefix match
  • 🌍 AI-powered PDF translation (Ollama qwen2.5:3b, ~52s/page on CPU)
  • 🔄 Async job processing with Symfony Messenger
  • 📊 Analytics Dashboard - Real-time search metrics: trends, click position distribution, CTR, CSV/JSON export
  • 📝 OCR for scanned PDFs - Automatic text layer via ocrmypdf (enables search & highlighting)
  • 📱 Responsive Vue.js frontend with in-PDF highlighting
  • 🧪 Full test suite: PHPUnit (93% PHP coverage) + Vitest (89% JS/Vue coverage, 172 tests)
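
To make the hybrid mode's fusion step concrete, here is a toy version of Reciprocal Rank Fusion. It is a minimal sketch, not the project's implementation: the rrf_fuse helper, the input file layout, and the constant k=60 (a commonly used default) are assumptions for illustration, and this README does not say where the fusion actually runs.

```shell
#!/bin/sh
# Toy Reciprocal Rank Fusion (RRF): score(doc) = sum over lists of 1 / (k + rank).
# Hypothetical helper, not part of the repository.
# Usage: rrf_fuse K FILE...   (each FILE: one document ID per line, best match first)
rrf_fuse() {
  k="$1"; shift
  LC_ALL=C awk -v k="$k" '
    NF { score[$1] += 1 / (k + FNR) }   # FNR is the 1-based line number, i.e. the rank in that list
    END { for (id in score) printf "%.6f %s\n", score[id], id }
  ' "$@" | LC_ALL=C sort -k1,1nr -k2,2   # highest fused score first, ties broken by ID
}
```

For example, with a keyword list ranking a, b, c and a semantic list ranking c, a, d, calling rrf_fuse 60 keyword.txt semantic.txt orders the results a, c, b, d: documents found by both rankers rise above those found by only one.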

Quick Start

# 1. Clone and start
git clone https://github.com/josego85/pdf-content-search.git
cd pdf-content-search
make dev

# 2. Add PDFs and index them
cp your-pdfs/*.pdf public/pdfs/
docker compose -p pdf-content-search exec php php bin/console app:index-pdfs

# 3. Open: http://localhost

What make dev does automatically:

  • ✅ Installs dependencies (Composer + NPM)
  • ✅ Runs database migrations
  • ✅ Creates Elasticsearch index structure
  • ✅ Builds frontend assets

Prerequisites: Ollama must be installed and running natively on the host before you run make dev. See Getting Started for setup.
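
One way to confirm the prerequisite before starting is a small pre-flight check. This is a sketch under assumptions: the ollama_ready function is hypothetical, and the URL assumes Ollama's default port (11434) and its /api/tags endpoint for listing local models; adjust it if your host is configured differently.

```shell
#!/bin/sh
# Hypothetical pre-flight check, not shipped with the repository.
# Assumes Ollama listens on its default address, http://localhost:11434.
ollama_ready() {
  url="${1:-http://localhost:11434}"
  # -f: fail on HTTP errors, -sS: quiet but keep errors, --max-time: do not hang
  curl -fsS --max-time 3 "$url/api/tags" >/dev/null 2>&1
}

if ollama_ready; then
  echo "Ollama is reachable"
else
  echo "Ollama is not reachable; start it before running make dev" >&2
fi
```

The models named under Stack (qwen2.5:3b and nomic-embed-text) presumably also need to be available locally, for example via ollama pull nomic-embed-text; the Getting Started guide remains the authoritative source for setup steps.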

Note: .env is committed with safe defaults (Symfony standard).

Common Commands

make help          # Show all available commands
make dev           # Start development (http://localhost)
make prod          # Start production (http://localhost:8080)
make down          # Stop environment
make logs          # View logs (add SERVICE=php for specific service)
make shell         # Open shell in PHP container
make test          # Run PHPUnit tests (93% PHP coverage)
make status        # Show all environments status

# Translation monitoring (helper scripts)
./bin/monitor-jobs.sh --watch   # Real-time job tracking
./bin/worker-logs.sh -f         # Worker logs

# Translation monitoring (full commands)
docker compose -p pdf-content-search exec php php bin/console app:translation:monitor        # Active jobs
docker compose -p pdf-content-search exec php php bin/console app:translation:monitor --all  # All jobs
docker compose -p pdf-content-search exec php php bin/console app:translation:monitor --watch # Watch mode
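
Because every console command above repeats the same docker compose prefix, a small shell function can shorten them. This is a hypothetical convenience wrapper, not something the repository provides; the console name and the DOCKER override are assumptions made for this sketch.

```shell
#!/bin/sh
# Hypothetical wrapper around the Symfony console inside the php container.
# Set DOCKER=echo to print the arguments that would be passed instead of running docker.
console() {
  "${DOCKER:-docker}" compose -p pdf-content-search exec php php bin/console "$@"
}
```

With the function sourced in your shell, console app:index-pdfs and console app:translation:monitor --watch are equivalent to the full commands listed above.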

Stack

  • Backend: PHP 8.4, Symfony 7.4, PostgreSQL 16
  • Search: Elasticsearch 9.3 (vector search, HNSW)
  • Frontend: Vue.js 3.5, Tailwind CSS 3.4, PDF.js 5.4, ApexCharts
  • AI: Ollama native (qwen2.5:3b translations, nomic-embed-text embeddings)
  • Queue: Symfony Messenger (3 workers)
  • Analytics: PostgreSQL 16 (metrics storage), Vue.js dashboard
  • Testing: PHPUnit (PHP), Vitest + happy-dom (JS/Vue)

Documentation

Getting Started

Features

Reference

License

Licensed under the GNU General Public License v3.0.
