This project provides a Dockerized setup to run Qwen3-2B-VL with vLLM @ FP8 (testing on a 3060 12GB) and a Streamlit frontend for performing OCR on PDF documents.
- Upload PDF files via Streamlit UI
- Rasterizes PDF pages to images
- Sends images to Qwen3-2B-VL for text extraction
- View extracted text per page in the browser
- Download combined OCR output as a
.txtfile
- Docker & Docker Compose installed
- GPU with CUDA support (optional but recommended)
- Internet connection to download the model from Hugging Face
- vLLM API running (inside Docker)
vllmocrexp/
├── docker-compose.yml
├── streamlit/
│ ├── app.py
│ ├── requirements.txt
│ └── Dockerfile
├── models/ # optional, local model storage
└── tmp/ # temporary PDFs/images (ignored in git)
- Clone the repo
git clone https://github.com/ikantkode/qwen3-2b-ocr-app
cd qwen3-2b-ocr-app- Build and start the containers
docker compose build --no-cache && docker compose up
OR
docker compose build --no-cache && docker compose up -dqwen-vlmcontainer runs the Qwen3-2B-VL model with vLLMstreamlit-uicontainer runs the frontend at: http://localhost:8501
- Upload a PDF in the Streamlit UI
- Each page will be displayed as an image
- OCR text is extracted for each page
- Full text can be downloaded as
ocr_output.txt
To remove temporary files, Docker volumes, and images:
docker compose down -v
rm -rf tmp/MIT License.