HunyuanOCR is a lightweight, high-performance OCR application powered by the Tencent HunyuanOCR 1B model. It supports document parsing, text detection, translation, and custom prompts. The system outputs text as structured Markdown, with tables as HTML and formulas as LaTeX.
- Document Parsing: Extract all content from images in Markdown, with tables as HTML and formulas as LaTeX.
- Text Detection: Identify text and output coordinates.
- Translation: Extract text and translate to English.
- Custom Prompt: Flexible OCR tasks using user-defined prompts.
- Lightweight deployment using Streamlit.
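Because Document Parsing returns Markdown with tables embedded as HTML, downstream code often needs to pull the tables out. The following is a minimal sketch of one way to do that, assuming each table arrives as a literal `<table>...</table>` element and that tables are not nested; the README does not specify the exact output shape, and `split_tables` is a hypothetical helper, not part of the app.

```python
import re

# Assumption: tables appear as plain <table>...</table> elements in the
# Markdown text. Nested tables are not handled by this non-greedy pattern.
TABLE_RE = re.compile(r"<table\b.*?</table>", re.DOTALL | re.IGNORECASE)


def split_tables(markdown_text):
    """Return (text_without_tables, list_of_table_html_strings)."""
    tables = TABLE_RE.findall(markdown_text)
    remaining = TABLE_RE.sub("", markdown_text)
    return remaining, tables


if __name__ == "__main__":
    sample = "# Invoice\n\n<table><tr><td>Total</td><td>42</td></tr></table>\n\nThank you."
    text, tables = split_tables(sample)
    print(len(tables), "table(s) found")
    print(text)
```

A regular expression is enough for flat tables; for nested or malformed HTML, a real HTML parser would be a safer choice.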
- Python 3.10+
- Packages listed in requirements.txt:
streamlit>=1.38.0
torch>=2.3.0
pillow>=10.0.0
accelerate>=0.33.0
transformers@git+https://github.com/huggingface/transformers@82a06db03535c49aa987719ed0746a76093b1ec4
- Docker (optional, for GPU deployment)
- NVIDIA GPU + CUDA 12+ (optional for Docker GPU deployment)
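The requirements above can be checked before launching the app. This is a rough sketch, not something shipped with the project: `check_environment` and `parse_version` are hypothetical helpers, the minimum versions are copied from the list above, and `transformers` is left out because it is pinned to a git commit rather than a version number.

```python
import re
import sys
from importlib import metadata

# Minimums copied from the requirements list; transformers is pinned to a
# git commit there, so it is not compared here.
MINIMUMS = {
    "streamlit": (1, 38, 0),
    "torch": (2, 3, 0),
    "pillow": (10, 0, 0),
    "accelerate": (0, 33, 0),
}


def parse_version(text):
    """Turn a string such as '2.3.0+cu121' into (2, 3, 0)."""
    parts = []
    for piece in text.split("+")[0].split(".")[:3]:
        match = re.match(r"\d+", piece)
        parts.append(int(match.group()) if match else 0)
    parts += [0] * (3 - len(parts))
    return tuple(parts)


def check_environment(get_version=metadata.version):
    """Return a list of human-readable problems (empty if all is well)."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python 3.10+ is required")
    for name, minimum in MINIMUMS.items():
        try:
            installed = parse_version(get_version(name))
        except metadata.PackageNotFoundError:
            problems.append(name + " is not installed")
            continue
        if installed < minimum:
            wanted = ".".join(str(n) for n in minimum)
            problems.append(name + " is older than " + wanted)
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

The version lookup is passed in as a parameter so the check can be exercised without the real packages installed.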
- Clone the repository:
git clone https://github.com/ikantkode/hunyuan-1b-ocr-app.git
cd hunyuan-1b-ocr-app
- Create and activate a virtual environment:
python -m venv venv
source venv/bin/activate # Linux / Mac
venv\Scripts\activate # Windows
- Install dependencies:
pip install -r requirements.txt
- Run the application:
streamlit run app.py
- Load the model from the sidebar and upload an image to start OCR.
- Build the Docker image:
docker-compose build
- Start the container:
docker-compose up -d
- Access the app in your browser at http://localhost:8501
Note: The Docker setup requires an NVIDIA GPU with CUDA support and the NVIDIA Container Toolkit installed.
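After starting the container, it can be handy to confirm that the app is answering before opening a browser. The sketch below is one possible check using only the standard library; `app_is_up` is a hypothetical helper, and it assumes the app responds to a plain GET on the URL given above with HTTP 200.

```python
import http.client
import urllib.error
import urllib.request


def app_is_up(url="http://localhost:8501", timeout=3.0):
    """Return True if a GET on `url` answers with HTTP 200, else False."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, http.client.HTTPException, OSError):
        # Connection refused, timeout, DNS failure, or a malformed reply.
        return False


if __name__ == "__main__":
    print("App reachable:", app_is_up())
```

A `False` result shortly after `docker-compose up -d` may only mean the container is still starting, so retrying after a few seconds is reasonable.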
hunyuan-ocr-app/
├── app.py # Main Streamlit app
├── requirements.txt # Python dependencies
├── Dockerfile # GPU-optimized container setup
├── docker-compose.yml # Docker Compose config
├── data/ # Stores temporary files
└── hOCR/ # (Optional) Additional project files, ignored in git
Note: The venv/ and hOCR/ folders are ignored in Git and should not be pushed.
- Select the language (English or Chinese).
- Choose a task: Document Parsing, Text Detection, Translation, or Custom Prompt.
- Upload an image (PNG, JPG, JPEG, BMP, WEBP).
- Click Load Model (if not loaded) and then Run OCR.
- View results in the app and optionally download as .txt.
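The upload and download steps above can be mirrored in a small script, for example when batch-processing files outside the UI. This is an illustrative sketch only: the helper names are hypothetical, the accepted extensions are taken from the list above, and writing results into `data/` is an assumption rather than documented behavior of the app.

```python
from pathlib import Path

# Extensions taken from the upload step above.
ALLOWED_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".webp"}


def is_supported_image(filename):
    """Check the file extension case-insensitively."""
    return Path(filename).suffix.lower() in ALLOWED_SUFFIXES


def result_path(image_name, out_dir="data"):
    """Build the .txt path for an image, e.g. scan.png -> data/scan.txt."""
    return Path(out_dir) / (Path(image_name).stem + ".txt")


def save_result(text, image_name, out_dir="data"):
    """Write OCR text next to other temporary files and return the path."""
    out_path = result_path(image_name, out_dir)
    out_path.parent.mkdir(parents=True, exist_ok=True)
    out_path.write_text(text, encoding="utf-8")
    return out_path


if __name__ == "__main__":
    name = "scan.PNG"
    if is_supported_image(name):
        print("Would save to", result_path(name))
```

Checking the extension only filters obvious mismatches; it does not verify that the file is a valid image.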
This project is for personal and research use. The OCR model is provided by Tencent via Hugging Face: tencent/HunyuanOCR.