HunyuanOCR – Tencent 1B OCR Expert

HunyuanOCR is a lightweight, high-performance OCR application powered by the Tencent HunyuanOCR 1B model. It supports document parsing, text detection, translation, and custom prompts. Results are returned as structured Markdown, with tables rendered as HTML and formulas as LaTeX.

Features

Document Parsing: Extract all content from images in Markdown, with tables as HTML and formulas as LaTeX.
Text Detection: Locate text regions and output their coordinates.
Translation: Extract text and translate to English.
Custom Prompt: Flexible OCR tasks using user-defined prompts.
Lightweight deployment using Streamlit.

Requirements

Python 3.10+
Packages listed in requirements.txt:

streamlit>=1.38.0
torch>=2.3.0
pillow>=10.0.0
accelerate>=0.33.0
transformers@git+https://github.com/huggingface/transformers@82a06db03535c49aa987719ed0746a76093b1ec4

Docker (optional, for GPU deployment)
NVIDIA GPU with CUDA 12+ (needed only for the Docker GPU deployment)
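
Before running the app, it can help to confirm that the environment matches the list above. The following is a minimal sketch, not part of the repository: the helper names `python_ok` and `installed_versions` are hypothetical, and the script only reports which distributions are installed. It does not compare them against the minimum versions in requirements.txt.

```python
import sys
from importlib import metadata

# Distribution names taken from the requirements list above.
REQUIRED = ["streamlit", "torch", "pillow", "accelerate", "transformers"]
MIN_PYTHON = (3, 10)


def python_ok(version=None, minimum=MIN_PYTHON):
    """Return True if the interpreter version meets the minimum."""
    version = tuple(version or sys.version_info[:2])
    return version >= tuple(minimum)


def installed_versions(names=REQUIRED):
    """Map each distribution name to its installed version, or None if absent."""
    found = {}
    for name in names:
        try:
            found[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            found[name] = None
    return found


if __name__ == "__main__":
    print("Python OK:", python_ok())
    for name, version in installed_versions().items():
        print(f"{name}: {version or 'NOT INSTALLED'}")
```

Run it inside the activated virtual environment so that it inspects the same packages the app will use.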

Setup & Deployment

Local Python Deployment

Clone the repository:

git clone https://github.com/ikantkode/hunyuan-1b-ocr-app.git
cd hunyuan-1b-ocr-app

Create and activate a virtual environment:

python -m venv venv
source venv/bin/activate  # Linux / Mac
venv\Scripts\activate     # Windows

Install dependencies:

pip install -r requirements.txt

Run the application:

streamlit run app.py

Load the model from the sidebar and upload an image to start OCR.

Docker Deployment (GPU Optimized)

Build the Docker image:

docker-compose build

Start the container:

docker-compose up -d

Access the app in your browser at http://localhost:8501

Note: The Docker setup requires an NVIDIA GPU with CUDA support and the NVIDIA Container Toolkit installed.
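
After starting the container, the model image may take a while to become reachable. As a rough sketch (the function name `port_open` is hypothetical, and the host and port are assumed from the URL above), a plain TCP check can tell you whether something is listening on port 8501. It does not confirm that the Streamlit app itself has finished loading.

```python
import socket


def port_open(host="localhost", port=8501, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unresolved hosts.
        return False


if __name__ == "__main__":
    print("App reachable:", port_open())
```

If this prints False shortly after `docker-compose up -d`, waiting a few seconds and retrying is usually worthwhile before inspecting the container logs.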

Folder Structure

hunyuan-1b-ocr-app/
├── app.py                # Main Streamlit app
├── requirements.txt      # Python dependencies
├── Dockerfile            # GPU-optimized container setup
├── docker-compose.yml    # Docker Compose config
├── data/                 # Stores temporary files
└── hOCR/                 # (Optional) Additional project files, ignored in git

Note: The venv/ and hOCR/ folders are ignored in Git and should not be pushed.

Usage

Select the language (English or Chinese).
Choose a task: Document Parsing, Text Detection, Translation, or Custom Prompt.
Upload an image (PNG, JPG, JPEG, BMP, WEBP).
Click Load Model (if not loaded) and then Run OCR.
View results in the app and optionally download as .txt.
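
When preparing a batch of files for upload, a quick pre-check against the formats listed in the steps above can save a failed attempt. This is an illustrative sketch only, not code from app.py: the name `is_supported_image` is hypothetical, and the check looks at the file extension, not at the file contents.

```python
from pathlib import Path

# Formats listed in the Usage steps above.
SUPPORTED_EXTENSIONS = {".png", ".jpg", ".jpeg", ".bmp", ".webp"}


def is_supported_image(filename):
    """Return True if the file extension matches a supported image format."""
    return Path(filename).suffix.lower() in SUPPORTED_EXTENSIONS


if __name__ == "__main__":
    for name in ["invoice.PNG", "scan.webp", "report.pdf"]:
        status = "ok" if is_supported_image(name) else "unsupported"
        print(f"{name}: {status}")
```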

License

This project is for personal and research use. The OCR model is provided by Tencent via Hugging Face: tencent/HunyuanOCR.
