# smart-ocr

smart-ocr is a Node.js OCR library for:
- text-based PDFs
- scanned PDFs
- mixed PDFs with both text-native and scanned pages
- PNG and other common raster image formats
- optional AI-assisted structured output from extracted OCR text
For PDFs, each page is handled independently. If a page already contains selectable text, Smart OCR extracts it directly. If a page is image-only, it renders the page and falls back to OCR.
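The per-page decision can be sketched as follows. Note that `planPage` and the `page` shape are illustrative only, not part of smart-ocr's API:

```js
// Illustrative sketch of the per-page strategy (not smart-ocr's real internals):
// a page with embedded selectable text is read directly; anything else is
// marked for rendering and OCR.
function planPage(page) {
  const embedded = (page.text ?? "").trim();
  return embedded.length > 0
    ? { method: "extract", text: embedded }
    : { method: "ocr" };
}
```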
## Requirements

- Node.js >= 20.6.0
This package is designed for Node.js. It is not set up for browser use.
## Installation

```bash
npm install smart-ocr
```

## Quick start

```js
import { SmartOCR } from "smart-ocr";
import path from "node:path";
import { fileURLToPath } from "node:url";

const __filename = fileURLToPath(import.meta.url);
const __dirname = path.dirname(__filename);

const ocr = new SmartOCR({ language: "eng", workerCount: 2 });

try {
  const pdfText = await ocr.processPDF(path.join(__dirname, "sample-scanned.pdf"));
  console.log(pdfText);
} finally {
  await ocr.terminate();
}
```

## Structured output (AI mode)

Smart OCR can optionally turn extracted text into structured JSON.
- OCR still runs first
- the extracted text is then sent to an AI model to produce structured output
When `structuredOutputOptions.ai` is configured, `processFile()`, `processPDF()`, and `processImage()` return a JSON object instead of a plain text string.
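That two-stage flow can be sketched in miniature. Here `extractDocument`, `runOCR`, and `structureWithAI` are hypothetical stand-ins, not smart-ocr functions:

```js
// Minimal sketch of the OCR-then-AI pipeline (illustrative only):
// stage 1 always produces raw text; stage 2 runs only when AI is configured.
async function extractDocument(runOCR, structureWithAI, options = {}) {
  const text = await runOCR(); // OCR always runs first
  if (!options.ai) return text; // default: plain extracted text
  return structureWithAI(text, options); // AI mode: structured JSON object
}
```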
Supported providers:
- `openai`: uses structured outputs (`response_format: json_schema`)
- `anthropic`: uses tool use to enforce schema-shaped output
- `gemini`: uses `responseMimeType: "application/json"` with `responseSchema`
Example (OpenAI):

```js
import { SmartOCR } from "smart-ocr";

const ocr = new SmartOCR({
  language: "eng",
  structuredOutputOptions: {
    ai: {
      provider: "openai",
      model: "gpt-4.1-mini",
      apiKey: process.env.OPENAI_API_KEY,
      prompt: "Extract the document fields. Use null when a value is missing or unclear.",
    },
    schema: {
      type: "object",
      properties: {
        fullName: { type: ["string", "null"] },
        idNumber: { type: ["string", "null"] },
        dateOfBirth: { type: ["string", "null"] },
        sex: { type: ["string", "null"] },
      },
      required: ["fullName", "idNumber", "dateOfBirth", "sex"],
      additionalProperties: false,
    },
  },
});

try {
  const result = await ocr.processFile("./id.pdf");
  console.log(result);
} finally {
  await ocr.terminate();
}
```

Example (Anthropic):

```js
const ocr = new SmartOCR({
  structuredOutputOptions: {
    ai: {
      provider: "anthropic",
      model: "claude-opus-4-5",
      apiKey: process.env.ANTHROPIC_API_KEY,
    },
    schema: {
      type: "object",
      properties: {
        fullName: { type: ["string", "null"] },
        idNumber: { type: ["string", "null"] },
      },
      required: ["fullName", "idNumber"],
    },
  },
});
```

Example (Gemini):

```js
const ocr = new SmartOCR({
  structuredOutputOptions: {
    ai: {
      provider: "gemini",
      model: "gemini-2.0-flash",
      apiKey: process.env.GOOGLE_API_KEY,
    },
    schema: {
      type: "object",
      properties: {
        fullName: { type: ["string", "null"] },
        idNumber: { type: ["string", "null"] },
      },
      required: ["fullName", "idNumber"],
    },
  },
});
```

Notes for AI mode:
- `apiKey` is required for all providers
- `prompt` overrides the default extraction instruction
- `schema` should be a JSON schema describing the object you want back
- for OpenAI strict mode, `required` must list every key in `properties`
- Gemini schemas are automatically normalized: array `type` values (e.g. `["string", "null"]`) are converted to `nullable: true`, and unsupported fields like `additionalProperties` are stripped
- when AI mode is enabled, the raw OCR text is not returned by these methods
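The Gemini normalization described above can be approximated with a small recursive transform. This is a sketch of the documented behavior, not smart-ocr's actual implementation:

```js
// Illustrative sketch: convert a JSON Schema fragment toward Gemini's
// OpenAPI 3.0 subset. Array types like ["string", "null"] become a single
// type plus nullable: true; unsupported keys are dropped.
function normalizeForGemini(schema) {
  if (Array.isArray(schema)) return schema.map(normalizeForGemini);
  if (schema === null || typeof schema !== "object") return schema;
  const out = {};
  for (const [key, value] of Object.entries(schema)) {
    if (key === "additionalProperties") continue; // not supported by Gemini
    if (key === "type" && Array.isArray(value)) {
      out.type = value.find((t) => t !== "null"); // keep the non-null type
      if (value.includes("null")) out.nullable = true;
      continue;
    }
    out[key] = normalizeForGemini(value);
  }
  return out;
}
```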
## API

### `new SmartOCR(options)`

Creates an OCR processor.

Options:

- `language`: Tesseract language or language list. Default: `"eng"`
- `pdfRenderScale`: render scale used before OCR on scanned PDF pages. Default: `2`
- `workerOptions`: options passed to the Tesseract worker, such as `langPath`, `cachePath`, or `logger`
- `workerCount`: number of OCR workers to run in parallel
- `structuredOutputOptions`: optional AI configuration for returning structured JSON instead of plain text
Language codes use Tesseract traineddata identifiers, not 2-letter locale codes. For example:
"eng"for English"spa"for Spanish"fra"for French["eng", "spa"]for multilingual OCR
Use "eng", not "en".
`structuredOutputOptions` shape:
- `ai.provider`: AI provider name. One of `"openai"`, `"anthropic"`, or `"gemini"`
- `ai.model`: model name to call for structured extraction
- `ai.apiKey`: API key for the chosen provider
- `ai.prompt`: optional custom extraction prompt
- `schema`: JSON schema describing the expected response object. Gemini schemas are automatically normalized from JSON Schema to Gemini's OpenAPI 3.0 subset.
### `processFile(filePath)`

Routes a supported file to the correct handler based on its file extension.
Returns:
- extracted text by default
- structured JSON when `structuredOutputOptions.ai` is configured
Supported extensions:
`.pdf`, `.png`, `.jpg`, `.jpeg`, `.tif`, `.tiff`, `.bmp`, `.webp`, `.gif`
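A plausible sketch of that dispatch is below; `routeFile` and `IMAGE_EXTENSIONS` are illustrative names, not smart-ocr internals:

```js
import path from "node:path";

// Illustrative sketch of extension-based routing: PDFs go to the PDF
// handler, known raster formats go to the image handler, anything else
// is rejected.
const IMAGE_EXTENSIONS = new Set([
  ".png", ".jpg", ".jpeg", ".tif", ".tiff", ".bmp", ".webp", ".gif",
]);

function routeFile(filePath) {
  const ext = path.extname(filePath).toLowerCase();
  if (ext === ".pdf") return "processPDF";
  if (IMAGE_EXTENSIONS.has(ext)) return "processImage";
  throw new Error(`Unsupported file extension: ${ext}`);
}
```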
### `processPDF(filePath)`

Extracts text from a PDF. Text-native pages are read directly. Scanned pages are rendered to images and OCRed.
The OCR language only affects scanned/image-only pages. If a PDF page already contains selectable text, Smart OCR returns that embedded text directly instead of re-OCRing it.
Returns:
- extracted text by default
- structured JSON when `structuredOutputOptions.ai` is configured
### `processImage(filePath)`

Runs OCR on an image file.
Returns:
- extracted text by default
- structured JSON when `structuredOutputOptions.ai` is configured
### `init(language?)`

Eagerly initializes the Tesseract worker. This is optional because the processing methods initialize on demand.
If you pass a language to init(language), Smart OCR keeps using that language for later OCR calls until you switch it again or create a new instance.
### `terminate()`

Terminates the Tesseract worker and frees resources.
## Notes

- Smart OCR is optimized for Node.js workloads, not browser runtimes.
- Rendering uses `@napi-rs/canvas`, which avoids the extra Cairo system setup required by `canvas`.
- Scanned PDFs are preprocessed before OCR so sparse content, such as ID cards on large blank pages, is easier to detect.
- Structured output is an optional post-processing step on top of OCR, not a replacement for OCR itself.
- AI mode supports OpenAI, Anthropic, and Gemini.
- OCR quality still depends on the source document quality, scan resolution, and language data.
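The sparse-content preprocessing mentioned above can be illustrated with a simple bounding-box scan over grayscale pixels. This is an illustrative sketch of the general technique, not smart-ocr's actual preprocessing code:

```js
// Illustrative: find the bounding box of non-background pixels so a small
// region of content (e.g. an ID card on a large blank page) can be cropped
// before OCR. `pixels` is a flat row-major array of grayscale values 0-255.
function contentBoundingBox(pixels, width, height, threshold = 250) {
  let minX = width, minY = height, maxX = -1, maxY = -1;
  for (let y = 0; y < height; y++) {
    for (let x = 0; x < width; x++) {
      if (pixels[y * width + x] < threshold) { // darker than the background
        if (x < minX) minX = x;
        if (x > maxX) maxX = x;
        if (y < minY) minY = y;
        if (y > maxY) maxY = y;
      }
    }
  }
  // null means the page is effectively blank
  return maxX < 0
    ? null
    : { x: minX, y: minY, width: maxX - minX + 1, height: maxY - minY + 1 };
}
```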
## Development

```bash
npm run typecheck
npm run lint
npm test
npm run build
npm run sample
```

`npm run sample` builds the library and runs it against the bundled sample files in `src/`.
## License

MIT