I'm using docling-serve to process the document, but wanted to run the chunking locally using OpenAI's tiktoken. For this, I added `docling[chunking-openai]` to my pyproject.toml.
However, importing the `HybridChunker` still fails with:
```
RuntimeError: Extra required by module: 'chunking' by default (or 'chunking-openai' if specifically using OpenAI tokenization); to install, run: `pip install 'docling-core[chunking]'` or `pip install 'docling-core[chunking-openai]'`
```
`docling-core[chunking-openai]` does not install `transformers`. The extra is defined in docling-core/pyproject.toml (lines 81 to 93 in 5ab8b8c):
```toml
chunking-openai = [
  # common:
  'semchunk (>=2.2.0,<3.0.0)',
  'tree-sitter (>=0.23.2,<1.0.0)',
  'tree-sitter-python (>=0.23.6,<1.0.0)',
  'tree-sitter-c (>=0.23.4,<1.0.0)',
  'tree-sitter-java (>=0.23.5,<1.0.0)',
  'tree-sitter-javascript (>=0.23.1,<1.0.0)',
  'tree-sitter-typescript (>=0.23.2,<1.0.0)',
  # specific:
  'tiktoken (>=0.9.0,<0.13.0)',
]
```
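For context on why the extra matters: each requirement in the built package metadata carries an `extra == "…"` marker, and pip only installs the requirements whose marker matches the extra you asked for, so `transformers` (presumably listed under the `chunking` extra) is skipped when only `chunking-openai` is requested. A much-simplified stdlib-only sketch of that selection logic (`deps_for_extra` and the sample requirement strings below are illustrative, not docling-core code):

```python
import re

# Hypothetical Requires-Dist lines, mimicking how the two extras
# could appear in a built wheel's metadata.
SAMPLE_REQUIRES = [
    'semchunk (>=2.2.0,<3.0.0) ; extra == "chunking"',
    'transformers (>=4.34.0,<5.0.0) ; extra == "chunking"',
    'semchunk (>=2.2.0,<3.0.0) ; extra == "chunking-openai"',
    'tiktoken (>=0.9.0,<0.13.0) ; extra == "chunking-openai"',
]

def deps_for_extra(requires, extra):
    """Return the requirement names whose marker selects `extra`.

    A toy version of PEP 508 marker evaluation: a requirement is only
    installed when its `extra == "..."` marker matches a requested extra.
    """
    names = []
    for req in requires:
        # The name is everything before the first space, bracket,
        # version operator, or marker separator.
        name = re.split(r"[\s;(<>=!\[]", req, maxsplit=1)[0]
        marker = req.partition(";")[2]
        if f'extra == "{extra}"' in marker:
            names.append(name)
    return names
```

With this sample metadata, `deps_for_extra(SAMPLE_REQUIRES, "chunking-openai")` yields only `semchunk` and `tiktoken`, never `transformers`, which is exactly the situation the failing import runs into.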
But `transformers` is imported unconditionally in docling-core/docling_core/transforms/chunker/hybrid_chunker.py (lines 14 to 18 in 5ab8b8c):
```python
try:
    import semchunk

    from transformers import PreTrainedTokenizerBase
except ImportError:
    raise RuntimeError(
```
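Because both optional modules sit in a single try block, a missing `transformers` aborts the import even when only the tiktoken path would ever be used. One way to sketch a fix (the `find_missing` helper is hypothetical, not a proposed docling-core patch) is to probe each optional dependency separately, so the error can name exactly what is absent and `transformers` could be probed only when a HuggingFace tokenizer is actually requested:

```python
import importlib

def find_missing(modules):
    """Try to import each optional dependency and collect the absent ones,
    so an install hint can point at the right extra instead of failing on
    the first missing module."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# e.g. find_missing(["semchunk"]) at module import time, and
# find_missing(["transformers"]) only on the HuggingFace code path.
```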