I'm using docling-serve to process the document, but wanted to run the chunking locally using OpenAI's tiktoken. For this, I added `docling[chunking-openai]` to my pyproject.toml.
However, importing the `HybridChunker` still fails with:
```
RuntimeError: Extra required by module: 'chunking' by default (or 'chunking-openai' if specifically using OpenAI tokenization); to install, run: `pip install 'docling-core[chunking]'` or `pip install 'docling-core[chunking-openai]'`
```
`docling-core[chunking-openai]` does not install `transformers`. The extra is defined in docling-core/pyproject.toml (lines 81 to 93 in 5ab8b8c):
```toml
chunking-openai = [
  # common:
  'semchunk (>=2.2.0,<3.0.0)',
  'tree-sitter (>=0.23.2,<1.0.0)',
  'tree-sitter-python (>=0.23.6,<1.0.0)',
  'tree-sitter-c (>=0.23.4,<1.0.0)',
  'tree-sitter-java (>=0.23.5,<1.0.0)',
  'tree-sitter-javascript (>=0.23.1,<1.0.0)',
  'tree-sitter-typescript (>=0.23.2,<1.0.0)',
  # specific:
  'tiktoken (>=0.9.0,<0.13.0)',
]
```
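For context on why the extra matters: each requirement in the built package metadata carries an `extra == "…"` marker, and pip only installs the requirements whose marker matches the extra you asked for, so `transformers` (presumably listed under the `chunking` extra) is skipped when only `chunking-openai` is requested. A much-simplified stdlib-only sketch of that selection logic (`deps_for_extra` and the sample requirement strings below are illustrative, not docling-core code):

```python
import re

# Hypothetical Requires-Dist lines, mimicking how the two extras
# could appear in a built wheel's metadata.
SAMPLE_REQUIRES = [
    'semchunk (>=2.2.0,<3.0.0) ; extra == "chunking"',
    'transformers (>=4.34.0,<5.0.0) ; extra == "chunking"',
    'semchunk (>=2.2.0,<3.0.0) ; extra == "chunking-openai"',
    'tiktoken (>=0.9.0,<0.13.0) ; extra == "chunking-openai"',
]

def deps_for_extra(requires, extra):
    """Return the requirement names whose marker selects `extra`.

    A toy version of PEP 508 marker evaluation: a requirement is only
    installed when its `extra == "..."` marker matches a requested extra.
    """
    names = []
    for req in requires:
        # The name is everything before the first space, bracket,
        # version operator, or marker separator.
        name = re.split(r"[\s;(<>=!\[]", req, maxsplit=1)[0]
        marker = req.partition(";")[2]
        if f'extra == "{extra}"' in marker:
            names.append(name)
    return names
```

With this sample metadata, `deps_for_extra(SAMPLE_REQUIRES, "chunking-openai")` yields only `semchunk` and `tiktoken`, never `transformers`, which is exactly the situation the failing import runs into.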
But `transformers` is imported unconditionally in docling-core/docling_core/transforms/chunker/hybrid_chunker.py (lines 14 to 18 in 5ab8b8c):
```python
try:
    import semchunk

    from transformers import PreTrainedTokenizerBase
except ImportError:
    raise RuntimeError(
```
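Because both optional modules sit in a single try block, a missing `transformers` aborts the import even when only the tiktoken path would ever be used. One way to sketch a fix (the `find_missing` helper is hypothetical, not a proposed docling-core patch) is to probe each optional dependency separately, so the error can name exactly what is absent and `transformers` could be probed only when a HuggingFace tokenizer is actually requested:

```python
import importlib

def find_missing(modules):
    """Try to import each optional dependency and collect the absent ones,
    so an install hint can point at the right extra instead of failing on
    the first missing module."""
    missing = []
    for name in modules:
        try:
            importlib.import_module(name)
        except ImportError:
            missing.append(name)
    return missing

# e.g. find_missing(["semchunk"]) at module import time, and
# find_missing(["transformers"]) only on the HuggingFace code path.
```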