In document formats like Microsoft Word (.docx), comments can be anchored to specific text ranges within the document. The current data model supports storing comments in the ContentLayer.NOTES layer within GroupLabel.COMMENT_SECTION groups, but there is no mechanism to link a comment back to the specific DocItem(s) it annotates.
This linking capability is important for:
- Round-trip fidelity: reconstructing the original document structure
- Semantic understanding: knowing which content a comment refers to
- LLM/RAG applications: providing context about what text a comment is discussing
- Document analysis: tracing review/collaboration trails
Basic Word document comment extraction was implemented in docling PR #2834. The next step is comment-to-content linking, which first requires data model changes in docling-core.
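To make the gap concrete, here is a minimal sketch of what a comment-to-content link could look like. It uses plain dataclasses rather than the docling-core classes, and every name in it (`TextItem`, `CommentItem`, the `annotates` field, `resolve_targets`) is hypothetical and chosen only for illustration; the actual data model change may look quite different.

```python
from dataclasses import dataclass, field


@dataclass
class TextItem:
    """Stand-in for a body-layer document item (not the docling-core class)."""
    ref: str   # JSON-pointer-style identifier, e.g. "#/texts/0" (assumed format)
    text: str


@dataclass
class CommentItem:
    """Stand-in for a comment stored in the notes layer."""
    ref: str
    text: str
    # Hypothetical field: references to the items this comment annotates.
    annotates: list[str] = field(default_factory=list)


def resolve_targets(comment: CommentItem, items: dict[str, TextItem]) -> list[TextItem]:
    """Return the items a comment points at, skipping dangling references."""
    return [items[r] for r in comment.annotates if r in items]


body = {
    "#/texts/0": TextItem("#/texts/0", "The quarterly results exceeded expectations."),
    "#/texts/1": TextItem("#/texts/1", "Revenue grew in all regions."),
}
# A comment anchored to one item, and one with no anchor at all.
comment = CommentItem("#/texts/2", "Please cite the source.", annotates=["#/texts/1"])
unlinked = CommentItem("#/texts/3", "General remark.")

targets = resolve_targets(comment, body)
```

Storing references instead of copies of the annotated text keeps the comment and the content in separate layers, and an empty `annotates` list still represents comments that have no anchor.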