
[Question] About modeling document-chunk-entity graph in RAG use case #1616

@ggyuchive

Do you need to ask a question?

  • I have searched the existing questions and discussions and this question is not already answered.
  • I believe this is a legitimate question, not just a bug or feature request.

Your Question

Hi LightRAG team,

We’re working on an ontology-based RAG project using supply-chain data and official USTR documents. We use LightRAG as a base and rely heavily on a graph engine to model relationships between documents and domain entities, connected through event nodes that hold date information.
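To make the setup concrete, here is a minimal sketch of the node layout we have in mind. It is a hypothetical in-memory model: the `Graph` class, the `REPORTS`/`AFFECTS` relation names, and the sample ids and dates are placeholders, not LightRAG APIs or real data. It only shows document nodes linked to dated event nodes, which in turn link to domain entities.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Graph:
    # Hypothetical in-memory graph: node_id -> attributes, plus typed edges.
    nodes: dict[str, dict] = field(default_factory=dict)
    edges: list[tuple[str, str, str]] = field(default_factory=list)  # (src, relation, dst)

    def add_node(self, node_id: str, **attrs) -> None:
        self.nodes[node_id] = attrs

    def add_edge(self, src: str, relation: str, dst: str) -> None:
        self.edges.append((src, relation, dst))

    def events_for_document(self, doc_id: str) -> list[str]:
        # Event nodes linked from a document via the placeholder "REPORTS" edge,
        # sorted chronologically by their date attribute.
        event_ids = [d for s, rel, d in self.edges if s == doc_id and rel == "REPORTS"]
        return sorted(event_ids, key=lambda e: self.nodes[e]["date"])


g = Graph()
g.add_node("doc:ustr-notice-1", type="Document")
g.add_node("event:tariff-announced", type="Event", date=date(2024, 2, 20))
g.add_node("event:hearing", type="Event", date=date(2024, 1, 10))
g.add_node("entity:semiconductors", type="Entity")
g.add_edge("doc:ustr-notice-1", "REPORTS", "event:tariff-announced")
g.add_edge("doc:ustr-notice-1", "REPORTS", "event:hearing")
g.add_edge("event:tariff-announced", "AFFECTS", "entity:semiconductors")

print(g.events_for_document("doc:ustr-notice-1"))  # ['event:hearing', 'event:tariff-announced']
```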

We have some questions about best practices for modeling documents in the graph:

1. About using chunk_id as source_id instead of document_id

In LightRAG’s example (example/insert_custom_kg.py), each chunk is assigned a unique source_id. We’re currently following this pattern, but we wonder whether it is ideal for our case.

Since our downstream queries and reasoning are often document-level (e.g., linking supply-chain events to official documents), would it make more sense to assign the source_id per document instead of per chunk?
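One middle ground we are considering is to keep chunk-level source_ids (as in the example) but make every chunk id embed its parent document id, so document-level questions can be answered by rolling up. A rough sketch, assuming the `chunks`/`content`/`source_id` keys of the custom KG dict work as we read them in example/insert_custom_kg.py; the id scheme, `SEP`, and the helper names are our own hypothetical choices, and the sample documents are placeholders:

```python
# Hypothetical separator for merged source ids. LightRAG joins multiple source ids
# with an internal separator constant; check the value in your installed version.
SEP = "<SEP>"


def make_chunk_id(doc_id: str, index: int) -> str:
    # Embed the parent document id so provenance is never lost.
    return f"{doc_id}#chunk-{index}"


def build_custom_kg(doc_id: str, chunk_texts: list[str]) -> tuple[dict, dict[str, str]]:
    """Build a custom-KG-style dict with chunk-level source_ids plus a chunk -> doc map."""
    chunk_to_doc: dict[str, str] = {}
    chunks = []
    for i, text in enumerate(chunk_texts):
        cid = make_chunk_id(doc_id, i)
        chunk_to_doc[cid] = doc_id
        chunks.append({"content": text, "source_id": cid})
    # Entities and relationships would come from the extractor, each carrying
    # the chunk-level source_id of the chunk it was extracted from.
    return {"chunks": chunks, "entities": [], "relationships": []}, chunk_to_doc


def documents_for(source_ids: str, chunk_to_doc: dict[str, str]) -> set[str]:
    # Roll chunk-level provenance up to the set of documents it came from.
    return {chunk_to_doc[cid] for cid in source_ids.split(SEP) if cid in chunk_to_doc}


kg, chunk_to_doc = build_custom_kg("ustr-doc-A", ["first chunk text", "second chunk text"])
_, more = build_custom_kg("ustr-doc-B", ["another chunk text"])
chunk_to_doc.update(more)

# An entity mentioned in two chunks of document A and one chunk of document B.
merged = SEP.join(["ustr-doc-A#chunk-0", "ustr-doc-A#chunk-1", "ustr-doc-B#chunk-0"])
print(sorted(documents_for(merged, chunk_to_doc)))  # ['ustr-doc-A', 'ustr-doc-B']
```

The reasoning behind this choice: document-level provenance can always be derived from chunk-level ids, whereas switching to document-level ids would discard the finer granularity for good.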

2. Reuse the existing entity node or create a new one?

Currently we extract entities and relationships from each chunk of a USTR document. Naturally, entities with the same name are extracted from multiple chunks, each with a slightly different description.
We are considering the two approaches below and would love your advice:

Approach 1. Reuse the existing entity node
Pros: Avoids redundancy; easier to count nodes by filtering on entity name
Cons: Hard to reconcile the entity's differing descriptions; risk of losing context
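For Approach 1, the "losing context" concern might be reduced by keeping one node per name while storing each chunk's description next to its chunk id. A hypothetical sketch: the store class, the naive name normalization, and the sample descriptions are ours (placeholders), and this is not how LightRAG merges entities internally:

```python
def normalize(name: str) -> str:
    # Naive canonicalization (whitespace + case); real entity resolution needs more.
    return " ".join(name.split()).upper()


class MergedEntityStore:
    """Approach 1: one node per entity name, with every chunk-level description kept."""

    def __init__(self) -> None:
        # canonical name -> list of (chunk_id, description) mentions
        self.mentions: dict[str, list[tuple[str, str]]] = {}

    def add(self, name: str, description: str, chunk_id: str) -> None:
        self.mentions.setdefault(normalize(name), []).append((chunk_id, description))

    def node_count(self) -> int:
        return len(self.mentions)

    def summary(self, name: str) -> str:
        # Concatenate descriptions; an LLM summarization step could replace this.
        return " | ".join(desc for _, desc in self.mentions.get(normalize(name), []))


store = MergedEntityStore()
store.add("USTR", "Agency issuing the tariff notice.", "doc-A#chunk-0")
store.add("ustr ", "Office that held the public hearing.", "doc-A#chunk-3")
store.add("Semiconductors", "Product category under review.", "doc-A#chunk-3")
print(store.node_count())  # 2
print(store.summary("USTR"))
```

Because each description stays paired with its chunk id, a per-chunk view remains recoverable even though only one node exists per name.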

Approach 2. Create a new entity node per chunk
Pros: Keeps contextual information intact; no metadata conflicts
Cons: Causes duplication; harder to analyze globally
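For Approach 2, the duplication could be handled at analysis time instead of at write time: store one node per (entity, chunk) pair and group by a normalized name only when a global view is needed. Again a hypothetical sketch; the `ChunkEntity` type, the `name@chunk_id` id scheme, and the sample data are placeholders of our own:

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class ChunkEntity:
    node_id: str       # unique per chunk, e.g. "USTR@doc-A#chunk-0" (hypothetical scheme)
    name: str
    description: str
    chunk_id: str


def make_node(name: str, description: str, chunk_id: str) -> ChunkEntity:
    # Approach 2: a fresh node for every (entity, chunk) pair.
    return ChunkEntity(f"{name}@{chunk_id}", name, description, chunk_id)


def canonical(name: str) -> str:
    # Naive normalization: collapse whitespace and ignore case.
    return " ".join(name.split()).upper()


def global_counts(nodes: list[ChunkEntity]) -> Counter:
    # Collapse per-chunk duplicates at query time for global analysis.
    return Counter(canonical(n.name) for n in nodes)


nodes = [
    make_node("USTR", "Agency issuing the tariff notice.", "doc-A#chunk-0"),
    make_node("USTR", "Office that held the public hearing.", "doc-A#chunk-3"),
    make_node("Semiconductors", "Product category under review.", "doc-A#chunk-3"),
]
counts = global_counts(nodes)
print(len(nodes))      # 3 stored nodes
print(len(counts))     # 2 distinct entities
print(counts["USTR"])  # 2 mentions
```

The trade-off moves to the grouping key: global analysis is only as accurate as the name normalization used at query time.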

Thank you so much in advance :)
Your insights would be very helpful as we scale our RAG project.

Best Regards,
Byeonggyu

Additional Context

No response
