Need some guidance on constructing schema #512

Description

@lzt5269

Hi,

I'm trying to extract information from sentences in pathology reports.

My input looks like this:
"Colorectal mucosa with cryptitis, increased lamina propria chronic inflammation, ulceration and focal mild crypt architectural irregularity. The patient's clinical history of Crohn's disease is noted. No granulomas and no dysplasia are identified."

I would expect a result like this:
extracted_object:
  pathology_statements:
    - diagnosis: cryptitis
      qualifiers: Not Specified,
      severity: N/A,
      part: mucosa,
      subpart: Not Specified,
      negative: 'false'
    - diagnosis: inflammation
      qualifiers: Not Specified,
      severity: N/A,
      part: mucosa,
      subpart: lamina propria,
      negative: 'false'
    - diagnosis: ulceration
      qualifiers: Not Specified,
      severity: N/A,
      part: mucosa,
      subpart: Not Specified,
      negative: 'false'
    - diagnosis: architectural irregularity
      qualifiers: focal,
      severity: mild,
      part: mucosa,
      subpart: crypt,
      negative: 'false'
    - diagnosis: granulomas
      qualifiers: Not Specified,
      severity: N/A,
      part: Not Specified,
      subpart: Not Specified,
      negative: 'true'
    - diagnosis: dysplasia
      qualifiers: Not Specified,
      severity: N/A,
      part: Not Specified,
      subpart: Not Specified,
      negative: 'true'
named_entities:
  - id: ...
    label: ...

I adapted the pathology.yaml like this:
`id: http://w3id.org/ontogpt/pathologytest
name: pathologytest
title: Pathology Grounding Template
description: >-
  A template for extracting and grounding pathology descriptions from text.
license: https://creativecommons.org/publicdomain/zero/1.0/
prefixes:
  rdf: http://www.w3.org/1999/02/22-rdf-syntax-ns#
  PATO: http://purl.obolibrary.org/obo/PATO_
  UBERON: http://purl.obolibrary.org/obo/UBERON_
  linkml: https://w3id.org/linkml/
  pathology: http://w3id.org/ontogpt/pathologytest
keywords:
  - pathology
  - phenotype

default_prefix: pathologytest
default_range: string

imports:
  - linkml:types
  - core

classes:

  PathologyReport:
    tree_root: true
    description: >-
      A collection of structured pathology statements extracted from text.
    attributes:
      pathology_statements:
        description: >-
          A list of pathology statements, each describing a single diagnosis
          with its own context.
        range: PathologyStatement
        multivalued: true

  PathologyStatement:
    description: >-
      A statement that describes a pathology, including any diagnoses,
      one or more specific qualities being measured and the anatomical
      location or tissue the pathology is measured in.
    attributes:
      diagnosis:
        description: >-
          The diagnosis or pathology being described. This may include
          full diagnoses or observations, for example, "colitis",
          "inflammation", "dysplasia", "polyp". If not specified, this value
          must be "Clinical finding". If a diagnosis cannot be reached
          (e.g., due to lack of tissue sample), this value must be
          "Clinical finding". Do not include qualifiers in this field,
          e.g., "active colitis" should be "colitis".
        range: Diagnosis
        annotations:
          prompt.example: colitis, inflammation, dysplasia
      qualifiers:
        description: >-
          A semicolon-delimited list of descriptors other than those for
          severity. If not specified, this value must be "Not Specified".
        any_of:
          - range: Qualifier
          - range: null
        annotations:
          prompt.example: focal, diffuse, patchy, extensive, ulcerative
      severity:
        description: >-
          The severity of the pathology, for example, mild, moderate, or severe.
          If not specified, this value must be "N/A".
        any_of:
          - range: SeverityLevel
          - range: null
        annotations:
          prompt.example: mild, moderate, severe
      part:
        description: >-
          The part of the anatomical entity (e.g., mucosa, polyp). If not specified, this value must be "Not Specified".
        any_of:
          - range: Part
          - range: null
        annotations:
          prompt.example: mucosa, polyp
      subpart:
        description: >-
          The subpart of the anatomical entity (e.g., crypts, epithelium). If not specified, this value must be "Not Specified".
        any_of:
          - range: Subpart
          - range: null
        annotations:
          prompt.example: crypt, epithelium, Paneth cell
      negative:
        description: >-
          Whether the pathology is negative or not present. This must be
          explicitly stated in the input, e.g., "no granulomas", in order
          to be true. It is otherwise false.
        range: string
        annotations:
          prompt.example: true, false

  Diagnosis:
    is_a: NamedEntity
    id_prefixes:
      - SNOMEDCT
      - ICD10CM
    annotations:
      annotators: bioportal:SNOMEDCT, bioportal:ICD10CM, sqlite:obo:ncit, sqlite:obo:mesh, sqlite:obo:mondo

  Qualifier:
    is_a: NamedEntity
    id_prefixes:
      - PATO
    annotations:
      annotators: sqlite:obo:pato

enums:
  SeverityLevel:
    description: >-
      The severity of a pathology.
    permissible_values:
      mild:
        description: >-
          A pathology that is mild in severity.
        meaning: PATO:0000394
      moderate:
        description: >-
          A pathology that is moderate in severity.
        meaning: PATO:0000395
      severe:
        description: >-
          A pathology that is severe in severity.
        meaning: PATO:0000396
      Not Specified:
        description: >-
          The severity of the pathology is not specified.

  Part:
    description: >-
      The part of the anatomical entity being measured.
    permissible_values:
      mucosa:
        description: >-
          The subjective tissue of the diagnosis
        meaning: UBERON:0000344
      polyp:
        description: >-
          A polypoid lesion in the gastrointestinal tract.
        meaning: MONDO:0005079
      Not Specified:
        description: >-
          The part of the anatomical entity is not specified.

  Subpart:
    description: >-
      The subpart of the anatomical entity being measured.
    permissible_values:
      lamina_propria:
        description: >-
          The lamina propria of the gastrointestinal tract.
        meaning: UBERON:0000030
      crypts:
        description: >-
          The crypts of the gastrointestinal tract.
        meaning: UBERON:0013485
      epithelium:
        description: >-
          The epithelium of the gastrointestinal tract.
        meaning: BTO:0000416
      Paneth_cell:
        description: >-
          A type of epithelial cell in the gastrointestinal tract.
        meaning: SNOMED:84907006
      Not Specified:
        description: >-
          The subpart of the anatomical entity is not specified.
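
One thing that can be checked without running the model is whether the values in the expected result are actually listed in the enums of this schema. The sketch below is only an illustration: the two dictionaries are copied by hand from the schema and from the expected result in this issue, and the helper name `unlisted_values` is made up. Whether OntoGPT rejects or merely passes through values that are not in an enum is not established here.

```python
# Permissible values copied by hand from the enums in the schema above.
ENUMS = {
    "severity": {"mild", "moderate", "severe", "Not Specified"},
    "part": {"mucosa", "polyp", "Not Specified"},
    "subpart": {"lamina_propria", "crypts", "epithelium", "Paneth_cell", "Not Specified"},
}

# Values that appear in the expected result earlier in this issue.
EXPECTED = {
    "severity": {"N/A", "mild"},
    "part": {"mucosa", "Not Specified"},
    "subpart": {"Not Specified", "lamina propria", "crypt"},
}


def unlisted_values(enums, expected):
    """Return, per slot, the expected values that the enum does not list."""
    result = {}
    for slot, values in expected.items():
        missing = sorted(values - enums[slot])
        if missing:
            result[slot] = missing
    return result


mismatches = unlisted_values(ENUMS, EXPECTED)
for slot, values in mismatches.items():
    print(f"{slot}: not in enum -> {values}")
```

With these inputs the check flags "N/A" for severity (the enum lists "Not Specified" while the slot description asks for "N/A") and "crypt" and "lamina propria" for subpart (the enum lists `crypts` and `lamina_propria`).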
However, my output doesn't contain multiple diagnoses, and the run produces a lot of errors:

ontogpt extract -i 5865793727.txt -t ./pathology1.yaml --model ollama/hf.co/unsloth/phi-4-GGUF:Q8_0 --api-base http://127.0.0.1:5269
ERROR:root:Cannot find slot for here's_the_extraction_based_on_your_instruction in Here's the extraction based on your instructions:
ERROR:root:Cannot find slot for diagnosi in Diagnosis: Colorectal mucosa with cryptitis.;
ERROR:root:Cannot find slot for diagnosi in Diagnosis: Increased lamina propria chronic inflammation.;
ERROR:root:Cannot find slot for diagnosi in Diagnosis: Ulceration.;
ERROR:root:Cannot find slot for diagnosi in Diagnosis: Focal mild crypt architectural irregularity.;
ERROR:root:Cannot find slot for to_split_the_given_text_into_fields_based_on_the_specified_format in To split the given text into fields based on the specified format:
ERROR:root:Line '### Text Analysis' does not contain a colon; ignoring
ERROR:root:Cannot find slot for -_given_text in - Given Text: The patient has a clinical history of Crohn's disease.
ERROR:root:Line '### Field Extraction' does not contain a colon; ignoring
ERROR:root:Cannot find slot for 1._diagnosi in 1. Diagnosis:;- The text mentions Crohn's disease.
ERROR:root:Line '- According to instructions, we take just the main pathology without qualifiers.' does not contain a colon; ignoring
ERROR:root:Cannot find slot for -_value in - Value: Crohn's disease
ERROR:root:Cannot find slot for 2._qualifier in 2. Qualifiers:;- No specific descriptors other than those for severity are mentioned in the text.
ERROR:root:Cannot find slot for -_value in - Value: Not Specified
ERROR:root:Cannot find slot for 3._severity in 3. Severity:;- The severity of Crohn's disease is not specified in the text.
ERROR:root:Cannot find slot for -_value in - Value: N/A
ERROR:root:Cannot find slot for 4._part in 4. Part:;- No specific anatomical part is mentioned in the text.
ERROR:root:Cannot find slot for -_value in - Value: Not Specified
ERROR:root:Cannot find slot for 5._subpart in 5. Subpart:;- No specific subpart of an anatomical entity is mentioned in the text.
ERROR:root:Cannot find slot for -_value in - Value: Not Specified
ERROR:root:Cannot find slot for 6._negative in 6. Negative:;- There is no explicit statement indicating that a pathology is negative or not present.
ERROR:root:Cannot find slot for -_value in - Value: False
ERROR:root:Line '### Final Output' does not contain a colon; ignoring
ERROR:root:Line 'This breakdown follows the instructions for parsing and categorizing information from the provided text.' does not contain a colon; ignoring
ERROR:root:Cannot find slot for based_on_the_provided_text,_here_are_the_extracted_entitie in Based on the provided text, here are the extracted entities:
ERROR:root:Cannot find slot for -_diagnosi in - diagnosis: Clinical finding
ERROR:root:Cannot find slot for -_explanation in - Explanation: The absence of a specific pathological entity like granulomas leads to this classification.
ERROR:root:Cannot find slot for -_qualifier in - qualifiers: Not Specified
ERROR:root:Cannot find slot for -_explanation in - Explanation: No additional descriptors are given beyond the mention of granulomas.
ERROR:root:Cannot find slot for -_severity in - severity: N/A
ERROR:root:Cannot find slot for -_explanation in - Explanation: There is no indication of any pathology's severity since no specific condition is diagnosed.
ERROR:root:Cannot find slot for -_part in - part: Not Specified
ERROR:root:Cannot find slot for -_explanation in - Explanation: The text does not specify a particular anatomical part where a diagnosis might apply.
ERROR:root:Cannot find slot for -_subpart in - subpart: Not Specified
ERROR:root:Cannot find slot for -_explanation in - Explanation: Similarly, there is no mention of a specific subpart within an anatomical entity.
ERROR:root:Cannot find slot for -_negative in - negative: True
ERROR:root:Cannot find slot for -_explanation in - Explanation: The statement explicitly mentions the absence of granulomas with No granulomas are identified.
ERROR:root:Cannot find slot for explanation in Explanation:
ERROR:root:Cannot find slot for -_diagnosi in - Diagnosis: The text indicates the absence of dysplasia, which implies a clinical finding rather than a specific diagnosis of pathology. Hence, it's listed as Clinical finding.
ERROR:root:Cannot find slot for -_qualifier in - Qualifiers: There are no additional descriptors provided, so this is Not Specified.
ERROR:root:Cannot find slot for -_severity in - Severity: No severity is mentioned or implied in the text, hence it is N/A.
ERROR:root:Cannot find slot for -_part_and_subpart in - Part and Subpart: The anatomical part or subpart is not specified in the statement, resulting in both being Not Specified.
ERROR:root:Cannot find slot for -_negative in - Negative: The pathology (dysplasia) is explicitly stated as absent (No dysplasia), which qualifies this as negative.
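
The messages above suggest that each line of the model's reply is treated as `field: value`, and that the field name is lowercased, has spaces replaced by underscores, and loses a trailing "s" before it is reported as missing. The sketch below imitates that behaviour to show why free-form replies fail. It is not OntoGPT's actual parser: the function name `match_slot` and its logic are assumptions reconstructed from the log text, and it also assumes the "Diagnosis:" lines were matched against the root class `PathologyReport`, whose only attribute is `pathology_statements`.

```python
from typing import Optional, Set, Tuple


def match_slot(line: str, slots: Set[str]) -> Tuple[Optional[str], Optional[str]]:
    """Return (slot_name, error_message); exactly one of the two is None."""
    if ":" not in line:
        return None, f"Line '{line}' does not contain a colon; ignoring"
    # Everything before the first colon is taken as the field name.
    field = line.split(":", 1)[0].lower().replace(" ", "_")
    if field in slots:
        return field, None
    if field.endswith("s"):
        field = field[:-1]  # "diagnosis" -> "diagnosi", as seen in the log
        if field in slots:
            return field, None
    return None, f"Cannot find slot for {field} in {line}"


# Attribute names taken from the schema in this issue.
REPORT_SLOTS = {"pathology_statements"}
STATEMENT_SLOTS = {"diagnosis", "qualifiers", "severity", "part", "subpart", "negative"}

examples = [
    ("Diagnosis: Ulceration.", REPORT_SLOTS),  # valid field, wrong class level
    ("- negative: True", STATEMENT_SLOTS),     # bullet prefix becomes part of the name
    ("### Final Output", STATEMENT_SLOTS),     # markdown heading, no colon
    ("negative: true", STATEMENT_SLOTS),       # plain "field: value" line
]
results = [match_slot(line, slots) for line, slots in examples]
for slot, error in results:
    print(slot if error is None else error)
```

Under these assumptions, only the last example resolves to a slot; headings, bullet prefixes, numbering, and explanatory sentences in the reply each produce one of the two error forms seen in the log.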

input_text: Colorectal mucosa with cryptitis, increased lamina propria chronic inflammation,
  ulceration and focal mild crypt architectural irregularity. The patient's clinical
  history of Crohn's disease is noted. No granulomas and no dysplasia are identified.
raw_completion_output: |-
  Here's the extraction based on your instructions:

  Pathology Statements:

  1. Diagnosis: Colorectal mucosa with cryptitis.
     - Context: This involves inflammation of the glandular structures (crypts) in the colon lining, observed during examination.
  2. Diagnosis: Increased lamina propria chronic inflammation.
     - Context: Chronic inflammation noted in the supportive tissue layer beneath the epithelium of the intestinal mucosa.
  3. Diagnosis: Ulceration.
     - Context: The presence of ulcers (open sores) is identified on the mucosal surface of the colon.
  4. Diagnosis: Focal mild crypt architectural irregularity.
     - Context: Localized, minor structural abnormalities in the crypts within the colonic tissue were observed.

  Additional Clinical Context:

  - The patient has a clinical history of Crohn's disease.
  - No granulomas are identified, which can sometimes be associated with Crohn’s.
  - No dysplasia is detected, indicating no presence of precancerous cells.
prompt: |+
  From the text below, extract the following entities in the following format:

  diagnosis: <The diagnosis or pathology being described. This may include full diagnoses or observations, for example, "colitis", "inflammation", "dysplasia", "polyp". If not specified, this value must be "Clinical finding". If a diagnosis cannot be reached (e.g., due to lack of tissue sample), this value must be "Clinical finding". Do not include qualifiers in this field, e.g., "active colitis" should be "colitis".>
  qualifiers: <A semicolon-delimited list of descriptors other than those for severity. If not specified, this value must be "Not Specified".>
  severity: <The severity of the pathology, for example, mild, moderate, or severe. If not specified, this value must be "N/A".>
  part: <The part of the anatomical entity (e.g., mucosa, polyp). If not specified, this value must be "Not Specified".>
  subpart: <The subpart of the anatomical entity (e.g., crypts, epithelium). If not specified, this value must be "Not Specified".>
  negative: <Whether the pathology is negative or not present. This must be explicitly stated in the input, e.g., "no granulomas", in order to be true. It is otherwise false.>

  Text:
  - No dysplasia is detected, indicating no presence of precancerous cells.

  ===

extracted_object:
  pathology_statements:
    - diagnosis: SNOMEDCT:404684003
      qualifiers: Not Specified,
      severity: N/A,
      part: Not Specified,
      subpart: Not Specified,
      negative: 'true'
named_entities:
  - id: SNOMEDCT:404684003
    label: Clinical finding,`

Any tips would be helpful! Thanks!
