Schemas are used to represent the structure that IATI XML is expected to be in. They contain a number of elements and attributes, each of which carries information that would be useful to extract. This includes descriptions, occurrence properties (how many times something may appear), and the XPaths at which things occur. Following research into this area, there does not appear to be a standard method of undertaking this task using open tooling.
#64 provides an initial attempt at extracting this information. However, it uses tools that aren't really designed for the job. The result is hundreds of lines of confusing code that is hard to follow, doesn't handle all the cases it needs to, and would be a challenge to maintain.
It is therefore proposed to implement this functionality using a two-stage process:
- Utilise XSLT to transform the Schema into an Intermediate Representation (IR) that has the information structured in an easy-to-query format
- Have capabilities available within the `schemas` module to access the information presented in the IR through a defined Python API
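To make the second stage more concrete, here is a minimal sketch of what a Python API over the IR might look like. It assumes the XSLT stage emits an XML document of `<item>` records carrying `xpath`, `kind`, `min_occurs`, `max_occurs` and a `<description>`. That IR format, the `SchemaIR` and `SchemaItem` names, and the example values are all hypothetical and illustrative, not taken from the real IATI Schema or the `schemas` module.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import Dict, List, Optional

# Hypothetical IR, standing in for the output of the XSLT stage.
# Values are illustrative only.
EXAMPLE_IR = """<ir>
  <item xpath="/iati-activities" kind="element" min_occurs="1" max_occurs="1">
    <description>Example description.</description>
  </item>
  <item xpath="/iati-activities/iati-activity" kind="element" min_occurs="0" max_occurs="unbounded">
    <description>Example description.</description>
  </item>
  <item xpath="/iati-activities/iati-activity/@default-currency" kind="attribute" min_occurs="0" max_occurs="1">
    <description>Example description.</description>
  </item>
</ir>"""


@dataclass(frozen=True)
class SchemaItem:
    xpath: str
    kind: str  # "element" or "attribute"; otherwise treated identically
    min_occurs: int
    max_occurs: Optional[int]  # None means "unbounded"
    description: str


def _parse_max(value: str) -> Optional[int]:
    """Map the XSD-style 'unbounded' keyword to None."""
    return None if value == "unbounded" else int(value)


class SchemaIR:
    """Read-only view over the (hypothetical) IR, keyed by XPath."""

    def __init__(self, ir_xml: str) -> None:
        root = ET.fromstring(ir_xml)
        self._items: Dict[str, SchemaItem] = {}
        for node in root.iter("item"):
            item = SchemaItem(
                xpath=node.get("xpath"),
                kind=node.get("kind"),
                min_occurs=int(node.get("min_occurs")),
                max_occurs=_parse_max(node.get("max_occurs")),
                description=(node.findtext("description") or "").strip(),
            )
            self._items[item.xpath] = item  # the XPath is the primary key

    def get(self, xpath: str) -> SchemaItem:
        return self._items[xpath]

    def is_optional(self, xpath: str) -> bool:
        return self._items[xpath].min_occurs == 0

    def children(self, xpath: str) -> List[str]:
        """Direct child elements and attributes of the given XPath."""
        prefix = xpath.rstrip("/") + "/"
        return sorted(
            p for p in self._items
            if p.startswith(prefix) and "/" not in p[len(prefix):]
        )
```

Because every lookup goes through the XPath, callers never need to know whether a node was an element or an attribute in the original schema, e.g. `SchemaIR(EXAMPLE_IR).is_optional("/iati-activities/iati-activity/@default-currency")` is answered the same way an element query would be.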
Based on preliminary investigation, the IR will likely:
- Treat elements and attributes as equivalents
  - i.e. an `optional` attribute would become: `min_occurs = 0` and `max_occurs = 1`
- Be designed such that the primary key is an XPath
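The two IR properties above can be illustrated with a small sketch. It uses Python's `xml.etree.ElementTree` purely as a stand-in for the proposed XSLT stage. It walks a toy schema (not the real IATI Schema) and normalises both elements and attributes into `min_occurs`/`max_occurs` pairs keyed by XPath. It relies on XSD defaults (`minOccurs`/`maxOccurs` default to 1, and an attribute's `use` defaults to `optional`). It assumes only inline anonymous `complexType`s and does not follow `ref`s, named types or extensions, which the real schema uses heavily. Mapping `use="prohibited"` to `(0, 0)` is an assumption of this sketch.

```python
import xml.etree.ElementTree as ET
from typing import Dict, Optional, Tuple

XS = "{http://www.w3.org/2001/XMLSchema}"

# Toy schema for illustration only.
TOY_XSD = """<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">
  <xsd:element name="iati-activities">
    <xsd:complexType>
      <xsd:sequence>
        <xsd:element name="iati-activity" minOccurs="0" maxOccurs="unbounded">
          <xsd:complexType>
            <xsd:attribute name="default-currency" type="xsd:string"/>
          </xsd:complexType>
        </xsd:element>
      </xsd:sequence>
      <xsd:attribute name="version" type="xsd:string" use="required"/>
    </xsd:complexType>
  </xsd:element>
</xsd:schema>"""

Occurs = Tuple[int, Optional[int]]  # (min, max); max None means unbounded


def element_occurs(node: ET.Element) -> Occurs:
    max_raw = node.get("maxOccurs", "1")  # XSD default is 1
    return int(node.get("minOccurs", "1")), (None if max_raw == "unbounded" else int(max_raw))


def attribute_occurs(node: ET.Element) -> Occurs:
    use = node.get("use", "optional")  # XSD default is "optional"
    if use == "required":
        return 1, 1
    if use == "prohibited":
        return 0, 0  # assumption of this sketch
    return 0, 1


def build_ir(xsd_text: str) -> Dict[str, Dict[str, Optional[int]]]:
    ir: Dict[str, Dict[str, Optional[int]]] = {}

    def record(path: str, occurs: Occurs) -> None:
        ir[path] = {"min_occurs": occurs[0], "max_occurs": occurs[1]}

    def visit(element: ET.Element, parent_path: str) -> None:
        path = f"{parent_path}/{element.get('name')}"
        record(path, element_occurs(element))
        ctype = element.find(f"{XS}complexType")
        if ctype is None:
            return
        # Attributes become entries just like elements, with '@' in the XPath.
        for attr in ctype.findall(f"{XS}attribute"):
            record(f"{path}/@{attr.get('name')}", attribute_occurs(attr))
        # Child elements declared directly inside sequence/choice/all.
        for group in ctype:
            if group.tag in (f"{XS}sequence", f"{XS}choice", f"{XS}all"):
                for child in group.findall(f"{XS}element"):
                    visit(child, path)

    for top in ET.fromstring(xsd_text).findall(f"{XS}element"):
        visit(top, "")
    return ir
```

In the resulting dictionary the optional `default-currency` attribute comes out as `{"min_occurs": 0, "max_occurs": 1}`, exactly as described in the list above, and sits alongside element entries under the same kind of key.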