Fix issue 150: Create data model #158
Conversation
How are the PNG drawings created? I worry that without the source they're an additional item that will need to be maintained whenever a change impacts any element.
Looking at commit d5e85be, the data model seems to be a group of isolated classes. But looking at the details, lots of commonalities can be identified. We have in fact many different places defining the same type of data! Consolidation is needed and will follow in subsequent commits in this PR.
As of now, I'm using LinkML to generate PlantUML code, which I manually send to a PlantUML server to generate the PNGs. But that's just WIP for the time being. Before the PR is marked as ready for merging, I need to add code to automatically generate the diagrams in SVG format, validate the examples and provide the JSON-Schemas for validation.
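For reference, the "manually send to a PlantUML server" step can be scripted. The sketch below implements the text encoding PlantUML servers expect (raw DEFLATE plus a base64 variant with a custom alphabet); the server URL shown is the public plantuml.com instance and is an assumption — a self-hosted server would differ.

```python
# Sketch: encode PlantUML source for use in a PlantUML server URL.
# Assumption: the server uses the documented raw-DEFLATE + custom-base64
# text encoding; verify against your own PlantUML server before relying on it.
import zlib

PLANTUML_ALPHABET = (
    "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"
    "abcdefghijklmnopqrstuvwxyz-_"
)

def plantuml_encode(source: str) -> str:
    """Compress PlantUML text and encode it with PlantUML's base64 variant."""
    raw = zlib.compress(source.encode("utf-8"))[2:-4]  # strip zlib header/trailer
    encoded = []
    for i in range(0, len(raw), 3):
        block = raw[i:i + 3]
        pad = 3 - len(block)
        n = int.from_bytes(block + b"\x00" * pad, "big")
        chars = [PLANTUML_ALPHABET[(n >> s) & 0x3F] for s in (18, 12, 6, 0)]
        encoded.append("".join(chars[:4 - pad]))  # drop padding characters
    return "".join(encoded)

diagram = "@startuml\nApplicationDescription --> Metadata\n@enduml"
url = "https://www.plantuml.com/plantuml/svg/" + plantuml_encode(diagram)
```

Fetching `url` would return the rendered SVG; wiring such a step (or a local PlantUML jar) into CI is what removes the manual round-trip.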
@ajcraig @nilanjan-samajdar @singhmj-1 this is still a draft, not ready for review! Therefore I've removed all reviewers. Sorry, I initially created it as "Ready to merge", so you probably got a notification.
Once it's ready for review, I'll ask contributors to the different parts covered by the data model, as well as some specification maintainers, to review it.
Force-pushed from 69989a7 to 4d2c376
@Silvanoc,
I'm working on it. All other parts of the OpenAPI specification would be provided externally and simply appended programmatically. But my intention is to have a LinkML generator that takes (at least) two arguments.
The generator makes sure that any resource referenced in the
Yes, for other elements of the OpenAPI/Swagger, maybe we can keep a template YAML that the LinkML generator uses.
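That template idea can be sketched in a few lines. Everything below is illustrative — the names (`TEMPLATE`, `assemble_openapi`) and the schema fragment are assumptions, not part of the actual tooling: a static OpenAPI skeleton is kept as the template and the LinkML-generated schemas are spliced in programmatically.

```python
# Hypothetical sketch: splice LinkML-generated schemas into an OpenAPI template.
# In practice the template would live in a YAML file; plain dicts keep the
# example self-contained.
import copy

TEMPLATE = {
    "openapi": "3.0.3",
    "info": {"title": "Margo API", "version": "0.0.0"},
    "paths": {},
    "components": {"schemas": {}},
}

def assemble_openapi(template: dict, generated_schemas: dict) -> dict:
    """Return a copy of the template with generated schemas appended
    under components.schemas, leaving the template itself untouched."""
    spec = copy.deepcopy(template)
    spec["components"]["schemas"].update(generated_schemas)
    return spec

spec = assemble_openapi(TEMPLATE, {"ApplicationDescription": {"type": "object"}})
```

The benefit of this split is that hand-maintained parts (info, paths, security) stay in one reviewable file while everything under `components.schemas` is regenerated from the model.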
The data model currently looks like this: The only things that haven't been generated with LinkML are the dashed lines, because the references use "hidden" IDs (see #161) that cannot be natively modelled with LinkML.
Force-pushed from 5367f5f to 856b95a
@Silvanoc I misread one of your comments yesterday and took this out of draft. After realizing my mistake, I put it back as a draft.
@Silvanoc / @ajcraig - I have mixed feelings about this. Having a single source of truth is very helpful, but I'm concerned about the complexity this introduces and the risk of someone accidentally missing something. This raises the bar for contributions quite high, given all the additional things someone will need to understand instead of just creating a simple markdown page. I like what this enables, but I think we'll need to figure out some way of managing it so that the additional overhead doesn't leave people unable or unwilling to contribute.
As can be seen nicely in the data model graphs above, we already have that complexity and it's likely to increase rather than decrease; i.e., this complexity exists already and it's not going to go away. This is not introducing complexity but a means to tame the existing (and growing) complexity into a coherent and consistent single source of truth – which is really needed, as the PlugFest has shown, where we uncovered (very) small inconsistencies here and there that in sum break the whole thing. We cannot hide complexity; it's there, and trying to hide even parts of it makes it an overall inconsistent mess. The only question IMO is what is the right tooling to help us manage that complexity?
This is actually prevented by having rigor here.
Granted, this needs to be made as convenient as possible with automation and tooling.
@stormc the complexity I was referring to is more on the tooling side. Contributors will need to learn how to use LinkML and Jinja, and understand all the bash and Python scripts and all the templates. If they want to make an update, they'll need to figure out a bunch of files that need to be updated and checked. If they want to create a new page, it's going to be even more complex. I acknowledge the need for something to keep all the content consistent, but unless we do something to help make creating and updating content easier, there is a good chance we'll see even fewer contributions. So whether we have a small team of people available to help take someone's markdown and update all these files, or introduce some tooling or AI to make the process easier, we'll need to do something, I think.
If you have to fully understand all the gory details of this, then the automation/tooling is insufficient. You will have to follow some (probably extra) steps, granted, but that shouldn't force you to understand the whole machinery. It will be a process getting to this stage, but I do not see an alternative to be honest.
Fully agree. We need tooling, good tooling, that doesn't stand in between you and contributing, quite the opposite.
Combine all the individual data models into a unified data model for the whole Margo specification. Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Validate the examples against the LinkML data model. Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Add tool to generate JSON-Schemas to validate instances of the top-level data types:
- ApplicationDeployment
- ApplicationDescription
- DesiredStateManifest
- DeviceCapabilities

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
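As an illustration of how such a generated JSON-Schema might be consumed downstream: the sketch below hand-rolls a tiny subset of JSON Schema validation (only `required` and `type`) against a made-up ApplicationDescription fragment. Real validation would use a full JSON Schema validator library; the schema fragment and field names here are assumptions for illustration, not the actual generated output.

```python
# Minimal sketch of validating an instance against a (generated) JSON-Schema.
# Only `required` and `type` are checked here; a real setup would use a
# complete JSON Schema validator instead of this toy subset.
def check_instance(schema: dict, instance: dict) -> list:
    """Return a list of human-readable validation errors (empty if valid)."""
    errors = []
    for field in schema.get("required", []):
        if field not in instance:
            errors.append(f"missing required field: {field}")
    type_map = {"string": str, "object": dict, "integer": int}
    for name, prop in schema.get("properties", {}).items():
        if name in instance and "type" in prop:
            expected = type_map.get(prop["type"])
            if expected and not isinstance(instance[name], expected):
                errors.append(f"{name}: expected type {prop['type']}")
    return errors

# Illustrative fragment only -- NOT the real generated ApplicationDescription schema.
APP_DESC_SCHEMA = {
    "required": ["apiVersion", "kind", "metadata"],
    "properties": {
        "apiVersion": {"type": "string"},
        "kind": {"type": "string"},
        "metadata": {"type": "object"},
    },
}
```

For example, an instance missing `metadata` would yield one error, while a complete instance yields an empty list — which is the property CI would assert on for every example in the repository.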
Add tool that creates a class diagram in SVG format that shows all data types involved in the Margo specification. Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Add DeploymentStatus and ComponentStatus missing in the data model. Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Add class-relationships that cannot be modelled with LinkML and add possibility to generate multiple class-focused diagrams. Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Force-pushed from c83b770 to cf6cabb
Meanwhile I'm convinced that the change is big enough to be worth a SUP: margo/specification-enhancements#48. That way we can also gather wider community feedback.
Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Now the JSON-Schemas of the top-level resources are also generated and provided for download, see here for the ApplicationDescription.
Thanks, this is great work. With any generation tooling I have found that, beyond the most basic items, there is an element of validation that has to happen afterwards to confirm the results match what you intended. Is the plan here to have this, and in the event there are incorrect or less-than-optimal outputs, do we go back to the source and modify the LinkML, or do you modify the resulting artifact? Just wondering about the process side of its usage.
@chrisgclayton thanks. And thank you for engaging in this conversation.
Fully agree. I assume that you don't mean the "accuracy" of what LinkML generates out of the models, right? LinkML has its own tests for that purpose, and any doubts in that direction should result in improvements of LinkML's test suite. It's OSS, so we could contribute tests; I've done it myself. Excluding LinkML accuracy/correctness, only the inputs (the model, tool parameters, custom templates, ...) remain. We must differentiate two aspects:
IMO for both we should persist the result into a git branch in which any changes generate a commit. We can also have some kind of custom tests, like a validation of the HTML DOM vs. the model. You cannot have 100% coverage without somehow building a completely new generator, so finding the sweet spot that has a lot of coverage without much effort would be key. Any of these would be way better than what we have now: manual review of the output and manual consistency checks between the different parts of the specification.
Reading this sentence again to write an answer, it now sounds to me as if you had some concerns about LinkML generation itself... If that's the case, my proposal is exactly the same for LinkML as it would be for any alternative: working upstream. LinkML is not only OSS, but also an open community in which fixes are highly welcomed. So we report issues, we fix them (if we can) and extend the tests. We can have temporary forks with the fixes until a new upstream version has incorporated them. In the bad case, where we don't get our fixes accepted, we can fork. In the worst case, where we don't get any fixes (incapable of fixing them ourselves, no fix from the community), we can only fix the output. I hope I have addressed your concerns.
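The "persist the result into a git branch in which any changes generate a commit" idea above can be sketched as a small helper: regenerate the artifacts into a checkout, then commit only when the working tree actually changed, so every model change leaves a reviewable diff. The repository path, function name and commit message here are illustrative, not part of the actual tooling.

```python
# Hypothetical CI helper: commit regenerated artifacts only when they differ,
# so each change to the LinkML model produces exactly one reviewable commit.
import subprocess

def commit_if_changed(repo: str, message: str) -> bool:
    """Stage everything in `repo`; commit with `message` only if the
    working tree changed. Returns True when a commit was created."""
    subprocess.run(["git", "-C", repo, "add", "-A"], check=True)
    status = subprocess.run(
        ["git", "-C", repo, "status", "--porcelain"],
        capture_output=True, text=True, check=True,
    )
    if not status.stdout.strip():
        return False  # generated output unchanged, nothing to record
    subprocess.run(["git", "-C", repo, "commit", "-m", message], check=True)
    return True
```

A reviewer can then diff the generated branch between two model versions instead of manually comparing rendered output, which is the main point of persisting generated artifacts at all.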

Description
Provide a comprehensive data model using LinkML and generate the documentation and other validation tooling.
Issues Addressed
#150
Change Type
Please select the relevant options:
Checklist