
Fix issue 150: Create data model #158

Draft

Silvanoc wants to merge 33 commits into margo:pre-draft from Silvanoc:create-data-model

Conversation

@Silvanoc
Contributor

@Silvanoc Silvanoc commented Mar 13, 2026

Description

Provide a comprehensive data model using LinkML and generate the documentation and other validation tooling.

⚠️ IMPORTANT REMARK: the generated specification webpage is not yet 100% equivalent to the current release. No more effort will be invested in polishing it until SUP specification-enhancements#48 has been approved; no further effort is needed if it gets rejected.

Issues Addressed

#150

Change Type

Please select the relevant options:

  • Fix (change that resolves an issue)
  • New enhancement (change that adds specification content)
  • Content edits (change that edits existing content)

Checklist

  • I have read the CONTRIBUTING document.
  • My changes adhere to the established patterns and best practices.

@Silvanoc Silvanoc requested a review from a team as a code owner March 13, 2026 16:04
@Silvanoc Silvanoc marked this pull request as draft March 13, 2026 16:04
@ajcraig
Contributor

ajcraig commented Mar 13, 2026

How are the PNG drawings created? I worry that without the source they're an additional item that will need to be maintained when a change impacts any element.

@Silvanoc
Contributor Author

Silvanoc commented Mar 13, 2026

Looking at commit d5e85be, the data model seems to be a group of isolated classes. But looking at the details, many commonalities can be identified: in fact, many different places define the same type of data! Consolidation is needed and will follow in subsequent commits in this PR.

(image: class diagram of the current data model)

@Silvanoc
Contributor Author

How are the PNG drawings created? I worry that without the source they're an additional item that will need to be maintained when a change impacts any element.

As of now, I'm using LinkML to generate PlantUML code, which I manually send to a PlantUML server to generate the PNGs. But that's just WIP for the time being.

Before the PR is marked as ready for merging, I need to add code to automatically generate the diagrams in SVG format, validate the examples and provide the JSON-Schemas for validation.

@Silvanoc Silvanoc removed request for a team, nilanjan-samajdar and singhmj-1 March 13, 2026 16:16
@Silvanoc
Contributor Author

@ajcraig @nilanjan-samajdar @singhmj-1 this is still a draft, not ready for review! Therefore I've removed all reviewers. Sorry, I initially created it as "Ready to merge", which is probably why you got a notification.

@Silvanoc
Contributor Author

Once it's ready for review, I'll ask any contributor to the different parts covered by the data model to review it and some specification maintainers.

@Silvanoc Silvanoc force-pushed the create-data-model branch 2 times, most recently from 69989a7 to 4d2c376 Compare March 18, 2026 11:43
@nilanjan-samajdar

@Silvanoc,
In order to have LinkML --> OpenAPI, can we take the LinkML “Data Model” and convert it into the OpenAPI/Swagger definition’s “components” section?

  • Hence the data objects get imported from LinkML, but the API paths/requests/responses need to be written in Swagger.
    This approach requires some scripting, but is doable.
  • Also, if you create an "Endpoint" object in the data model and also specify the request and HTTP response list at the top, we can even do away with the manual creation of OpenAPI paths/requests/responses and do it all through a script.

@Silvanoc
Contributor Author

@Silvanoc, In order to have LinkML --> OpenAPI, can we take the LinkML “Data Model” and convert it into the OpenAPI/Swagger definition’s “components” section?

* Hence the data objects get imported from LinkML, but the API paths/requests/responses need to be written in Swagger.
  This approach requires some scripting, but is doable.

* Also, if you create an "Endpoint" object in the data model and also specify the request and HTTP response list at the top, we can even do away with the manual creation of OpenAPI paths/requests/responses and do it all through a script.

I'm working on it. The generation of the ./components/schemas section out of LinkML is already working in a prototype, but I'm considering an alternative, since I want to contribute it to LinkML.

All other parts of the OpenAPI specification would be provided externally and simply appended programmatically. But my intention is to have a LinkML generator that takes (at least) two arguments:

  1. The OpenAPI head (metadata, ./paths, ...) as a YAML file.
  2. The LinkML data model as a YAML file too.

The generator makes sure that any resource referenced in the ./paths exists in the data model.
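That consistency check can be illustrated in a few lines: walk the hand-written head, collect every `#/components/schemas/<Name>` reference under `./paths`, and compare against the class names of the data model. This is only a sketch with made-up helper names, not the prototype's actual code:

```python
def collect_schema_refs(node):
    """Recursively collect '#/components/schemas/<Name>' reference targets."""
    refs = set()
    if isinstance(node, dict):
        for key, value in node.items():
            prefix = "#/components/schemas/"
            if key == "$ref" and isinstance(value, str) and value.startswith(prefix):
                refs.add(value[len(prefix):])
            else:
                refs.update(collect_schema_refs(value))
    elif isinstance(node, list):
        for item in node:
            refs.update(collect_schema_refs(item))
    return refs

def missing_resources(openapi_head, model_classes):
    """Names referenced under ./paths that the data model does not define."""
    return sorted(collect_schema_refs(openapi_head.get("paths", {})) - set(model_classes))
```

A generator built this way can fail fast with a clear error message instead of silently emitting a document with dangling references.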

@nilanjan-samajdar

The generator makes sure that any resource referenced in the ./paths exists in the data model.

Yes, for other elements of the OpenAPI/Swagger, maybe we can keep a template yaml that the LinkML generator uses.
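The template-plus-generator split suggested above could work roughly like this (a sketch with hypothetical names; the real generator would read both inputs from YAML files):

```python
def assemble_openapi(head_template, generated_schemas):
    """Return a full OpenAPI document: the hand-written head (metadata,
    ./paths, ...) plus the schemas generated from the LinkML model,
    merged under ./components/schemas."""
    doc = dict(head_template)  # shallow copy; the template stays untouched
    components = dict(doc.get("components", {}))
    schemas = dict(components.get("schemas", {}))
    schemas.update(generated_schemas)  # generated definitions win on collision
    components["schemas"] = schemas
    doc["components"] = components
    return doc
```

Keeping the head as a plain template file means contributors edit ordinary OpenAPI YAML, while the schemas section stays generated and therefore consistent with the model.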

@Silvanoc
Contributor Author

The data model currently looks like this:
(image: DataModel-ClassDiagram)

The only things that haven't been generated with LinkML are the dashed lines, because the references use "hidden" IDs (see #161) that cannot be natively modeled with LinkML.

@Silvanoc Silvanoc force-pushed the create-data-model branch 3 times, most recently from 5367f5f to 856b95a Compare March 23, 2026 12:29
@phil-abb phil-abb marked this pull request as ready for review March 26, 2026 15:20
@phil-abb phil-abb marked this pull request as draft March 27, 2026 11:37
@phil-abb
Contributor

@Silvanoc I misread one of your comments yesterday and took this out of draft. After realizing my mistake, I put it back as a draft.

@phil-abb
Contributor

@Silvanoc / @ajcraig - I have mixed feelings about this. Having a single source of truth is very helpful, but I'm concerned about the complexity this is introducing and the risk of someone accidentally missing something. This raises the bar for contributions quite high, with all the additional stuff someone will need to understand, instead of creating a simple markdown page.

I like what this enables, but I think we'll need to figure out some way of managing this so we're not making people unable or unwilling to contribute because of this additional overhead.

@stormc
Contributor

stormc commented Mar 27, 2026

@Silvanoc / @ajcraig - I have mixed feelings about this. Having a single source of truth is very helpful, but I'm concerned about the complexity this is introducing [...]

As can be seen nicely in the data model graphs above, we already have that complexity, and it's likely to increase rather than decrease; i.e., this complexity exists already and it's not going to go away.

This is not introducing complexity, but a means to tame the existing (and growing) complexity into a coherent and consistent single source of truth – which is really needed, as the PlugFest has shown: there we uncovered (very) small inconsistencies here and there that in sum break the whole thing.

We cannot hide complexity; it's there, and trying to hide even parts of it makes the whole thing an inconsistent mess. The only question, IMO, is: what is the right tooling to help us manage that complexity?

[...] and the risk of someone accidentally missing something.

This is actually prevented by having rigor here.

This raises the bar for contributions quite high, with all the additional stuff someone will need to understand, instead of creating a simple markdown page. [...]
I like what this enables, but I think we'll need to figure out some way of managing this so we're not making people unable or unwilling to contribute because of this additional overhead.

Granted, this needs to be made as convenient as possible with automation and tooling.

@phil-abb
Contributor

As can be seen nicely in the data model graphs above, we already do have that complexity

@stormc the complexity I was referring to is more on the tooling side. Contributors will need to learn how to use LinkML and Jinja, and understand all the bash and Python scripts and all the templates. If they want to make an update, they'll need to figure out a bunch of files that need to be updated and checked. If they want to create a new page, it's going to be even more complex.

I acknowledge the need for something to keep all the content consistent, but unless we do something to help make creating and updating content easier, there is a good chance we'll see even fewer contributions. So, whether we have a small team of people that are available to help take someone's markdown and update all these files, or introduce some tooling or AI to make the process easier, we'll need to do something, I think.

@stormc
Contributor

stormc commented Mar 27, 2026

As can be seen nicely in the data model graphs above, we already do have that complexity

@stormc the complexity I was referring to is more on the tooling side. Contributors will need to learn how to use LinkML and Jinja, and understand all the bash and Python scripts and all the templates. If they want to make an update, they'll need to figure out a bunch of files that need to be updated and checked. If they want to create a new page, it's going to be even more complex.

If you have to fully understand all the gory details of this, then the automation/tooling is insufficient. You will have to follow some (probably extra) steps, granted, but that shouldn't force you to understand the whole machinery. Getting to this stage will be a process, but I honestly do not see an alternative.

I acknowledge the need for something to keep all the content consistent, but unless we do something to help make creating and updating content easier, there is a good chance we'll see even fewer contributions. So, whether we have a small team of people that are available to help take someone's markdown and update all these files, or introduce some tooling or AI to make the process easier, we'll need to do something, I think.

Fully agree. We need tooling, good tooling, that doesn't stand in between you and contributing, quite the opposite.

@Silvanoc
Contributor Author

Silvanoc commented Apr 2, 2026

⚠️ I'm generating GitHub Pages in my namespace so that you can see the result. It is still only a draft, therefore some details don't fit yet. But it's enough to get an impression of the result.

Silvanoc added 23 commits April 2, 2026 11:15
Combine all the individual data models into a unified data model for the
whole Margo specification.

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Validate the examples against the LinkML data model.

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Add tool to generate JSON-Schemas to validate instances of the top-level
data types:
- ApplicationDeployment
- ApplicationDescription
- DesiredStateManifest
- DeviceCapabilities

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Add tool that creates a class diagram in SVG format that shows all data
types involved in the Margo specification.

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Add DeploymentStatus and ComponentStatus missing in the data model.

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Add class-relationships that cannot be modelled with LinkML and add
possibility to generate multiple class-focused diagrams.

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
@Silvanoc Silvanoc force-pushed the create-data-model branch from c83b770 to cf6cabb Compare April 2, 2026 09:15
@Silvanoc
Contributor Author

Silvanoc commented Apr 2, 2026

Meanwhile I'm convinced that the change is wide/big enough to be worth a SUP: margo/specification-enhancements#48

That way we can also gather wider community feedback.

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
@Silvanoc
Contributor Author

Silvanoc commented Apr 2, 2026

Now the JSON-Schemas of the top-level resources are also generated and provided for download, see here for the ApplicationDescription.
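Checking an example instance against such a generated schema is a one-liner with any full JSON-Schema validator; the tiny checker below (a hypothetical stdlib-only helper with made-up field names) only illustrates the required-property and type checks involved:

```python
# Minimal illustration only: real validation should use a complete
# JSON-Schema validator fed with the generated schemas.
_TYPES = {"object": dict, "array": list, "string": str,
          "number": (int, float), "boolean": bool}

def check_instance(instance: dict, schema: dict) -> list:
    """Return human-readable errors for a tiny subset of JSON Schema:
    'required' properties and primitive 'type' keywords."""
    errors = [f"missing required property: {name}"
              for name in schema.get("required", []) if name not in instance]
    for name, subschema in schema.get("properties", {}).items():
        expected = _TYPES.get(subschema.get("type"))
        if name in instance and expected and not isinstance(instance[name], expected):
            errors.append(f"property {name!r}: expected type {subschema['type']}")
    return errors
```

In CI this kind of check (via a real validator) is what "validate the examples against the LinkML data model" boils down to once the schemas are generated.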

@chrisgclayton

Thanks, this is great work. With any generation tooling I have found that, beyond "the most basic" items, there is an element of validation that has to happen afterwards to confirm the results are accurate to what you intended. Is the plan here, in the event there are incorrect or less-than-optimized outputs, to go back to the source and modify the LinkML, or do you modify the resulting artifact? Just wondering about the process side of its usage.

@Silvanoc
Contributor Author

Thanks this is great work.

@chrisgclayton thanks. And thank you for engaging in this conversation.

With any generation tooling I have found that, beyond "the most basic" items, there is an element of validation that has to happen afterwards to confirm the results are accurate to what you intended.

Fully agree.

I assume that you don't mean the "accuracy" of what LinkML generates out of the models, right? LinkML has its own tests for that purpose, and any doubts in that direction should result in improvements to LinkML's test suite. It's OSS, so we could contribute tests; I've done it myself.

Excluding LinkML accuracy/correctness, only the inputs (the model, tool parameters, custom templates, ...) remain. We must differentiate two aspects:

  1. accuracy of re-generation: is generation only changing what is meant to be changed and leaving the rest untouched?
  2. accuracy of newly generated: are the changes in the model resulting in the expected output?

IMO for both we should persist the result into a git branch in which any change generates a commit (I thought that gh-pages would provide exactly that, but it only has 1 commit), then a human being reviews only the changes.

We can also have some kind of custom tests, like a validation of the HTML DOM vs. the model. You cannot have 100% coverage without somehow building a completely new generator, so finding the sweet spot that gives a lot of coverage without much effort would be key.

Any of these would be way better than what we have now: manual review of the output and manual consistency check between the different parts of the specification.
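The "persist and review only the diff" idea could be backed by a check along these lines (a sketch; the function name and directory layout are made up):

```python
import hashlib
from pathlib import Path

def changed_outputs(committed_dir, regenerated_dir):
    """Compare regenerated artifacts against the committed copies, so a
    reviewer only needs to look at files that actually changed."""
    committed = Path(committed_dir)
    regenerated = Path(regenerated_dir)

    def digest(path):
        return hashlib.sha256(path.read_bytes()).hexdigest()

    changed = []
    for new in sorted(regenerated.rglob("*")):
        if not new.is_file():
            continue
        old = committed / new.relative_to(regenerated)
        # New files and files with differing content both need review.
        if not old.exists() or digest(old) != digest(new):
            changed.append(str(new.relative_to(regenerated)))
    return changed
```

Run after regeneration, an empty result means the model change was a no-op for the generated output; anything else is exactly the review surface.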

Is the plan here, in the event there are incorrect or less-than-optimized outputs, to go back to the source and modify the LinkML, or do you modify the resulting artifact? Just wondering about the process side of its usage.

Reading this sentence again to write an answer, it now sounds to me as if you had some concerns about LinkML generation itself... If that's the case, my proposal for LinkML is exactly the same as it would be for any alternative: working upstream.

LinkML is not only OSS, but also an open community in which fixes are highly welcome. So we report issues, we fix them (if we can) and extend the tests. We can maintain temporary forks with the fixes until a new upstream version has incorporated them.

In the bad case, if we don't get our fixes accepted, we can fork.

In the worst case, if we don't get any fixes at all (we're incapable of fixing them ourselves and no fix comes from the community), then we can only fix the output.

I hope I have addressed your concerns.
