
Fix issue 150: Create data model #158

Draft

Silvanoc wants to merge 33 commits into margo:pre-draft from Silvanoc:create-data-model

Conversation

@Silvanoc
Contributor

@Silvanoc Silvanoc commented Mar 13, 2026

Description

Provide a comprehensive data model using LinkML and generate the documentation and other validation tooling.

⚠️ IMPORTANT REMARK: the generated specification webpage is not yet 100% equivalent to the current release. No more effort will be invested in polishing it until SUP specification-enhancements#48 has been approved; no further effort is needed if it gets rejected.

Issues Addressed

#150

Change Type

Please select the relevant options:

  • Fix (change that resolves an issue)
  • New enhancement (change that adds specification content)
  • Content edits (change that edits existing content)

Checklist

  • I have read the CONTRIBUTING document.
  • My changes adhere to the established patterns and best practices.

@Silvanoc Silvanoc requested a review from a team as a code owner March 13, 2026 16:04
@Silvanoc Silvanoc marked this pull request as draft March 13, 2026 16:04
@ajcraig
Contributor

ajcraig commented Mar 13, 2026

How are the PNG drawings created? I worry that without the source they're an additional item that will need to be maintained when a change impacts any element.

@Silvanoc
Contributor Author

Silvanoc commented Mar 13, 2026

Looking at commit d5e85be, the data model seems to be a group of isolated classes. But looking at the details, many commonalities can be identified: in fact, many different places define the same type of data! Consolidation is needed and will follow in subsequent commits in this PR.

(image: class diagram of the current data model)

@Silvanoc
Contributor Author

How are the PNG drawings created? I worry that without the source they're an additional item that will need to be maintained when a change impacts any element.

As of now, I'm using LinkML to generate PlantUML code, which I manually send to a PlantUML server to generate the PNGs. But that's just WIP for the time being.

Before the PR is marked as ready for merging, I need to add code to automatically generate the diagrams in SVG format, validate the examples and provide the JSON-Schemas for validation.

@Silvanoc Silvanoc removed request for a team, nilanjan-samajdar and singhmj-1 March 13, 2026 16:16
@Silvanoc
Contributor Author

@ajcraig @nilanjan-samajdar @singhmj-1 this is still a draft, not ready for review! Therefore I've removed all reviewers. Sorry, I initially created it as "Ready to merge", which is probably why you got a notification.

@Silvanoc
Contributor Author

Once it's ready for review, I'll ask any contributor to the different parts covered by the data model to review it and some specification maintainers.

@Silvanoc Silvanoc force-pushed the create-data-model branch 2 times, most recently from 69989a7 to 4d2c376 Compare March 18, 2026 11:43
@nilanjan-samajdar

@Silvanoc,
In order to have LinkML --> OpenAPI, can we take the LinkML “Data Model” and convert it into the OpenAPI/Swagger definition’s “components” section?

  • Hence the data objects get imported from LinkML, but the API paths/requests/responses need to be written in Swagger.
    This approach requires some scripting, but is doable.
  • Also, if you create an "Endpoint" object in the data model and also specify the request and HTTP response list at the top, we can even do away with the manual creation of OpenAPI paths/requests/responses and do it all through a script.

@Silvanoc
Contributor Author

@Silvanoc, In order to have LinkML --> OpenAPI, can we take the LinkML “Data Model” and convert it into the OpenAPI/Swagger definition’s “components” section?

* Hence the data objects get imported from LinkML, but the API paths/requests/responses need to be written in Swagger.
  This approach requires some scripting, but is doable.

* Also, if you create an "Endpoint" object in the data model and also specify the request and HTTP response list at the top, we can even do away with the manual creation of OpenAPI paths/requests/responses and do it all through a script.

I'm working on it. The generation of the ./components/schemas section out of LinkML is already working in a prototype, but I'm considering an alternative, since I want to contribute it to LinkML.

All other parts of the OpenAPI specification would be provided externally and simply appended programmatically. But my intention is to have a LinkML generator that takes (at least) two arguments:

  1. The OpenAPI head (metadata, ./paths, ...) as a YAML file.
  2. The LinkML data model as a YAML file too.

The generator makes sure that any resource referenced in the ./paths exists in the data model.
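That consistency check can be illustrated in a few lines: walk the hand-written head, collect every `#/components/schemas/<Name>` reference under `./paths`, and compare against the class names of the data model. This is only a sketch with made-up helper names, not the prototype's actual code:

```python
def collect_schema_refs(node):
    """Recursively collect '#/components/schemas/<Name>' reference targets."""
    refs = set()
    if isinstance(node, dict):
        for key, value in node.items():
            prefix = "#/components/schemas/"
            if key == "$ref" and isinstance(value, str) and value.startswith(prefix):
                refs.add(value[len(prefix):])
            else:
                refs.update(collect_schema_refs(value))
    elif isinstance(node, list):
        for item in node:
            refs.update(collect_schema_refs(item))
    return refs

def missing_resources(openapi_head, model_classes):
    """Names referenced under ./paths that the data model does not define."""
    return sorted(collect_schema_refs(openapi_head.get("paths", {})) - set(model_classes))
```

A generator built this way can fail fast with a clear error message instead of silently emitting a document with dangling references.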

@nilanjan-samajdar

The generator makes sure that any resource referenced in the ./paths exists in the data model.

Yes, for other elements of the OpenAPI/Swagger, maybe we can keep a template yaml that the LinkML generator uses.
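The template-plus-generator split suggested above could work roughly like this (a sketch with hypothetical names; the real generator would read both inputs from YAML files):

```python
def assemble_openapi(head_template, generated_schemas):
    """Return a full OpenAPI document: the hand-written head (metadata,
    ./paths, ...) plus the schemas generated from the LinkML model,
    merged under ./components/schemas."""
    doc = dict(head_template)  # shallow copy; the template stays untouched
    components = dict(doc.get("components", {}))
    schemas = dict(components.get("schemas", {}))
    schemas.update(generated_schemas)  # generated definitions win on collision
    components["schemas"] = schemas
    doc["components"] = components
    return doc
```

Keeping the head as a plain template file means contributors edit ordinary OpenAPI YAML, while the schemas section stays generated and therefore consistent with the model.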

@Silvanoc
Contributor Author

The data model currently looks like this:
(image: DataModel-ClassDiagram)

The only things that haven't been generated with LinkML are the dashed lines, because the references use "hidden" IDs (see #161) that cannot be natively modeled with LinkML.

@Silvanoc Silvanoc force-pushed the create-data-model branch 3 times, most recently from 5367f5f to 856b95a Compare March 23, 2026 12:29
@phil-abb phil-abb marked this pull request as ready for review March 26, 2026 15:20
@phil-abb phil-abb marked this pull request as draft March 27, 2026 11:37
@phil-abb
Contributor

@Silvanoc I misread one of your comments yesterday and took this out of draft. After realizing my mistake, I put it back as a draft.

@phil-abb
Contributor

@Silvanoc / @ajcraig - I have mixed feelings about this. Having a single source of truth is very helpful, but I'm concerned about the complexity this is introducing and the risk of someone accidentally missing something. This raises the bar for contributions quite high, with all the additional stuff someone will need to understand, instead of creating a simple markdown page.

I like what this enables, but I think we'll need to figure out some way of managing this so we're not making people unable or unwilling to contribute because of this additional overhead.

@stormc
Contributor

stormc commented Mar 27, 2026

@Silvanoc / @ajcraig - I have mixed feelings about this. Having a single source of truth is very helpful, but I'm concerned about the complexity this is introducing [...]

As can be seen nicely in the data model graphs above, we already have that complexity, and it's likely to increase rather than decrease; i.e., this complexity exists already and it's not going to go away.

This is not introducing complexity, but a means to tame the existing (and growing) complexity into a coherent and consistent single source of truth – which is really needed, as the PlugFest has shown: there we uncovered (very) small inconsistencies here and there that in sum break the whole thing.

We cannot hide complexity; it's there, and trying to hide even parts of it makes the whole thing an inconsistent mess. The only question, IMO, is: what is the right tooling to help us manage that complexity?

[...] and the risk of someone accidentally missing something.

This is actually prevented by having rigor here.

This raises the bar for contributions quite high, with all the additional stuff someone will need to understand, instead of creating a simple markdown page. [...]
I like what this enables, but I think we'll need to figure out some way of managing this so we're not making people unable or unwilling to contribute because of this additional overhead.

Granted, this needs to be made as convenient as possible with automation and tooling.

@phil-abb
Contributor

As can be seen nicely in the data model graphs above, we already do have that complexity

@stormc the complexity I was referring to is more on the tooling side. Contributors will need to learn how to use LinkML and Jinja, and understand all the bash and Python scripts and all the templates. If they want to make an update, they'll need to figure out a bunch of files that need to be updated and checked. If they want to create a new page, it's going to be even more complex.

I acknowledge the need for something to keep all the content consistent, but unless we do something to help make creating and updating content easier, there is a good chance we'll see even fewer contributions. So, whether we have a small team of people that are available to help take someone's markdown and update all these files, or introduce some tooling or AI to make the process easier, we'll need to do something, I think.

@stormc
Contributor

stormc commented Mar 27, 2026

As can be seen nicely in the data model graphs above, we already do have that complexity

@stormc the complexity I was referring to is more on the tooling side. Contributors will need to learn how to use LinkML and Jinja, and understand all the bash and Python scripts and all the templates. If they want to make an update, they'll need to figure out a bunch of files that need to be updated and checked. If they want to create a new page, it's going to be even more complex.

If you have to fully understand all the gory details of this, then the automation/tooling is insufficient. You will have to follow some (probably extra) steps, granted, but that shouldn't force you to understand the whole machinery. Getting to this stage will be a process, but I honestly do not see an alternative.

I acknowledge the need for something to keep all the content consistent, but unless we do something to help make creating and updating content easier, there is a good chance we'll see even fewer contributions. So, whether we have a small team of people that are available to help take someone's markdown and update all these files, or introduce some tooling or AI to make the process easier, we'll need to do something, I think.

Fully agree. We need tooling, good tooling, that doesn't stand in between you and contributing, quite the opposite.

@Silvanoc
Contributor Author

Silvanoc commented Apr 2, 2026

⚠️ I'm generating GitHub Pages in my namespace so that you can see the result. It is still only a draft, therefore some details don't fit yet. But it's enough to get an impression of the result.

Silvanoc added 23 commits April 2, 2026 11:15
Combine all the individual data models into a unified data model for the
whole Margo specification.

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Validate the examples against the LinkML data model.

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Add tool to generate JSON-Schemas to validate instances of the top-level
data types:
- ApplicationDeployment
- ApplicationDescription
- DesiredStateManifest
- DeviceCapabilities

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Add tool that creates a class diagram in SVG format that shows all data
types involved in the Margo specification.

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Add DeploymentStatus and ComponentStatus missing in the data model.

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Add class-relationships that cannot be modelled with LinkML and add
possibility to generate multiple class-focused diagrams.

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
@Silvanoc Silvanoc force-pushed the create-data-model branch from c83b770 to cf6cabb Compare April 2, 2026 09:15
@Silvanoc
Contributor Author

Silvanoc commented Apr 2, 2026

Meanwhile I'm convinced that the change is wide/big enough to be worth a SUP: margo/specification-enhancements#48

That way we can also gather wider community feedback.

Signed-off-by: Silvano Cirujano Cuesta <silvano.cirujano-cuesta@siemens.com>
@Silvanoc
Contributor Author

Silvanoc commented Apr 2, 2026

Now the JSON-Schemas of the top-level resources are also generated and provided for download, see here for the ApplicationDescription.
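Checking an example instance against such a generated schema is a one-liner with any full JSON-Schema validator; the tiny checker below (a hypothetical stdlib-only helper with made-up field names) only illustrates the required-property and type checks involved:

```python
# Minimal illustration only: real validation should use a complete
# JSON-Schema validator fed with the generated schemas.
_TYPES = {"object": dict, "array": list, "string": str,
          "number": (int, float), "boolean": bool}

def check_instance(instance: dict, schema: dict) -> list:
    """Return human-readable errors for a tiny subset of JSON Schema:
    'required' properties and primitive 'type' keywords."""
    errors = [f"missing required property: {name}"
              for name in schema.get("required", []) if name not in instance]
    for name, subschema in schema.get("properties", {}).items():
        expected = _TYPES.get(subschema.get("type"))
        if name in instance and expected and not isinstance(instance[name], expected):
            errors.append(f"property {name!r}: expected type {subschema['type']}")
    return errors
```

In CI this kind of check (via a real validator) is what "validate the examples against the LinkML data model" boils down to once the schemas are generated.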

@chrisgclayton

Thanks, this is great work. With any generation tooling I have found that, beyond "the most basic" items, there is an element of validation that has to happen afterwards to confirm the results are accurate to what you intended. Is the plan here, in the event there are incorrect or less-than-optimized outputs, to go back to the source and modify the LinkML, or do you modify the resulting artifact? Just wondering about the process side of its usage.

@Silvanoc
Contributor Author

Thanks this is great work.

@chrisgclayton thanks. And thank you for engaging in this conversation.

With any generation tooling I have found that, beyond "the most basic" items, there is an element of validation that has to happen afterwards to confirm the results are accurate to what you intended.

Fully agree.

I assume that you don't mean the "accuracy" of what LinkML generates out of the models, right? LinkML has its own tests for that purpose, and any doubts in that direction should result in improvements to LinkML's test suite. It's OSS, so we could contribute tests; I've done it myself.

Excluding LinkML accuracy/correctness, only the inputs (the model, tool parameters, custom templates, ...) remain. We must differentiate two aspects:

  1. accuracy of re-generation: is generation only changing what is meant to be changed and leaving the rest untouched?
  2. accuracy of newly generated: are the changes in the model resulting in the expected output?

IMO for both we should persist the result into a git branch in which any change generates a commit (I thought that gh-pages would provide exactly that, but it only has 1 commit), then a human being reviews only the changes.

We can also have some kind of custom tests, like a validation of the HTML DOM vs. the model. You cannot have 100% coverage without somehow building a completely new generator, so finding the sweet spot that gives a lot of coverage without much effort would be key.

Any of these would be way better than what we have now: manual review of the output and manual consistency check between the different parts of the specification.
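The "persist and review only the diff" idea could be backed by a check along these lines (a sketch; the function name and directory layout are made up):

```python
import hashlib
from pathlib import Path

def changed_outputs(committed_dir, regenerated_dir):
    """Compare regenerated artifacts against the committed copies, so a
    reviewer only needs to look at files that actually changed."""
    committed = Path(committed_dir)
    regenerated = Path(regenerated_dir)

    def digest(path):
        return hashlib.sha256(path.read_bytes()).hexdigest()

    changed = []
    for new in sorted(regenerated.rglob("*")):
        if not new.is_file():
            continue
        old = committed / new.relative_to(regenerated)
        # New files and files with differing content both need review.
        if not old.exists() or digest(old) != digest(new):
            changed.append(str(new.relative_to(regenerated)))
    return changed
```

Run after regeneration, an empty result means the model change was a no-op for the generated output; anything else is exactly the review surface.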

Is the plan here, in the event there are incorrect or less-than-optimized outputs, to go back to the source and modify the LinkML, or do you modify the resulting artifact? Just wondering about the process side of its usage.

Reading this sentence again to write an answer, it now sounds to me as if you had some concerns about LinkML generation itself... If that's the case, my proposal for LinkML is exactly the same as it would be for any alternative: working upstream.

LinkML is not only OSS, but also an open community in which fixes are highly welcome. So we report issues, we fix them (if we can) and extend the tests. We can maintain temporary forks with the fixes until a new upstream version has incorporated them.

In the bad case, if we don't get our fixes accepted, we can fork.

In the worst case, if we don't get any fixes at all (we're incapable of fixing them ourselves and no fix comes from the community), then we can only fix the output.

I hope I have addressed your concerns.
