Context
With BAIS2 now merged into the APEx Algorithm Catalogue (#7), we have a validated end-to-end publication path. However, the process was entirely manual: hand-crafting the OGC API record JSON, manually copying images, fixing URLs during review, running QA tests locally, and coordinating across two repositories over several weeks.
This ticket is about designing and implementing a generic, reproducible publication workflow — the tooling, conventions, and automation that make publishing a UDP to APEx a routine step rather than a project in itself.
Problem
The BAIS2 publication surfaced the following pain points:
- Record authoring is error-prone: the `<alg>.json` record requires ~10 interlinked URLs (application, webapp, notebook, preview, thumbnail, provider, platform, service, code, about). Getting them all right, including the `refs/heads/main` vs. branch-ref distinction, took multiple review rounds with the APEx team.
- No structured metadata in notebooks: title, description, keywords, license, original evalscript URL, and author attribution are written as prose in markdown cells, so there is nothing machine-readable to extract.
- Image generation is ad hoc: `preview.png` and `thumbnail.png` were exported manually, with no convention for size, format, or which notebook output to use.
- Validation requires cloning a separate repo: running `pytest qa/unittests/tests/test_records.py` from `apex_algorithms` is a manual step disconnected from our own CI.
- Two-repo coordination: every publication touches both `openeo-udp` (source of truth) and `apex_algorithms` (catalogue registry), and there is no automation bridging the two.
Objective
Design a publication pipeline where, given a validated notebook with proper metadata, publishing to APEx requires running a single command (or is triggered automatically on merge).
Design scope
1. Notebook metadata convention
Define a structured metadata cell or sidecar file that each notebook must include. This is the single source of truth for the APEx record. At minimum:
- `id`: algorithm identifier (e.g. `bais2`, `ndci`)
- `title`: human-readable name
- `description`: algorithm summary
- `keywords`: list of tags
- `license`: SPDX identifier (e.g. `CC-BY-4.0`)
- `original_evalscript`: URL of the Sentinel Hub custom script
- `author`: original evalscript author name and attribution
- `citation`: DOI or reference to the scientific paper
- `backend`: target openEO backend URL
- `collection_id`: the openEO collection used
- `preview_cell` / `thumbnail_cell`: which notebook output cells to use for image generation (or explicit image paths)
Open question: a JSON sidecar file per notebook (e.g. `bais2.meta.json`) or a tagged cell inside the notebook? A sidecar is easier to parse; a tagged cell keeps everything in one file.
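To make the two options concrete, here is a minimal Python sketch of a loader that accepts either form and validates the fields listed above. The tag name `apex-metadata`, the `AlgorithmMetadata` class, and the function names are hypothetical; the `.ipynb` file is read as plain JSON, so only the standard library is needed.

```python
from __future__ import annotations

import json
from dataclasses import dataclass
from pathlib import Path

# Hypothetical tag name for the metadata cell (option B); not an existing convention.
METADATA_TAG = "apex-metadata"
REQUIRED_FIELDS = (
    "id", "title", "description", "keywords", "license",
    "original_evalscript", "author", "citation", "backend", "collection_id",
)
OPTIONAL_FIELDS = ("preview_cell", "thumbnail_cell")


@dataclass
class AlgorithmMetadata:
    id: str
    title: str
    description: str
    keywords: list[str]
    license: str
    original_evalscript: str
    author: str
    citation: str
    backend: str
    collection_id: str
    # Cell index, cell tag, or explicit image path; None means "use a committed file".
    preview_cell: int | str | None = None
    thumbnail_cell: int | str | None = None


def validate(raw: dict) -> AlgorithmMetadata:
    """Fail early, reporting every missing field at once."""
    missing = [name for name in REQUIRED_FIELDS if not raw.get(name)]
    if missing:
        raise ValueError(f"missing metadata fields: {', '.join(missing)}")
    if not isinstance(raw["keywords"], list):
        raise ValueError("keywords must be a list of strings")
    allowed = REQUIRED_FIELDS + OPTIONAL_FIELDS
    return AlgorithmMetadata(**{k: raw[k] for k in allowed if k in raw})


def load_from_sidecar(path: Path) -> AlgorithmMetadata:
    """Option A: a JSON sidecar next to the notebook, e.g. bais2.meta.json."""
    return validate(json.loads(path.read_text(encoding="utf-8")))


def load_from_notebook(path: Path) -> AlgorithmMetadata:
    """Option B: a cell tagged METADATA_TAG whose source is a JSON object."""
    notebook = json.loads(path.read_text(encoding="utf-8"))
    for cell in notebook.get("cells", []):
        if METADATA_TAG in cell.get("metadata", {}).get("tags", []):
            source = cell.get("source", "")
            # .ipynb files store cell source either as a string or as a list of lines.
            text = "".join(source) if isinstance(source, list) else source
            return validate(json.loads(text))
    raise LookupError(f"no cell tagged {METADATA_TAG!r} in {path}")
```

Both options produce the same validated object, so the open question can be settled later without touching the record generator.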
2. Record generator
A Python CLI/script that:
- Reads the notebook metadata
- Reads the exported UDP JSON (already produced by notebooks)
- Generates the complete `<alg>.json` OGC API Record with all required links, correctly templated for the `apex_algorithms` directory structure
- Generates the `webapp` URL with the proper openEO Editor query parameters
- Outputs the full directory tree, ready to copy into `apex_algorithms`:
algorithm_catalog/developmentseed/<ALG_ID>/
├── openeo_udp/<ALG_ID>.json
└── records/
    ├── <ALG_ID>.json
    ├── preview.png
    └── thumbnail.png
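A minimal sketch of the generator core, assuming the merged BAIS2 record is used as a template: static parts (provider, platform, service, code, about links) are copied verbatim and only per-algorithm values are overwritten. The constants, the link `rel` names, the placement of title/description/keywords/license under `properties`, and the `server`/`process` Web Editor query parameters are all assumptions to check against the real record. The `notebook_url` argument is hypothetical because the metadata convention does not yet include it.

```python
import copy
import json
from pathlib import Path
from urllib.parse import urlencode

# Assumptions: confirm each constant against the merged BAIS2 record.
CATALOG_REPO = "ESA-APEx/apex_algorithms"
PROVIDER_DIR = "developmentseed"
RAW_BASE = f"https://raw.githubusercontent.com/{CATALOG_REPO}/refs/heads/main"
EDITOR_URL = "https://editor.openeo.org/"


def catalog_paths(alg_id: str) -> dict[str, str]:
    """Repository-relative paths matching the directory tree above."""
    base = f"algorithm_catalog/{PROVIDER_DIR}/{alg_id}"
    return {
        "udp": f"{base}/openeo_udp/{alg_id}.json",
        "record": f"{base}/records/{alg_id}.json",
        "preview": f"{base}/records/preview.png",
        "thumbnail": f"{base}/records/thumbnail.png",
    }


def raw_url(repo_path: str) -> str:
    # Always pin to refs/heads/main, never to the feature branch of the PR.
    return f"{RAW_BASE}/{repo_path}"


def webapp_url(backend: str, process_url: str) -> str:
    # Assumed Web Editor parameters: "server" (backend) and "process" (UDP URL).
    return f"{EDITOR_URL}?{urlencode({'server': backend, 'process': process_url})}"


def build_record(template: dict, meta: dict, notebook_url: str) -> dict:
    """Copy a reference record and overwrite only the per-algorithm parts."""
    record = copy.deepcopy(template)  # static links (provider, platform, ...) survive
    record["id"] = meta["id"]
    properties = record.setdefault("properties", {})
    for key in ("title", "description", "keywords", "license"):
        properties[key] = meta[key]
    paths = catalog_paths(meta["id"])
    udp_url = raw_url(paths["udp"])
    hrefs = {
        "application": udp_url,
        "webapp": webapp_url(meta["backend"], udp_url),
        "notebook": notebook_url,
        "preview": raw_url(paths["preview"]),
        "thumbnail": raw_url(paths["thumbnail"]),
    }
    for link in record.get("links", []):
        if link.get("rel") in hrefs:
            link["href"] = hrefs[link["rel"]]
    return record


def write_tree(out_dir: Path, record: dict, udp: dict) -> Path:
    """Write the UDP and record JSON; images are copied in by a separate step."""
    paths = catalog_paths(record["id"])
    for key, payload in (("udp", udp), ("record", record)):
        target = out_dir / paths[key]
        target.parent.mkdir(parents=True, exist_ok=True)
        target.write_text(json.dumps(payload, indent=2) + "\n", encoding="utf-8")
    return out_dir / paths["record"]
```

Templating from the reference record, rather than building the JSON from scratch, keeps the generator from drifting away from whatever schema the APEx QA tests accept.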
3. Image extraction
Standardize preview/thumbnail generation:
- Define target dimensions and format (the APEx catalogue likely has constraints worth documenting)
- Either extract images from notebook output cells automatically (e.g. using `nbconvert` or `nbformat` to pull specific cell outputs) or require them as committed files in a known location
- Resize/crop to thumbnail dimensions
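A standard-library sketch of the automatic-extraction option: it decodes the first `image/png` output of a cell chosen by index or by tag (matching the `preview_cell` / `thumbnail_cell` fields), and computes an aspect-preserving target size. `PREVIEW_MAX`, `THUMBNAIL_MAX`, and the function names are hypothetical placeholders, not documented APEx limits.

```python
import base64
import json
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# Hypothetical limits: replace with whatever the APEx catalogue documents.
PREVIEW_MAX = (1200, 800)
THUMBNAIL_MAX = (400, 300)


def extract_png(notebook_path: Path, selector) -> bytes:
    """Return the first image/png output of a cell chosen by index (int) or tag (str)."""
    cells = json.loads(notebook_path.read_text(encoding="utf-8")).get("cells", [])
    if isinstance(selector, int):
        candidates = [cells[selector]]
    else:
        candidates = [c for c in cells if selector in c.get("metadata", {}).get("tags", [])]
    for cell in candidates:
        for output in cell.get("outputs", []):
            data = output.get("data", {}).get("image/png")
            if data:
                # The base64 payload may be stored as one string or a list of lines.
                encoded = "".join(data) if isinstance(data, list) else data
                png = base64.b64decode(encoded)
                if png.startswith(PNG_SIGNATURE):
                    return png
    raise LookupError(f"no PNG output for selector {selector!r} in {notebook_path}")


def fit_within(width: int, height: int, max_size: tuple[int, int]) -> tuple[int, int]:
    """Downscale to fit max_size while preserving aspect ratio; never upscale."""
    max_w, max_h = max_size
    scale = min(max_w / width, max_h / height, 1.0)
    return max(1, round(width * scale)), max(1, round(height * scale))
```

With Pillow installed, `Image.thumbnail(THUMBNAIL_MAX)` applies a similar aspect-preserving downscale in place; keeping `fit_within` separate lets the size rule be unit-tested without an imaging dependency.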
4. Local validation
Integrate APEx QA validation into our own workflow:
- Vendor or wrap the `apex_algorithms` QA tooling so it can be run from `openeo-udp` against generated records without cloning the full `apex_algorithms` repo
- Alternatively, add a `make validate-apex ALG=bais2` target that runs the checks locally
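Whichever option is chosen, a cheap pre-check can catch the mistakes that cost BAIS2 several review rounds before the full APEx QA suite runs. This is a hypothetical sketch, not the `apex_algorithms` QA code: it checks that an assumed subset of link relations is present, that every `href` is an absolute HTTPS URL, and that any `raw.githubusercontent.com` URL (including one embedded in the webapp query string) is pinned to `refs/heads/main`.

```python
import json
from pathlib import Path
from urllib.parse import parse_qs, urlparse

# Hypothetical subset of link relations expected in every record.
REQUIRED_RELS = {"application", "webapp", "preview", "thumbnail"}


def _embedded_urls(href: str) -> list[str]:
    """The href itself plus any URLs passed as query parameters (e.g. the webapp's process URL)."""
    urls = [href]
    for values in parse_qs(urlparse(href).query).values():
        urls.extend(v for v in values if v.startswith("https://"))
    return urls


def precheck_record(record: dict) -> list[str]:
    """Cheap link checks; not a substitute for the APEx QA tests."""
    links = record.get("links", [])
    present = {link.get("rel") for link in links}
    problems = [f"missing link rel={rel!r}" for rel in sorted(REQUIRED_RELS - present)]
    for link in links:
        rel, href = link.get("rel"), link.get("href", "")
        parsed = urlparse(href)
        if parsed.scheme != "https" or not parsed.netloc:
            problems.append(f"{rel}: not an absolute https URL: {href!r}")
            continue
        for url in _embedded_urls(href):
            p = urlparse(url)
            if p.netloc == "raw.githubusercontent.com" and "/refs/heads/main/" not in p.path:
                problems.append(f"{rel}: raw URL not pinned to refs/heads/main: {url!r}")
    return problems


def main(record_path: str) -> int:
    """Print problems and return a shell-style exit code (0 means clean)."""
    record = json.loads(Path(record_path).read_text(encoding="utf-8"))
    issues = precheck_record(record)
    print("\n".join(issues) if issues else "ok")
    return 1 if issues else 0
```

A `validate-apex` target could call `main()` on each generated record and only then hand the output to the APEx QA tests.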
5. CI/CD automation (GitHub Actions)
On merge to main in openeo-udp, when a notebook and its UDP JSON are present:
- Extract metadata and generate the APEx record files
- Run APEx QA validation
- Optionally, open a draft PR on `apex_algorithms` via GitHub Actions (cross-repo PR or bot-assisted)
- At minimum: produce the ready-to-submit files as a CI artifact that can be downloaded and used to manually open the PR
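The trigger condition ("when a notebook and its UDP JSON are present") can be isolated in a small, testable function. The sketch below assumes a hypothetical layout of `notebooks/<id>.ipynb` and `udp/<id>.json`; the real `openeo-udp` paths may differ.

```python
from pathlib import PurePosixPath

# Hypothetical openeo-udp layout; adjust to the real directory names.
NOTEBOOK_DIR = PurePosixPath("notebooks")
UDP_DIR = PurePosixPath("udp")


def algorithms_to_publish(changed_files, tracked_files):
    """IDs whose notebook or UDP JSON changed in this merge and which have both files."""
    tracked = {PurePosixPath(p) for p in tracked_files}
    touched = set()
    for path in map(PurePosixPath, changed_files):
        # Only a top-level notebook or UDP JSON counts as a publication trigger.
        if (path.parent, path.suffix) in {(NOTEBOOK_DIR, ".ipynb"), (UDP_DIR, ".json")}:
            touched.add(path.stem)
    return sorted(
        alg for alg in touched
        if NOTEBOOK_DIR / f"{alg}.ipynb" in tracked and UDP_DIR / f"{alg}.json" in tracked
    )
```

In a workflow step, `changed_files` could come from `git diff --name-only HEAD^1 HEAD` on the merge commit and `tracked_files` from `git ls-files`; each returned ID would then go through the generator and pre-check, with the output uploaded via `actions/upload-artifact`.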
6. Documentation
- Update `docs/publication.md` to reflect the automated workflow
- Update `CONTRIBUTING.md` to document the metadata convention
- Provide a worked example showing the full flow from notebook to catalogue entry
Non-goals (for now)
- Fully automated merge into `apex_algorithms` without human review: the APEx team requires PR review, and the preview feature only works from their repo. The goal is to automate our side of the preparation.
- Multi-platform records (e.g. CDSE + TiTiler-openEO in one record) — keep it single-platform for now, extensible later.
Approach
- Audit the BAIS2 record — extract the exact schema used and identify which fields are static (provider, platform, code) vs. per-algorithm (title, description, keywords, application URL, images).
- Define the metadata spec — propose the convention, get team agreement.
- Build the generator — Python script, tested against BAIS2 as the reference.
- Retrofit BAIS2 and NDCI — add metadata to existing notebooks, verify the generator reproduces the correct BAIS2 record.
- Add CI — GitHub Actions workflow.
- Document — update publication guide and contributing docs.
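For the retrofit step, "verify the generator reproduces the correct BAIS2 record" can be turned into a golden-file regression test. This sketch reports every differing JSON path at once; it assumes link order is not significant and sorts links by `rel` and `title` before comparing. The function names are hypothetical.

```python
def diff_json(generated, reference, path="$"):
    """Yield one line per difference; dict keys are compared as sets, lists by position."""
    if isinstance(generated, dict) and isinstance(reference, dict):
        for key in sorted(generated.keys() | reference.keys()):
            if key not in generated:
                yield f"{path}.{key}: missing in generated record"
            elif key not in reference:
                yield f"{path}.{key}: not in reference record"
            else:
                yield from diff_json(generated[key], reference[key], f"{path}.{key}")
    elif isinstance(generated, list) and isinstance(reference, list) and len(generated) == len(reference):
        for index, (g, r) in enumerate(zip(generated, reference)):
            yield from diff_json(g, r, f"{path}[{index}]")
    elif generated != reference:
        yield f"{path}: {generated!r} != {reference!r}"


def with_sorted_links(record):
    """Assumed: link order carries no meaning, so normalize it before diffing."""
    links = sorted(record.get("links", []), key=lambda link: (link.get("rel", ""), link.get("title", "")))
    return {**record, "links": links}


def regression_diff(generated, reference):
    return list(diff_json(with_sorted_links(generated), with_sorted_links(reference)))
```

In a pytest test, `assert regression_diff(generated, bais2_reference) == []` then shows every mismatching path in a single failure message instead of stopping at the first one.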
References
Acceptance Criteria
- `docs/publication.md` and `CONTRIBUTING.md` updated