Design generic UDP publication pipeline for APEx Algorithm Catalogue #32

@emmanuelmathot

Context

With BAIS2 now merged into the APEx Algorithm Catalogue (#7), we have a validated end-to-end publication path. However, the process was entirely manual: hand-crafting the OGC API record JSON, manually copying images, fixing URLs during review, running QA tests locally, and coordinating across two repositories over several weeks.

This ticket is about designing and implementing a generic, reproducible publication workflow — the tooling, conventions, and automation that make publishing a UDP to APEx a routine step rather than a project in itself.

Problem

The BAIS2 publication surfaced the following pain points:

  1. Record authoring is error-prone — the <alg>.json record requires ~10 interlinked URLs (application, webapp, notebook, preview, thumbnail, provider, platform, service, code, about). Getting them all right — including the refs/heads/main vs branch-ref distinction — took multiple review rounds with the APEx team.
  2. No structured metadata in notebooks — title, description, keywords, license, original evalscript URL, and author attribution are written as prose in markdown cells. There is nothing machine-readable to extract from.
  3. Image generation is ad hoc: preview.png and thumbnail.png were manually exported. There is no convention for size, format, or which notebook output to use.
  4. Validation requires cloning a separate repo — running pytest qa/unittests/tests/test_records.py from apex_algorithms is a manual step disconnected from our own CI.
  5. Two-repo coordination — every publication touches both openeo-udp (source of truth) and apex_algorithms (catalogue registry). There is no automation bridging the two.

Objective

Design a publication pipeline where, given a validated notebook with proper metadata, publishing to APEx requires running a single command (or is triggered automatically on merge).

Design scope

1. Notebook metadata convention

Define a structured metadata cell or sidecar file that each notebook must include. This is the single source of truth for the APEx record. At minimum:

  • id — algorithm identifier (e.g. bais2, ndci)
  • title — human-readable name
  • description — algorithm summary
  • keywords — list of tags
  • license — SPDX identifier (e.g. CC-BY-4.0)
  • original_evalscript — URL to the Sentinel Hub custom script
  • author — original evalscript author name and attribution
  • citation — DOI or reference to the scientific paper
  • backend — target openEO backend URL
  • collection_id — the openEO collection used
  • preview_cell / thumbnail_cell — which notebook output cells to use for image generation (or explicit image paths)

Open question: JSON sidecar file per notebook (e.g. bais2.meta.json) vs. a tagged cell inside the notebook? Sidecar is easier to parse; tagged cell keeps everything in one file.
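
To make the trade-off concrete, here is a minimal sketch of how both options could be read with the standard library only. It assumes the tagged cell is a raw cell whose source is a JSON object; the tag name `apex-metadata`, the function names, and the choice of which fields count as required are hypothetical and not part of any agreed convention.

```python
import json
from pathlib import Path

# Hypothetical: which fields are mandatory is still to be agreed by the team.
REQUIRED_FIELDS = ("id", "title", "description", "keywords", "license")
# Hypothetical tag name used to mark the metadata cell.
METADATA_TAG = "apex-metadata"


def load_from_sidecar(path):
    """Option A: read metadata from a JSON sidecar such as bais2.meta.json."""
    return json.loads(Path(path).read_text(encoding="utf-8"))


def load_from_tagged_cell(notebook_path, tag=METADATA_TAG):
    """Option B: read metadata from the first notebook cell carrying `tag`.

    Assumes the cell source is a JSON object. A .ipynb file is itself JSON,
    so no notebook library is needed for this lookup.
    """
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    for cell in nb.get("cells", []):
        if tag in cell.get("metadata", {}).get("tags", []):
            source = cell.get("source", "")
            # nbformat allows the source to be a string or a list of lines.
            if isinstance(source, list):
                source = "".join(source)
            return json.loads(source)
    raise LookupError(f"no cell tagged {tag!r} in {notebook_path}")


def missing_fields(meta):
    """Return the required fields that are absent or empty."""
    return [name for name in REQUIRED_FIELDS if not meta.get(name)]
```

Either loader returns the same dictionary shape, so the record generator would not need to know which option the team picks.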

2. Record generator

A Python CLI/script that:

  • Reads the notebook metadata
  • Reads the exported UDP JSON (already produced by notebooks)
  • Generates the complete <alg>.json OGC API Record with all required links, correctly templated for the apex_algorithms directory structure
  • Generates the webapp URL with proper openEO Editor query parameters
  • Outputs the full directory tree ready to copy into apex_algorithms:
    algorithm_catalog/developmentseed/<ALG_ID>/
    ├── openeo_udp/<ALG_ID>.json
    └── records/
        ├── <ALG_ID>.json
        ├── preview.png
        └── thumbnail.png
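
The path templating for the tree above could look roughly like the following sketch. The directory layout is taken from this issue; the record structure (a GeoJSON-style Feature with `properties` and `links`), the use of the link names as `rel` values, and the `raw_base_url` parameter are assumptions for illustration. The real schema should be taken from the merged BAIS2 record during the audit step.

```python
from pathlib import PurePosixPath

PROVIDER_DIR = "developmentseed"


def output_paths(alg_id):
    """Return the relative paths of the files to copy into apex_algorithms."""
    base = PurePosixPath("algorithm_catalog") / PROVIDER_DIR / alg_id
    return {
        "udp": base / "openeo_udp" / f"{alg_id}.json",
        "record": base / "records" / f"{alg_id}.json",
        "preview": base / "records" / "preview.png",
        "thumbnail": base / "records" / "thumbnail.png",
    }


def build_record(meta, raw_base_url):
    """Build a partial record skeleton (assumed shape, not the APEx schema).

    `raw_base_url` is a hypothetical parameter: the URL prefix under which
    the files will be served once merged (for example a refs/heads/main URL).
    """
    paths = output_paths(meta["id"])
    prefix = raw_base_url.rstrip("/")

    def url(key):
        return f"{prefix}/{paths[key]}"

    return {
        "id": meta["id"],
        "type": "Feature",
        "properties": {
            "title": meta["title"],
            "description": meta["description"],
            "keywords": list(meta.get("keywords", [])),
            "license": meta.get("license"),
        },
        # Only three of the roughly ten links are shown here.
        "links": [
            {"rel": "application", "href": url("udp")},
            {"rel": "preview", "href": url("preview")},
            {"rel": "thumbnail", "href": url("thumbnail")},
        ],
    }
```

Deriving every URL from one prefix and one `alg_id` is what would remove the branch-ref mistakes described in the Problem section, since no link is typed by hand.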
    

3. Image extraction

Standardize preview/thumbnail generation:

  • Define target dimensions and format (the APEx catalogue likely has constraints worth documenting)
  • Either extract from notebook output cells automatically (e.g. via nbconvert or nbformat to pull specific cell outputs) or require them as committed files in a known location
  • Resize/crop to thumbnail dimensions
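
As one possible shape for the automatic option, the sketch below pulls the first PNG output from a given cell by reading the notebook as plain JSON. The function name and the zero-based `cell_index` argument are hypothetical (they stand in for whatever `preview_cell` / `thumbnail_cell` end up meaning), and resizing is left out because it would need an imaging library such as Pillow.

```python
import base64
import json
from pathlib import Path

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def extract_png(notebook_path, cell_index):
    """Return the raw bytes of the first image/png output of a cell.

    Notebook outputs store PNG data as base64 text under the
    "image/png" key of the output's "data" MIME bundle.
    """
    nb = json.loads(Path(notebook_path).read_text(encoding="utf-8"))
    cell = nb["cells"][cell_index]
    for output in cell.get("outputs", []):
        payload = output.get("data", {}).get("image/png")
        if payload is None:
            continue
        # The payload may be a single string or a list of lines.
        if isinstance(payload, list):
            payload = "".join(payload)
        image = base64.b64decode(payload)
        if not image.startswith(PNG_SIGNATURE):
            raise ValueError("output is labelled image/png but is not a PNG")
        return image
    raise LookupError(f"cell {cell_index} has no image/png output")
```

The returned bytes can be written directly to `preview.png`; this only works if the notebook is committed with its outputs, which is itself a convention the team would need to agree on.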

4. Local validation

Integrate APEx QA validation into our own workflow:

  • Vendor or wrap the apex_algorithms QA tooling so it can be run from openeo-udp against generated records without cloning the full apex_algorithms repo
  • Or: add a make validate-apex ALG=bais2 target that runs the checks locally
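
Independently of how the APEx QA tooling is wrapped, a cheap pre-check in openeo-udp could catch missing links before the real tests run. The sketch below is not the APEx validation and does not replace it; it assumes the ten link names listed in the Problem section appear as `rel` values in the record's `links` array, which should be confirmed against the BAIS2 record.

```python
# Link names as listed in the Problem section; treating them as `rel`
# values is an assumption to verify against the merged BAIS2 record.
EXPECTED_RELS = (
    "application", "webapp", "notebook", "preview", "thumbnail",
    "provider", "platform", "service", "code", "about",
)


def missing_link_rels(record):
    """Return the expected link relations that the record does not contain."""
    present = {link.get("rel") for link in record.get("links", [])}
    return [rel for rel in EXPECTED_RELS if rel not in present]


def non_https_links(record):
    """Return hrefs that are missing or do not start with https://."""
    return [
        link.get("href")
        for link in record.get("links", [])
        if not str(link.get("href", "")).startswith("https://")
    ]
```

A `make validate-apex` target could run a check like this first and then hand over to the upstream test suite.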

5. CI/CD automation (GitHub Actions)

On merge to main in openeo-udp, when a notebook and its UDP JSON are present:

  • Extract metadata and generate the APEx record files
  • Run APEx QA validation
  • Optionally: open a draft PR on apex_algorithms via GitHub Actions (cross-repo PR or bot-assisted)
  • At minimum: produce the ready-to-submit files as a CI artifact that can be downloaded and used to manually open the PR

6. Documentation

  • Update docs/publication.md to reflect the automated workflow
  • Update CONTRIBUTING.md to document the metadata convention
  • Provide a worked example showing the full flow from notebook to catalogue entry

Non-goals (for now)

  • Fully automated merge into apex_algorithms without human review — the APEx team requires PR review, and the preview feature only works from their repo. The goal is to automate our side of the preparation.
  • Multi-platform records (e.g. CDSE + TiTiler-openEO in one record) — keep it single-platform for now, extensible later.

Approach

  1. Audit the BAIS2 record — extract the exact schema used and identify which fields are static (provider, platform, code) vs. per-algorithm (title, description, keywords, application URL, images).
  2. Define the metadata spec — propose the convention, get team agreement.
  3. Build the generator — Python script, tested against BAIS2 as the reference.
  4. Retrofit BAIS2 and NDCI — add metadata to existing notebooks, verify the generator reproduces the correct BAIS2 record.
  5. Add CI — GitHub Actions workflow.
  6. Document — update publication guide and contributing docs.

References

Acceptance Criteria

  • Metadata convention defined and documented
  • Generator script produces a valid APEx record from notebook metadata + UDP JSON
  • Generator output passes APEx QA validation for at least BAIS2 (retrofit)
  • CI workflow runs on merge and produces publication-ready artifacts
  • docs/publication.md and CONTRIBUTING.md updated
  • At least one new UDP published using the automated pipeline (proving it works beyond BAIS2)
