Sample bundle: ML experiment trail (suggested types: Experiment, Model, Submission, Lesson) #60

@topprismdata

Summary

The three published sample bundles (ga4, stackoverflow, crypto_bitcoin) are all data-catalog style: they describe external resources (BigQuery tables, Stack Overflow tags, Bitcoin data). I built a fourth bundle in a different style, an ML experiment trail (see the bundle at /tmp/s6e2-okf-bundle/).

Suggested new type values for the ML domain

The SPEC (§4.1) explicitly says type values are free-form, but it would help consumers (visualize, future readers) if a few canonical ML types were documented:

  • Experiment — a single training run / iteration (R0, v15f, etc.)
  • Model — a trained model (e.g., CatBoost BAG L1, OOF 0.9555)
  • Submission — a file submitted to a Kaggle LB
  • Lesson — a discovered or validated skill/principle
  • Competition — a Kaggle competition as a whole (timeline container)
  • Bundle — entry point for the whole bundle (top-level index.md)
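As a rough illustration of how a consumer could use a documented vocabulary like this, the sketch below encodes the six suggested type values as a Python enum and returns None for anything else, since the SPEC leaves types free-form. The names MLType and classify_type are hypothetical; they are not part of OKF or of visualize.

```python
from enum import Enum
from typing import Optional


class MLType(str, Enum):
    """Suggested ML-domain type values from this proposal (not defined by the SPEC)."""

    EXPERIMENT = "Experiment"
    MODEL = "Model"
    SUBMISSION = "Submission"
    LESSON = "Lesson"
    COMPETITION = "Competition"
    BUNDLE = "Bundle"


def classify_type(value: str) -> Optional[MLType]:
    """Return the matching suggested type, or None for any other free-form value."""
    try:
        return MLType(value)
    except ValueError:
        # Types are free-form, so an unknown value is not an error.
        return None
```

Returning None rather than raising keeps bundles that use other type values (for example Index) valid, which matches the free-form rule.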

Why a sample bundle helps

The current GA4 bundle shows OKF for static metadata. An experiment-trail bundle would demonstrate:

  1. Cross-link graph shape — 11 concepts / 23 edges in my mini-bundle, dense and navigable.
  2. Type-driven color coding in visualize — already works; would just need ML types documented.
  3. References to non-file resources, using resource: https://github.com/.../pull/5 (PR/issue links) and resource: file://... patterns.
  4. Time-evolving knowledge: the timestamp field shines here. Each run is dated, so dashboard.md can be regenerated from the latest timestamp.
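The timestamp point can be sketched in a few lines of Python. This is a minimal sketch, assuming each concept has already been parsed into a dict with type and timestamp keys and that timestamps are ISO 8601 strings; the helper name latest_by_type and the sample records (including their dates) are placeholders for illustration, not data from the actual bundle.

```python
from datetime import datetime


def latest_by_type(concepts, type_name):
    """Return the concept of the given type with the most recent timestamp, or None."""
    matching = [
        c for c in concepts
        if c.get("type") == type_name and "timestamp" in c
    ]
    if not matching:
        return None
    # Assumes ISO 8601 timestamps without a trailing "Z".
    return max(matching, key=lambda c: datetime.fromisoformat(c["timestamp"]))


# Placeholder records; the dates are invented for illustration only.
concepts = [
    {"title": "R0 baseline", "type": "Experiment", "timestamp": "2026-01-01T09:00:00"},
    {"title": "R0a bug", "type": "Experiment", "timestamp": "2026-01-02T09:00:00"},
    {"title": "R0b fix", "type": "Experiment", "timestamp": "2026-01-03T09:00:00"},
    {"title": "Undated note", "type": "Lesson"},
]

latest = latest_by_type(concepts, "Experiment")
```

A dashboard generator could call this once per type and render the result, so dashboard.md always reflects the newest run without manual edits.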

My S6E2 mini-bundle (12 concepts, 6 types)

The mini-bundle I built covers an AutoGluon rerun of Kaggle Playground Series S6E2 (Heart Disease). It demonstrates:

  • 1 Bundle (root)
  • 3 Index (subdirectory hubs)
  • 3 Experiment (R0 baseline, R0a bug, R0b fix)
  • 3 Model (CatBoost BAG L1, LightGBM BAG L1, Weighted Ensemble L3)
  • 2 Submission (hard 0/1 labels vs. predicted probabilities)
  • 4 Lesson (4 validated/refined skills)

I'd be happy to donate this as a sample bundle under okf/samples/ if the maintainers want.
