Summary
The three published sample bundles (ga4, stackoverflow, crypto_bitcoin) are all data-catalog style: they describe external resources (BigQuery tables, Stack Overflow tags, Bitcoin data). I built a bundle in a fourth style, an ML experiment trail (see the bundle at /tmp/s6e2-okf-bundle/).
Suggested new type values for the ML domain
The SPEC (§4.1) explicitly leaves types free-form, but it would help consumers (visualize, future readers) if a few ML-canonical types were documented:
- Experiment: a single training run or iteration (R0, v15f, etc.)
- Model: a trained model (e.g., CatBoost BAG L1, OOF 0.9555)
- Submission: a file submitted to a Kaggle leaderboard (LB)
- Lesson: a discovered or validated skill/principle
- Competition: a Kaggle competition as a whole (timeline container)
- Bundle: the entry point for the whole bundle (top-level index.md)
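As a sketch of how a consumer might use such documentation, here is a hypothetical registry (ML_TYPES and describe_type are illustrative names, not part of OKF or visualize). It maps the proposed values to their meanings and falls back gracefully for anything else, since the SPEC keeps types free-form. Index, which the mini-bundle below also uses, would take the fallback path unless it were added to the list.

```python
from typing import Dict

# Hypothetical registry of the ML type values proposed above. SPEC §4.1 keeps
# type free-form, so this is advisory: any other value is still a valid type.
ML_TYPES: Dict[str, str] = {
    "Experiment": "a single training run or iteration",
    "Model": "a trained model",
    "Submission": "a file submitted to a Kaggle leaderboard",
    "Lesson": "a discovered or validated skill/principle",
    "Competition": "a Kaggle competition as a whole (timeline container)",
    "Bundle": "entry point for the whole bundle (top-level index.md)",
}


def describe_type(type_value: str) -> str:
    """Human-readable meaning of a type, e.g. for a legend in a visualizer."""
    meaning = ML_TYPES.get(type_value)
    if meaning is None:
        # Undocumented types are accepted; consumers just have no description to show.
        return f"{type_value}: custom type (undocumented)"
    return f"{type_value}: {meaning}"
```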
Why a sample bundle helps
The current GA4 bundle shows OKF for static metadata. An experiment-trail bundle would demonstrate:
- Cross-link graph shape: 11 concepts and 23 edges in my mini-bundle, a dense and navigable graph.
- Type-driven color coding in visualize: this already works; the ML types would just need to be documented.
- References to non-file resources: resource: https://github.com/.../pull/5 (PR/issue links) and resource: file://... patterns.
- Time-evolving knowledge: the timestamp field shines here, since each run is dated and dashboard.md can be regenerated from the latest timestamp.
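The graph-shape and timestamp points can be made concrete with a small consumer sketch. It assumes, hypothetically, that each concept is a Markdown file with flat key: value front matter between --- lines, that cross-links are relative Markdown links using the same paths as the bundle's file names, and that timestamp holds an ISO 8601 date-time; none of this is taken from the SPEC. link_graph lists the edges behind a count like "23 edges", and latest_of_type picks the newest concept of a type, which a regenerated dashboard.md could lead with.

```python
import re
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# Hypothetical on-disk shape, for illustration only: each concept is a Markdown
# file whose flat "key: value" front matter sits between "---" lines.
FRONT_MATTER = re.compile(r"\A---\n(.*?)\n---\n", re.DOTALL)
# Assumed cross-link syntax: relative Markdown links that point at other .md concepts.
MD_LINK = re.compile(r"\]\(([^)\s]+\.md)\)")


def parse_front_matter(text: str) -> Dict[str, str]:
    """Parse flat key: value pairs; splitting on the first colon keeps URLs intact."""
    match = FRONT_MATTER.match(text)
    if not match:
        return {}
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def link_graph(bundle: Dict[str, str]) -> List[Tuple[str, str]]:
    """(source, target) edges for links whose target is another concept in the bundle."""
    edges = []
    for name, text in bundle.items():
        for target in MD_LINK.findall(text):
            if target in bundle and target != name:
                edges.append((name, target))
    return edges


def latest_of_type(bundle: Dict[str, str], type_value: str) -> Optional[str]:
    """Name of the concept of the given type with the newest ISO 8601 timestamp."""
    dated = []
    for name, text in bundle.items():
        fields = parse_front_matter(text)
        if fields.get("type") == type_value and "timestamp" in fields:
            dated.append((datetime.fromisoformat(fields["timestamp"]), name))
    return max(dated)[1] if dated else None
```

Passing the bundle as an in-memory mapping of path to file text keeps the sketch easy to test; a real tool would read the files from the bundle directory instead.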
My S6E2 mini-bundle (12 concepts, 6 types)
The mini-bundle I built covers an AutoGluon rerun of Kaggle Playground Series S6E2 (Heart Disease). It contains:
- 1 Bundle (root)
- 3 Index (subdirectory hubs)
- 3 Experiment (R0 baseline, R0a bug, R0b fix)
- 3 Model (CatBoost BAG L1, LightGBM BAG L1, Weighted Ensemble L3)
- 2 Submission (0/1 labels vs. probabilities)
- 4 Lesson (validated/refined skills)
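A per-type breakdown like this can also be derived from the bundle itself rather than maintained by hand. The sketch below assumes, hypothetically, that each concept is a Markdown file whose front matter (between --- lines) carries a type: line; front_matter and type_breakdown are illustrative names. Only the front matter is read, so a body line that happens to start with "type:" is ignored.

```python
from collections import Counter
from typing import Dict


def front_matter(text: str) -> str:
    """Raw front-matter block between the opening and closing '---' lines, or ''."""
    if not text.startswith("---\n"):
        return ""
    # Search from index 3 so an empty block ("---\n---\n") is also recognized.
    end = text.find("\n---\n", 3)
    return text[4:end] if end != -1 else ""


def type_breakdown(bundle: Dict[str, str]) -> Counter:
    """Count concepts per front-matter type value; files without one count as 'untyped'."""
    counts: Counter = Counter()
    for text in bundle.values():
        type_value = "untyped"
        for line in front_matter(text).splitlines():
            key, sep, value = line.partition(":")
            if sep and key.strip() == "type":
                type_value = value.strip()
        counts[type_value] += 1
    return counts
```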
I'd be happy to donate this as a sample bundle under okf/samples/ if the maintainers want.
Related
Submission concepts use resource: file:// and could benefit from published_at once that lands.
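For the Submission case, a consumer sketch might resolve a file:// resource to a local path and pick a display date. The fallback from published_at to timestamp is an assumption about how the pending field would be used, and submission_path and submission_date are hypothetical helper names; the URL handling uses only the standard library's urlparse and url2pathname.

```python
from typing import Dict, Optional
from urllib.parse import urlparse
from urllib.request import url2pathname


def submission_path(resource: str) -> Optional[str]:
    """Local path for a file:// resource; None for other schemes such as https PR links."""
    parsed = urlparse(resource)
    if parsed.scheme != "file":
        return None
    # url2pathname also decodes percent-escapes such as %20.
    return url2pathname(parsed.path)


def submission_date(fields: Dict[str, str]) -> Optional[str]:
    """Prefer the proposed published_at field, falling back to the run's timestamp."""
    return fields.get("published_at") or fields.get("timestamp")
```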