6 changes: 6 additions & 0 deletions docs/_static/custom.css
@@ -23,3 +23,9 @@
color: var(--pst-color-text-base) !important;
font-weight: 500;
}

/* Make the Examples top-level navigation header extra bold and larger */
.bd-links li.toctree-l1 > a {
font-weight: 800 !important;
font-size: 1.05rem !important;
}
1 change: 1 addition & 0 deletions docs/conf.py
@@ -54,6 +54,7 @@
intersphinx_disabled_domains = ["std"]

templates_path = ["_templates"]
exclude_patterns = ["_build", "Thumbs.db", ".DS_Store"]

# -- Options for HTML output

17 changes: 14 additions & 3 deletions docs/guides/examples.md → docs/examples.md
@@ -1,5 +1,16 @@
# Examples

```{toctree}
:maxdepth: 2
:hidden:

examples/keras_training
examples/jax_training
examples/pytorch_training
examples/gemma4_finetuning
examples/llm_finetuning
```

A catalog of runnable example scripts using Kinetic. Click any card to open the source code on GitHub.

Tier badges:
@@ -225,7 +236,7 @@ example of forwarding Kaggle credentials into the remote pod.

## Related pages

- [Getting Started](../getting_started.md): your first run, end-to-end.
- [Keras Training](keras_training.md): patterns for Keras users.
- [LLM Fine-tuning](llm_finetuning.md): extended walkthrough using the
- [Getting Started](getting_started.md): your first run, end-to-end.
- [Keras Training](examples/keras_training.md): patterns for Keras users.
- [LLM Fine-tuning](guides/llm_finetuning.md): extended walkthrough using the
The link to the LLM Fine-tuning guide is broken. The file was moved from docs/guides/ to docs/examples/. Since this file (docs/examples.md) is now located at the root of the docs/ directory, the link should point to examples/llm_finetuning.md.

Suggested change
- [LLM Fine-tuning](guides/llm_finetuning.md): extended walkthrough using the
- [LLM Fine-tuning](examples/llm_finetuning.md): extended walkthrough using the

Gemma examples.
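The reviewer's fix can be sanity-checked mechanically: a relative Markdown link resolves against the directory of the document that contains it. A minimal sketch (the `resolve_link` helper is illustrative, not part of the docs build):

```python
import posixpath

def resolve_link(source_doc: str, link: str) -> str:
    """Resolve a relative Markdown link against the source document's directory."""
    return posixpath.normpath(posixpath.join(posixpath.dirname(source_doc), link))

# The old link in docs/examples.md still points at the pre-move location:
print(resolve_link("docs/examples.md", "guides/llm_finetuning.md"))
# -> docs/guides/llm_finetuning.md (the file no longer lives there)

# The corrected link resolves to the file's new home:
print(resolve_link("docs/examples.md", "examples/llm_finetuning.md"))
# -> docs/examples/llm_finetuning.md
```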
@@ -51,7 +51,7 @@ def fine_tune_gemma4():
...
```

This pattern is covered in depth in the [Environment Variables](env_vars.md) guide.
This pattern is covered in depth in the [Environment Variables](../guides/env_vars.md) guide.

`keras-hub` and its tokenizer backends are not installed in the Kinetic base container by default. Add a `requirements.txt` to your project so Kinetic picks them up automatically:

@@ -62,7 +62,7 @@ tokenizers==0.22.2
sentencepiece==0.2.1
```

Kinetic detects changes to this file and rebuilds the container only when needed. See the [Managing Dependencies](dependencies.md) guide for details.
Kinetic detects changes to this file and rebuilds the container only when needed. See the [Managing Dependencies](../guides/dependencies.md) guide for details.
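Rebuild-on-change logic like this is commonly implemented by fingerprinting the file. A hypothetical sketch of the idea (Kinetic's actual mechanism is not shown in these docs) that compares a content hash against a cached digest:

```python
import hashlib
from pathlib import Path

def file_digest(path: Path) -> str:
    """SHA-256 of the file's bytes; identical content yields an identical digest."""
    return hashlib.sha256(path.read_bytes()).hexdigest()

def needs_rebuild(requirements: Path, cache: Path) -> bool:
    """Rebuild only when requirements.txt content differs from the cached digest."""
    current = file_digest(requirements)
    previous = cache.read_text().strip() if cache.exists() else None
    if current == previous:
        return False
    cache.write_text(current)  # remember the new fingerprint for next time
    return True
```

With this scheme, editing a comment or reordering lines still triggers a rebuild; only byte-identical content is treated as unchanged.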

## Fine-tuning with LoRA

@@ -326,5 +326,5 @@ kinetic down --project your-project-id

## Next Steps

- **Checkpointing during training:** use Orbax to save intermediate checkpoints so a long run can resume if interrupted. See the [Checkpointing](checkpointing.md) guide.
- **Distributed training:** scale to larger TPU slices or multiple hosts. See the [Distributed Training](distributed_training.md) guide.
- **Checkpointing during training:** use Orbax to save intermediate checkpoints so a long run can resume if interrupted. See the [Checkpointing](../guides/checkpointing.md) guide.
- **Distributed training:** scale to larger TPU slices or multiple hosts. See the [Distributed Training](../guides/distributed_training.md) guide.
16 changes: 8 additions & 8 deletions docs/guides/jax_training.md → docs/examples/jax_training.md
@@ -70,7 +70,7 @@ for you:
in `requirements.txt`.
- **JAX packages in your `requirements.txt` are filtered out** before
install so they don't shadow the accelerator-correct copy in the
image. See [Dependencies](dependencies.md) for the filter behavior.
image. See [Dependencies](../guides/dependencies.md) for the filter behavior.

Inside the function, `jax.devices()` returns whatever the pod sees: an
8-chip TPU slice for `tpu-v6e-8`, an 8-device array for
@@ -114,7 +114,7 @@ def train_distributed():
```

Without `backend="pathways"`, multi-host JAX collectives won't have a
working coordinator. See [Distributed Training](distributed_training.md)
working coordinator. See [Distributed Training](../guides/distributed_training.md)
for the full multi-host setup.

## Data
@@ -146,21 +146,21 @@ train(Data("gs://my-bucket/dataset/"))
train(Data("gs://my-bucket/large/", fuse=True))
```

`Data` accepts both local paths and `gs://` URIs. See [Data](data.md)
`Data` accepts both local paths and `gs://` URIs. See [Data](../guides/data.md)
for the decision matrix between downloaded, FUSE-mounted, and direct
access patterns.
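As a rough illustration of how such a decision matrix might be encoded — the thresholds and mode names below are hypothetical, not Kinetic's actual API:

```python
def choose_access_pattern(size_gb: float, passes_over_data: int) -> str:
    """Illustrative heuristic for picking a data-access mode.

    - Small datasets: download once to local disk (fastest repeated reads).
    - Large datasets read many times: FUSE-mount to avoid a big upfront copy.
    - Large datasets read once: stream directly from GCS.
    """
    if size_gb <= 10:
        return "download"
    return "fuse" if passes_over_data > 1 else "direct"
```

The real tradeoffs depend on network throughput and read patterns, which the Data guide's matrix covers in detail.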

## Next steps

- [Distributed Training](distributed_training.md) — multi-host JAX with
- [Distributed Training](../guides/distributed_training.md) — multi-host JAX with
Pathways.
- [Checkpointing](checkpointing.md) — Orbax checkpoint patterns under
- [Checkpointing](../guides/checkpointing.md) — Orbax checkpoint patterns under
`KINETIC_OUTPUT_DIR`.

## Related pages

- [Distributed Training](distributed_training.md) — Pathways and
- [Distributed Training](../guides/distributed_training.md) — Pathways and
multi-host coordination.
- [Dependencies](dependencies.md) — JAX filtering and what gets
- [Dependencies](../guides/dependencies.md) — JAX filtering and what gets
installed.
- [Checkpointing](checkpointing.md) — Orbax + `KINETIC_OUTPUT_DIR`.
- [Checkpointing](../guides/checkpointing.md) — Orbax + `KINETIC_OUTPUT_DIR`.
20 changes: 10 additions & 10 deletions docs/guides/keras_training.md → docs/examples/keras_training.md
@@ -44,7 +44,7 @@ A few things to note:
[Accelerators](../accelerators.md).

For the canonical end-to-end example with a real dataset, see
[`fashion_mnist.py`](examples.md) (first entry under Quickstart).
[`fashion_mnist.py`](../examples.md) (first entry under Quickstart).

## How to think about it

@@ -53,7 +53,7 @@ remote node. That has two practical consequences:

- **No local state crosses the boundary.** Anything the function needs
must either be passed as an argument, captured by closure, or shipped
via [`kinetic.Data`](data.md). Locally-loaded variables that you reference
via [`kinetic.Data`](../guides/data.md). Locally-loaded variables that you reference
by global name will not be there on the remote.
- **The Keras backend is whatever the remote has installed.** By default
Kinetic's prebuilt and bundled images use JAX. Set `KERAS_BACKEND` if
@@ -76,8 +76,8 @@ def train_distributed():
...
```

See [Distributed Training](distributed_training.md) for the full
multi-host setup, and [LLM Fine-tuning](llm_finetuning.md) for a
See [Distributed Training](../guides/distributed_training.md) for the full
multi-host setup, and [LLM Fine-tuning](../guides/llm_finetuning.md) for a
The link to the LLM Fine-tuning guide is incorrect. Both keras_training.md and llm_finetuning.md are now located in the same docs/examples/ directory, so the link should be relative to the current directory rather than pointing to the old guides/ path.

Suggested change
multi-host setup, and [LLM Fine-tuning](../guides/llm_finetuning.md) for a
multi-host setup, and [LLM Fine-tuning](llm_finetuning.md) for a

concrete Gemma example.

## Data
@@ -109,20 +109,20 @@ train(Data("gs://my-bucket/dataset/"))
train(Data("gs://my-bucket/large/", fuse=True))
```

`Data` accepts both local paths and `gs://` URIs. See [Data](data.md)
`Data` accepts both local paths and `gs://` URIs. See [Data](../guides/data.md)
for the decision matrix between downloaded, FUSE-mounted, and direct
access patterns.

## Next steps

- [`fashion_mnist.py`](examples.md) — full working example with a real
- [`fashion_mnist.py`](../examples.md) — full working example with a real
dataset (first entry under Quickstart).
- [Checkpointing](checkpointing.md) — persist model weights and resume
- [Checkpointing](../guides/checkpointing.md) — persist model weights and resume
across runs.

## Related pages

- [Data](data.md) — shipping local files and reading from GCS.
- [Checkpointing](checkpointing.md) — `KINETIC_OUTPUT_DIR` and resumable
- [Data](../guides/data.md) — shipping local files and reading from GCS.
- [Checkpointing](../guides/checkpointing.md) — `KINETIC_OUTPUT_DIR` and resumable
training.
- [LLM Fine-tuning](llm_finetuning.md) — KerasHub + Gemma walkthrough.
- [LLM Fine-tuning](../guides/llm_finetuning.md) — KerasHub + Gemma walkthrough.
The link to the LLM Fine-tuning guide is incorrect. Since both files are in the same directory (docs/examples/), use a direct relative link.

Suggested change
- [LLM Fine-tuning](../guides/llm_finetuning.md) — KerasHub + Gemma walkthrough.
- [LLM Fine-tuning](llm_finetuning.md) — KerasHub + Gemma walkthrough.

@@ -63,5 +63,5 @@ See the [Distributed Training](distributed_training.md) guide for more details o
Pathways setup that LLM fine-tuning typically needs.
- [Checkpointing](checkpointing.md) — Orbax + `KINETIC_OUTPUT_DIR`
for resumable fine-tuning runs.
- [Examples](examples.md) — the Gemma SFT examples are full
- [Examples](../examples.md) — the Gemma SFT examples are full
end-to-end LLM fine-tuning walkthroughs.
@@ -11,7 +11,7 @@ torch
torchvision
```

Kinetic will install these in the remote container automatically. See [Managing Dependencies](dependencies.md) for details on how dependency detection works.
Kinetic will install these in the remote container automatically. See [Managing Dependencies](../guides/dependencies.md) for details on how dependency detection works.

## Basic Usage

@@ -106,9 +106,9 @@ def train():

## Related pages

- [Dependencies](dependencies.md) — how `torch` gets installed in
- [Dependencies](../guides/dependencies.md) — how `torch` gets installed in
the remote container.
- [Accelerators](../accelerators.md) — full list of GPUs and
multi-GPU configurations.
- [Cost Optimization](cost_optimization.md) — spot capacity for
- [Cost Optimization](../guides/cost_optimization.md) — spot capacity for
GPU workloads.
2 changes: 1 addition & 1 deletion docs/getting_started.md
@@ -128,7 +128,7 @@ python fashion_mnist.py

After your first run works, the most useful follow-ups are:

- [Examples](guides/examples.md): a catalog of runnable scripts that
- [Examples](examples.md): a catalog of runnable scripts that
cover async jobs, data, checkpoints, parallel sweeps, and LLM
fine-tuning. The fastest way to see real patterns end to end.
- [Execution Modes](guides/execution_modes.md): bundled vs prebuilt
File renamed without changes.
File renamed without changes.
File renamed without changes.
File renamed without changes.
2 changes: 1 addition & 1 deletion docs/guides/data.md
@@ -156,7 +156,7 @@ payload — no redundant upload of the same bytes.
## Related pages

- [Checkpointing](checkpointing.md): durable outputs and `KINETIC_OUTPUT_DIR`.
- [Examples](examples.md): walks through the Data API end-to-end.
- [Examples](../examples.md): walks through the Data API end-to-end.
- [Cost Optimization](cost_optimization.md): FUSE vs download tradeoffs
for repeated jobs.

4 changes: 2 additions & 2 deletions docs/guides/distributed_training.md
@@ -62,8 +62,8 @@ def train_data_parallel():
```

For a richer end-to-end example using a real model, see
[`pathways_example.py`](examples.md) and
[`gemma_sft_pathways_distributed.py`](examples.md).
[`pathways_example.py`](../examples.md) and
[`gemma_sft_pathways_distributed.py`](../examples.md).

## How to think about it

File renamed without changes.
25 changes: 8 additions & 17 deletions docs/index.rst
@@ -14,29 +14,20 @@ Kinetic: Run ML workloads on cloud TPUs and GPUs
:caption: Core Workflows
:hidden:

guides/keras_training
guides/jax_training
advanced/async_jobs
guides/data
guides/checkpointing
guides/dependencies
guides/env_vars
guides/async_jobs
guides/batched_jobs
guides/debugging
guides/examples

.. toctree::
:caption: Scaling and Operations
:hidden:

guides/cost_optimization
advanced/clusters
guides/clusters
guides/distributed_training
advanced/batched_jobs
guides/llm_finetuning
guides/gemma4_finetuning
guides/pytorch_training
advanced/containers
advanced/reservations
guides/containers
guides/reservations

examples

.. toctree::
:caption: Reference
@@ -87,7 +78,7 @@ Three entry points cover what most new users need first:
* - Install, point at a cluster, and run a real Keras job in minutes.
:doc:`Getting Started <getting_started>`.
- Switch from blocking ``run()`` to detached ``submit()`` for jobs
that take hours. :doc:`Detached Jobs <advanced/async_jobs>`.
that take hours. :doc:`Detached Jobs <guides/async_jobs>`.
- Ship local files in, write durable artifacts back out via
``KINETIC_OUTPUT_DIR``. :doc:`Data <guides/data>` and
:doc:`Checkpointing <guides/checkpointing>`.