studio: show HF model download progress in training start overlay #4894
danielhanchen wants to merge 1 commit into main
During the training setup phase, the overlay only displayed a static "Loading model..." line while model weights were being downloaded from Hugging Face. On slow connections this looked like the app had frozen. This adds a small self-contained progress block inside the existing TrainingStartOverlay that polls the existing GET /api/models/download-progress endpoint and renders a Progress bar with bytes downloaded, total bytes, and percent complete.

Notes:
- Frontend only change. No backend, worker, SSE, or runtime store edits.
- Reuses the existing getDownloadProgress client wrapper and the existing /api/models/download-progress endpoint that already scans the HF blob cache for completed and .incomplete files.
- selectedModel is read directly from useTrainingConfigStore inside the overlay, so there is no prop drilling and live-training-view.tsx is unchanged.
- Polling runs at 1500 ms and is gated on the HF repo regex (^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$), the same regex the backend uses, so local paths and empty form state never hit the endpoint.
- Polling stops once progress reaches 1.0 so the bar can stay at 100 until the overlay hides on the first training step.
- Network errors are silently swallowed, matching the chat side flow (the bar simply freezes at the last value).
- When downloadedBytes is 0 the block is hidden entirely, so cached models do not flash a progress bar.
- When the HF API cannot determine the total size, the block falls back to "X downloaded" with no percent and no bar.

Verified with bun run build (tsc -b plus vite build, no TypeScript errors).
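The gating and stop conditions in the notes above can be sketched as a small pure function. This is a minimal sketch, not the PR's code: the regex is the one quoted above, while `shouldPollDownloadProgress` and its parameters are hypothetical names chosen for illustration.

```typescript
// Pattern quoted in the PR description, which says the backend uses the same one.
const HF_REPO_REGEX = /^[A-Za-z0-9._-]+\/[A-Za-z0-9._-]+$/;

// Hypothetical helper (not from the PR): decides whether another poll should be issued.
function shouldPollDownloadProgress(
  modelName: string | null | undefined,
  progress: number,
): boolean {
  // Empty form state and anything that is not "owner/name" never reach the endpoint.
  if (!modelName || !HF_REPO_REGEX.test(modelName)) {
    return false;
  }
  // Stop once the download is complete so the bar can rest at 100.
  return progress < 1;
}
```

Under this sketch, `/models/foo` and an empty string are rejected without a request, while `unsloth/some-model` is polled until `progress` reaches 1. A relative path with exactly one slash, such as `models/foo`, would still pass the pattern.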
Code Review
This pull request introduces a model download progress indicator to the training start overlay. It adds a custom hook, useModelDownloadProgress, which polls the backend for download status and updates the UI with a progress bar and formatted byte counts. A review comment suggests resetting the download state when the model name changes to prevent the UI from displaying stale progress data from previous runs.
    if (!modelName || !HF_REPO_REGEX.test(modelName) || !shouldPoll) {
      return;
    }
When modelName changes or a new training run starts, the state should be reset to avoid showing stale progress data from a previous run while waiting for the first poll of the new model to complete. This prevents the progress bar from flickering with old values, ensuring a consistent user experience during transient states.
    setState(EMPTY_DOWNLOAD_STATE);
References
- When a UI element depends on data from the backend, provide a reasonable fallback or reset state to handle transient states like waiting for a backend response to avoid poor user experience.
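One way to picture the suggested reset, together with the error handling the PR describes, is a pure reducer over the hook's state. This is only a sketch under assumptions: `EMPTY_DOWNLOAD_STATE` and `downloadedBytes` appear in the PR, but the other field names, the action shapes, and `reduceDownloadState` are hypothetical.

```typescript
// Assumed shape; only downloadedBytes is visible in the reviewed diff.
interface DownloadState {
  downloadedBytes: number;
  totalBytes: number;
  progress: number;
}

const EMPTY_DOWNLOAD_STATE: DownloadState = {
  downloadedBytes: 0,
  totalBytes: 0,
  progress: 0,
};

// Hypothetical actions describing what can happen between renders.
type DownloadAction =
  | { type: "model_changed" }
  | { type: "poll_succeeded"; next: DownloadState }
  | { type: "poll_failed" };

function reduceDownloadState(
  state: DownloadState,
  action: DownloadAction,
): DownloadState {
  switch (action.type) {
    case "model_changed":
      // Drop values from the previous model before the first new poll returns.
      return EMPTY_DOWNLOAD_STATE;
    case "poll_succeeded":
      return action.next;
    case "poll_failed":
      // Errors are swallowed: keep the last known values so the bar freezes.
      return state;
  }
}
```

In a hook, the reviewer's `setState(EMPTY_DOWNLOAD_STATE)` would correspond to the `model_changed` branch here.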
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: f13e709ac2
    <AnimatedSpan className="mt-2 text-muted-foreground">
      {`> ${message || "starting training..."} | waiting for first step... (${currentStep})`}
    </AnimatedSpan>
    {download.downloadedBytes > 0 ? (
Avoid showing download banner for fully cached models
The new render guard download.downloadedBytes > 0 also matches models that are already fully cached, because /api/models/download-progress reports nonzero downloaded_bytes for completed blobs as well. In that case the overlay now shows “Downloading model weights...” with 100% even though no download is happening, which is misleading for every cached-model start and can look like unnecessary startup work.
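A possible way to address this, sketched here under assumptions rather than taken from the PR, is to reveal the block only after a sample shows an unfinished download, and to keep it visible afterwards so the bar can still rest at 100. The `DownloadSample` shape and `nextBlockVisibility` are hypothetical names.

```typescript
// Assumed sample shape, loosely mirroring the endpoint's downloaded_bytes and progress fields.
interface DownloadSample {
  downloadedBytes: number;
  progress: number; // 0..1; assumed to be 1 when every blob is already complete
}

// Hypothetical helper: returns whether the progress block should be visible after this sample.
function nextBlockVisibility(wasVisible: boolean, sample: DownloadSample): boolean {
  // Once shown, stay shown until the overlay unmounts.
  if (wasVisible) {
    return true;
  }
  // The first reveal requires evidence of an unfinished download.
  return sample.downloadedBytes > 0 && sample.progress < 1;
}
```

With this guard, a fully cached model (nonzero bytes and a `progress` of 1 on the first poll) never reveals the block. A download that starts and finishes between two polls would also never be shown, and the case where the total size is unknown depends on what the endpoint reports for `progress`, which this sketch does not settle.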
Summary
The training start overlay used to show a static "Loading model..." line while model weights were being pulled from Hugging Face. On slow connections this looked like Studio had frozen, with no indication that anything was happening.
This adds a small progress block inside `TrainingStartOverlay` that polls the existing `GET /api/models/download-progress` endpoint and shows bytes downloaded, total bytes, and percent complete with a `Progress` bar.

Single file change. No backend, worker, SSE, or runtime store edits.
Before / After
Before: [screenshot]
After (mid download): [screenshot]
Implementation
All changes are in `studio/frontend/src/features/studio/training-start-overlay.tsx`:
- `useModelDownloadProgress(modelName)` hook, kept local to this file since there is only one consumer.
- Polls `getDownloadProgress(modelName)` every 1500 ms while the overlay is mounted and the runtime is in a starting or preparing phase (`configuring`, `downloading_model`, `downloading_dataset`, `loading_model`, `loading_dataset`).
- Polling is gated on the HF repo regex `^[A-Za-z0-9._-]+/[A-Za-z0-9._-]+$`, the same regex the backend already uses in `_VALID_REPO_ID`. Local paths and empty form state never hit the endpoint.
- Polling stops once `progress >= 1.0` so the bar can stay at 100 until the overlay hides on the first training step.
- Network errors are silently swallowed, matching the chat side flow in `use-chat-model-runtime.ts`. The bar simply freezes at the last value rather than disappearing.
- Cleanup runs on `modelName` change so a new run with a different model starts a fresh poll.
- `formatBytes` helper for B / KB / MB / GB output.
- `selectedModel` is read directly from `useTrainingConfigStore` inside the overlay. `live-training-view.tsx` is unchanged and there is no prop drilling.

Reused, not added
- `getDownloadProgress` from `studio/frontend/src/features/chat/api/chat-api.ts`
- `Progress` from `studio/frontend/src/components/ui/progress.tsx`
- `useTrainingConfigStore` and `useTrainingRuntimeStore` from `@/features/training`
- `GET /api/models/download-progress` in `studio/backend/routes/models.py` (auth gated, scans the HF blob cache for completed and `.incomplete` files, returns `{downloaded_bytes, expected_bytes, progress}`)

No new endpoints, no new dependencies, no backend restart required. Studio serves `studio/frontend/dist/` as static files, so a fresh `bun run build` is picked up on the next page load.

Edge cases handled
- Model already cached (`downloadedBytes === 0` from the endpoint), overlay transitions straight to training. No flicker.
- Local path (`/models/foo`): fails the HF repo regex, so the endpoint is never hit.
- Network error while polling: `try/catch` swallows, bar freezes at last value.
- Total size unknown: `expected_bytes: 0`, UI falls back to "X.X GB downloaded" with no percent and no bar.
- Model changed between runs: `modelName` dependency changes, cleanup fires, fresh poll starts.

What is explicitly not changed
- `live-training-view.tsx` is unchanged

Test plan
- `cd studio/frontend && bun run build` runs `tsc -b && vite build` cleanly with zero TypeScript errors
- Start a training run with a model already cached in `~/.cache/huggingface/hub/`. Confirm the overlay transitions straight to training with no progress block flash
- Delete `~/.cache/huggingface/hub/models--unsloth--<small-model>` and start a training run with that model. Confirm the bar appears within ~1.5 s and advances smoothly to 100 percent. Confirm `GET /api/models/download-progress?repo_id=...` fires every ~1.5 s in the Network tab
- Start a training run with a local model path. Confirm no `download-progress` requests are made and no bar appears
- Confirm `logs/studio_backend.log` shows no new errors after the change
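The size fallback listed under Edge cases handled can be sketched as a pure mapping from the endpoint's response to display text. The response fields come from the PR description; `formatBytes` here is a guess at the helper's behaviour (1024-based units, one decimal), and `describeDownload` is a hypothetical name.

```typescript
// Response fields as listed in the PR description.
interface DownloadProgressResponse {
  downloaded_bytes: number;
  expected_bytes: number;
  progress: number;
}

// Assumed behaviour: 1024-based units, plain bytes below 1 KB, one decimal above.
function formatBytes(bytes: number): string {
  const units = ["B", "KB", "MB", "GB"];
  let value = Math.max(0, bytes);
  let unit = 0;
  while (value >= 1024 && unit < units.length - 1) {
    value /= 1024;
    unit += 1;
  }
  return unit === 0 ? `${value} B` : `${value.toFixed(1)} ${units[unit]}`;
}

// Hypothetical helper: text for the block plus a percent, or null when no bar should render.
function describeDownload(
  r: DownloadProgressResponse,
): { label: string; percent: number | null } {
  if (r.expected_bytes <= 0) {
    // Total unknown: show bytes only, with no percent and no bar.
    return { label: `${formatBytes(r.downloaded_bytes)} downloaded`, percent: null };
  }
  const percent = Math.min(100, Math.round(r.progress * 100));
  return {
    label: `${formatBytes(r.downloaded_bytes)} / ${formatBytes(r.expected_bytes)} (${percent}%)`,
    percent,
  };
}
```

Returning `percent: null` keeps the "no bar" decision in one place, so a caller can render the `Progress` component only when a percent is present.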