English · Русский
Drag a 128 kbps MP3 in — get a full-bandwidth stereo FLAC out, with an A/B player and the spectrogram proof.
Hear it: lossy input (mp3-128k) → restored → lossless reference · Artificial.Music — “Gold” (CC-BY 4.0)
Streaming taught us that music is access, not a thing — and access keeps getting pricier and less reliable. People are going back to their own libraries — and those libraries are full of 128–192 kbps files from the 2000s, hard-capped at ~17 kHz by the encoders of the era. north-star is a complete self-hosted station for those collections: import your folders (tags + cover art preserved), restore lossy files to full-bandwidth FLAC, listen in a built-in web player, and — if you want to go deeper — train the model on your own collection and watch every step live.
Scientific honesty: the model synthesizes a plausible reconstruction of the missing band. It does not recover the original lost data, and the UI says so.
Everything runs locally. Your files never leave your machine.
- `restore` CLI: the daily driver. Any codec in (mp3/aac/opus/…), 44.1 kHz/24-bit stereo FLAC out, source tags and cover art carried over, input folder tree mirrored, already-done files skipped. ~15× realtime per channel on an M4 Max; runs fully offline after the first checkpoint fetch.
- Web UI for the rest: drag-drop restore with a synchronized A/B/C player and before/after spectrograms, music library with covers, live training dashboards, blind A/B harness, experiment history.
- Deficit-gated synthesis (this repo's contribution): the model touches a time-frequency band only where the input falls measurably below the model's own estimate — codec holes get filled, intact bands pass through bit-exact. Calibrated so the residual perceptual cost vs the input is −0.020 MOS, and on mp3-96k the restoration beats the input on ViSQOL.
- Honest benchmarking built in: paired protocols, bootstrap CIs, Wilcoxon tests, per-sample persistence, OOD set, three systems compared. All scripts in the repo.
- Train on your own collection: degradation pipeline, live-metric trainer, by-artist anti-leak splits, fake-lossless detector that keeps transcodes out of your targets.
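A by-artist anti-leak split only has to guarantee one thing: every track by a given artist lands in the same partition, so held-out artists never leak into training. The repo stores its splits in Postgres; the sketch below shows one simple way to get the same guarantee, a deterministic hash of the normalized artist name. The helper name `artist_split` and the 10%/10% validation/test fractions are hypothetical.

```python
import hashlib


def artist_split(artist: str, val_frac: float = 0.1, test_frac: float = 0.1) -> str:
    """Assign train/val/test from the artist name alone (hypothetical helper).

    Every track by the same artist hashes to the same bucket, so no artist
    can appear in more than one split.
    """
    key = artist.strip().casefold().encode("utf-8")
    # First 8 bytes of SHA-256, mapped to a uniform float in [0, 1).
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") / 2**64
    if bucket < test_frac:
        return "test"
    if bucket < test_frac + val_frac:
        return "val"
    return "train"


tracks = [("Artist A", "song 1"), (" artist a", "song 2"), ("Artist B", "song 3")]
for artist, title in tracks:
    print(title, artist_split(artist))  # songs 1 and 2 always share a split
```

Hashing needs no stored state and stays stable as the library grows, but it only approximates the target fractions; a stored assignment can balance them exactly.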
Paired benchmark — held-out 100 tracks × 7 codec profiles = 700 pairs (mp3 96/128/192k, aac 96/128k, opus 64/96k), both systems restoring the byte-identical degraded input, by-artist disjoint split, delay-aligned degradation, paired-bootstrap 95% CIs + Wilcoxon over per-sample results. Ours: 8.8 M-param magnitude U-Net (freq self-attention, adversarial fine-tune, phase reuse, deficit-gated bandwidth gate). Baseline: the real Apollo (ICASSP 2025) checkpoint, 16.5 M params, at its intended 44.1 kHz config.
| metric | lossy input | north-star | Apollo | Δ (ours − Apollo) | 95% CI | win-rate |
|---|---|---|---|---|---|---|
| LSD ↓ | 12.23 | 6.44 | 8.24 | −1.80 | [−1.95, −1.64] | 80% |
| HF-LSD ↓ | 19.45 | 9.43 | 11.42 | −1.99 | [−2.21, −1.76] | 75% |
| MR-STFT ↓ | 0.875 | 0.454 | 0.647 | −0.193 | [−0.208, −0.177] | 84% |
| SI-SDR ↑ | 21.98 | 21.84 | 19.09 | +2.74 | [+2.45, +3.04] | 89% |
| ViSQOL ↑ | 4.652 | 4.633 | 4.362 | +0.271 | [+0.242, +0.300] | 87% |
5/5 metrics, every CI excluding zero (Wilcoxon p ≤ 1e−53), with a ~1.9× smaller, fully self-built model. The mechanism is the point: the gate preserves the codec's near-transparent low band exactly and synthesizes only the missing top (HF-LSD drops by ~10 dB relative to the lossy input), while full-spectrum regeneration pays ~3 dB of SI-SDR for touching what was already fine.
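For reference, LSD and SI-SDR follow standard definitions that can be sketched in NumPy: LSD is the per-frame RMS difference of log power spectra in dB, averaged over frames, and SI-SDR projects the estimate onto the reference before measuring the residual. This is a sketch under assumptions, not `bench_paired.py`: the STFT settings (1024/256 Hann, borrowed from the model's front end), the epsilon, and the HF cutoff for HF-LSD are all assumed.

```python
import numpy as np


def stft_mag(x: np.ndarray, n_fft: int = 1024, hop: int = 256) -> np.ndarray:
    """Magnitude STFT shaped (frames, bins) with a periodic Hann window, no padding."""
    window = np.hanning(n_fft + 1)[:-1]
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=1))


def lsd(ref: np.ndarray, est: np.ndarray, eps: float = 1e-10, min_bin: int = 0) -> float:
    """Log-spectral distance in dB; min_bin > 0 restricts it to a high band (HF-LSD)."""
    p_ref = np.log10(stft_mag(ref)[:, min_bin:] ** 2 + eps)
    p_est = np.log10(stft_mag(est)[:, min_bin:] ** 2 + eps)
    per_frame = np.sqrt(np.mean((10 * (p_ref - p_est)) ** 2, axis=1))
    return float(np.mean(per_frame))


def si_sdr(ref: np.ndarray, est: np.ndarray) -> float:
    """Scale-invariant SDR in dB: rescale ref optimally, then compare to the residual."""
    alpha = np.dot(est, ref) / np.dot(ref, ref)
    target = alpha * ref
    noise = est - target
    return float(10 * np.log10(np.dot(target, target) / np.dot(noise, noise)))
```

For HF-LSD, pass the bin of the chosen cutoff as `min_bin`, e.g. `int(16000 * 1024 / 44100)` for a hypothetical 16 kHz cutoff. Because SI-SDR rescales the reference, a restoration that only changes overall gain is not penalized.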
It holds up away from home turf:
- Out-of-distribution (51 openly-licensed external tracks — netlabel electronica/ambient + Musopen orchestral, 357 pairs): 5/5 again with larger margins — ViSQOL Δ +0.628 [+0.57, +0.69], win-rate 94%; our residual harm vs the input is the same −0.020 MOS as in-distribution.
- MP3-only (Apollo's exact training degradation, 300 pairs): 5/5, all significant.
- vs AudioSR (ICASSP 2024 diffusion SR, 105 pairs): 98–100% win-rate on all five metrics — universal super-resolution is not codec restoration.
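The paired statistics behind the CIs and win-rates can be sketched as follows: resample (track, profile) pairs with replacement, recompute the mean difference each time, and read off the 2.5th and 97.5th percentiles. This is a generic percentile-bootstrap sketch, not `bench_paired.py` itself; the resample count, the seed, and the tie handling (ties count as non-wins) are assumptions.

```python
import numpy as np


def paired_bootstrap(ours, theirs, lower_is_better: bool, n_boot: int = 10_000, seed: int = 0):
    """Mean paired difference (ours - theirs), its 95% percentile CI, and the win-rate."""
    ours = np.asarray(ours, dtype=float)
    theirs = np.asarray(theirs, dtype=float)
    diff = ours - theirs  # one entry per (track, codec profile) pair
    rng = np.random.default_rng(seed)
    # Resample whole pairs, never the two systems independently: that keeps the pairing.
    idx = rng.integers(0, len(diff), size=(n_boot, len(diff)))
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    wins = diff < 0 if lower_is_better else diff > 0  # ties count as non-wins
    return float(diff.mean()), (float(lo), float(hi)), float(wins.mean())
```

A matching signed-rank test on the same per-pair scores is `scipy.stats.wilcoxon(ours, theirs)`, which needs no normality assumption about the differences.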
The deficit gate (production default) nearly eliminates the perceptual cost of synthesis on near-transparent profiles, and on mp3-96k the restoration now beats the input.
Honest caveats: no human MUSHRA test yet (ViSQOL correlates only moderately with human MOS for neural audio and is near-saturated here); the SI-SDR win is by design (low-band preservation); the held-out corpus is one collection. The full story, including two benchmark bugs we found in our own pipeline and what they overturned, is in `docs/ACHIEVEMENTS.md`, the research log `docs/ml-research.md`, and the long-form article in `docs/article/`.
Prereqs: uv, ffmpeg.
```bash
git clone https://github.com/Lercas/north-star && cd north-star/backend
uv sync
export NORTH_STAR_CKPT_URL=<direct URL to a published checkpoint .pt> # see Releases
uv run python scripts/restore.py ~/Music/lossy --out ~/Music/restored -r --skip-lossless
```

The checkpoint is cached under `~/.cache/north-star/` on first run; everything after that is fully offline. Useful flags: `--dry-run`, `--format wav`, `--no-tags`, `--no-adaptive` (fixed additive gate), `--no-gate` (raw model output; not recommended).
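The mirrored-tree and skip-already-done behaviour of the CLI can be sketched in a few lines of `pathlib`. This is a hypothetical illustration, not `scripts/restore.py`: the helper name `plan_restore` and the extension list are assumptions (the real CLI accepts any codec ffmpeg can decode), and `--skip-lossless` is not modelled.

```python
from pathlib import Path

# Hypothetical extension list for illustration only.
LOSSY_EXTS = {".mp3", ".m4a", ".aac", ".opus", ".ogg"}


def plan_restore(src: Path, out: Path, fmt: str = "flac") -> list[tuple[Path, Path]]:
    """Pair each lossy file under src with a mirrored output path under out.

    Inputs whose output already exists are skipped, so re-running is cheap.
    """
    jobs = []
    for path in sorted(src.rglob("*")):
        if not path.is_file() or path.suffix.lower() not in LOSSY_EXTS:
            continue
        # Same relative folder structure, new extension.
        target = (out / path.relative_to(src)).with_suffix(f".{fmt}")
        if target.exists():
            continue  # restored on a previous run
        jobs.append((path, target))
    return jobs
```

Keying "done" on the output path means an interrupted batch resumes where it stopped; a half-written file would need a temp-name-then-rename step to be safe.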
A publicly distributable checkpoint (trained on openly-licensed music) is in preparation — see License & rights. Until then: bring your own checkpoint, or train one on your collection (Path B).
Prereqs: just Docker.
```bash
git clone https://github.com/Lercas/north-star && cd north-star
cp .env.example .env # fill the secrets
export MUSIC_DIR=~/Music # mounted read-only at /music
docker compose --profile app up -d --build # infra + api + worker + web
open http://localhost:8080 # log in with ADMIN_EMAIL/ADMIN_PASSWORD
```

Import your library from the UI ("Import folder" → `/music`) or from the CLI:

```bash
docker compose exec api python scripts/import_library.py /music
```

Inside Docker, torch runs on CPU, which is plenty for restoring and serving the library (on Linux you can pass `--build-arg TORCH_INDEX=https://download.pytorch.org/whl/cu124` for CUDA). For training, prefer the native path below.
Prereqs: Docker (infra), uv, Node 22 + pnpm, ffmpeg.
```bash
cp .env.example .env # fill the secrets
docker compose up -d # infra only: MinIO, Postgres, Redis
cd backend && uv sync && uv run alembic upgrade head
uv run uvicorn app.main:app --port 8000 # API
uv run celery -A app.workers.celery_app:celery_app worker # jobs (separate shell)
cd ../frontend && pnpm install && pnpm dev # http://localhost:5173
```

Log in with ADMIN_EMAIL/ADMIN_PASSWORD from `.env`, then import your music:

```bash
cd backend && uv run python scripts/import_library.py ~/Music # from the repo root
# or the "Import folder" button in the web UI
```

Everything lives on your machine. The app layer runs natively (Apple Silicon/MPS or CUDA, fastest for training) or fully in Docker (`--profile app`, CPU torch); the infra layer is Docker either way, locally or on a home server (`*_HOST` in `.env`).
```mermaid
flowchart LR
  subgraph you["Your machine — nothing leaves it"]
    direction LR
    MUS[/"your music folders"/]
    subgraph app["app layer — native (MPS/CUDA) or Docker profile (CPU)"]
      UI["React UI<br/>library · restore · training · A/B"]
      API["FastAPI<br/>REST + SSE live progress"]
      WK["Celery worker<br/>import · train · infer"]
      CLI["restore CLI<br/>batch, offline-capable"]
    end
    subgraph infra["infra layer — Docker"]
      PG[("Postgres<br/>metadata · by-artist splits")]
      S3[("MinIO<br/>audio · derived · models")]
      RD[("Redis<br/>queue")]
    end
    CACHE[("checkpoint cache<br/>~/.cache/north-star")]
  end
  URL(["NORTH_STAR_CKPT_URL<br/>fresh-clone bootstrap"])
  MUS -- "import: tags + covers" --> API
  UI <--> API
  API <--> PG
  API <--> S3
  API -- enqueue --> RD
  RD --> WK
  WK <--> S3
  WK <--> PG
  CLI <--> CACHE
  S3 -. "first run" .-> CACHE
  URL -. "no-infra path" .-> CACHE
  classDef brand fill:#0e7490,stroke:#38bdf8,color:#e6f6ff
  classDef store fill:#13202b,stroke:#38bdf8,color:#bfe9ff
  classDef ext fill:#1a1426,stroke:#8b5cf6,color:#e9d5ff
  class UI,API,WK,CLI brand
  class PG,S3,RD,CACHE store
  class URL,MUS ext
```
The model itself is a log-magnitude STFT U-Net (base 32, depth 4, frequency self-attention, 8.8 M params) with an adversarial fine-tune (MPD + MRD), phase reuse from the input, and the deficit-gated bandwidth gate at inference:
```mermaid
flowchart LR
  IN["lossy input<br/>44.1 kHz"] --> ST["STFT<br/>1024 / 256, Hann"]
  ST -- "log-magnitude" --> UN["U-Net 8.8M<br/>freq self-attention<br/>adv. fine-tuned"]
  UN -- "predicted magnitude" --> G{"deficit gate<br/>λ = clip((d−10 dB)/10 dB)<br/>per 1 kHz × 5 frames"}
  ST -- "input magnitude" --> G
  G -- "(1−λ)·input + λ·max(input, pred)" --> IS["iSTFT"]
  ST -- "phase — reused" --> IS
  IS --> OUT["restored<br/>44.1 kHz / 24-bit FLAC"]
  classDef brand fill:#0e7490,stroke:#38bdf8,color:#e6f6ff
  classDef io fill:#13202b,stroke:#38bdf8,color:#bfe9ff
  class UN,G brand
  class IN,ST,IS,OUT io
```
Full synthesis in codec holes, bit-exact passthrough where the input is already intact — that single decision is behind both the SI-SDR win and the OOD robustness.
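The gate rule, λ = clip((d − 10 dB)/10 dB) per 1 kHz × 5-frame tile with output (1 − λ)·input + λ·max(input, pred), can be sketched in NumPy. Assumptions: d is taken as the mean dB deficit of the prediction over the input within each tile, magnitudes are linear and shaped (bins, frames), and the names `deficit_gate` and `reuse_phase` are hypothetical. Where λ = 0 the expression reduces to `1.0 * input + 0.0`, which returns the input magnitude bit-for-bit.

```python
import numpy as np


def deficit_gate(inp_mag, pred_mag, sr=44100, n_fft=1024, tile_hz=1000.0,
                 tile_frames=5, floor_db=10.0, span_db=10.0, eps=1e-8):
    """Blend the model output in only where the input is measurably deficient.

    inp_mag, pred_mag: linear magnitudes shaped (freq_bins, frames).
    """
    bins_per_tile = max(1, round(tile_hz / (sr / n_fft)))  # about 23 bins at 44.1 kHz / 1024
    n_bins, n_frames = inp_mag.shape
    # Deficit in dB: how far the input sits below the model's own estimate.
    deficit_db = 20 * np.log10((pred_mag + eps) / (inp_mag + eps))
    lam = np.zeros_like(inp_mag)
    for f0 in range(0, n_bins, bins_per_tile):
        for t0 in range(0, n_frames, tile_frames):
            tile = (slice(f0, f0 + bins_per_tile), slice(t0, t0 + tile_frames))
            d = deficit_db[tile].mean()
            lam[tile] = np.clip((d - floor_db) / span_db, 0.0, 1.0)
    return (1 - lam) * inp_mag + lam * np.maximum(inp_mag, pred_mag)


def reuse_phase(gated_mag, inp_complex):
    """Attach the input STFT's phase to the gated magnitude before the iSTFT."""
    return gated_mag * np.exp(1j * np.angle(inp_complex))
```

Because `np.maximum(inp, pred) >= inp`, the gate can only add energy: it never attenuates anything the codec kept.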
Stack: Python 3.12 · FastAPI · Celery + Redis · PostgreSQL + SQLAlchemy + Alembic · MinIO · PyTorch/torchaudio · React + TS + Vite + Tailwind · WaveSurfer.js · SSE.
Every number above comes from a script in `backend/scripts/`:

| script | what it does |
|---|---|
| `bench_paired.py` | the authoritative paired run: one `degrade()` per sample, both systems on identical input, per-sample JSONL, bootstrap CI + Wilcoxon |
| `bench_adaptive.py` | validates a gate config against a stored paired run (reuses its inputs deterministically) |
| `bench_ood.py` | same paired protocol over any folder of lossless files (the CC set) |
| `bench_audiosr.py` | AudioSR baseline on the same inputs (needs Python 3.11) |
| `fad_eval.py` | Fréchet Audio Distance with VGGish (≤8 kHz: blind to BWE, so it probes the low band) and music-CLAP (full band) |
| `tune_lambda.py` | the λ-threshold grid sweep |
| `fetch_cc_corpus.py` | builds the openly-licensed corpus from archive.org with a license manifest |
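The distance behind FAD is the Fréchet distance between Gaussians fitted to two embedding sets, d² = ‖μ_a − μ_b‖² + tr(Σ_a + Σ_b − 2(Σ_aΣ_b)^{1/2}). A NumPy sketch that takes precomputed embeddings (one row per clip); extracting VGGish or CLAP embeddings is out of scope, and the helper name is hypothetical. It avoids a matrix square root by using the fact that the trace of (Σ_aΣ_b)^{1/2} equals the sum of square roots of the eigenvalues of Σ_aΣ_b.

```python
import numpy as np


def frechet_distance(emb_a: np.ndarray, emb_b: np.ndarray) -> float:
    """Fréchet distance between Gaussians fitted to two embedding sets (rows = clips)."""
    mu_a, mu_b = emb_a.mean(axis=0), emb_b.mean(axis=0)
    cov_a = np.cov(emb_a, rowvar=False)
    cov_b = np.cov(emb_b, rowvar=False)
    # Eigenvalues of a product of two PSD matrices are real and non-negative;
    # clip tiny negative/imaginary rounding noise before the square root.
    eigvals = np.linalg.eigvals(cov_a @ cov_b)
    tr_sqrt = np.sqrt(np.clip(eigvals.real, 0.0, None)).sum()
    diff = mu_a - mu_b
    return float(diff @ diff + np.trace(cov_a) + np.trace(cov_b) - 2.0 * tr_sqrt)
```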
Degradation uses ffmpeg encode/decode round-trips with GCC-PHAT delay alignment. Ignoring the codec's priming delay corrupts SI-SDR by 25–33 dB; we learned that the hard way (see the benchmark corrections).
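The delay-alignment step can be sketched in NumPy. GCC-PHAT whitens the cross-spectrum so the correlation peak stays sharp regardless of the music's spectral tilt, and the peak lag is the codec's delay. This is a sketch under assumptions (mono signals, a search window of ±`max_lag` samples, hypothetical helper names); the repo's implementation may differ in windowing and multi-channel handling.

```python
import numpy as np


def gcc_phat_delay(ref: np.ndarray, degraded: np.ndarray, max_lag: int) -> int:
    """Samples by which `degraded` lags `ref` (negative if it leads), via GCC-PHAT."""
    if max_lag < 1:
        raise ValueError("max_lag must be >= 1")
    n = len(ref) + len(degraded)
    nfft = 1 << (n - 1).bit_length()  # zero-pad so the correlation is linear, not circular
    cross = np.fft.rfft(degraded, nfft) * np.conj(np.fft.rfft(ref, nfft))
    cross /= np.abs(cross) + 1e-12  # PHAT weighting: keep only the phase
    cc = np.fft.irfft(cross, nfft)
    # Lags -max_lag..-1 sit at the end of cc, lags 0..max_lag at the start.
    window = np.concatenate((cc[-max_lag:], cc[: max_lag + 1]))
    return int(np.argmax(window)) - max_lag


def align(ref: np.ndarray, degraded: np.ndarray, max_lag: int = 4096):
    """Shift `degraded` onto `ref` and trim both to a common length."""
    delay = gcc_phat_delay(ref, degraded, max_lag)
    if delay >= 0:
        shifted = degraded[delay:]  # drop the leading priming samples
    else:
        shifted = np.concatenate((np.zeros(-delay), degraded))
    n = min(len(ref), len(shifted))
    return ref[:n], shifted[:n], delay
```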
- Blind MUSHRA listening test: the harness ships in the UI; it needs ears.
- Public checkpoint trained on openly-licensed music (MUSDB18-HQ + CC netlabels), which unlocks a default `NORTH_STAR_CKPT_URL` and a `uvx`-style one-command install.
- HF phase head: the last structural artifact source (the synthesized top band reuses the input's phase).
- Wider OOD coverage and a public leaderboard for lossy-music restoration.
Secrets and the Telethon session live in .env (gitignored), never in the repo. The
optional Telegram importer uses MTProto (userbot), ships disabled, and exists for
channels whose content you have rights to. With APP_ENV ≠ dev, the backend
refuses to boot on default secrets.
- Code: Apache-2.0 (see NOTICE).
- Your music stays yours. The tool processes your files locally; nothing is uploaded anywhere. The intended training workflow is train on your own collection.
- Model weights are not part of this repository. The research checkpoint was trained on a personal collection and is therefore not distributed. A public checkpoint trained on openly-licensed music ships separately with a model card stating exactly what it learned from.
- Apollo (CC-BY-SA-4.0) and AudioSR are research baselines only — cloned/installed on demand, never redistributed, no production dependency.
- Benchmark/demo audio in the docs is CC-BY / public domain (netlabels on
archive.org, Musopen),
with per-file attribution in
external/cc_ood/manifest.json.
```bibtex
@software{north_star_2026,
  title = {north-star: self-hosted neural bandwidth extension for music collections},
  author = {Lercas},
  year = {2026},
  url = {https://github.com/Lercas/north-star},
  license = {Apache-2.0}
}
```

Built with love for music that deserves to outlive its codecs.