GitHub - Lercas/north-star: Self-hosted neural music restoration: synthesize back the highs your MP3s lost

north-star — self-hosted music restoration

Before/after spectrogram slider: the codec-truncated top band is synthesized back

Drag a 128 kbps MP3 in — get a full-bandwidth stereo FLAC out, with an A/B player and the spectrogram proof.

Hear it: lossy input (mp3-128k) → restored → lossless reference · Artificial.Music, “Gold” (CC-BY 4.0)

Streaming taught us that music is access, not a thing — and access keeps getting pricier and less reliable. People are going back to their own libraries — and those libraries are full of 128–192 kbps files from the 2000s, hard-capped at ~17 kHz by the encoders of the era. north-star is a complete self-hosted station for those collections: import your folders (tags + cover art preserved), restore lossy files to full-bandwidth FLAC, listen in a built-in web player, and — if you want to go deeper — train the model on your own collection and watch every step live.

Scientific honesty: the model synthesizes a plausible reconstruction of the missing band. It does not recover the original lost data, and the UI says so.

Everything runs locally. Your files never leave your machine.

Features

restore CLI — the daily driver. Any codec in (mp3/aac/opus/…), 44.1 kHz/24-bit stereo FLAC out, source tags + cover art carried over, input folder tree mirrored, already-done files skipped. ~15× realtime per channel on an M4 Max; runs fully offline after the first checkpoint fetch.
Web UI for the rest: drag-drop restore with synchronized A/B/C player and before/after spectrograms, music library with covers, live training dashboards, blind A/B harness, experiment history.
Deficit-gated synthesis (this repo's contribution): the model touches a time-frequency band only where the input falls measurably below the model's own estimate — codec holes get filled, intact bands pass through bit-exact. Calibrated so the residual perceptual cost vs the input is −0.020 MOS, and on mp3-96k the restoration beats the input on ViSQOL.
Honest benchmarking built in: paired protocols, bootstrap CIs, Wilcoxon tests, per-sample persistence, OOD set, three systems compared. All scripts in the repo.
Train on your own collection: degradation pipeline, live-metric trainer, by-artist anti-leak splits, fake-lossless detector that keeps transcodes out of your targets.
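The by-artist anti-leak split mentioned above can be illustrated with a short sketch. It assumes a deterministic hash of the artist name decides the split, so every track by one artist lands on the same side. The function names, the `artist` field and the 10% validation fraction are hypothetical and are not taken from the repository, which keeps its splits in Postgres.

```python
import hashlib


def split_for_artist(artist: str, val_fraction: float = 0.1) -> str:
    """Deterministically map an artist to 'train' or 'val' (hypothetical helper)."""
    # Normalise so "Artist" and " artist " hash to the same bucket.
    key = artist.strip().casefold().encode("utf-8")
    # First 8 bytes of SHA-256 scaled into [0, 1).
    bucket = int.from_bytes(hashlib.sha256(key).digest()[:8], "big") / 2**64
    return "val" if bucket < val_fraction else "train"


def split_tracks(tracks, val_fraction: float = 0.1):
    """Group track dicts by split; no artist can appear on both sides."""
    splits = {"train": [], "val": []}
    for track in tracks:
        splits[split_for_artist(track["artist"], val_fraction)].append(track)
    return splits
```

Because the assignment depends only on the artist name, re-importing the library or adding new albums never moves an existing artist across the train/validation boundary.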

Results

Paired benchmark — held-out 100 tracks × 7 codec profiles = 700 pairs (mp3 96/128/192k, aac 96/128k, opus 64/96k), both systems restoring the byte-identical degraded input, by-artist disjoint split, delay-aligned degradation, paired-bootstrap 95% CIs + Wilcoxon over per-sample results. Ours: 8.8 M-param magnitude U-Net (freq self-attention, adversarial fine-tune, phase reuse, deficit-gated bandwidth gate). Baseline: the real Apollo (ICASSP 2025) checkpoint, 16.5 M params, at its intended 44.1 kHz config.

Five metrics, input vs north-star vs Apollo, with paired 95% CIs

metric	lossy input	north-star	Apollo	Δ (ours − Apollo)	95% CI	win-rate
LSD ↓	12.23	6.44	8.24	−1.80	[−1.95, −1.64]	80%
HF-LSD ↓	19.45	9.43	11.42	−1.99	[−2.21, −1.76]	75%
MR-STFT ↓	0.875	0.454	0.647	−0.193	[−0.208, −0.177]	84%
SI-SDR ↑	21.98	21.84	19.09	+2.74	[+2.45, +3.04]	89%
ViSQOL ↑	4.652	4.633	4.362	+0.271	[+0.242, +0.300]	87%

5/5 metrics, every CI excluding zero (Wilcoxon p ≤ 1e−53), with a ~1.9× smaller, fully self-built model. The mechanism is the point: the gate preserves the codec's near-transparent low band exactly and only synthesizes the missing top (HF-LSD improves by 10 dB), while full-spectrum regeneration pays ~3 dB of waveform fidelity for touching what was already fine.
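For readers unfamiliar with the statistics in the table, here is a minimal sketch of a paired percentile bootstrap over per-sample metric values. It is an illustration under stated assumptions, not the code in `bench_paired.py`: the function name, the 2000 resamples, the fixed seed and the win-rate definition (fraction of pairs where our system scores better) are all assumptions. The Wilcoxon signed-rank test is left out; `scipy.stats.wilcoxon` on the same paired values would be one way to run it.

```python
import random
import statistics


def paired_bootstrap(ours, baseline, lower_is_better=True, n_boot=2000, seed=0):
    """Mean paired difference, percentile 95% CI, and win-rate for one metric."""
    # Both lists must be in the same sample order: index i is one degraded input.
    diffs = [a - b for a, b in zip(ours, baseline)]
    rng = random.Random(seed)  # fixed seed so the interval is reproducible
    # Resample whole pairs with replacement; keep the mean difference each time.
    means = sorted(
        statistics.fmean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot)
    )
    lo = means[int(0.025 * n_boot)]
    hi = means[int(0.975 * n_boot) - 1]
    wins = sum((d < 0) if lower_is_better else (d > 0) for d in diffs)
    return statistics.fmean(diffs), (lo, hi), wins / len(diffs)
```

Resampling the differences rather than the two systems independently is what makes the interval paired: per-track difficulty cancels out, so the CI reflects only the gap between systems on identical input.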

It holds up away from home turf:

Out-of-distribution (51 openly-licensed external tracks — netlabel electronica/ambient + Musopen orchestral, 357 pairs): 5/5 again with larger margins — ViSQOL Δ +0.628 [+0.57, +0.69], win-rate 94%; our residual harm vs the input is the same −0.020 MOS as in-distribution.
MP3-only (Apollo's exact training degradation, 300 pairs): 5/5, all significant.
vs AudioSR (ICASSP 2024 diffusion SR, 105 pairs): 98–100% win-rate on all five metrics — universal super-resolution is not codec restoration.

Per-codec ViSQOL: the deficit gate (production default) nearly eliminates the perceptual cost of synthesis on near-transparent profiles, and on mp3-96k the restoration now beats the input.

Honest caveats: no human MUSHRA yet (ViSQOL correlates only moderately with human MOS for neural audio and is near-saturated here); the SI-SDR win is by design (low-band preservation); the held-out corpus is one collection. Full story, including two benchmark bugs we found in our own pipeline and what they overturned: docs/ACHIEVEMENTS.md, research log docs/ml-research.md, and the long-form article in docs/article/.

Quick start

Path A — just restore music (no infrastructure)

Prereqs: uv, ffmpeg.

git clone https://github.com/Lercas/north-star && cd north-star/backend
uv sync
export NORTH_STAR_CKPT_URL=<direct URL to a published checkpoint .pt>   # see Releases
uv run python scripts/restore.py ~/Music/lossy --out ~/Music/restored -r --skip-lossless

The checkpoint is cached under ~/.cache/north-star/ on first run; everything after that is fully offline. Useful flags: --dry-run, --format wav, --no-tags, --no-adaptive (fixed additive gate), --no-gate (raw model output; not recommended).
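The cache-then-offline behaviour can be sketched as follows. This is a hypothetical illustration, not the logic in `scripts/restore.py`: the function name, the `.part` temporary suffix and the choice to name the cached file after the URL's basename are assumptions. Only the cache directory and the idea of fetching once come from the text above.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlsplit


def resolve_checkpoint(url, cache_dir="~/.cache/north-star",
                       fetch=urllib.request.urlretrieve):
    """Return a local checkpoint path, downloading only on a cache miss."""
    cache = Path(cache_dir).expanduser()
    # Assumption: the cached file is named after the last URL path segment.
    name = Path(urlsplit(url).path).name or "checkpoint.pt"
    target = cache / name
    if target.exists():
        return target  # cache hit: no network access needed
    cache.mkdir(parents=True, exist_ok=True)
    # Download to a temporary name first so an interrupted fetch
    # never leaves a truncated file that looks like a valid checkpoint.
    partial = target.with_name(name + ".part")
    fetch(url, partial)
    partial.replace(target)
    return target
```

The `fetch` parameter exists only so the download step can be swapped out, for example in tests or when copying from a local mirror.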

A publicly distributable checkpoint (trained on openly-licensed music) is in preparation — see License & rights. Until then: bring your own checkpoint, or train one on your collection (Path B).

Path B — the full station, all in Docker

Prereqs: just Docker.

git clone https://github.com/Lercas/north-star && cd north-star
cp .env.example .env                       # fill the secrets
export MUSIC_DIR=~/Music                   # mounted read-only at /music
docker compose --profile app up -d --build # infra + api + worker + web
open http://localhost:8080                 # log in with ADMIN_EMAIL/ADMIN_PASSWORD

Import your library from the UI ("Import folder" → /music) or from the CLI:

docker compose exec api python scripts/import_library.py /music

Inside Docker, torch runs on CPU — plenty for restoring and serving the library (on Linux you can pass --build-arg TORCH_INDEX=https://download.pytorch.org/whl/cu124 for CUDA). For training, prefer the native path below.

Path C — native app layer (fastest: MPS/CUDA training)

Prereqs: Docker (infra), uv, Node 22 + pnpm, ffmpeg.

cp .env.example .env                  # fill the secrets
docker compose up -d                  # infra only: MinIO, Postgres, Redis
cd backend && uv sync && uv run alembic upgrade head
uv run uvicorn app.main:app --port 8000          # API
uv run celery -A app.workers.celery_app:celery_app worker   # jobs (separate shell)
cd ../frontend && pnpm install && pnpm dev       # http://localhost:5173

Log in with ADMIN_EMAIL/ADMIN_PASSWORD from .env, then import your music:

cd backend && uv run python scripts/import_library.py ~/Music
# or the "Import folder" button in the web UI

Library: your collection with covers, formats and genuine-lossless badges (a spectral detector flags transcodes)
Overview: corpus, experiments, champion model, live training and service health at a glance
Restore: per-band energy report (“what was added where”) and a synchronized lossy/restored A/B player
Training: live loss/LSD curves and original/lossy/restored spectrogram triptychs, streamed over SSE
Experiments: every run with its held-out metric panel; pick any two to compare side by side
Deficit-gated synthesis: λ-map of where the model is allowed to act. Codec holes light up, intact bands stay untouched

Architecture

Everything lives on your machine. The app layer runs natively (Apple Silicon/MPS or CUDA — fastest for training) or fully in Docker (--profile app, CPU torch); the infra layer is Docker either way, locally or on a home server (*_HOST in .env).

flowchart LR
    subgraph you["Your machine — nothing leaves it"]
        direction LR
        MUS[/"your music folders"/]
        subgraph app["app layer — native (MPS/CUDA) or Docker profile (CPU)"]
            UI["React UI<br/>library · restore · training · A/B"]
            API["FastAPI<br/>REST + SSE live progress"]
            WK["Celery worker<br/>import · train · infer"]
            CLI["restore CLI<br/>batch, offline-capable"]
        end
        subgraph infra["infra layer — Docker"]
            PG[("Postgres<br/>metadata · by-artist splits")]
            S3[("MinIO<br/>audio · derived · models")]
            RD[("Redis<br/>queue")]
        end
        CACHE[("checkpoint cache<br/>~/.cache/north-star")]
    end
    URL(["NORTH_STAR_CKPT_URL<br/>fresh-clone bootstrap"])

    MUS -- "import: tags + covers" --> API
    UI <--> API
    API <--> PG
    API <--> S3
    API -- enqueue --> RD
    RD --> WK
    WK <--> S3
    WK <--> PG
    CLI <--> CACHE
    S3 -. "first run" .-> CACHE
    URL -. "no-infra path" .-> CACHE

    classDef brand fill:#0e7490,stroke:#38bdf8,color:#e6f6ff
    classDef store fill:#13202b,stroke:#38bdf8,color:#bfe9ff
    classDef ext fill:#1a1426,stroke:#8b5cf6,color:#e9d5ff
    class UI,API,WK,CLI brand
    class PG,S3,RD,CACHE store
    class URL,MUS ext

The model itself — log-magnitude STFT U-Net (base 32, depth 4, frequency self-attention, 8.8 M params) + adversarial fine-tune (MPD + MRD), phase reuse from the input, and the deficit-gated bandwidth gate at inference:

flowchart LR
    IN["lossy input<br/>44.1 kHz"] --> ST["STFT<br/>1024 / 256, Hann"]
    ST -- "log-magnitude" --> UN["U-Net 8.8M<br/>freq self-attention<br/>adv. fine-tuned"]
    UN -- "predicted magnitude" --> G{"deficit gate<br/>λ = clip((d−10 dB)/10 dB)<br/>per 1 kHz × 5 frames"}
    ST -- "input magnitude" --> G
    G -- "(1−λ)·input + λ·max(input, pred)" --> IS["iSTFT"]
    ST -- "phase — reused" --> IS
    IS --> OUT["restored<br/>44.1 kHz / 24-bit FLAC"]

    classDef brand fill:#0e7490,stroke:#38bdf8,color:#e6f6ff
    classDef io fill:#13202b,stroke:#38bdf8,color:#bfe9ff
    class UN,G brand
    class IN,ST,IS,OUT io

Full synthesis in codec holes, bit-exact passthrough where the input is already intact — that single decision is behind both the SI-SDR win and the OOD robustness.
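The gate arithmetic from the diagram can be written out for a single time-frequency cell. This sketch makes several assumptions that the diagram does not state: the deficit d is taken as the energy ratio of prediction to input in dB, λ is clipped to [0, 1], and one cell is a flat list of magnitudes. The real implementation pools over 1 kHz × 5 frames on STFT tensors.

```python
import math


def deficit_db(input_mags, pred_mags, eps=1e-12):
    """How far the input's energy sits below the prediction's, in dB (assumed definition)."""
    e_in = sum(m * m for m in input_mags) + eps
    e_pred = sum(m * m for m in pred_mags) + eps
    return 10.0 * math.log10(e_pred / e_in)


def gate_lambda(d_db, threshold_db=10.0, ramp_db=10.0):
    """lambda = clip((d - 10 dB) / 10 dB), with the clip range assumed to be [0, 1]."""
    return min(1.0, max(0.0, (d_db - threshold_db) / ramp_db))


def gate_cell(input_mags, pred_mags):
    """Blend one cell: (1 - lambda) * input + lambda * max(input, pred)."""
    lam = gate_lambda(deficit_db(input_mags, pred_mags))
    return [(1.0 - lam) * x + lam * max(x, p) for x, p in zip(input_mags, pred_mags)]


# An intact cell (deficit under 10 dB) passes through unchanged.
intact = gate_cell([1.0, 0.8], [1.1, 0.9])
# A codec hole (deficit well over 20 dB) takes the predicted magnitudes.
hole = gate_cell([1e-4, 1e-4], [0.5, 0.4])
```

Two properties follow directly from the formula: with λ = 0 the output equals the input exactly, and because of the `max`, the gate can only add energy, never attenuate what the codec preserved.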

Stack: Python 3.12 · FastAPI · Celery + Redis · PostgreSQL + SQLAlchemy + Alembic · MinIO · PyTorch/torchaudio · React + TS + Vite + Tailwind · WaveSurfer.js · SSE.

Reproduce the benchmarks

Every number above comes from a script in backend/scripts/:

script	what it does
`bench_paired.py`	the authoritative paired run: one `degrade()` per sample, both systems on identical input, per-sample JSONL, bootstrap CI + Wilcoxon
`bench_adaptive.py`	validates a gate config against a stored paired run (reuses its inputs deterministically)
`bench_ood.py`	same paired protocol over any folder of lossless files (the CC set)
`bench_audiosr.py`	AudioSR baseline on the same inputs (needs Python 3.11)
`fad_eval.py`	Fréchet Audio Distance, VGGish (≤8 kHz — blind to BWE, probes the low band) and music-CLAP (full band)
`tune_lambda.py`	the λ-threshold grid sweep
`fetch_cc_corpus.py`	builds the openly-licensed corpus from archive.org with a license manifest

Degradation is ffmpeg round-trips with GCC-PHAT delay alignment (codec priming delay corrupts SI-SDR by 25–33 dB if ignored — we learned the hard way; see the benchmark corrections).
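To see why the alignment step matters, the sketch below computes SI-SDR on a synthetic signal before and after removing a small delay. To stay dependency-free it estimates the delay with a brute-force time-domain cross-correlation, not GCC-PHAT. The 7-sample delay, the white-noise test signal and all function names are illustrative assumptions; white noise decorrelates after one sample, so it exaggerates the penalty compared with music.

```python
import math
import random


def si_sdr(estimate, reference, eps=1e-12):
    """Scale-invariant SDR in dB between two equal-length sequences."""
    dot = sum(e * r for e, r in zip(estimate, reference))
    ref_energy = sum(r * r for r in reference)
    scale = dot / (ref_energy + eps)          # optimal gain applied to the reference
    target = [scale * r for r in reference]
    noise_energy = sum((e - t) ** 2 for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    return 10.0 * math.log10((target_energy + eps) / (noise_energy + eps))


def estimate_delay(reference, degraded, max_lag):
    """Lag (in samples) that maximises the cross-correlation of the two signals."""
    def xcorr(lag):
        start = max(0, -lag)
        stop = min(len(reference), len(degraded) - lag)
        return sum(degraded[n + lag] * reference[n] for n in range(start, stop))
    return max(range(-max_lag, max_lag + 1), key=xcorr)


def demo(delay=7, n=2000, seed=0):
    """Return (estimated lag, SI-SDR unaligned, SI-SDR aligned) for a toy signal."""
    rng = random.Random(seed)
    reference = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # "Codec output": the reference with leading padding plus a little noise.
    degraded = [0.0] * delay + [r + 0.01 * rng.gauss(0.0, 1.0) for r in reference]
    lag = estimate_delay(reference, degraded, max_lag=32)
    unaligned = si_sdr(degraded[:n], reference)
    aligned = si_sdr(degraded[lag:lag + n], reference)  # assumes lag >= 0
    return lag, unaligned, aligned
```

Calling `demo()` returns the recovered lag together with both scores. The aligned score reflects only the added noise, while the unaligned score collapses because SI-SDR compares samples position by position.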

Roadmap

Blind MUSHRA listening test — the harness ships in the UI; needs ears.
Public checkpoint trained on openly-licensed music (MUSDB18-HQ + CC netlabels) — unlocks a default NORTH_STAR_CKPT_URL and uvx-style one-command install.
HF phase head — the last structural artifact source (synthesized top reuses the input's phase).
Wider OOD coverage and a public leaderboard for lossy-music restoration.

Security

Secrets and the Telethon session live in .env (gitignored), never in the repo. The optional Telegram importer uses MTProto (userbot), ships disabled, and exists for channels whose content you have rights to. With APP_ENV ≠ dev, the backend refuses to boot on default secrets.
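A boot-time guard of the kind described above could look like the sketch below. It is a hypothetical illustration, not the backend's code: `SECRET_KEY` and both placeholder default values are invented for the example, and treating an unset `APP_ENV` as non-dev is an assumption chosen to fail safe.

```python
import os

# Hypothetical placeholder values as they might appear in .env.example.
DEFAULT_SECRETS = {"SECRET_KEY": "change-me", "ADMIN_PASSWORD": "admin"}


def insecure_settings(env):
    """Names of secrets still at their default value; always empty in dev."""
    if env.get("APP_ENV") == "dev":
        return []
    # Assumption: a missing variable counts as "still default".
    return sorted(
        name for name, default in DEFAULT_SECRETS.items()
        if env.get(name, default) == default
    )


def assert_safe_to_boot(env=os.environ):
    """Raise before the app starts if any default secret is in use outside dev."""
    bad = insecure_settings(env)
    if bad:
        raise RuntimeError("refusing to boot with default secrets: " + ", ".join(bad))
```

Running the check once at startup, before any route is registered, means a misconfigured deployment fails loudly instead of serving with known credentials.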

License & rights

Code: Apache-2.0 (see NOTICE).
Your music stays yours. The tool processes your files locally; nothing is uploaded anywhere. The intended training workflow is to train on your own collection.
Model weights are not part of this repository. The research checkpoint was trained on a personal collection and is therefore not distributed. A public checkpoint trained on openly-licensed music ships separately with a model card stating exactly what it learned from.
Apollo (CC-BY-SA-4.0) and AudioSR are research baselines only — cloned/installed on demand, never redistributed, no production dependency.
Benchmark/demo audio in the docs is CC-BY / public domain (netlabels on archive.org, Musopen), with per-file attribution in external/cc_ood/manifest.json.

Citation

@software{north_star_2026,
  title   = {north-star: self-hosted neural bandwidth extension for music collections},
  author  = {Lercas},
  year    = {2026},
  url     = {https://github.com/Lercas/north-star},
  license = {Apache-2.0}
}

Built with love for music that deserves to outlive its codecs.
