north-star — self-hosted music restoration

English · Русский

License: Apache-2.0 Python 3.12 PyTorch Benchmark Model Model card


Before/after spectrogram slider: the codec-truncated top band is synthesized back

Drag a 128 kbps MP3 in — get a full-bandwidth stereo FLAC out, with an A/B player and the spectrogram proof.

Hear it: lossy input (mp3-128k) · restored · lossless reference · Artificial.Music, “Gold” (CC-BY 4.0)


Streaming taught us that music is access, not a thing — and access keeps getting pricier and less reliable. People are going back to their own libraries — and those libraries are full of 128–192 kbps files from the 2000s, hard-capped at ~17 kHz by the encoders of the era. north-star is a complete self-hosted station for those collections: import your folders (tags + cover art preserved), restore lossy files to full-bandwidth FLAC, listen in a built-in web player, and — if you want to go deeper — train the model on your own collection and watch every step live.

Scientific honesty: the model synthesizes a plausible reconstruction of the missing band. It does not recover the original lost data, and the UI says so.

Everything runs locally. Your files never leave your machine.

Features

  • restore CLI — the daily driver. Any codec in (mp3/aac/opus/…), 44.1 kHz/24-bit stereo FLAC out, source tags + cover art carried over, input folder tree mirrored, already-done files skipped. ~15× realtime per channel on an M4 Max; runs fully offline after the first checkpoint fetch.
  • Web UI for the rest: drag-drop restore with synchronized A/B/C player and before/after spectrograms, music library with covers, live training dashboards, blind A/B harness, experiment history.
  • Deficit-gated synthesis (this repo's contribution): the model touches a time-frequency band only where the input falls measurably below the model's own estimate — codec holes get filled, intact bands pass through bit-exact. Calibrated so the residual perceptual cost vs the input is −0.020 MOS, and on mp3-96k the restoration beats the input on ViSQOL.
  • Honest benchmarking built in: paired protocols, bootstrap CIs, Wilcoxon tests, per-sample persistence, OOD set, three systems compared. All scripts in the repo.
  • Train on your own collection: degradation pipeline, live-metric trainer, by-artist anti-leak splits, fake-lossless detector that keeps transcodes out of your targets.
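
The by-artist anti-leak split can be pictured with a minimal sketch. It is not the repository's implementation: the function name `split_by_artist`, the hashing scheme, and the `held_out_fraction` parameter are assumptions made for illustration. The idea is that every track by one artist lands on the same side of the split, so the model is never evaluated on an artist it trained on.

```python
import hashlib


def split_by_artist(tracks, held_out_fraction=0.1):
    """Split (artist, path) pairs so that no artist appears on both sides.

    Hypothetical helper: hashing the normalized artist name gives a stable
    bucket in [0, 1), so the assignment survives re-imports and reordering.
    """
    train, held_out = [], []
    for artist, path in tracks:
        key = artist.strip().casefold().encode("utf-8")
        digest = hashlib.sha256(key).digest()
        bucket = int.from_bytes(digest[:8], "big") / 2**64
        side = held_out if bucket < held_out_fraction else train
        side.append((artist, path))
    return train, held_out
```

Because the bucket depends only on the artist name, adding new albums by an existing artist never moves that artist across the split.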

Results

Paired benchmark — held-out 100 tracks × 7 codec profiles = 700 pairs (mp3 96/128/192k, aac 96/128k, opus 64/96k), both systems restoring the byte-identical degraded input, by-artist disjoint split, delay-aligned degradation, paired-bootstrap 95% CIs + Wilcoxon over per-sample results. Ours: 8.8 M-param magnitude U-Net (freq self-attention, adversarial fine-tune, phase reuse, deficit-gated bandwidth gate). Baseline: the real Apollo (ICASSP 2025) checkpoint, 16.5 M params, at its intended 44.1 kHz config.

Five metrics, input vs north-star vs Apollo, with paired 95% CIs

| metric | lossy input | north-star | Apollo | Δ (ours − Apollo) | 95% CI | win-rate |
|---|---|---|---|---|---|---|
| LSD ↓ | 12.23 | 6.44 | 8.24 | −1.80 | [−1.95, −1.64] | 80% |
| HF-LSD ↓ | 19.45 | 9.43 | 11.42 | −1.99 | [−2.21, −1.76] | 75% |
| MR-STFT ↓ | 0.875 | 0.454 | 0.647 | −0.193 | [−0.208, −0.177] | 84% |
| SI-SDR ↑ | 21.98 | 21.84 | 19.09 | +2.74 | [+2.45, +3.04] | 89% |
| ViSQOL ↑ | 4.652 | 4.633 | 4.362 | +0.271 | [+0.242, +0.300] | 87% |

5/5 metrics, every CI excluding zero (Wilcoxon p ≤ 1e−53), with a ~1.9× smaller, fully self-built model. The mechanism is the point: the gate preserves the codec's near-transparent low band exactly and only synthesizes the missing top (HF-LSD improves by 10 dB), while full-spectrum regeneration pays ~3 dB of waveform fidelity for touching what was already fine.
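
For readers unfamiliar with paired statistics, here is a minimal sketch of a paired-bootstrap confidence interval and a win-rate over per-sample results. It is an illustration under assumptions, not the code in bench_paired.py: the function name `paired_bootstrap`, the percentile method, and the resample count are choices made here.

```python
import numpy as np


def paired_bootstrap(ours, baseline, n_boot=10_000, seed=0, lower_is_better=True):
    """Mean paired difference, 95% percentile CI, and win-rate.

    `ours[i]` and `baseline[i]` must score the same degraded input, so the
    per-sample difference cancels out how hard each track is.
    """
    diff = np.asarray(ours, dtype=float) - np.asarray(baseline, dtype=float)
    rng = np.random.default_rng(seed)
    # Resample the *pairs* with replacement and recompute the mean each time.
    idx = rng.integers(0, len(diff), size=(n_boot, len(diff)))
    boot_means = diff[idx].mean(axis=1)
    lo, hi = np.percentile(boot_means, [2.5, 97.5])
    wins = diff < 0 if lower_is_better else diff > 0
    return {
        "delta": float(diff.mean()),
        "ci": (float(lo), float(hi)),
        "win_rate": float(wins.mean()),
    }
```

A CI that excludes zero means the sign of the difference is stable under resampling. A Wilcoxon signed-rank test on the same differences (for example `scipy.stats.wilcoxon(ours, baseline)`) gives the accompanying p-value.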

It holds up away from home turf:

  • Out-of-distribution (51 openly-licensed external tracks — netlabel electronica/ambient + Musopen orchestral, 357 pairs): 5/5 again with larger margins — ViSQOL Δ +0.628 [+0.57, +0.69], win-rate 94%; our residual harm vs the input is the same −0.020 MOS as in-distribution.
  • MP3-only (Apollo's exact training degradation, 300 pairs): 5/5, all significant.
  • vs AudioSR (ICASSP 2024 diffusion SR, 105 pairs): 98–100% win-rate on all five metrics — universal super-resolution is not codec restoration.

Per-codec ViSQOL: the deficit gate (production default) nearly eliminates the perceptual cost of synthesis on near-transparent profiles, and on mp3-96k the restoration now beats the input.

Honest caveats: no human MUSHRA yet (ViSQOL correlates only moderately with human MOS for neural audio and is near-saturated here); the SI-SDR win is by design (low-band preservation); the held-out corpus is one collection. Full story, including two benchmark bugs we found in our own pipeline and what they overturned: docs/ACHIEVEMENTS.md, research log docs/ml-research.md, and the long-form article in docs/article/.

Quick start

Path A — just restore music (no infrastructure)

Prereqs: uv, ffmpeg.

git clone https://github.com/Lercas/north-star && cd north-star/backend
uv sync
export NORTH_STAR_CKPT_URL=<direct URL to a published checkpoint .pt>   # see Releases
uv run python scripts/restore.py ~/Music/lossy --out ~/Music/restored -r --skip-lossless

The checkpoint is cached under ~/.cache/north-star/ on first run; everything after that is fully offline. Useful flags: --dry-run, --format wav, --no-tags, --no-adaptive (fixed additive gate), --no-gate (raw model output; not recommended).
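
The fetch-once, then-offline behavior can be sketched as below. This is an assumption-laden illustration, not the code in scripts/restore.py: the function name `resolve_checkpoint` and the rule that the cached file keeps the URL's last path segment are both hypothetical.

```python
import os
import urllib.parse
import urllib.request
from pathlib import Path


def resolve_checkpoint(url=None, cache_dir=None):
    """Return a local checkpoint path, downloading it only on the first call."""
    url = url or os.environ.get("NORTH_STAR_CKPT_URL")
    if not url:
        raise RuntimeError("set NORTH_STAR_CKPT_URL or pass url=")
    cache_dir = Path(cache_dir) if cache_dir else Path.home() / ".cache" / "north-star"
    # Assumption: the cached file is named after the last path segment of the URL.
    name = Path(urllib.parse.urlparse(url).path).name or "checkpoint.pt"
    target = cache_dir / name
    if target.exists():
        return target  # cache hit: no network access needed
    cache_dir.mkdir(parents=True, exist_ok=True)
    partial = target.with_name(target.name + ".part")
    urllib.request.urlretrieve(url, partial)  # download under a temporary name
    partial.replace(target)  # rename so an interrupted download never looks complete
    return target
```

Downloading to a `.part` file and renaming it afterwards keeps a half-finished download from being mistaken for a valid checkpoint on the next run.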

A publicly distributable checkpoint (trained on openly-licensed music) is in preparation — see License & rights. Until then: bring your own checkpoint, or train one on your collection (Path B).

Path B — the full station, all in Docker

Prereqs: just Docker.

git clone https://github.com/Lercas/north-star && cd north-star
cp .env.example .env                       # fill the secrets
export MUSIC_DIR=~/Music                   # mounted read-only at /music
docker compose --profile app up -d --build # infra + api + worker + web
open http://localhost:8080                 # log in with ADMIN_EMAIL/ADMIN_PASSWORD

Import your library from the UI ("Import folder" → /music) or from the CLI:

docker compose exec api python scripts/import_library.py /music

Inside Docker, torch runs on CPU — plenty for restoring and serving the library (on Linux you can pass --build-arg TORCH_INDEX=https://download.pytorch.org/whl/cu124 for CUDA). For training, prefer the native path below.

Path C — native app layer (fastest: MPS/CUDA training)

Prereqs: Docker (infra), uv, Node 22 + pnpm, ffmpeg.

cp .env.example .env                  # fill the secrets
docker compose up -d                  # infra only: MinIO, Postgres, Redis
cd backend && uv sync && uv run alembic upgrade head
uv run uvicorn app.main:app --port 8000          # API
uv run celery -A app.workers.celery_app:celery_app worker   # jobs (separate shell)
cd ../frontend && pnpm install && pnpm dev       # http://localhost:5173

Log in with ADMIN_EMAIL/ADMIN_PASSWORD from .env, then import your music:

cd backend && uv run python scripts/import_library.py ~/Music
# or the "Import folder" button in the web UI

Library: your collection with covers, formats and genuine-lossless badges (a spectral detector flags transcodes)

Overview: corpus, experiments, champion model, live training and service health at a glance

Restore: per-band energy report (“what was added where”) and a synchronized lossy/restored A/B player

Training: live loss/LSD curves and original/lossy/restored spectrogram triptychs, streamed over SSE

Experiments: every run with its held-out metric panel; pick any two to compare side by side

Deficit-gated synthesis: λ-map of where the model is allowed to act; codec holes light up, intact bands stay untouched

Architecture

Everything lives on your machine. The app layer runs natively (Apple Silicon/MPS or CUDA — fastest for training) or fully in Docker (--profile app, CPU torch); the infra layer is Docker either way, locally or on a home server (*_HOST in .env).

flowchart LR
    subgraph you["Your machine — nothing leaves it"]
        direction LR
        MUS[/"your music folders"/]
        subgraph app["app layer — native (MPS/CUDA) or Docker profile (CPU)"]
            UI["React UI<br/>library · restore · training · A/B"]
            API["FastAPI<br/>REST + SSE live progress"]
            WK["Celery worker<br/>import · train · infer"]
            CLI["restore CLI<br/>batch, offline-capable"]
        end
        subgraph infra["infra layer — Docker"]
            PG[("Postgres<br/>metadata · by-artist splits")]
            S3[("MinIO<br/>audio · derived · models")]
            RD[("Redis<br/>queue")]
        end
        CACHE[("checkpoint cache<br/>~/.cache/north-star")]
    end
    URL(["NORTH_STAR_CKPT_URL<br/>fresh-clone bootstrap"])

    MUS -- "import: tags + covers" --> API
    UI <--> API
    API <--> PG
    API <--> S3
    API -- enqueue --> RD
    RD --> WK
    WK <--> S3
    WK <--> PG
    CLI <--> CACHE
    S3 -. "first run" .-> CACHE
    URL -. "no-infra path" .-> CACHE

    classDef brand fill:#0e7490,stroke:#38bdf8,color:#e6f6ff
    classDef store fill:#13202b,stroke:#38bdf8,color:#bfe9ff
    classDef ext fill:#1a1426,stroke:#8b5cf6,color:#e9d5ff
    class UI,API,WK,CLI brand
    class PG,S3,RD,CACHE store
    class URL,MUS ext

The model itself — log-magnitude STFT U-Net (base 32, depth 4, frequency self-attention, 8.8 M params) + adversarial fine-tune (MPD + MRD), phase reuse from the input, and the deficit-gated bandwidth gate at inference:

flowchart LR
    IN["lossy input<br/>44.1 kHz"] --> ST["STFT<br/>1024 / 256, Hann"]
    ST -- "log-magnitude" --> UN["U-Net 8.8M<br/>freq self-attention<br/>adv. fine-tuned"]
    UN -- "predicted magnitude" --> G{"deficit gate<br/>λ = clip((d−10 dB)/10 dB)<br/>per 1 kHz × 5 frames"}
    ST -- "input magnitude" --> G
    G -- "(1−λ)·input + λ·max(input, pred)" --> IS["iSTFT"]
    ST -- "phase — reused" --> IS
    IS --> OUT["restored<br/>44.1 kHz / 24-bit FLAC"]

    classDef brand fill:#0e7490,stroke:#38bdf8,color:#e6f6ff
    classDef io fill:#13202b,stroke:#38bdf8,color:#bfe9ff
    class UN,G brand
    class IN,ST,IS,OUT io

Full synthesis in codec holes, bit-exact passthrough where the input is already intact — that single decision is behind both the SI-SDR win and the OOD robustness.
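
The gate formula in the diagram can be written out as a short NumPy sketch. It is a per-bin simplification under stated assumptions, not the repository's code: the names `deficit_gate` and `reuse_phase` are hypothetical, the pooling over 1 kHz × 5-frame tiles is left out, and the clip range [0, 1] is assumed because the diagram only says `clip`.

```python
import numpy as np


def deficit_gate(input_mag, pred_mag, threshold_db=10.0, width_db=10.0, eps=1e-8):
    """Blend input and predicted STFT magnitudes with a deficit-driven weight.

    Returns (gated_magnitude, lam). Per-bin version: the tile pooling shown
    in the diagram is omitted here.
    """
    input_mag = np.asarray(input_mag, dtype=float)
    pred_mag = np.asarray(pred_mag, dtype=float)
    # Deficit d in dB: how far the input sits below the model's estimate.
    d = 20.0 * np.log10((pred_mag + eps) / (input_mag + eps))
    # lam = clip((d - 10 dB) / 10 dB); clipping to [0, 1] is an assumption.
    lam = np.clip((d - threshold_db) / width_db, 0.0, 1.0)
    # (1 - lam) * input + lam * max(input, pred)
    gated = (1.0 - lam) * input_mag + lam * np.maximum(input_mag, pred_mag)
    return gated, lam


def reuse_phase(gated_mag, input_spec):
    """Attach the input's phase to the gated magnitude before the iSTFT."""
    return gated_mag * np.exp(1j * np.angle(input_spec))
```

With these defaults, a bin whose input is less than 10 dB below the prediction gets λ = 0 and keeps its input magnitude unchanged. A bin 20 dB or more below gets λ = 1 and takes the predicted magnitude.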

Stack: Python 3.12 · FastAPI · Celery + Redis · PostgreSQL + SQLAlchemy + Alembic · MinIO · PyTorch/torchaudio · React + TS + Vite + Tailwind · WaveSurfer.js · SSE.

Reproduce the benchmarks

Every number above comes from a script in backend/scripts/:

| script | what it does |
|---|---|
| bench_paired.py | the authoritative paired run: one degrade() per sample, both systems on identical input, per-sample JSONL, bootstrap CI + Wilcoxon |
| bench_adaptive.py | validates a gate config against a stored paired run (reuses its inputs deterministically) |
| bench_ood.py | same paired protocol over any folder of lossless files (the CC set) |
| bench_audiosr.py | AudioSR baseline on the same inputs (needs Python 3.11) |
| fad_eval.py | Fréchet Audio Distance, VGGish (≤8 kHz, so blind to BWE; probes the low band) and music-CLAP (full band) |
| tune_lambda.py | the λ-threshold grid sweep |
| fetch_cc_corpus.py | builds the openly-licensed corpus from archive.org with a license manifest |

Degradation is ffmpeg round-trips with GCC-PHAT delay alignment (codec priming delay corrupts SI-SDR by 25–33 dB if ignored — we learned the hard way; see the benchmark corrections).
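
As an illustration of the alignment step, here is a minimal GCC-PHAT delay estimator in NumPy. It is a sketch of the general technique, not the repository's degradation code; the function name `gcc_phat_delay` and its parameters are assumptions.

```python
import numpy as np


def gcc_phat_delay(ref, sig, max_delay=None):
    """Estimate by how many samples `sig` lags `ref` (positive: sig is late)."""
    n = len(ref) + len(sig)
    n_fft = 1 << (n - 1).bit_length()  # zero-pad to avoid circular wrap-around
    # Cross-power spectrum, then PHAT weighting: keep the phase, drop the magnitude.
    cross = np.fft.rfft(sig, n_fft) * np.conj(np.fft.rfft(ref, n_fft))
    cross /= np.abs(cross) + 1e-12
    cc = np.fft.irfft(cross, n_fft)
    max_delay = min(max_delay or n_fft // 2, n_fft // 2)
    # Reorder so the array covers lags -max_delay .. +max_delay.
    cc = np.concatenate((cc[-max_delay:], cc[: max_delay + 1]))
    return int(np.argmax(cc)) - max_delay
```

Once the delay is known, trimming it (for a positive delay, `sig[delay:]`) lines the decoded audio up with the reference before any sample-level metric such as SI-SDR is computed.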

Roadmap

  • Blind MUSHRA listening test — the harness ships in the UI; needs ears.
  • Public checkpoint trained on openly-licensed music (MUSDB18-HQ + CC netlabels) — unlocks a default NORTH_STAR_CKPT_URL and uvx-style one-command install.
  • HF phase head — the last structural artifact source (synthesized top reuses the input's phase).
  • Wider OOD coverage and a public leaderboard for lossy-music restoration.

Security

Secrets and the Telethon session live in .env (gitignored), never in the repo. The optional Telegram importer uses MTProto (userbot), ships disabled, and exists for channels whose content you have rights to. With APP_ENV set to anything other than dev, the backend refuses to boot on default secrets.

License & rights

  • Code: Apache-2.0 (see NOTICE).
  • Your music stays yours. The tool processes your files locally; nothing is uploaded anywhere. The intended training workflow is to train on your own collection.
  • Model weights are not part of this repository. The research checkpoint was trained on a personal collection and is therefore not distributed. A public checkpoint trained on openly-licensed music ships separately with a model card stating exactly what it learned from.
  • Apollo (CC-BY-SA-4.0) and AudioSR are research baselines only — cloned/installed on demand, never redistributed, no production dependency.
  • Benchmark/demo audio in the docs is CC-BY / public domain (netlabels on archive.org, Musopen), with per-file attribution in external/cc_ood/manifest.json.

Citation

@software{north_star_2026,
  title   = {north-star: self-hosted neural bandwidth extension for music collections},
  author  = {Lercas},
  year    = {2026},
  url     = {https://github.com/Lercas/north-star},
  license = {Apache-2.0}
}

Built with love for music that deserves to outlive its codecs.
