Open-source AI music attribution & provenance layer
Embed verifiable, auditable provenance metadata into AI-generated audio — binding each output to its training sources, consent records, and generation parameters. Designed for EU AI Act compliance and the emerging music attribution ecosystem of 2026.
As of April 2026, over 20,000 AI-generated tracks are uploaded to Deezer daily, yet almost no open tooling exists to record how those tracks were created — which training data was used, who consented, and whether the output meets legal thresholds. The EU AI Act's full obligations for high-risk AI systems take effect in August 2026, mandating dataset disclosure and provenance tracking.
musicprov is the missing infrastructure layer. It answers "where did this music come from?" in a machine-readable, tamper-evident, embeddable format — with zero required dependencies.
| Capability | Details |
|---|---|
| Provenance records | Structured attribution linking generated audio to training sources, influence weights, and consent status |
| Audio binding | PCM-level SHA-256 hash binds the record to audio content — survives metadata re-tagging |
| EU AI Act compliance | Automatic scoring with human-readable failure reasons (90% consented-weight threshold) |
| WAV embedding | Zero-dependency RIFF chunk (MPRO) embedding and extraction |
| MP3 / FLAC / OGG | Via optional mutagen backend |
| Chain verification | Cryptographic hash linking for remix/continuation chains |
| Tamper detection | hmac.compare_digest constant-time comparison prevents timing oracle attacks |
| CLI tool | musicprov tag, show, verify, init |
| 58 tests, 0 hard dependencies | Core library uses Python stdlib only |
- Language: Python 3.10+
- Core deps: None (stdlib only: `hashlib`, `hmac`, `json`, `struct`, `wave`, `uuid`)
- Optional: `mutagen==1.47.0` for MP3/FLAC/OGG tag I/O
- Dev tools: `pytest`, `ruff`, `mypy`, `pip-audit`
- CI: GitHub Actions (test matrix × 3 Python versions, ruff lint, gitleaks, pip-audit)
# WAV support only — zero external dependencies
pip install musicprov
# WAV + MP3 + FLAC + OGG
pip install "musicprov[audio]"
# Development (includes linting and type-checking tools)
pip install "musicprov[dev]"

git clone https://github.com/musicprov/musicprov.git
cd musicprov
pip install -e ".[dev]"

from musicprov import (
    ProvenanceRecord,
    SourceContribution,
    embed_provenance,
    extract_provenance,
    verify_chain,
)
from musicprov.provenance import ConsentStatus, GenerationParameters, LicenseType
# 1. Build a provenance record describing the AI generation
record = ProvenanceRecord(
    generator_tool="ACE-Step",
    generator_version="1.5",
    generation_params=GenerationParameters(
        model_name="ACE-Step",
        model_version="1.5",
        prompt="upbeat jazz piano with brushed drums, bossa nova feel",
        seed=42,
        duration_seconds=120.0,
    ),
    sources=[
        SourceContribution(
            source_id="USUM71402404",
            source_name="Take Five",
            artist="Dave Brubeck Quartet",
            influence_weight=0.6,
            consent_status=ConsentStatus.LICENSED,
            license_type=LicenseType.CUSTOM,
            license_url="https://your-registry.io/license/USUM71402404",
            dataset_name="JazzCorpus-2024",
            dataset_version="2.1.0",
            contribution_type="rhythmic",
            confidence=0.88,
        ),
        SourceContribution(
            source_id="CC_PIANO_001",
            source_name="Open Piano Recordings Vol. 3",
            artist="Community Contributors",
            influence_weight=0.4,
            consent_status=ConsentStatus.PUBLIC_DOMAIN,
            license_type=LicenseType.CC0,
        ),
    ],
)
# 2. Embed provenance into the generated audio file
embed_provenance("generated_track.wav", record)
# 3. Extract anywhere — no registry required
extracted = extract_provenance("generated_track.wav")
print(extracted.to_json())
# 4. Verify integrity and EU AI Act compliance
result = verify_chain(extracted, audio_path="generated_track.wav")
print(result)
# Verification: PASS record=…
# ✓ Hash integrity
# ✓ EU AI Act compliance
# ✓ Audio binding

# Generate a starter provenance template
musicprov init --tool "ACE-Step" --version "1.5" --output provenance.json
# Embed provenance into a WAV file (modifies in place)
musicprov tag output.wav --record provenance.json
# Embed and write to a new file
musicprov tag output.wav --record provenance.json --output tagged.wav
# Display embedded provenance
musicprov show tagged.wav
# Output raw JSON
musicprov show tagged.wav --json
# Verify chain integrity and EU AI Act compliance
musicprov verify tagged.wav
# exits 0 on PASS, 1 on FAIL

`ProvenanceRecord` fields:

| Field | Type | Description |
|---|---|---|
| `record_id` | `str` | UUID, auto-generated |
| `generation_id` | `str` | UUID identifying this generation run |
| `generator_tool` | `str` | Name of the AI tool (e.g. "ACE-Step") |
| `generator_version` | `str` | Tool version |
| `sources` | `list[SourceContribution]` | Training data sources (max 256) |
| `generation_params` | `GenerationParameters` | Prompt, seed, checkpoint… |
| `output_sha256` | `str` | `sha256:<hex>` of raw PCM audio data |
| `output_fingerprint` | `str` | Spectral fingerprint (robust to re-encode) |
| `parent_record_id` | `str \| None` | For remixes / continuations |
| `parent_hash` | `str \| None` | `sha256:<hex>` of parent record |
| `eu_ai_act_compliant` | `bool \| None` | Set by `assess_compliance()` |
| `compliance_notes` | `str \| None` | Human-readable failure reason |
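
The `parent_record_id` and `parent_hash` fields are what link a remix or continuation back to the record it derives from. As a rough sketch of how such a link could be checked, the example below hashes the parent over a canonical JSON form and compares the result in constant time. The canonical form (sorted keys, compact separators) is an assumption, since the serialisation musicprov actually hashes is not specified here, and `record_hash` and `link_is_valid` are hypothetical helper names operating on plain dicts rather than library objects.

```python
import hashlib
import hmac
import json


def record_hash(record: dict) -> str:
    """Hash a record dict over an assumed canonical JSON form."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def link_is_valid(child: dict, parent: dict) -> bool:
    """Check that the child references this parent by both ID and hash."""
    if child.get("parent_record_id") != parent.get("record_id"):
        return False
    expected = record_hash(parent)
    claimed = child.get("parent_hash") or ""
    # Compare as bytes in constant time to avoid leaking how many
    # leading characters of the hash matched.
    return hmac.compare_digest(expected.encode("utf-8"), claimed.encode("utf-8"))


# Illustrative records (hypothetical values).
parent = {"record_id": "a1", "generator_tool": "ACE-Step", "parent_record_id": None}
child = {
    "record_id": "b2",
    "parent_record_id": "a1",
    "parent_hash": record_hash(parent),
}
print(link_is_valid(child, parent))
```

Any change to the parent after the child was created alters its hash, so the link check fails even though the IDs still match.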

`SourceContribution` fields:

| Field | Type | Description |
|---|---|---|
| `source_id` | `str` | ISRC, catalogue ID, or UUID |
| `source_name` | `str` | Track or work title (max 4 096 chars) |
| `artist` | `str` | Rights-holder name |
| `influence_weight` | `float` | 0.0–1.0; weights across all sources must sum to ≤ 1.0 |
| `consent_status` | `ConsentStatus` | `licensed` / `opt_in` / `public_domain` / `unknown` / `refused` |
| `license_type` | `LicenseType` | `CC0-1.0` / `CC-BY-4.0` / `custom` / … |
| `dataset_name` | `str \| None` | Training dataset that included this source |
| `contribution_type` | `str` | One of: `style`, `melodic`, `harmonic`, `rhythmic`, `timbral`, `lyrical`, `structural`, `other` |
| `confidence` | `float` | Model's self-reported attribution confidence, 0.0–1.0 |
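
The bounds on `influence_weight` (each value in 0.0–1.0, sum at most 1.0, no more than 256 sources) are easy to check before building a record. The function below is an illustrative sketch only: `validate_weights` is a hypothetical name, and the small floating-point tolerance is an assumption rather than documented library behaviour.

```python
import math

MAX_SOURCES = 256  # source-count limit quoted in the record field description


def validate_weights(weights: list, tol: float = 1e-9) -> None:
    """Raise ValueError if the influence weights break the documented bounds."""
    if len(weights) > MAX_SOURCES:
        raise ValueError(f"too many sources: {len(weights)} > {MAX_SOURCES}")
    for w in weights:
        if not math.isfinite(w) or not 0.0 <= w <= 1.0:
            raise ValueError(f"influence_weight out of range: {w!r}")
    # fsum avoids accumulating rounding error across many small weights.
    total = math.fsum(weights)
    if total > 1.0 + tol:
        raise ValueError(f"influence weights sum to {total}, which exceeds 1.0")


validate_weights([0.6, 0.4])  # the quickstart weights pass silently
```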
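A record passes compliance assessment if the combined `influence_weight` of consented sources (`licensed`, `opt_in`, or `public_domain`) is at least 0.90. Because source weights may sum to less than 1.0, any weight left unattributed counts against the threshold in the same way as non-consented weight:
compliant = sum(s.influence_weight for s in sources
                if s.consent_status in {LICENSED, OPT_IN, PUBLIC_DOMAIN}) >= 0.90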
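
A self-contained sketch of this rule, including a human-readable failure reason, might look like the following. It uses plain `(weight, status)` tuples instead of the library's classes, and `assess` is a hypothetical name, not the library's `assess_compliance()`.

```python
from typing import List, Optional, Tuple

# Consent states that count toward the threshold, per the rule above.
CONSENTED = {"licensed", "opt_in", "public_domain"}


def assess(
    sources: List[Tuple[float, str]], threshold: float = 0.90
) -> Tuple[bool, Optional[str]]:
    """Return (compliant, reason). Each source is (influence_weight, consent_status)."""
    consented = sum(weight for weight, status in sources if status in CONSENTED)
    if consented >= threshold:
        return True, None
    reason = f"consented weight {consented:.2f} is below threshold {threshold:.2f}"
    return False, reason


print(assess([(0.6, "licensed"), (0.4, "public_domain")]))
print(assess([(0.6, "licensed"), (0.4, "unknown")]))
```

The first call passes because all weight is consented; the second fails because only 0.60 of the weight comes from consented sources.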
The threshold is configurable:
verify_chain(record, compliance_threshold=0.95)  # stricter

`output_sha256` is a SHA-256 hash of the raw PCM audio data only, not the full file bytes. This design means:
- ✅ Embedding or updating provenance metadata does not invalidate the binding
- ✅ Verifying the hash does not require re-parsing the container format
- ❌ Re-encoding to a different bitrate does invalidate the binding — this is intentional, as the audio content changed
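
The PCM-only binding can be approximated with the stdlib `wave` module, which reads frames from the `data` chunk and ignores other chunks. This is a minimal sketch under the assumption that the hash covers only the sample bytes (not the format parameters); `pcm_sha256` is a hypothetical helper name, not part of the documented API.

```python
import hashlib
import wave


def pcm_sha256(path: str, chunk_frames: int = 65536) -> str:
    """Hash only the PCM frames of a WAV file, ignoring all other chunks."""
    digest = hashlib.sha256()
    with wave.open(path, "rb") as wav:
        while True:
            # Read in bounded blocks so large files are never fully in memory.
            frames = wav.readframes(chunk_frames)
            if not frames:
                break
            digest.update(frames)
    return "sha256:" + digest.hexdigest()
```

Since only the sample bytes feed the digest, appending or rewriting a metadata chunk leaves the result unchanged, while any change to the samples themselves produces a different hash.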
Provenance JSON is stored in a custom MPRO RIFF chunk appended to the WAV file:
RIFF <size> WAVE
fmt <size> <format data>
data <size> <PCM samples>
MPRO <size> <UTF-8 JSON> ← musicprov chunk
The MPRO chunk is ignored by audio players and preserved by most DAW export workflows. It is human-readable and extractable by any RIFF-aware tool.
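
To make the layout concrete, here is a rough sketch of appending and reading such a chunk using only `struct` and `json`. It works on in-memory bytes, assumes a well-formed RIFF/WAVE input, and does not replace an existing `MPRO` chunk or enforce the size caps; `append_chunk` and `read_chunk` are hypothetical names rather than the library's own functions.

```python
import json
import struct
from typing import Optional

CHUNK_ID = b"MPRO"


def _check_header(wav: bytes) -> None:
    if wav[:4] != b"RIFF" or wav[8:12] != b"WAVE":
        raise ValueError("not a RIFF/WAVE file")


def append_chunk(wav: bytes, payload: dict) -> bytes:
    """Return a copy of the WAV bytes with an MPRO chunk appended."""
    _check_header(wav)
    body = json.dumps(payload).encode("utf-8")
    pad = b"\x00" if len(body) % 2 else b""  # RIFF chunks are word-aligned
    chunk = CHUNK_ID + struct.pack("<I", len(body)) + body + pad
    out = bytearray(wav + chunk)
    # The RIFF size field counts everything after the 8-byte RIFF header.
    struct.pack_into("<I", out, 4, len(out) - 8)
    return bytes(out)


def read_chunk(wav: bytes) -> Optional[dict]:
    """Walk the top-level chunks and decode the first MPRO chunk, if any."""
    _check_header(wav)
    pos = 12  # skip "RIFF", size, "WAVE"
    while pos + 8 <= len(wav):
        chunk_id, size = struct.unpack_from("<4sI", wav, pos)
        start = pos + 8
        if chunk_id == CHUNK_ID:
            return json.loads(wav[start:start + size].decode("utf-8"))
        pos = start + size + (size % 2)  # skip the pad byte on odd sizes
    return None
```

Appending after the `data` chunk leaves the audio samples at their original offsets, which is why players that stop at `data` are unaffected.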
- All string fields are capped at 4 096 characters; JSON payloads at 32 MB; WAV `MPRO` chunks at 1 MB. These limits prevent resource exhaustion when processing untrusted audio files.
- Unknown enum values in deserialised JSON fall back to `UNKNOWN` rather than raising exceptions.
- The `metadata` dict on `SourceContribution` only accepts JSON primitive types (`str`, `int`, `float`, `bool`, `None`).
- All hash equality checks in `verify_chain()` use `hmac.compare_digest()` for constant-time comparison, preventing timing oracle attacks.
- The core library makes no network requests. Registry integration is optional and requires explicit configuration via environment variables.
- Deserialisation uses only `json.loads()` with no `object_hook`. No `pickle`, `yaml.load`, or dynamic code execution anywhere in the library.
- All file paths are resolved via `Path.resolve()` before use. `is_file()` is checked before reads and writes.
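
Several of the input-handling rules above (size caps, enum fallback, primitive-only metadata) can be illustrated together. The following is a simplified sketch, not the library's parser: `Consent` is a hypothetical stand-in for `ConsentStatus`, and `parse_consent` and `load_source` are illustrative names.

```python
import json
from enum import Enum

MAX_JSON_BYTES = 32 * 1024 * 1024  # JSON payload cap quoted above
MAX_STR_CHARS = 4096               # per-field string cap quoted above


class Consent(Enum):  # hypothetical stand-in for the library's ConsentStatus
    LICENSED = "licensed"
    OPT_IN = "opt_in"
    PUBLIC_DOMAIN = "public_domain"
    REFUSED = "refused"
    UNKNOWN = "unknown"


def parse_consent(value: object) -> Consent:
    """Map a raw JSON value to an enum member, falling back to UNKNOWN."""
    try:
        return Consent(value)
    except (ValueError, TypeError):
        return Consent.UNKNOWN


def load_source(raw: bytes) -> dict:
    """Parse one source entry from untrusted bytes with size and type checks."""
    if len(raw) > MAX_JSON_BYTES:
        raise ValueError("JSON payload exceeds size cap")
    data = json.loads(raw)  # plain json.loads, no object_hook
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name = data.get("source_name", "")
    if not isinstance(name, str) or len(name) > MAX_STR_CHARS:
        raise ValueError("source_name is not a string or exceeds the length cap")
    metadata = data.get("metadata", {})
    if not isinstance(metadata, dict):
        raise ValueError("metadata must be a JSON object")
    for key, val in metadata.items():
        # Nested lists and dicts are rejected; only primitives pass.
        if not (val is None or isinstance(val, (str, int, float, bool))):
            raise ValueError(f"metadata[{key!r}] is not a JSON primitive")
    return {
        "source_name": name,
        "consent_status": parse_consent(data.get("consent_status")),
        "metadata": metadata,
    }
```

Falling back to `UNKNOWN` keeps records readable across versions, and because `unknown` is not a consented status, an unrecognised value cannot raise a record's compliance score.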
For a full threat model, see docs/security.md.
- Truthfulness is not verified. A generator tool that lies about its training sources produces a cryptographically valid but factually incorrect record. External dataset audit is required to verify source attributions.
- No signing in v0.1. ed25519 record signatures are on the roadmap. Until then, records are integrity-hashed but not cryptographically attested by the generator.
- MP3/FLAC/OGG security depends on `mutagen`'s parser hardening. If processing untrusted files in those formats, use the latest `mutagen` release.
Copy .env.example to .env and fill in values. The library operates in fully offline mode if no variables are set.
cp .env.example .env

| Variable | Default | Description |
|---|---|---|
| `MUSICPROV_REGISTRY_URL` | (unset) | URL of an external provenance registry |
| `MUSICPROV_REGISTRY_API_KEY` | (unset) | API key for registry authentication |
| `MUSICPROV_SIGNING_KEY_PATH` | (unset) | Path to ed25519 private key (future) |
| `MUSICPROV_COMPLIANCE_THRESHOLD` | `0.90` | Override the default compliance threshold |
| `MUSICPROV_LOG_LEVEL` | `WARNING` | Log verbosity (do not use DEBUG in production) |
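
As an illustration of how these variables and their defaults might be read in application code, consider the sketch below. `Settings` and `load_settings` are hypothetical names, and the 0.0–1.0 range check on the threshold is an assumption; the library's own configuration loader may behave differently.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    registry_url: Optional[str]
    compliance_threshold: float
    log_level: str

    @property
    def offline(self) -> bool:
        # No registry URL means the process never needs the network.
        return self.registry_url is None


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read the MUSICPROV_* variables, applying the documented defaults."""
    threshold = float(env.get("MUSICPROV_COMPLIANCE_THRESHOLD", "0.90"))
    if not 0.0 <= threshold <= 1.0:
        raise ValueError("MUSICPROV_COMPLIANCE_THRESHOLD must be within 0.0-1.0")
    return Settings(
        registry_url=env.get("MUSICPROV_REGISTRY_URL") or None,
        compliance_threshold=threshold,
        log_level=env.get("MUSICPROV_LOG_LEVEL", "WARNING").upper(),
    )


# With nothing set, the defaults from the table apply.
print(load_settings({}))
```

Passing the environment as a mapping keeps the function easy to test without mutating `os.environ`.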
# Run the full test suite
pytest tests/ -v
# With coverage
pytest tests/ --cov=musicprov --cov=cli --cov-report=term-missing
# Run only security tests
pytest tests/ -v -k "Security or Bounds or Constant"

Current status: 58 tests, 0 failures.
musicprov/
├── musicprov/ # Core library (zero external deps)
│ ├── __init__.py # Public API surface
│ ├── provenance.py # Data model: ProvenanceRecord, SourceContribution
│ ├── embed.py # WAV/MP3/FLAC/OGG read-write
│ ├── verify.py # Chain verification
│ └── fingerprint.py # Spectral audio fingerprinting
├── cli/
│ └── __main__.py # musicprov CLI entry point
├── tests/
│ ├── fixtures/ # Small test audio fixtures (< 100 KB each)
│ └── test_musicprov.py
├── examples/
│ └── quickstart.py # End-to-end demonstration
├── dashboard/
│ └── index.html # Browser-based provenance inspector
├── docs/
│ └── security.md # Threat model and security architecture
├── .github/
│ └── workflows/ci.yml # CI: test, lint, secret-scan, dep-audit
├── .env.example # Environment variable reference
├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE # Apache 2.0
├── SECURITY.md # Vulnerability disclosure policy
└── pyproject.toml # Build config, pinned dev deps, ruff/mypy config
- ed25519 signing support for attested records
- IPFS-pinned registry backend
- Chromaprint integration for robust cross-encode fingerprinting
- ACE-Step / YuE / SongGeneration integration adaptors
- REST API server for centralised provenance registry
- MusicXML / MIDI provenance for symbolic music outputs
See CONTRIBUTING.md. All contributions are welcome — especially integrations with AI music generation tools and new audio format backends.
Please read our Code of Conduct before participating.
To report a vulnerability, see SECURITY.md. Do not open a public issue for security bugs.
Apache 2.0 — see LICENSE.