musicprov

Open-source AI music attribution & provenance layer

Embed verifiable, auditable provenance metadata into AI-generated audio — binding each output to its training sources, consent records, and generation parameters. Designed for EU AI Act compliance and the emerging music attribution ecosystem of 2026.

Why this exists

As of April 2026, over 20,000 AI-generated tracks are uploaded to Deezer daily, yet almost no open tooling exists to record how those tracks were created — which training data was used, who consented, and whether the output meets legal thresholds. The EU AI Act's full obligations for high-risk AI systems take effect in August 2026, mandating dataset disclosure and provenance tracking.

musicprov is the missing infrastructure layer. It answers "where did this music come from?" in a machine-readable, tamper-evident, embeddable format — with zero required dependencies.

Features

| Capability | Details |
| --- | --- |
| Provenance records | Structured attribution linking generated audio to training sources, influence weights, and consent status |
| Audio binding | PCM-level SHA-256 hash binds the record to audio content; survives metadata re-tagging |
| EU AI Act compliance | Automatic scoring with human-readable failure reasons (90% consented-weight threshold) |
| WAV embedding | Zero-dependency RIFF chunk (`MPRO`) embedding and extraction |
| MP3 / FLAC / OGG | Via optional `mutagen` backend |
| Chain verification | Cryptographic hash linking for remix/continuation chains |
| Tamper detection | `hmac.compare_digest` constant-time comparison prevents timing oracle attacks |
| CLI tool | `musicprov tag`, `show`, `verify`, `init` |
| 58 tests, 0 hard dependencies | Core library uses Python stdlib only |

Tech stack

Language: Python 3.10+
Core deps: None (stdlib only: hashlib, hmac, json, struct, wave, uuid)
Optional: mutagen==1.47.0 for MP3/FLAC/OGG tag I/O
Dev tools: pytest, ruff, mypy, pip-audit
CI: GitHub Actions (test matrix × 3 Python versions, ruff lint, gitleaks, pip-audit)

Installation

# WAV support only — zero external dependencies
pip install musicprov

# WAV + MP3 + FLAC + OGG
pip install "musicprov[audio]"

# Development (includes linting and type-checking tools)
pip install "musicprov[dev]"

From source

git clone https://github.com/musicprov/musicprov.git
cd musicprov
pip install -e ".[dev]"

Quick start

from musicprov import (
    ProvenanceRecord,
    SourceContribution,
    embed_provenance,
    extract_provenance,
    verify_chain,
)
from musicprov.provenance import ConsentStatus, GenerationParameters, LicenseType

# 1. Build a provenance record describing the AI generation
record = ProvenanceRecord(
    generator_tool="ACE-Step",
    generator_version="1.5",
    generation_params=GenerationParameters(
        model_name="ACE-Step",
        model_version="1.5",
        prompt="upbeat jazz piano with brushed drums, bossa nova feel",
        seed=42,
        duration_seconds=120.0,
    ),
    sources=[
        SourceContribution(
            source_id="USUM71402404",
            source_name="Take Five",
            artist="Dave Brubeck Quartet",
            influence_weight=0.6,
            consent_status=ConsentStatus.LICENSED,
            license_type=LicenseType.CUSTOM,
            license_url="https://your-registry.io/license/USUM71402404",
            dataset_name="JazzCorpus-2024",
            dataset_version="2.1.0",
            contribution_type="rhythmic",
            confidence=0.88,
        ),
        SourceContribution(
            source_id="CC_PIANO_001",
            source_name="Open Piano Recordings Vol. 3",
            artist="Community Contributors",
            influence_weight=0.4,
            consent_status=ConsentStatus.PUBLIC_DOMAIN,
            license_type=LicenseType.CC0,
        ),
    ],
)

# 2. Embed provenance into the generated audio file
embed_provenance("generated_track.wav", record)

# 3. Extract anywhere — no registry required
extracted = extract_provenance("generated_track.wav")
print(extracted.to_json())

# 4. Verify integrity and EU AI Act compliance
result = verify_chain(extracted, audio_path="generated_track.wav")
print(result)
# Verification: PASS  record=…
#   ✓ Hash integrity
#   ✓ EU AI Act compliance
#   ✓ Audio binding

CLI

# Generate a starter provenance template
musicprov init --tool "ACE-Step" --version "1.5" --output provenance.json

# Embed provenance into a WAV file (modifies in place)
musicprov tag output.wav --record provenance.json

# Embed and write to a new file
musicprov tag output.wav --record provenance.json --output tagged.wav

# Display embedded provenance
musicprov show tagged.wav

# Output raw JSON
musicprov show tagged.wav --json

# Verify chain integrity and EU AI Act compliance
musicprov verify tagged.wav
# exits 0 on PASS, 1 on FAIL

Data model

`ProvenanceRecord`

| Field | Type | Description |
| --- | --- | --- |
| `record_id` | `str` | UUID, auto-generated |
| `generation_id` | `str` | UUID identifying this generation run |
| `generator_tool` | `str` | Name of the AI tool (e.g. `"ACE-Step"`) |
| `generator_version` | `str` | Tool version |
| `sources` | `list[SourceContribution]` | Training data sources (max 256) |
| `generation_params` | `GenerationParameters` | Prompt, seed, checkpoint… |
| `output_sha256` | `str` | `sha256:<hex>` of raw PCM audio data |
| `output_fingerprint` | `str` | Spectral fingerprint (robust to re-encode) |
| `parent_record_id` | `str \| None` | For remixes / continuations |
| `parent_hash` | `str \| None` | `sha256:<hex>` of parent record |
| `eu_ai_act_compliant` | `bool \| None` | Set by `assess_compliance()` |
| `compliance_notes` | `str \| None` | Human-readable failure reason |
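
The `parent_record_id` / `parent_hash` pair can be illustrated with plain dictionaries. This is a minimal sketch, assuming the record hash is a SHA-256 over canonical JSON (sorted keys, compact separators); the serialisation musicprov actually hashes is not specified here, and `record_hash` and `links_to_parent` are hypothetical helper names, not part of the library's API.

```python
import hashlib
import hmac
import json


def record_hash(record: dict) -> str:
    """Return sha256:<hex> over canonical JSON (an assumed canonical form)."""
    canonical = json.dumps(record, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def links_to_parent(child: dict, parent: dict) -> bool:
    """Check that the child names the parent and carries the parent's current hash."""
    if child.get("parent_record_id") != parent.get("record_id"):
        return False
    claimed = str(child.get("parent_hash") or "")
    # Constant-time comparison, mirroring the approach described under "Hash comparison"
    return hmac.compare_digest(
        claimed.encode("utf-8"), record_hash(parent).encode("utf-8")
    )


# Hypothetical records reduced to the fields relevant to chaining
parent = {"record_id": "rec-1", "generator_tool": "ACE-Step", "parent_hash": None}
child = {
    "record_id": "rec-2",
    "parent_record_id": parent["record_id"],
    "parent_hash": record_hash(parent),
}
tampered = dict(parent, generator_tool="something-else")

print(links_to_parent(child, parent))    # True
print(links_to_parent(child, tampered))  # False
```

Because the child stores a hash of the whole parent record, any later edit to the parent breaks the link and is detectable from the child alone.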

`SourceContribution`

| Field | Type | Description |
| --- | --- | --- |
| `source_id` | `str` | ISRC, catalogue ID, or UUID |
| `source_name` | `str` | Track or work title (max 4 096 chars) |
| `artist` | `str` | Rights-holder name |
| `influence_weight` | `float` | 0.0–1.0; all sources must sum to ≤ 1.0 |
| `consent_status` | `ConsentStatus` | `licensed` / `opt_in` / `public_domain` / `unknown` / `refused` |
| `license_type` | `LicenseType` | `CC0-1.0` / `CC-BY-4.0` / `custom` / … |
| `dataset_name` | `str \| None` | Training dataset that included this source |
| `contribution_type` | `str` | One of: `style`, `melodic`, `harmonic`, `rhythmic`, `timbral`, `lyrical`, `structural`, `other` |
| `confidence` | `float` | Model's self-reported attribution confidence, 0.0–1.0 |

EU AI Act compliance logic

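A record passes compliance assessment if the combined influence_weight of consented sources (licensed, opt-in, or public domain) is at least 0.90. Weights are absolute rather than normalised, so influence left unattributed counts against the record in the same way as non-consented influence:

consented = {ConsentStatus.LICENSED, ConsentStatus.OPT_IN, ConsentStatus.PUBLIC_DOMAIN}
compliant = sum(s.influence_weight for s in sources
                if s.consent_status in consented) >= 0.90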

The threshold is configurable:

verify_chain(record, compliance_threshold=0.95)  # stricter
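
The expression above can be exercised without the library installed. The sketch below uses stand-in types (`Consent`, `Source`) and a hypothetical `is_compliant` helper; it mirrors the documented rule but is not musicprov's own implementation.

```python
from dataclasses import dataclass
from enum import Enum


class Consent(str, Enum):  # stand-in for musicprov's ConsentStatus
    LICENSED = "licensed"
    OPT_IN = "opt_in"
    PUBLIC_DOMAIN = "public_domain"
    UNKNOWN = "unknown"
    REFUSED = "refused"


CONSENTED = {Consent.LICENSED, Consent.OPT_IN, Consent.PUBLIC_DOMAIN}


@dataclass
class Source:  # stand-in for SourceContribution, reduced to two fields
    influence_weight: float
    consent_status: Consent


def consented_weight(sources: list[Source]) -> float:
    """Sum the influence of sources whose consent status counts as consented."""
    return sum(s.influence_weight for s in sources if s.consent_status in CONSENTED)


def is_compliant(sources: list[Source], threshold: float = 0.90) -> bool:
    """Apply the consented-weight rule with a configurable threshold."""
    return consented_weight(sources) >= threshold


quick_start = [Source(0.6, Consent.LICENSED), Source(0.4, Consent.PUBLIC_DOMAIN)]
mixed = [Source(0.85, Consent.LICENSED), Source(0.15, Consent.UNKNOWN)]
partial = [Source(0.5, Consent.OPT_IN)]  # half of the influence is unattributed

print(is_compliant(quick_start))                  # True
print(is_compliant(mixed))                        # False
print(is_compliant(partial))                      # False
print(is_compliant(quick_start, threshold=0.95))  # True
```

The `partial` case shows why attribution coverage matters: a record whose only source is fully consented still fails if that source accounts for less than the threshold.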

Audio binding

output_sha256 is a SHA-256 hash of the raw PCM audio data only, not the full file bytes. This design means:

✅ Embedding or updating provenance metadata does not invalidate the binding
✅ The hash covers only the PCM samples, so it is independent of how the container lays out its metadata
❌ Re-encoding to a different bitrate does invalidate the binding — this is intentional, as the audio content changed
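
The properties above can be checked with the standard library alone. This is a minimal sketch for WAV only, assuming the PCM hash is taken over the frames returned by the `wave` module; `pcm_sha256`, `make_wav`, and `add_trailing_chunk` are hypothetical helpers, and the library's own hashing routine may differ in detail.

```python
import hashlib
import io
import struct
import wave


def make_wav(frames: bytes) -> bytes:
    """Build a mono 16-bit 8 kHz WAV in memory from raw PCM frames."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)
        w.setframerate(8000)
        w.writeframes(frames)
    return buf.getvalue()


def pcm_sha256(wav_bytes: bytes) -> str:
    """Hash only the PCM frames, ignoring headers and any other chunks."""
    with wave.open(io.BytesIO(wav_bytes), "rb") as r:
        frames = r.readframes(r.getnframes())
    return "sha256:" + hashlib.sha256(frames).hexdigest()


def add_trailing_chunk(wav_bytes: bytes, chunk_id: bytes, payload: bytes) -> bytes:
    """Append a RIFF chunk after the audio data and patch the RIFF size field."""
    pad = b"\x00" if len(payload) % 2 else b""  # RIFF chunks are word-aligned
    body = wav_bytes[8:] + chunk_id + struct.pack("<I", len(payload)) + payload + pad
    return b"RIFF" + struct.pack("<I", len(body)) + body


original = make_wav(b"\x01\x00" * 100)
retagged = add_trailing_chunk(original, b"MPRO", b'{"note":"metadata only"}')
altered = make_wav(b"\x02\x00" * 100)

print(original == retagged)                          # False: file bytes changed
print(pcm_sha256(original) == pcm_sha256(retagged))  # True: audio unchanged
print(pcm_sha256(original) == pcm_sha256(altered))   # False: samples changed
```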

WAV storage format

Provenance JSON is stored in a custom MPRO RIFF chunk appended to the WAV file:

RIFF <size> WAVE
  fmt  <size> <format data>
  data <size> <PCM samples>
  MPRO <size> <UTF-8 JSON>   ← musicprov chunk

The MPRO chunk is ignored by audio players and preserved by most DAW export workflows. It is human-readable and extractable by any RIFF-aware tool.

Security considerations

Input validation

All string fields are capped at 4 096 characters; JSON payloads at 32 MB; WAV MPRO chunks at 1 MB. These prevent resource exhaustion when processing untrusted audio files.
Unknown enum values in deserialised JSON fall back to UNKNOWN rather than raising exceptions.
The metadata dict on SourceContribution only accepts JSON primitive types (str, int, float, bool, None).
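
One way such checks could be written is sketched below. The enum is a stand-in for the library's `ConsentStatus`, and `check_string` and `check_metadata` are hypothetical names; the sketch assumes the fallback is implemented through `Enum._missing_`, which the source does not confirm.

```python
from enum import Enum

MAX_STR_LEN = 4096  # cap taken from the limits listed above
_PRIMITIVES = (str, int, float, bool, type(None))


class ConsentStatus(str, Enum):  # stand-in, not the library's class
    LICENSED = "licensed"
    OPT_IN = "opt_in"
    PUBLIC_DOMAIN = "public_domain"
    UNKNOWN = "unknown"
    REFUSED = "refused"

    @classmethod
    def _missing_(cls, value):
        # Unrecognised values degrade to UNKNOWN instead of raising ValueError
        return cls.UNKNOWN


def check_string(name: str, value: str) -> str:
    """Reject strings longer than the cap before they are stored."""
    if len(value) > MAX_STR_LEN:
        raise ValueError(f"{name} exceeds {MAX_STR_LEN} characters")
    return value


def check_metadata(metadata: dict) -> dict:
    """Allow only string keys mapped to JSON primitive values (no nesting)."""
    for key, value in metadata.items():
        if not isinstance(key, str) or not isinstance(value, _PRIMITIVES):
            raise TypeError(f"metadata[{key!r}] must be a JSON primitive")
    return metadata


print(ConsentStatus("licensed") is ConsentStatus.LICENSED)        # True
print(ConsentStatus("not-a-status") is ConsentStatus.UNKNOWN)     # True
print(check_metadata({"bpm": 120, "key": "C", "note": None}))
```

Falling back to `UNKNOWN` is the conservative choice here: an unrecognised status never counts toward the consented weight.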

Hash comparison

All hash equality checks in verify_chain() use hmac.compare_digest() for constant-time comparison, preventing timing oracle attacks.
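
As a small illustration of the pattern, assuming hashes are carried as `sha256:<hex>` strings as in the data model (`digests_match` is a hypothetical helper name):

```python
import hashlib
import hmac


def digests_match(expected: str, actual: str) -> bool:
    """Compare two digest strings without short-circuiting on the first mismatch."""
    return hmac.compare_digest(expected.encode("utf-8"), actual.encode("utf-8"))


stored = "sha256:" + hashlib.sha256(b"pcm-bytes").hexdigest()
recomputed = "sha256:" + hashlib.sha256(b"pcm-bytes").hexdigest()
other = "sha256:" + hashlib.sha256(b"different").hexdigest()

print(digests_match(stored, recomputed))  # True
print(digests_match(stored, other))       # False
```

A plain `==` would return the same booleans; the difference is that its running time can depend on where the first differing character sits.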

No network I/O

The core library makes no network requests. Registry integration is optional and requires explicit configuration via environment variables.

No `eval`, `pickle`, or `yaml.load`

Deserialisation uses only json.loads() with no object_hook. No pickle, yaml.load, or dynamic code execution anywhere in the library.

Path safety

All file paths are resolved via Path.resolve() before use. is_file() is checked before reads and writes.
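
A minimal sketch of that check for the read path, with `resolve_existing_file` as a hypothetical helper name rather than a function the library exposes:

```python
import tempfile
from pathlib import Path


def resolve_existing_file(raw: str) -> Path:
    """Normalise a user-supplied path and insist that it names a regular file."""
    path = Path(raw).resolve()  # collapses "..", follows symlinks, makes absolute
    if not path.is_file():
        raise FileNotFoundError(f"not a regular file: {path}")
    return path


with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "sub").mkdir()
    wav = Path(tmp) / "track.wav"
    wav.write_bytes(b"RIFF")
    indirect = str(Path(tmp) / "sub" / ".." / "track.wav")
    print(resolve_existing_file(indirect) == wav.resolve())  # True
```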

For a full threat model, see docs/security.md.

Known limitations

Truthfulness is not verified. A generator tool that lies about its training sources produces a cryptographically valid but factually incorrect record. External dataset audit is required to verify source attributions.
No signing in v0.1. Ed25519 record signatures are on the roadmap. Until then, records are integrity-hashed but not cryptographically attested by the generator: anyone able to rewrite a record can also recompute its hashes.
MP3/FLAC/OGG security depends on mutagen's parser hardening. If processing untrusted files in those formats, use the latest mutagen release.

Environment variables

Copy .env.example to .env and fill in values. The library operates in fully offline mode if no variables are set.

cp .env.example .env

| Variable | Default | Description |
| --- | --- | --- |
| `MUSICPROV_REGISTRY_URL` | (unset) | URL of an external provenance registry |
| `MUSICPROV_REGISTRY_API_KEY` | (unset) | API key for registry authentication |
| `MUSICPROV_SIGNING_KEY_PATH` | (unset) | Path to ed25519 private key (future) |
| `MUSICPROV_COMPLIANCE_THRESHOLD` | `0.90` | Override the default compliance threshold |
| `MUSICPROV_LOG_LEVEL` | `WARNING` | Log verbosity (do not use `DEBUG` in production) |
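
For illustration, a threshold override could be read along these lines. The variable name and default come from the table above; the helper `threshold_from_env` and its handling of empty or out-of-range values are assumptions, since the library's behaviour for invalid input is not documented here.

```python
import os
from collections.abc import Mapping

DEFAULT_THRESHOLD = 0.90


def threshold_from_env(environ: Mapping[str, str] = os.environ) -> float:
    """Return the compliance threshold, falling back to the default when unset."""
    raw = environ.get("MUSICPROV_COMPLIANCE_THRESHOLD")
    if raw is None or raw.strip() == "":
        return DEFAULT_THRESHOLD
    value = float(raw)  # raises ValueError for non-numeric input
    if not 0.0 <= value <= 1.0:
        raise ValueError("MUSICPROV_COMPLIANCE_THRESHOLD must be between 0.0 and 1.0")
    return value


print(threshold_from_env({}))                                          # 0.9
print(threshold_from_env({"MUSICPROV_COMPLIANCE_THRESHOLD": "0.95"}))  # 0.95
```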

Testing

# Run the full test suite
pytest tests/ -v

# With coverage
pytest tests/ --cov=musicprov --cov=cli --cov-report=term-missing

# Run only security tests
pytest tests/ -v -k "Security or Bounds or Constant"

Current status: 58 tests, 0 failures.

Project structure

musicprov/
├── musicprov/           # Core library (zero external deps)
│   ├── __init__.py      # Public API surface
│   ├── provenance.py    # Data model: ProvenanceRecord, SourceContribution
│   ├── embed.py         # WAV/MP3/FLAC/OGG read-write
│   ├── verify.py        # Chain verification
│   └── fingerprint.py   # Spectral audio fingerprinting
├── cli/
│   └── __main__.py      # musicprov CLI entry point
├── tests/
│   ├── fixtures/        # Small test audio fixtures (< 100 KB each)
│   └── test_musicprov.py
├── examples/
│   └── quickstart.py    # End-to-end demonstration
├── dashboard/
│   └── index.html       # Browser-based provenance inspector
├── docs/
│   └── security.md      # Threat model and security architecture
├── .github/
│   └── workflows/ci.yml # CI: test, lint, secret-scan, dep-audit
├── .env.example         # Environment variable reference
├── .gitignore
├── CODE_OF_CONDUCT.md
├── CONTRIBUTING.md
├── LICENSE              # Apache 2.0
├── SECURITY.md          # Vulnerability disclosure policy
└── pyproject.toml       # Build config, pinned dev deps, ruff/mypy config

Roadmap

ed25519 signing support for attested records
IPFS-pinned registry backend
Chromaprint integration for robust cross-encode fingerprinting
ACE-Step / YuE / SongGeneration integration adaptors
REST API server for centralised provenance registry
MusicXML / MIDI provenance for symbolic music outputs

Contributing

See CONTRIBUTING.md. All contributions are welcome — especially integrations with AI music generation tools and new audio format backends.

Please read our Code of Conduct before participating.

Security

To report a vulnerability, see SECURITY.md. Do not open a public issue for security bugs.

Licence

Apache 2.0 — see LICENSE.
