Skip to content

fix(validator): refuse to overwrite corrupt PATS_FILE in save_pat / remove_pat#829

Open
ebios-star wants to merge 1 commit intoentrius:testfrom
ebios-star:fix/pat-storage-refuse-overwrite-corrupt
Open

fix(validator): refuse to overwrite corrupt PATS_FILE in save_pat / remove_pat#829
ebios-star wants to merge 1 commit intoentrius:testfrom
ebios-star:fix/pat-storage-refuse-overwrite-corrupt

Conversation

@ebios-star
Copy link
Copy Markdown
Contributor

Summary

Closes #781.

`_read_file` in gittensor/validator/pat_storage.py was swallowing `(json.JSONDecodeError, OSError)` and returning `[]` on every failure, so a partial-write or on-disk corruption of `data/miner_pats.json` produced this sequence:

  1. `_read_file()` silently returns `[]`.
  2. `save_pat()` appends one fresh entry to the empty list and writes it.
  3. The atomic `os.replace` overwrites the corrupt file with a single-entry list — every previously stored PAT is gone forever.

The read path keeps its graceful-degradation contract (`load_all_pats` and `get_pat_by_uid` still return empty/None on corrupt files so the validator scoring round can continue), but the write path now refuses to overwrite an unreadable file:

  • `_read_file(*, raise_on_corrupt=False)` is the new signature.
  • Read callers keep the default and additionally log a warning so the corruption is at least visible in operator logs.
  • `save_pat` and `remove_pat` pass `raise_on_corrupt=True`; the `json.JSONDecodeError` / `OSError` now propagates up to the bittensor axon handler, which surfaces it to the broadcasting miner as a retriable error rather than acknowledging a silent data-loss write.

Related Issues

Closes #781.

Type of Change

  • Bug fix
  • New feature
  • Refactor
  • Documentation
  • Other

Testing

  • New `TestCorruptFileRefusesToOverwrite` covers four invariants:
    • `save_pat` raises `json.JSONDecodeError` when file is corrupt.
    • The on-disk corrupt bytes are preserved verbatim after a refused save.
    • `remove_pat` raises `json.JSONDecodeError` when file is corrupt.
    • `load_all_pats` still returns `[]` for a corrupt file (read-path graceful-degradation contract preserved).
  • The pre-existing `test_load_handles_corrupt_file` is unchanged and still passes.
  • `uv run --extra dev pytest tests/` — 447 tests pass (443 baseline + 4 new).

Checklist

  • Code follows project style guidelines
  • Self-review completed
  • Changes are documented (if applicable)

…emove_pat

Previously `_read_file` swallowed `(json.JSONDecodeError, OSError)` and
returned `[]` on any failure, so a partial-write or on-disk corruption
of `data/miner_pats.json` produced this sequence:

1. `_read_file()` silently returns `[]`.
2. `save_pat()` appends one fresh entry to the empty list and writes it.
3. The atomic `os.replace` overwrites the corrupt file with a single-entry
   list — every previously stored PAT is gone forever.

The read path keeps its graceful-degradation contract (load_all_pats /
get_pat_by_uid still return empty/None on corrupt files so the validator
scoring round can continue), but the write path now refuses to overwrite
an unreadable file:

- `_read_file(*, raise_on_corrupt=False)` is the new signature.
- Read callers keep the default and additionally log a warning so the
  corruption is at least visible in operator logs.
- `save_pat` and `remove_pat` pass `raise_on_corrupt=True`; the
  json.JSONDecodeError / OSError now propagates up to the bittensor
  axon handler, which surfaces it to the broadcasting miner as a
  retriable error rather than acknowledging a silent data-loss write.

Adds `TestCorruptFileRefusesToOverwrite` covering the two write paths
and the on-disk preservation invariant. The existing
`test_load_handles_corrupt_file` is unchanged and still passes.

Closes entrius#781.
@xiao-xiao-mao xiao-xiao-mao Bot added the bug Something isn't working label Apr 28, 2026
@ebios-star
Copy link
Copy Markdown
Contributor Author

@anderdc @LandynDev — friendly ping for review when you have a moment.

Closes #781. pat_storage._read_file was silently swallowing json.JSONDecodeError and returning []; the next save_pat then overwrote the corrupt file with a single fresh entry, permanently losing every previously stored PAT.

Read paths (load_all_pats, get_pat_by_uid) keep their graceful-degradation contract. Write paths (save_pat, remove_pat) now propagate the JSONDecodeError so the bittensor axon surfaces a retriable error to the broadcasting miner instead of acknowledging a silent data-loss write. The existing test_load_handles_corrupt_file is unchanged. 4 new regression tests including the on-disk-preservation invariant.

Happy to rebase if needed.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

bug Something isn't working

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Bug] pat_storage._read_file returns [] on corrupt JSON — next save_pat overwrites and permanently loses PATs

1 participant