Three correlated tuning passes informed by analysis of live scoring data
from api.gittensor.io (2,958 scored PRs, density distribution).
token_weights.json (structural_bonus):
- function_declaration 2.5 -> 2.0 (equalize with function_definition;
same semantic event, only the grammar convention differs across
Python/Ruby/Lua vs JS/Go/Rust/C — current 2.5 over-rewards the latter)
- enum_definition 1.0 -> 1.5 (raise to type-declaration tier alongside
struct_definition 1.75; was under-counting Rust/Swift/Kotlin enums)
- impl_item 1.0 -> 1.75 (Rust impl block is semantically a class body;
was under-counted vs class_definition 2.5 / struct_definition 1.75)
- with_statement 0.6 -> 0.3 (Python-only construct previously second
highest control-flow node, above for_statement 0.5)
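As a sketch, the resulting structural_bonus entries might look like this (the surrounding file layout is an assumption; only the changed values and the reference weights cited above are from this change):

```json
{
  "structural_bonus": {
    "function_definition": 2.0,
    "function_declaration": 2.0,
    "class_definition": 2.5,
    "struct_definition": 1.75,
    "enum_definition": 1.5,
    "impl_item": 1.75,
    "for_statement": 0.5,
    "with_statement": 0.3
  }
}
```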
programming_languages.json:
- Header files: .h .hh .hpp .hxx .cuh -> 1.8 (one tier below their
implementation file; declarations only, lower information density)
- Same-language extension consistency:
.cjs .mjs 1.30 -> match .js
.cts 1.50 -> match .ts/.mts
- JS/TS family modest bump (currently barely above 1.0 floor reserved
for configs/templates):
.js .cjs .mjs 1.05 -> 1.15
.jsx 1.10 -> 1.20
.ts .cts .mts 1.05 -> 1.20
.tsx 1.10 -> 1.25
- Java/C# 1.75 -> 2.0 (verbose typed languages; weight should
compensate for low per-line node density)
- Haskell .hs/.lhs 2.0 -> 1.75 (concise pure functional;
was over-weighted at the systems-language tier)
- Python/Ruby/Lua/Clojure/Racket/Scheme 1.75 -> 1.5 (concise
dynamic/lispy languages; structural_bonus already rewards
per-line expressiveness, language weight should not double-count)
- Shell family bash/sh/zsh/fish/ps1 -> 1.6 (consistent across the
family; previously inconsistent with .bash=1.5 vs .sh/.zsh=1.75)
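A sketch of the resulting extension weights (a flat extension-to-weight map is an assumption about the file's shape, and the exact extensions listed for the Lisp-family languages are illustrative; the values come from the bullets above):

```json
{
  ".h": 1.8, ".hh": 1.8, ".hpp": 1.8, ".hxx": 1.8, ".cuh": 1.8,
  ".js": 1.15, ".cjs": 1.15, ".mjs": 1.15,
  ".jsx": 1.2,
  ".ts": 1.2, ".cts": 1.2, ".mts": 1.2,
  ".tsx": 1.25,
  ".java": 2.0, ".cs": 2.0,
  ".hs": 1.75, ".lhs": 1.75,
  ".py": 1.5, ".rb": 1.5, ".lua": 1.5, ".clj": 1.5, ".rkt": 1.5, ".scm": 1.5,
  ".sh": 1.6, ".bash": 1.6, ".zsh": 1.6, ".fish": 1.6, ".ps1": 1.6
}
```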
constants.py:
- MAX_CODE_DENSITY_MULTIPLIER 1.5 -> 1.15
Live-data analysis: only 14 PRs (0.5%) currently saturate the 1.5
cap, and inspection of that cohort shows trivial 4-9 line micro-edits
scoring base_score 37.7 — same as substantive 100+ line PRs. p98
density across the network is 1.07; a 1.15 cap aligns with the p98
cutoff and surgically reduces over-rewarding of cap-saturating PRs
without disrupting the 98.5% of PRs below that threshold. Considered
raising MIN_TOKEN_SCORE_FOR_BASE_SCORE instead, but threshold=10
would kill 42.5% of all scored PRs (mostly low-density legitimate
bug fixes in the 5-10 token band) to remove only ~8 trivial PRs —
much worse surgical ratio than the cap drop.
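The effect of the tighter cap can be illustrated with a minimal sketch (the helper function is hypothetical; only the constant name, the 1.5 → 1.15 change, and the p98 density of 1.07 come from the analysis above):

```python
# Hypothetical illustration of how a density cap bounds the score multiplier.
# Only the constant's name and values are from the change; the helper is invented.
MAX_CODE_DENSITY_MULTIPLIER = 1.15  # was 1.5

def density_multiplier(density: float) -> float:
    """Clamp a PR's code-density ratio to the configured cap."""
    return min(density, MAX_CODE_DENSITY_MULTIPLIER)

# A typical PR (p98 network density is 1.07) is unaffected by the tighter cap:
print(density_multiplier(1.07))  # 1.07 under both the old and new cap
# A cap-saturating micro-edit loses its outsized boost:
print(density_multiplier(2.4))   # 1.15 (previously clamped at 1.5)
```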
The Python language weight reduction (1.75 → 1.5) in the parent commit narrows
the source-vs-non-code score gap by ~14% in fixtures, tripping three
magic-number assertions in test_base_score.py:
- test_non_code_contributes_modestly: < 0.10 → < 0.15
- test_source_code_scores_much_higher_than_non_code: * 10 → * 7
- test_non_code_does_not_bypass_threshold: * 0.10 → * 0.15
Semantic intent of each test is preserved: source still scores much higher
than non-code, non-code stays a modest contribution, and sub-threshold PRs
stay well below real source PRs. The thresholds were previously calibrated
tightly to the old weights; widening them keeps the invariants meaningful
while making the suite less brittle to future weight tuning.
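The widened invariants can be sketched with stand-in numbers (the scores below are invented fixture values chosen to satisfy the new bounds; the real tests compute them from the scoring code, and only the test names and old/new bounds come from the description above):

```python
# Illustrative stand-ins for the three widened assertions; values are hypothetical.
non_code_score = 0.12    # hypothetical non-code contribution
source_score = 1.0       # hypothetical source-code PR score
threshold_score = 0.014  # hypothetical sub-threshold PR score

# test_non_code_contributes_modestly: bound widened from 0.10 to 0.15
assert non_code_score < 0.15
# test_source_code_scores_much_higher_than_non_code: factor relaxed from 10x to 7x
assert source_score > non_code_score * 7
# test_non_code_does_not_bypass_threshold: bound widened from 0.10 to 0.15
assert threshold_score < source_score * 0.15
```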
LandynDev approved these changes on Apr 29, 2026.
Summary
Three correlated tuning passes informed by analysis of live scoring data from
api.gittensor.io (2,958 scored PRs, density distribution): structural_bonus
adjustments in token_weights.json, extension-weight adjustments in
programming_languages.json, and MAX_CODE_DENSITY_MULTIPLIER 1.5 → 1.15 in
constants.py. Full change list above.
Density cap rationale
Only 14 PRs (0.5%) currently saturate the 1.5 cap. Inspection of that cohort
shows trivial 4-9 line micro-edits scoring base_score 37.7, the same as
substantive 100+ line PRs (e.g. PR #769). The new 1.15 cap aligns with the
p98 density cutoff.
Considered raising MIN_TOKEN_SCORE_FOR_BASE_SCORE instead, but threshold=10
would kill 42.5% of all scored PRs (mostly low-density legitimate bug fixes
in the 5-10 token band) to remove only ~8 trivial PRs. The cap drop has a
much better surgical ratio.
Test file detection (tests/validator/test_unscored_stale_prs.py) and the
0.05x test-file weight were already working correctly, verified against live
mirror data. The over-reward came from a combination of Python language
weight, structural over-counting, and a too-loose density cap.
Test plan
- tests/validator/test_scoring_classes.py density tests pass (they reference
  the constant by name, not value)
- Run gittensor scoring against mirror data and compare deltas
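The "compare deltas" step can be sketched as a toy diff over per-PR scores (no real gittensor API is assumed; the scores are invented placeholder dicts keyed by PR id):

```python
# Hypothetical delta comparison for the mirror-data test-plan step.
# Scores are invented placeholders; real values would come from scoring runs.
old_scores = {"pr_1": 37.7, "pr_2": 12.0, "pr_3": 40.1}
new_scores = {"pr_1": 29.0, "pr_2": 12.0, "pr_3": 39.5}

# Per-PR score delta between the two runs.
deltas = {pr: new_scores[pr] - old for pr, old in old_scores.items()}
# Keep only PRs whose score actually moved.
changed = {pr: d for pr, d in deltas.items() if abs(d) > 1e-9}
print(changed)
```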