weights: rebalance language tier, structural bonuses, and density cap by anderdc · Pull Request #867 · entrius/gittensor

anderdc · 2026-04-29T21:06:51Z

Summary

Three correlated tuning passes, informed by an analysis of live scoring data from api.gittensor.io (2,958 scored PRs and their density distribution).

`token_weights.json` (structural_bonus)

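- function_declaration 2.5 → 2.0: equalize with function_definition. Both mark the same semantic event; only the grammar convention differs (Python/Ruby/Lua vs. JS/Go/Rust/C), so 2.5 over-rewarded the latter.
- enum_definition 1.0 → 1.5: raise to the type-declaration tier alongside struct_definition (1.75); Rust/Swift/Kotlin enums were under-counted.
- impl_item 1.0 → 1.75: a Rust impl block is semantically a class body and was under-counted relative to class_definition (2.5) and struct_definition (1.75).
- with_statement 0.6 → 0.3: a Python-only construct that previously ranked as the second-highest control-flow node, above for_statement (0.5).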
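To make the equalization concrete, here is a minimal sketch that assumes, purely for illustration, that a file's structural score is the sum of the structural_bonus weight of each matched node type. The summation model and the `structural_score` helper are hypothetical; the real gittensor scorer may combine bonuses differently. Only the weights are taken from this PR.

```python
# Hypothetical sketch: structural score = sum of per-node structural_bonus weights.
# The weights come from this PR; the summation model is an assumption.
from collections import Counter

OLD_BONUS = {
    "function_declaration": 2.5,
    "function_definition": 2.0,
    "enum_definition": 1.0,
    "struct_definition": 1.75,
    "impl_item": 1.0,
    "class_definition": 2.5,
    "with_statement": 0.6,
    "for_statement": 0.5,
}

# Post-change values for the four node types touched by this PR.
NEW_BONUS = {
    **OLD_BONUS,
    "function_declaration": 2.0,
    "enum_definition": 1.5,
    "impl_item": 1.75,
    "with_statement": 0.3,
}

def structural_score(node_types, bonus):
    """Sum the bonus of every node; node types without a bonus contribute 0."""
    counts = Counter(node_types)
    return sum(bonus.get(kind, 0.0) * n for kind, n in counts.items())

# The same "one function with a loop" change in two grammar conventions.
py_nodes = ["function_definition", "for_statement"]   # e.g. Python
js_nodes = ["function_declaration", "for_statement"]  # e.g. JS/Go

print(structural_score(py_nodes, OLD_BONUS), structural_score(js_nodes, OLD_BONUS))
print(structural_score(py_nodes, NEW_BONUS), structural_score(js_nodes, NEW_BONUS))
```

Under this model the old weights score the JS/Go variant higher (3.0 vs. 2.5) for the same semantic change, while the new weights score both at 2.5.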

`programming_languages.json`

- Header files (.h, .hh, .hpp, .hxx, .cuh) → 1.8, one tier below their implementation files (declarations only, lower information density)
- Same-language consistency: .cjs/.mjs (previously 1.30) now match .js; .cts (previously 1.50) now matches .ts/.mts
- Modest JS/TS bump, out of "barely above the 1.0 floor" territory (that floor is reserved for configs/templates):
  - .js/.cjs/.mjs 1.05 → 1.15
  - .jsx 1.10 → 1.20
  - .ts/.cts/.mts 1.05 → 1.20
  - .tsx 1.10 → 1.25
- Java, C# 1.75 → 2.0: verbose typed languages need weight to compensate for low per-line node density
- Haskell .hs/.lhs 2.0 → 1.75: concise pure functional language; was over-weighted at the systems-language tier
- Python, Ruby, Lua, Clojure, Racket, Scheme 1.75 → 1.5: concise languages; structural_bonus already rewards per-line expressiveness, so the language weight should not double-count it
- Shell family (bash/sh/zsh/fish/ps1) → 1.6: consistent across the family (previously inconsistent, with .bash at 1.5 vs. .sh/.zsh at 1.75)
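A minimal sketch of an extension-to-weight lookup using the post-change values listed above. The flat dict, the `language_weight` helper, and the 1.0 fallback for unlisted extensions are assumptions for illustration (the PR only mentions a 1.0 floor reserved for configs/templates); the actual programming_languages.json schema may differ. Extensions for Clojure, Racket, and Scheme are omitted because the PR does not list them.

```python
# Hypothetical lookup built from the post-change weights in this PR.
# The flat-dict shape, helper name, and 1.0 fallback are assumptions.
from pathlib import PurePosixPath

LANGUAGE_WEIGHTS = {
    # JS/TS family: module-format variants share their base language's weight
    ".js": 1.15, ".cjs": 1.15, ".mjs": 1.15,
    ".jsx": 1.20,
    ".ts": 1.20, ".cts": 1.20, ".mts": 1.20,
    ".tsx": 1.25,
    # Verbose typed languages
    ".java": 2.0, ".cs": 2.0,
    # Concise languages
    ".hs": 1.75, ".lhs": 1.75,
    ".py": 1.5, ".rb": 1.5, ".lua": 1.5,
    # Shell family, now consistent
    ".bash": 1.6, ".sh": 1.6, ".zsh": 1.6, ".fish": 1.6, ".ps1": 1.6,
    # Header files, one tier below implementation files
    ".h": 1.8, ".hh": 1.8, ".hpp": 1.8, ".hxx": 1.8, ".cuh": 1.8,
}

FALLBACK_WEIGHT = 1.0  # assumed floor for extensions not listed above

def language_weight(path: str) -> float:
    """Look up a file's weight by its (case-insensitive) extension."""
    suffix = PurePosixPath(path).suffix.lower()
    return LANGUAGE_WEIGHTS.get(suffix, FALLBACK_WEIGHT)

print(language_weight("src/index.mjs"))
print(language_weight("include/api.hpp"))
print(language_weight("scripts/tool.xyz"))
```

Keying variants like .cjs/.mjs to the same value as .js is what the "same-language consistency" change amounts to: a file's weight no longer depends on which module-format extension the author picked.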

`constants.py`

MAX_CODE_DENSITY_MULTIPLIER 1.5 → 1.15
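As a rough illustration of what the cap change does, the sketch below assumes the density multiplier is simply a PR's code density clamped to MAX_CODE_DENSITY_MULTIPLIER. That clamp model and the `density_multiplier` helper are assumptions; the scoring code that consumes the constant may apply additional terms.

```python
# Hypothetical: treat the density multiplier as density clamped to the cap.
OLD_MAX_CODE_DENSITY_MULTIPLIER = 1.5
MAX_CODE_DENSITY_MULTIPLIER = 1.15

def density_multiplier(density: float, cap: float = MAX_CODE_DENSITY_MULTIPLIER) -> float:
    """Return the density, but never more than the cap."""
    return min(density, cap)

# Median, p98, p99, p99.5 from the live-data analysis, plus an extreme value.
for d in (0.140, 1.07, 1.27, 1.47, 2.0):
    print(d, density_multiplier(d, OLD_MAX_CODE_DENSITY_MULTIPLIER), density_multiplier(d))
```

Under this assumption, the median (0.140) and p98 (1.07) densities are untouched by the new cap; only PRs above 1.15 are flattened, and the cap-saturating micro-edits lose the headroom they had between 1.15 and 1.5.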

Density cap rationale

Live-data analysis of 2,958 scored PRs:

```
density distribution
  median: 0.140
  p98:    1.07
  p99:    1.27
  p99.5:  1.47
```

Only 14 PRs (0.5%) currently saturate the 1.5 cap. Inspecting that cohort shows trivial 4-9 line micro-edits earning a base_score of 37.7, the same as substantive 100+ line PRs (e.g. PR #769). The new cap of 1.15 aligns with the p98 cutoff:

```
cap=1.30:  29 PRs (1.0%) capped
cap=1.20:  39 PRs (1.3%) capped
cap=1.15:  45 PRs (1.5%) capped   ← p98
cap=1.10:  52 PRs (1.8%) capped
cap=1.00:  84 PRs (2.8%) capped
```
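The percentages in the table follow directly from the capped counts over the 2,958-PR sample. A quick check, using only the counts quoted above (1.50 is the old cap, with its 14 saturating PRs); `capped_share` is just an illustrative helper:

```python
TOTAL_SCORED_PRS = 2958

# PRs whose density exceeds each candidate cap, copied from the analysis.
CAPPED_COUNTS = {1.50: 14, 1.30: 29, 1.20: 39, 1.15: 45, 1.10: 52, 1.00: 84}

def capped_share(n_capped: int, total: int = TOTAL_SCORED_PRS) -> float:
    """Percentage of scored PRs whose density exceeds a given cap."""
    return 100 * n_capped / total

for cap, n in sorted(CAPPED_COUNTS.items(), reverse=True):
    print(f"cap={cap:.2f}: {n:3d} PRs ({capped_share(n):.1f}%) capped")
```

At cap=1.15 this leaves 98.5% of scored PRs below the cap and therefore unaffected by the change.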

Considered raising MIN_TOKEN_SCORE_FOR_BASE_SCORE instead, but a threshold of 10 would eliminate 42.5% of all scored PRs (mostly low-density but legitimate bug fixes in the 5-10 token band) to remove only ~8 trivial PRs. Dropping the cap has a much better surgical ratio.
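To put the two options side by side, here is the arithmetic behind the comparison. "Surgical ratio" is interpreted here as PRs affected per trivial PR targeted; that reading is an assumption, and the ~8 trivial PRs is the approximate figure quoted above.

```python
TOTAL_SCORED_PRS = 2958

# Option A: raise MIN_TOKEN_SCORE_FOR_BASE_SCORE to 10.
threshold_affected = round(0.425 * TOTAL_SCORED_PRS)  # PRs that would fall below the threshold
trivial_removed = 8                                    # approximate count from the analysis

# Option B: lower MAX_CODE_DENSITY_MULTIPLIER to 1.15.
cap_affected = 45  # PRs with density above 1.15, per the cap table

print(f"threshold=10: {threshold_affected} PRs lose their base score, "
      f"~{threshold_affected // trivial_removed} per trivial PR removed")
print(f"cap=1.15: {cap_affected} PRs have their multiplier reduced "
      f"({100 * cap_affected / TOTAL_SCORED_PRS:.1f}% of the sample)")
```

Roughly 1,257 PRs would be hit by the threshold change, about 157 for every trivial PR it removes, whereas the cap drop touches 45 PRs and only lowers their multiplier rather than removing their base score.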

Test file detection (tests/validator/test_unscored_stale_prs.py) and the 0.05× test-file weight were already working correctly (verified against live mirror data). The over-reward came from a combination of the Python language weight, structural over-counting, and a too-loose density cap.

Test plan

- Existing tests/validator/test_scoring_classes.py density tests pass (they reference the constant by name, not by value)
- Spot-check a handful of recent PRs with known scores via gittensor scoring against mirror data and compare the deltas
- Monitor the density distribution and cap-saturation count for ~2 weeks post-merge to confirm the 1.15 cap is hitting p98 in practice
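For the monitoring step, a small helper like the one below could report the p98 density and the share of PRs at or above the cap from a list of post-merge densities. The `saturation_report` name and the idea of feeding it a plain list of floats are assumptions; it relies only on the standard library's `statistics.quantiles`.

```python
import statistics

MAX_CODE_DENSITY_MULTIPLIER = 1.15

def saturation_report(densities, cap=MAX_CODE_DENSITY_MULTIPLIER):
    """Return (p98 density, fraction of PRs at or above the cap)."""
    # quantiles(n=100) returns the 99 cut points p1..p99; index 97 is p98.
    p98 = statistics.quantiles(densities, n=100)[97]
    saturated = sum(1 for d in densities if d >= cap)
    return p98, saturated / len(densities)
```

Its output can be compared against the pre-merge baseline (p98 = 1.07; 45 of 2,958 PRs, or 1.5%, above 1.15). Note that the language and structural weight changes may themselves shift the post-merge distribution.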


Follow-up commit: test_base_score.py thresholds

The Python language weight reduction (1.75 → 1.5) in the parent commit narrows the source-vs-non-code score gap by ~14% in fixtures, which trips three magic-number assertions in test_base_score.py:

- test_non_code_contributes_modestly: `< 0.10` → `< 0.15`
- test_source_code_scores_much_higher_than_non_code: `* 10` → `* 7`
- test_non_code_does_not_bypass_threshold: `* 0.10` → `* 0.15`

The semantic intent of each test is preserved: source code still scores much higher than non-code, non-code remains a modest contribution, and sub-threshold PRs stay well below real source PRs. The thresholds were previously calibrated tightly to the old weights; widening them keeps the invariants meaningful while making the suite less brittle to future weight tuning.
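The ~14% figure matches the ratio of the two Python weights (1.5 / 1.75 ≈ 0.857). The sketch below shows why a tight `* 10` bound can fail after the change, assuming as a simplification that the Python weight scales the source-code fixture linearly and leaves the non-code fixture unchanged. The fixture scores are hypothetical, not values from the real test suite.

```python
OLD_PY_WEIGHT = 1.75
NEW_PY_WEIGHT = 1.5

shrink = NEW_PY_WEIGHT / OLD_PY_WEIGHT
print(f"source/non-code gap shrinks by {100 * (1 - shrink):.1f}%")

# Hypothetical fixture: a source PR that cleared the old `* 10` bound by a small margin.
non_code_score = 1.0
old_source_score = 10.5 * non_code_score
new_source_score = old_source_score * shrink  # assumed linear in the language weight

assert old_source_score > non_code_score * 10      # old assertion held
assert not new_source_score > non_code_score * 10  # old bound now fails
assert new_source_score > non_code_score * 7       # widened bound still holds
```

The widened bound keeps the invariant ("source scores much higher than non-code") while leaving room for further weight tuning of similar size.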

xiao-xiao-mao (bot) added the enhancement label ("New feature or request") on Apr 29, 2026

anderdc and others added 2 commits April 29, 2026 16:09

Merge branch 'test' into weights/rebalance-language-and-structural-we…ights (62a5977)

LandynDev approved these changes Apr 29, 2026


LandynDev merged commit 7f49ca3 into test Apr 29, 2026
3 checks passed

LandynDev merged 3 commits into test from weights/rebalance-language-and-structural-weights
