weights: rebalance language tier, structural bonuses, and density cap #867

Merged: LandynDev merged 3 commits into test from weights/rebalance-language-and-structural-weights on Apr 29, 2026

anderdc (Collaborator) commented Apr 29, 2026

Summary

Three correlated tuning passes informed by analysis of live scoring data from api.gittensor.io (2,958 scored PRs, density distribution).

token_weights.json (structural_bonus)

  • function_declaration 2.5 → 2.0: equalize with function_definition. Both mark the same semantic event and only the grammar convention differs (Python/Ruby/Lua vs JS/Go/Rust/C), so the old 2.5 over-rewarded the latter group
  • enum_definition 1.0 → 1.5: raise to the type-declaration tier alongside struct_definition 1.75; Rust/Swift/Kotlin enums were under-counted
  • impl_item 1.0 → 1.75: a Rust impl block is semantically a class body and was under-counted relative to class_definition 2.5 / struct_definition 1.75
  • with_statement 0.6 → 0.3: Python-only construct that was previously the second-highest control-flow node, above for_statement 0.5
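
As a rough illustration of the effect, the sketch below sums per-node bonuses over a list of AST node types. The `STRUCTURAL_BONUS` dict and the `structural_bonus` helper are hypothetical, and treating unknown node types as zero is an assumption; the real layout of token_weights.json and the scoring code are not shown in this PR. Only the numeric weights are taken from the list above.

```python
# Hypothetical sketch: only the numeric weights come from this PR.
STRUCTURAL_BONUS = {
    "function_definition": 2.0,
    "function_declaration": 2.0,  # was 2.5
    "struct_definition": 1.75,
    "enum_definition": 1.5,       # was 1.0
    "impl_item": 1.75,            # was 1.0
    "for_statement": 0.5,
    "with_statement": 0.3,        # was 0.6
}


def structural_bonus(node_types):
    """Sum the bonus for each AST node type (assumption: unknown types add 0)."""
    return sum(STRUCTURAL_BONUS.get(t, 0.0) for t in node_types)


# The same "function containing a loop" shape under the two grammar conventions.
with_definition = structural_bonus(["function_definition", "for_statement"])
with_declaration = structural_bonus(["function_declaration", "for_statement"])
print(with_definition, with_declaration)
```

With the rebalanced weights the two conventions score identically, and a `with_statement` no longer outranks a `for_statement`.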

programming_languages.json

  • Header files (.h, .hh, .hpp, .hxx, .cuh) → 1.8 (one tier below their implementation file; declarations only, so lower information density)
  • Same-language consistency: .cjs/.mjs (previously 1.30) now match .js; .cts (previously 1.50) now matches .ts/.mts
  • JS/TS modest bump (out of "barely above 1.0 floor" territory):
    • .js 1.05 → 1.15 (.cjs/.mjs land on the same value)
    • .jsx 1.10 → 1.20
    • .ts/.mts 1.05 → 1.20 (.cts lands on the same value)
    • .tsx 1.10 → 1.25
  • Java, C# 1.75 → 2.0 (verbose typed languages need weight to compensate for low per-line node density)
  • Haskell .hs/.lhs 2.0 → 1.75 (concise pure functional; was over-weighted)
  • Python, Ruby, Lua, Clojure, Racket, Scheme 1.75 → 1.5 (concise; structural_bonus already rewards expressiveness, language weight should not double-count)
  • Shell family (bash/sh/zsh/fish/ps1) → 1.6 (consistent across the family; previously .bash was 1.5 while .sh/.zsh were 1.75)
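
To make the resulting tiers easier to scan, here is a minimal lookup sketch using the post-change values listed above. The `LANGUAGE_WEIGHTS` dict, the `language_weight` helper, and the 1.0 fallback for unlisted extensions are all assumptions for illustration; the actual schema of programming_languages.json is not shown here. The `.java` and `.cs` keys are the conventional extensions for Java and C#, which the list names only by language.

```python
from pathlib import PurePosixPath

# Hypothetical sketch: dict shape and fallback are assumptions;
# the numeric weights are the post-change values from this PR.
LANGUAGE_WEIGHTS = {
    ".h": 1.8, ".hh": 1.8, ".hpp": 1.8, ".hxx": 1.8, ".cuh": 1.8,
    ".js": 1.15, ".cjs": 1.15, ".mjs": 1.15,
    ".jsx": 1.20,
    ".ts": 1.20, ".cts": 1.20, ".mts": 1.20,
    ".tsx": 1.25,
    ".java": 2.0, ".cs": 2.0,
    ".hs": 1.75, ".lhs": 1.75,
    ".py": 1.5, ".rb": 1.5, ".lua": 1.5,
    ".bash": 1.6, ".sh": 1.6, ".zsh": 1.6, ".fish": 1.6, ".ps1": 1.6,
}


def language_weight(path, default=1.0):
    """Look up the weight by file extension, case-insensitively."""
    suffix = PurePosixPath(path).suffix.lower()
    return LANGUAGE_WEIGHTS.get(suffix, default)


for example in ["src/index.mjs", "src/index.js", "include/Foo.HPP", "README.md"]:
    print(example, language_weight(example))
```

Keeping every extension of one language on the same value is what the same-language consistency bullet is after: `.mjs` and `.js` resolve to the same weight here.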

constants.py

  • MAX_CODE_DENSITY_MULTIPLIER 1.5 → 1.15

Density cap rationale

Live-data analysis of 2,958 scored PRs:

density distribution
  median: 0.140
  p98:    1.07
  p99:    1.27
  p99.5:  1.47

Only 14 PRs (0.5%) currently saturate the 1.5 cap. Inspecting that cohort shows trivial 4-9 line micro-edits earning a base_score of 37.7, the same as substantive 100+ line PRs (e.g. PR #769). The new cap of 1.15 sits just above the p98 density (1.07) and caps roughly 1.5% of PRs. Candidate caps compared:

cap=1.30:  29 PRs (1.0%) capped
cap=1.20:  39 PRs (1.3%) capped
cap=1.15:  45 PRs (1.5%) capped   ← chosen, near p98
cap=1.10:  52 PRs (1.8%) capped
cap=1.00:  84 PRs (2.8%) capped
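
The effect of the lower ceiling can be sketched as a simple clamp. This assumes the cap is applied as `min(density, cap)`, which the PR does not show; `capped_density` and `count_capped` are hypothetical helpers, and the inputs are just the percentile values quoted above.

```python
# Assumption: the cap is applied as a simple clamp on the density multiplier.
OLD_CAP = 1.5
MAX_CODE_DENSITY_MULTIPLIER = 1.15  # new value from this PR


def capped_density(density, cap=MAX_CODE_DENSITY_MULTIPLIER):
    """Clamp a raw density value to the configured ceiling."""
    return min(density, cap)


def count_capped(densities, cap):
    """Number of PRs whose raw density reaches or exceeds the cap."""
    return sum(1 for d in densities if d >= cap)


# Percentile values quoted above, used here as stand-in inputs.
quoted = [("median", 0.140), ("p98", 1.07), ("p99", 1.27), ("p99.5", 1.47)]
for label, density in quoted:
    print(label, capped_density(density, OLD_CAP), capped_density(density))
```

Under this reading, PRs at the median and at p98 are untouched, while the p99 and p99.5 densities are both pulled down to 1.15.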

Raising MIN_TOKEN_SCORE_FOR_BASE_SCORE was considered instead, but a threshold of 10 would kill 42.5% of all scored PRs (mostly legitimate low-density bug fixes in the 5-10 token band) to remove only ~8 trivial PRs. The cap drop is far more surgical.
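
The trade-off can be put in rough numbers using only the figures quoted above. The variable names are illustrative, and "~8" is treated as exactly 8 for the arithmetic.

```python
TOTAL_PRS = 2958

# Option A: raise MIN_TOKEN_SCORE_FOR_BASE_SCORE to 10.
threshold_dropped = round(0.425 * TOTAL_PRS)  # PRs that would stop scoring
threshold_trivial_removed = 8                 # "~8" in the text above

# Option B: lower the density cap to 1.15.
cap_touched = 45                              # PRs whose multiplier is clamped

# Legitimate PRs lost per trivial PR removed under option A.
collateral_per_trivial = (
    threshold_dropped - threshold_trivial_removed
) / threshold_trivial_removed

print(threshold_dropped, round(collateral_per_trivial))
print(round(100 * cap_touched / TOTAL_PRS, 1))
```

The threshold option drops on the order of 150 legitimate PRs for each trivial one it removes, whereas the cap only trims the multiplier for about 1.5% of PRs.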

Test file detection (tests/validator/test_unscored_stale_prs.py) and 0.05× test-file weight were already working correctly — verified against live mirror data. The over-reward came from a combination of Python language weight, structural over-counting, and a too-loose density cap.

Test plan

  • Existing tests/validator/test_scoring_classes.py density tests pass (they reference the constant by name, not value)
  • Spot-check a handful of recent PRs with known scores via gittensor scoring against mirror data and compare deltas
  • Monitor density distribution and cap-saturation count for ~2 weeks post-merge to confirm 1.15 cap is hitting p98 in practice
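
For the monitoring item, a small report along these lines might be enough. This is a sketch only: `percentile` uses the nearest-rank method (the method behind the figures in this PR is not stated), and `density_report` is a hypothetical helper that expects a plain list of per-PR density values.

```python
import math


def percentile(values, p):
    """Nearest-rank percentile (p in 0..100) of a non-empty list."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p * len(ordered) / 100))
    return ordered[rank - 1]


def density_report(densities, cap=1.15):
    """Summarize the density distribution and how many PRs hit the cap."""
    capped = sum(1 for d in densities if d >= cap)
    return {
        "median": percentile(densities, 50),
        "p98": percentile(densities, 98),
        "p99": percentile(densities, 99),
        "capped": capped,
        "capped_share": capped / len(densities),
    }


# Tiny made-up sample, purely to show the output shape.
print(density_report([0.1, 0.2, 0.3, 1.2]))
```

Running this periodically over fresh scoring data would show whether `capped_share` stays near the 1.5% observed before the merge.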

Three correlated tuning passes informed by analysis of live scoring data
from api.gittensor.io (2,958 scored PRs, density distribution).

token_weights.json (structural_bonus):
- function_declaration 2.5 -> 2.0 (equalize with function_definition;
  same semantic event, only the grammar convention differs across
  Python/Ruby/Lua vs JS/Go/Rust/C — current 2.5 over-rewards the latter)
- enum_definition 1.0 -> 1.5 (raise to type-declaration tier alongside
  struct_definition 1.75; was under-counting Rust/Swift/Kotlin enums)
- impl_item 1.0 -> 1.75 (Rust impl block is semantically a class body;
  was under-counted vs class_definition 2.5 / struct_definition 1.75)
- with_statement 0.6 -> 0.3 (Python-only construct previously second
  highest control-flow node, above for_statement 0.5)

programming_languages.json:
- Header files: .h .hh .hpp .hxx .cuh -> 1.8 (one tier below their
  implementation file; declarations only, lower information density)
- Same-language extension consistency:
    .cjs .mjs   1.30 -> match .js
    .cts        1.50 -> match .ts/.mts
- JS/TS family modest bump (currently barely above 1.0 floor reserved
  for configs/templates):
    .js .cjs .mjs    1.05 -> 1.15
    .jsx              1.10 -> 1.20
    .ts .cts .mts    1.05 -> 1.20
    .tsx              1.10 -> 1.25
- Java/C# 1.75 -> 2.0 (verbose typed languages; weight should
  compensate for low per-line node density)
- Haskell .hs/.lhs 2.0 -> 1.75 (concise pure functional;
  was over-weighted at the systems-language tier)
- Python/Ruby/Lua/Clojure/Racket/Scheme 1.75 -> 1.5 (concise
  dynamic/lispy languages; structural_bonus already rewards
  per-line expressiveness, language weight should not double-count)
- Shell family bash/sh/zsh/fish/ps1 -> 1.6 (consistent across the
  family; previously inconsistent with .bash=1.5 vs .sh/.zsh=1.75)

constants.py:
- MAX_CODE_DENSITY_MULTIPLIER 1.5 -> 1.15
  Live-data analysis: only 14 PRs (0.5%) currently saturate the 1.5
  cap, and inspection of that cohort shows trivial 4-9 line micro-edits
  scoring base_score 37.7 — same as substantive 100+ line PRs. p98
  density across the network is 1.07; a 1.15 cap aligns with the p98
  cutoff and surgically reduces over-rewarding of cap-saturating PRs
  without disrupting the 98.5% of PRs below that threshold. Considered
  raising MIN_TOKEN_SCORE_FOR_BASE_SCORE instead, but threshold=10
  would kill 42.5% of all scored PRs (mostly low-density legitimate
  bug fixes in the 5-10 token band) to remove only ~8 trivial PRs —
  much worse surgical ratio than the cap drop.
xiao-xiao-mao (bot) added the enhancement label Apr 29, 2026
anderdc and others added 2 commits April 29, 2026 16:09
The Python language weight reduction (1.75 → 1.5) in the parent commit
narrows the source-vs-non-code score gap by ~14% in fixtures, tripping
three magic-number assertions in test_base_score.py:

- test_non_code_contributes_modestly: < 0.10 → < 0.15
- test_source_code_scores_much_higher_than_non_code: * 10 → * 7
- test_non_code_does_not_bypass_threshold: * 0.10 → * 0.15

Semantic intent of each test is preserved: source still scores much
higher than non-code, non-code stays a modest contribution, and
sub-threshold PRs stay well below real source PRs. The thresholds were
previously calibrated tightly to the old weights; widening them keeps
the invariants meaningful while making the suite less brittle to future
weight tuning.
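
The ~14% figure matches the relative size of the weight cut, and a quick calculation shows how a tight assertion can trip. The sketch assumes the source-code score scales linearly with the language weight while the non-code score is unchanged; the fixture ratio of 11.0 is hypothetical and not taken from test_base_score.py.

```python
OLD_PY_WEIGHT = 1.75
NEW_PY_WEIGHT = 1.5

# Relative reduction in the Python language weight.
shrink = 1 - NEW_PY_WEIGHT / OLD_PY_WEIGHT
print(round(shrink, 3))

# Hypothetical fixture: source scored 11x the non-code file before the change.
old_ratio = 11.0
new_ratio = old_ratio * NEW_PY_WEIGHT / OLD_PY_WEIGHT

print(new_ratio > 10)  # old "* 10" assertion
print(new_ratio > 7)   # relaxed "* 7" assertion
```

In this illustrative case the old `* 10` bound fails while the `* 7` bound still holds, which is the kind of margin the widened thresholds are meant to provide.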
LandynDev merged commit 7f49ca3 into test Apr 29, 2026
3 checks passed