
feat: HERM v2 — H8 rule for evaluation pipeline bias detection #10

Description

@roli-lpci

Problem

The current HERM taxonomy covers H1-H7. Experiment L2-003 identified a new failure mode: evaluation pipeline bias, where the scoring or evaluation instructions in agent configs systematically favor certain output styles over output quality.

Evidence (L2-003): A keyword-based scorer in an agent's evaluation config matched "rate limiting" but missed "size/rate limiting"; Opus had used richer vocabulary for the same concept. As a result, the evaluation reported Sonnet > Opus when the opposite was true. The bug was in the scoring instruction, not the model.
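One way an exact-phrase check can miss a variant like "size/rate limiting" is if the scorer splits text on whitespace and looks for the phrase as a run of tokens: "size/rate" then becomes a single token, so the token pair "rate limiting" never appears. The sketch is a hypothetical illustration of that failure mode, not the actual L2-003 scorer; `phrase_in_tokens`, `keyword_score`, and both sample answers are invented for this example.

```python
def phrase_in_tokens(phrase: str, text: str) -> bool:
    """Hypothetical scorer check: the phrase must appear as a run of whitespace-split tokens."""
    needle = phrase.lower().split()
    tokens = text.lower().split()
    n = len(needle)
    return any(tokens[i:i + n] == needle for i in range(len(tokens) - n + 1))


def keyword_score(text: str, required: list[str]) -> int:
    """One point per required phrase found; no credit for synonyms or variants."""
    return sum(phrase_in_tokens(p, text) for p in required)


REQUIRED = ["rate limiting"]
plain = "Add rate limiting to the upload endpoint."
richer = "Add size/rate limiting to the upload endpoint."  # same concept, richer wording

print(keyword_score(plain, REQUIRED))   # 1
print(keyword_score(richer, REQUIRED))  # 0: "size/rate" is one token, so the phrase is missed
```

The richer answer covers strictly more, yet scores lower; summed over many prompts, this kind of miss is enough to flip a model comparison.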

Proposed H8 Rule

H8: Evaluation Methodology Bias. Agent configs that include evaluation or scoring instructions should be checked for:

  1. Keyword-match dependency — scoring that relies on exact phrase matching
  2. Length bias — rubrics that penalize or reward response length without semantic grounding
  3. Format preference — evaluation instructions that prefer bullet lists over prose regardless of task type
  4. Model-specific artifacts — scoring criteria calibrated to one model's output style
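
# Candidate H8 signals: regexes to run against evaluation/scoring instructions
H8_SIGNALS = [
    r"score.*if.*contains",         # keyword-match scoring (check 1)
    r"points.*for.*bullet",         # format preference (check 3)
    r"penalize.*longer",            # length bias (check 2)
    r"compare.*to.*baseline",       # potentially model-specific (check 4)
]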
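A minimal sketch of how the `H8_SIGNALS` patterns could be run over an agent config, assuming the config is plain text and that matching is case-insensitive and line by line (Python's `.` does not cross newlines by default, so each pattern only fires within a single line anyway). `scan_h8` and the sample config are hypothetical, not part of HERM.

```python
import re

# Copied from the proposal; comments map each pattern to the numbered checks.
H8_SIGNALS = [
    r"score.*if.*contains",         # keyword-match scoring (check 1)
    r"points.*for.*bullet",         # format preference (check 3)
    r"penalize.*longer",            # length bias (check 2)
    r"compare.*to.*baseline",       # potentially model-specific (check 4)
]

# Assumption: case-insensitive matching, so "Score ..." and "score ..." both fire.
_COMPILED = [re.compile(p, re.IGNORECASE) for p in H8_SIGNALS]


def scan_h8(config_text: str) -> list[tuple[int, str, str]]:
    """Return (line_number, pattern, line) for every config line that trips an H8 signal."""
    hits = []
    for lineno, line in enumerate(config_text.splitlines(), start=1):
        for pattern in _COMPILED:
            if pattern.search(line):
                hits.append((lineno, pattern.pattern, line.strip()))
    return hits


sample_config = """\
You are a strict reviewer.
Score 1 point if the answer contains "rate limiting".
Award extra points for bullet lists.
Penalize longer answers.
Explain your reasoning briefly.
"""

for lineno, pattern, line in scan_h8(sample_config):
    print(f"line {lineno}: {pattern!r} -> {line}")
```

A hit marks an instruction for human review rather than proving bias. For example, `compare.*to.*baseline` also fires on legitimate baseline comparisons, which is why the proposal labels that signal only "potentially" model-specific.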

Why This Matters

As agents are increasingly used to evaluate other agents (the LLM-as-a-judge pattern), the quality of evaluation configs becomes critical. H8 catches bias at the config layer, before it can invert conclusions in production.

Labels: enhancement (New feature or request)