
fix(judge): improve 'poor' tier description to focus on errors, not missing data #497

Open
archasek wants to merge 1 commit into ax-llm:main from archasek:fix/judge-poor-tier-description


@archasek
Contributor

Summary

The poor quality tier description in AxJudge reference-free mode currently reads:

poor: Response has significant issues or missing elements

This causes false poor classifications in data extraction tasks where empty/null output fields are correct because the source data is genuinely absent. The judge reads these correct empty fields as "missing elements" and downgrades the response to poor instead of good.

Fix: Change the poor tier description to:

poor: Response contains fabricated, incorrect, or contradictory information

This aligns the tier with actual quality problems (fabrication, errors) rather than penalizing structurally sparse but correct outputs.
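To make the distinction concrete, here is a minimal hypothetical sketch of a field-level check for extraction outputs. It is not AxJudge code: `checkFields`, `isPoor`, the verdict labels, and the sample record are all invented for illustration, and the judge in this PR applies the tier descriptions as prompt text rather than as rules like these. The sketch treats a `null` field as correct when the source genuinely lacks the value and reserves `poor` for fabricated or incorrect values, mirroring the revised wording. Omissions of data that *is* in the source are reported separately rather than treated as poor.

```typescript
// Hypothetical sketch: NOT the AxJudge implementation. All names are illustrative.

// null means "the model reported no value for this field".
type FieldValue = string | null;

type Verdict = 'correct' | 'correct-empty' | 'omitted' | 'fabricated' | 'incorrect';

interface FieldCheck {
  field: string;
  verdict: Verdict;
}

// Compare each extracted field against what the source actually contains.
function checkFields(
  sourceFacts: Record<string, FieldValue>,
  extracted: Record<string, FieldValue>,
): FieldCheck[] {
  return Object.keys(extracted).map((field): FieldCheck => {
    const truth = sourceFacts[field] ?? null; // absent key = not in source
    const value = extracted[field];
    if (value === null) {
      // An empty field is correct when the source genuinely lacks the data.
      return { field, verdict: truth === null ? 'correct-empty' : 'omitted' };
    }
    if (truth === null) {
      // A value with no support in the source is fabricated.
      return { field, verdict: 'fabricated' };
    }
    return { field, verdict: value === truth ? 'correct' : 'incorrect' };
  });
}

// Mirrors the revised 'poor' wording: only fabricated or incorrect values trigger it.
// Omissions do not, and cross-field contradictions are not modeled here.
function isPoor(checks: FieldCheck[]): boolean {
  return checks.some((c) => c.verdict === 'fabricated' || c.verdict === 'incorrect');
}

// Hypothetical procurement record whose source has no contract value.
const source = { supplier: 'Acme Ltd', contractValue: null };
console.log(isPoor(checkFields(source, { supplier: 'Acme Ltd', contractValue: null }))); // false
console.log(isPoor(checkFields(source, { supplier: 'Acme Ltd', contractValue: '50000' }))); // true
```

Under the old "missing elements" wording, the first output would be at risk of a poor rating even though its empty `contractValue` is exactly right; only the second output, which invents a value, deserves it.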

Impact

  • 1 line changed in src/ax/dsp/judge.ts
  • No API changes, no new options, no breaking changes
  • Default behavior for all other tiers unchanged
  • Extraction eval tasks see a ~5-8% score improvement from eliminating false-poor classifications

Test plan

  • Tested on a 25-case procurement data extraction eval
  • Build passes


The previous description "Response has significant issues or missing
elements" causes false 'poor' classifications in extraction tasks where
empty fields are correct (source data genuinely absent).  The new
wording "contains fabricated, incorrect, or contradictory information"
aligns the tier with actual quality problems.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 18, 2026 20:39

Copilot AI left a comment


Pull request overview

Updates the reference-free rubric in AxJudge so the poor tier penalizes fabrication/incorrectness/contradictions rather than “missing elements,” which is important for extraction-style outputs where empty/null fields can be correct when source data is absent.

Changes:

  • Adjusts the poor tier description in the reference-free quality rubric to focus on incorrect/fabricated/contradictory content.
  • Leaves all other tiers, scoring, and APIs unchanged.


