fix(judge): improve 'poor' tier description to focus on errors, not missing data by archasek · Pull Request #497 · ax-llm/ax

archasek · 2026-03-18T20:39:05Z

Summary

The 'poor' quality tier description in AxJudge's reference-free mode currently reads:

poor: Response has significant issues or missing elements

This causes false 'poor' classifications in data extraction tasks where empty/null output fields are correct because the source data is genuinely absent. The judge interprets these correctly empty fields as "missing elements" and downgrades the response to 'poor' instead of 'good'.

Fix: change the 'poor' tier description to:

poor: Response contains fabricated, incorrect, or contradictory information

This aligns the tier with actual quality problems (fabrication, errors) rather than penalizing structurally sparse but correct outputs.
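To make the shift concrete, here is a toy TypeScript model of the two wordings. It is hypothetical: AxJudge scores responses by prompting an LLM with the rubric text, not with code like this, and the `FieldFinding` labels, the `isPoorOld`/`isPoorNew` helpers, and the procurement record are invented for illustration. Each extracted field is tagged with a finding, and the two functions approximate how the old and new 'poor' descriptions treat a sparse but correct record.

```typescript
// Hypothetical toy model of the rubric change; not AxJudge's implementation.
type FieldFinding =
  | 'supported'      // value present and backed by the source
  | 'empty'          // null/empty because the source has no such data
  | 'fabricated'     // value not present in the source at all
  | 'incorrect'      // value present in the source but extracted wrongly
  | 'contradictory'; // value conflicts with another extracted field

// Old wording: "significant issues or missing elements".
// A correctly empty field reads as a "missing element", so it triggers 'poor'.
function isPoorOld(findings: FieldFinding[]): boolean {
  return findings.some((f) => f !== 'supported');
}

// New wording: "fabricated, incorrect, or contradictory information".
// Empty fields are neutral; only genuine errors trigger 'poor'.
function isPoorNew(findings: FieldFinding[]): boolean {
  return findings.some(
    (f) => f === 'fabricated' || f === 'incorrect' || f === 'contradictory',
  );
}

// Hypothetical procurement record: the source has no delivery date,
// so the extractor correctly leaves that field empty.
const correctSparseRecord: FieldFinding[] = ['supported', 'supported', 'empty'];
console.log(isPoorOld(correctSparseRecord)); // true (a false 'poor')
console.log(isPoorNew(correctSparseRecord)); // false
```

Under the old wording any field that is not positively supported, including a legitimately empty one, is enough to land in 'poor'; under the new wording only fabricated, incorrect, or contradictory content does, which is the behavior this PR targets.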

Impact

1 line changed in src/ax/dsp/judge.ts
No API changes, no new options, no breaking changes
Default behavior for all other tiers unchanged
Extraction eval tasks see a ~5-8% score improvement because false 'poor' classifications are eliminated

Test plan

Tested on 25-case procurement data extraction eval
Build passes

Commit message: The previous description "Response has significant issues or missing elements" causes false 'poor' classifications in extraction tasks where empty fields are correct (source data genuinely absent). The new wording "contains fabricated, incorrect, or contradictory information" aligns the tier with actual quality problems.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

Copilot

Pull request overview

Updates the reference-free rubric in AxJudge so the 'poor' tier penalizes fabrication, incorrectness, and contradictions rather than “missing elements,” which matters for extraction-style outputs where empty/null fields can be correct when source data is absent.

Changes:

Adjusts the 'poor' tier description in the reference-free quality rubric to focus on incorrect, fabricated, or contradictory content.
Leaves all other tiers, scoring, and APIs unchanged.


Copilot AI review requested due to automatic review settings March 18, 2026 20:39

Copilot AI reviewed Mar 18, 2026

archasek wants to merge 1 commit into ax-llm:main from archasek:fix/judge-poor-tier-description
