Rubrics Refinement Overview

What is Rubrics Refinement?

Rubrics Refinement is a capability in Copilot Studio Kit for creating, testing, and iteratively improving reusable evaluation standards (rubrics) for AI-generated responses. It helps ensure that AI grading of your agent's responses aligns with human judgment and organizational quality standards.

The Problem It Solves

AI judges are capable evaluators, but the quality of their grading depends entirely on the rubric they are given. Without a systematic way to refine evaluation criteria, organizations struggle to:

Define clear, domain-specific evaluation standards
Compare AI grading with human expectations
Identify where rubric instructions fail to capture intended quality criteria
Build trust in AI evaluation outputs for critical enterprise use cases

Rubrics Refinement addresses these challenges by providing an iterative workflow that brings AI grading into alignment with human judgment.

Key Benefits

Reusable Evaluation Standards: Define rubrics once and reuse them across multiple agents and test runs
Alignment with Human Judgment: Systematically minimize misalignment between AI and human graders
Quality Assurance: Establish durable enterprise assets that encode organizational quality standards
Confidence in AI Evaluation: Build trust in automated grading through transparent, iterative refinement

Core Concepts

Rubrics as Evaluation Standards

A rubric is a structured set of natural-language grading instructions used by an AI judge to evaluate the quality of an agent's response. A rubric includes:

A description of "what a good response looks like"
Grade definitions for a 5-point scale (5 = Exemplary → 1 = Needs improvement)
Optional good and bad examples that illustrate quality standards
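
To make the three parts concrete, a rubric can be pictured as a small record. The following is a minimal Python sketch, not the Kit's actual data model: the class name `Rubric`, its field names, and the sample values are hypothetical and chosen only for illustration. Only the labels for grades 5 and 1 come from the scale described above; the wording for grades 2 to 4 is left to the rubric author.

```python
from dataclasses import dataclass, field


# Hypothetical model of a rubric; field names are illustrative assumptions,
# not the schema used by Copilot Studio Kit.
@dataclass
class Rubric:
    name: str
    good_response_description: str        # "what a good response looks like"
    grade_definitions: dict[int, str]     # one definition per grade, 1 through 5
    good_examples: list[str] = field(default_factory=list)  # optional
    bad_examples: list[str] = field(default_factory=list)   # optional

    def __post_init__(self) -> None:
        # The scale is 5-point, so require a definition for exactly grades 1-5.
        if set(self.grade_definitions) != {1, 2, 3, 4, 5}:
            raise ValueError("grade_definitions must define grades 1 through 5")


rubric = Rubric(
    name="Support answer quality",  # sample value, not from the Kit
    good_response_description="Answers the question accurately and concisely.",
    grade_definitions={
        5: "Exemplary",
        4: "Defined by the rubric author",
        3: "Defined by the rubric author",
        2: "Defined by the rubric author",
        1: "Needs improvement",
    },
)
```

Keeping the examples optional mirrors the description above: a rubric is usable with only the description and grade definitions, and examples can be added later during refinement.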

AI Grading vs Human Grading

AI Grading: An AI judge (LLM) evaluates responses using the rubric, producing a grade (1-5) and a rationale explaining its assessment
Human Grading: A human evaluator (maker) assesses the same responses, providing their own grade (1-5) and reasoning
Comparison: By comparing these two assessments, you can identify where the rubric needs refinement

Alignment and Misalignment

Alignment: When the AI grade matches the human grade, the rubric is working as intended
Misalignment: When the AI grade differs from the human grade, the rubric needs improvement
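
As an illustration, alignment can be measured by comparing the two sets of grades case by case. This is a sketch under stated assumptions rather than the Kit's implementation: the function `compare_grades`, the `AlignmentReport` type, and the test case identifiers are hypothetical, and alignment is taken to mean an exact grade match, as defined above.

```python
from typing import NamedTuple


class AlignmentReport(NamedTuple):
    aligned: int                  # number of cases where both grades match
    misaligned_cases: list[str]   # identifiers of cases where they differ
    alignment_rate: float         # aligned / total compared cases


def compare_grades(ai: dict[str, int], human: dict[str, int]) -> AlignmentReport:
    """Compare AI and human grades (1-5) keyed by a hypothetical case id."""
    # Only cases graded by both the AI judge and the human can be compared.
    shared = sorted(ai.keys() & human.keys())
    if not shared:
        raise ValueError("no test cases were graded by both AI and human")
    misaligned = [case for case in shared if ai[case] != human[case]]
    aligned = len(shared) - len(misaligned)
    return AlignmentReport(aligned, misaligned, aligned / len(shared))


report = compare_grades(
    ai={"case-1": 5, "case-2": 3, "case-3": 4},
    human={"case-1": 5, "case-2": 4, "case-3": 4},
)
print(report.misaligned_cases)  # the cases whose rubric guidance needs review
```

The list of misaligned cases is the useful output here: those are the responses where the rubric's instructions did not lead the AI judge to the grade a human would give.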

The Iterative Refinement Process

Rubrics Refinement is fundamentally an iterative process:

Define an initial rubric with evaluation criteria
Run tests using the rubric to generate AI grades
Review agent responses and provide human grades
Compare AI and human assessments to identify misalignment
Mark good/bad examples to guide refinement
Refine the rubric using AI analysis of misalignment patterns
Re-run tests with the updated rubric
Repeat until alignment is acceptable
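
The steps above can be sketched as a loop. This is a simplified illustration, not how the Kit is implemented: `refine_until_aligned`, the `run_tests` and `refine` callables, the 0.9 target rate, and the round limit are all hypothetical. It also assumes the human grades are collected once, whereas in practice a maker reviews responses after each run.

```python
from typing import Callable

Grades = dict[str, int]  # hypothetical mapping: test case id -> grade (1-5)


def refine_until_aligned(
    rubric: str,
    run_tests: Callable[[str], Grades],       # runs tests, returns AI grades
    human_grades: Grades,                     # grades provided by the maker
    refine: Callable[[str, list[str]], str],  # returns an updated rubric
    target_rate: float = 0.9,                 # assumed "acceptable" alignment
    max_rounds: int = 5,                      # assumed safety limit
) -> tuple[str, float]:
    if not human_grades:
        raise ValueError("human grades are required to measure alignment")
    rate = 0.0
    for _ in range(max_rounds):
        ai_grades = run_tests(rubric)
        # A case is misaligned when the AI grade differs from the human grade.
        misaligned = [c for c, g in human_grades.items() if ai_grades.get(c) != g]
        rate = 1 - len(misaligned) / len(human_grades)
        if rate >= target_rate:
            break
        rubric = refine(rubric, misaligned)
    return rubric, rate


# Toy stand-ins so the sketch runs end to end.
human = {"case-1": 5, "case-2": 2}


def fake_run_tests(rubric: str) -> Grades:
    # Pretend the AI judge agrees with the human once the rubric is refined.
    return dict(human) if "refined" in rubric else {"case-1": 5, "case-2": 4}


def fake_refine(rubric: str, misaligned: list[str]) -> str:
    return rubric + " (refined)"


final_rubric, final_rate = refine_until_aligned(
    "draft", fake_run_tests, human, fake_refine
)
```

Note that the loop stops on alignment rate, not on how high the grades are. A response graded 2 by both the AI judge and the human counts as aligned.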

Primary Goal

Important: The goal of rubric refinement is not to get all responses graded as 5 (Exemplary). The goal is to minimize misalignment between AI and human graders.

Response optimization—actually improving the quality of your agent's answers—happens in Copilot Studio itself. Rubrics Refinement focuses purely on ensuring your evaluation criteria accurately reflect human judgment, so you can trust automated grading results.

Two Modes of Rubric Usage

Rubrics in Copilot Studio Kit serve two distinct purposes:

1. Testing Mode (Test Case Level)

Purpose: Regular test automation with custom grading criteria
Configuration: Assign rubric at the individual test case level
Use Case: Ongoing quality assurance for Generative Answer test cases
Pass/Fail: A test case passes when the AI grade is greater than or equal to the passing grade threshold (default: 5)

2. Refinement Mode (Test Run Level)

Purpose: Iteratively refine and improve the rubric itself
Configuration: Assign rubric at the test run level
Use Case: Dedicated rubric refinement workflow
Pass/Fail: The passing grade is informational only; the goal is alignment, not passing scores
Key Difference: The AI judge provides a detailed rationale to support analysis, which makes this mode more expensive
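
The difference in how the passing grade is treated can be sketched as follows. This is an illustrative assumption about the logic, not the Kit's code: `RubricMode`, `GradeOutcome`, and `evaluate_grade` are hypothetical names. The default threshold of 5 is taken from the Testing Mode description above.

```python
from dataclasses import dataclass
from enum import Enum


class RubricMode(Enum):
    TESTING = "testing"        # rubric assigned at the test case level
    REFINEMENT = "refinement"  # rubric assigned at the test run level


@dataclass(frozen=True)
class GradeOutcome:
    meets_passing_grade: bool  # grade >= passing grade threshold
    counts_as_verdict: bool    # False when the result is informational only


def evaluate_grade(
    ai_grade: int, mode: RubricMode, passing_grade: int = 5
) -> GradeOutcome:
    if not 1 <= ai_grade <= 5:
        raise ValueError("grades are on a 5-point scale (1-5)")
    meets = ai_grade >= passing_grade
    # In refinement mode the comparison is shown for information only;
    # it is not treated as a pass/fail verdict in this sketch.
    return GradeOutcome(meets, counts_as_verdict=(mode is RubricMode.TESTING))


testing_result = evaluate_grade(4, RubricMode.TESTING)
refinement_result = evaluate_grade(4, RubricMode.REFINEMENT)
```

With the default threshold, a grade of 4 does not meet the passing grade in either mode, but only in testing mode does that count as a failed test case.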

Who Should Use Rubrics Refinement?

Rubrics Refinement is ideal for:

Quality Assurance Teams: Establishing consistent evaluation standards across agents
Makers: Creating reliable automated tests using Generative Answer responses
Enterprise Organizations: Defining domain-specific, organizational quality standards
Anyone Seeking Trust in AI Evaluation: Teams that need confidence in automated grading for critical use cases

What's Included in the Current Release

Complete rubric management (create, view, edit, duplicate, delete)
Rubric assignment at test run level (for refinement) and test case level (for testing)
Passing grade selection with clear differentiation between testing and refinement modes
Two refinement views: standard (AI grades hidden to avoid biasing the human grader) and full (AI grades visible)
Detailed view for grading longer responses
Ability to mark test cases as good/bad examples
AI-powered rubric refinement based on alignment analysis
Save and save-as options for preserving rubric versions
Support for iterative refinement workflow

What's Planned for Future Releases

Auto-generate test sets from conversation transcripts
Enhanced diagnostics and analytics
Rubric governance (approvals, lifecycle, publishing)
Improved dedicated rubric refinement interface

Getting Started

To begin using Rubrics Refinement:

Create a rubric defining your quality standards (see Rubric Management)
Prepare test cases of the Generative Answer test type
Configure a test run for rubric refinement (see Using Rubrics in Tests)
Follow the refinement workflow to align AI with human judgment (see Rubric Refinement Workflow)

Next: Rubric Management →
