Consolidate SKILL-REPORT filename matching into shared API module #1765

Open
msalaman wants to merge 6 commits into main from masalama/moreDashboardImprovements

msalaman (Contributor) commented Apr 7, 2026

This PR is a follow-up on the dashboard confidence-level work (which reads Average Confidence from per-skill *-SKILL-REPORT.md files). It addresses review feedback by removing duplicated filename-matching logic in the API to avoid drift between endpoints.

  • What changed

    • Shared report pattern
      • Added dashboard/api/src/skillReport.ts with a single exported SKILL_REPORT_PATTERN.
    • Function updates
      • Updated getReports.ts to import the shared pattern instead of defining a local regex.
      • Updated getTestResults.ts to import the same shared pattern.
    • Documentation
      • Added JSDoc to the shared constant describing expected filename format.
  • Result

    • Both report consumers now use one source of truth for identifying SKILL-REPORT blobs, reducing maintenance risk as report naming evolves.
// dashboard/api/src/skillReport.ts
/**
 * Matches per-run markdown skill report filenames, e.g. test-run-123-SKILL-REPORT.md.
 */
export const SKILL_REPORT_PATTERN = /^test-run-.*-SKILL-REPORT\.md$/;
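
A minimal sketch of how a consumer such as getReports.ts or getTestResults.ts might apply the shared pattern when listing blobs. The helper `filterSkillReportNames` and the sample blob names are hypothetical and not part of this PR. The pattern is redeclared so the snippet runs on its own. Because the pattern is anchored with `^`, the sketch assumes it should be tested against the base filename rather than a full blob path.

```typescript
// Redeclared here so the snippet is self-contained; the PR exports this
// constant from dashboard/api/src/skillReport.ts.
const SKILL_REPORT_PATTERN = /^test-run-.*-SKILL-REPORT\.md$/;

// Hypothetical helper: keep only blob names that look like skill reports.
// Assumption: blob names may carry a path prefix, so only the last path
// segment is tested against the anchored pattern.
function filterSkillReportNames(blobNames: string[]): string[] {
  return blobNames.filter((name) => {
    const baseName = name.split("/").pop() ?? name;
    return SKILL_REPORT_PATTERN.test(baseName);
  });
}

// Illustrative names only; the real container layout may differ.
const sampleNames = [
  "2026-04-07/test-run-123-SKILL-REPORT.md",
  "2026-04-07/test-run-123-results.json",
  "2026-04-07/SKILL-REPORT.md",
];
const matchedNames = filterSkillReportNames(sampleNames);
console.log(matchedNames);
```

Keeping the regex in one module means a naming change only has to be made once, which is the drift risk the description refers to.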

msalaman marked this pull request as ready for review April 7, 2026 22:57
Copilot AI review requested due to automatic review settings April 7, 2026 22:57

Copilot AI left a comment

Pull request overview

Adds a new “Confidence Level per Skill” panel to the repository health dashboard by extracting the “Average Confidence” metric from per-skill SKILL-REPORT markdown files and surfacing it via the dashboard API.

Changes:

  • Adds a new dashboard panel section for “Confidence Level per Skill”.
  • Extends the dashboard client to fetch/render the per-skill confidence metric with pass/fail filtering and a progress-bar visualization.
  • Updates the /api/test-results/{date} function to parse SKILL-REPORT.md blobs and include averageConfidence in the per-skill JSON response.
| File | Description |
| --- | --- |
| dashboard/index.html | Adds the new “Confidence Level per Skill” panel container to the dashboard layout. |
| dashboard/assets/dashboard.js | Implements loading + rendering logic for the confidence-per-skill panel and hooks it into initialization. |
| dashboard/api/src/functions/getTestResults.ts | Parses SKILL-REPORT.md files to compute averageConfidence and returns it alongside existing skill stats. |

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 3

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings April 7, 2026 23:43

Copilot AI left a comment

Pull request overview

Adds a new Dashboard panel that surfaces “Confidence Level per Skill” by extracting the Average Confidence metric from per-skill *-SKILL-REPORT.md artifacts and exposing it via the existing test-results API.

Changes:

  • Added a new “Confidence Level per Skill” panel section to the dashboard HTML layout.
  • Implemented client-side loading/rendering for the confidence table (including filtering/collapsing behavior).
  • Extended GET /api/test-results/{date} to parse *-SKILL-REPORT.md files and return averageConfidence per skill.
| File | Description |
| --- | --- |
| dashboard/index.html | Adds a new dashboard panel container for “Confidence Level per Skill”. |
| dashboard/assets/dashboard.js | Fetches averageConfidence from /api/test-results/{date} and renders a sortable, filterable list with progress bars. |
| dashboard/api/src/functions/getTestResults.ts | Collects SKILL-REPORT blobs, extracts “Average Confidence”, and returns it as averageConfidence in the API response. |
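
The extraction step described in this overview could look roughly like the sketch below. The report layout is not shown in this conversation, so the sketch assumes the markdown contains a line such as `**Average Confidence:** 85.5%` or a table row such as `| Average Confidence | 85.5% |`. The function name `extractAverageConfidence` is hypothetical and is not taken from getTestResults.ts.

```typescript
// Hypothetical parser. Assumption: the report states the metric as a
// percentage on the same line as the label "Average Confidence".
function extractAverageConfidence(markdown: string): number | null {
  // Skip any non-digit characters between the label and the number
  // (colons, asterisks, table pipes), but never cross a line break.
  const match = /Average Confidence[^0-9\n]*(\d+(?:\.\d+)?)\s*%/i.exec(markdown);
  if (!match) return null;
  const value = Number.parseFloat(match[1]);
  // Reject values outside the 0 to 100 range the dashboard client expects.
  return value >= 0 && value <= 100 ? value : null;
}

const fromBoldLine = extractAverageConfidence("**Average Confidence:** 85.5%");
const fromTableRow = extractAverageConfidence("| Average Confidence | 90% |");
const fromMissing = extractAverageConfidence("# Report\nNo metrics recorded.");
console.log(fromBoldLine, fromTableRow, fromMissing);
```

Returning `null` when the metric is absent lines up with the client code, which skips skills whose averageConfidence is null or undefined.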

Copilot's findings

  • Files reviewed: 3/3 changed files
  • Comments generated: 2

Comment on lines +1482 to +1510
 * A skill is considered passing when its average rate is >= SIR_THRESHOLD (80%).
 */
async function loadConfidenceLevelPerSkill() {
  const section = document.getElementById("panel-confidence-level");
  if (!section) return;

  try {
    const { latestDate, skillResults } = await fetchLatestTestResults();

    if (!latestDate) {
      renderConfidenceLevelPanel(section, [], "skip", 0, 0, null);
      return;
    }

    // Build one entry per skill using its averageConfidence from the SKILL-REPORT
    const skills = [];
    for (const [skillName, stats] of Object.entries(skillResults)) {
      if (stats.averageConfidence === null || stats.averageConfidence === undefined) continue;
      // averageConfidence is stored as 0–100 in the API response
      skills.push({ skillName, rate: stats.averageConfidence / 100 });
    }

    // Sort: below threshold first (worst first), then ascending within each group
    skills.sort(function (a, b) {
      const aPass = a.rate >= SIR_THRESHOLD;
      const bPass = b.rate >= SIR_THRESHOLD;
      if (aPass !== bPass) return aPass ? 1 : -1;
      return a.rate - b.rate;
    });

Copilot AI Apr 7, 2026

This panel reuses SIR_THRESHOLD / SIR_THRESHOLD_PCT (documented as the minimum passing rate for skill invocation tests) to judge/label “confidence level”. Even if the numeric threshold is intentionally the same (80%), it couples two different metrics and makes the code harder to reason about. Consider introducing a dedicated CONFIDENCE_THRESHOLD (and % constant) or renaming the shared constant to something metric-agnostic if it’s meant to apply to both.
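
One way the suggestion could be applied is sketched below. The names `CONFIDENCE_THRESHOLD`, `CONFIDENCE_THRESHOLD_PCT`, `SkillConfidence`, and `sortByConfidence` are hypothetical, and the 80% value is only assumed to match the current shared threshold. The sketch is written in TypeScript for consistency with the API code, although dashboard.js is plain JavaScript.

```typescript
// Hypothetical constants, kept separate from SIR_THRESHOLD so the two
// metrics can diverge later. The 80% value mirrors the shared threshold.
const CONFIDENCE_THRESHOLD = 0.8;
const CONFIDENCE_THRESHOLD_PCT = Math.round(CONFIDENCE_THRESHOLD * 100);

interface SkillConfidence {
  skillName: string;
  rate: number; // fraction in the range 0 to 1
}

// Same ordering as the panel code: below-threshold skills first,
// then ascending by rate within each group. Returns a new array.
function sortByConfidence(skills: SkillConfidence[]): SkillConfidence[] {
  return [...skills].sort((a, b) => {
    const aPass = a.rate >= CONFIDENCE_THRESHOLD;
    const bPass = b.rate >= CONFIDENCE_THRESHOLD;
    if (aPass !== bPass) return aPass ? 1 : -1;
    return a.rate - b.rate;
  });
}

// Illustrative data only.
const sortedSkills = sortByConfidence([
  { skillName: "a", rate: 0.8 },
  { skillName: "b", rate: 0.5 },
  { skillName: "c", rate: 0.79 },
  { skillName: "d", rate: 0.95 },
]);
console.log(CONFIDENCE_THRESHOLD_PCT, sortedSkills.map((s) => s.skillName));
```

With a dedicated constant, the confidence panel's pass/fail line can change without touching the skill invocation rate panel.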

Copilot AI review requested due to automatic review settings April 8, 2026 00:10

Copilot AI left a comment

Pull request overview

Adds a new dashboard panel to surface per-skill “Average Confidence” values (sourced from SKILL-REPORT artifacts) alongside existing skill health panels.

Changes:

  • Add a new “Confidence Level per Skill” panel to the dashboard UI and initialize it on page load.
  • Extend the dashboard API’s /api/test-results/{date} response to include averageConfidence per skill by parsing *-SKILL-REPORT.md blobs.
  • Render a per-skill confidence table with pass/fail thresholding and sorting.
| File | Description |
| --- | --- |
| dashboard/index.html | Adds a new panel section for “Confidence Level per Skill”. |
| dashboard/assets/dashboard.js | Fetches averageConfidence and renders the new confidence panel with threshold-based status. |
| dashboard/api/src/functions/getTestResults.ts | Parses SKILL-REPORT markdown to compute and return averageConfidence per skill. |

Copilot's findings

  • Files reviewed: 5/5 changed files
  • Comments generated: 1

Comment on lines +232 to +236
const stats = computeSkillStats(results);
const confValues = confidenceBySkill.get(skillName);
if (confValues && confValues.length > 0) {
  stats.averageConfidence = confValues.reduce((a, b) => a + b, 0) / confValues.length;
}

Copilot AI Apr 8, 2026

averageConfidence is computed as an unweighted average across all matching *-SKILL-REPORT.md files for a skill. On dates with multiple runs (or multiple azure-deploy test groups), this can skew the final value because each report contributes equally regardless of how many tests/runs it represents. Consider either selecting a single canonical report per skill (e.g., most recent) or computing a weighted average (e.g., weight by Total Individual Runs / Total Test Cases extracted from the report).
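
Both alternatives can be sketched as follows. The `ReportConfidence` shape, its `totalRuns` and `lastModified` fields, and both function names are hypothetical. The sketch assumes a run count (for example, Total Individual Runs) and a timestamp can be obtained for each report, which this conversation does not confirm.

```typescript
interface ReportConfidence {
  averageConfidence: number; // 0 to 100, as parsed from one report
  totalRuns: number; // hypothetical weight taken from the report
  lastModified: Date; // hypothetical timestamp of the report blob
}

// Weighted average: each report contributes in proportion to its runs.
function weightedAverageConfidence(reports: ReportConfidence[]): number | null {
  let weightedSum = 0;
  let totalWeight = 0;
  for (const report of reports) {
    if (report.totalRuns <= 0) continue; // ignore reports without runs
    weightedSum += report.averageConfidence * report.totalRuns;
    totalWeight += report.totalRuns;
  }
  return totalWeight > 0 ? weightedSum / totalWeight : null;
}

// Canonical report: use only the most recently modified one.
function mostRecentConfidence(reports: ReportConfidence[]): number | null {
  if (reports.length === 0) return null;
  const latest = reports.reduce((a, b) =>
    b.lastModified.getTime() > a.lastModified.getTime() ? b : a
  );
  return latest.averageConfidence;
}

// Illustrative data: a 1-run report at 90 and a 9-run report at 60.
const sampleReports: ReportConfidence[] = [
  { averageConfidence: 90, totalRuns: 1, lastModified: new Date("2026-04-07T10:00:00Z") },
  { averageConfidence: 60, totalRuns: 9, lastModified: new Date("2026-04-07T18:00:00Z") },
];
const weighted = weightedAverageConfidence(sampleReports);
const latestOnly = mostRecentConfidence(sampleReports);
console.log(weighted, latestOnly);
```

For this sample, the unweighted mean would be 75, while weighting by runs gives 63, which illustrates the skew the comment describes.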

Copilot AI changed the title from "Add table for confidence level per skill to Dashboard" to "Consolidate SKILL-REPORT filename matching into shared API module" Apr 8, 2026