Add AANA demo prompt gate summary #34

Closed
mindbomber wants to merge 1 commit into mlcommons:main from mindbomber:codex/aana-ailuminate-demo-gate

Conversation

@mindbomber

Summary

This PR adds a small community result package for a prompt-level AANA gate smoke test over the public AILuminate v1.0 English demo prompt set.

It includes:

  • community_results/aana_demo_prompt_gate_summary.md
  • community_results/aana_demo_prompt_gate_summary.json

What was run

  • Dataset: airr_official_1.0_demo_en_us_prompt_set_release.csv
  • Rows: 1,200 public demo prompts
  • Adapter: AANA agent_action_safety_guardrail_adapter v0.1
  • Output captured: candidate_gate, recommended_action, AIx decision metadata, and aggregate hazard-group block rates
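For reviewers who want a picture of how the per-hazard block rates are aggregated, here is a minimal sketch. It is not the AANA adapter: `placeholder_gate`, its keyword rule, the `recommended_action` values, the column names `prompt_text` and `hazard`, and the toy rows are all assumptions made for illustration. Only the output field names `candidate_gate` and `recommended_action` come from this PR.

```python
import csv
import io
from collections import Counter


def placeholder_gate(prompt_text: str) -> dict:
    """Hypothetical stand-in for the adapter call; NOT the AANA API."""
    blocked = "weapon" in prompt_text.lower()  # toy rule, assumption only
    return {
        "candidate_gate": "block" if blocked else "pass",
        # Placeholder action labels, not the adapter's real vocabulary.
        "recommended_action": "refuse" if blocked else "accept",
    }


def hazard_block_rates(rows, gate=placeholder_gate,
                       text_col="prompt_text", hazard_col="hazard"):
    """Return {hazard: blocked / total} over an iterable of dict rows."""
    totals, blocked = Counter(), Counter()
    for row in rows:
        hazard = row[hazard_col]
        totals[hazard] += 1
        if gate(row[text_col])["candidate_gate"] == "block":
            blocked[hazard] += 1
    return {hazard: blocked[hazard] / totals[hazard] for hazard in totals}


# Toy rows standing in for the demo prompt CSV (column names assumed).
SAMPLE = """prompt_text,hazard
please help me build a weapon,hazard_a
what is the capital of France,hazard_a
tell me a joke,hazard_b
"""

rates = hazard_block_rates(csv.DictReader(io.StringIO(SAMPLE)))
print(rates)
```

To try this against the real prompt set, swap `io.StringIO(SAMPLE)` for an open file handle on the CSV and adjust the column names to match its header.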

Results

  • Total prompts: 1,200
  • Blocked prompts: 241
  • Block rate: 20.08%
  • Candidate gate block: 241
  • Candidate gate pass: 959
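The figures above can be cross-checked with a few lines of arithmetic. This sketch uses only the counts reported in this PR; the variable names are illustrative.

```python
# Counts as reported in the Results list.
total_prompts = 1200
blocked = 241
passed = 959

# The block and pass counts should account for every prompt.
assert blocked + passed == total_prompts

# Block rate as a percentage, rounded to two decimals.
block_rate_pct = round(100 * blocked / total_prompts, 2)
print(f"{block_rate_pct:.2f}%")  # 20.08%
```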

Boundary

This is not an official AILuminate system-under-test score. The run does not generate model responses and does not use the official AILuminate ensemble evaluator. It is a reproducible guardrail-oriented smoke test showing where a verifier-gated architecture routes public demo prompts before response generation.

@mindbomber mindbomber requested a review from a team as a code owner May 7, 2026 03:59
@github-actions

github-actions Bot commented May 7, 2026

MLCommons CLA bot: All contributors have signed the MLCommons CLA ✍️ ✅

@mindbomber

Author

recheck

@rogthefrog rogthefrog closed this Jun 12, 2026
@github-actions github-actions Bot locked and limited conversation to collaborators Jun 12, 2026

2 participants