Pull requests: openai/evals

Pull requests list

Avoid mutating dicts in jsondumps
#1678 opened Jun 10, 2026 by RaghunandanKumar
Add BGPT REFUTE benchmark
#1674 opened Jun 5, 2026 by connerlambden
Fix EVALS_SHOW_EVAL_PROGRESS env var parsing
#1670 opened May 21, 2026 by LeSingh1
Refresh contribution documentation references
#1662 opened May 15, 2026 by MukundaKatta
feat: add MiniMax provider with M3 as default model
#1661 opened May 13, 2026 by octo-patch
chore: remove obsolete GPT-4 PR guidance
#1659 opened May 11, 2026 by extrasmall0
Add agent pre-action control eval
#1658 opened May 10, 2026 by mindbomber
13 tasks done
Add atr_prompt_injection eval (modelgraded safety, 16 multilingual samples)
#1657 opened May 10, 2026 by eeee2345
13 tasks done
Add agent-tool-abstention eval (13 samples, Match template)
#1656 opened May 8, 2026 by MukundaKatta
5 tasks done
Add agent-tool-routing eval (12 samples, Match template)
#1655 opened May 8, 2026 by MukundaKatta
5 tasks done
Fix OpenAI completion args routing
#1653 opened Apr 23, 2026 by kayametehan
Add explain mode to HumanCliSolver
#1652 opened Apr 23, 2026 by kayametehan
Handle nested token usage details in oaieval
#1650 opened Apr 23, 2026 by kayametehan
Add Turkish proverbs eval
#1649 opened Apr 23, 2026 by kayametehan
eval: add RAIL Score responsible AI evaluation across 8 dimensions
#1640 opened Apr 2, 2026 by SumitVermakgp
12 tasks done