Pull requests: scitix/sieval

feat(c-eval): add dataset and few-shot base-model task
#15 opened Jun 24, 2026 by jack-scitix-ai Contributor
14 tasks done
feat(livecodebench): add dataset and few-shot base-model task
#14 opened Jun 23, 2026 by jack-scitix-ai Contributor
14 tasks done
feat(ifbench): add dataset and few-shot base-model task
#13 opened Jun 23, 2026 by jack-scitix-ai Contributor
18 tasks done
feat(mbpp): add dataset and few-shot base-model task
#12 opened Jun 22, 2026 by jack-scitix-ai Contributor
14 tasks done
feat(ruler): add RULER long-context benchmark (datasets + tasks)
#11 opened Jun 22, 2026 by Mea1Ma
21 tasks done
feat(cmmlu): add CMMLU few-shot base-model task and dataset
#10 opened Jun 22, 2026 by jack-scitix-ai Contributor
13 tasks done
feat(datasets): pin HF revisions + checksum URL datasets, enforced in preflight
#8 opened Jun 20, 2026 by ethan-scitix Collaborator
12 tasks done
feat(datasets): add stratified_select operation for group-balanced sampling
#7 opened Jun 20, 2026 by ethan-scitix Collaborator
9 tasks done
feat(humaneval): add HumanEval 0-shot base model task
#6 opened Jun 18, 2026 by jack-scitix-ai Contributor
14 tasks done
feat(session): tolerate throughput-only config diffs on --resume
#4 opened Jun 17, 2026 by ethan-scitix Collaborator
9 tasks done
feat(theoremqa): add dataset and k-shot base-model task
#3 opened Jun 17, 2026 by jack-scitix-ai Contributor
15 tasks done