Goals
Expand Scope: Unify and expand embedding evaluation across text, image, audio, and video, including cross-modal settings and new applications.
Improve Quality: Measure capabilities better through improved datasets, robustness evaluations, and harder benchmarks.
Increase Efficiency: Develop faster and more informative evaluation methodologies and infrastructure.
Strengthen Governance: Improve maintainability, reproducibility, trust, and open benchmark practices.
Tracks
1. Modality Expansion (MOEB)
Unify MMTEB, MIEB, MAEB, and MVEB under an omni-modal benchmark
Add important missing models and datasets to MTEB, MIEB, MAEB, and MVEB
Expand cross-modal evaluations for missing combinations
2. Quality
Refresh saturated benchmarks.
Improve dataset quality and filtering.
Build harder and more robust evaluations.
Contamination: autodetect dataset contamination #1636
Better measurement: BrowseComp, MTEB-gym
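As a rough illustration of what automatic contamination detection could look like, here is a minimal sketch based on word n-gram overlap between benchmark texts and a reference corpus. This is one common heuristic and not necessarily the approach discussed in #1636; the function names, whitespace tokenization, and the default `n=8` are all assumptions made for the example.

```python
def ngrams(text, n=8):
    """Return the set of lowercase word n-grams in `text` (whitespace tokenized)."""
    tokens = text.lower().split()
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def contamination_rate(test_docs, train_docs, n=8):
    """Fraction of test documents sharing at least one n-gram with the reference corpus.

    Hypothetical helper: a real detector would likely need better tokenization,
    normalization, and a scalable index instead of an in-memory set.
    """
    train_grams = set()
    for doc in train_docs:
        train_grams |= ngrams(doc, n)
    if not test_docs:
        return 0.0
    flagged = sum(1 for doc in test_docs if ngrams(doc, n) & train_grams)
    return flagged / len(test_docs)
```

A per-dataset rate like this would only flag verbatim overlap; paraphrased or translated leakage would need a different signal.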
3. Efficiency & Methodology
Faster evaluation pipelines and infrastructure.
Informative task selection.
Benchmark compression.
IRT and optimal experiment design.
Partial-score estimation.
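To make the IRT item concrete, here is a minimal sketch of how item response theory could drive informative task selection. It uses the standard two-parameter logistic (2PL) model, where the Fisher information of an item at ability θ is a² · P(θ) · (1 − P(θ)). Treating benchmark tasks as IRT items with a discrimination `a` and difficulty `b`, and a model's overall quality as θ, is an assumption for illustration; the names below are hypothetical.

```python
import math


def p_correct(theta, a, b):
    """2PL probability that a model with ability `theta` succeeds on an item."""
    return 1.0 / (1.0 + math.exp(-a * (theta - b)))


def item_information(theta, a, b):
    """Fisher information of a 2PL item at ability `theta`: a^2 * P * (1 - P)."""
    p = p_correct(theta, a, b)
    return a * a * p * (1.0 - p)


def select_tasks(items, theta, k):
    """Return the names of the k items most informative at `theta`.

    `items` maps a task name to its (a, b) parameters, which are assumed
    to have been fitted beforehand.
    """
    ranked = sorted(
        items,
        key=lambda name: item_information(theta, *items[name]),
        reverse=True,
    )
    return ranked[:k]
```

Under this model, tasks whose difficulty sits near the ability range of current models carry the most information, which is one way to argue for dropping saturated tasks from a compressed benchmark.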
4. Governance
5. Human Annotation Baselines (Optional)
Ideas to be Built on Top of MOEB
AutoResearch with Sentence Transformers.
Explore new domains (e.g. RAG, agents).