    Repositories list

    • Reproduction code for paper "MineExplorer: Evaluating Open-World Exploration of MLLM Agents in Minecraft"
      Python
      01210 Updated Jun 12, 2026
    • WBench

      Public
      WBench: A Comprehensive Multi-turn Benchmark for Interactive Video World Model Evaluation
      Python
      MIT License
      515610 Updated Jun 11, 2026
    • MIT License
      671.3k121 Updated Jun 11, 2026
    • LARYBench

      Public
      Python
      MIT License
      815040 Updated Jun 10, 2026
    • Python
      MIT License
      22910 Updated Jun 4, 2026
    • Python
      MIT License
      6834.3k607 Updated May 27, 2026
    • Meeseeks

      Public
      An iterative, feedback-driven benchmark of LLMs' instruction-following ability
      Python
      65710 Updated May 25, 2026
    • MIT License
      2928520 Updated May 13, 2026
    • R-HORIZON

      Public
      [ICLR'26] R-HORIZON: How Far Can Your Large Reasoning Model Really Go in Breadth and Depth?
      Python
      MIT License
      32600 Updated May 9, 2026
    • A flagship 560-billion-parameter open-source MoE model that advances Native Formal Reasoning in Lean4 for Mathematics Formalization and Proving through agentic …
      Python
      MIT License
      98610 Updated May 9, 2026
    • Python
      Apache License 2.0
      60695110 Updated May 9, 2026
    • Python
      MIT License
      1800 Updated May 9, 2026
    • LongCat Audio Tokenizer and Detokenizer
      Python
      MIT License
      2330110 Updated May 9, 2026
    • This is the official repo for the paper "LongCat-Flash-Omni Technical Report"
      Python
      MIT License
      32492182 Updated May 9, 2026
    • MIT License
      1125440 Updated May 9, 2026
    • MIT License
      2243690 Updated May 9, 2026
    • Python
      63100 Updated Apr 29, 2026
    • Python
      Apache License 2.0
      68420 Updated Apr 29, 2026
    • PIE_bench

      Public
      PIEbench is a multilingual benchmark that evaluates models' performance on both general and regional QAs.
      1400 Updated Apr 21, 2026
    • This is the official repo for the paper "General365: Benchmarking General Reasoning in LLMs under High Difficulty and Diversity".
      Python
      MIT License
      38010 Updated Apr 14, 2026
    • Python
      MIT License
      47522132 Updated Apr 3, 2026
    • vitabench

      Public
      [ICLR 2026] VitaBench: Benchmarking LLM Agents with Versatile Interactive Tasks in Real-world Applications
      Python
      MIT License
      1614562 Updated Feb 22, 2026
    • eps

      Public
      C++
      1201 Updated Feb 6, 2026
    • mscclpp

      Public
      MSCCL++: A GPU-driven communication stack for scalable AI applications
      C++
      MIT License
      100000 Updated Feb 6, 2026
    • DeepGEMM

      Public
      DeepGEMM: clean and efficient FP8 GEMM kernels with fine-grained scaling
      Cuda
      MIT License
      1k000 Updated Feb 6, 2026
    • AMO-Bench

      Public
      This is the official repo for the paper "AMO-Bench: Large Language Models Still Struggle in High School Math Competitions".
      Python
      MIT License
      312810 Updated Feb 6, 2026
    • 0000 Updated Feb 5, 2026
    • Fast Hadamard transform in CUDA, with a PyTorch interface
      C
      BSD 3-Clause "New" or "Revised" License
      63000 Updated Feb 5, 2026
    • DeepEP

      Public
      DeepEP: an efficient expert-parallel communication library
      Cuda
      MIT License
      1.3k000 Updated Feb 5, 2026
    • FlashInfer: Kernel Library for LLM Serving
      Python
      Apache License 2.0
      1k000 Updated Feb 5, 2026