    Repositories list

    • The official repository for Toxic Commons and Celadon. Toxicity Classification for public domain data.
      Python
      MIT License
      22212 Updated Jun 10, 2026
    • Adaptation of nanotron to run on tractoai
      Python
      Apache License 2.0
      318004 Updated Jun 10, 2026
    • Minimalistic large language model 3D-parallelism training
      Python
      Apache License 2.0
      318301 Updated Jun 10, 2026
    • Python library to use Pleias-RAG models
      Python
      Other
      97242 Updated Jun 10, 2026
    • Python
      57154 Updated May 7, 2026
    • Python
      6000 Updated Apr 9, 2026
    • Fine-tune a small model to extract and harmonize funding entities from scientific acknowledgements
      Python
      0100 Updated Mar 27, 2026
    • Python
      0000 Updated Apr 28, 2025
    • Python
      11810 Updated Feb 25, 2025
    • Python
      0400 Updated Feb 6, 2025
    • Collection of resources for RL and Reasoning
      Jupyter Notebook
      22700 Updated Feb 3, 2025
    • A library for open-source data processing tools to create language model training datasets
      0500 Updated Jan 9, 2025
    • RAG-Eval
      Python
      0000 Updated Jan 7, 2025
    • Project provides an API for correcting mistakes in written essays using an AI model
      Python
      0300 Updated Dec 20, 2024
    • An introduction to LLM Sampling
      Jupyter Notebook
      MIT License
      38000 Updated Dec 15, 2024
    • Jupyter Notebook
      Apache License 2.0
      1500 Updated Dec 12, 2024
    • Testing language adherence of models, i.e. how much they are able to continue the generation in the desired output language.
      R
      Apache License 2.0
      0100 Updated Dec 5, 2024
    • logos
      0000 Updated Dec 4, 2024
    • Set of scripts to generate synthetic correction/rewriting of problematic OCR
      0100 Updated Apr 12, 2024
    • Set of scripts to finetune LLMs
      Python
      33800 Updated Mar 30, 2024
    • Jupyter Notebook
      MIT License
      36740 Updated Mar 4, 2024
    • OCRoscope
      Small Python package to measure OCR quality and other related metrics.
      Python
      MIT License
      32700 Updated Feb 19, 2024
    • MIT License
      0000 Updated Feb 18, 2024