sekopylov

Sergei Kopylov (sekopylov)

Research Engineer @yandex | LLM post-training

0 followers · 5 following

Pinned

from-unsolvable-to-solvable-rlvr Public

RLVR experiment: test if training on hard tasks with hints helps solve hard tasks without hints (generalization). Paired GRPO runs compare hint-mixed training vs no-hint control on the same no-hint…

Python

Practical_RL Public

Forked from yandexdataschool/Practical_RL

A course in reinforcement learning in the wild

Jupyter Notebook