Research Engineer @yandex | LLM post-training
Pinned
-
from-unsolvable-to-solvable-rlvr (Public)
RLVR experiment: tests whether training on hard tasks with hints helps solve hard tasks without hints (generalization). Paired GRPO runs compare hint-mixed training vs. a no-hint control on the same no-hint…
Python
-
Practical_RL (Public)
Forked from yandexdataschool/Practical_RL
A course in reinforcement learning in the wild
Jupyter Notebook
