Paper: ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle
Authors: Mihran Miroyan*, Rose Niousha*, Joseph E. Gonzalez, Gireeja Ranade, Narges Norouzi (UC Berkeley)
TL;DR: We study student modeling by generating student-like code submissions. We find that fine-tuning is essential for "unlearning" professional code style and learning to code like a student, and that careful evaluation is needed to capture the different facets of student-like code.
- Evaluation metrics. We introduce a set of metrics spanning code semantics, error types, and code style to evaluate the realism of "student-like" code.
- Sequential code modeling. We fine-tune on low- and high-resolution streams of student code to simulate realistic learning trajectories at different levels of granularity.
- Fine-tuning vs. prompting. Models fine-tuned on student code for specific homework problems outperform prompting-only models across the proposed metrics.
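To make the metric categories above concrete, here is a toy sketch of two of them (error type and code style) for Python submissions. The function names and heuristics are hypothetical stand-ins for illustration only; the paper's actual metric implementations live in evals/ and may differ substantially.

```python
# Hypothetical illustration of "error type" and "code style" signals for a
# student submission. Not the paper's implementation; see evals/ for that.
import ast


def error_type(code: str) -> str:
    """Classify a submission's coarsest failure mode: does it even parse?"""
    try:
        ast.parse(code)
        return "no_syntax_error"
    except SyntaxError:
        return "syntax_error"


def style_features(code: str) -> dict:
    """Toy style signals: average line length and comment density."""
    lines = [line for line in code.splitlines() if line.strip()]
    if not lines:
        return {"avg_line_len": 0.0, "comment_ratio": 0.0}
    avg_len = sum(len(line) for line in lines) / len(lines)
    comments = sum(1 for line in lines if line.lstrip().startswith("#"))
    return {"avg_line_len": avg_len, "comment_ratio": comments / len(lines)}


# A typical student-like slip: a typo in a keyword yields a syntax error.
student_like = "def f(x):\n  retrun x+1"
print(error_type(student_like))  # syntax_error
```

A realism evaluation would then compare the distribution of such signals between generated and real student submissions, rather than scoring a single snippet in isolation.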
Please read our paper for more details. This repo contains the fine-tuning, generation, and evaluation code. We do not release the real or synthetic data, or our data-processing pipeline, due to student privacy.
Install all necessary dependencies: `pip3 install -r requirements.txt`
fine_tuning/: Fine-tuning prompt templates and the training script (see more details in fine_tuning/README.md).
data_generation/: Prompt templates and script for data generation (see more details in data_generation/README.md).
evals/: Scripts for running evaluations, computing metrics, and visualizing results (see more details in evals/README.md).
@misc{miroyan2025parastudentgeneratingevaluatingrealistic,
  title={ParaStudent: Generating and Evaluating Realistic Student Code by Teaching LLMs to Struggle},
  author={Mihran Miroyan and Rose Niousha and Joseph E. Gonzalez and Gireeja Ranade and Narges Norouzi},
  year={2025},
  eprint={2507.12674},
  archivePrefix={arXiv},
  primaryClass={cs.CY},
  url={https://arxiv.org/abs/2507.12674},
}
