Mujin Kwun

Mujin Kwun is an Engineering Fellow on the Kempner Institute Research and Engineering Team at Harvard University and a Research Fellow at the Harvard Architecture, Circuits, and Compilers Lab. His research focuses on improving the efficiency of large language models, and he is a co-author of work on energy-based fine-tuning for language models.

Energy-Based Fine-Tuning Improves Accuracy Without RLVR’s Validation-Loss Penalty

Mujin Kwun and Carles Domingo-Enrich present energy-based fine-tuning (EBFT), a post-training method that replaces next-token imitation and task-specific rewards with sequence-level feature matching. They argue that supervised fine-tuning (SFT) is efficient but trained under teacher forcing, so the model is conditioned only on reference prefixes and never learns from its own generations. Reinforcement learning with verifiable rewards (RLVR), by contrast, can improve accuracy without preserving the distribution of target completions. EBFT instead samples rollouts from the model, embeds them and the reference completions with a frozen model, compares the two sets of features, and uses that comparison as the signal for policy-gradient updates. In the reported coding and translation experiments, EBFT matched or exceeded RLVR accuracy while achieving lower validation cross-entropy than both RLVR and SFT.
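The loop described above (sample rollouts, embed them with a frozen model, compare them with reference embeddings, turn the comparison into a policy-gradient signal) can be illustrated with a toy sketch. It assumes a simple moment-matching objective, the squared distance between the mean rollout feature and the mean reference feature, which is a stand-in and not necessarily the exact energy Kwun and Domingo-Enrich use. Every name here (`embed`, `feature_matching_advantages`, `train`, `model_distance`) is hypothetical: `embed` is a placeholder for frozen-model features built from a fixed random projection of byte counts, and the "policy" is just a softmax over three fixed candidate completions.

```python
import math
import random
from functools import lru_cache
from typing import List, Sequence

Vector = List[float]


@lru_cache(maxsize=None)
def _projection(dim: int, seed: int) -> tuple:
    # Fixed random projection: one Gaussian vector per byte value (frozen, never trained).
    rng = random.Random(seed)
    return tuple(tuple(rng.gauss(0.0, 1.0) for _ in range(dim)) for _ in range(256))


def embed(text: str, dim: int = 8, seed: int = 0) -> Vector:
    """Hypothetical stand-in for frozen-model features: projected byte counts, unit-normalized."""
    proj = _projection(dim, seed)
    v = [0.0] * dim
    for b in text.encode("utf-8"):
        for j in range(dim):
            v[j] += proj[b][j]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]


def mean(vectors: Sequence[Vector]) -> Vector:
    return [sum(col) / len(vectors) for col in zip(*vectors)]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def sq_dist(a: Vector, b: Vector) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def feature_matching_advantages(rollouts: Sequence[str], references: Sequence[str]) -> List[float]:
    """Per-rollout scalars for a score-function step that lowers ||mu_model - mu_ref||^2.

    Since grad ||mu - mu_ref||^2 = E[2 (mu - mu_ref) . phi(y) * grad log pi(y)],
    each rollout y gets the scalar -2 (mu - mu_ref) . phi(y). Here mu is estimated
    from the same batch, which adds a small bias (acceptable for a sketch).
    """
    phi = [embed(y) for y in rollouts]
    diff = [m - r for m, r in zip(mean(phi), mean([embed(y) for y in references]))]
    raw = [-2.0 * dot(diff, p) for p in phi]
    baseline = sum(raw) / len(raw)  # mean-reward baseline for variance reduction
    return [r - baseline for r in raw]


def softmax(logits: Vector) -> Vector:
    m = max(logits)
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def train(candidates: Sequence[str], references: Sequence[str],
          steps: int = 200, batch: int = 8, lr: float = 0.5, seed: int = 0) -> Vector:
    """REINFORCE on a toy policy: a softmax over a fixed list of candidate completions."""
    rng = random.Random(seed)
    logits = [0.0] * len(candidates)
    for _ in range(steps):
        probs = softmax(logits)
        idx = rng.choices(range(len(candidates)), weights=probs, k=batch)  # sample rollouts
        adv = feature_matching_advantages([candidates[i] for i in idx], references)
        grad = [0.0] * len(candidates)
        for i, a in zip(idx, adv):
            # Gradient of log softmax(logits)[i] with respect to the logits is onehot(i) - probs.
            for k in range(len(candidates)):
                grad[k] += a * ((1.0 if k == i else 0.0) - probs[k]) / batch
        logits = [z + lr * g for z, g in zip(logits, grad)]
    return softmax(logits)


def model_distance(probs: Vector, candidates: Sequence[str], references: Sequence[str]) -> float:
    """Exact ||E_pi[phi] - mean phi(ref)||^2 for the toy policy."""
    phis = [embed(c) for c in candidates]
    mu = [sum(p * v[j] for p, v in zip(probs, phis)) for j in range(len(phis[0]))]
    return sq_dist(mu, mean([embed(r) for r in references]))


if __name__ == "__main__":
    candidates = ["def add(a, b): return a + b", "def add(a, b): return a - b", "print('hello')"]
    references = ["def add(a, b): return a + b"]
    trained = train(candidates, references)
    print("distance before:", model_distance([1 / 3] * 3, candidates, references))
    print("distance after: ", model_distance(trained, candidates, references))
    print("policy:", [round(p, 3) for p in trained])
```

No verifier or task-specific reward appears anywhere in this sketch: the reference completions alone define the target, and the update pushes the policy's expected features toward theirs. A real implementation would use a frozen language model's hidden states for `embed` and token-level log-probabilities for the policy gradient.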

Microsoft Research | May 26, 2026 | 18 min read