Energy-Based Fine-Tuning Trains Language Models on Whole Responses
In a Microsoft Research presentation on energy-based fine-tuning (EBFT), Carles Domingo-Enrich argues that language-model post-training can target whole responses rather than next-token imitation. He frames EBFT as a middle path between supervised fine-tuning (SFT) and reinforcement learning: the method samples completions from the model, compares them with ground-truth answers in a model-derived feature space, and turns that comparison into a policy-gradient update, with no separate reward model or verifier. The reported results show gains over SFT on several coding and translation benchmarks, with performance often comparable to reinforcement learning with verifiable rewards (RLVR) while avoiding explicit correctness rewards.
Microsoft Research · May 14, 2026
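
The presentation does not include code, but the description maps onto a familiar REINFORCE-style loop. The sketch below is a minimal, assumption-laden illustration, not the actual EBFT algorithm: `TinyLM`, `features`, `sample`, and `ebft_step` are hypothetical names; cosine similarity over mean-pooled hidden states stands in for the unspecified "model-derived feature space"; and a mean-subtracted baseline stands in for whatever variance reduction the method actually uses.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyLM(nn.Module):
    """A toy causal LM standing in for the policy being fine-tuned."""
    def __init__(self, vocab=100, d=64):
        super().__init__()
        self.emb = nn.Embedding(vocab, d)
        self.rnn = nn.GRU(d, d, batch_first=True)
        self.head = nn.Linear(d, vocab)

    def forward(self, ids):
        h, _ = self.rnn(self.emb(ids))
        return self.head(h), h  # next-token logits and hidden states

def features(model, ids):
    # Mean-pooled hidden states as the response embedding -- an assumed
    # choice; the talk only says "model-derived feature space".
    with torch.no_grad():
        _, h = model(ids)
    return F.normalize(h.mean(dim=1), dim=-1)

def sample(model, prompt, length=10):
    # Autoregressively sample a completion, accumulating per-token
    # log-probs so the policy gradient can flow through them.
    ids, logps = prompt, []
    for _ in range(length):
        logits, _ = model(ids)
        dist = torch.distributions.Categorical(logits=logits[:, -1])
        tok = dist.sample()
        logps.append(dist.log_prob(tok))
        ids = torch.cat([ids, tok.unsqueeze(1)], dim=1)
    return ids, torch.stack(logps, dim=1).sum(dim=1)

def ebft_step(model, opt, prompt, gold, k=4):
    # Draw k completions, score each against the gold answer in feature
    # space, and use the baseline-subtracted score as the policy-gradient
    # weight -- no separate reward model or verifier involved.
    samples, logp = sample(model, prompt.repeat(k, 1))
    sim = (features(model, samples) * features(model, gold.repeat(k, 1))).sum(-1)
    advantage = sim - sim.mean()  # simple mean baseline (assumed)
    loss = -(advantage.detach() * logp).mean()
    opt.zero_grad()
    loss.backward()
    opt.step()
    return loss.item(), sim.mean().item()

model = TinyLM()
opt = torch.optim.Adam(model.parameters(), lr=1e-4)
prompt = torch.randint(0, 100, (1, 5))
gold = torch.randint(0, 100, (1, 10))
print(ebft_step(model, opt, prompt, gold))
```

The one property the sketch tries to preserve from the talk's description is that the update signal comes from directly comparing sampled and ground-truth responses, rather than from a learned reward model or a programmatic correctness check.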