UFT: Unifying Supervised and Reinforcement Fine-Tuning

Best AI papers explained - A podcast by Enoch H. Kang

This paper introduces Unified Fine-Tuning (UFT), a novel method for enhancing the reasoning capabilities of large language models (LLMs) by integrating supervised fine-tuning (SFT) and reinforcement fine-tuning (RFT). The authors argue that each approach has limitations on its own: SFT is prone to overfitting, while RFT is constrained by the base model's initial capacity. UFT addresses these issues by blending memorization through supervised signals (hints) with exploration through reinforcement learning, enabling more effective knowledge acquisition and generalization. Theoretical analysis suggests that UFT yields an exponential improvement in sample complexity over standard RFT on long-horizon reasoning tasks. Empirical results across model sizes and tasks show that UFT consistently outperforms both SFT and RFT, combining the strengths of the two approaches.
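
To make the blending idea concrete, here is a minimal PyTorch sketch of one possible combined objective: a supervised log-likelihood term on a hint prefix of the solution (memorization) plus a REINFORCE-style term, weighted by an outcome reward, on the tokens the model generates after the hint (exploration). The function name, the fixed `hint_len`, and the `sft_weight` coefficient are illustrative assumptions rather than the paper's exact algorithm, which also controls how the hint and the two signals are scheduled over training.

```python
import torch
import torch.nn.functional as F

# Illustrative sketch of a UFT-style combined objective, assuming a causal LM
# that produced per-token logits for a trajectory whose prompt already
# contained the first `hint_len` tokens of a reference solution.
def uft_step(logits, target_ids, hint_len, reward, sft_weight=1.0):
    """
    logits:     (T, V) per-token logits for the sampled trajectory.
    target_ids: (T,) token ids; positions [0, hint_len) are the supervised hint,
                positions [hint_len, T) were sampled by the policy itself.
    reward:     scalar outcome reward (e.g. 1.0 if the final answer is correct).
    """
    log_probs = F.log_softmax(logits, dim=-1)
    token_logp = log_probs.gather(1, target_ids.unsqueeze(1)).squeeze(1)  # (T,)

    # Supervised signal: maximize likelihood of the hint tokens (memorization).
    sft_loss = -token_logp[:hint_len].mean() if hint_len > 0 else logits.new_zeros(())

    # Reinforcement signal: REINFORCE on the self-generated continuation (exploration).
    rl_loss = -(reward * token_logp[hint_len:]).mean()

    return sft_weight * sft_loss + rl_loss


if __name__ == "__main__":
    T, V = 12, 50                       # toy sequence length and vocabulary size
    logits = torch.randn(T, V, requires_grad=True)
    targets = torch.randint(0, V, (T,))
    loss = uft_step(logits, targets, hint_len=4, reward=1.0)
    loss.backward()                     # gradients flow through both terms
    print(float(loss))
```

In this toy setup, setting `hint_len` to the full solution length recovers a purely supervised update, while `hint_len = 0` recovers a purely reward-driven update; intermediate values interpolate between the two regimes.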