Self-Evolving Curriculum for LLM Reasoning

Best AI papers explained - A podcast by Enoch H. Kang

This document presents Self-Evolving Curriculum (SEC), a novel method for reinforcement learning (RL) fine-tuning of large language models (LLMs) that enhances their reasoning capabilities. SEC frames curriculum selection as a non-stationary multi-armed bandit (MAB) problem in which each problem category is an individual "arm." A curriculum policy is learned concurrently with LLM training, using the absolute advantage from policy-gradient updates as a proxy for learning gain to dynamically decide which categories of problems to sample next. The paper demonstrates SEC's effectiveness across planning, inductive reasoning, and mathematics, showing improved generalization to harder problems and better skill balance in multi-domain training.
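
To make the mechanism concrete, here is a minimal sketch of a non-stationary bandit over problem categories: arms are sampled via a Boltzmann (softmax) distribution, and each arm's value is updated toward the mean absolute advantage of a batch drawn from that category. The exponential-moving-average update, temperature, step size, and category names are illustrative assumptions, not the paper's exact formulation.

```python
import math
import random


class CurriculumBandit:
    """Non-stationary multi-armed bandit over problem categories.

    Each arm is a problem category; its value tracks recent learning gain,
    approximated here by the mean absolute advantage of rollouts sampled
    from that category.
    """

    def __init__(self, categories, step_size=0.1, temperature=1.0):
        self.categories = list(categories)
        self.q = {c: 0.0 for c in self.categories}  # estimated learning gain per category
        self.step_size = step_size
        self.temperature = temperature

    def sample_category(self):
        # Softmax over value estimates: higher estimated gain -> sampled more often.
        logits = [self.q[c] / self.temperature for c in self.categories]
        m = max(logits)
        weights = [math.exp(l - m) for l in logits]
        total = sum(weights)
        probs = [w / total for w in weights]
        return random.choices(self.categories, weights=probs, k=1)[0]

    def update(self, category, abs_advantages):
        # Reward signal: mean absolute advantage of the batch from this category.
        reward = sum(abs_advantages) / len(abs_advantages)
        # Moving-average update keeps estimates responsive as the policy
        # (and therefore the learning gain) shifts during training.
        self.q[category] += self.step_size * (reward - self.q[category])


if __name__ == "__main__":
    bandit = CurriculumBandit(["planning", "induction", "math_easy", "math_hard"])
    for step in range(5):
        cat = bandit.sample_category()
        # Placeholder values: in real training these would be the |advantage|
        # estimates from RL rollouts on problems drawn from `cat`.
        fake_abs_adv = [random.random() for _ in range(8)]
        bandit.update(cat, fake_abs_adv)
        print(step, cat, round(bandit.q[cat], 3))
```

In this sketch, categories whose batches keep producing large absolute advantages (i.e., problems the model is still actively learning from) are sampled more often, while mastered or currently-too-hard categories fade, which is the intuition behind using advantage magnitude as a learning-gain signal.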