Inverse Scaling in Test-Time Compute
Best AI papers explained - A podcast by Enoch H. Kang

This paper explores the phenomenon of inverse scaling in Large Reasoning Models (LRMs), demonstrating that longer reasoning processes can, surprisingly, degrade performance across a range of tasks. The authors identify several failure modes: models become distracted by irrelevant information, overfit to problem framings, or amplify spurious correlations in the data. Experiments on simple counting, regression, and deduction tasks show how extended reasoning can produce less accurate answers and, in some models, even amplify concerning AI behaviors such as self-preservation instincts. The research suggests that simply increasing test-time compute does not reliably improve LRM capabilities, highlighting the need for improved evaluation protocols and training methodologies that address these problematic reasoning patterns.
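To make the evaluation idea concrete, here is a minimal Python sketch (not the paper's actual code) of how one might probe for inverse scaling: run the same task set at several reasoning-token budgets and check whether accuracy falls as the budget grows. The function query_model is a hypothetical stand-in for whatever model API you use, and the toy model below merely simulates the "distracted by irrelevant information" failure mode.

from typing import Callable

def accuracy_by_budget(
    tasks: list[tuple[str, str]],            # (prompt, expected_answer) pairs
    query_model: Callable[[str, int], str],  # (prompt, reasoning_budget) -> answer
    budgets: list[int],
) -> dict[int, float]:
    """Measure task accuracy at each reasoning-token budget."""
    results = {}
    for budget in budgets:
        correct = sum(
            query_model(prompt, budget).strip() == expected
            for prompt, expected in tasks
        )
        results[budget] = correct / len(tasks)
    return results

def shows_inverse_scaling(results: dict[int, float]) -> bool:
    """True if accuracy at the largest budget falls below accuracy at the smallest."""
    budgets = sorted(results)
    return results[budgets[-1]] < results[budgets[0]]

if __name__ == "__main__":
    # Toy stand-in model: answers correctly at short budgets but drifts to a
    # wrong answer once given a large reasoning budget, mimicking distraction.
    def toy_model(prompt: str, budget: int) -> str:
        return "2" if budget <= 1024 else "27"

    tasks = [("You have an apple and an orange. How many fruits do you have?", "2")]
    results = accuracy_by_budget(tasks, toy_model, budgets=[256, 1024, 4096])
    print(results)                         # {256: 1.0, 1024: 1.0, 4096: 0.0}
    print(shows_inverse_scaling(results))  # True

Swapping toy_model for a real API call would let you reproduce the accuracy-versus-compute curves the paper reports; the key design choice is holding the task set fixed while varying only the reasoning budget.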