Past-Token Prediction for Long-Context Robot Policies
Best AI papers explained - A podcast by Enoch H. Kang

This research presents Past Token Prediction (PTP), an auxiliary objective designed to improve long-context diffusion policies for robots that learn tasks through imitation. The core idea is to explicitly train the policy to predict past actions alongside future ones, which counters the tendency of modern diffusion policies to fail to capture strong temporal dependencies. A multi-stage training strategy separates visual encoder training from long-context policy training, using cached embeddings to improve efficiency. At inference time, PTP doubles as a self-verification mechanism: among candidate action sequences, the policy selects the one whose predicted past actions best match the actions it actually executed. Experiments show the method significantly boosts both performance and training speed across simulated and real-world tasks, especially those requiring memory of past events.
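The self-verification step described above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the diffusion policy is mocked as a sampler of candidate chunks, and all names (`sample_candidates`, `self_verify`, the chunk sizes) are hypothetical. The core idea it shows is real, though: each candidate chunk contains predicted past actions plus proposed future actions, and the candidate whose past portion best matches the actually executed actions is selected.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions: 4 past steps, 8 future steps, 2-D actions.
PAST, FUTURE, DIM = 4, 8, 2
executed_past = rng.normal(size=(PAST, DIM))  # actions the robot actually took

def sample_candidates(k):
    """Mock stand-in for sampling K chunks from a PTP diffusion policy.

    Each chunk has shape (PAST + FUTURE, DIM): predicted past actions
    followed by proposed future actions. Candidates are drawn with
    varying noise around the executed past, purely for illustration.
    """
    return [
        np.vstack([
            executed_past + rng.normal(scale=s, size=(PAST, DIM)),
            rng.normal(size=(FUTURE, DIM)),
        ])
        for s in np.linspace(0.01, 1.0, k)
    ]

def self_verify(candidates, past):
    """PTP-style self-verification: score each candidate by how well its
    predicted past actions reconstruct the executed past, then return
    only the future portion of the best-scoring candidate."""
    errors = [np.mean((c[:PAST] - past) ** 2) for c in candidates]
    best = int(np.argmin(errors))
    return candidates[best][PAST:]

future_actions = self_verify(sample_candidates(8), executed_past)
print(future_actions.shape)  # (8, 2)
```

In this toy setup the least-noisy candidate wins, but with a real policy the same scoring would filter out samples that are inconsistent with the trajectory so far.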