Optimal Pure Exploration in Linear Bandits via Sampling
Best AI papers explained - A podcast by Enoch H. Kang - Tuesdays

This research addresses the challenge of efficient exploration in linear bandit problems, where the goal is to identify the optimal action using as few measurements as possible. Existing asymptotically optimal methods often require computationally intensive steps, such as projections or maintaining explicit subsets of candidate actions. The paper introduces a new algorithm, PEPS, that achieves asymptotic optimality using only sampling and argmax oracles, the same primitives behind the much simpler Thompson Sampling. Whereas Thompson Sampling is suboptimal for pure exploration, PEPS pairs a sampling distribution with an online learner that steers exploration toward informative actions. Theoretical analysis shows that PEPS attains an exponential convergence rate matching that of the optimal fixed allocation, and preliminary experiments suggest this sampling-based approach performs competitively with existing, more complex algorithms across a range of scenarios.
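For a concrete picture of what "sampling and argmax oracles" means here, the sketch below shows the standard Thompson Sampling loop for a linear bandit, which uses exactly those two primitives. This is an illustrative sketch under assumed Gaussian-posterior and unit-noise conditions, not the paper's PEPS algorithm; the action set, dimensions, and update rule are all assumptions for the example.

```python
import numpy as np

# Illustrative sketch (not the paper's PEPS algorithm): the two oracles the
# summary refers to, as they appear in Thompson Sampling for a linear bandit.
# Assumptions: Gaussian posterior over the parameter theta, unit-variance
# rewards, and a finite action set given as the rows of `actions`.

rng = np.random.default_rng(0)

d = 3                                   # feature dimension (assumed)
actions = rng.standard_normal((20, d))  # finite action set, one row per arm
theta_true = rng.standard_normal(d)     # unknown parameter generating rewards

# Gaussian posterior N(mu, Sigma) over theta, starting from a standard prior.
Sigma_inv = np.eye(d)
b = np.zeros(d)

for t in range(500):
    # Sampling oracle: draw a parameter from the current posterior.
    Sigma = np.linalg.inv(Sigma_inv)
    mu = Sigma @ b
    theta_sample = rng.multivariate_normal(mu, Sigma)

    # Argmax oracle: pick the action that is best under the sampled parameter.
    x = actions[np.argmax(actions @ theta_sample)]

    # Observe a noisy linear reward and update the posterior.
    reward = x @ theta_true + rng.standard_normal()
    Sigma_inv += np.outer(x, x)
    b += reward * x

# Report the action the posterior mean currently favors.
mu = np.linalg.inv(Sigma_inv) @ b
print("estimated best arm:", np.argmax(actions @ mu))
```

Per the summary, this naive sample-then-argmax loop is what makes Thompson Sampling suboptimal for pure exploration; PEPS keeps the same oracle interface but uses a sampling distribution guided by an online learner, which is what recovers the exponential convergence rate of the optimal fixed allocation.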