Title: Non-parametric exploration in multi-armed bandits
Abstract: The multi-armed bandit (MAB) model is useful for sequential resource allocation tasks in a stochastic environment, e.g., the design of a recommendation algorithm or an adaptive clinical trial. This simple model also captures the exploration/exploitation dilemma that is central to more structured reinforcement learning problems. The two most famous approaches to MABs, namely Upper Confidence Bounds and Thompson Sampling, both need some prior information about the arms' distributions in order to attain optimal performance. We will discuss other families of algorithms based on re-sampling, and in particular sub-sampling, that perform well in practice and can be proved to be optimal for different families of distributions. Moreover, they can also be used when the reward maximization objective is modified to take into account some notion of risk.
Bio: Emilie Kaufmann is a CNRS researcher in the CRIStAL lab at Université de Lille (France) and a member of the Inria team Scool. Her research interests lie in statistics and machine learning, with an emphasis on sequential learning. She is an expert in the theory of multi-armed bandits and is now more broadly interested in reinforcement learning.