Authors: Anirudh Vemula, Wen Sun, J. Andrew Bagnell
ArXiv: 1901.11503
Abstract URL: http://arxiv.org/abs/1901.11503v1
Black-box optimizers that explore in parameter space have often been shown to
outperform more sophisticated action space exploration methods developed
specifically for the reinforcement learning problem. We examine these black-box
methods closely to identify situations in which they are worse than action
space exploration methods and those in which they are superior. Through simple
theoretical analyses, we prove that the complexity of exploration in parameter
space depends on the dimensionality of the parameter space, while the complexity
of exploration in action space depends on both the dimensionality of the action
space and the horizon length. This is also demonstrated empirically by comparing simple
exploration methods on several model problems, including Contextual Bandit,
Linear Regression and Reinforcement Learning in continuous control.