In this paper, we consider a best action identification problem in the
stochastic linear bandit setup with a fixed confidence constraint. In the
considered best action identification problem, instead of minimizing the
cumulative regret as done in existing works, the learner aims to obtain an
accurate estimate of the underlying parameter based on his action and reward
sequences. To improve the estimation efficiency, the learner is allowed to
select his action based on his historical information; hence the whole procedure
is designed in a sequentially adaptive manner. We first show that existing
algorithms designed to minimize the cumulative regret do not provide a consistent
estimate of the underlying parameter and hence are not good policies for our problem. We then characterize
a lower bound on the estimation error achievable by any policy. We further design a
simple policy and show that its estimation error achieves the same scaling order
as the derived lower bound.