MULEX: Disentangling Exploitation from Exploration in Deep RL

lib:c2d725407aa92d6f (v1.0.0)

Authors: Lucas Beyer, Damien Vincent, Olivier Teboul, Sylvain Gelly, Matthieu Geist, Olivier Pietquin
ArXiv: 1907.00868
Abstract URL: https://arxiv.org/abs/1907.00868v1

An agent learning through interactions should balance its action selection process between probing the environment to discover new rewards and using the information acquired in the past to adopt useful behaviour. This trade-off is usually obtained by perturbing either the agent's actions (e.g., ε-greedy or Gibbs sampling) or the agent's parameters (e.g., NoisyNet), or by modifying the reward it receives (e.g., exploration bonus, intrinsic motivation, or hand-shaped rewards). Here, we adopt a disruptive but simple and generic perspective, where we explicitly disentangle exploration and exploitation. Different losses are optimized in parallel, one of them coming from the true objective (maximizing cumulative rewards from the environment) and others being related to exploration. Every loss is used in turn to learn a policy that generates transitions, all shared in a single replay buffer. Off-policy methods are then applied to these transitions to optimize each loss. We showcase our approach on a hard-exploration environment, show its sample-efficiency and robustness, and discuss further implications.
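
The scheme in the abstract (several losses, policies that act in turn, one shared replay buffer, off-policy updates for every loss) can be illustrated with a small tabular sketch. This is not the authors' implementation: the six-state chain environment, the count-based exploration bonus, the per-episode alternation schedule, and every name below (`step`, `greedy`, `train`, `visits`) are assumptions chosen only to keep the example short and runnable.

```python
import random
from collections import defaultdict

N_STATES = 6        # hypothetical chain: reward only at the right end
ACTIONS = (0, 1)    # 0 = move left, 1 = move right


def step(state, action):
    """Deterministic chain dynamics; the episode ends at the last state."""
    nxt = max(0, state - 1) if action == 0 else min(N_STATES - 1, state + 1)
    done = nxt == N_STATES - 1
    return nxt, (1.0 if done else 0.0), done


def greedy(q, state, rng):
    """Greedy action with random tie-breaking."""
    best = max(q[state])
    return rng.choice([a for a in ACTIONS if q[state][a] == best])


def train(episodes=200, horizon=20, batch=32, alpha=0.5, gamma=0.9, seed=0):
    rng = random.Random(seed)
    # One Q-table per objective: the true reward and an exploration bonus.
    q = {name: [[0.0, 0.0] for _ in range(N_STATES)]
         for name in ("exploit", "explore")}
    visits = defaultdict(int)   # state visit counts for the assumed bonus
    buffer = []                 # single replay buffer shared by both objectives

    for ep in range(episodes):
        # The objectives take turns driving the behaviour policy.
        acting = "exploit" if ep % 2 == 0 else "explore"
        state = 0
        for _ in range(horizon):
            action = greedy(q[acting], state, rng)
            nxt, reward, done = step(state, action)
            visits[nxt] += 1
            buffer.append((state, action, reward, nxt, done))
            state = nxt

            # Off-policy Q-learning: every objective learns from the same
            # sampled transitions, whichever policy generated them.
            for s, a, r, s2, d in rng.sample(buffer, min(batch, len(buffer))):
                bonus = 1.0 / visits[s2] ** 0.5   # assumed count-based bonus
                for name, signal in (("exploit", r), ("explore", bonus)):
                    target = signal + (0.0 if d else gamma * max(q[name][s2]))
                    q[name][s][a] += alpha * (target - q[name][s][a])
            if done:
                break
    return q, buffer


if __name__ == "__main__":
    q, buffer = train()
    print("exploit Q at start state:", q["exploit"][0])
    print("transitions collected:", len(buffer))
```

Because both Q-tables are updated from the same buffer, transitions gathered while the exploration objective is acting also improve the exploitation estimate, which is the sharing the abstract describes. The paper itself targets deep RL; a tabular learner is used here only for brevity.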
