A Meta-MDP Approach to Exploration for Lifelong Reinforcement Learning | Francisco Garcia · Philip Thomas
Limiting Extrapolation in Linear Approximate Value Iteration | Andrea Zanette · Alessandro Lazaric · Mykel J Kochenderfer · Emma Brunskill
Propagating Uncertainty in Reinforcement Learning via Wasserstein Barycenters | Alberto Maria Metelli · Amarildo Likmeta · Marcello Restelli
Provably Efficient Q-Learning with Low Switching Cost | Yu Bai · Tengyang Xie · Nan Jiang · Yu-Xiang Wang
Regret Bounds for Learning State Representations in Reinforcement Learning | Ronald Ortner · Matteo Pirotta · Alessandro Lazaric · Ronan Fruit · Odalric-Ambrym Maillard
Safe Exploration for Interactive Machine Learning | Matteo Turchetta · Felix Berkenkamp · Andreas Krause
Successor Uncertainties: Exploration and Uncertainty in Temporal Difference Learning | David Janz · Jiri Hron · Przemysław Mazur · Katja Hofmann · José Miguel Hernández-Lobato · Sebastian Tschiatschek
Almost Horizon-Free Structure-Aware Best Policy Identification with a Generative Model | Andrea Zanette · Mykel J Kochenderfer · Emma Brunskill
Better Exploration with Optimistic Actor Critic | Kamil Ciosek · Quan Vuong · Robert Loftin · Katja Hofmann
Provably Efficient Q-learning with Function Approximation via Distribution Shift Error Checking Oracle | Simon Du · Yuping Luo · Ruosong Wang · Hanrui Zhang
Explicit Planning for Efficient Exploration in Reinforcement Learning | Liangpeng Zhang · Ke Tang · Xin Yao
Exploration Bonus for Regret Minimization in Discrete and Continuous Average Reward MDPs | Jian Qian · Ronan Fruit · Matteo Pirotta · Alessandro Lazaric
Information-Theoretic Confidence Bounds for Reinforcement Learning | Xiuyuan Lu · Benjamin Van Roy
Worst-Case Regret Bounds for Exploration via Randomized Value Functions | Daniel Russo