Convergent Policy Optimization for Safe Reinforcement Learning | Ming Yu · Zhuoran Yang · Mladen Kolar · Zhaoran Wang |
Experience Replay for Continual Learning | David Rolnick · Arun Ahuja · Jonathan Schwarz · Timothy Lillicrap · Gregory Wayne |
Exploration via Hindsight Goal Generation | Zhizhou Ren · Kefan Dong · Yuan Zhou · Qiang Liu · Jian Peng |
Hindsight Credit Assignment | Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos |
Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement | Chao Yang · Xiaojian Ma · Wenbing Huang · Fuchun Sun · Huaping Liu · Junzhou Huang · Chuang Gan |
Importance Resampling for Off-policy Prediction | Matthew Schlegel · Wesley Chung · Daniel Graves · Jian Qian · Martha White |
Learning Compositional Neural Programs with Recursive Tree Search and Planning | Thomas Pierrot · Guillaume Ligner · Scott Reed · Olivier Sigaud · Nicolas Perrin · Alexandre Laterre · David Kas · Karim Beguir · Nando de Freitas |
Multi-View Reinforcement Learning | Minne Li · Lisheng Wu · Jun Wang · Haitham Bou Ammar |
Real-Time Reinforcement Learning | Simon Ramstedt · Chris Pal |
Reconciling λ-Returns with Experience Replay | Brett Daley · Christopher Amato |
Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function | Zihan Zhang · Xiangyang Ji |
Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update | Su Young Lee · Sungik Choi · Sae-Young Chung |
Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling | Andrey Kolobov · Yuval Peres · Cheng Lu · Eric Horvitz |
Trust Region-Guided Proximal Policy Optimization | Yuhui Wang · Hao He · Xiaoyang Tan · Yaozhong Gan |
Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning | Harm van Seijen · Mehdi Fatemi · Arash Tavakoli |
A Geometric Perspective on Optimal Representations for Reinforcement Learning | Marc Bellemare · Will Dabney · Robert Dadashi · Adrien Ali Taiga · Pablo Samuel Castro · Nicolas Le Roux · Dale Schuurmans · Tor Lattimore · Clare Lyle |
A Regularized Approach to Sparse Optimal Policy in Reinforcement Learning | Wenhao Yang · Xiang Li · Zhihua Zhang |
Constrained Reinforcement Learning Has Zero Duality Gap | Santiago Paternain · Luiz Chamon · Miguel Calvo-Fullana · Alejandro Ribeiro |
Distributional Reward Decomposition for Reinforcement Learning | Zichuan Lin · Li Zhao · Derek Yang · Tao Qin · Tie-Yan Liu · Guangwen Yang |
Divergence-Augmented Policy Optimization | Qing Wang · Yingru Li · Jiechao Xiong · Tong Zhang |
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections | Ofir Nachum · Yinlam Chow · Bo Dai · Lihong Li |
Fast Efficient Hyperparameter Tuning for Policy Gradient Methods | Supratik Paul · Vitaly Kurin · Shimon Whiteson |
Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning | Harsh Gupta · R. Srikant · Lei Ying |
Fully Parameterized Quantile Function for Distributional Reinforcement Learning | Derek Yang · Li Zhao · Zichuan Lin · Tao Qin · Jiang Bian · Tie-Yan Liu |
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning | Nathan Kallus · Masatoshi Uehara |
Learning Reward Machines for Partially Observable Reinforcement Learning | Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Sheila McIlraith |
Off-Policy Evaluation via Off-Policy Classification | Alexander Irpan · Kanishka Rao · Konstantinos Bousmalis · Chris Harris · Julian Ibarz · Sergey Levine |
SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies | Seyed Kamyar Seyed Ghasemipour · Shixiang (Shane) Gu · Richard Zemel |
Variance Reduced Policy Evaluation with Smooth Function Approximation | Hoi-To Wai · Mingyi Hong · Zhuoran Yang · Zhaoran Wang · Kexin Tang |
VIREL: A Variational Inference Framework for Reinforcement Learning | Matthew Fellows · Anuj Mahajan · Tim G. J. Rudner · Shimon Whiteson |
Budgeted Reinforcement Learning in Continuous State Space | Nicolas Carrara · Edouard Leurent · Romain Laroche · Tanguy Urvoy · Odalric-Ambrym Maillard · Olivier Pietquin |
Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory | Bin Hu · Usman Syed |
From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization | Krzysztof M Choromanski · Aldo Pacchiano · Jack Parker-Holder · Yunhao Tang · Vikas Sindhwani |
Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards | Alexander Trott · Stephan Zheng · Caiming Xiong · Richard Socher |
Learning from Trajectories via Subgoal Discovery | Sujoy Paul · Jeroen van Baar · Amit Roy-Chowdhury |
Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning | Gregory Farquhar · Shimon Whiteson · Jakob Foerster |
Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling | Tengyang Xie · Yifei Ma · Yu-Xiang Wang |
Meta-Inverse Reinforcement Learning with Probabilistic Context Variables | Lantao Yu · Tianhe Yu · Chelsea Finn · Stefano Ermon |
Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy | Boyi Liu · Qi Cai · Zhuoran Yang · Zhaoran Wang |
Neural Temporal-Difference Learning Converges to Global Optima | Qi Cai · Zhuoran Yang · Jason Lee · Zhaoran Wang |
Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost | Zhuoran Yang · Yongxin Chen · Mingyi Hong · Zhaoran Wang |
Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning | Wenjie Shi · Shiji Song · Hui Wu · Ya-Chu Hsu · Cheng Wu · Gao Huang |
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction | Aviral Kumar · Justin Fu · George Tucker · Sergey Levine |
Surrogate Objectives for Batch Policy Optimization in One-step Decision Making | Minmin Chen · Ramki Gummadi · Chris Harris · Dale Schuurmans |
Discovery of Useful Questions as Auxiliary Tasks | Vivek Veeriah · Matteo Hessel · Zhongwen Xu · Janarthanan Rajendran · Richard L Lewis · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh |
A Composable Specification Language for Reinforcement Learning Tasks | Kishor Jothimurugan · Rajeev Alur · Osbert Bastani |
A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation | Runzhe Yang · Xingyuan Sun · Karthik Narasimhan |
A Kernel Loss for Solving the Bellman Equation | Yihao Feng · Lihong Li · Qiang Liu |
Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates | Carlos Riquelme · Hugo Penedones · Damien Vincent · Hartmut Maennel · Sylvain Gelly · Timothy A Mann · Andre Barreto · Gergely Neu |
Curriculum-guided Hindsight Experience Replay | Meng Fang · Tianyi Zhou · Yali Du · Lei Han · Zhengyou Zhang |
Distributional Policy Optimization: An Alternative Approach for Continuous Control | Chen Tessler · Guy Tennenholtz · Shie Mannor |
Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation | Samuel Ainsworth · Matt Barnes · Siddhartha Srinivasa |
Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck | Maximilian Igl · Kamil Ciosek · Yingzhen Li · Sebastian Tschiatschek · Cheng Zhang · Sam Devlin · Katja Hofmann |
Goal-conditioned Imitation Learning | Yiming Ding · Carlos Florensa · Pieter Abbeel · Mariano Phielipp |
Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning | Mahmoud ("Mido") Assran · Joshua Romoff · Nicolas Ballas · Joelle Pineau · Mike Rabbat |
Imitation-Projected Programmatic Reinforcement Learning | Abhinav Verma · Hoang Le · Yisong Yue · Swarat Chaudhuri |
Reinforcement Learning with Convex Constraints | Sobhan Miryoosefi · Kianté Brantley · Hal Daume III · Miro Dudik · Robert Schapire |
RUDDER: Return Decomposition for Delayed Rewards | Jose A. Arjona-Medina · Michael Gillhofer · Michael Widrich · Thomas Unterthiner · Johannes Brandstetter · Sepp Hochreiter |
Shaping Belief States with Generative Environment Models for RL | Karol Gregor · Danilo Jimenez Rezende · Frederic Besse · Yan Wu · Hamza Merzic · Aaron van den Oord |
Towards Interpretable Reinforcement Learning Using Attention Augmented Agents | Alexander Mott · Daniel Zoran · Mike Chrzanowski · Daan Wierstra · Danilo Jimenez Rezende |