Convergent Policy Optimization for Safe Reinforcement Learning | Ming Yu · Zhuoran Yang · Mladen Kolar · Zhaoran Wang |
Experience Replay for Continual Learning | David Rolnick · Arun Ahuja · Jonathan Schwarz · Timothy Lillicrap · Gregory Wayne |
Exploration via Hindsight Goal Generation | Zhizhou Ren · Kefan Dong · Yuan Zhou · Qiang Liu · Jian Peng |
Hindsight Credit Assignment | Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos |
Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement | Chao Yang · Xiaojian Ma · Wenbing Huang · Fuchun Sun · Huaping Liu · Junzhou Huang · Chuang Gan |
Importance Resampling for Off-policy Prediction | Matthew Schlegel · Wesley Chung · Daniel Graves · Jian Qian · Martha White |
Learning Compositional Neural Programs with Recursive Tree Search and Planning | Thomas Pierrot · Guillaume Ligner · Scott Reed · Olivier Sigaud · Nicolas Perrin · Alexandre Laterre · David Kas · Karim Beguir · Nando de Freitas |
Multi-View Reinforcement Learning | Minne Li · Lisheng Wu · Jun Wang · Haitham Bou Ammar |
Real-Time Reinforcement Learning | Simon Ramstedt · Chris Pal |
Reconciling λ-Returns with Experience Replay | Brett Daley · Christopher Amato |
Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function | Zihan Zhang · Xiangyang Ji |
Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update | Su Young Lee · Sungik Choi · Sae-Young Chung |
Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling | Andrey Kolobov · Yuval Peres · Cheng Lu · Eric Horvitz |
Trust Region-Guided Proximal Policy Optimization | Yuhui Wang · Hao He · Xiaoyang Tan · Yaozhong Gan |
Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning | Harm van Seijen · Mehdi Fatemi · Arash Tavakoli |
A Geometric Perspective on Optimal Representations for Reinforcement Learning | Marc Bellemare · Will Dabney · Robert Dadashi · Adrien Ali Taiga · Pablo Samuel Castro · Nicolas Le Roux · Dale Schuurmans · Tor Lattimore · Clare Lyle |
A Regularized Approach to Sparse Optimal Policy in Reinforcement Learning | Wenhao Yang · Xiang Li · Zhihua Zhang |
Constrained Reinforcement Learning Has Zero Duality Gap | Santiago Paternain · Luiz Chamon · Miguel Calvo-Fullana · Alejandro Ribeiro |
Distributional Reward Decomposition for Reinforcement Learning | Zichuan Lin · Li Zhao · Derek Yang · Tao Qin · Tie-Yan Liu · Guangwen Yang |
Divergence-Augmented Policy Optimization | Qing Wang · Yingru Li · Jiechao Xiong · Tong Zhang |
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections | Ofir Nachum · Yinlam Chow · Bo Dai · Lihong Li |
Fast Efficient Hyperparameter Tuning for Policy Gradient Methods | Supratik Paul · Vitaly Kurin · Shimon Whiteson |
Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning | Harsh Gupta · R. Srikant · Lei Ying |
Fully Parameterized Quantile Function for Distributional Reinforcement Learning | Derek Yang · Li Zhao · Zichuan Lin · Tao Qin · Jiang Bian · Tie-Yan Liu |
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning | Nathan Kallus · Masatoshi Uehara |
Learning Reward Machines for Partially Observable Reinforcement Learning | Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Sheila McIlraith |
Off-Policy Evaluation via Off-Policy Classification | Alexander Irpan · Kanishka Rao · Konstantinos Bousmalis · Chris Harris · Julian Ibarz · Sergey Levine |
SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies | Seyed Kamyar Seyed Ghasemipour · Shixiang (Shane) Gu · Richard Zemel |
Variance Reduced Policy Evaluation with Smooth Function Approximation | Hoi-To Wai · Mingyi Hong · Zhuoran Yang · Zhaoran Wang · Kexin Tang |
VIREL: A Variational Inference Framework for Reinforcement Learning | Matthew Fellows · Anuj Mahajan · Tim G. J. Rudner · Shimon Whiteson |
Budgeted Reinforcement Learning in Continuous State Space | Nicolas Carrara · Edouard Leurent · Romain Laroche · Tanguy Urvoy · Odalric-Ambrym Maillard · Olivier Pietquin |
Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory | Bin Hu · Usman Syed |
From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization | Krzysztof M Choromanski · Aldo Pacchiano · Jack Parker-Holder · Yunhao Tang · Vikas Sindhwani |
Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards | Alexander Trott · Stephan Zheng · Caiming Xiong · Richard Socher |
Learning from Trajectories via Subgoal Discovery | Sujoy Paul · Jeroen van Baar · Amit Roy-Chowdhury |
Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning | Gregory Farquhar · Shimon Whiteson · Jakob Foerster |
Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling | Tengyang Xie · Yifei Ma · Yu-Xiang Wang |
Meta-Inverse Reinforcement Learning with Probabilistic Context Variables | Lantao Yu · Tianhe Yu · Chelsea Finn · Stefano Ermon |
Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy | Boyi Liu · Qi Cai · Zhuoran Yang · Zhaoran Wang |
Neural Temporal-Difference Learning Converges to Global Optima | Qi Cai · Zhuoran Yang · Jason Lee · Zhaoran Wang |
Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost | Zhuoran Yang · Yongxin Chen · Mingyi Hong · Zhaoran Wang |
Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning | Wenjie Shi · Shiji Song · Hui Wu · Ya-Chu Hsu · Cheng Wu · Gao Huang |
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction | Aviral Kumar · Justin Fu · George Tucker · Sergey Levine |
Surrogate Objectives for Batch Policy Optimization in One-step Decision Making | Minmin Chen · Ramki Gummadi · Chris Harris · Dale Schuurmans |
Discovery of Useful Questions as Auxiliary Tasks | Vivek Veeriah · Matteo Hessel · Zhongwen Xu · Janarthanan Rajendran · Richard L Lewis · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh |
A Composable Specification Language for Reinforcement Learning Tasks | Kishor Jothimurugan · Rajeev Alur · Osbert Bastani |
A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation | Runzhe Yang · Xingyuan Sun · Karthik Narasimhan |
A Kernel Loss for Solving the Bellman Equation | Yihao Feng · Lihong Li · Qiang Liu |
Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates | Carlos Riquelme · Hugo Penedones · Damien Vincent · Hartmut Maennel · Sylvain Gelly · Timothy A Mann · Andre Barreto · Gergely Neu |
Curriculum-guided Hindsight Experience Replay | Meng Fang · Tianyi Zhou · Yali Du · Lei Han · Zhengyou Zhang |
Distributional Policy Optimization: An Alternative Approach for Continuous Control | Chen Tessler · Guy Tennenholtz · Shie Mannor |
Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation | Samuel Ainsworth · Matt Barnes · Siddhartha Srinivasa |
Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck | Maximilian Igl · Kamil Ciosek · Yingzhen Li · Sebastian Tschiatschek · Cheng Zhang · Sam Devlin · Katja Hofmann |
Goal-conditioned Imitation Learning | Yiming Ding · Carlos Florensa · Pieter Abbeel · Mariano Phielipp |
Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning | Mahmoud ("Mido") Assran · Joshua Romoff · Nicolas Ballas · Joelle Pineau · Mike Rabbat |
Imitation-Projected Programmatic Reinforcement Learning | Abhinav Verma · Hoang Le · Yisong Yue · Swarat Chaudhuri |
Reinforcement Learning with Convex Constraints | Sobhan Miryoosefi · Kianté Brantley · Hal Daume III · Miro Dudik · Robert Schapire |
RUDDER: Return Decomposition for Delayed Rewards | Jose A. Arjona-Medina · Michael Gillhofer · Michael Widrich · Thomas Unterthiner · Johannes Brandstetter · Sepp Hochreiter |
Shaping Belief States with Generative Environment Models for RL | Karol Gregor · Danilo Jimenez Rezende · Frederic Besse · Yan Wu · Hamza Merzic · Aaron van den Oord |
Towards Interpretable Reinforcement Learning Using Attention Augmented Agents | Alexander Mott · Daniel Zoran · Mike Chrzanowski · Daan Wierstra · Danilo Jimenez Rezende |