Reinforcement Learning and Planning · Reinforcement Learning

Title | Authors
Convergent Policy Optimization for Safe Reinforcement Learning | Ming Yu · Zhuoran Yang · Mladen Kolar · Zhaoran Wang
Experience Replay for Continual Learning | David Rolnick · Arun Ahuja · Jonathan Schwarz · Timothy Lillicrap · Gregory Wayne
Exploration via Hindsight Goal Generation | Zhizhou Ren · Kefan Dong · Yuan Zhou · Qiang Liu · Jian Peng
Hindsight Credit Assignment | Anna Harutyunyan · Will Dabney · Thomas Mesnard · Mohammad Gheshlaghi Azar · Bilal Piot · Nicolas Heess · Hado van Hasselt · Gregory Wayne · Satinder Singh · Doina Precup · Remi Munos
Imitation Learning from Observations by Minimizing Inverse Dynamics Disagreement | Chao Yang · Xiaojian Ma · Wenbing Huang · Fuchun Sun · Huaping Liu · Junzhou Huang · Chuang Gan
Importance Resampling for Off-policy Prediction | Matthew Schlegel · Wesley Chung · Daniel Graves · Jian Qian · Martha White
Learning Compositional Neural Programs with Recursive Tree Search and Planning | Thomas Pierrot · Guillaume Ligner · Scott Reed · Olivier Sigaud · Nicolas Perrin · Alexandre Laterre · David Kas · Karim Beguir · Nando de Freitas
Multi-View Reinforcement Learning | Minne Li · Lisheng Wu · Jun Wang · Haitham Bou Ammar
Real-Time Reinforcement Learning | Simon Ramstedt · Chris Pal
Reconciling λ-Returns with Experience Replay | Brett Daley · Christopher Amato
Regret Minimization for Reinforcement Learning by Evaluating the Optimal Bias Function | Zihan Zhang · Xiangyang Ji
Sample-Efficient Deep Reinforcement Learning via Episodic Backward Update | Su Young Lee · Choi Sungik · Sae-Young Chung
Staying up to Date with Online Content Changes Using Reinforcement Learning for Scheduling | Andrey Kolobov · Yuval Peres · Cheng Lu · Eric Horvitz
Trust Region-Guided Proximal Policy Optimization | Yuhui Wang · Hao He · Xiaoyang Tan · Yaozhong Gan
Using a Logarithmic Mapping to Enable Lower Discount Factors in Reinforcement Learning | Harm Van Seijen · Mehdi Fatemi · Arash Tavakoli
A Geometric Perspective on Optimal Representations for Reinforcement Learning | Marc Bellemare · Will Dabney · Robert Dadashi · Adrien Ali Taiga · Pablo Samuel Castro · Nicolas Le Roux · Dale Schuurmans · Tor Lattimore · Clare Lyle
A Regularized Approach to Sparse Optimal Policy in Reinforcement Learning | Wenhao Yang · Xiang Li · Zhihua Zhang
Constrained Reinforcement Learning Has Zero Duality Gap | Santiago Paternain · Luiz Chamon · Miguel Calvo-Fullana · Alejandro Ribeiro
Distributional Reward Decomposition for Reinforcement Learning | Zichuan Lin · Li Zhao · Derek Yang · Tao Qin · Tie-Yan Liu · Guangwen Yang
Divergence-Augmented Policy Optimization | Qing Wang · Yingru Li · Jiechao Xiong · Tong Zhang
DualDICE: Behavior-Agnostic Estimation of Discounted Stationary Distribution Corrections | Ofir Nachum · Yinlam Chow · Bo Dai · Lihong Li
Fast Efficient Hyperparameter Tuning for Policy Gradient Methods | Supratik Paul · Vitaly Kurin · Shimon Whiteson
Finite-Time Performance Bounds and Adaptive Learning Rate Selection for Two Time-Scale Reinforcement Learning | Harsh Gupta · R. Srikant · Lei Ying
Fully Parameterized Quantile Function for Distributional Reinforcement Learning | Derek Yang · Li Zhao · Zichuan Lin · Tao Qin · Jiang Bian · Tie-Yan Liu
Intrinsically Efficient, Stable, and Bounded Off-Policy Evaluation for Reinforcement Learning | Nathan Kallus · Masatoshi Uehara
Learning Reward Machines for Partially Observable Reinforcement Learning | Rodrigo Toro Icarte · Ethan Waldie · Toryn Klassen · Rick Valenzano · Margarita Castro · Sheila McIlraith
Off-Policy Evaluation via Off-Policy Classification | Alexander Irpan · Kanishka Rao · Konstantinos Bousmalis · Chris Harris · Julian Ibarz · Sergey Levine
SMILe: Scalable Meta Inverse Reinforcement Learning through Context-Conditional Policies | Seyed Kamyar Seyed Ghasemipour · Shixiang (Shane) Gu · Richard Zemel
Variance Reduced Policy Evaluation with Smooth Function Approximation | Hoi-To Wai · Mingyi Hong · Zhuoran Yang · Zhaoran Wang · Kexin Tang
VIREL: A Variational Inference Framework for Reinforcement Learning | Matthew Fellows · Anuj Mahajan · Tim G. J. Rudner · Shimon Whiteson
Budgeted Reinforcement Learning in Continuous State Space | Nicolas Carrara · Edouard Leurent · Romain Laroche · Tanguy Urvoy · Odalric-Ambrym Maillard · Olivier Pietquin
Characterizing the Exact Behaviors of Temporal Difference Learning Algorithms Using Markov Jump Linear System Theory | Bin Hu · Usman Syed
From Complexity to Simplicity: Adaptive ES-Active Subspaces for Blackbox Optimization | Krzysztof M Choromanski · Aldo Pacchiano · Jack Parker-Holder · Yunhao Tang · Vikas Sindhwani
Keeping Your Distance: Solving Sparse Reward Tasks Using Self-Balancing Shaped Rewards | Alexander Trott · Stephan Zheng · Caiming Xiong · Richard Socher
Learning from Trajectories via Subgoal Discovery | Sujoy Paul · Jeroen Vanbaar · Amit Roy-Chowdhury
Loaded DiCE: Trading off Bias and Variance in Any-Order Score Function Gradient Estimators for Reinforcement Learning | Gregory Farquhar · Shimon Whiteson · Jakob Foerster
Towards Optimal Off-Policy Evaluation for Reinforcement Learning with Marginalized Importance Sampling | Tengyang Xie · Yifei Ma · Yu-Xiang Wang
Meta-Inverse Reinforcement Learning with Probabilistic Context Variables | Lantao Yu · Tianhe Yu · Chelsea Finn · Stefano Ermon
Neural Trust Region/Proximal Policy Optimization Attains Globally Optimal Policy | Boyi Liu · Qi Cai · Zhuoran Yang · Zhaoran Wang
Neural Temporal-Difference Learning Converges to Global Optima | Qi Cai · Zhuoran Yang · Jason Lee · Zhaoran Wang
Provably Global Convergence of Actor-Critic: A Case for Linear Quadratic Regulator with Ergodic Cost | Zhuoran Yang · Yongxin Chen · Mingyi Hong · Zhaoran Wang
Regularized Anderson Acceleration for Off-Policy Deep Reinforcement Learning | Wenjie Shi · Shiji Song · Hui Wu · Ya-Chu Hsu · Cheng Wu · Gao Huang
Stabilizing Off-Policy Q-Learning via Bootstrapping Error Reduction | Aviral Kumar · Justin Fu · George Tucker · Sergey Levine
Surrogate Objectives for Batch Policy Optimization in One-step Decision Making | Minmin Chen · Ramki Gummadi · Chris Harris · Dale Schuurmans
Discovery of Useful Questions as Auxiliary Tasks | Vivek Veeriah · Matteo Hessel · Zhongwen Xu · Janarthanan Rajendran · Richard L Lewis · Junhyuk Oh · Hado van Hasselt · David Silver · Satinder Singh
A Composable Specification Language for Reinforcement Learning Tasks | Kishor Jothimurugan · Rajeev Alur · Osbert Bastani
A Generalized Algorithm for Multi-Objective Reinforcement Learning and Policy Adaptation | Runzhe Yang · Xingyuan Sun · Karthik Narasimhan
A Kernel Loss for Solving the Bellman Equation | Yihao Feng · Lihong Li · Qiang Liu
Adaptive Temporal-Difference Learning for Policy Evaluation with Per-State Uncertainty Estimates | Carlos Riquelme · Hugo Penedones · Damien Vincent · Hartmut Maennel · Sylvain Gelly · Timothy A Mann · Andre Barreto · Gergely Neu
Curriculum-guided Hindsight Experience Replay | Meng Fang · Tianyi Zhou · Yali Du · Lei Han · Zhengyou Zhang
Distributional Policy Optimization: An Alternative Approach for Continuous Control | Chen Tessler · Guy Tennenholtz · Shie Mannor
Mo' States Mo' Problems: Emergency Stop Mechanisms from Observation | Samuel Ainsworth · Matt Barnes · Siddhartha Srinivasa
Generalization in Reinforcement Learning with Selective Noise Injection and Information Bottleneck | Maximilian Igl · Kamil Ciosek · Yingzhen Li · Sebastian Tschiatschek · Cheng Zhang · Sam Devlin · Katja Hofmann
Goal-conditioned Imitation Learning | Yiming Ding · Carlos Florensa · Pieter Abbeel · Mariano Phielipp
Gossip-based Actor-Learner Architectures for Deep Reinforcement Learning | Mahmoud ("Mido") Assran · Joshua Romoff · Nicolas Ballas · Joelle Pineau · Mike Rabbat
Imitation-Projected Programmatic Reinforcement Learning | Abhinav Verma · Hoang Le · Yisong Yue · Swarat Chaudhuri
Reinforcement Learning with Convex Constraints | Sobhan Miryoosefi · Kianté Brantley · Hal Daume III · Miro Dudik · Robert Schapire
RUDDER: Return Decomposition for Delayed Rewards | Jose A. Arjona-Medina · Michael Gillhofer · Michael Widrich · Thomas Unterthiner · Johannes Brandstetter · Sepp Hochreiter
Shaping Belief States with Generative Environment Models for RL | Karol Gregor · Danilo Jimenez Rezende · Frederic Besse · Yan Wu · Hamza Merzic · Aaron van den Oord
Towards Interpretable Reinforcement Learning Using Attention Augmented Agents | Alexander Mott · Daniel Zoran · Mike Chrzanowski · Daan Wierstra · Danilo Jimenez Rezende