Generalized Off-Policy Actor-Critic | Shangtong Zhang · Wendelin Boehmer · Shimon Whiteson |
Learner-aware Teaching: Inverse Reinforcement Learning with Preferences and Constraints | Sebastian Tschiatschek · Ahana Ghosh · Luis Haug · Rati Devidze · Adish Singla |
Logarithmic Regret for Online Control | Naman Agarwal · Elad Hazan · Karan Singh |
Adaptive Auxiliary Task Weighting for Reinforcement Learning | Xingyu Lin · Harjatin Baweja · George Kantor · David Held |
Causal Confusion in Imitation Learning | Pim de Haan · Dinesh Jayaraman · Sergey Levine |
Hierarchical Decision Making by Generating and Following Natural Language Instructions | Hengyuan Hu · Denis Yarats · Qucheng Gong · Yuandong Tian · Mike Lewis |
Non-Cooperative Inverse Reinforcement Learning | Xiangyuan Zhang · Kaiqing Zhang · Erik Miehling · Tamer Basar |
Robust exploration in linear quadratic reinforcement learning | Jack Umenberger · Mina Ferizbegovic · Thomas Schön · Håkan Hjalmarsson |
Compositional Plan Vectors | Coline Devin · Daniel Geng · Pieter Abbeel · Trevor Darrell · Sergey Levine |
Online Optimal Control with Linear Dynamics and Predictions: Algorithms and Regret Analysis | Yingying Li · Xin Chen · Na Li |
Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games | Kaiqing Zhang · Zhuoran Yang · Tamer Basar |
Policy Continuation with Hindsight Inverse Dynamics | Hao Sun · Zhizhong Li · Xiaotong Liu · Bolei Zhou · Dahua Lin |