| Generalized Off-Policy Actor-Critic | Shangtong Zhang · Wendelin Boehmer · Shimon Whiteson |
| Learner-aware Teaching: Inverse Reinforcement Learning with Preferences and Constraints | Sebastian Tschiatschek · Ahana Ghosh · Luis Haug · Rati Devidze · Adish Singla |
| Logarithmic Regret for Online Control | Naman Agarwal · Elad Hazan · Karan Singh |
| Adaptive Auxiliary Task Weighting for Reinforcement Learning | Xingyu Lin · Harjatin Baweja · George Kantor · David Held |
| Causal Confusion in Imitation Learning | Pim de Haan · Dinesh Jayaraman · Sergey Levine |
| Hierarchical Decision Making by Generating and Following Natural Language Instructions | Hengyuan Hu · Denis Yarats · Qucheng Gong · Yuandong Tian · Mike Lewis |
| Non-Cooperative Inverse Reinforcement Learning | Xiangyuan Zhang · Kaiqing Zhang · Erik Miehling · Tamer Basar |
| Robust exploration in linear quadratic reinforcement learning | Jack Umenberger · Mina Ferizbegovic · Thomas Schön · Håkan Hjalmarsson |
| Compositional Plan Vectors | Coline Devin · Daniel Geng · Pieter Abbeel · Trevor Darrell · Sergey Levine |
| Online Optimal Control with Linear Dynamics and Predictions: Algorithms and Regret Analysis | Yingying Li · Xin Chen · Na Li |
| Policy Optimization Provably Converges to Nash Equilibria in Zero-Sum Linear Quadratic Games | Kaiqing Zhang · Zhuoran Yang · Tamer Basar |
| Policy Continuation with Hindsight Inverse Dynamics | Hao Sun · Zhizhong Li · Xiaotong Liu · Bolei Zhou · Dahua Lin |