| A Family of Robust Stochastic Operators for Reinforcement Learning | Yingdong Lu · Mark Squillante · Chai Wah Wu |
| A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment | Felix Leibfried · Sergio Pascual-Díaz · Jordi Grau-Moya |
| Finite-Sample Analysis for SARSA with Linear Function Approximation | Shaofeng Zou · Tengyu Xu · Yingbin Liang |
| Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards | Falcon Dai · Matthew Walter |
| Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs | Max Simchowitz · Kevin Jamieson |
| Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives | Wang Chi Cheung |
| Sampling Networks and Aggregate Simulation for Online POMDP Planning | Hao (Jackson) Cui · Roni Khardon |
| Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples | Tengyu Xu · Shaofeng Zou · Yingbin Liang |
| Value Function in Frequency Domain and the Characteristic Value Iteration Algorithm | Amir-massoud Farahmand |