A Family of Robust Stochastic Operators for Reinforcement Learning | Yingdong Lu · Mark Squillante · Chai Wah Wu |
A Unified Bellman Optimality Principle Combining Reward Maximization and Empowerment | Felix Leibfried · Sergio Pascual-Díaz · Jordi Grau-Moya |
Finite-Sample Analysis for SARSA with Linear Function Approximation | Shaofeng Zou · Tengyu Xu · Yingbin Liang |
Maximum Expected Hitting Cost of a Markov Decision Process and Informativeness of Rewards | Falcon Dai · Matthew Walter |
Non-Asymptotic Gap-Dependent Regret Bounds for Tabular MDPs | Max Simchowitz · Kevin Jamieson |
Regret Minimization for Reinforcement Learning with Vectorial Feedback and Complex Objectives | Wang Chi Cheung |
Sampling Networks and Aggregate Simulation for Online POMDP Planning | Hao (Jackson) Cui · Roni Khardon |
Two Time-scale Off-Policy TD Learning: Non-asymptotic Analysis over Markovian Samples | Tengyu Xu · Shaofeng Zou · Yingbin Liang |
Value Function in Frequency Domain and the Characteristic Value Iteration Algorithm | Amir-massoud Farahmand |