| Double Quantization for Communication-Efficient Distributed Optimization | Yue Yu · Jiaxiang Wu · Longbo Huang |
| Optimal Decision Tree with Noisy Outcomes | Su Jia · Viswanath Nagarajan · Fatemeh Navidi · R Ravi |
| Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up | Dominic Richards · Patrick Rebeschini |
| RSN: Randomized Subspace Newton | Robert Gower · Dmitry Kovalev · Felix Lieder · Peter Richtarik |
| Towards closing the gap between the theory and practice of SVRG | Othmane Sebbouh · Nidham Gazagnadou · Samy Jelassi · Francis Bach · Robert Gower |
| UniXGrad: A Universal, Adaptive Algorithm with Optimal Guarantees for Constrained Optimization | Ali Kavis · Kfir Y. Levy · Francis Bach · Volkan Cevher |
| A Latent Variational Framework for Stochastic Optimization | Philippe Casgrain |
| A Stochastic Composite Gradient Method with Incremental Variance Reduction | Junyu Zhang · Lin Xiao |
| A Universally Optimal Multistage Accelerated Stochastic Gradient Method | Necdet Serhat Aybat · Alireza Fallah · Mert Gurbuzbalaban · Asuman Ozdaglar |
| On the convergence of single-call stochastic extra-gradient methods | Yu-Guan Hsieh · Franck Iutzeler · Jérôme Malick · Panayotis Mertikopoulos |
| On the Ineffectiveness of Variance Reduced Optimization for Deep Learning | Aaron Defazio · Leon Bottou |
| Principal Component Projection and Regression in Nearly Linear Time through Asymmetric SVRG | Yujia Jin · Aaron Sidford |
| Understanding the Role of Momentum in Stochastic Gradient Methods | Igor Gitman · Hunter Lang · Pengchuan Zhang · Lin Xiao |
| Alleviating Label Switching with Optimal Transport | Pierre Monteiller · Sebastian Claici · Edward Chien · Farzaneh Mirzazadeh · Justin M Solomon · Mikhail Yurochkin |
| Beating SGD Saturation with Tail-Averaging and Minibatching | Nicole Muecke · Gergely Neu · Lorenzo Rosasco |
| Continuous-time Models for Stochastic Optimization Algorithms | Antonio Orvieto · Aurelien Lucchi |
| Distributed estimation of the inverse Hessian by determinantal averaging | Michal Derezinski · Michael W Mahoney |
| The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares | Rong Ge · Sham Kakade · Rahul Kidambi · Praneeth Netrapalli |
| Think out of the "Box": Generically-Constrained Asynchronous Composite Optimization and Hedging | Pooria Joulani · András György · Csaba Szepesvari |
| Variance Reduction for Matrix Games | Yair Carmon · Yujia Jin · Aaron Sidford · Kevin Tian |