Double Quantization for Communication-Efficient Distributed Optimization | Yue Yu · Jiaxiang Wu · Longbo Huang |
Optimal Decision Tree with Noisy Outcomes | Su Jia · Viswanath Nagarajan · Fatemeh Navidi · R Ravi |
Optimal Statistical Rates for Decentralised Non-Parametric Regression with Linear Speed-Up | Dominic Richards · Patrick Rebeschini |
RSN: Randomized Subspace Newton | Robert Gower · Dmitry Kovalev · Felix Lieder · Peter Richtarik |
Towards closing the gap between the theory and practice of SVRG | Othmane Sebbouh · Nidham Gazagnadou · Samy Jelassi · Francis Bach · Robert Gower |
UniXGrad: A Universal, Adaptive Algorithm with Optimal Guarantees for Constrained Optimization | Ali Kavis · Kfir Y. Levy · Francis Bach · Volkan Cevher |
A Latent Variational Framework for Stochastic Optimization | Philippe Casgrain |
A Stochastic Composite Gradient Method with Incremental Variance Reduction | Junyu Zhang · Lin Xiao |
A Universally Optimal Multistage Accelerated Stochastic Gradient Method | Necdet Serhat Aybat · Alireza Fallah · Mert Gurbuzbalaban · Asuman Ozdaglar |
On the convergence of single-call stochastic extra-gradient methods | Yu-Guan Hsieh · Franck Iutzeler · Jérôme Malick · Panayotis Mertikopoulos |
On the Ineffectiveness of Variance Reduced Optimization for Deep Learning | Aaron Defazio · Leon Bottou |
Principal Component Projection and Regression in Nearly Linear Time through Asymmetric SVRG | Yujia Jin · Aaron Sidford |
Understanding the Role of Momentum in Stochastic Gradient Methods | Igor Gitman · Hunter Lang · Pengchuan Zhang · Lin Xiao |
Alleviating Label Switching with Optimal Transport | Pierre Monteiller · Sebastian Claici · Edward Chien · Farzaneh Mirzazadeh · Justin M Solomon · Mikhail Yurochkin |
Beating SGD Saturation with Tail-Averaging and Minibatching | Nicole Muecke · Gergely Neu · Lorenzo Rosasco |
Continuous-time Models for Stochastic Optimization Algorithms | Antonio Orvieto · Aurelien Lucchi |
Distributed estimation of the inverse Hessian by determinantal averaging | Michal Derezinski · Michael W Mahoney |
The Step Decay Schedule: A Near Optimal, Geometrically Decaying Learning Rate Procedure For Least Squares | Rong Ge · Sham Kakade · Rahul Kidambi · Praneeth Netrapalli |
Think out of the "Box": Generically-Constrained Asynchronous Composite Optimization and Hedging | Pooria Joulani · András György · Csaba Szepesvari |
Variance Reduction for Matrix Games | Yair Carmon · Yujia Jin · Aaron Sidford · Kevin Tian |