Deep Learning · Optimization for Deep Networks

Title (Authors)

An Improved Analysis of Training Over-parameterized Deep Neural Networks (Difan Zou · Quanquan Gu)
Controlling Neural Level Sets (Matan Atzmon · Niv Haim · Lior Yariv · Ofer Israelov · Haggai Maron · Yaron Lipman)
Deep Equilibrium Models (Shaojie Bai · J. Zico Kolter · Vladlen Koltun)
Differentiable Cloth Simulation for Inverse Problems (Junbang Liang · Ming Lin · Vladlen Koltun)
Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks (Mahyar Fazlyab · Alexander Robey · Hamed Hassani · Manfred Morari · George Pappas)
Fine-grained Optimization of Deep Neural Networks (Mete Ozay)
Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks (Yuan Cao · Quanquan Gu)
On Learning Over-parameterized Neural Networks: A Functional Approximation Perspective (Lili Su · Pengkun Yang)
Stagewise Training Accelerates Convergence of Testing Error Over SGD (Zhuoning Yuan · Yan Yan · Rong Jin · Tianbao Yang)
Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks (Yuanzhi Li · Colin Wei · Tengyu Ma)
You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle (Dinghuai Zhang · Tianyuan Zhang · Yiping Lu · Zhanxing Zhu · Bin Dong)
Constrained deep neural network architecture search for IoT devices accounting for hardware calibration (Florian Scheidegger · Luca Benini · Costas Bekas · A. Cristiano I. Malossi)
Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks (Gauthier Gidel · Francis Bach · Simon Lacoste-Julien)
In-Place Zero-Space Memory Protection for CNN (Hui Guan · Lin Ning · Zhen Lin · Xipeng Shen · Huiyang Zhou · Seung-Hwan Lim)
Large Scale Structure of Neural Network Loss Landscapes (Stanislav Fort · Stanislaw Jastrzebski)
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers (Zeyuan Allen-Zhu · Yuanzhi Li · Yingyu Liang)
Limitations of the empirical Fisher approximation for natural gradient descent (Frederik Kunstner · Philipp Hennig · Lukas Balles)
Maximum Mean Discrepancy Gradient Flow (Michael Arbel · Anna Korba · Adil Salim · Arthur Gretton)
On Lazy Training in Differentiable Programming (Lénaïc Chizat · Edouard Oyallon · Francis Bach)
Reducing the variance in online optimization by transporting past gradients (Sébastien Arnold · Pierre-Antoine Manzagol · Reza Babanezhad Harikandeh · Ioannis Mitliagkas · Nicolas Le Roux)
Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks (Yuan Cao · Quanquan Gu)
Understanding and Improving Layer Normalization (Jingjing Xu · Xu Sun · Zhiyuan Zhang · Guangxiang Zhao · Junyang Lin)
LCA: Loss Change Allocation for Neural Network Training (Janice Lan · Rosanne Liu · Hattie Zhou · Jason Yosinski)
Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets (Rohith Kuditipudi · Xiang Wang · Holden Lee · Yi Zhang · Zhiyuan Li · Wei Hu · Rong Ge · Sanjeev Arora)
Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models (Yunfei Teng · Wenbo Gao · François Chalus · Anna Choromanska · Donald Goldfarb · Adrian Weller)
Learning Neural Networks with Adaptive Regularization (Han Zhao · Yao-Hung Hubert Tsai · Russ Salakhutdinov · Geoffrey Gordon)
Memory Efficient Adaptive Optimization (Rohan Anil · Vineet Gupta · Tomer Koren · Yoram Singer)
On the Convergence Rate of Training Recurrent Neural Networks (Zeyuan Allen-Zhu · Yuanzhi Li · Zhao Song)
SGD on Neural Networks Learns Functions of Increasing Complexity (Dimitris Kalimeris · Gal Kaplun · Preetum Nakkiran · Benjamin Edelman · Tristan Yang · Boaz Barak · Haofeng Zhang)
Towards Understanding the Importance of Shortcut Connections in Residual Networks (Tianyi Liu · Minshuo Chen · Mo Zhou · Simon Du · Enlu Zhou · Tuo Zhao)
Trivializations for Gradient-Based Optimization on Manifolds (Mario Lezcano Casado)
Using Statistics to Automate Stochastic Optimization (Hunter Lang · Lin Xiao · Pengchuan Zhang)
Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model (Guodong Zhang · Lala Li · Zachary Nado · James Martens · Sushant Sachdeva · George Dahl · Chris Shallue · Roger Grosse)
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent (Jaehoon Lee · Lechao Xiao · Samuel Schoenholz · Yasaman Bahri · Roman Novak · Jascha Sohl-Dickstein · Jeffrey Pennington)
Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks (Spencer Frei · Yuan Cao · Quanquan Gu)
Are deep ResNets provably better than linear predictors? (Chulhee Yun · Suvrit Sra · Ali Jadbabaie)
Efficient Rematerialization for Deep Networks (Ravi Kumar · Manish Purohit · Zoya Svitkina · Erik Vee · Joshua Wang)
Fast Convergence of Natural Gradient Descent for Over-Parameterized Neural Networks (Guodong Zhang · James Martens · Roger Grosse)
How to Initialize your Network? Robust Initialization for WeightNorm & ResNets (Devansh Arpit · Víctor Campos · Yoshua Bengio)
Lookahead Optimizer: k steps forward, 1 step back (Michael Zhang · James Lucas · Jimmy Ba · Geoffrey E. Hinton)
Global Convergence of Gradient Descent for Deep Linear Residual Networks (Lei Wu · Qingcan Wang · Chao Ma)
Piecewise Strong Convexity of Neural Networks (Tristan Milne)
PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization (Thijs Vogels · Sai Praneeth Karimireddy · Martin Jaggi)
A Primal Dual Formulation For Deep Learning With Constraints (Yatin Nandwani · Abhishek Pathak · Mausam · Parag Singla)
Surfing: Iterative Optimization Over Incrementally Trained Deep Networks (Ganlin Song · Zhou Fan · John Lafferty)
Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning (Igor Colin · Ludovic Dos Santos · Kevin Scaman)