An Improved Analysis of Training Over-parameterized Deep Neural Networks | Difan Zou · Quanquan Gu |
Controlling Neural Level Sets | Matan Atzmon · Niv Haim · Lior Yariv · Ofer Israelov · Haggai Maron · Yaron Lipman |
Deep Equilibrium Models | Shaojie Bai · J. Zico Kolter · Vladlen Koltun |
Differentiable Cloth Simulation for Inverse Problems | Junbang Liang · Ming Lin · Vladlen Koltun |
Efficient and Accurate Estimation of Lipschitz Constants for Deep Neural Networks | Mahyar Fazlyab · Alexander Robey · Hamed Hassani · Manfred Morari · George Pappas |
Fine-grained Optimization of Deep Neural Networks | Mete Ozay |
Generalization Bounds of Stochastic Gradient Descent for Wide and Deep Neural Networks | Yuan Cao · Quanquan Gu |
On Learning Over-parameterized Neural Networks: A Functional Approximation Perspective | Lili Su · Pengkun Yang |
Stagewise Training Accelerates Convergence of Testing Error Over SGD | Zhuoning Yuan · Yan Yan · Rong Jin · Tianbao Yang |
Towards Explaining the Regularization Effect of Initial Large Learning Rate in Training Neural Networks | Yuanzhi Li · Colin Wei · Tengyu Ma |
You Only Propagate Once: Accelerating Adversarial Training via Maximal Principle | Dinghuai Zhang · Tianyuan Zhang · Yiping Lu · Zhanxing Zhu · Bin Dong |
Constrained deep neural network architecture search for IoT devices accounting for hardware calibration | Florian Scheidegger · Luca Benini · Costas Bekas · A. Cristiano I. Malossi |
Implicit Regularization of Discrete Gradient Dynamics in Linear Neural Networks | Gauthier Gidel · Francis Bach · Simon Lacoste-Julien |
In-Place Zero-Space Memory Protection for CNN | Hui Guan · Lin Ning · Zhen Lin · Xipeng Shen · Huiyang Zhou · Seung-Hwan Lim |
Large Scale Structure of Neural Network Loss Landscapes | Stanislav Fort · Stanislaw Jastrzebski |
Learning and Generalization in Overparameterized Neural Networks, Going Beyond Two Layers | Zeyuan Allen-Zhu · Yuanzhi Li · Yingyu Liang |
Limitations of the empirical Fisher approximation for natural gradient descent | Frederik Kunstner · Philipp Hennig · Lukas Balles |
Maximum Mean Discrepancy Gradient Flow | Michael Arbel · Anna Korba · Adil Salim · Arthur Gretton |
On Lazy Training in Differentiable Programming | Lénaïc Chizat · Edouard Oyallon · Francis Bach |
Reducing the variance in online optimization by transporting past gradients | Sébastien Arnold · Pierre-Antoine Manzagol · Reza Babanezhad Harikandeh · Ioannis Mitliagkas · Nicolas Le Roux |
Tight Sample Complexity of Learning One-hidden-layer Convolutional Neural Networks | Yuan Cao · Quanquan Gu |
Understanding and Improving Layer Normalization | Jingjing Xu · Xu Sun · Zhiyuan Zhang · Guangxiang Zhao · Junyang Lin |
LCA: Loss Change Allocation for Neural Network Training | Janice Lan · Rosanne Liu · Hattie Zhou · Jason Yosinski |
Explaining Landscape Connectivity of Low-cost Solutions for Multilayer Nets | Rohith Kuditipudi · Xiang Wang · Holden Lee · Yi Zhang · Zhiyuan Li · Wei Hu · Rong Ge · Sanjeev Arora |
Leader Stochastic Gradient Descent for Distributed Training of Deep Learning Models | Yunfei Teng · Wenbo Gao · François Chalus · Anna Choromanska · Donald Goldfarb · Adrian Weller |
Learning Neural Networks with Adaptive Regularization | Han Zhao · Yao-Hung Hubert Tsai · Russ Salakhutdinov · Geoffrey Gordon |
Memory Efficient Adaptive Optimization | Rohan Anil · Vineet Gupta · Tomer Koren · Yoram Singer |
On the Convergence Rate of Training Recurrent Neural Networks | Zeyuan Allen-Zhu · Yuanzhi Li · Zhao Song |
SGD on Neural Networks Learns Functions of Increasing Complexity | Dimitris Kalimeris · Gal Kaplun · Preetum Nakkiran · Benjamin Edelman · Tristan Yang · Boaz Barak · Haofeng Zhang |
Towards Understanding the Importance of Shortcut Connections in Residual Networks | Tianyi Liu · Minshuo Chen · Mo Zhou · Simon Du · Enlu Zhou · Tuo Zhao |
Trivializations for Gradient-Based Optimization on Manifolds | Mario Lezcano Casado |
Using Statistics to Automate Stochastic Optimization | Hunter Lang · Lin Xiao · Pengchuan Zhang |
Which Algorithmic Choices Matter at Which Batch Sizes? Insights From a Noisy Quadratic Model | Guodong Zhang · Lala Li · Zachary Nado · James Martens · Sushant Sachdeva · George Dahl · Chris Shallue · Roger Grosse |
Wide Neural Networks of Any Depth Evolve as Linear Models Under Gradient Descent | Jaehoon Lee · Lechao Xiao · Samuel Schoenholz · Yasaman Bahri · Roman Novak · Jascha Sohl-Dickstein · Jeffrey Pennington |
Algorithm-Dependent Generalization Bounds for Overparameterized Deep Residual Networks | Spencer Frei · Yuan Cao · Quanquan Gu |
Are deep ResNets provably better than linear predictors? | Chulhee Yun · Suvrit Sra · Ali Jadbabaie |
Efficient Rematerialization for Deep Networks | Ravi Kumar · Manish Purohit · Zoya Svitkina · Erik Vee · Joshua Wang |
Fast Convergence of Natural Gradient Descent for Over-Parameterized Neural Networks | Guodong Zhang · James Martens · Roger Grosse |
How to Initialize your Network? Robust Initialization for WeightNorm & ResNets | Devansh Arpit · Víctor Campos · Yoshua Bengio |
Lookahead Optimizer: k steps forward, 1 step back | Michael Zhang · James Lucas · Jimmy Ba · Geoffrey E. Hinton |
Global Convergence of Gradient Descent for Deep Linear Residual Networks | Lei Wu · Qingcan Wang · Chao Ma |
Piecewise Strong Convexity of Neural Networks | Tristan Milne |
PowerSGD: Practical Low-Rank Gradient Compression for Distributed Optimization | Thijs Vogels · Sai Praneeth Karimireddy · Martin Jaggi |
A Primal Dual Formulation For Deep Learning With Constraints | Yatin Nandwani · Abhishek Pathak · Mausam · Parag Singla |
Surfing: Iterative Optimization Over Incrementally Trained Deep Networks | Ganlin Song · Zhou Fan · John Lafferty |
Theoretical Limits of Pipeline Parallel Optimization and Application to Distributed Deep Learning | Igor Colin · Ludovic Dos Santos · Kevin Scaman |