A Linearly Convergent Proximal Gradient Algorithm for Decentralized Optimization | Sulaiman Alghunaim · Kun Yuan · Ali H Sayed |
Asymptotics for Sketching in Least Squares Regression | Edgar Dobriban · Sifan Liu |
DETOX: A Redundancy-based Framework for Faster and More Robust Gradient Aggregation | Shashank Rajput · Hongyi Wang · Zachary Charles · Dimitris Papailiopoulos |
Large-scale optimal transport map estimation using projection pursuit | Cheng Meng · Yuan Ke · Jingyi Zhang · Mengrui Zhang · Wenxuan Zhong · Ping Ma |
Locality-Sensitive Hashing for f-Divergences: Mutual Information Loss and Beyond | Lin Chen · Hossein Esfandiari · Gang Fu · Vahab Mirrokni |
Massively scalable Sinkhorn distances via the Nyström method | Jason Altschuler · Francis Bach · Alessandro Rudi · Jonathan Niles-Weed |
On the Global Convergence of (Fast) Incremental Expectation Maximization Methods | Belhal Karimi · Hoi-To Wai · Eric Moulines · Marc Lavielle |
Optimal Sparsity-Sensitive Bounds for Distributed Mean Estimation | Zengfeng Huang · Ziyue Huang · Yilei Wang · Ke Yi |
Qsparse-local-SGD: Distributed SGD with Quantization, Sparsification and Local Computations | Debraj Basu · Deepesh Data · Can Karakus · Suhas Diggavi |
Random Projections with Asymmetric Quantization | Xiaoyun Li · Ping Li |
Re-randomized Densification for One Permutation Hashing and Bin-wise Consistent Weighted Sampling | Ping Li · Xiaoyun Li · Cun-Hui Zhang |
Robust and Communication-Efficient Collaborative Learning | Amirhossein Reisizadeh · Hossein Taheri · Aryan Mokhtari · Hamed Hassani · Ramtin Pedarsani |
Sampled Softmax with Random Fourier Features | Ankit Singh Rawat · Jiecao Chen · Felix Xinnan Yu · Ananda Theertha Suresh · Sanjiv Kumar |
Extreme Classification in Log Memory using Count-Min Sketch: A Case Study of Amazon Search with 50M Products | Tharun Kumar Reddy Medini · Qixuan Huang · Yiqiu Wang · Vijai Mohan · Anshumali Shrivastava |
Sliced Gromov-Wasserstein | Titouan Vayer · Rémi Flamary · Nicolas Courty · Romain Tavenard · Laetitia Chapel |
SySCD: A System-Aware Parallel Coordinate Descent Algorithm | Nikolas Ioannou · Celestine Mendler-Dünner · Thomas Parnell |
GPipe: Efficient Training of Giant Neural Networks using Pipeline Parallelism | Yanping Huang · Youlong Cheng · Ankur Bapna · Orhan Firat · Dehao Chen · Mia Chen · HyoukJoong Lee · Jiquan Ngiam · Quoc V Le · Yonghui Wu · Zhifeng Chen |