| Title | Authors |
| --- | --- |
| Can SGD Learn Recurrent Neural Networks with Provable Generalization? | Zeyuan Allen-Zhu · Yuanzhi Li |
| Input-Cell Attention Reduces Vanishing Saliency of Recurrent Neural Networks | Aya Abdelsalam Ismail · Mohamed Gunady · Luiz Pessoa · Hector Corrada Bravo · Soheil Feizi | 
| Input-Output Equivalence of Unitary and Contractive RNNs | Melikasadat Emami · Mojtaba Sahraee Ardakan · Sundeep Rangan · Alyson Fletcher | 
| Kernel-Based Approaches for Sequence Modeling: Connections to Neural Methods | Kevin Liang · Guoyin Wang · Yitong Li · Ricardo Henao · Lawrence Carin | 
| Legendre Memory Units: Continuous-Time Representation in Recurrent Neural Networks | Aaron Voelker · Ivana Kajić · Chris Eliasmith | 
| Non-normal Recurrent Neural Network (nnRNN): learning long time dependencies while improving expressivity with transient dynamics | Giancarlo Kerg · Kyle Goyette · Maximilian Puelma Touzel · Gauthier Gidel · Eugene Vorontsov · Yoshua Bengio · Guillaume Lajoie | 
| Reverse engineering recurrent networks for sentiment classification reveals line attractor dynamics | Niru Maheswaranathan · Alex Williams · Matthew Golub · Surya Ganguli · David Sussillo | 
| Root Mean Square Layer Normalization | Biao Zhang · Rico Sennrich | 
| Universal Approximation of Input-Output Maps by Temporal Convolutional Nets | Joshua Hanson · Maxim Raginsky |