Deep Learning · Attention Models

Are Sixteen Heads Really Better than One?
  Paul Michel · Omer Levy · Graham Neubig

Compositional De-Attention Networks
  Yi Tay · Anh Tuan Luu · Aston Zhang · Shuohang Wang · Siu Cheung Hui

Geometry-Aware Neural Rendering
  Joshua Tobin · Wojciech Zaremba · Pieter Abbeel

Image Captioning: Transforming Objects into Words
  Simao Herdade · Armin Kappeler · Kofi Boakye · Joao Soares

Learning by Abstraction: The Neural State Machine
  Drew Hudson · Christopher Manning

Neural Shuffle-Exchange Networks - Sequence Processing in O(n log n) Time
  Karlis Freivalds · Emīls Ozoliņš · Agris Šostaks

Novel positional encodings to enable tree-based transformers
  Vighnesh Shiv · Chris Quirk

Self-attention with Functional Time Representation Learning
  Da Xu · Chuanwei Ruan · Evren Korpeoglu · Sushant Kumar · Kannan Achan

Understanding Attention and Generalization in Graph Neural Networks
  Boris Knyazev · Graham W Taylor · Mohamed Amer